Smart model routing (auto-routing short/simple turns to a cheap model
across providers) was opt-in and disabled by default. This removes the
feature wholesale: the routing module, its config keys, docs, tests, and
the orchestration scaffolding it required in cli.py / gateway/run.py /
cron/scheduler.py.
The /fast (Priority Processing / Anthropic fast mode) feature kept its
hooks into _resolve_turn_agent_config — those still build a route dict
and attach request_overrides when the model supports it; the route now
just always uses the session's primary model/provider rather than
running prompts through choose_cheap_model_route() first.
Also removed:
- DEFAULT_CONFIG['smart_model_routing'] block and matching commented-out
example sections in hermes_cli/config.py and cli-config.yaml.example
- _load_smart_model_routing() / self._smart_model_routing on GatewayRunner
- self._smart_model_routing / self._active_agent_route_signature on
HermesCLI (signature kept; just no longer initialised through the
smart-routing pipeline)
- route_label parameter on HermesCLI._init_agent (only set by smart
routing; never read elsewhere)
- 'Smart Model Routing' section in website/docs/integrations/providers.md
- tip in hermes_cli/tips.py
- entries in hermes_cli/dump.py + hermes_cli/web_server.py
- row in skills/autonomous-ai-agents/hermes-agent/SKILL.md
Tests:
- Deleted tests/agent/test_smart_model_routing.py
- Rewrote tests/agent/test_credential_pool_routing.py to target the
simplified _resolve_turn_agent_config directly (preserves credential
pool propagation + 429 rotation coverage)
- Dropped 'cheap model' test from test_cli_provider_resolution.py
- Dropped resolve_turn_route patches from cli + gateway test_fast_command
— they now exercise the real method end-to-end
- Removed _smart_model_routing stub assignments from gateway/cron test
helpers
Targeted suites: 74/74 in the directly affected test files;
tests/agent + tests/cron + tests/cli pass except 5 failures that
already exist on main (cron silent-delivery + alias quick-command).
Extends _hydrate_bot_identity() to also populate _bot_open_id (not just
_bot_name) by probing /open-apis/bot/v3/info — the same endpoint the
scan-to-create wizard uses. No extra scopes required beyond the tenant
access token.
Closes the manual-setup gap in #12450: users who configured Feishu
without running the wizard, and never set FEISHU_BOT_OPEN_ID, now get
a bot identity that _is_self_sent_bot_message() can actually use to
filter the adapter's own bot-sent events.
Each field is hydrated independently:
- Env vars (FEISHU_BOT_OPEN_ID / FEISHU_BOT_USER_ID / FEISHU_BOT_NAME)
still take precedence and skip their respective probe.
- /bot/v3/info provides open_id + name.
- Application-info endpoint remains as a best-effort fallback for
bot_name only (needs admin:app.info:readonly scope).
Tests: 5 new cases covering env-var precedence, probe success, probe
failure fallback, and the end-to-end self-send filter gate after
hydration.
PR #12558 was heavy for what the fix actually is — essay-length
comments, a dedicated helper method where a setdefault would do, and
a source-inspection test with no real behavior coverage. The
genuine code change is ~5 lines of new logic (1 field, 2 async with,
an on_ready wait block).
Trimmed:
- Replaced the 12-line _voice_lock_for helper with a setdefault
one-liner at each call site (join_voice_channel, leave_voice_channel).
- Collapsed the 12-line comment on on_message's _ready_event wait to
3 lines. Dropped the warning log on timeout — pass-on-timeout is
fine; if on_ready hangs that long, the bot is already broken and
the log wouldn't help.
- Dropped the source-inspection test (greps the module source for
expected substrings). It was low-value scaffolding; the
voice-serialization test covers actual behavior.
Net: -73 lines vs PR #12558. Same two guarantees preserved, same
test passes (verified by stashing the fix and confirming failure).
Two small races in gateway/platforms/discord.py, bundled together
since they're adjacent in the adapter and both narrow in impact.
1. on_message vs _resolve_allowed_usernames (startup window)
DISCORD_ALLOWED_USERS accepts both numeric IDs and raw usernames.
At connect-time, _resolve_allowed_usernames walks the bot's guilds
(fetch_members can take multiple seconds) to swap usernames for IDs.
on_message can fire during that window; _is_allowed_user compares
the numeric author.id against a set that may still contain raw
usernames — legitimate users get silently rejected for a few
seconds after every reconnect.
Fix: on_message awaits _ready_event (with a 30s timeout) when it
isn't already set. on_ready sets the event after the resolve
completes. In steady state this is a no-op (event already set);
only the startup / reconnect window ever blocks.
2. join_voice_channel check-and-connect
The existing-connection check at _voice_clients.get() and the
channel.connect() call straddled an await boundary with no lock.
Two concurrent /voice channel invocations could both see None and
both call connect(); discord.py raises ClientException
("Already connected") on the loser. Same race class for leave
running concurrently with _voice_timeout_handler.
Fix: per-guild asyncio.Lock (_voice_locks dict with lazy alloc via
_voice_lock_for). join_voice_channel and leave_voice_channel both
run their body under the lock. Sequential within a guild, still
fully concurrent across guilds.
Both: LOW severity. The first only affects username-based allowlists
on fast-follow-up messages at startup; the second is a narrow
exception on simultaneous voice commands. Bundled so the adapter
gets a single coherent polish pass.
Tests (tests/gateway/test_discord_race_polish.py): 2 regression cases.
- test_concurrent_joins_do_not_double_connect: two concurrent
join_voice_channel calls on the same guild result in exactly one
channel.connect() invocation.
- test_on_message_blocks_until_ready_event_set: asserts the expected
wait pattern is present in on_message (source inspection, since
full discord.py client setup isn't practical here).
Regression-guard validated: against unpatched gateway/platforms/discord.py
both tests fail. With the fix they pass. Full Discord suite (118
tests) green.
External services can now push plain-text notifications to a user's chat
via the webhook adapter without invoking the agent. Set deliver_only=true
on a route and the rendered prompt template becomes the literal message
body — dispatched directly to the configured target (Telegram, Discord,
Slack, GitHub PR comment, etc.).
Reuses all existing webhook infrastructure: HMAC-SHA256 signature
validation, per-route rate limiting, idempotency cache, body-size limits,
template rendering with dot-notation, home-channel fallback. No new HTTP
server, no new auth scheme, no new port.
Use cases: Supabase/Firebase webhooks → user notifications, monitoring
alert forwarding, inter-agent pings, background job completion alerts.
Changes:
- gateway/platforms/webhook.py: new _direct_deliver() helper + early
dispatch branch in _handle_webhook when deliver_only=true. Startup
validation rejects deliver_only with deliver=log.
- hermes_cli/main.py + hermes_cli/webhook.go: --deliver-only flag on
subscribe; list/show output marks direct-delivery routes.
- website/docs/user-guide/messaging/webhooks.md: new Direct Delivery
Mode section with config example, CLI example, response codes.
- skills/devops/webhook-subscriptions/SKILL.md: document --deliver-only
with use cases (bumped to v1.1.0).
- tests/gateway/test_webhook_deliver_only.py: 14 new tests covering
agent bypass, template rendering, status codes, HMAC still enforced,
idempotency still applies, rate limit still applies, startup
validation, and direct-deliver dispatch.
Validation: 78 webhook tests pass (64 existing + 14 new). E2E verified
with real aiohttp server + real urllib POST — agent not invoked, target
adapter.send() called with rendered template, duplicate delivery_id
suppressed.
Closes the gap identified in PR #12117 (thanks to @H1an1 / Antenna team)
without adding a second HTTP ingress server.
Follow-up on top of the helix4u #12388 cherry-picks:
- make deferred post-delivery callbacks generation-aware end-to-end so
stale runs cannot clear callbacks registered by a fresher run for the
same session
- bind callback ownership to the active session event at run start and
snapshot that generation inside base adapter processing so later event
mutation cannot retarget cleanup
- pass run_generation through proxy mode and drop stale proxy streams /
final results the same way local runs are dropped
- centralize stop/new interrupt cleanup into one helper and replace the
open-coded branches with shared logic
- unify internal control interrupt reason strings via shared constants
- remove the return from base.py's finally block so cleanup no longer
swallows cancellation/exception flow
- add focused regressions for generation forwarding, proxy stale
suppression, and newer-callback preservation
This addresses all review findings from the initial #12388 review while
keeping the fix scoped to stale-output/typing-loop interrupt handling.
Follow-up on top of the helix4u #6392 cherry-pick:
- reuse one helper for actionable Docker-local file-not-found errors
across document/image/video/audio local-media send paths
- include /outputs/... alongside /output/... in the container-local
path hint
- soften the gateway startup warning so it does not imply custom
host-visible mounts are broken; the warning now targets the specific
risky pattern of emitting container-local MEDIA paths without an
explicit export mount
- add focused regressions for /outputs/... and non-document media hint
coverage
This keeps the salvage aligned with the actual MEDIA delivery problem on
current main while reducing false-positive operator messaging.
When _send_fallback_final() is called with nothing new to deliver
(the visible partial already matches final_text), the last edit may
still show the cursor character because fallback mode was entered
after a failed edit. Before this fix the early-return path left
_already_sent = True without attempting to strip the cursor, so the
message stayed frozen with a visible ▉ permanently.
Adds a best-effort edit inside the empty-continuation branch to clean
the cursor off the last-sent text. Harmless when fallback mode
wasn't actually armed or when the cursor isn't present. If the strip
edit itself fails (flood still active), we return without crashing
and without corrupting _last_sent_text.
Adapted from PR #7429 onto current main — the surrounding fallback
block grew the #10807 stale-prefix handling since #7429 was written,
so the cursor strip lives in the new else-branch where we still
return early.
3 unit tests covering: cursor stripped on empty continuation, no edit
attempted when cursor is not configured, cursor-strip edit failure
handled without crash.
Originally proposed as PR #7429.
During gateway shutdown, a message arriving while
cancel_background_tasks is mid-await (inside asyncio.gather) spawns
a fresh _process_message_background task via handle_message and adds
it to self._background_tasks. The original implementation's
_background_tasks.clear() at the end of cancel_background_tasks
dropped the reference; the task ran untracked against a disconnecting
adapter, logged send-failures, and lingered until it completed on
its own.
Fix: wrap the cancel+gather in a bounded loop (MAX_DRAIN_ROUNDS=5).
If new tasks appeared during the gather, cancel them in the next
round. The .clear() at the end is preserved as a safety net for
any task that appeared after MAX_DRAIN_ROUNDS — but in practice the
drain stabilizes in 1-2 rounds.
Tests: tests/gateway/test_cancel_background_drain.py — 3 cases.
- test_cancel_background_tasks_drains_late_arrivals: spawn M1, start
cancel, inject M2 during M1's shielded cleanup, verify M2 is
cancelled.
- test_cancel_background_tasks_handles_no_tasks: no-op path still
terminates cleanly.
- test_cancel_background_tasks_bounded_rounds: baseline — single
task cancels in one round, loop terminates.
Regression-guard validated: against the unpatched implementation,
the late-arrival test fails with exactly the expected message
('task leaked'). With the fix it passes.
Blast radius is shutdown-only; the audit classified this as MED.
Shipping because the fix is small and the hygiene is worth it.
While investigating the audit's other MEDs (busy-handler double-ack,
Discord ExecApprovalView double-resolve, UpdatePromptView
double-resolve), I verified all three were false positives — the
check-and-set patterns have no await between them, so they're
atomic on single-threaded asyncio. No fix needed for those.
When a streaming edit fails mid-stream (flood control, transport error)
and a tool boundary arrives before the fallback threshold is reached,
the pre-boundary tail in `_accumulated` was silently discarded by
`_reset_segment_state`. The user saw a frozen partial message and
missing words on the other side of the tool call.
Flush the undelivered tail as a continuation message before the reset,
computed relative to the last successfully-delivered prefix so we don't
duplicate content the user already saw.
When Discord splits a long message at 2000 chars, _enqueue_text_event
buffers each chunk and schedules a _flush_text_batch task with a
short delay. If another chunk lands while the prior flush task is
already inside handle_message, _enqueue_text_event calls
prior_task.cancel() — and without asyncio.shield, CancelledError
propagates from the flush task into handle_message → the agent's
streaming request, aborting the response the user was waiting on.
Reproducer: user sends a 3000-char prompt (split by Discord into 2
messages). Chunk 1 lands, flush delay starts, chunk 2 lands during
the brief window when chunk 1's flush has already committed to
handle_message. Agent's current streaming response is cancelled
with CancelledError, user sees a truncated or missing reply.
Fix (gateway/platforms/discord.py):
- Wrap the handle_message call in asyncio.shield so the inner
dispatch is protected from the outer task's cancel.
- Add an except asyncio.CancelledError clause so the outer task
still exits cleanly when cancel lands during the sleep window
(before the pop) — semantics for that path are unchanged.
The new flush task spawned by the follow-up chunk still handles its
own batch via the normal pending-message / active-session machinery
in base.py, so follow-ups are not lost.
Tests: tests/gateway/test_text_batching.py —
test_shield_protects_handle_message_from_cancel. Tracks a distinct
first_handle_cancelled event so the assertion fails cleanly when the
shield is missing (verified by stashing the fix and re-running).
Live E2E on the live-loaded DiscordAdapter:
first_handle_cancelled: False (shield worked)
first_handle_completed: True (handle_message ran to completion)
Two related race conditions in gateway/platforms/base.py that could
produce duplicate agent runs or silently drop messages. Neither is
specific to any one platform — all adapters inherit this logic.
R5 (HIGH) — duplicate agent spawn on turn chain
In _process_message_background, the pending-drain path deleted
_active_sessions[session_key] before awaiting typing_task.cancel()
and then recursively awaiting _process_message_background for the
queued event. During the typing_task await, a fresh inbound message
M3 could pass the Level-1 guard (entry now missing), set its own
Event, and spawn a second _process_message_background for the same
session_key — two agents running simultaneously, duplicate responses,
duplicate tool calls.
Fix: keep the _active_sessions entry populated and only clear() the
Event. The guard stays live, so any concurrent inbound message takes
the busy-handler path (queue + interrupt) as intended.
R6 (MED-HIGH) — message dropped during finally cleanup
The finally block has two await points (typing_task, stop_typing)
before it deletes _active_sessions. A message arriving in that
window passes the guard (entry still live), lands in
_pending_messages via the busy-handler — and then the unconditional
del removes the guard with that message still queued. Nothing
drains it; the user never gets a reply.
Fix: before deleting _active_sessions in finally, pop any late
pending_messages entry and spawn a drain task for it. Only delete
_active_sessions when no pending is waiting.
Tests: tests/gateway/test_pending_drain_race.py — three regression
cases. Validated: without the fix, two of the three fail exactly
where the races manifest (duplicate-spawn guard loses identity,
late-arrival 'LATE' message not in processed list).
Gateway startup leaks aiohttp.ClientSession (and other partial-init
resources) when an adapter's connect() returns False or raises. The
adapter is never added to self.adapters, so the shutdown path at
gateway/run.py:2426 never calls disconnect() on it — Python GC later
logs 'Unclosed client session' at process exit.
Seen on 2026-04-18 18:08:16 during a double --replace takeover cycle:
one of the partial-init sessions survived past shutdown and emitted
the warning right before status=75/TEMPFAIL.
Fix:
- New GatewayRunner._safe_adapter_disconnect() helper — calls
adapter.disconnect() and swallows any exception. Used on error paths.
- Connect loop calls it in both failure branches: success=False and
except Exception.
- Adapter disconnect() implementations are already expected to be
idempotent and tolerate partial-init state (they all guard on
self._http_session / self._bridge_process before touching them).
Tests: tests/gateway/test_safe_adapter_disconnect.py — 3 cases verify
the helper forwards to disconnect, swallows exceptions, and tolerates
platform=None.
Any recognized slash command now bypasses the Level-1 active-session
guard instead of queueing + interrupting. A mid-run /model (or
/reasoning, /voice, /insights, /title, /resume, /retry, /undo,
/compress, /usage, /provider, /reload-mcp, /sethome, /reset) used to
interrupt the agent AND get silently discarded by the slash-command
safety net — zero-char response, dropped tool calls.
Root cause:
- Discord registers 41 native slash commands via tree.command().
- Only 14 were in ACTIVE_SESSION_BYPASS_COMMANDS.
- The other ~15 user-facing ones fell through base.py:handle_message
to the busy-session handler, which calls running_agent.interrupt()
AND queues the text.
- After the aborted run, gateway/run.py:9912 correctly identifies the
queued text as a slash command and discards it — but the damage
(interrupt + zero-char response) already happened.
Fix:
- should_bypass_active_session() now returns True for any resolvable
slash command. ACTIVE_SESSION_BYPASS_COMMANDS stays as the subset
with dedicated Level-2 handlers (documentation + tests).
- gateway/run.py adds a catch-all after the dedicated handlers that
returns a user-visible "agent busy — wait or /stop first" response
for any other resolvable command.
- Unknown text / file-path-like messages are unchanged — they still
queue.
Also:
- gateway/platforms/discord.py logs the invoker identity on every
slash command (user id + name + channel + guild) so future
ghost-command reports can be triaged without guessing.
Tests:
- 15 new parametrized cases in test_command_bypass_active_session.py
cover every previously-broken Discord slash command.
- Existing tests for /stop, /new, /approve, /deny, /help, /status,
/agents, /background, /steer, /update, /queue still pass.
- test_steer.py's ACTIVE_SESSION_BYPASS_COMMANDS check still passes.
Fixes#5057. Related: #6252, #10370, #4665.
When `message.from_user` is None — which can happen for forwarded messages,
anonymous admin mode in groups, or certain Telegram client edge cases —
`_build_message_event` set `source.user_id` to None. This caused:
1. `_is_user_authorized()` to early-return False (`if not user_id: return False`)
2. The access check never compared against `TELEGRAM_ALLOWED_USERS` even when
the user actually was in the allowlist
3. The pairing flow fired and generated a code for `user_id=None`
4. The pairing approval saved an entry under the literal string key "null"
5. The user was effectively locked out because their real user_id never
matched the "null" key on subsequent messages
For DMs (`chat_type == "dm"`), Telegram guarantees `chat.id == user.id` —
they are the same numeric ID for private chats. Falling back to `chat.id`
when `from_user` is None for DMs restores the expected access-control
behavior without weakening it (group/channel chats correctly stay None).
Also adds a parallel `user_name` fallback to `chat.full_name` so the
display name still works in the same edge case.
Follow-up to #12301.
The drain-timeout branch of _stop_impl() was iterating the drain-start
snapshot (active_agents) when marking sessions resume_pending. That
snapshot can include sessions that finished gracefully during the drain
window — marking them would give their next turn a stray
'your previous turn was interrupted by a gateway restart' system note
even though the prior turn actually completed cleanly.
Iterate self._running_agents at timeout time instead, mirroring
_interrupt_running_agents() exactly:
- only sessions still blocking the shutdown get marked
- pending sentinels (AIAgent construction not yet complete) are skipped
Changes:
- gateway/run.py: swap active_agents.keys() for filtered
self._running_agents.items() iteration in the drain-timeout mark loop.
- tests/gateway/test_restart_resume_pending.py: two regression tests —
finisher-during-drain not marked, pending sentinel not marked.
The shutdown banner promised "send any message after restart to resume
where you left off" but the code did the opposite: a drain-timeout
restart skipped the .clean_shutdown marker, which made the next startup
call suspend_recently_active(), which marked the session suspended,
which made get_or_create_session() spawn a fresh session_id with a
'Session automatically reset. Use /resume...' notice — contradicting
the banner.
Introduce a resume_pending state on SessionEntry that is distinct from
suspended. Drain-timeout shutdown flags active sessions resume_pending
instead of letting startup-wide suspension destroy them. The next
message on the same session_key preserves the session_id, reloads the
transcript, and the agent receives a reason-aware restart-resume
system note that subsumes the existing tool-tail auto-continue note
(PR #9934).
Terminal escalation still flows through the existing
.restart_failure_counts stuck-loop counter (PR #7536, threshold 3) —
no parallel counter on SessionEntry. suspended still wins over
resume_pending in get_or_create_session() so genuinely stuck sessions
converge to a clean slate.
Spec: PR #11852 (BrennerSpear). Implementation follows the spec with
the approved correction (reuse .restart_failure_counts rather than
adding a resume_attempts field).
Changes:
- gateway/session.py: SessionEntry.resume_pending/resume_reason/
last_resume_marked_at + to_dict/from_dict; SessionStore
.mark_resume_pending()/clear_resume_pending(); get_or_create_session()
returns existing entry when resume_pending (suspended still wins);
suspend_recently_active() skips resume_pending entries.
- gateway/run.py: _stop_impl() drain-timeout branch marks active
sessions resume_pending before _interrupt_running_agents();
_run_agent() injects reason-aware restart-resume system note that
subsumes the tool-tail case; successful-turn cleanup also clears
resume_pending next to _clear_restart_failure_count();
_notify_active_sessions_of_shutdown() softens the restart banner to
'I'll try to resume where you left off' (honest about stuck-loop
escalation).
- tests/gateway/test_restart_resume_pending.py: 29 new tests covering
SessionEntry roundtrip, mark/clear helpers, get_or_create_session
precedence (suspended > resume_pending), suspend_recently_active
skip, drain-timeout mark reason (restart vs shutdown), system-note
injection decision tree (including tool-tail subsumption), banner
wording, and stuck-loop escalation override.
* feat(steer): /steer <prompt> injects a mid-run note after the next tool call
Adds a new slash command that sits between /queue (turn boundary) and
interrupt. /steer <text> stashes the message on the running agent and
the agent loop appends it to the LAST tool result's content once the
current tool batch finishes. The model sees it as part of the tool
output on its next iteration.
No interrupt is fired, no new user turn is inserted, and no prompt
cache invalidation happens beyond the normal per-turn tool-result
churn. Message-role alternation is preserved — we only modify an
existing role:"tool" message's content.
Wiring
------
- hermes_cli/commands.py: register /steer + add to ACTIVE_SESSION_BYPASS_COMMANDS.
- run_agent.py: add _pending_steer state, AIAgent.steer(), _drain_pending_steer(),
_apply_pending_steer_to_tool_results(); drain at end of both parallel and
sequential tool executors; clear on interrupt; return leftover as
result['pending_steer'] if the agent exits before another tool batch.
- cli.py: /steer handler — route to agent.steer() when running, fall back to
the regular queue otherwise; deliver result['pending_steer'] as next turn.
- gateway/run.py: running-agent intercept calls running_agent.steer(); idle-agent
path strips the prefix and forwards as a regular user message.
- tui_gateway/server.py: new session.steer JSON-RPC method.
- ui-tui: SessionSteerResponse type + local /steer slash command that calls
session.steer when ui.busy, otherwise enqueues for the next turn.
Fallbacks
---------
- Agent exits mid-steer → surfaces in run_conversation result as pending_steer
so CLI/gateway deliver it as the next user turn instead of silently dropping it.
- All tools skipped after interrupt → re-stashes pending_steer for the caller.
- No active agent → /steer reduces to sending the text as a normal message.
Tests
-----
- tests/run_agent/test_steer.py — accept/reject, concatenation, drain,
last-tool-result injection, multimodal list content, thread safety,
cleared-on-interrupt, registry membership, bypass-set membership.
- tests/gateway/test_steer_command.py — running agent, pending sentinel,
missing steer() method, rejected payload, empty payload.
- tests/gateway/test_command_bypass_active_session.py — /steer bypasses
the Level-1 base adapter guard.
- tests/test_tui_gateway_server.py — session.steer RPC paths.
72/72 targeted tests pass under scripts/run_tests.sh.
* feat(steer): register /steer in Discord's native slash tree
Discord's app_commands tree is a curated subset of slash commands (not
derived from COMMAND_REGISTRY like Telegram/Slack). /steer already
works there as plain text (routes through handle_message → base
adapter bypass → runner), but registering it here adds Discord's
native autocomplete + argument hint UI so users can discover and
type it like any other first-class command.
base.py's _keep_typing refresh loop calls send_typing every ~2s while
the agent is processing. If signal-cli returns NETWORK_FAILURE for the
recipient (offline, unroutable, group membership lost), the unmitigated
path was a WARNING log every 2 seconds for as long as the agent stayed
busy — a user report showed 1048 warnings in 41 minutes for one
offline contact, plus the matching volume of pointless RPC traffic to
signal-cli.
- _rpc() accepts log_failures=False so callers can route repeated
expected failures (typing) to DEBUG while keeping send/receive at
WARNING.
- send_typing() tracks consecutive failures per chat. First failure
still logs WARNING so transport issues remain visible; subsequent
failures log at DEBUG. After three consecutive failures we skip the
RPC during an exponential cooldown (16s, 32s, 60s cap) so we stop
hammering signal-cli for a recipient it can't deliver to. A
successful sendTyping resets the counters.
- _stop_typing_indicator() clears the backoff state so the next agent
turn starts fresh.
E2E simulation against the reported 41-minute window: RPCs drop from
1230 to 45 (-96%), log lines from 1048 WARNINGs to 1 WARNING + 44
DEBUGs.
Credits kshitijk4poor (#12056) for the _rpc log_failures kwarg idea;
the broader restructure in that PR (nested per-chat loop inside
send_typing) is avoided here in favour of stateful backoff that
preserves base.py's existing _keep_typing architecture.
When a Telegram /restart fires and PTB's graceful-shutdown `get_updates`
ACK call times out ("When polling for updates is restarted, updates may
be received twice" in gateway.log), the new gateway receives the same
/restart again and restarts a second time — a self-perpetuating loop.
Record the triggering update_id in `.restart_last_processed.json` when
handling /restart. On the next process, reject a /restart whose
update_id <= the recorded one as a stale redelivery. 5-minute staleness
guard so an orphaned marker can't block a legitimately new /restart.
- gateway/platforms/base.py: add `platform_update_id` to MessageEvent
- gateway/platforms/telegram.py: propagate `update.update_id` through
_build_message_event for text/command/location/media handlers
- gateway/run.py: write dedup marker in _handle_restart_command;
_is_stale_restart_redelivery checks it before processing /restart
- tests/gateway/test_restart_redelivery_dedup.py: 9 new tests covering
fresh restart, redelivery, staleness window, cross-platform,
malformed-marker resilience, and no-update_id (CLI) bypass
Only active for Telegram today (the one platform with monotonic
cross-session update ordering); other platforms return False from
_is_stale_restart_redelivery and proceed normally.
Error messages that tell users to install optional extras now use
{sys.executable} -m pip install ... instead of a bare 'pip install
hermes-agent[extra]' string. Under the curl installer, bare 'pip'
resolves to system pip, which either fails with PEP 668
externally-managed-environment or installs into the wrong Python.
Affects: hermes dashboard, hermes web server startup, mcp_serve,
hermes doctor Bedrock check, CLI voice mode, voice_mode tool runtime
error, Discord voice-channel join failure message.
Extend forum support from PR #10145:
- REST path (_send_discord): forum thread creation now uploads media
files as multipart attachments on the starter message in a single
call. Previously media files were silently dropped on the forum
path.
- Websocket media paths (_send_file_attachment, send_voice, send_image,
send_animation — covers send_image_file, send_video, send_document
transitively): forum channels now go through a new _forum_post_file
helper that creates a thread with the file as starter content,
instead of failing via channel.send(file=...) which forums reject.
- _send_to_forum chunk follow-up failures are collected into
raw_response['warnings'] so partial-send outcomes surface.
- Process-local probe cache (_DISCORD_CHANNEL_TYPE_PROBE_CACHE) avoids
GET /channels/{id} on every uncached send after the first.
- Dedup of TestSendDiscordMedia that the PR merge-resolution left
behind.
- Docs: Forum Channels section under website/docs/user-guide/messaging/discord.md.
Tests: 117 passed (22 new for forum+media, probe cache, warnings).
* fix(gateway): detect legacy hermes.service units from pre-rename installs
Older Hermes installs used a different service name (hermes.service) before
the rename to hermes-gateway.service. When both units remain installed, they
fight over the same bot token — after PR #5646's signal-recovery change,
this manifests as a 30-second SIGTERM flap loop between the two services.
Detection is an explicit allowlist (no globbing) plus an ExecStart content
check, so profile units (hermes-gateway-<profile>.service) and unrelated
third-party services named 'hermes' are never matched.
Wired into systemd_install, systemd_status, gateway_setup wizard, and the
main hermes setup flow — anywhere we already warn about scope conflicts now
also warns about legacy units.
* feat(gateway): add migrate-legacy command + install-time removal prompt
- New hermes_cli.gateway.remove_legacy_hermes_units() removes legacy
unit files with stop → disable → unlink → daemon-reload. Handles user
and system scopes separately; system scope returns path list when not
running as root so the caller can tell the user to re-run with sudo.
- New 'hermes gateway migrate-legacy' subcommand (with --dry-run and -y)
routes to remove_legacy_hermes_units via gateway_command dispatch.
- systemd_install now offers to remove legacy units BEFORE installing
the new hermes-gateway.service, preventing the SIGTERM flap loop that
hits users who still have pre-rename hermes.service around.
Profile units (hermes-gateway-<profile>.service) remain untouched in
all paths — the legacy allowlist is explicit (_LEGACY_SERVICE_NAMES)
and the ExecStart content check further narrows matches.
* fix(gateway): mark --replace SIGTERM as planned so target exits 0
PR #5646 made SIGTERM exit the gateway with code 1 so systemd's
Restart=on-failure revives it after unexpected kills. But when a user has
two gateway units fighting for the same bot token (e.g. legacy
hermes.service + hermes-gateway.service from a pre-rename install), the
--replace takeover itself becomes the 'unexpected' SIGTERM — the loser
exits 1, systemd revives it 30s later, and the cycle flaps indefinitely.
Before calling terminate_pid(), --replace now writes a short-lived marker
file naming the target PID + start_time. The target's shutdown_signal_handler
consumes the marker and, when it names this process, leaves
_signal_initiated_shutdown=False so the final exit code stays 0.
Staleness defences:
- PID + start_time combo prevents PID reuse matching an old marker
- Marker older than 60s is treated as stale and discarded
- Marker is unlinked on first read even if it doesn't match this process
- Replacer clears the marker post-loop + on permission-denied give-up
Cherry-picked from #10985 by pedh, adapted to current main:
* Keeps main's full group-chat gating (require_mention + allowed_users +
free_response_chats + mention_patterns) — PR's simpler subset dropped.
* Keeps main's fire-and-forget process() dispatch + session_webhook
fallback for SDK >= 0.24.
* Picks up PR's REQUIRES_EDIT_FINALIZE capability flag on
BasePlatformAdapter + finalize kwarg on edit_message(), plumbed through
stream_consumer. Default False so Telegram/Slack/Discord/Matrix stay
on the zero-overhead fast path.
* DingTalk AI Card lifecycle: per-chat _message_contexts, two-card flow
(tool-progress + final response) with sibling auto-close driven by
reply_to, idempotent 🤔Thinking → 🥳Done swap, $alibabacloud-dingtalk$
for media URL resolution (replaces raw HTTP that was 403-ing).
* pyproject: dingtalk extra now dingtalk-stream>=0.20,<1 +
alibabacloud-dingtalk>=2.0.0 + qrcode.
Closes#10991
Co-authored-by: pedh
Follow-up polish on top of the cherry-picked #11023 commit.
- feishu_comment_rules.py: replace import-time "~/.hermes" expanduser fallback
with get_hermes_home() from hermes_constants (canonical, profile-safe).
- tools/feishu_doc_tool.py, tools/feishu_drive_tool.py: drop the
asyncio.get_event_loop().run_until_complete(asyncio.to_thread(...)) dance.
Tool handlers run synchronously in a worker thread with no running loop, so
the RuntimeError branch was always the one that executed. Calls client.request
directly now. Unused asyncio import removed.
- tests/gateway/test_feishu.py: add register_p2_customized_event to the mock
EventDispatcher builder so the existing adapter test matches the new handler
registration for drive.notice.comment_add_v1.
- scripts/release.py: map liujinkun@bytedance.com -> liujinkun2025 for
contributor attribution on release notes.
- Full comment handler: parse drive.notice.comment_add_v1 events, build
timeline, run agent, deliver reply with chunking support.
- 5 tools: feishu_doc_read, feishu_drive_list_comments,
feishu_drive_list_comment_replies, feishu_drive_reply_comment,
feishu_drive_add_comment.
- 3-tier access control rules (exact doc > wildcard "*" > top-level >
defaults) with per-field fallback. Config via
~/.hermes/feishu_comment_rules.json, mtime-cached hot-reload.
- Self-reply filter using generalized self_open_id (supports future
user-identity subscriptions). Receiver check: only process events
where the bot is the @mentioned target.
- Smart timeline selection, long text chunking, semantic text extraction,
session sharing per document, wiki link resolution.
Change-Id: I31e82fd6355173dbcc400b8934b6d9799e3137b9
Follow-up to the cherry-picked contributor fix:
- Extract `_remember_chat_req_id()` and bound it at DEDUP_MAX_SIZE like
`_reply_req_ids` — the unbounded dict would grow forever on a long-
running gateway with many chats.
- Move the cache write to AFTER the group/DM policy check so we don't
cache req_ids from blocked senders.
- Revert the undocumented `is_group` change: the contributor flipped
`chattype == 'group'` to `bool(chatid)`, which wasn't mentioned in
the PR description and weakens the signal (chattype is the explicit
hint; relying on chatid presence assumes DMs never carry it). Keep
the original check.
- Drop the defensive `getattr(self, '_last_chat_req_ids', {})` reads
at both send sites — the attribute is initialized in __init__.
- Update `test_send_uses_passive_reply_stream_...` → `_markdown_...`
to match the new msgtype, and add a new TestWeComZombieSessionFix
class covering device_id presence in subscribe, per-chat req_id
caching + bounding, blocked-sender cache exclusion, and the group
APP_CMD_RESPONSE fallback path.
Follow-up to WideLee's salvaged PR #11582.
Back-compat for QQ_HOME_CHANNEL → QQBOT_HOME_CHANNEL rename:
- gateway/config.py reads QQBOT_HOME_CHANNEL, falls back to QQ_HOME_CHANNEL
with a one-shot deprecation warning so users on the old name aren't
silently broken.
- cron/scheduler.py: _HOME_TARGET_ENV_VARS['qqbot'] now maps to the new
name; _get_home_target_chat_id falls back to the legacy name via a
_LEGACY_HOME_TARGET_ENV_VARS table.
- hermes_cli/status.py + hermes_cli/setup.py: honor both names when
displaying or checking for missing home channels.
- hermes_cli/config.py: keep legacy QQ_HOME_CHANNEL[_NAME] in
_EXTRA_ENV_KEYS so .env sanitization still recognizes them.
Scope cleanup:
- Drop qrcode from core dependencies and requirements.txt (remains in
messaging/dingtalk/feishu extras). _qqbot_render_qr already degrades
gracefully when qrcode is missing, printing a 'pip install qrcode' tip
and falling back to URL-only display.
- Restore @staticmethod on QQAdapter._detect_message_type (it doesn't
use self). Revert the test change that was only needed when it was
converted to an instance method.
- Reset uv.lock to origin/main; the PR's stale lock also included
unrelated changes (atroposlib source URL, hermes-agent version bump,
fastapi additions) that don't belong.
Verified E2E:
- Existing user (QQ_HOME_CHANNEL set): gateway + cron both pick up the
legacy name; deprecation warning logs once.
- Fresh user (QQBOT_HOME_CHANNEL set): gateway + cron use new name,
no warning.
- Both set: new name wins on both surfaces.
Targeted tests: 296 passed, 4 skipped (qqbot + cron + hermes_cli).
- Re-export _ssrf_redirect_guard from __init__.py
- Fix _parse_json @staticmethod using self._log_tag
- Update test_detect_message_type to call as instance method
- Fix mock.patch path for httpx.AsyncClient in adapter submodule
- Remove @staticmethod from _detect_message_type, _convert_silk_to_wav,
_convert_raw_to_wav, _convert_ffmpeg_to_wav so they can use self._log_tag
- Replace all remaining hardcoded "QQBot" log args with self._log_tag
- Downgrade STT routine flow logs (download, convert, success) from info to debug
- Keep warning level for actual failures (STT failed, ffmpeg error, empty transcript)
Three closely-related fixes for shutdown / lifecycle hygiene.
1. _release_running_agent_state(session_key) helper
----------------------------------------------------
Per-running-agent state lived in three dicts that drifted out of sync
across cleanup sites:
self._running_agents — AIAgent per session_key
self._running_agents_ts — start timestamp per session_key
self._busy_ack_ts — last busy-ack timestamp per session_key
Inventory before this PR:
8 sites: del self._running_agents[key]
— only 1 (stale-eviction) cleaned all three
— 1 cleaned _running_agents + _running_agents_ts only
— 6 cleaned _running_agents only
Each missed entry was a (str, float) tuple per session per gateway
lifetime — small, persistent, accumulates across thousands of
sessions over months. Per-platform leaks compounded.
This change adds a single helper that pops all three dicts in
lockstep, and replaces every bare 'del self._running_agents[key]'
site with it. Per-session state that PERSISTS across turns
(_session_model_overrides, _voice_mode, _pending_approvals,
_update_prompt_pending) is intentionally NOT touched here — those
have their own lifecycles tied to user actions, not turn boundaries.
2. _running_agents_ts cleared in _stop_impl
----------------------------------------
Was being missed alongside _running_agents.clear(); now included.
3. SessionDB close() in _stop_impl
---------------------------------
The SQLite WAL write lock stayed held by the old gateway connection
until Python actually exited — causing 'database is locked' errors
when --replace launched a new gateway against the same file. We
now explicitly close both self._db and self.session_store._db
inside _stop_impl, with try/except so a flaky close on one doesn't
block the other.
Tests
-----
tests/gateway/test_session_state_cleanup.py — 10 cases covering:
* helper pops all three dicts atomically
* idempotent on missing/empty keys
* preserves other sessions
* tolerates older runners without _busy_ack_ts attribute
* thread-safe under concurrent release
* regression guard: scans gateway/run.py and fails if a future
contributor reintroduces 'del self._running_agents[...]'
outside docstrings
* SessionDB close called on both holders during shutdown
* shutdown tolerates missing session_store
* shutdown tolerates close() raising on one db (other still closes)
Broader gateway suite: 3108 passed (vs 3100 on baseline) — failure
delta is +8 net passes; the 10 remaining failures are pre-existing
cross-test pollution / missing optional deps (matrix needs olm,
signal/telegram approval flake, dingtalk Mock wiring), all reproduce
on stashed baseline.
Telegram's MarkdownV2 has no table syntax — pipes get backslash-escaped
and tables render as noisy unaligned text. format_message now detects
GFM-style pipe tables (header row + delimiter row + optional body) and
wraps them in ``` fences before the existing MarkdownV2 conversion runs.
Telegram renders fenced code blocks as monospace preformatted text with
columns intact.
Tables already inside an existing code block are left alone. Plain
prose with pipes, lone '---' horizontal rules, and non-table content
are unaffected.
Closes the recurring community request to stop having to ask the agent
to re-render tables as code blocks manually.
SessionStore._entries grew unbounded. Every unique
(platform, chat_id, thread_id, user_id) tuple ever seen was kept in
RAM and rewritten to sessions.json on every message. A Discord bot
in 100 servers x 100 channels x ~100 rotating users accumulates on
the order of 10^5 entries after a few months; each sessions.json
write becomes an O(n) fsync. Nothing trimmed this — there was no
TTL, no cap, no eviction path.
Changes
-------
* SessionStore.prune_old_entries(max_age_days) — drops entries whose
updated_at is older than the cutoff. Preserves:
- suspended entries (user paused them via /stop for later resume)
- entries with an active background process attached
Pruning is functionally identical to a natural reset-policy expiry:
SQLite transcript stays, session_key -> session_id mapping dropped,
returning user gets a fresh session.
* GatewayConfig.session_store_max_age_days (default 90; 0 disables).
Serialized in to_dict/from_dict, coerced from bad types / negatives
to safe defaults. No migration needed — missing field -> 90 days.
* _session_expiry_watcher calls prune_old_entries once per hour
(first tick is immediate). Uses the existing watcher loop so no
new background task is created.
Why not more aggressive
-----------------------
90 days is long enough that legitimate long-idle users (seasonal,
vacation, etc.) aren't surprised — pruning just means they get a
fresh session on return, same outcome they'd get from any other
reset-policy trigger. Admins can lower it via config; 0 disables.
Tests
-----
tests/gateway/test_session_store_prune.py — 17 cases covering:
* entry age based on updated_at, not created_at
* max_age_days=0 disables; negative coerces to 0
* suspended + active-process entries are skipped
* _save fires iff something was removed
* disk JSON reflects post-prune state
* thread safety against concurrent readers
* config field roundtrips + graceful fallback on bad values
* watcher gate logic (first tick prunes, subsequent within 1h don't)
119 broader session/gateway tests remain green.
- Use certifi CA bundle for aiohttp SSL in qr_login(), start(), and
send_weixin_direct() to fix SSL verification failures against
Tencent's iLink server on macOS (Homebrew OpenSSL lacks system certs)
- Fix QR code data: encode qrcode_img_content (full liteapp URL) instead
of raw hex token — WeChat needs the full URL to resolve the scan
- Render ASCII QR on refresh so the user can re-scan without restarting
- Improve error message on QR render failure to show the actual exception
Tested on macOS (Apple Silicon, Homebrew Python 3.13)
iLink context_token has a limited TTL. When no user message has arrived
for an extended period (e.g. overnight), cron-initiated pushes fail with
errcode -14 (session timeout).
Tested that iLink accepts sends without context_token as a degraded
fallback, so we now automatically strip the expired token and retry
once. This keeps scheduled push messages (weather, digests, etc.)
working reliably without requiring a user message to refresh the
session first.
Changes:
- _send_text_chunk() catches iLinkDeliveryError with session-expired
errcode (-14) and retries without context_token
- Stale tokens are cleared from ContextTokenStore on session expiry
- All 34 existing weixin tests pass
Previously a message like `<@&1490963422786093149> help` would spawn a
thread literally named `<@&1490963422786093149> help`, exposing raw
Discord mention markers in the thread list. Only user mentions
(`<@id>`) were being stripped upstream — role mentions (`<@&id>`) and
channel mentions (`<#id>`) leaked through.
Fix: strip all three mention patterns in `_auto_create_thread` before
building the thread name. Collapse runs of whitespace left by the
removal. If the entire content was mention-only, fall back to 'Hermes'
instead of an empty title.
Fixes#6336.
Tests: two new regression guards in test_discord_slash_commands.py
covering mixed-mention content and mention-only content.
Free-response channels already bypassed the @mention gate so users could
chat inline with the bot, but auto-threading still fired on every
message — spinning off a thread per message and defeating the
lightweight-chat purpose.
Fix: fold `is_free_channel` into `skip_thread` so threading is skipped
whenever the channel is in DISCORD_FREE_RESPONSE_CHANNELS (via env or
discord.free_response_channels in config.yaml).
Net change: one line in _handle_message + one regression test.
Partially addresses #9399. Authored by @Hypn0sis (salvaged from PR #9650;
the bundled 'smart' auto-thread mode from that PR was dropped in favor
of deterministic true/false semantics).
* fix(gateway): bound _agent_cache with LRU cap + idle TTL eviction
The per-session AIAgent cache was unbounded. Each cached AIAgent holds
LLM clients, tool schemas, memory providers, and a conversation buffer.
In a long-lived gateway serving many chats/threads, cached agents
accumulated indefinitely — entries were only evicted on /new, /model,
or session reset.
Changes:
- Cache is now an OrderedDict so we can pop least-recently-used entries.
- _enforce_agent_cache_cap() pops entries beyond _AGENT_CACHE_MAX_SIZE=64
when a new agent is inserted. LRU order is refreshed via move_to_end()
on cache hits.
- _sweep_idle_cached_agents() evicts entries whose AIAgent has been idle
longer than _AGENT_CACHE_IDLE_TTL_SECS=3600s. Runs from the existing
_session_expiry_watcher so no new background task is created.
- The expiry watcher now also pops the cache entry after calling
_cleanup_agent_resources on a flushed session — previously the agent
was shut down but its reference stayed in the cache dict.
- Evicted agents have _cleanup_agent_resources() called on a daemon
thread so the cache lock isn't held during slow teardown.
Both tuning constants live at module scope so tests can monkeypatch
them without touching class state.
Tests: 7 new cases in test_agent_cache.py covering LRU eviction,
move_to_end refresh, cleanup thread dispatch, idle TTL sweep,
defensive handling of agents without _last_activity_ts, and plain-dict
test fixture tolerance.
* tweak: bump _AGENT_CACHE_MAX_SIZE 64 -> 128
* fix(gateway): never evict mid-turn agents; live spillover tests
The prior commit could tear down an active agent if its session_key
happened to be LRU when the cap was exceeded. AIAgent.close() kills
process_registry entries for the task, tears down the terminal
sandbox, closes the OpenAI client (sets self.client = None), and
cascades .close() into any active child subagents — all fatal if
the agent is still processing a turn.
Changes:
- _enforce_agent_cache_cap and _sweep_idle_cached_agents now look at
GatewayRunner._running_agents and skip any entry whose AIAgent
instance is present (identity via id(), so MagicMock doesn't
confuse lookup in tests). _AGENT_PENDING_SENTINEL is treated
as 'not active' since no real agent exists yet.
- Eviction only considers the LRU-excess window (first size-cap
entries). If an excess slot is held by a mid-turn agent, we skip
it WITHOUT compensating by evicting a newer entry. A freshly
inserted session (zero cache history) shouldn't be punished to
protect a long-lived one that happens to be busy.
- Cache may therefore stay transiently over cap when load spikes;
a WARNING is logged so operators can see it, and the next insert
re-runs the check after some turns have finished.
New tests (TestAgentCacheActiveSafety + TestAgentCacheSpilloverLive):
- Active LRU entry is skipped; no newer entry compensated
- Mixed active/idle excess window: only idle slots go
- All-active cache: no eviction, WARNING logged, all clients intact
- _AGENT_PENDING_SENTINEL doesn't block other evictions
- Idle-TTL sweep skips active agents
- End-to-end: active agent's .client survives eviction attempt
- Live fill-to-cap with real AIAgents, then spillover
- Live: CAP=4 all active + 1 newcomer — cache grows to 5, no teardown
- Live: 8 threads racing 160 inserts into CAP=16 — settles at 16
- Live: evicted session's next turn gets a fresh agent that works
30 tests pass (13 pre-existing + 17 new). Related gateway suites
(model switch, session reset, proxy, etc.) all green.
* fix(gateway): cache eviction preserves per-task state for session resume
The prior commits called AIAgent.close() on cache-evicted agents, which
tears down process_registry entries, terminal sandbox, and browser
daemon for that task_id — permanently. Fine for session-expiry (session
ended), wrong for cache eviction (session may resume).
Real-world scenario: a user leaves a Telegram session open for 2+ hours,
idle TTL evicts the cached AIAgent, user returns and sends a message.
Conversation history is preserved via SessionStore, but their terminal
sandbox (cwd, env vars, bg shells) and browser state were destroyed.
Fix: split the two cleanup modes.
close() Full teardown — session ended. Kills bg procs,
tears down terminal sandbox + browser daemon,
closes LLM client. Used by session-expiry,
/new, /reset (unchanged).
release_clients() Soft cleanup — session may resume. Closes
LLM client only. Leaves process_registry,
terminal sandbox, browser daemon intact
for the resuming agent to inherit via
shared task_id.
Gateway cache eviction (_enforce_agent_cache_cap, _sweep_idle_cached_agents)
now dispatches _release_evicted_agent_soft on the daemon thread instead
of _cleanup_agent_resources. All session-expiry call sites of
_cleanup_agent_resources are unchanged.
Tests (TestAgentCacheIdleResume, 5 new cases):
- release_clients does NOT call process_registry.kill_all
- release_clients does NOT call cleanup_vm / cleanup_browser
- release_clients DOES close the LLM client (agent.client is None after)
- close() vs release_clients() — semantic contract pinned
- Idle-evicted session's rebuild with same session_id gets same task_id
Updated test_cap_triggers_cleanup_thread to assert the soft path fires
and the hard path does NOT.
35 tests pass in test_agent_cache.py; 67 related tests green.
- gateway/platforms/weixin.py:
- Split aiohttp.ClientSession into _poll_session and _send_session
- Add _LIVE_ADAPTERS registry so send_weixin_direct() reuses the connected gateway adapter instead of creating a competing session
- Fixes silent message loss when gateway is running (iLink token contention)
- cron/scheduler.py:
- Support comma-separated deliver values (e.g. 'feishu,weixin') for multi-target delivery
- Delay pconfig/enabled check until standalone fallback so live adapters work even when platform is not in gateway config
- tools/send_message_tool.py:
- Synthesize PlatformConfig from WEIXIN_* env vars when gateway config lacks a weixin entry
- Fall back to WEIXIN_HOME_CHANNEL env var for home channel resolution
- tests/gateway/test_weixin.py:
- Update mocks to include _send_session
Follow-ups to the salvaged commits in this PR:
* gateway/config.py — strip trailing whitespace from youngDoo's diff
(line 315 had ~140 trailing spaces).
* hermes_cli/tools_config.py — replace `config.get("platform_toolsets", {})`
with `config.get("platform_toolsets") or {}`. Handles the case where the
YAML key is present but explicitly null (parses as None, previously
crashed with AttributeError on the next line's .get(platform)).
Cherry-picked from yyq4193's #9003 with attribution.
* tests/gateway/test_config.py — 4 new tests for TestGetConnectedPlatforms
covering DingTalk via extras, via env vars, disabled, and missing creds.
* tests/hermes_cli/test_tools_config.py — regression test for the null
platform_toolsets edge case.
* scripts/release.py — add kagura-agent, youngDoo, yyq4193 to AUTHOR_MAP.
Co-authored-by: yyq4193 <39405770+yyq4193@users.noreply.github.com>
Fixes#11463: DingTalk channel receives messages but fails to reply
with 'No session_webhook available'.
Two changes:
1. **Fire-and-forget message processing**: process() now dispatches
_on_message as a background task via asyncio.create_task instead of
awaiting it. This ensures the SDK ACK is returned immediately,
preventing heartbeat timeouts and disconnections when message
processing takes longer than the SDK's ACK deadline.
2. **session_webhook extraction fallback**: If ChatbotMessage.from_dict()
fails to map the sessionWebhook field (possible across SDK versions),
the handler now falls back to extracting it directly from the raw
callback data dict using both 'sessionWebhook' and 'session_webhook'
key variants.
Added 3 tests covering webhook extraction, fallback behavior, and
fire-and-forget ACK timing.
Two follow-ups to the cherry-picked PR #9873 (`e3bcc819`):
1. `_is_allowed_user` now uses `getattr(self, '_allowed_*_ids', set())`
so test fixtures that build the adapter via `object.__new__`
(skipping __init__) don't crash with AttributeError.
See AGENTS.md pitfall #17 — same pattern as gateway.run.
2. New 3-case regression coverage in test_discord_bot_auth_bypass.py:
- role-only config bypasses the gateway 'no allowlists' branch
- roles + users combined still authorizes user-allowlist matches
- the role bypass does NOT leak to other platforms (Telegram, etc.)
3. Autouse fixture in test_discord_bot_auth_bypass.py clears all Discord
auth env vars before each test so DISCORD_ALLOWED_ROLES leakage from
a previous test in the session can't flip later 'should-reject' tests
into false-pass.
Required because the bare cherry-pick of #9873 only added the adapter-
level role check — it didn't cover the gateway-level _is_user_authorized,
which still rejected role-only setups via the 'no allowlists configured'
branch.
Adds a new DISCORD_ALLOWED_ROLES environment variable that allows filtering
bot interactions by Discord role ID. Uses OR semantics with the existing
DISCORD_ALLOWED_USERS - if a user matches either allowlist, they're permitted.
Changes:
- Parse DISCORD_ALLOWED_ROLES comma-separated role IDs on connect
- Enable members intent when roles are configured (needed for role lookup)
- Update _is_allowed_user() to accept optional author param for direct role check
- Fallback to scanning mutual guilds when author object lacks roles (DMs, voice)
- Fully backwards compatible: no behavior change when env var is unset
Fixes#4466.
Root cause: two sequential authorization gates both independently rejected
bot messages, making DISCORD_ALLOW_BOTS completely ineffective.
Gate 1 — `discord.py` `on_message`:
_is_allowed_user ran BEFORE the bot filter, so bot senders were dropped
before the DISCORD_ALLOW_BOTS policy was ever evaluated.
Gate 2 — `gateway/run.py` _is_user_authorized:
The gateway-level allowlist check rejected bot IDs with 'Unauthorized
user: <bot_id>' even if they passed Gate 1.
Fix:
gateway/platforms/discord.py — reorder on_message so DISCORD_ALLOW_BOTS
runs BEFORE _is_allowed_user. Bots permitted by the filter skip the
user allowlist; non-bots are still checked.
gateway/session.py — add is_bot: bool = False to SessionSource so the
gateway layer can distinguish bot senders.
gateway/platforms/base.py — expose is_bot parameter in build_source.
gateway/platforms/discord.py _handle_message — set is_bot=True when
building the SessionSource for bot authors.
gateway/run.py _is_user_authorized — when source.is_bot is True AND
DISCORD_ALLOW_BOTS is 'mentions' or 'all', return True early. Platform
filter already validated the message at on_message; don't re-reject.
Behavior matrix:
| Config | Before | After |
| DISCORD_ALLOW_BOTS=none (default) | Blocked | Blocked |
| DISCORD_ALLOW_BOTS=all | Blocked | Allowed |
| DISCORD_ALLOW_BOTS=mentions + @mention | Blocked | Allowed |
| DISCORD_ALLOW_BOTS=mentions, no mention | Blocked | Blocked |
| Human in DISCORD_ALLOWED_USERS | Allowed | Allowed |
| Human NOT in DISCORD_ALLOWED_USERS | Blocked | Blocked |
Co-authored-by: Hermes Maintainer <hermes@nousresearch.com>
Closes#11321, closes#10259.
## Problem
The nested /skill command group (category subcommand groups + skill
subcommands) serialized to ~14KB with the default 75-skill catalog,
exceeding Discord's ~8000-byte per-command registration payload. The
entire tree.sync() rejected with error 50035 — ALL slash commands
including the 27 base commands failed to register.
## Fix
Replace the nested Group layout with a single flat Command:
/skill name:<autocomplete> args:<optional string>
Autocomplete options are fetched dynamically by Discord when the user
types — they do NOT count against the per-command registration budget.
So this single command registers at ~200 bytes regardless of how many
skills exist. Scales to thousands of skills with no size calculations,
no splitting, no hidden skills.
UX improvements:
- Discord live-filters by user's typed prefix against BOTH name and
description, so '/skill pdf' finds 'ocr-and-documents' via its
description. More discoverable than clicking through category menus.
- Unknown skill name → ephemeral error pointing user at autocomplete.
- Stable alphabetical ordering across restarts.
## Why not the other proposed approaches
Three prior PRs tried to fit within the 8KB limit by modifying the
nested layout:
- #10214 (njiangk): truncated all descriptions to 'Run <name>' and
category descriptions to 'Skills'. Works but destroys slash picker UX.
- #11385 (LeonSGP43): 40-char description clamp + iterative
trim-largest-category fallback. Works but HIDES skills the user can
no longer invoke via slash — functional regression.
- #10261 (zeapsu): adaptive split into /skill-<cat> top-level groups.
Preserves all skills but pollutes the slash namespace with 20
top-level commands.
All three work around the symptom. The flat autocomplete design
dissolves the problem — there is no payload-size pressure to manage.
## Tests
tests/gateway/test_discord_slash_commands.py — 5 new test cases replace
the 3 old nested-structure tests:
- flat-not-nested structure assertion
- empty skills → no command registered
- callback dispatches the right cmd_key by name
- unknown name → ephemeral error, no dispatch
- large-catalog regression guard (500 skills) — command payload stays
under 500 bytes regardless
E2E validated against real discord.py 2.7.1:
- Command registers as discord.app_commands.Command (not Group).
- Autocomplete filters by name AND description (verified across several
queries including description-only matches like 'pdf' → OCR skill).
- 500-skill catalog returns max 25 results per autocomplete query
(Discord's hard cap), filtered correctly.
- Choice labels formatted as 'name — description' clamped to 100 chars.
- feat: support one-click QR scan to create DingTalk bot and establish connection
- fix(gateway): wrap blocking DingTalkStreamClient.start() with asyncio.to_thread()
- fix(gateway): extract message fields from CallbackMessage payload instead of ChatbotMessage
- fix(gateway): add oapi.dingtalk.com to allowed webhook URL domains
- stop rewriting markdown tables, headings, and links before delivery
- keep markdown table blocks and headings together during chunking
- update Weixin tests and docs for native markdown rendering
Closes#10308
The Weixin adapter's send() method previously split and delivered the
raw response text without first extracting MEDIA: tags or bare local
file paths. This meant images, documents, and voice files referenced
by the agent were silently dropped in normal (non-streaming,
non-background) conversations.
Changes:
- In WeixinAdapter.send(), call extract_media() and
extract_local_files() before formatting/splitting text.
- Deliver extracted files via send_image_file(), send_document(),
send_voice(), or send_video() prior to sending text chunks.
- Also fix two minor typing issues in gateway/run.py where
extract_media() tuples were not unpacked correctly in background
and /btw task handlers.
Fixes missing media delivery on Weixin personal accounts.
Three open issues — #8242, #6587, #11345 — all trace to the same root
cause: the image / audio / document download paths in
`DiscordAdapter._handle_message` used plain, unauthenticated HTTP to
fetch `att.url`. That broke in three independent ways:
#8242 cdn.discordapp.com attachment URLs increasingly require the
bot session to download; unauthenticated httpx sees 403
Forbidden, image/voice analysis fail silently.
#6587 Some user environments (VPNs, corporate DNS, tunnels) resolve
cdn.discordapp.com to private-looking IPs. Our is_safe_url()
guard correctly blocks them as SSRF risks, but the user
environment is legitimate — image analysis and voice STT die.
#11345 The document download path skipped is_safe_url() entirely —
raw aiohttp.ClientSession.get(att.url) with no SSRF check,
inconsistent with the image/audio branches.
Unified fix: use `discord.Attachment.read()` as the primary download
path on all three branches. `att.read()` routes through discord.py's
own authenticated HTTPClient, so:
- Discord CDN auth is handled (#8242 resolved).
- Our is_safe_url() gate isn't consulted for the attachment path at
all — the bot session handles networking internally (#6587 resolved).
- All three branches now share the same code path, eliminating the
document-path SSRF gap (#11345 resolved).
Falls back to the existing cache_*_from_url helpers (image/audio) or an
SSRF-gated aiohttp fetch (documents) when `att.read()` is unavailable
or fails — preserves defense-in-depth for any future payload-schema
drift that could slip a non-CDN URL into att.url.
New helpers on DiscordAdapter:
- _read_attachment_bytes(att) — safe att.read() wrapper
- _cache_discord_image(att, ext) — primary + URL fallback
- _cache_discord_audio(att, ext) — primary + URL fallback
- _cache_discord_document(att, ext) — primary + SSRF-gated aiohttp fallback
Tests:
- tests/gateway/test_discord_attachment_download.py — 12 new cases
covering all three helpers: primary path, fallback on missing
.read(), fallback on validator rejection, SSRF guard on document
fallback, aiohttp fallback happy-path, and an E2E case via
_handle_message confirming cache_image_from_url is never invoked
when att.read() succeeds.
- All 11 existing document-handling tests continue to pass via the
aiohttp fallback path (their SimpleNamespace attachments have no
.read(), which triggers the fallback — now SSRF-gated).
Closes#8242, closes#6587, closes#11345.
When a WebSocket-based platform adapter (e.g. QQ Bot) temporarily
loses its connection, send() now polls is_connected for up to 15s
instead of immediately returning a non-retryable failure. If the
auto-reconnect completes within the window, the message is delivered
normally. On timeout, the SendResult is marked retryable=True so the
base class retry mechanism can attempt re-delivery.
Same treatment applied to _send_media().
Adds 4 async tests covering:
- Successful send after simulated reconnection
- Retryable failure on timeout
- Immediate success when already connected
- _send_media reconnection wait
Fixes#11163
DingTalk was the only messaging platform without group-mention gating or a
per-user allowlist. Slack, Telegram, Discord, WhatsApp, Matrix, and Mattermost
all support these via config.yaml + matching env vars; this change closes the
gap for DingTalk using the same surface:
Config:
platforms.dingtalk.require_mention: bool (env: DINGTALK_REQUIRE_MENTION)
platforms.dingtalk.mention_patterns: list (env: DINGTALK_MENTION_PATTERNS)
platforms.dingtalk.free_response_chats: list (env: DINGTALK_FREE_RESPONSE_CHATS)
platforms.dingtalk.allowed_users: list (env: DINGTALK_ALLOWED_USERS)
Semantics mirror Telegram's implementation:
- DMs are always accepted (subject to allowed_users).
- Group messages are accepted only when the chat is allowlisted, mention is
not required, the bot was @mentioned (dingtalk_stream sets is_in_at_list),
or the text matches a configured regex wake-word.
- allowed_users matches sender_id / sender_staff_id case-insensitively;
a single "*" disables the check.
Rationale: without this, any DingTalk user in a group chat can trigger the
bot, which makes DingTalk less safe to deploy than the other platforms. A
user's config.yaml already accepts require_mention for dingtalk but the value
was silently ignored.
The send_image_file method in WeixinAdapter used 'path' as parameter
name, but BasePlatformAdapter and gateway callers use 'image_path'.
This mismatch caused image sending to fail when called through the
gateway's extract_media path.
Changed parameter name from 'path' to 'image_path' to match the
interface defined in base.py and the calls in gateway/run.py.
discord.py does not apply a default AllowedMentions to the client, so any
reply whose content contains @everyone/@here or a role mention would ping
the whole server — including verbatim echoes of user input or LLM output
that happens to contain those tokens.
Set a safe default on commands.Bot: everyone=False, roles=False,
users=True, replied_user=True. Operators can opt back in via four
DISCORD_ALLOW_MENTION_* env vars or discord.allow_mentions.* in
config.yaml. No behavior change for normal user/reply pings.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The cherry-picked SDK compat fix (previous commit) wired process() to
parse CallbackMessage.data into a ChatbotMessage, but _extract_text()
was still written against the pre-0.20 payload shape:
* message.text changed from dict {content: ...} → TextContent object.
The old code's str(text) fallback produced 'TextContent(content=...)'
as the agent's input, so every received message came in mangled.
* rich_text moved from message.rich_text (list) to
message.rich_text_content.rich_text_list.
This preserves legacy fallbacks (dict-shaped text, bare rich_text list)
while handling the current SDK layout via hasattr(text, 'content').
Adds regression tests covering:
* webhook domain allowlist (api.*, oapi.*, and hostile lookalikes)
* _IncomingHandler.process is a coroutine function
* _extract_text against TextContent object, dict, rich_text_content,
legacy rich_text, and empty-message cases
Also adds kevinskysunny to scripts/release.py AUTHOR_MAP (release CI
blocks unmapped emails).
Inbound Feishu messages arriving during brief windows when the adapter
loop is unavailable (startup/restart transitions, network-flap reconnect)
were silently dropped with a WARNING log. This matches the symptom in
issue #5499 — and users have reported seeing only a subset of their
messages reach the agent.
Fix: queue pending events in a thread-safe list and spawn a single
drainer thread that replays them once the loop becomes ready. Covers
these scenarios:
* Queue events instead of dropping when loop is None/closed
* Single drainer handles the full queue (not thread-per-event)
* Thread-safe with threading.Lock on the queue and schedule flag
* Handles mid-drain bursts (new events arrive while drainer is working)
* Handles RuntimeError if loop closes between check and submit
* Depth cap (1000) prevents unbounded growth during extended outages
* Drops queue cleanly on disconnect rather than holding forever
* Safety timeout (120s) prevents infinite retention on broken adapters
Based on the approach proposed in #4789 by milkoor, rewritten for
thread-safety and correctness.
Test plan:
* 5 new unit tests (TestPendingInboundQueue) — all passing
* E2E test with real asyncio loop + fake WS thread: 10-event burst
before loop ready → all 10 delivered in order
* E2E concurrent burst test: 20 events queued, 20 more arrive during
drainer dispatch → all 40 delivered, no loss, no duplicates
* All 111 existing feishu tests pass
Related: #5499, #4789
Co-authored-by: milkoor <milkoor@users.noreply.github.com>
All 61 TUI-related tests green across 3 consecutive xdist runs.
tests/tui_gateway/test_protocol.py:
- rename `get_messages` → `get_messages_as_conversation` on mock DB (method
was renamed in the real backend, test was still stubbing the old name)
- update tool-message shape expectation: `{role, name, context}` matches
current `_history_to_messages` output, not the legacy `{role, text}`
tests/hermes_cli/test_tui_resume_flow.py:
- `cmd_chat` grew a first-run provider-gate that bailed to "Run: hermes
setup" before `_launch_tui` was ever reached; 3 tests stubbed
`_resolve_last_session` + `_launch_tui` but not the gate
- factored a `main_mod` fixture that stubs `_has_any_provider_configured`,
reused by all three tests
tests/test_tui_gateway_server.py:
- `test_config_set_personality_resets_history_and_returns_info` was flaky
under xdist because the real `_write_config_key` touches
`~/.hermes/config.yaml`, racing with any other worker that writes
config. Stub it in the test.
The Discord voice receive path skipped RFC 3550 §5.1 padding handling,
passing padding-contaminated payloads into DAVE E2EE decrypt and Opus
decode. Symptoms in live VC sessions: deaf inbound speech, intermittent
empty STT results, "corrupted stream" decode errors — especially on the
first reply after join.
When the P bit is set in the RTP header, the last payload byte holds the
count of trailing padding bytes (including itself) that must be removed.
Receive pipeline now follows the spec order:
1. RTP header parse
2. NaCl transport decrypt (aead_xchacha20_poly1305_rtpsize)
3. strip encrypted RTP extension data from start
4. strip RTP padding from end if P bit set ← was missing
5. DAVE inner media decrypt
6. Opus decode
Drops malformed packets where pad_len is 0 or exceeds payload length.
Adds 7 integration tests covering valid padded packets, the X+P combined
case, padding under DAVE passthrough, and three malformed-padding paths.
Closes#11267
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* - make buffered streaming
- fix path naming to expand `~` for agent.
- fix stripping of matrix ID to not remove other mentions / localports.
* fix(matrix): register MembershipEventDispatcher for invite auto-join
The mautrix migration (#7518) broke auto-join because InternalEventType.INVITE
events are only dispatched when MembershipEventDispatcher is registered on the
client. Without it, _on_invite is dead code and the bot silently ignores all
room invites.
Closes#10094Closes#10725
Refs: PR #10135 (digging-airfare-4u), PR #10732 (fxfitz)
* fix(matrix): preserve _joined_rooms reference for CryptoStateStore
connect() reassigned self._joined_rooms = set(...) after initial sync,
orphaning the reference captured by _CryptoStateStore at init time.
find_shared_rooms() returned [] forever, breaking Megolm session rotation
on membership changes.
Mutate in place with clear() + update() so the CryptoStateStore reference
stays valid.
Refs #8174, PR #8215
* fix(matrix): remove dual ROOM_ENCRYPTED handler to fix dedup race
mautrix auto-registers DecryptionDispatcher when client.crypto is set.
The adapter also registered _on_encrypted_event for the same event type.
_on_encrypted_event had zero awaits and won the race to mark event IDs
in the dedup set, causing _on_room_message to drop successfully decrypted
events from DecryptionDispatcher. The retry loop masked this by re-decrypting
every message ~4 seconds later.
Remove _on_encrypted_event entirely. DecryptionDispatcher handles decryption;
genuinely undecryptable events are logged by mautrix and retried on next
key exchange.
Refs #8174, PR #8215
* fix(matrix): re-verify device keys after share_keys() upload
Matrix homeservers treat ed25519 identity keys as immutable per device.
share_keys() can return 200 but silently ignore new keys if the device
already exists with different identity keys. The bot would proceed with
shared=True while peers encrypt to the old (unreachable) keys.
Now re-queries the server after share_keys() and fails closed if keys
don't match, with an actionable error message.
Refs #8174, PR #8215
* fix(matrix): encrypt outbound attachments in E2EE rooms
_upload_and_send() uploaded raw bytes and used the 'url' key for all
rooms. In E2EE rooms, media must be encrypted client-side with
encrypt_attachment(), the ciphertext uploaded, and the 'file' key
(with key/iv/hashes) used instead of 'url'.
Now detects encrypted rooms via state_store.is_encrypted() and
branches to the encrypted upload path.
Refs: PR #9822 (charles-brooks)
* fix(matrix): add stop_typing to clear typing indicator after response
The adapter set a 30-second typing timeout but never cleared it.
The base class stop_typing() is a no-op, so the typing indicator
lingered for up to 30 seconds after each response.
Closes#6016
Refs: PR #6020 (r266-tech)
* fix(matrix): cache all media types locally, not just photos/voice
should_cache_locally only covered PHOTO, VOICE, and encrypted media.
Unencrypted audio/video/documents in plaintext rooms were passed as MXC
URLs that require authentication the agent doesn't have, resulting
in 401 errors.
Refs #3487, #3806
* fix(matrix): detect stale OTK conflict on startup and fail closed
When crypto state is wiped but the same device ID is reused, the
homeserver may still hold one-time keys signed with the previous
identity key. Identity key re-upload succeeds but OTK uploads fail
with "already exists" and a signature mismatch. Peers cannot
establish new Olm sessions, so all new messages are undecryptable.
Now proactively flushes OTKs via share_keys() during connect() and
catches the "already exists" error with an actionable log message
telling the operator to purge the device from the homeserver or
generate a fresh device ID.
Also documents the crypto store recovery procedure in the Matrix
setup guide.
Refs #8174
* docs(matrix): improve crypto recovery docs per review
- Put easy path (fresh access token) first, manual purge second
- URL-encode user ID in Synapse admin API example
- Note that device deletion may invalidate the access token
- Add "stop Synapse first" caveat for direct SQLite approach
- Mention the fail-closed startup detection behavior
- Add back-reference from upgrade section to OTK warning
* refactor(matrix): cleanup from code review
- Extract _extract_server_ed25519() and _reverify_keys_after_upload()
to deduplicate the re-verification block (was copy-pasted in two
places, three copies of ed25519 key extraction total)
- Remove dead code: _pending_megolm, _retry_pending_decryptions,
_MAX_PENDING_EVENTS, _PENDING_EVENT_TTL — all orphaned after
removing _on_encrypted_event
- Remove tautological TestMediaCacheGate (tested its own predicate,
not production code)
- Remove dead TestMatrixMegolmEventHandling and
TestMatrixRetryPendingDecryptions (tested removed methods)
- Merge duplicate TestMatrixStopTyping into TestMatrixTypingIndicator
- Trim comment to just the "why"
Users (Teknium) report missing debug reports before the 1-hour auto-delete
fires. 6 hours gives enough window for async bug-report triage without
leaving sensitive log data on public paste services indefinitely.
Applies to both the CLI (hermes debug share) and gateway (/debug) paths.
Initialize next_channel_prompt before the pending_event check and use
getattr with None default, matching the existing pattern for
next_source/next_message/next_message_id. Prevents AttributeError
when pending_event is None (interrupt path).
Cherry-picked from #10953 by @jackjin1997.
Switch from fragile Markdown V1 to HTML parse mode with html.escape()
for exec approval messages. Add fallback to text-based approval when
the formatted send fails.
Cherry-picked from #10999 by @danieldoderlein.
config.yaml terminal.cwd is now the single source of truth for working
directory. MESSAGING_CWD and TERMINAL_CWD in .env are deprecated with a
migration warning.
Changes:
1. config.py: Remove MESSAGING_CWD from OPTIONAL_ENV_VARS (setup wizard
no longer prompts for it). Add warn_deprecated_cwd_env_vars() that
prints a migration hint when deprecated env vars are detected.
2. gateway/run.py: Replace all MESSAGING_CWD reads with TERMINAL_CWD
(which is bridged from config.yaml terminal.cwd). MESSAGING_CWD is
still accepted as a backward-compat fallback with deprecation warning.
Config bridge skips cwd placeholder values so they don't clobber
the resolved TERMINAL_CWD.
3. cli.py: Guard against lazy-import clobbering — when cli.py is
imported lazily during gateway runtime (via delegate_tool), don't
let load_cli_config() overwrite an already-resolved TERMINAL_CWD
with os.getcwd() of the service's working directory. (#10817)
4. hermes_cli/main.py: Add 'hermes memory reset' command with
--target all/memory/user and --yes flags. Profile-scoped via
HERMES_HOME.
Migration path for users with .env settings:
Remove MESSAGING_CWD / TERMINAL_CWD from .env
Add to config.yaml:
terminal:
cwd: /your/project/path
Addresses: #10225, #4672, #10817, #7663
When execute_code times out, the result JSON had status="timeout" and an
error field, but the output field was empty. Many models treat empty
output as "nothing happened" and produce an empty/minimal response. The
gateway stream consumer then considers the response "already sent" (from
pre-tool streaming) and silently drops it — leaving the user staring at
silence.
Three changes:
1. Include the timeout message in the output field (both local and remote
paths) so the model always has visible content to relay to the user.
2. Add periodic activity callbacks to the local execution polling loop so
the gateway's inactivity monitor knows execute_code is alive during
long runs.
3. Fix stream_consumer._send_fallback_final to not silently drop content
when the continuation appears empty but the final text differs from
what was previously streamed (e.g. after a tool boundary reset).
When the LLM returns an empty completion, gateway/run.py replaced
final_response with the literal string '(No response generated)'.
This defeated cron/scheduler.py's empty-response skip guard, causing
the placeholder to be delivered to home channels.
Changes:
- gateway/run.py: return empty string instead of placeholder when
there is no error and no response content
- cron/scheduler.py: defensively strip the placeholder text in case
any upstream path still produces it
FixesNousResearch/hermes-agent#9270
The cancellation handler previously promoted any partial send
(already_sent=True) to final_response_sent=True unconditionally.
This meant if intermediate text (e.g. 'Let me search…') was streamed
and the consumer was cancelled before delivering the actual answer,
the gateway's suppression check would still prevent the fallback send.
Now final_response_sent is only set in the cancellation path when:
- The best-effort send of accumulated content actually succeeded, OR
- It was already confirmed before cancellation
Companion fix for PR #11000's run.py changes — closes the
cancellation-path loophole that would otherwise let partial streams
suppress final delivery during queued follow-ups.
All 10 call sites in gateway/run.py and gateway/platforms/api_server.py
are inside async functions where a loop is guaranteed to be running.
get_event_loop() is deprecated since Python 3.10 — it can silently
create a new loop when none is running, masking bugs.
get_running_loop() raises RuntimeError instead, which is safer.
Surfaced during review of PRs #10533 and #10647.
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Each top-level Slack DM now gets its own Hermes session, matching the
per-thread behavior channels already have. Previously all top-level DM
messages shared one continuous session because thread_ts was None,
causing context to accumulate across unrelated conversations.
The behavior is controlled by platforms.slack.extra.dm_top_level_threads_as_sessions
in config.yaml (default: true). Set to false to restore legacy behavior.
Based on PR #10789 by helix4u. Changes from original:
- Default flipped to true (was opt-in, now opt-out)
- Removed env var fallback (config.yaml only per project policy)
- Tests updated to cover both default and opt-out paths
Bump connect retry attempts from 3 to 8 and cap exponential backoff at
15 seconds. Old budget: 3 attempts, 1+2+4=7s total — insufficient for
cold boot on slow networks or embedded devices. New budget: 8 attempts,
1+2+4+8+15+15+15=~60s total.
Inspired by PR #5770 by @Bartok9 (re-implemented against current main
since original was 913 commits stale with conflicts).
Three targeted fixes for the 'agent stuck on terminal command' report:
1. **Concurrent tool wait loop now checks interrupts** (run_agent.py)
The sequential path checked _interrupt_requested before each tool call,
but the concurrent path's wait loop just blocked with 30s timeouts.
Now polls every 5s and cancels pending futures on interrupt, giving
already-running tools 3s to notice the per-thread interrupt signal.
2. **Cancelled concurrent tools get proper interrupt messages** (run_agent.py)
When a concurrent tool is cancelled or didn't return a result due to
interrupt, the tool result message says 'skipped due to user interrupt'
instead of a generic error.
3. **Typing indicator fires before follow-up turn** (gateway/run.py)
After an interrupt is acknowledged and the pending message dequeued,
the gateway now sends a typing indicator before starting the recursive
_run_agent call. This gives the user immediate visual feedback that
the system is processing their new message (closing the perceived
'dead air' gap between the interrupt ack and the response).
Reported by @_SushantSays.
Fixes 12 CI test failures:
1. test_cli_new_session (4): _FakeAgent missing commit_memory_session
attribute added in the memory provider refactoring. Added MagicMock.
2. test_run_progress_topics (1): already_sent detection only checked
stream consumer flags, missing the response_previewed path from
interim_assistant_callback. Restructured guard to check both paths.
3. test_timezone (1): HERMES_TIMEZONE leaked into child processes via
_SAFE_ENV_PREFIXES matching HERMES_*. The code correctly converts
it to TZ but didn't remove the original. Added child_env.pop().
4. test_session_env (1): contextvars baseline captured from a different
context couldn't be restored after clear. Changed assertion to verify
the test's value was removed rather than comparing to a fragile baseline.
5. test_discord_slash_commands (5): already fixed on current main.
Gateway executor work now inherits the active session contextvars via
copy_context() so background process watchers retain the correct
platform/chat/user/session metadata for routing completion events back
to the originating chat.
Cherry-picked from #10647 by @helix4u with:
- Use asyncio.get_running_loop() instead of deprecated get_event_loop()
- Strip trailing whitespace
- Add *args forwarding test
- Add exception propagation test
In Telegram forum-enabled groups, the General topic does not include
message_thread_id in incoming messages (it is None). This caused:
1. Messages in General losing thread context — replies went to wrong place
2. Typing indicator failing because thread_id=1 was rejected by Telegram
Fix: synthesize thread_id="1" for forum groups when message_thread_id
is None, then handle it correctly per operation:
- send: omit message_thread_id (Telegram rejects thread_id=1 for sends)
- typing: pass thread_id=1, retry without it on "thread not found"
Also centralizes thread_id extraction into _metadata_thread_id() across
all send methods (send, send_voice, send_image, send_document, send_video,
send_animation, send_photo), replacing ~10 duplicate patterns.
Salvaged from PR #7892 by @corazzione.
Closes#7877, closes#7519.
Pass platform_env_var="TELEGRAM_PROXY" to resolve_proxy_url() in both
telegram.py (main connect) and telegram_network.py (fallback transport),
so a Telegram-specific proxy takes priority over the generic HTTPS_PROXY.
Also bridge telegram.proxy_url from config.yaml to the TELEGRAM_PROXY
env var (env var takes precedence if both are set), add OPTIONAL_ENV_VARS
entry, docs, and tests.
Composite salvage of four community PRs:
- Core approach (both call sites): #9414 by @leeyang1990
- config.yaml bridging + docs: #6530 by @WhiteWorld
- Naming convention: #9074 by @brantzh6
- Earlier proxy work: #7786 by @ten-ltw
Closes#9414, closes#9074, closes#7786, closes#6530
Co-authored-by: WhiteWorld <WhiteWorld@users.noreply.github.com>
Co-authored-by: brantzh6 <brantzh6@users.noreply.github.com>
Co-authored-by: ten-ltw <ten-ltw@users.noreply.github.com>
The command preview and description were wrapped in Markdown v1 inline
code (backticks) without escaping, causing Telegram API parse errors
when the command itself contained backticks or asterisks.
Fixes: 'Can't parse entities: can't find end of the entity'
Telegram on iOS auto-converts double hyphens (--) to em dashes (—)
or en dashes (–) via autocorrect. This breaks /model flag parsing
since parse_model_flags() only recognizes literal '--provider' and
'--global'.
When the flag isn't parsed, the entire string (e.g. 'glm-5.1 —provider zai')
gets treated as the model name and fails with 'Model names cannot
contain spaces.'
Fix: normalize Unicode dashes (U+2012-U+2015) to '--' when they
appear before flag keywords (provider, global), before flag extraction.
The existing test suite in test_model_switch_provider_routing.py
already covers all four dash variants — this commit adds the code
that makes them pass.
Replace inline Path.home() / '.hermes' / 'profiles' detection in both CLI
and gateway /profile handlers with the existing get_active_profile_name()
from hermes_cli.profiles — which already handles custom-root deployments,
standard profiles, and Docker layouts.
Fixes /profile incorrectly reporting 'default' when HERMES_HOME points to
a custom-root profile path like /opt/data/profiles/coder.
Based on PR #10484 by Xowiek.
Background review notifications ("💾 Skill created", "💾 Memory updated")
could race ahead of the main assistant reply in chat, making it look like
the agent stopped after creating a skill.
Gate bg-review notifications behind a threading.Event + pending queue.
Register a release callback on the adapter's _post_delivery_callbacks dict
so base.py's finally block fires it after the main response is delivered.
The queued-message path in _run_agent pops and calls the callback directly
to prevent double-fire.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Closes#10541
WecomCallbackAdapter declared a _seen_messages dict and
MESSAGE_DEDUP_TTL_SECONDS constant but never actually checked
them in _handle_callback(). WeCom retries callback deliveries
on timeout, and each retry with the same MsgId was treated as
a fresh message and queued for processing.
Fix: check _seen_messages before enqueuing. Uses the same TTL-
based pattern as MessageDeduplicator (fixed in #10306) — check
age before returning duplicate, prune on overflow.
Closes#10305
Extract resolve_channel_prompt() shared helper into
gateway/platforms/base.py. Refactor Discord to use it.
Wire channel_prompts into Telegram (groups + forum topics),
Slack (channels), and Mattermost (channels).
Config bridging now applies to all platforms (not just Discord).
Added channel_prompts defaults to telegram/slack/mattermost
config sections.
Docs added to all four platform pages with platform-specific
examples (topic inheritance for Telegram, channel IDs for Slack,
etc.).
- Remove double str() normalization in _resolve_channel_prompt since
config bridging already handles numeric YAML key conversion
- Remove dead prompts.get(str(key)) fallback that could never match
after keys were already normalized to strings
- Replace getattr(event, "channel_prompt", None) with direct attribute
access since channel_prompt is a declared dataclass field
- Update test to verify normalization responsibility lives in config bridging
_parse_session_key() blindly assigned parts[5] as thread_id for all
chat types. For group sessions with per-user isolation, parts[5] is
a user_id, not a thread_id. This could cause shutdown notifications
to route with incorrect thread metadata.
Only return thread_id for chat types where the 6th element is
unambiguous: dm and thread. For group/channel sessions, omit
thread_id since the suffix may be a user_id.
Based on the approach from PR #9938 by @Ruzzgar.
Commentary messages (interim assistant status updates like "Using browser
tool...") are sent via _send_commentary(), which was incorrectly setting
_already_sent = True on success. This caused the final response to be
suppressed when there were multiple tool calls, because the gateway checks
already_sent to decide whether to skip re-sending the response.
The fix: commentary messages are interim status updates, not the final
response, so _already_sent should not be set when they succeed. This
ensures the final response is always delivered regardless of how many
commentary messages were sent during the turn.
Fixes: #10454
After clear_session_vars() reset contextvars to their default (''),
get_session_env() treated the empty string as falsy and fell through
to os.environ — resurrecting stale HERMES_SESSION_* values from CLI
startup, cron, or previous sessions. This broke session isolation
in the gateway where concurrent messages could see each other's
stale environment values.
Fix: use a sentinel (_UNSET) as the contextvar default instead of ''.
get_session_env() now checks 'value is not _UNSET' instead of
truthiness. Three states are cleanly distinguished:
- _UNSET (never set): fall back to os.environ (CLI/cron compat)
- '' (explicitly cleared): return '' — no os.environ fallback
- 'telegram' (actively set): return the value
clear_session_vars() now uses var.set('') instead of var.reset(token)
to mark vars as explicitly cleared rather than reverting to _UNSET.
Closes#10304
When a model (e.g. mimo-v2-pro) streams intermediate text alongside tool
calls ("Let me search for that") but then returns empty after processing
tool results, the stream consumer already_sent flag is True from the
earlier text delivery. The gateway suppression check
(already_sent=True, failed=False → return None) would swallow the final
response, leaving the user staring at silence after the search.
Two changes:
1. gateway/run.py return path: skip already_sent suppression when the
final_response is "(empty)" or empty — the user needs to know the
agent finished even if streaming sent partial content earlier.
2. gateway/run.py response handler: convert the internal "(empty)"
sentinel to a user-friendly warning instead of delivering the raw
sentinel string.
Tests added for all empty/None/sentinel cases plus preserved existing
suppression behavior for normal non-empty responses.
Discord's _register_slash_commands() had a hardcoded list of ~27 commands
while COMMAND_REGISTRY defines 34+ gateway-available commands. Missing
commands (debug, branch, rollback, snapshot, profile, yolo, fast, reload,
commands) were invisible in Discord's / autocomplete — users couldn't
discover them.
Add a dynamic catch-all loop after the explicit registrations that
iterates COMMAND_REGISTRY, skips already-registered commands, and
auto-registers the rest using discord.app_commands.Command(). Commands
with args_hint get an optional string parameter; parameterless commands
get a simple callback.
This ensures any future commands added to COMMAND_REGISTRY automatically
appear on Discord without needing a manual entry in discord.py.
Telegram and Slack already derive dynamically from COMMAND_REGISTRY
via telegram_bot_commands() and slack_subcommand_map() — no changes
needed there.
- Pastes uploaded by /debug now auto-delete after 1 hour via a detached
background process that sends DELETE to paste.rs
- CLI: shows privacy notice listing what data will be uploaded
- Gateway: only uploads summary report (system info + log tails), NOT
full log files containing conversation content
- Added 'hermes debug delete <url>' for immediate manual deletion
- 16 new tests covering auto-delete scheduling, paste deletion, privacy
notices, and the delete subcommand
Addresses user privacy concern where /debug uploaded full conversation
logs to a public paste service with no warning or expiry.
Two gateway fixes:
1. MessageDeduplicator.is_duplicate() now checks TTL at query time (#10306)
Previously, is_duplicate() returned True for any previously seen ID
without checking its age — expired entries were only purged when cache
size exceeded max_size. On normal workloads that never overflow, message
IDs stayed deduplicated forever instead of expiring after the TTL.
Fix: check `now - timestamp < ttl` before returning True. Expired
entries are removed and treated as new messages.
2. Gateway --config flag now uses yaml.safe_load() (#10216)
The --config CLI flag in gateway/run.py main() used json.load() to
parse config files. YAML is the only documented config format and
every other config loader uses yaml.safe_load(). A YAML config file
passed via --config would crash with json.JSONDecodeError.
Closes#10306Closes#10216
_parse_session_key() now extracts the optional 6th part (thread_id) from
session keys, and _notify_active_sessions_of_shutdown uses _parsed.get()
instead of the removed 'parts' variable. Without this, shutdown notifications
silently failed (NameError caught by try/except) and forum topic routing
was lost.
- Populate watcher_* routing fields for watch-only processes (not just
notify_on_complete), so watch-pattern events carry direct metadata
instead of relying solely on session_key parsing fallback
- Extract _parse_session_key() helper to dedupe session key parsing
at two call sites in gateway/run.py
- Add negative test proving cross-thread leakage doesn't happen
- Add edge-case tests for _build_process_event_source returning None
(empty evt, invalid platform, short session_key)
- Add unit tests for _parse_session_key helper
Three fixes for the duplicate reply bug affecting all gateway platforms:
1. base.py: Suppress stale response when the session was interrupted by a
new message that hasn't been consumed yet. Checks both interrupt_event
and _pending_messages to avoid false positives. (#8221, #2483)
2. run.py (return path): Remove response_previewed guard from already_sent
check. Stream consumer's already_sent alone is authoritative — if
content was delivered via streaming, the duplicate send must be
suppressed regardless of the agent's response_previewed flag. (#8375)
3. run.py (queued-message path): Same fix — already_sent without
response_previewed now correctly marks the first response as already
streamed, preventing re-send before processing the queued message.
The response_previewed field is still produced by the agent (run_agent.py)
but is no longer required as a gate for duplicate suppression. The stream
consumer's already_sent flag is the delivery-level truth about what the
user actually saw.
Concepts from PR #8380 (konsisumer). Closes#8375, #8221, #2483.
Three independent fixes:
1. Reset activity timestamp on cached agent reuse (#9051)
When the gateway reuses a cached AIAgent for a new turn, the
_last_activity_ts from the previous turn (possibly hours ago)
carried over. The inactivity timeout handler immediately saw
the agent as idle for hours and killed it.
Fix: reset _last_activity_ts, _last_activity_desc, and
_api_call_count when retrieving an agent from the cache.
2. Detect uv-managed virtual environments (#8620 sub-issue 1)
The systemd unit generator fell back to sys.executable (uv's
standalone Python) when running under 'uv run', because
sys.prefix == sys.base_prefix (uv doesn't set up traditional
venv activation). The generated ExecStart pointed to a Python
binary without site-packages, crashing the service on startup.
Fix: check VIRTUAL_ENV env var before falling back to
sys.executable. uv sets VIRTUAL_ENV even when sys.prefix
doesn't reflect the venv.
3. Nudge model to continue after empty post-tool response (#9400)
Weaker models (GLM-5, mimo-v2-pro) sometimes return empty
responses after tool calls instead of continuing to the next
step. The agent silently abandoned the remaining work with
'(empty)' or used prior-turn fallback text.
Fix: when the model returns empty after tool calls AND there's
no prior-turn content to fall back on, inject a one-time user
nudge message telling the model to process the tool results and
continue. The flag resets after each successful tool round so it
can fire again on later rounds.
Test plan: 97 gateway + CLI tests pass, 9 venv detection tests pass
When a user sends a message while the agent is executing a task on the
gateway, the agent is now interrupted immediately — not silently queued.
Previously, messages were stored in _pending_messages with zero feedback
to the user, potentially leaving them waiting 1+ hours.
Root cause: Level 1 guard (base.py) intercepted all messages for active
sessions and returned with no response. Level 2 (gateway/run.py) which
calls agent.interrupt() was never reached.
Fix: Expand _handle_active_session_busy_message to handle the normal
(non-draining) case:
1. Call running_agent.interrupt(text) to abort in-flight tool calls
and exit the agent loop at the next check point
2. Store the message as pending so it becomes the next turn once the
interrupted run returns
3. Send a brief ack: 'Interrupting current task (10 min elapsed,
iteration 21/60, running: terminal). I'll respond shortly.'
4. Debounce acks to once per 30s to avoid spam on rapid messages
Reported by @Lonely__MH.
When compression fails after max attempts, the agent returns
{completed: False, partial: True} but was missing the 'failed' flag.
The gateway's agent_failed_early guard checked for 'failed' AND
'not final_response', but _run_agent_blocking always converts errors
to final_response — making the guard dead code. This caused the
oversized session to persist, creating an infinite fail loop where
every subsequent message hits the same compression failure.
Changes:
- run_agent.py: add 'failed: True' and 'compression_exhausted: True'
to all 5 compression-exhaustion return paths
- gateway/run.py (_run_agent_blocking): forward 'failed' and
'compression_exhausted' flags through to the caller
- gateway/run.py (_handle_message_with_agent): fix agent_failed_early
to check bool(failed) without the broken 'not final_response' clause;
auto-reset the session when compression is exhausted so the next
message starts fresh
- Update tests to match new guard logic and add
TestCompressionExhaustedFlag test class
Closes#9893
The /v1/responses endpoint generated a new UUID session_id for every
request, even when previous_response_id was provided. This caused each
turn of a multi-turn conversation to appear as a separate session on the
web dashboard, despite the conversation history being correctly chained.
Fix: store session_id alongside the response in the ResponseStore, and
reuse it when a subsequent request chains via previous_response_id.
Applies to both the non-streaming /v1/responses path and the streaming
SSE path. The /v1/runs endpoint also gains session continuity from
stored responses (explicit body.session_id still takes priority).
Adds test verifying session_id is preserved across chained requests.
* fix: hermes gateway restart waits for service to come back up (#8260)
Previously, systemd_restart() sent SIGUSR1 to the gateway, printed
'restart requested', and returned immediately. The gateway still
needed to drain active agents, exit with code 75, wait for systemd's
RestartSec=30, and start the new process. The user saw 'success' but
the gateway was actually down for 30-60 seconds.
Now the SIGUSR1 path blocks with progress feedback:
Phase 1 — wait for old process to die:
⏳ User service draining active work...
Polls os.kill(pid, 0) until ProcessLookupError (up to 90s)
Phase 2 — wait for new process to become active:
⏳ Waiting for hermes-gateway to restart...
Polls systemctl is-active + verifies new PID (up to 60s)
Success:
✓ User service restarted (PID 12345)
Timeout:
⚠ User service did not become active within 60s.
Check status: hermes gateway status
Check logs: journalctl --user -u hermes-gateway --since '2 min ago'
The reload-or-restart fallback path (line 1189) already blocks because
systemctl reload-or-restart is synchronous.
Test plan:
- Updated test to verify wait-for-restart behavior
- All 118 gateway CLI tests pass
* fix: add 402 billing error hint to gateway error handler (#5220)
The gateway's exception handler for agent errors had specific hints for
HTTP 401, 429, 529, 400, 500 — but not 402 (Payment Required / quota
exhausted). Users hitting billing limits from custom proxy providers
got a generic error with no guidance.
Added: 'Your API balance or quota is exhausted. Check your provider
dashboard.'
The underlying billing classification (error_classifier.py) already
correctly handles 402 as FailoverReason.billing with credential
rotation and fallback. The original issue (#5220) where 402 killed
the entire gateway was from an older version — on current main, 402
is excluded from the is_client_error abort path (line 9460) and goes
through the proper retry/fallback/fail flow. Combined with PR #9875
(auto-recover from unexpected SIGTERM), even edge cases where the
gateway dies are now survivable.
The streaming path emits output as content-part arrays for Open WebUI
compatibility, but the batch (non-streaming) Responses API path must
return output as a plain string per the OpenAI Responses API spec.
Reverts the _extract_output_items change from the cherry-picked commits
while preserving the streaming path's array format.
When a session gets stuck (hung terminal, runaway tool loop) and the
user restarts the gateway, the same session history loads and puts the
agent right back in the stuck state. The user is trapped in a loop:
restart → stuck → restart → stuck.
Fix: track restart-failure counts per session using a simple JSON file
(.restart_failure_counts). On each shutdown with active agents, the
counter increments for those sessions. On startup, if any session has
been active across 3+ consecutive restarts, it's auto-suspended —
giving the user a clean slate on their next message.
The counter resets to 0 when a session completes a turn successfully
(response delivered), so normal sessions that happen to be active
during planned restarts (/restart, hermes update) won't accumulate
false counts.
Implementation:
- _increment_restart_failure_counts(): called during stop() when
agents are active. Writes {session_key: count} to JSON file.
Sessions NOT active are dropped (loop broken).
- _suspend_stuck_loop_sessions(): called on startup. Reads the file,
suspends sessions at threshold (3), clears the file.
- _clear_restart_failure_count(): called after successful response
delivery. Removes the session from the counter file.
No SessionEntry schema changes. No database migration. Pure file-based
tracking that naturally cleans up.
Test plan:
- 9 new stuck-loop tests (increment, accumulate, threshold, clear,
suspend, file cleanup, edge cases)
- All 28 gateway lifecycle tests pass (restart drain + auto-continue
+ stuck loop)
When the gateway restarts mid-agent-work, the session transcript ends
on a tool result the agent never processed. Previously, the user had
to type 'continue' or use /retry (which replays from scratch, losing
all prior work).
Now, when the next user message arrives and the loaded history ends
with role='tool', a system note is prepended:
[System note: Your previous turn was interrupted before you could
process the last tool result(s). Please finish processing those
results and summarize what was accomplished, then address the
user's new message below.]
This is injected in _run_agent()'s run_sync closure, right before
calling agent.run_conversation(). The agent sees the full history
(including the pending tool results) and the system note, so it can
summarize what was accomplished and then handle the user's new input.
Design decisions:
- No new session flags or schema changes — purely detects trailing
tool messages in the loaded history
- Works for any restart scenario (clean, crash, SIGTERM, drain timeout)
as long as the session wasn't suspended (suspended = fresh start)
- The user's actual message is preserved after the note
- If the session WAS suspended (unclean shutdown), the old history is
abandoned and the user starts fresh — no false auto-continue
Also updates the shutdown notification message from 'Use /retry after
restart to continue' to 'Send any message after restart to resume
where it left off' — which is now accurate.
Test plan:
- 6 new auto-continue tests (trailing tool detection, no false
positives for assistant/user/empty history, multi-tool, message
preservation)
- All 13 restart drain tests pass (updated /retry assertion)
Instead of consuming one top-level slash command slot per skill (hitting the
100-command limit with ~26 built-ins + 74 skills), skills are now organized
under a single /skill group command with category-based subcommand groups:
/skill creative ascii-art [args]
/skill media gif-search [args]
/skill mlops axolotl [args]
Discord supports 25 subcommand groups × 25 subcommands = 625 max skills,
well beyond the previous 74-slot ceiling.
Categories are derived from the skill directory structure:
- skills/creative/ascii-art/ → category 'creative'
- skills/mlops/training/axolotl/ → category 'mlops' (top-level parent)
- skills/dogfood/ → uncategorized (direct subcommand)
Changes:
- hermes_cli/commands.py: add discord_skill_commands_by_category() with
category grouping, hub/disabled filtering, Discord limit enforcement
- gateway/platforms/discord.py: replace top-level skill registration with
_register_skill_group() using app_commands.Group hierarchy
- tests: 7 new tests covering group creation, category grouping,
uncategorized skills, hub exclusion, deep nesting, empty skills,
and handler dispatch
Inspired by Discord community suggestion from bottium.
When the gateway receives SIGTERM/SIGINT, the shutdown handler now
runs 'ps aux' and logs every hermes/gateway-related process (excluding
itself). This will show in agent.log as:
WARNING: Shutdown diagnostic — other hermes processes running:
hermes 1234 ... hermes update --gateway
hermes 5678 ... hermes gateway restart
This is the missing diagnostic for #5646 / #6666 — we can prove
the restarts are from systemctl but can't determine WHO issues the
systemctl command. Next time it happens, the agent.log will contain
the evidence (the process that sent the signal or called systemctl
should still be alive when the handler fires).
The dashboard's gateway status detection relied solely on local PID checks
(os.kill + /proc), which fails when the gateway runs in a separate container.
Changes:
- web_server.py: Add _probe_gateway_health() that queries the gateway's HTTP
/health/detailed endpoint when the local PID check fails. Activated by
setting the GATEWAY_HEALTH_URL env var (e.g. http://gateway:8642/health).
Falls back to standard PID check when the env var is not set.
- api_server.py: Add GET /health/detailed endpoint that returns full gateway
state (platforms, gateway_state, active_agents, pid, etc.) without auth.
The existing GET /health remains unchanged for backwards compatibility.
- StatusPage.tsx: Handle the case where gateway_pid is null but the gateway
is running remotely, displaying 'Running (remote)' instead of 'PID null'.
Environment variables:
- GATEWAY_HEALTH_URL: URL of the gateway health endpoint (e.g.
http://gateway-container:8642/health). Unset = local PID check only.
- GATEWAY_HEALTH_TIMEOUT: Probe timeout in seconds (default: 3).
Root cause: when the gateway received SIGTERM (from hermes update,
external kill, WSL2 runtime, etc.), it exited with status 0. systemd's
Restart=on-failure only restarts on non-zero exit, so the gateway
stayed dead permanently. Users had to manually restart.
Fix 1: Signal-initiated shutdown exits non-zero
When SIGTERM/SIGINT is received and no restart was requested (via
/restart, /update, or SIGUSR1), start_gateway() returns False which
causes sys.exit(1). systemd sees a failure exit and auto-restarts
after RestartSec=30.
This is safe because systemctl stop tracks its own stop-requested
state independently of exit code — Restart= never fires for a
deliberate stop, regardless of exit code.
Also logs 'Received SIGTERM/SIGINT — initiating shutdown' so the
cause of unexpected shutdowns is visible in agent.log.
Fix 2: PID file ownership guard
remove_pid_file() now checks that the PID file belongs to the current
process before removing it. During --replace handoffs, the old
process's atexit handler could fire AFTER the new process wrote its
PID file, deleting the new record. This left the gateway running but
invisible to get_running_pid(), causing 'Another gateway already
running' errors on next restart.
Test plan:
- All restart drain tests pass (13)
- All gateway service tests pass (84)
- All update gateway restart tests pass (34)
Feishu approval clicks need the resolved card to come back from the
synchronous callback path itself. Leaving approval resolution to the
generic asynchronous card-action flow made button feedback depend on
later loop work instead of the callback response the client is waiting
for.
Change-Id: I574997cbbcaa097fdba759b47367e28d1b56b040
Constraint: Feishu card-action callbacks must acknowledge quickly and reflect final approval state from the callback response path
Rejected: Keep approval handling on the generic async card-action route | leaves card state synchronization vulnerable to callback timing and follow-up update ordering
Confidence: high
Scope-risk: narrow
Reversibility: clean
Directive: Keep approval callback response construction separate from async queue unblocking unless Feishu callback semantics change
Tested: pytest tests/gateway/test_feishu.py tests/gateway/test_feishu_approval_buttons.py tests/gateway/test_approve_deny_commands.py tests/gateway/test_slack_approval_buttons.py tests/gateway/test_telegram_approval_buttons.py -q
Not-tested: Live Feishu workspace end-to-end callback rendering
Three fixes for gateway lifecycle stability:
1. Notify active sessions before shutdown (#new)
When the gateway receives SIGTERM or /restart, it now sends a
notification to every chat with an active agent BEFORE starting
the drain. Users see:
- Shutdown: 'Gateway shutting down — your task will be interrupted.'
- Restart: 'Gateway restarting — use /retry after restart to continue.'
Deduplicates per-chat so group sessions with multiple users get
one notification. Best-effort: send failures are logged and swallowed.
2. Skip .clean_shutdown marker when drain timed out
Previously, a graceful SIGTERM always wrote .clean_shutdown, even if
agents were force-interrupted when the drain timed out. This meant
the next startup skipped session suspension, leaving interrupted
sessions in a broken state (trailing tool response, no final message).
Now the marker is only written if the drain completed without timeout,
so interrupted sessions get properly suspended on next startup.
3. Post-restart health check for hermes update (#6631)
cmd_update() now verifies the gateway actually survived after
systemctl restart (sleep 3s + is-active check). If the service
crashed immediately, it retries once. If still dead, prints
actionable diagnostics (journalctl command, manual restart hint).
Also closes#8104 — already fixed on main (the /restart handler
correctly detects systemd via INVOCATION_ID and uses via_service=True).
Test plan:
- 6 new tests for shutdown notifications (dedup, restart vs shutdown
messaging, sentinel filtering, send failure resilience)
- Existing restart drain + update tests pass (47 total)
When BlueBubbles posts webhook events to the adapter, it uses the exact
URL registered via /api/v1/webhook — and BB's registration API does not
support custom headers. The adapter currently registers the bare URL
(no credentials), but then requires password auth on inbound POSTs,
rejecting every webhook with HTTP 401.
This is masked on fresh BB installs by a race condition: the webhook
might register once with a prior (possibly patched) URL and keep working
until the first restart. On v0.9.0, _unregister_webhook runs on clean
shutdown, so the next startup re-registers with the bare URL and the
401s begin. Users see the bot go silent with no obvious cause.
Root cause: there's no way to pass auth credentials from BB to the
webhook handler except via the URL itself. BB accepts query params and
preserves them on outbound POSTs.
## Fix
Introduce `_webhook_register_url` — the URL handed to BB's registration
API, with the configured password appended as a `?password=<value>`
query param. The existing webhook auth handler already accepts this
form (it reads `request.query.get("password")`), so no change to the
receive side is needed.
The bare `_webhook_url` is still used for logging and for binding the
local listener, so credentials don't leak into log output. Only the
registration/find/unregister paths use the password-bearing form.
## Notes
- Password is URL-encoded via urllib.parse.quote, handling special
characters (&, *, @, etc.) that would otherwise break parsing.
- Storing the password in BB's webhook table is not a new disclosure:
anyone with access to that table already has the BB admin password
(same credential used for every other API call).
- If `self.password` is empty (no auth configured), the register URL
is the bare URL — preserves current behavior for unauthenticated
local-only setups.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
BlueBubbles v1.9+ webhook payloads for new-message events do not always
include a top-level chatGuid field on the message data object. Instead,
the chat GUID is nested under data.chats[0].guid.
The adapter currently checks five top-level fallback locations (record and
payload, snake_case and camelCase, plus payload.guid) but never looks
inside the chats array. When none of those top-level fields contain the
GUID, the adapter falls through to using the sender's phone/email as the
session chat ID.
This causes two observable bugs when a user is a participant in both a DM
and a group chat with the bot:
1. DM and group sessions merge. Every message from that user ends up with
the same session_chat_id (their own address), so the bot cannot
distinguish which thread the message came from.
2. Outbound routing becomes ambiguous. _resolve_chat_guid() iterates all
chats and returns the first one where the address appears as a
participant; group chats typically sort ahead of DMs by activity, so
replies and cron messages intended for the DM can land in a group.
This was observed in production: a user's morning brief cron delivered to
a group chat with his spouse instead of his DM thread.
The fix adds a single fallback that extracts chat_guid from
record["chats"][0]["guid"] when the top-level fields are empty. The chats
array is included in every new-message webhook payload in BB v1.9.9
(verified against a live server). It is backwards compatible: if a future
BB version starts including chatGuid at the top level, that still wins.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The BlueBubbles adapter registers its webhook with three events:
["new-message", "updated-message", "message"]. The third, "message",
is not a valid event type in the BlueBubbles server API — BB rejects
the registration payload with HTTP 400 Bad Request.
Currently this is masked by the "crash resilience" check in
_register_webhook, which reuses any existing registration matching the
webhook URL and short-circuits before reaching the API call. So an
already-registered webhook from a prior run keeps working. But any fresh
install, or any restart after _unregister_webhook has run during a clean
shutdown, fails to re-register and silently stops receiving messages.
Observed in production: after a gateway restart in v0.9.0 (which auto-
unregisters on shutdown), the next startup hit this 400 and the bot went
silent until the invalid event was removed.
BlueBubbles documents "new-message" and "updated-message" as the message
event types (see https://docs.bluebubbles.app/). There is no "message"
event, and no harm in dropping it — the two remaining events cover all
inbound message webhooks.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When GATEWAY_PROXY_URL (or gateway.proxy_url in config.yaml) is set,
the gateway becomes a thin relay: it handles platform I/O (encryption,
threading, media) and delegates all agent work to a remote Hermes API
server via POST /v1/chat/completions with SSE streaming.
This enables the primary use case of running a Matrix E2EE gateway in
Docker on Linux while the actual agent runs on the host (e.g. macOS)
with full access to local files, memory, skills, and a unified session
store. Works for any platform adapter, not just Matrix.
Configuration:
- GATEWAY_PROXY_URL env var (Docker-friendly)
- gateway.proxy_url in config.yaml
- GATEWAY_PROXY_KEY env var for API auth (matches API_SERVER_KEY)
- X-Hermes-Session-Id header for session continuity
Architecture:
- _get_proxy_url() checks env var first, then config.yaml
- _run_agent_via_proxy() handles HTTP forwarding with SSE streaming
- _run_agent() delegates to proxy path when URL is configured
- Platform streaming (GatewayStreamConsumer) works through proxy
- Returns compatible result dict for session store recording
Files changed:
- gateway/run.py: proxy mode implementation (~250 lines)
- hermes_cli/config.py: GATEWAY_PROXY_URL + GATEWAY_PROXY_KEY env vars
- tests/gateway/test_proxy_mode.py: 17 tests covering config
resolution, dispatch, HTTP forwarding, error handling, message
filtering, and result shape validation
Closes discussion from Cars29 re: Matrix gateway mixed-mode issue.
The /new and /reset commands were not calling shutdown_memory_provider()
on the cached agent before eviction. This caused OpenViking (and any
memory provider that relies on session-end shutdown) to skip commit,
leaving memories un-indexed until idle timeout or gateway shutdown.
Add the missing shutdown_memory_provider() call in _handle_reset_command(),
matching the behavior already present in the session expiry watcher.
Fixes#7759
During rapid tool-calling, the model often emits 1-2 tokens before
switching to tool calls. The stream consumer would create a new message
with 'X ▉' (short text + cursor), and if the follow-up edit to strip
the cursor was rate-limited by the platform, the cursor remained as
a permanent standalone message — reported on Telegram as 'white box'
artifacts.
Add a minimum-content guard in _send_or_edit: when creating a new
standalone message (no existing message_id), require at least 4
visible characters alongside the cursor before sending. Shorter text
accumulates into the next streaming segment instead.
This prevents cursor-only 'tofu' messages across all platforms without
affecting normal streaming (edits to existing messages, final sends
without cursor, and messages with substantial text are all unaffected).
Reported by @michalkomar on X.
Production fixes:
- Add clear_session_context() to hermes_logging.py (fixes 48 teardown errors)
- Add clear_session() to tools/approval.py (fixes 9 setup errors)
- Add SyncError M_UNKNOWN_TOKEN check to Matrix _sync_loop (bug fix)
- Fall back to inline api_key in named custom providers when key_env
is absent (runtime_provider.py)
Test fixes:
- test_memory_user_id: use builtin+external provider pair, fix honcho
peer_name override test to match production behavior
- test_display_config: remove TestHelpers for non-existent functions
- test_auxiliary_client: fix OAuth tokens to match _is_oauth_token
patterns, replace get_vision_auxiliary_client with resolve_vision_provider_client
- test_cli_interrupt_subagent: add missing _execution_thread_id attr
- test_compress_focus: add model/provider/api_key/base_url/api_mode
to mock compressor
- test_auth_provider_gate: add autouse fixture to clean Anthropic env
vars that leak from CI secrets
- test_opencode_go_in_model_list: accept both 'built-in' and 'hermes'
source (models.dev API unavailable in CI)
- test_email: verify email Platform enum membership instead of source
inspection (build_channel_directory now uses dynamic enum loop)
- test_feishu: add bot_added/bot_deleted handler mocks to _Builder
- test_ws_auth_retry: add AsyncMock for sync_store.get_next_batch,
add _pending_megolm and _joined_rooms to Matrix adapter mocks
- test_restart_drain: monkeypatch-delete INVOCATION_ID (systemd sets
this in CI, changing the restart call signature)
- test_session_hygiene: add user_id to SessionSource
- test_session_env: use relative baseline for contextvar clear check
(pytest-xdist workers share context)
Improvements from our earlier #8269 salvage work applied to #7616:
- Platform token lock: acquire_scoped_lock/release_scoped_lock prevents
two profiles from double-connecting the same QQ bot simultaneously
- Send retry with exponential backoff (3 attempts, 1s/2s/4s) with
permanent vs transient error classification (matches Telegram pattern)
- Proper long-message splitting via truncate_message() instead of
hard-truncating at MAX_MESSAGE_LENGTH (preserves code blocks, adds 1/N)
- REST-based one-shot send in send_message_tool — uses QQ Bot REST API
directly with httpx instead of creating a full WebSocket adapter per
message (fixes the connect→send race condition)
- Use shared strip_markdown() from helpers.py instead of 15 lines of
inline regex with import-inside-method (DRY, same as BlueBubbles/SMS)
- format_message() now wired into send() pipeline
- Add Platform.QQBOT to _UPDATE_ALLOWED_PLATFORMS (enables /update command)
- Add 'qqbot' to webhook cross-platform delivery routing
- Add 'qqbot' to hermes dump platform detection
- Fix test_name_property casing: 'QQBot' not 'QQBOT'
- Add _parse_qq_timestamp() for ISO 8601 + integer ms compatibility
(QQ API changed timestamp format — from PR #2411 finding)
- Wire timestamp parsing into all 4 message handlers
- Rename platform from 'qq' to 'qqbot' across all integration points
(Platform enum, toolset, config keys, import paths, file rename qq.py → qqbot.py)
- Add PLATFORM_HINTS for QQBot in prompt_builder (QQ supports markdown)
- Set SUPPORTS_MESSAGE_EDITING = False to skip streaming on QQ
(prevents duplicate messages from non-editable partial + final sends)
- Add _send_qqbot() standalone send function for cron/send_message tool
- Add interactive _setup_qq() wizard in hermes_cli/setup.py
- Restore missing _setup_signal/email/sms/dingtalk/feishu/wecom/wecom_callback
functions that were lost during the original merge
Models like MiniMax emit inline <think>...</think> reasoning blocks in
their content field. The CLI already suppresses these via a state machine
in _stream_delta, but the gateway's GatewayStreamConsumer had no
equivalent filtering — raw think blocks were streamed directly to
Discord/Telegram/Slack.
The fix adds a _filter_and_accumulate() method that mirrors the CLI's
approach: a state machine tracks whether we're inside a reasoning block
and silently discards the content. Includes the same block-boundary
check (tag must appear at line start or after whitespace-only prefix)
to avoid false positives when models mention <think> in prose.
Handles all tag variants: <think>, <thinking>, <THINKING>, <thought>,
<reasoning>, <REASONING_SCRATCHPAD>.
Also handles edge cases:
- Tags split across streaming deltas (partial tag buffering)
- Unclosed blocks (content suppressed until stream ends)
- Multiple consecutive blocks
- _flush_think_buffer on stream end for held-back partial tags
Adds 22 unit tests + 1 integration test covering all scenarios.
When the 5-second stream_task timeout in gateway/run.py expires (due to
slow Telegram API calls from rate limiting after several messages), the
stream consumer is cancelled via asyncio.CancelledError. The
CancelledError handler did a best-effort final edit but never set
final_response_sent, so the gateway fell through to the normal send path
and delivered the full response again as a reply — causing a duplicate.
The fix: in the CancelledError handler, set final_response_sent = True
when already_sent is True (i.e., the stream consumer had already
delivered content to the user). This tells the gateway's already_sent
check that the response was delivered, preventing the duplicate send.
Adds two tests verifying the cancellation behavior:
- Cancelled with already_sent=True → final_response_sent=True (no dup)
- Cancelled with already_sent=False → final_response_sent=False (normal
send path proceeds)
Reported by community user hume on Discord.
/stop was calling suspend_session() which marked the session for auto-reset
on the next message. This meant users lost their conversation history every
time they stopped a running agent — especially painful for untitled sessions
that can't be resumed by name.
Now /stop just interrupts the agent and cleans the session lock. The session
stays intact so users can continue the conversation.
The suspend behavior was introduced in #7536 to break stuck session resume
loops on gateway restart. That case is already handled by
suspend_recently_active() which runs at gateway startup, so removing it from
/stop doesn't regress the original fix.
The v11→v12 migration converts custom_providers (list) into providers
(dict), then deletes the list. But all runtime resolvers read from
custom_providers — after migration, named custom endpoints silently stop
resolving and fallback chains fail with AuthError.
Add get_compatible_custom_providers() that reads from both config schemas
(legacy custom_providers list + v12+ providers dict), normalizes entries,
deduplicates, and returns a unified list. Update ALL consumers:
- hermes_cli/runtime_provider.py: _get_named_custom_provider() + key_env
- hermes_cli/auth_commands.py: credential pool provider names
- hermes_cli/main.py: model picker + _model_flow_named_custom()
- agent/auxiliary_client.py: key_env + custom_entry model fallback
- agent/credential_pool.py: _iter_custom_providers()
- cli.py + gateway/run.py: /model switch custom_providers passthrough
- run_agent.py + gateway/run.py: per-model context_length lookup
Also: use config.pop() instead of del for safer migration, fix stale
_config_version assertions in tests, add pool mock to codex test.
Co-authored-by: 墨綠BG <s5460703@gmail.com>
Closes#8776, salvaged from PR #8814
Updated the acquire_scoped_lock function to treat empty or corrupt lock files as stale. This change ensures that if a lock file exists but is invalid, it will be removed to prevent issues with stale locks. Added tests to verify recovery from both empty and corrupt lock files.
- Store source metadata on /voice channel join so voice input shares the
same session as the linked text channel conversation
- Treat voice-linked text channels as free-response (skip @mention and
auto-thread) while voice is active
- Scope the voice-linked exemption to the exact bound channel, not
sibling threads
- Guard signal handler registration in start_gateway() for non-main
threads (prevents RuntimeError when gateway runs in a daemon thread)
- Clean up _voice_sources on leave_voice_channel
Salvaged from PR #3475 by twilwa (Modal runtime portions excluded).
When HTTPS_PROXY / HTTP_PROXY / ALL_PROXY env vars are set (or macOS system proxy
is detected), pass the proxy URL explicitly via HTTPXRequest(proxy=proxy_url) instead
of relying on httpx's trust_env mechanism, which is unreliable for HTTP CONNECT
proxies (e.g. Clash / ClashMac in fake-ip mode).
Uses the shared resolve_proxy_url() from base.py (handles env vars + macOS system
proxy detection) instead of duplicating env var reading inline. Consolidates the
proxy_configured boolean into a single proxy_url = resolve_proxy_url() call that
serves as both the gate for skipping fallback-IP transport and the value passed
to HTTPXRequest.
Co-authored-by: Hermes Agent <hermes@nousresearch.com>
Salvaged from PR #8931 by MaybeRichard.
The detached bash subprocess spawned by /restart gets killed by
systemd's KillMode=mixed cgroup cleanup, leaving the gateway dead.
Under systemd (detected via INVOCATION_ID env var), /restart now uses
via_service=True which exits with code 75 — RestartForceExitStatus=75
in the unit file makes systemd auto-restart the service. The detached
subprocess approach is preserved as fallback for non-systemd
environments (Docker, tmux, foreground mode).
When a user sends /restart, the gateway now persists their routing info
(platform, chat_id, thread_id) to .restart_notify.json. After the new
gateway process starts and adapters connect, it reads the file, sends a
'Gateway restarted successfully' message to that specific chat, and
cleans up the file.
This follows the same pattern as _send_update_notification (used by
/update). Thread IDs are preserved so the notification lands in the
correct Telegram topic or Discord thread.
Previously, after /restart the user had no feedback that the gateway was
back — they had to send a message to find out. Now they get a proactive
notification and know their session continues.