Commit Graph

3441 Commits

Author SHA1 Message Date
GinWU
6d9b30632d fix(cli): honor positive tool preview length 2026-05-07 05:26:28 -07:00
Teknium
fdb9e0f6a6
fix(kanban): auto-block workers that exit without completing (#20894) (#21214)
When a kanban worker subprocess exits rc=0 but its task is still in
status='running', the agent almost certainly answered the task
conversationally without calling kanban_complete or kanban_block. The
dispatcher used to classify this as a generic crash and respawn, which
loops forever on small local models (gemma4-e2b q4 etc.) that keep
returning clean but unproductive output.

Dispatcher changes:
- The waitpid reap loop at the top of dispatch_once now records each
  reaped child's raw exit status in a bounded module registry
  (_recent_worker_exits, TTL 600s, size cap 4096).
- _classify_worker_exit distinguishes clean_exit / nonzero_exit /
  signaled / unknown using os.WIFEXITED / WIFSIGNALED.
- detect_crashed_workers consults the classification when a worker
  is found dead. clean_exit → protocol_violation event + immediate
  circuit-breaker trip (failure_limit=1). Everything else keeps the
  existing crashed-event + counter behavior.
- DispatchResult.auto_blocked now includes protocol-violation trips.

Gateway fix (Bug A in #20894):
- gateway.run._notify_active_sessions_of_shutdown snapshots
  self.adapters with list(...) before iterating. adapter.send() can
  hit a fatal-error path that pops the adapter from the dict, which
  was raising 'RuntimeError: dictionary changed size during iteration'
  during shutdown.

Regression tests:
- test_detect_crashed_workers_protocol_violation_auto_blocks verifies
  rc=0 + still-running → status=blocked on first occurrence with
  protocol_violation + gave_up events and NO crashed event.
- test_detect_crashed_workers_nonzero_exit_uses_default_limit verifies
  non-zero exits keep the existing 2-strike behavior.

Closes #20894.
2026-05-07 05:24:16 -07:00
Hao Zhe
2b6345cee3 fix(memory): harden OpenViking local path uploads 2026-05-07 05:21:50 -07:00
Hao Zhe
187951ec6b test(memory): harden OpenViking local upload coverage 2026-05-07 05:21:50 -07:00
nan
7137cccbd1 fix(memory): support OpenViking local resource uploads 2026-05-07 05:21:50 -07:00
0oAstro
abe5a3c937 fix(model_switch): live model discovery for custom_providers in /model picker
custom_providers entries (section 4 of list_authenticated_providers) only
read the static models: dict from config.yaml, ignoring the live /v1/models
endpoint.  This means gateways like Bifrost that expose hundreds of models
only show the handful explicitly listed in config.

Add live discovery via fetch_api_models() for custom_providers entries
that have api_key + base_url, matching the existing behavior for user
providers: entries (section 3).  When the endpoint is reachable and
returns models, the live list replaces the static subset.

Fixes: /model picker showing only 9 models from a Bifrost gateway that
actually exposes 581.
2026-05-07 05:21:26 -07:00
Teknium
e82f3b0c41 test: update send_message_tool mocks for force_document kwarg 2026-05-07 05:20:10 -07:00
leon7609
d34f03c32a feat(gateway): support [[as_document]] directive for skill media routing
Skills that produce large/lossless images (e.g. info-graph, where a
rendered JPG is 1-2 MB) currently lose quality in Telegram delivery
because `_IMAGE_EXTS` membership routes the file through
`send_multiple_images` → `sendMediaGroup`, which Telegram's server
re-encodes to JPEG @ 1280px max edge. The original bytes only survive
when the file goes through `send_document`, which the dispatch tables
in three places (`_process_message_background`, `_deliver_media_from_response`,
and the `send_message` tool's telegram path) only reach for files
whose extension is NOT in `_IMAGE_EXTS`.

This commit adds an `[[as_document]]` directive that mirrors the
existing `[[audio_as_voice]]` shape: a skill emits the directive once
in its response, and every image-extension MEDIA: file in that response
is delivered via `send_document` instead of `send_multiple_images` /
`sendPhoto`. The directive is detected at the dispatch sites (which see
the raw response) and the directive string is stripped from the
user-visible cleaned text in `extract_media` so it never leaks.

Granularity is intentionally all-or-nothing per response, matching
[[audio_as_voice]]'s scope. Skills that need fine control can split into
two responses.

Verified the targeted use case: info-graph emits

    信息图已生成(...)
    [[as_document]]
    MEDIA:/tmp/info-graph-x/infographic.jpg

→ Telegram receives `infographic.jpg` via sendDocument, original 1MB
JPEG bytes preserved, no recompression. Forwarding and download
filenames stay clean (`infographic.jpg`).

Tests: +3 cases in TestExtractMedia covering directive strip, isolation
from voice flag, and coexistence with [[audio_as_voice]]. All
113 pre-existing media/extract/send tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 05:20:10 -07:00
Molvikar
8d363f8d54 fix(bedrock): preserve reasoningContent across converse normalization 2026-05-07 05:17:16 -07:00
badfriend
4f364c4e99 fix(mcp): give 'mcp add --command' a distinct argparse dest
The --command flag of `hermes mcp add` shared its argparse dest with the
top-level subparser (`dest="command"` in `hermes_cli/_parser.py`). When
the flag was omitted, argparse still wrote `args.command = None`,
clobbering the top-level value of `"mcp"`. The dispatcher then saw
`args.command is None` and fell through to interactive chat, so
`hermes mcp add ...` silently launched chat instead of registering the
server. `cmd_mcp_add` was never reached.

Use `dest="mcp_command"` on the flag and read it from `cmd_mcp_add`.
The user-facing CLI flag `--command` is unchanged; only the in-memory
namespace attribute moves. Also updates the `_make_args` helper in
`tests/hermes_cli/test_mcp_config.py` to populate the new dest, and
adds `tests/hermes_cli/test_mcp_add_command_dest.py` with a parser-
level regression test.

Closes #19785.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-07 05:17:03 -07:00
teknium1
333598cb0e fix(gateway): cap cached session sources with LRU eviction
Follow-up on top of Zyproth's session-source cache: swap the unbounded
dict for an OrderedDict with a 512-entry LRU cap so long-running
gateways can't accumulate stale entries for dead sessions forever.

- self._session_sources is now an OrderedDict
- _cache_session_source() move_to_end + popitem(last=False) above cap
- _get_cached_session_source() move_to_end on hit (LRU read bump)
- restart_test_helpers.py wires OrderedDict + _session_sources_max
2026-05-07 05:16:38 -07:00
Zyproth
176b93575a fix(gateway): preserve thread routing from cached live session sources 2026-05-07 05:16:38 -07:00
Teknium
042eb930e2
fix(security): close TOCTOU window in hermes_cli/auth.py credential writers (#21194)
`_save_auth_store`, `_save_qwen_cli_tokens`, and `_write_shared_nous_state`
all created the temp file via `Path.open('w')` / `Path.write_text` and only
tightened permissions to 0o600 afterward. Between create and chmod the file
existed at the process umask (commonly 0o644 = world-readable on multi-user
hosts), briefly exposing OAuth access/refresh tokens for Nous, Codex,
Copilot, Claude, Qwen, Gemini, and every other native OAuth provider that
flows through auth.json.

Switch all three to `os.open(O_WRONLY|O_CREAT|O_EXCL, 0o600)` + `os.fdopen`
+ `fsync` so the file is atomic at 0o600 on creation. Tighten each parent
directory (`~/.hermes/`, Qwen auth dir, Nous shared auth dir) to 0o700 so
siblings can't traverse to the creds. `_save_auth_store` also gains a
per-process random temp suffix to match `agent/google_oauth.py` (#19673)
and `tools/mcp_oauth.py` (#21148).

Adds `tests/hermes_cli/test_auth_toctou_file_modes.py` asserting final
file mode 0o600 and parent dir mode 0o700 across all three writers, plus
an explicit `os.open(flags, mode)` check on the main auth.json writer
that would fail if anyone reintroduces the `Path.open('w')` pattern.
POSIX-only (mode bits skipped on Windows).
2026-05-07 05:12:05 -07:00
Brian Su
8b32a9d0f1 feat: add Discord message deletion action 2026-05-07 05:11:09 -07:00
Teknium
fb1ce793e6
feat(security): enable secret redaction by default (#17691, #20785) (#21193)
Flip the default for HERMES_REDACT_SECRETS from off to on so the redactor
already wired into send_message_tool, logs, and tool output actually runs
on a fresh install.

- agent/redact.py: env-var default "" → "true"
- hermes_cli/config.py: DEFAULT_CONFIG security.redact_secrets True;
  two config-template comments rewritten
- gateway/run.py + cli.py: startup log / banner warning when the user
  has explicitly opted out, so the downgrade is visible in agent.log
  and at CLI banner time
- docs/reference/environment-variables.md: description reconciled
- tests: flipped the default-pin, restructured the force=True
  regression test to explicit-false instead of unset

Users who need raw credential values (redactor development) can still
opt out via security.redact_secrets: false in config.yaml or
HERMES_REDACT_SECRETS=false in .env.

Closes #17691.
Addresses #20785 (short-term output-pipeline recommendation).
2026-05-07 05:10:33 -07:00
Teknium
ecaafe5f22 test(weixin): update timeout assertion for asyncio.wait_for migration 2026-05-07 05:10:04 -07:00
Zyproth
6e8f1e09a9 fix(gateway): use monotonic deadlines in QR onboarding flows 2026-05-07 05:09:39 -07:00
thelumiereguy
8a96fa48c1 fix(gateway): avoid duplicated responses history 2026-05-07 05:07:59 -07:00
Michael Nguyen
a84e56d4c6 fix(auth): sync shared Nous refresh tokens 2026-05-07 05:07:06 -07:00
Teknium
38b1c7dce5 refactor(gateway): simplify auto-resume + extend to crash recovery
Follow-up on top of @kyan12's PR #20888 — same feature, cleaner shape,
wider coverage.

Changes:
- Drop the synthetic '[System note: ...]' in the internal MessageEvent.
  The existing _is_resume_pending branch in _handle_message_with_agent
  (run.py ~L13738) already injects a reason-aware recovery system note
  on the next turn.  With kyan's text in place the model saw two stacked
  system notes.  Now the event text is empty and the existing injection
  path owns the wording.
- Drop SessionStore.list_resume_pending() as a new public method.  The
  filter is 8 lines inline in _schedule_resume_pending_sessions() —
  one caller, no other pluggability need.
- Add 'restart_interrupted' to the auto-resume reason set.  That's the
  reason SessionStore.suspend_recently_active() stamps on sessions
  recovered from a crash/OOM/SIGKILL (no .clean_shutdown marker).
  Previously those sessions had to wait for a real user message to
  auto-resume; now they continue automatically at startup like
  drain-timeout interruptions do.
- Reasons live in a _AUTO_RESUME_REASONS frozenset at class scope so
  future reasons (e.g. 'manual_resume_request') can be opted in with
  one line.

Test coverage added:
- drain-timeout + crash-recovery both scheduled
- stale entries skipped (outside freshness window)
- suspended entries skipped (suspended > resume_pending)
- originless entries skipped (no routing target)
- disallowed reasons skipped (graceful forward-compat)

E2E verified end-to-end with a real on-disk SessionStore: 2 eligible
sessions scheduled, 2 ineligible skipped, empty-text internal events
delivered to the adapter.

Co-authored-by: Kevin Yan <kevyan1998@gmail.com>
2026-05-07 05:05:34 -07:00
Kevin Yan
961a3535fa fix(gateway): preserve resume marker on interrupted restart 2026-05-07 05:05:34 -07:00
Kevin Yan
fad684b1f3 feat(gateway): auto-resume interrupted sessions after restart 2026-05-07 05:05:34 -07:00
mwnickerson
411cfa26e3 fix: auto-block repeated kanban retries 2026-05-07 05:05:20 -07:00
LeonSGP43
06f24351c5 fix(kanban): stop reclaimed workers before retry 2026-05-07 05:05:20 -07:00
stephen0110
40b51c93a2 fix(kanban): heartbeat tool extends claim TTL, not just last_heartbeat_at
The kanban_heartbeat tool called heartbeat_worker but never
heartbeat_claim, so a worker that loops the tool while a single tool
call blocks the agent for >DEFAULT_CLAIM_TTL_SECONDS still got
reclaimed by release_stale_claims. The function name and
heartbeat_claim's own docstring imply otherwise:

  "Workers that know they'll exceed 15 minutes should call this
   every few minutes to keep ownership."

But there was no caller in the worker tool path. Workers couldn't
invoke heartbeat_claim themselves either — it isn't exposed as a tool.

Fix: _handle_heartbeat now calls heartbeat_claim first, reading
HERMES_KANBAN_CLAIM_LOCK from the worker env (the dispatcher pins
this in _default_spawn). Falls back to _claimer_id() for locally-
driven workers that didn't go through dispatcher spawn.

Test: tests/tools/test_kanban_tools.py::test_heartbeat_extends_claim_expires
rewinds claim_expires into the past, calls the tool, and asserts the
new value is at least now + DEFAULT_CLAIM_TTL_SECONDS // 2. Verified to
fail against the unfixed code (claim_expires stays at the rewound
value).

Closes the root cause underlying the symptom in #21141 (15-min
respawns of long-running workers). #21141 separately addresses
post-reclaim cleanup; this fixes the upstream "shouldn't have been
reclaimed in the first place" half.
2026-05-07 05:05:20 -07:00
Teknium
bf843adf05
feat(gateway): opt-in cleanup of temporary progress bubbles (#21186)
When display.cleanup_progress (or display.platforms.<plat>.cleanup_progress)
is true, the gateway deletes tool-progress bubbles, long-running ' Still
working...' notices, and status-callback messages after the final response
is delivered successfully. Currently effective on adapters that implement
delete_message (Telegram); silently no-ops elsewhere. Off by default.
Failed runs skip cleanup so bubbles stay as breadcrumbs.

Minimal plumbing: base.py's existing post_delivery_callback slot now chains
new registrations onto any existing callback (with per-callback exception
isolation) rather than clobbering. Stale-generation registrations are
rejected so they can't step on a fresher run's callbacks. This lets the
cleanup callback coexist with the background-review release hook already
registered on the same slot.

Co-authored-by: mrcharlesiv <Mrcharlesiv@gmail.com>
2026-05-07 05:04:37 -07:00
Tranquil-Flow
d4de7d4179 test(skills): cover additional rescan paths in skill_commands cache (#14536)
The rescan-on-platform-change fix landed in #18739 ships one regression
test that exercises the HERMES_PLATFORM env-var path. Three other code
paths in get_skill_commands / _resolve_skill_commands_platform have no
direct coverage; this commit adds a regression test for each.

- Gateway session context (HERMES_SESSION_PLATFORM via ContextVar): the
  resolver consults get_session_env after HERMES_PLATFORM, and the
  gateway sets that variable through set_session_vars (a ContextVar),
  not os.environ. The test uses set_session_vars / clear_session_vars
  to drive the actual gateway signal, and the disabled-skill stub reads
  the same value via get_session_env. A regression that swapped
  get_session_env for plain os.getenv would still pass an env-var-based
  test but break concurrent gateway sessions, which is the bug the
  ContextVar plumbing exists to prevent.
- Returning to no-platform-scope (CLI / cron / RL rollouts after a
  gateway session): the cached telegram view must be dropped and the
  unfiltered scan repopulated when HERMES_PLATFORM is unset again.
- Same-platform cache hit: consecutive calls under the same platform
  scope must NOT rescan. The rescan trigger is change in scope, not
  "always re-resolve" — a gateway serving many consecutive telegram
  requests should pay the scan cost once, not per request.

The third test wraps scan_skill_commands with a spy after the cache is
primed, so the assertion is on call_count == 0 across three subsequent
get_skill_commands() calls.

All 39 tests in tests/agent/test_skill_commands.py pass under
scripts/run_tests.sh.
2026-05-07 04:59:43 -07:00
briandevans
11b9b146f1 fix(image-routing): expose attached image paths in native multimodal text part
In native image mode (vision-capable models like gpt-4o, claude-sonnet-4),
build_native_content_parts() previously emitted only the user's caption
plus image_url parts. The local file path of each attached image never
appeared in the conversation text, so the model could see the pixels but
had no string handle for tools that take image_url: str (custom MCP
tools, vision_analyze on a re-look, attach-to-tracker workflows).

The text-mode path already injects an equivalent hint via
Runner._enrich_message_with_vision ("...vision_analyze using image_url:
<path>..."). This brings native mode to parity by appending one
"[Image attached at: <path>]" line per successfully attached image to
the user-text part of the multimodal turn. Skipped (unreadable) paths
are NOT advertised, so the model is never told a non-existent file is
attached.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 04:58:00 -07:00
Sanjay Santhanam
1f27ca638f test(update): teach restart-mocks about the post-update survivor sweep
Issue #17648 added a post-update SIGTERM-survivor sweep to `cmd_update`:
~3s after issuing graceful/SIGTERM restarts, the code re-queries
`find_gateway_pids` and SIGKILLs anything still alive. That's the
right fix for stuck-drain gateways in production, but it broke three
unit tests that assumed `find_gateway_pids` would keep returning the
same PIDs forever:

  FAILED ::TestCmdUpdateLaunchdRestart::test_update_restarts_profile_manual_gateways
    AssertionError: Expected 'kill' to not have been called. Called 1 times.
    Calls: [call(12345, <Signals.SIGKILL: 9>)].

  FAILED ::TestCmdUpdateLaunchdRestart::test_update_profile_manual_gateway_falls_back_to_sigterm
    AssertionError: Expected 'kill' to have been called once. Called 2 times.
    Calls: [call(12345, SIGTERM), call(12345, SIGKILL)].

  FAILED ::TestServicePidExclusion::test_update_kills_manual_pid_but_not_service_pid
    assert 2 == 1
      manual_kills = [call(42999, SIGTERM), call(42999, SIGKILL)]

In each test `os.kill` is mocked, so the simulated PID never actually
exits \u2014 the sweep finds it again and escalates. The production code
is correct; the tests just need to model OS behaviour properly.

Two-test fix (profile-manual restart cases): use
`side_effect=[[12345], []]` so the first `find_gateway_pids` call
returns the live PID and the second (the sweep) returns nothing, as if
the OS had reaped the process.

Service-PID-exclusion fix: track which PIDs got killed in a closure
set, and exclude them on subsequent `fake_find` calls. `os.kill`
gets a `side_effect` that records the kill instead of swallowing it
silently. Now the sweep doesn't re-find the manual PID, no SIGKILL
escalation, `manual_kills == 1`.

Validation:

    $ pytest tests/hermes_cli/test_update_gateway_restart.py -q
    43 passed in 4.13s

No production code change. Fixes the three failures observed on `main`
(run 25250051126):

  test_update_restarts_profile_manual_gateways
  test_update_profile_manual_gateway_falls_back_to_sigterm
  test_update_kills_manual_pid_but_not_service_pid

Refs: #17648 (post-update survivor sweep that the tests didn't model).
2026-05-07 04:56:25 -07:00
Gutslabs
7d36e8346b fix(security): close TOCTOU window when saving MCP OAuth credentials
_write_json (the persistence helper used by HermesTokenStorage for both
tokens and client_info) created the temp file via Path.write_text and
only chmod'd it to 0o600 afterward. Between create and chmod the file
existed on disk at the process umask (commonly 0o644 = world-readable),
briefly exposing MCP OAuth access/refresh tokens to other local users.

Use os.open with O_WRONLY|O_CREAT|O_EXCL and an explicit S_IRUSR|S_IWUSR
mode so the file is created atomically at 0o600, plus tighten the parent
dir to 0o700 so siblings can't traverse to the creds file. The temp name
also gains a per-process random suffix to avoid collisions between
concurrent writers and stale leftovers from a crashed prior write.

Mirrors the fix shipped for agent/google_oauth.py in #19673.

Adds a regression test asserting the resulting file mode is 0o600 and
the parent directory is 0o700 (skipped on Windows where POSIX mode bits
aren't enforced).
2026-05-07 04:56:13 -07:00
Sanjay Santhanam
595bcc89fc test(update): patch isatty on real streams to fix xdist-flaky --yes tests
Two CI tests for the new `--yes` update flag (#18261) flaked under
`pytest-xdist` on Linux/Python 3.11 even though they passed every
local run on macOS/Python 3.14.4:

  FAILED tests/hermes_cli/test_update_yes_flag.py
    ::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty
      `AssertionError: assert <MagicMock 'input'>.called is False`
  FAILED tests/hermes_cli/test_update_yes_flag.py
    ::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting
      `AssertionError: assert <MagicMock '_restore_stashed_changes'>.called is False`

Captured stdout for the first failure shows `cmd_update` taking the
"Non-interactive session \u2014 skipping config migration prompt." branch
\u2014 i.e. the `sys.stdin.isatty() and sys.stdout.isatty()` check at
`hermes_cli/main.py:7118` evaluated to `False` despite the test doing:

    with patch("hermes_cli.main.sys") as mock_sys:
        mock_sys.stdin.isatty.return_value = True
        mock_sys.stdout.isatty.return_value = True

The whole-module mock is fragile under xdist worker reuse: a sibling
test that imports `hermes_cli.main` first can leave another `sys`
reference resolved inside the function (re-import in a helper, etc.),
and the wholesale module replacement never gets consulted.

Switch to `patch.object(_sys.stdin, "isatty", return_value=True)` (and
the same for `stdout`). That patches the *attribute on the real stream
object* \u2014 every call site, no matter how it reached `sys.stdin`,
hits the patched method. Same fix applied to the stash-restore test
(it took the "non-TTY \u2192 skip restore prompt" branch for the same reason).

Validation:

    $ pytest tests/hermes_cli/test_update_yes_flag.py -q
    3 passed in 5.47s

No production code change. Fixes the two failures observed on `main`
(run 25250051126):

`tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesConfigMigration::test_no_yes_flag_still_prompts_in_tty`
`tests/hermes_cli/test_update_yes_flag.py::TestUpdateYesStashRestore::test_yes_restores_stash_without_prompting`

Refs: #18261 (added the `--yes` flag + these tests).
2026-05-07 04:54:57 -07:00
Sanjay Santhanam
033e533d05 test(docker): align Dockerfile contract tests with simplified TUI flow
The Dockerfile dropped the manual `@hermes/ink` materialisation gymnastics
in favour of letting npm workspaces resolve the bundled package
naturally. Two contract tests still asserted the older flow:

`test_dockerfile_installs_tui_dependencies` required:
    'ui-tui/packages/hermes-ink/package-lock.json' in dockerfile_text

…but the lockfile is no longer COPIED individually \u2014 the entire
`ui-tui/packages/hermes-ink/` tree is COPIED instead (the workspace
reference from `ui-tui/package.json` is `file:` so npm needs the
real source, not just a manifest stub).

`test_dockerfile_materializes_local_tui_ink_package` required a 7-clause
conjunction matching specific `rm -rf` / `npm install --omit=dev`
`--prefix node_modules/@hermes/ink` / `rm -rf .../react` invocations
that were stripped out when the workspace resolution was simplified.

Update the assertions to pin the *contract* the image actually has to
carry rather than the *exact shell incantations* the old flow used:

* TUI deps install: ui-tui/package.json + ui-tui/package-lock.json +
  ui-tui/packages/hermes-ink/ tree are all COPIED, and an npm
  install/ci step runs in ui-tui.
* Bundled hermes-ink: the workspace package source is COPIED (so
  `await import('@hermes/ink')` resolves at runtime).

This keeps the spirit of #15012 / #16690 (zombie reaping + bundled
workspace materialisation must continue to work) without locking the
Dockerfile into one specific implementation flavour.

Validation:

    $ pytest tests/tools/test_dockerfile_pid1_reaping.py -q
    6 passed in 1.43s

No production code change. Fixes the two failures observed on `main`
(run 25250051126):

`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_installs_tui_dependencies`
`tests/tools/test_dockerfile_pid1_reaping.py::test_dockerfile_materializes_local_tui_ink_package`
2026-05-07 04:53:10 -07:00
mrcoferland
bd0c54d171 fix: route Telegram image documents through photo handling 2026-05-07 04:51:46 -07:00
Teknium
51f9953e69
feat(profiles): --no-skills flag for empty profile creation (#20986)
Adds `hermes profile create <name> --no-skills` to create a profile with
zero bundled skills. Writes a `.no-bundled-skills` marker file in the
profile root so `hermes update`'s all-profile skill sync loop also skips
the profile — without the marker, every update would re-seed skills and
the user would have to delete them again.

Use case (from @hiut1u): orchestrator profiles and narrow-task profiles
don't need 100+ bundled skills polluting their system prompt.

- create_profile() gains a `no_skills` param, mutually exclusive with
  `--clone` / `--clone-all` (cloning explicitly copies skills).
- seed_profile_skills() no-ops on opted-out profiles and returns
  `{skipped_opt_out: True}` so callers can report cleanly.
- Web API (POST /api/profiles) accepts `no_skills: bool`.
- Delete `.no-bundled-skills` to opt back in — next `hermes update`
  re-seeds normally.

6 new tests in TestNoSkillsOptOut cover marker write, mutual exclusion
with clone, seed_profile_skills opt-out, fresh profile unaffected, and
delete-marker-re-enables-seeding.
2026-05-07 04:34:38 -07:00
Teknium
5a3cadf6eb fix(discord): narrow rate-limit catch and move sync state under gateway/
Two follow-ups on top of helix4u's slash-command sync hardening:

- Only suppress exceptions that are actually Discord 429 rate limits
  (discord.RateLimited, HTTPException with status 429, or a clearly
  rate-limit-named duck type). Arbitrary failures that happen to expose
  a retry_after attribute now re-raise to the outer handler instead of
  silently swallowing a cooldown.
- Move the sync-state JSON under $HERMES_HOME/gateway/ so the home root
  stops collecting ad-hoc runtime files.

Added a test verifying unrelated exceptions don't get misclassified as
rate limits.
2026-05-06 18:12:35 -07:00
helix4u
d797755a1c fix(gateway): wait for systemd restart readiness 2026-05-06 18:12:35 -07:00
Austin Pickett
65c762b2e8 fix(tui): preserve session when switching personality
Previously, /personality in the TUI called _reset_session_agent() which
destroyed the agent, cleared conversation history, and effectively started
a new session. This made personality switching disruptive — users lost
their entire conversation context.

Now /personality updates the agent's ephemeral_system_prompt in-place and
injects a pivot marker into the conversation history. The marker tells
the model to adopt the new persona from that point forward, which is
necessary because LLMs tend to pattern-match their prior responses and
continue the established tone without an explicit signal.

Changes:
- tui_gateway/server.py: Rewrite _apply_personality_to_session to update
  the agent in-place instead of resetting. Inject a user-role pivot
  marker so the model actually switches style mid-conversation.
- ui-tui/src/app/slash/commands/session.ts: Update help text (no longer
  mentions history reset).
- tests/test_tui_gateway_server.py: Update test to verify history is
  preserved, pivot marker is injected, and ephemeral prompt is set.
2026-05-06 19:30:46 -04:00
Teknium
3cdbf334d5 fix(gateway): don't dead-end setup wizard when only system-scope unit is installed
The setup wizard dropped non-root users at a bare shell prompt when
trying to start a system-scope gateway service. Previously
_require_root_for_system_service called sys.exit(1), which the
wizard's `except Exception` guards cannot catch (SystemExit is a
BaseException). Users with a pre-existing /etc/systemd/system unit
(e.g. from an earlier `sudo hermes setup` run) hit this whenever
they re-ran `hermes setup` as a regular user.

- Convert _require_root_for_system_service to raise a typed
  SystemScopeRequiresRootError (RuntimeError subclass) instead of
  sys.exit(1). The direct CLI path (`hermes gateway install|start|stop|
  restart|uninstall` without sudo) still exits 1 cleanly via a new
  catch at the top of gateway_command, matching the existing
  UserSystemdUnavailableError pattern.
- Add _system_scope_wizard_would_need_root() pre-check and
  _print_system_scope_remediation() helper. Both setup wizards
  (hermes_cli/setup.py and hermes_cli/gateway.py::gateway_setup) now
  detect the dead-end before prompting and print actionable guidance:
  either `sudo systemctl start <service>` this time, or uninstall the
  system unit and install a per-user one.
- Defense-in-depth: all 5 wizard prompt sites also catch
  SystemScopeRequiresRootError and fall back to the remediation
  helper if the pre-check is bypassed (race, etc.).

Tests: 12 new tests in TestSystemScopeRequiresRootError,
TestSystemScopeWizardPreCheck, TestSystemScopeRemediationOutput, and
TestGatewayCommandCatchesSystemScopeError covering the exception
contract, pre-check matrix (root vs non-root, system-only vs
user-present vs none vs explicit system=True), remediation output
for each action, and the direct-CLI exit-1 path.
2026-05-06 15:58:02 -07:00
brooklyn!
04cf4788cc
fix(tui): restore voice push-to-talk parity (#20897)
* fix(tui): restore classic CLI voice push-to-talk parity

(cherry picked from commit 93b9ae301bb89f5b5e01b4b9f8ac91ffa74fbd9d)

* fix(tui): harden voice push-to-talk stop flow

Address review feedback from PR #16189 by stopping the active recorder before background transcription, documenting single-shot voice capture, and covering the TUI gateway flags with regression tests.

* fix(tui): preserve silent voice strike tracking

Keep single-shot voice recording's no-speech counter alive across starts so the TUI can still emit the three-strikes auto-disable event, and bind the auto-restart state at module scope for type checking.

* fix(tui): clean up voice stop failure path

Address follow-up review by naming the TUI flow as single-shot push-to-talk and cancelling the recorder when forced stop cannot produce a WAV.

* fix(tui): report busy voice capture starts

Return explicit start state from the voice wrapper so the TUI gateway does not report recording while forced-stop transcription is still cleaning up.

* fix(tui): handle busy voice record responses

Apply the gateway busy status immediately in the TUI and route forced-stop voice events to the session that sent the stop request.

* fix(tui): clear voice recording on null response

Treat a null voice.record RPC result as a failed optimistic start so the REC badge cannot stick after gateway-side errors.

* fix(tui): count silent manual voice stops

Preserve single-shot voice no-speech strikes through forced stop transcription so empty push-to-talk captures still trigger the three-strikes guard.

---------

Co-authored-by: Montbra <montbra@gmail.com>
2026-05-06 15:49:59 -07:00
brooklyn!
5044e1cbf1
fix(cli): submit LF enter in thin PTYs (#20896) 2026-05-06 13:51:13 -07:00
Guillaume Meyer
7df6115199 feat(gateway): also gate pre-restart "Gateway restarting" notification
Extend the gateway_restart_notification flag to cover
_notify_active_sessions_of_shutdown — the message that fires just
before drain ("⚠️ Gateway restarting — Your current task will be
interrupted. Send any message after restart and I'll try to resume
where you left off.") sent to active sessions and home channels.

Same operator/end-user reasoning: on a Slack workspace shared with
end users, "Gateway restarting" reads as "the bot is broken" — the
operator should be able to suppress it consistently with the other
two lifecycle pings rather than having a partial opt-out.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 13:39:43 -07:00
Guillaume Meyer
b71f80e6ce feat(gateway): per-platform gateway_restart_notification flag
Adds an opt-out toggle on PlatformConfig that gates both restart
lifecycle pings: the "♻ Gateway restarted" message sent to the chat
that issued /restart, and the "♻️ Gateway online" home-channel
startup notification. Defaults to True so existing deployments are
unaffected.

The motivating split is operator vs. end-user surfaces: a back-channel
like Telegram should keep these pings, while a Slack workspace shared
with end users should not surface gateway lifecycle noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 13:39:43 -07:00
Teknium
33bf5f6292 fix(auth): fall back to global-root auth.json for providers missing in profile
Profile processes (kanban workers, cron subprocesses, delegated subagents)
read the profile's auth.json only. If a provider was authenticated at the
global root but not inside the profile, the profile's credential_pool
comes back empty and the process fails with 'No LLM provider configured'
— even though the credentials are sitting in ~/.hermes/auth.json. #18594
propagated HERMES_HOME correctly, which is what surfaced this: workers
now land in the right profile, and the profile turns out to shadow global
with no fallback.

Semantics (read-only, per-provider shadowing):
* Profile has any entries for provider X → use profile only (global ignored).
* Profile has zero entries for provider X → fall back to global.
* Writes (write_credential_pool, _save_auth_store) still target the profile.
* Classic mode (HERMES_HOME == global root) skips the fallback entirely —
  _global_auth_file_path() returns None.

Also mirrors the fallback in get_provider_auth_state so OAuth singletons
(nous, minimax-oauth, openai-codex, spotify) inherit cleanly — the Nous
shared-token store (PR #19712) remains the authoritative path for Nous
OAuth rotation, this just makes the read side consistent with it.

Seat belt: _load_global_auth_store() refuses to read the real user's
~/.hermes/auth.json under PYTEST_CURRENT_TEST even when HERMES_HOME points
to a profile-shaped path. Guard uses $HOME (stable across fixtures) rather
than Path.home() (which fixtures often monkeypatch to a tmp root).

Reported by @SeedsForbidden on Twitter as the credential_pool shadowing
follow-up to the #18594 fix.
2026-05-06 13:29:54 -07:00
kshitijk4poor
a2ff193050 chore: follow-up cleanup for Kanban migration fix
- Expand migration comment to name the primary failure mode (missing
  column OperationalError from #20842) ahead of the secondary SQLite
  schema-reparse concern; also document the stale-cols-snapshot invariant
- Add clarifying comments on from_row() legacy fallback branches noting
  they are belt-and-suspenders dead code post-migration
- Add task_events comment in existing test explaining why the table is
  required by the migrator
- Add test_legacy_migration_no_legacy_columns_at_all: Scenario A —
  explicitly asserts the exact #20842 crash no longer occurs and that
  consecutive_failures defaults to 0 on a DB that never had spawn_failures
- Add test_legacy_migration_both_columns_already_present: Scenario D —
  asserts the migration is a no-op when both columns already exist,
  preserving the existing counter value
2026-05-06 11:25:16 -07:00
helix4u
b1d420e75f fix(kanban): avoid fragile failure-column renames 2026-05-06 11:25:16 -07:00
Yuqian
441ef75d15 fix(feishu): keep topic replies in threads
Route Feishu topic progress, status, approval, stream, and fallback messages through threaded replies by preserving the originating message id as the reply target. Add regressions for tool progress topic metadata and Feishu metadata-driven reply routing.
2026-05-06 10:52:51 -07:00
kshitij
5c906d7026
feat(web): add SearXNG as a native search-only backend
Adds SearXNG as a free, self-hosted web search provider.  SearXNG is a
privacy-respecting metasearch engine that requires no API key — just a
running instance and SEARXNG_URL pointing at it.

## What this adds

- `tools/web_providers/searxng.py` — `SearXNGSearchProvider` implementing
  `WebSearchProvider` (search only; no extract capability)
- `_is_backend_available("searxng")` — gates on SEARXNG_URL
- `_get_backend()` — accepts "searxng" as a configured value; adds it to
  auto-detect candidates (lower priority than paid services)
- `web_search_tool` — dispatches to SearXNG when it is the active backend
- `check_web_api_key()` — includes SearXNG in availability check
- `OPTIONAL_ENV_VARS["SEARXNG_URL"]` — registered with tools=["web_search"]
- `tools_config.py` — SearXNG appears in the `hermes tools` provider picker
- `nous_subscription.py` — `direct_searxng` detection, web_active / web_available
- `setup.py` — SEARXNG_URL listed in the missing-credential hint
- 23 tests covering: is_configured, happy-path search, score sorting, limit,
  HTTP/request errors, _is_backend_available, _get_backend, check_web_api_key

## Config

```yaml
# Use SearXNG for search, any paid provider for extract
web:
  search_backend: "searxng"
  extract_backend: "firecrawl"

# Or: SearXNG as the sole backend (web_extract will use the next available)
web:
  backend: "searxng"
```

SearXNG is search-only — it does not implement WebExtractProvider.  Users
who only configure SEARXNG_URL get web_search available; web_extract falls
back to the next available extract provider (or is unavailable if none).

Closes #19198 (Phase 2 Task 4 — SearXNG provider)
Ref: #11562 (original SearXNG PR)
2026-05-06 10:05:29 -07:00
kshitij
cd2cbc73b7
refactor(web): per-capability backend selection for search/extract split
Introduce the foundation for independently selecting web search and
extract backends — enabling future combinations like SearXNG for
search + Firecrawl for extract.

Architecture:
- tools/web_providers/base.py: WebSearchProvider and WebExtractProvider
  ABCs with normalized result contracts (mirrors CloudBrowserProvider)
- tools/web_tools.py: _get_search_backend() and _get_extract_backend()
  read per-capability config keys, fall through to shared web.backend
- hermes_cli/config.py: web.search_backend and web.extract_backend in
  DEFAULT_CONFIG (empty = inherit from web.backend)

Behavioral change:
- web_search_tool() now dispatches via _get_search_backend()
- web_extract_tool() now dispatches via _get_extract_backend()
- When per-capability keys are empty (default), behavior is identical
  to before — _get_search_backend() falls through to _get_backend()

This is purely structural — no new backends are added. SearXNG and
other search-only/extract-only providers can now be added as simple
drop-in modules in follow-up PRs.

12 new tests, 49 existing tests pass with zero regressions.

Ref: #19198
2026-05-06 09:16:25 -07:00
Teknium
a24789d738
fix(opencode-go): keep users on opencode-go instead of hijacking to native providers (#20802)
OpenCode Go and OpenCode Zen are flat-namespace model resellers — their
/v1/models returns bare IDs (deepseek-v4-flash, minimax-m2.7), and the
inference API rejects vendor-prefixed names with HTTP 401 'Model not
supported'. Two bugs fixed:

1. `switch_model` in hermes_cli/model_switch.py was silently switching the
   user off opencode-go to native deepseek when they typed
   `/model deepseek-v4-flash`. Step d found the model in opencode-go's live
   catalog, but step e (detect_provider_for_model) still ran and matched
   the bare name against deepseek's static catalog. Fix: track whether
   the live catalog resolved it; skip step e when it did.

2. `normalize_model_for_provider` in hermes_cli/model_normalize.py only
   stripped the exact `opencode-zen/` prefix, leaving arbitrary vendor
   prefixes like `minimax/minimax-m2.7` (commonly copied from aggregator
   slugs into fallback_model configs) intact — causing HTTP 401s when
   the fallback chain activated. Fix: opencode-go/opencode-zen strip ANY
   leading vendor prefix because their APIs are flat-namespace.

Tests: 11 new cases in tests/hermes_cli/test_opencode_go_flat_namespace.py
covering both normalization (prefix stripping, regression guards for
opencode-zen Claude hyphenation and openrouter vendor-prepending) and
switch_model (bare-name resolution on opencode-go's live catalog must
not trigger cross-provider hijack).

Reported by @Ufonik via Discord; Kimi K2.6 always worked because moonshotai
has no overlapping entry in a native provider's static catalog. Deepseek
and minimax failed because their v4/v2.7 names existed in the native
deepseek/minimax catalogs.
2026-05-06 09:08:33 -07:00
Cleo
906881c38b fix(cli): catch OSError in _resolve_attachment_path to prevent ENAMETOOLONG dropping long slash commands
When the user pastes a long slash command like \`/goal <long prose>\` into
\`hermes chat\`, the input flows into \`_detect_file_drop()\`, whose
\`starts_like_path\` prefilter accepts anything starting with \`/\` and
forwards it to \`_resolve_attachment_path()\`. That helper calls
\`Path.exists()\` which invokes \`os.stat()\`, which raises
\`OSError(errno=ENAMETOOLONG)\` — 63 on macOS, 36 on Linux — when the
candidate exceeds NAME_MAX (typically 255 bytes).

The OSError propagates up to the broad \`except Exception\` in
\`process_loop\` (cli.py:11798), gets logged at WARNING level, and the
user's input is silently dropped. From the user's POV the chat prompt
hangs — the only signal is in agent.log:

  WARNING cli: process_loop unhandled error (msg may be lost):
    [Errno 63] File name too long: "/goal Drive the space board..."

This affects any slash command with prose-length arguments — \`/goal\`
in particular but also \`/skill\`, \`/cron\`, custom user commands.

Fix: wrap the \`exists()\`/\`is_file()\` calls in try/except OSError so
structurally-invalid path candidates cleanly return None. The slash-
command dispatch path downstream (cli.py:11718) then handles the
input correctly.

Tests: two new regression cases in test_cli_file_drop.py cover the
original \`/goal\` reproducer and a synthetic long path. All 35 file-
drop tests pass.

Reproducer (without the fix):
  python -c "from cli import _detect_file_drop;
             _detect_file_drop('/goal ' + 'a'*300)"
  → OSError: [Errno 63] File name too long
2026-05-06 06:34:48 -07:00
Teknium
a0fedfbb1b
feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709)
Replaces the per-directory shadow-repo design with a single shared shadow
git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated
across every working directory the agent has ever touched; a dozen
worktrees of the same project cost near-zero in additional disk.

Why
---
Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/
grow to multi-GB on active machines:

1. Each working directory got its own full shadow git repo — no object
   dedup across projects or across worktrees of the same project.
2. _prune() was a documented no-op: max_snapshots only limited the
   /rollback listing. Loose objects accumulated forever.
3. Defaults: enabled=True, auto_prune=False — users paid the disk cost
   without ever asking for /rollback.

Field report on a single workstation: 847 MB across 47 shadow repos,
mostly redundant clones of the hermes-agent source tree.

Changes
-------
- tools/checkpoint_manager.py: full rewrite. Single bare store, per-project
  refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>),
  per-project metadata (store/projects/<hash>.json with workdir +
  created_at + last_touch). On first v2 init, any pre-v2 per-directory
  shadow repos are auto-migrated into legacy-<timestamp>/ so the new
  store starts clean. _prune() now actually rewrites the per-project ref
  to the last max_snapshots commits and runs git gc --prune=now. New
  _enforce_size_cap() drops oldest commits round-robin across projects
  when the store exceeds max_total_size_mb. _drop_oversize_from_index()
  filters any single file larger than max_file_size_mb out of the snapshot.
- hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI
  (status / list / prune / clear / clear-legacy) for managing the store
  outside a session.
- hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20,
  auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10.
  Tightened DEFAULT_EXCLUDES (added target/, *.so/*.dylib/*.dll,
  *.mp4/*.mov, *.zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.).
- run_agent.py / cli.py / gateway/run.py: thread the new kwargs through
  AIAgent and the startup auto_prune hooks.
- Tests rewritten to match v2 storage while keeping backwards-compat
  coverage for the pre-v2 prune path (per-directory shadow repos under
  base/ are still swept correctly for anyone mid-migration).
- Docs updated: user-guide/checkpoints-and-rollback.md explains the
  shared store, new defaults, migration, and the new CLI;
  reference/cli-commands.md documents 'hermes checkpoints'.

E2E validated
-------------
- Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/.
- Object dedup: two projects with an identical shared.py blob resolve to
  7 total objects in the store (v1 would have stored the blob twice).
- max_snapshots=3 actually enforced: after 6 commits, list shows 3.
- Orphan prune: deleting a project's workdir + 'hermes checkpoints prune
  --retention-days 0' removes its ref, index, and metadata; GC reclaims
  the objects.
- max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the
  tracked source code files.
- hermes checkpoints {status,prune,clear,clear-legacy} all work from the
  CLI without an agent running.

Breaking / migration
--------------------
No in-place data migration — legacy per-directory shadow repos are moved
into legacy-<timestamp>/ on first run. Old /rollback history is still
accessible by inspecting the archive with git; run
'hermes checkpoints clear-legacy' to reclaim the space when ready. Users
relying on /rollback must now set checkpoints.enabled=true (or pass
--checkpoints) explicitly.
2026-05-06 05:44:35 -07:00
helix4u
76074d9ee6 fix(cli): recover classic CLI output after resize 2026-05-06 04:20:54 -07:00
adybag14-cyber
e45df2e81e fix(ui): reduce status-line jitter while scrolling 2026-05-06 04:02:09 -07:00
adybag14-cyber
043a118d41 fix: harden install.sh against inherited Python env leakage 2026-05-06 04:02:02 -07:00
Teknium
e70e49016f
fix(cli): guard logger.debug in signal handler (#13710 regression) (#20673)
CPython's logging module is not reentrant-safe.  `Logger.isEnabledFor`
caches level results in `Logger._cache`; under shutdown races the cache
can be cleared (`Logger._clear_cache`, triggered by logging config changes
from another thread) or mid-mutation when a signal fires, raising
`KeyError: <level_int>` (e.g. `KeyError: 10` for DEBUG) inside the signal
handler.

When that happens, the KeyError escapes before the `raise KeyboardInterrupt()`
on the next line can fire, which bypasses prompt_toolkit's normal interrupt
unwind and surfaces as the EIO cascade originally reported in #13710.

Issue #13710 shipped two defenses (asyncio exception handler + outer
`except (KeyError, OSError)` with EIO suppression) that cover the EIO
unwind path.  This patch closes the remaining escape hatch: the
`logger.debug` call at the top of `_signal_handler` itself.  Wrap it in a
bare `try/except Exception: pass` so logging can never raise through a
signal handler.

Observed in the wild: debug report on 0.12.0 (commit 8163d371) shows the
exact stack — KeyError: 10 at logging/__init__.py:1742 inside the
signal handler's `logger.debug`, followed by the EIO cascade from
prompt_toolkit's emergency flush.

Tests: adds `TestSignalHandlerLoggingRace` to
`tests/hermes_cli/test_suppress_eio_on_interrupt.py` with 6 new cases:
- normal path still raises KeyboardInterrupt
- KeyError(10) from logger.debug does not escape
- any Exception from logger.debug is swallowed
- agent.interrupt still fires when logger.debug raises
- agent.interrupt raising also does not escape
- BaseException (SystemExit) is NOT swallowed — guard uses `except Exception`
  deliberately so real shutdown signals still propagate

Closes #13710 regression.
2026-05-06 03:55:47 -07:00
helix4u
466f3a11de fix(gateway): preserve model picker current context 2026-05-06 03:50:59 -07:00
Kshitij Kapoor
629d8b843d fix(browser): tighten Lightpanda fallback edge cases 2026-05-06 03:41:21 -07:00
Kshitij Kapoor
3ebdd26449 fix(browser): surface Lightpanda Chrome fallback warnings 2026-05-06 03:23:19 -07:00
kshitijk4poor
395dbcc873 feat(browser): add Lightpanda engine support with automatic Chrome fallback
Add Lightpanda as an optional browser engine for local mode.
Lightpanda is a headless browser built from scratch in Zig -- faster
navigation than Chrome with significantly less memory.

One config line to enable:
  browser:
    engine: lightpanda

New functions in browser_tool.py:
- _get_browser_engine() -- config/env reader with validation + caching
- _should_inject_engine() -- only inject in local non-cloud mode
- _needs_lightpanda_fallback() -- detect empty/failed LP results
- _chrome_fallback_screenshot() -- temporary Chrome session for screenshots
- Engine injection in _run_browser_command (--engine flag)
- browser_vision pre-routes screenshots to Chrome when engine=lightpanda

Config:
- browser.engine in DEFAULT_CONFIG (auto/lightpanda/chrome)
- AGENT_BROWSER_ENGINE in OPTIONAL_ENV_VARS
- /browser status shows engine info in local mode

Rebased from PR #7144 onto current main. All existing code preserved --
pure additions only (+520/-2).

25 new tests + 81 total browser tests pass (0 failures).
2026-05-06 03:23:19 -07:00
etherman-os
39f451f5ad fix: add Turkish locale references in config, tests, and docs
- hermes_cli/config.py: add tr to supported languages comment
- locales/en.yaml: add tr to locale file list comment
- tests/agent/test_i18n.py: add Turkish alias tests + explicit lang test
- website/docs/user-guide/configuration.md: add tr to supported values
2026-05-05 17:29:12 -07:00
LeonSGP43
a49670c21b fix(kanban): wire dependency selects 2026-05-05 17:26:15 -07:00
Brecht-H
3f97297413 feat(kanban): surface task_runs.summary on dashboard cards + `kanban show`
The kanban-worker skill (built into the gateway dispatcher's spawn
prompt) instructs every worker to hand off via
``kanban_complete(summary=..., metadata=...)``. That writes the summary
onto the closing ``task_runs`` row, NOT onto ``tasks.result`` — the
latter is left NULL unless the caller passes ``result=`` explicitly.

Result: a glance at the dashboard or ``hermes kanban show <id>`` shows
a blank "Result:" section even when the worker did real work, which
on 2026-05-05 caused a Mac false-alarm ("Hermes did nothing") on a
task that had a 10-line completion summary on its run.

This patch surfaces the latest non-null run summary as
``latest_summary`` so the worker's actual handoff lands in front of
operators.

* New helpers ``kanban_db.latest_summary(conn, task_id)`` and
  ``kanban_db.latest_summaries(conn, task_ids)``. The batch variant
  uses a single window-function SELECT so the dashboard board endpoint
  doesn't pay an N+1 cost on multi-hundred-task boards.
* CLI ``hermes kanban show <id>`` prints a "Latest summary:" block
  when ``tasks.result`` is empty but a run has produced a summary
  (the existing "Result:" section still wins when populated, so the
  back-compat path for hand-edited results is untouched). JSON output
  gains a top-level ``latest_summary`` field.
* Dashboard ``/board`` and ``/tasks/{id}`` now include a
  ``latest_summary`` field on every task. Cards on /board carry a
  200-character preview (cheap to render, plenty for "what did this
  worker do?" at a glance); the drawer/detail endpoint returns the
  full summary.
* Five new tests cover: empty-runs case, post-complete surface,
  newest-of-multiple selection, empty-string skip, batch with
  missing tasks + empty input.

Smoke-tested locally against the live profile DB on the three
acceptance-criterion targets (t_f08fef91 cron-hygiene-audit,
t_007b7f1c EMA-analysis, t_05746fa4 self-assessment) — all three now
return their populated summaries via both ``latest_summary`` and
``latest_summaries``.

Test plan: 255/255 kanban tests pass + 91/91 dashboard plugin tests
pass. No regression on tasks where ``tasks.result`` is explicitly
populated (the existing "Result:" branch is preserved).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 17:26:15 -07:00
daixin1204
d2c6eceed9 fix(kanban): prevent child task dispatch when parent is not done
Add parent dependency guard to _set_status_direct so dragging
a task to the ready column is rejected (409) when its parents
are not all done. Previously the guard only existed in
recompute_ready, allowing direct status writes via the
dashboard API to bypass the dependency engine.

Root cause: after reclaiming stale workers, both T3 and T4
were set to ready via dashboard status writes in quick
succession, causing the writer to be spawned while the analyst
was blocked — upstream work wasn't done yet.
2026-05-05 17:26:15 -07:00
Teknium
8a1a42d098 test(kanban): backdate task_runs.started_at alongside tasks.started_at
After #19473 landed (enforce_max_runtime reads from task_runs.started_at
rather than tasks.started_at), a regression test added earlier still
only backdated the tasks column. Backdate both so the test is robust
regardless of which column the enforcer reads from.
2026-05-05 17:26:15 -07:00
澪 / Mio
b28ab4fc3f fix(kanban): measure max runtime from current run 2026-05-05 17:26:15 -07:00
LeonSGP43
6d302b340e fix(kanban): accept created_cards linked as child of completing task
Widens _verify_created_cards to also accept ids that are children of the
completing task in task_links. Previously we only accepted cards where
created_by matched the completing task's assignee, which was too strict
for legitimate orchestrator flows: a specifier creates a card (so
created_by=specifier, not worker), then a worker picks it up and passes
parents=[current_task] to kanban_create. The explicit link proves the
relationship and should be trusted.

Salvaged from #20022 @LeonSGP43 (full PR superseded by #20232 +
this patch; the linked-children relaxation was the portable
improvement).
2026-05-05 17:26:15 -07:00
suncokret12
eda326df16 fix(doctor): report Kanban worker tools as runtime-gated 2026-05-05 17:26:15 -07:00
Teknium
f0b95cc93d test(arcee): cover Trinity Large Thinking temperature + compression overrides
Salvage follow-up for PR #20344:
- AUTHOR_MAP entry for rob-maron (required by CI)
- 17 parametrized tests covering _is_arcee_trinity_thinking,
  _fixed_temperature_for_model Trinity override, and
  _compression_threshold_for_model, including sibling-model negatives
  (trinity-large-preview, trinity-mini) and the OpenRouter slug form.
2026-05-05 17:23:45 -07:00
Oleksii Lisikh
c4b287ba53 feat(i18n): add Ukrainian locale 2026-05-05 17:21:59 -07:00
Nicolò Boschi
3082fa0829 feat(hindsight): probe API for update_mode='append' support, dedupe across processes
Mirrors the pattern already shipping in hindsight-integrations/openclaw:
probe `<api_url>/version` once per process, gate on Hindsight ≥ 0.5.0.
When supported, retains use a stable session-scoped `document_id`
(`session_id`) plus `update_mode='append'` so cross-process retains for
the same session merge into one document instead of producing
N-different-process-stamped duplicates. When unsupported (or probe
fails), fall back to the existing per-process unique
`f"{session_id}-{start_ts}"` document_id with no `update_mode` — the
resume-overwrite fix (#6654) keeps working unchanged on legacy servers.

Closes the dedup half of #20115. The proposed `document_id_strategy`
config knob isn't needed: auto-detection via the same /version probe
the OpenClaw plugin already uses gives the same outcome with no extra
config burden, and the choice is purely a function of what the server
can do.

Plumbing
--------
- Module-level helpers (`_meets_minimum_version`, `_fetch_hindsight_api_version`,
  `_check_api_supports_update_mode_append`) cache the result per api_url
  so every provider in the process gets one /version round-trip.
- One-time WARN logged when the API is older than 0.5.0, telling the
  user to upgrade for cross-session deduplication.
- New instance helper `_resolve_retain_target(fallback_doc_id)` returns
  `(document_id, update_mode)` based on cached capability. Wired into
  `sync_turn` and the `on_session_switch` flush path.
- For local_embedded mode, the probe URL is taken from the running
  client (`client.url`) so we hit the actual daemon port rather than
  the configured default.
- `update_mode` is set on the per-item dict; `aretain_batch` already
  threads `item['update_mode']` into the API call.

Tests
-----
- `TestUpdateModeAppendCapability` (5 cases): legacy fallback, modern
  stable+append, per-url cache, one-time warn, flush-on-switch resolves
  against the OLD session.
- Existing `_make_hindsight_provider` factory in the manager-side test
  file extended to seed `_mode`/`_api_url`/`_api_key`/`_client` and stub
  `_resolve_retain_target` so the bypass-init pattern keeps working.

E2E verified against installed `~/.hermes/hermes-agent`:
- Legacy probe (unreachable host) → `legacy-session-<ts>` doc_id,
  no `update_mode`.
- Modern probe (live local_embedded 0.5.6 daemon) → stable
  `modern-session` doc_id + `update_mode='append'`.
- `test_hermes_embedded_smoke.py` passes (90s).
2026-05-05 15:09:59 -07:00
misery-hl
56b4795115 guard kanban worker lifecycle by run id 2026-05-05 15:09:28 -07:00
0xVox
0b9cbc8b23 test(kanban): cover metadata handoff round-trip 2026-05-05 15:09:28 -07:00
Teknium
1fc8733a69
fix(kanban): unify failure counter across spawn/timeout/crash outcomes (#20410)
The dispatcher's circuit breaker only protected against spawn-side
failures (profile missing, workspace mount error, exec failure).
Workers that successfully spawned but then timed out or crashed
re-queued to ``ready`` with no counter increment, so the next tick
re-spawned them — loops forever until someone noticed. Reported
externally on Twitter (Forbidden Seeds) and confirmed by walking the
kernel: ``enforce_max_runtime`` flipped the task back to ready, emitted
a ``timed_out`` event, and never touched ``spawn_failures``; same for
``detect_crashed_workers``.

Fix: unify the counter across all non-success outcomes.

Schema
------
* ``tasks.spawn_failures`` → ``tasks.consecutive_failures``
* ``tasks.last_spawn_error`` → ``tasks.last_failure_error``
* Migration renames the columns in-place on existing DBs (``ALTER
  TABLE RENAME COLUMN`` — SQLite >= 3.25) so historical counter
  values are preserved. Row mappers fall through to the legacy names
  if both column renames and a migration somehow got out of sync.

Counter lifecycle
-----------------
New helper ``_record_task_failure(conn, task_id, error, *, outcome,
release_claim, end_run, event_payload_extra)`` is the single point
every non-success outcome funnels through:

* ``spawn_failed``  → ``_record_spawn_failure`` (kept as alias)
  calls it with ``release_claim=True, end_run=True`` — transitions
  running→ready, clears claim, closes run.
* ``timed_out`` → ``enforce_max_runtime`` already does the status
  transition + run close + event emission, then calls
  ``_record_task_failure`` with ``release_claim=False, end_run=False``
  just to bump the counter (and trip the breaker if needed).
* ``crashed`` → ``detect_crashed_workers`` same pattern, but the
  counter increment runs after the main write_txn closes (SQLite
  doesn't nest write transactions).

If the counter hits the breaker threshold (``DEFAULT_FAILURE_LIMIT=5``,
same as before), the task transitions to ``blocked`` with a ``gave_up``
event on top of whatever outcome-specific event was already emitted.

Reset semantics changed: the counter now clears only on successful
``complete_task`` (and operator ``reclaim_task`` — an explicit "I've
looked at this, try again with a fresh budget"). Previously
``_clear_spawn_failures`` ran on every successful spawn, which would
have wiped the counter before a timeout could accumulate past threshold
— exactly the loop this fix prevents.

Diagnostics
-----------
* ``_rule_repeated_spawn_failures`` → ``_rule_repeated_failures``. Now
  fires regardless of which outcome is at fault. Classifies the most
  recent failure (spawn_failed / timed_out / crashed) from the run
  history so the title ("Agent timeout x3", "Agent crash x4", "Agent
  spawn x5") and suggested action (``doctor`` for spawn, ``log`` for
  timeout/crash) stay outcome-specific without N duplicate rules.
* ``_rule_repeated_crashes`` kept as a narrower early-warning at
  threshold 2 (vs 3 for the unified rule), but now suppresses itself
  when the unified rule would also fire — avoids double-flagging.
* Diagnostic ``data`` payload now carries
  ``{consecutive_failures, most_recent_outcome, last_error}`` instead
  of spawn-specific keys.

CLI
---
* ``Task.consecutive_failures`` / ``Task.last_failure_error`` are the
  public fields now. Existing callers that referenced the old names
  get migrated (tests updated in this commit).
* Backward-compat: ``DEFAULT_SPAWN_FAILURE_LIMIT``,
  ``_clear_spawn_failures``, ``_record_spawn_failure`` stay as aliases.

Tests
-----
* 6 new kernel tests: timeout increments counter, 3 consecutive
  timeouts trip the breaker (was the reported gap), crash increments
  counter, reclaim clears counter, completion clears counter, spawn
  success does NOT clear counter.
* Diagnostic tests: updated ``repeated_spawn_failures`` cases to use
  the new kind name and add a timeout-loop test.
* Dashboard API test: spawn_failures column update → consecutive_failures.

389/389 kanban-suite tests pass.

Live verification
-----------------
Seeded 4 tasks in an isolated HERMES_HOME: 3 timeouts, 4 crashes,
2-spawn-failed + 2-timed-out, and a task that had prior failures but
completed successfully. Board correctly shows "!! 3 tasks need
attention" (the successful one has no badge because the counter
reset). Drawer for the timeout-loop task renders "Agent timeout x3"
with most_recent_outcome=timed_out and the "Check logs" suggested
action (not the spawn-flavoured "Verify profile"). The successful
task has zero diagnostics.

Closes the Forbidden-Seeds-reported gap.
2026-05-05 13:55:37 -07:00
LeonSGP43
80c579a9dd docs(skills): explain restoring bundled skills 2026-05-05 13:46:20 -07:00
brooklyn!
794f48766c
fix(tui): close slash parity gaps with CLI (#20339)
* fix(tui): close slash parity gaps with CLI

Route unsupported /skills subcommands through slash.exec, support /new <name>
titles, and handle /redraw natively so TUI behavior matches classic CLI. Also
filter gateway-only commands out of the TUI catalog while keeping /status
discoverable.

* fix(tui): run remaining CLI parity paths natively

Forward chat launch flags into the TUI runtime and handle live-session status
and skill reloads in the gateway process so TUI state no longer depends on the
slash worker's stale CLI instance.

* fix(tui): block stale snapshot restores

Prevent snapshot restore from running through the isolated slash worker because
it mutates disk state without refreshing the live TUI agent.

* chore: uptick

* fix(tui): guard async session title updates

Handle failures from the fire-and-forget session.title RPC so title-setting errors do not surface as unhandled promise rejections while preserving session-scoped messaging.
2026-05-05 15:42:39 -05:00
Teknium
9022804d78 feat(providers): make all 33 providers pluggable under plugins/model-providers/
Every provider profile is now a self-contained plugin under
plugins/model-providers/<name>/, mirroring the plugins/platforms/
pattern established for IRC and Teams. The ProviderProfile ABC
stays in providers/; the per-provider profile data moves out.

- plugins/model-providers/<name>/__init__.py calls register_provider()
- plugins/model-providers/<name>/plugin.yaml declares kind: model-provider
- providers/__init__.py._discover_providers() lazily scans bundled plugins
  then $HERMES_HOME/plugins/model-providers/<name>/ (user override path)
- User plugins with the same name override bundled ones (last-writer-wins
  in register_provider)
- Legacy providers/<name>.py layout still supported for back-compat with
  out-of-tree editable installs
- Hermes PluginManager: new kind=model-provider; skipped like memory
  plugins (providers/ discovery owns them); standalone plugins with
  register_provider+ProviderProfile in their __init__.py auto-coerce to
  this kind (same heuristic as memory providers)
- skip_names extended to include 'model-providers' so the general
  PluginManager doesn't double-scan the category
- 4 new tests in tests/providers/test_plugin_discovery.py covering
  bundled discovery, user override, and general-loader isolation
- Docs updated: website/docs/developer-guide/adding-providers.md,
  provider-runtime.md, providers/README.md, plugins/model-providers/README.md

No API break: auth.py / config.py / doctor.py / models.py / runtime_provider.py /
model_metadata.py / auxiliary_client.py / chat_completions.py / run_agent.py
all still consume providers via get_provider_profile() / list_providers() —
they just now see plugin-discovered entries instead of pkgutil-iterated ones.

Third parties can now drop a single directory into
~/.hermes/plugins/model-providers/<name>/ to add or override an inference
provider without touching the repo.
2026-05-05 13:40:01 -07:00
kshitijk4poor
20a4f79ed1 feat: provider modules — ProviderProfile ABC, 33 providers, fetch_models, transport single-path
Introduces providers/ package — single source of truth for every
inference provider. Adding a simple api-key provider now requires one
providers/<name>.py file with zero edits anywhere else.

What this PR ships:
- providers/ package (ProviderProfile ABC + 33 profiles across 4 api_modes)
- ProviderProfile declarative fields: name, api_mode, aliases, display_name,
  env_vars, base_url, models_url, auth_type, fallback_models, hostname,
  default_headers, fixed_temperature, default_max_tokens, default_aux_model
- 4 overridable hooks: prepare_messages, build_extra_body,
  build_api_kwargs_extras, fetch_models
- chat_completions.build_kwargs: profile path via _build_kwargs_from_profile,
  legacy flag path retained for lmstudio/tencent-tokenhub (which have
  session-aware reasoning probing that doesn't map cleanly to hooks yet)
- run_agent.py: profile path for all registered providers; legacy path
  variable scoping fixed (all flags defined before branching)
- Auto-wires: auth.PROVIDER_REGISTRY, models.CANONICAL_PROVIDERS,
  doctor health checks, config.OPTIONAL_ENV_VARS, model_metadata._URL_TO_PROVIDER
- GeminiProfile: thinking_config translation (native + openai-compat nested)
- New tests/providers/ (79 tests covering profile declarations, transport
  parity, hook overrides, e2e kwargs assembly)

Deltas vs original PR (salvaged onto current main):
- Added profiles: alibaba-coding-plan, azure-foundry, minimax-oauth
  (were added to main since original PR)
- Skipped profiles: lmstudio, tencent-tokenhub stay on legacy path (their
  reasoning_effort probing has no clean hook equivalent yet)
- Removed lmstudio alias from custom profile (it's a separate provider now)
- Skipped openrouter/custom from PROVIDER_REGISTRY auto-extension
  (resolve_provider special-cases them; adding breaks runtime resolution)
- runtime_provider: profile.api_mode only as fallback when URL detection
  finds nothing (was breaking minimax /v1 override)
- Preserved main's legacy-path improvements: deepseek reasoning_content
  preserve, gemini Gemma skip, OpenRouter response caching, Anthropic 1M
  beta recovery, etc.
- Kept agent/copilot_acp_client.py in place (rejected PR's relocation —
  main has 7 fixes landed since; relocation would revert them)
- _API_KEY_PROVIDER_AUX_MODELS alias kept for backward compat with existing
  test imports

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #14418
2026-05-05 13:40:01 -07:00
Teknium
f67063ba81
feat(kanban): generic diagnostics engine for task distress signals (#20332)
* feat(kanban): generic diagnostics engine for task distress signals

Replaces the hallucination-specific ``warnings`` / ``RecoverySection``
surface (shipped in PR #20232) with a reusable diagnostic-rule engine
that covers five distress kinds in v1 and can be extended without
touching UI code. The "something's wrong with this task" signal is
no longer limited to phantom card ids.

Closes the follow-up from #20232 discussion.

New module
----------
``hermes_cli/kanban_diagnostics.py`` — stateless, no-side-effect rule
engine. Each rule is a pure function of
``(task, events, runs, now, config) -> list[Diagnostic]``. Registry
is a simple list; adding a new distress kind is one function + one
import, no UI or API changes required.

v1 rule set
-----------
* ``hallucinated_cards`` (error) — folds the existing
  ``completion_blocked_hallucination`` event into the new surface.
* ``prose_phantom_refs`` (warning) — folds
  ``suspected_hallucinated_references``.
* ``repeated_spawn_failures`` (error → critical at 2x threshold) —
  fires when ``tasks.spawn_failures >= 3``; suggests
  ``hermes -p <profile> doctor`` / ``auth``.
* ``repeated_crashes`` (error → critical) — fires after N consecutive
  ``crashed`` run outcomes with no successful completion between;
  suggests ``hermes kanban log <id>``.
* ``stuck_in_blocked`` (warning) — fires after 24h in ``blocked``
  state with no comments / unblock attempts; suggests commenting.

Every diagnostic carries structured ``actions`` (reclaim, reassign,
unblock, cli_hint, comment, open_docs) that render consistently in
both CLI and dashboard. Suggested actions are highlighted; generic
recovery actions (reclaim / reassign) are available on every kind as
fallbacks.

Diagnostics auto-clear when the underlying failure resolves — a
clean ``completed``/``edited`` event drops hallucination diagnostics,
a successful run drops crash diagnostics, a comment drops
stuck-blocked diagnostics. Audit events persist; the badge goes away.

API
---
``plugin_api.py``:
* ``/board`` now attaches ``diagnostics`` (full list) and
  ``warnings`` (compact summary with ``highest_severity``) per task.
* ``/tasks/{id}`` attaches diagnostics so the drawer's Diagnostics
  section auto-opens on flagged tasks.
* NEW ``/diagnostics`` endpoint — fleet-wide listing, filterable by
  severity, sorted critical-first.

CLI
---
* NEW ``hermes kanban diagnostics [--severity X] [--task id]
  [--json]`` — fleet view or single-task view, matches dashboard rule
  output so CLI users see the same picture.
* ``hermes kanban show <id>`` now renders a Diagnostics section near
  the top with severity markers + suggested actions.

Dashboard
---------
* Card badge is severity-coloured (⚠ amber warning, !! orange error,
  !!! red critical) using ``warnings.highest_severity``.
* Attention strip above the toolbar counts EVERY task with active
  diagnostics (not just hallucinations), severity-coloured, lists
  affected tasks with Open buttons when expanded.
* Drawer's old ``RecoverySection`` replaced with generic
  ``DiagnosticsSection`` rendering a card per active diagnostic:
  title + detail + structured data (task-id chips when payload keys
  look like id lists) + action buttons. Reassign profile picker is
  inline per-diagnostic. Clipboard fallback uses ``.catch()`` for
  environments where writeText rejects.
* Three-rung severity palette; amber for warning, orange for error,
  red for critical. Uses CSS variables so theming is straightforward.

Tests
-----
* NEW ``tests/hermes_cli/test_kanban_diagnostics.py`` — 14 unit tests
  covering each rule's positive/negative/threshold paths, severity
  sorting, broken-rule isolation, and sqlite3.Row integration.
* Dashboard plugin tests extended: ``/diagnostics`` endpoint (empty,
  populated, severity-filtered), ``/board`` exposes both diagnostic
  list and compact summary with ``highest_severity``.
* Existing hallucination-specific test (``test_board_surfaces_
  warnings_field_for_hallucinated_completions``) updated to reflect
  the new contract: warning summary keys by diagnostic kind
  (``hallucinated_cards``) not event kind.

379 kanban-suite tests pass (+16 net from this PR).

Live verification
-----------------
Seeded all 5 diagnostic kinds + one clean + one plain-running task
(7 total) into an isolated HERMES_HOME, spun up the dashboard, and
verified:

* Attention strip: shows ``!! 5 tasks need attention`` in the
  error-severity orange; Show expands to a list of 5 rows ordered
  critical > error > warning.
* Card badges: error tasks render ``!!`` orange, warning tasks
  render ``⚠`` amber, clean and plain-running tasks render no badge.
* Each of the 5 rules opens a correctly-coloured, correctly-styled
  diagnostic card in the drawer with its specific suggested action.
* Live reassign from a diagnostic card flipped
  ``broken-ml-worker → alice`` and the drawer refreshed with the
  new assignee + the same diagnostic still firing (correct:
  spawn_failures counter hasn't reset yet).
* CLI ``hermes kanban diagnostics`` prints all 5 in severity order;
  ``--severity error`` narrows to 3; ``kanban show <id>`` includes
  the Diagnostics block at the top with suggested action hint.

Migration note
--------------
The old ``warnings`` shape (``{count, kinds, latest_at}``) is
preserved on the API but ``kinds`` now keys by diagnostic kind
(``hallucinated_cards``) instead of event kind
(``completion_blocked_hallucination``). ``highest_severity`` is a
new required field. The dashboard was the only consumer and has
been updated in the same commit; external API consumers of the
``warnings`` field will need to update their kind-match logic.

* feat(kanban/diagnostics): lead titles with the actual error text

The generic 'Worker crashed N runs in a row' / 'Worker failed to spawn
N times' titles buried the actual cause in the data section. Operators
had to open logs or expand the diagnostic to see WHY the worker is
stuck — rate-limit vs insufficient quota vs bad auth vs context
overflow vs network blip all looked identical at a glance.

New titles:

  Agent crashed 3x: openai: 429 Too Many Requests - rate limit reached
  Agent crashed 3x: anthropic: 402 insufficient_quota - credit balance
  Agent crashed 3x: provider auth error: 401 Unauthorized
  Agent spawn failed 4x: insufficient_quota: You exceeded your current

Detail keeps the full error snippet (capped at 500 chars + ellipsis
for tracebacks). Title takes the first line capped at 160 chars.
Fallback title if no error recorded stays honest ('no error recorded').

Tests: 4 new cases covering 429/billing/spawn/truncation. 383 total
pass (+4).

Live-verified on dashboard with 6 seeded scenarios
(rate-limit, billing, auth, context, network, spawn-billing) —
each card title leads with the actionable error text.
2026-05-05 13:32:42 -07:00
Teknium
d5357f816d refactor(telegram): make typing thread-id resolver symmetric with send
Mirror _message_thread_id_for_typing() with _message_thread_id_for_send():
both now map the General forum topic (thread id "1") to None upfront.

That removes the need for the retry-without-thread fallback in send_typing()
entirely — if _message_thread_id_for_typing() returns a non-None value, it's
a real user-created topic and falling back to the root chat is never correct.
If Telegram rejects the typing action (e.g. topic deleted mid-session), we
swallow it at debug level instead of bleeding the indicator into All Messages.

Updates the General-topic typing regression test to assert the new single-call
contract.
2026-05-05 13:28:08 -07:00
helix4u
41545f7ec5 fix(telegram): keep DM topic typing scoped 2026-05-05 13:28:08 -07:00
Siddharth Balyan
3b750715a3
fix: resolve lazy session creation regressions (#18370 fallout) (#20363)
Fix three regressions introduced by PR #18370 (lazy session creation):

1. _finalize_session() uses stale session_key after compression (#20001)
2. session_key not synced after auto-compression in run_conversation (#20001)
3. pending_title ValueError leaves title wedged forever (#19029)
4. Gateway silently swallows null responses when agent did work (#18765)
5. One-time cleanup for accumulated ghost compression continuations (#20001)

Changes:
- tui_gateway/server.py: _finalize_session() now uses agent.session_id
  (falls back to session_key when agent is None). Refactor
  _sync_session_key_after_compress() with clear_pending_title and
  restart_slash_worker policy flags. Call it post-run_conversation()
  to sync session_key after auto-compression. Add ValueError handler
  to pending_title flush.
- gateway/run.py: Extract _normalize_empty_agent_response() helper that
  consolidates failed/partial/null response handling. Surfaces user-facing
  error when agent did work (api_calls > 0) but returned no text.
- hermes_state.py: Add finalize_orphaned_compression_sessions() — marks
  ghost continuation sessions as ended (non-destructive, preserves data).
- cli.py: One-time startup migration for orphaned compression sessions.

Test changes:
- tests/test_tui_gateway_server.py: Update pending_title ValueError test
  for post-#18370 architecture (title applied post-message, not at create).
- tests/test_lazy_session_regressions.py: 14 new regression tests covering
  all fixed paths.
2026-05-06 01:11:49 +05:30
Traemond Anderson
60235dba5e feat(cli): add list_picker_providers for credential-filtered picker
The Telegram/Discord /model pickers currently call
list_authenticated_providers(), which returns every provider whose
credentials resolve locally and every model in its curated snapshot.
Two failure modes fall out:

- OpenRouter rows can include IDs the live catalog no longer carries.
- Provider rows can surface with zero callable models (e.g. a slug
  whose credential pool entry exists but has nothing behind it).

list_picker_providers() wraps the base function and post-processes the
result so the interactive picker only shows models the user can
actually select:

- OpenRouter's models come from fetch_openrouter_models() (live-catalog
  filtered against the curated OPENROUTER_MODELS snapshot).
- Rows with an empty models list are dropped, except custom endpoints
  (is_user_defined=True with an api_url) where the user may enter
  model ids manually.
- All other fields pass through unchanged.

The gateway /model handler switches to the new helper for the
interactive picker payload only. Typed /model <name> and the text
fallback list stay on list_authenticated_providers() so nothing is
hidden from power users or platforms without a picker.

Covered by nine focused unit tests in
tests/hermes_cli/test_list_picker_providers.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 10:18:58 -07:00
Aslaaen
e8e9147377 fix(acp): preserve assistant reasoning metadata in session persistence 2026-05-05 10:18:28 -07:00
Zeejay
f8ba265340 fix(aux): trigger fallback on 429 rate-limit errors in auxiliary client
When a provider returns a 429 rate-limit error (not billing-related),
the auxiliary client's call_llm/async_call_llm previously did NOT trigger
the fallback chain. This caused auxiliary tasks like session_search to
exhaust all 3 retries against the same rate-limited endpoint, losing
session metadata that depended on the summarization completing.

Root cause: `_is_payment_error()` only matched 429s containing billing
keywords ("credits", "insufficient funds", etc.). Provider-specific
rate-limit messages like Nous's "Hold up for a bit, you've exceeded the
rate limit on your API key" didn't match, so `_is_payment_error` returned
False, `_is_connection_error` returned False, and `should_fallback` was
False — all retries hit the same rate-limited provider.

Fix:
- New `_is_rate_limit_error()` function that detects 429 + rate-limit
  keywords, generic 429 without billing keywords, and OpenAI SDK
  `RateLimitError` class instances (which may omit .status_code).
- Updated `should_fallback` in both `call_llm` and `async_call_llm` to
  include `_is_rate_limit_error`.
- Updated the max_tokens retry path to also check for rate-limit errors.
- Updated the reason string to include "rate limit".

This complements the Nous rate guard (PR #10568) which prevents new calls
to Nous when already rate-limited — this fix handles the case where a
request is already in flight when the 429 arrives.

Related: #8023, #12554, #11034
Co-authored-by: Zeejay <zjtan1@gmail.com>
2026-05-05 10:15:57 -07:00
LeonSGP43
244bacd0dc fix(skills): support category-qualified local skill names 2026-05-05 10:15:31 -07:00
Es1la
a877c3f6d9 fix(feishu): tolerate malformed dedup timestamps
Salvages @Es1la's PR #13632 — a non-numeric timestamp in the persisted
feishu dedup state crashed adapter startup with ValueError/TypeError
from the unguarded float() call. Wrap the float() conversion in
try/except; skip the bad key and keep loading the rest.

The original PR also restructured existing TestDedupTTL tests to use
tempfile.TemporaryDirectory + HERMES_HOME patching — that was
test-hygiene scope creep unrelated to the bug. Kept only the
malformed-timestamp fix and added a focused regression test.
2026-05-05 10:15:09 -07:00
Justin Kausel
526742199b Prefer fallback for Gemini CloudCode rate limits 2026-05-05 10:14:48 -07:00
Wysie
0120d8f31e fix: merge plugin tools into builtin toolsets 2026-05-05 10:14:17 -07:00
hharry11
247c9d468c fix(gateway): ensure deterministic thread eviction in helpers 2026-05-05 10:13:55 -07:00
Jonathan Troyer
6430d67569 fix(openrouter): use canonical X-Title attribution header
OpenRouter's dashboard attributes usage via the `X-Title` header.
Hermes was sending `X-OpenRouter-Title`, which OpenRouter does not
recognize, so Hermes usage showed up unlabeled. Rename to `X-Title`
to match the canonical header (already used elsewhere in the same
file via _AI_GATEWAY_HEADERS).

Salvages the core fix from @JTroyerOvermatch's PR #13649. Dropped the
PR's `HERMES_OPENROUTER_TITLE` / `HERMES_OPENROUTER_REFERER` env-var
override plumbing per the '.env is for secrets only' policy — if
per-deployment attribution is needed later it should go under
`openrouter.title` / `openrouter.referer` in config.yaml instead.
2026-05-05 10:13:34 -07:00
Teknium
e4e0090b54 test(acp): regression for #13675 — save_session preserves existing messages on encode failure 2026-05-05 10:05:23 -07:00
Teknium
b014a3d315 test(cron): update _isolate_tick_lock fixture for _get_lock_paths
After PR #13725 replaced the module-level _LOCK_DIR/_LOCK_FILE constants
with a dynamic _get_lock_paths() helper, the xdist-isolation fixture
needs to patch the function instead of the removed constants.
2026-05-05 09:57:06 -07:00
Teknium
de9238d37e
feat(kanban): hallucination gate + recovery UX for worker-created-card claims (#20232)
Workers completing a kanban task can now claim the ids of cards they
created via an optional ``created_cards`` field on ``kanban_complete``.
The kernel verifies each id exists and was created by the completing
worker's profile; any phantom id blocks the completion with a
``HallucinatedCardsError`` and records a
``completion_blocked_hallucination`` event on the task so the rejected
attempt is auditable. Successful completions also get a non-blocking
prose-scan pass over their ``summary`` + ``result`` that emits a
``suspected_hallucinated_references`` event for any ``t_<hex>``
reference that doesn't resolve.

Closes #20017.

Recovery UX (kernel + CLI + dashboard)
--------------------------------------

A structural gate alone isn't enough — operators also need to see and
act on stuck workers, especially when a profile's model is the root
cause. This PR ships the full loop:

* ``kanban_db.reclaim_task(task_id)`` — operator-driven reclaim that
  releases an active worker claim immediately (unlike
  ``release_stale_claims`` which only acts after claim_expires has
  passed). Emits a ``reclaimed`` event with ``manual: True`` payload.
* ``kanban_db.reassign_task(task_id, profile, reclaim_first=...)`` —
  switch a task to a different profile, optionally reclaiming a stuck
  running worker in the same call.
* ``hermes kanban reclaim <id> [--reason ...]`` and
  ``hermes kanban reassign <id> <profile> [--reclaim] [--reason ...]``
  CLI subcommands wired through to the same helpers.
* ``POST /api/plugins/kanban/tasks/{id}/reclaim`` and
  ``POST /api/plugins/kanban/tasks/{id}/reassign`` endpoints on the
  dashboard plugin.

Dashboard surfacing
-------------------

* ⚠ **warning badge** on cards with active hallucination events.
* **attention strip** at the top of the board listing all flagged
  tasks; dismissible per session.
* **events callout** in the task drawer — hallucination events render
  with a red left border, amber icon, and phantom ids as styled chips.
* **recovery section** in the task drawer with three actions: Reclaim,
  Reassign (with profile picker + reclaim-first checkbox), and a
  copy-to-clipboard hint for ``hermes -p <profile> model`` since
  profile config lives on disk and can't be edited from the browser.
  Auto-opens when the task has warnings, collapsed otherwise.
  Keyed by task id so state doesn't leak between drawers.

Active-vs-stale rule: warnings clear when a clean ``completed`` or
``edited`` event supersedes the hallucination, so recovery is never
permanently stigmatising — the audit events persist for debugging but
the badge goes away once the worker succeeds.

Skill updates
-------------

* ``skills/devops/kanban-worker/SKILL.md`` documents the
  ``created_cards`` contract with good/bad examples.
* ``skills/devops/kanban-orchestrator/SKILL.md`` gains a "Recovering
  stuck workers" section with the three actions and when to use each.

Tests
-----

* Kernel gate: verified-cards manifest, phantom rejection + audit
  event, cross-worker rejection, prose scan positive + negative.
* Recovery helpers: reclaim on running task, reclaim on non-running
  returns False, reassign refuses running without reclaim_first,
  reassign with reclaim_first succeeds on running.
* API endpoints: warnings field present on /board and /tasks/:id,
  warnings cleared after clean completion, reclaim 200 + 409 paths,
  reassign 200 + 409 + reclaim_first paths.
* CLI smoke: reclaim + reassign subcommands.

Live-verified end-to-end on a dashboard with seeded scenarios:
attention strip renders, badges land on the right cards, drawer
callout shows phantom chips, Reclaim on a running task flips status to
ready + emits manual reclaimed event + refreshes the drawer,
Reassign swaps the assignee and triggers board refresh.

359/359 kanban-suite tests pass
(test_kanban_{db,cli,boards,core_functionality} + dashboard + tools).
2026-05-05 08:06:55 -07:00
Teknium
7de3c86c5a
feat(i18n): add display.language for static message translation (zh/ja/de/es) (#20231)
* revert(gateway): remove stale-code self-check and auto-restart

Removes the _detect_stale_code / _trigger_stale_code_restart mechanism
introduced in #17648 and iterated in #19740. On every incoming message
the gateway compared the boot-time git HEAD SHA to the current SHA on
disk, and if they differed it would reply with

    Gateway code was updated in the background --
    restarting this gateway so your next message runs
    on the new code. Please retry in a moment.

and then kick off a graceful restart. This is unwanted behaviour:
users who run a long-lived gateway and do their own ad-hoc git
operations on the checkout end up with their chat interrupted and
the current message dropped every time HEAD moves, with no way to
opt out.

If an operator really needs the old protection against stale
sys.modules after "hermes update", the SIGKILL-survivor sweep in
hermes update (hermes_cli/main.py, also tagged #17648) already
handles the supervisor-respawn case on its own.

Removed:
  gateway/run.py:
    - _STALE_CODE_SENTINELS, _GIT_SHA_CACHE_TTL_SECS
    - _read_git_head_sha(), _compute_repo_mtime() module helpers
    - class-level _boot_wall_time / _boot_repo_mtime / _boot_git_sha /
      _stale_code_restart_triggered defaults
    - __init__ boot-snapshot block (_boot_*, _cached_current_sha*,
      _repo_root_for_staleness, _stale_code_notified)
    - _current_git_sha_cached(), _detect_stale_code(),
      _trigger_stale_code_restart() methods
    - stale-code check + user-facing restart notice at the top of
      _handle_message()
  tests/gateway/test_stale_code_self_check.py (deleted, 412 lines)

No new logic added. Zero remaining references to any removed
symbol. Gateway test suite passes the same 4589 tests it passed
before; the 3 pre-existing unrelated failures (discord free-channel,
feishu bot admission, teams typing) are unchanged by this commit.

* feat(i18n): add display.language for static message translation (zh/ja/de/es)

Adds a thin-slice i18n layer covering the highest-impact static user-facing
messages: the CLI dangerous-command approval prompt and a handful of gateway
slash-command replies (restart-drain, goal cleared, approval expired, config
read/save errors).

Out of scope (stays English): agent responses, log lines, tool outputs,
slash-command descriptions, error tracebacks.

Infrastructure:
- agent/i18n.py: catalog loader, t() helper, language resolution
  (HERMES_LANGUAGE env var > display.language config > en)
- locales/{en,zh,ja,de,es}.yaml: ~19 translated strings per language
- display.language in DEFAULT_CONFIG (hermes_cli/config.py)

Tests:
- tests/agent/test_i18n.py: 21 tests covering catalog parity, placeholder
  parity across locales, fallback behavior, env-var override, alias
  normalization, missing-key graceful degradation.

Docs:
- website/docs/user-guide/configuration.md: display.language entry plus a
  short section explaining scope so users don't expect agent responses to
  translate via this knob.
2026-05-05 08:03:07 -07:00
MaHaoHao-ch
02147cc850 fix(cli): sanitize bracketed paste markers during setup
Strip bracketed-paste control sequences from setup prompt input so pasted API keys work on Linux and WSL terminals, and add regression tests for normal/password prompts.

Closes #16491
2026-05-05 06:12:42 -07:00
Teknium
285c208cf7 fix(gateway): also tolerate malformed env vars in custom human-delay mode
Widens @Krionex's PR #16933 fix to cover the second bug class at the sibling
site. natural mode used to pass env values through int() before the PR
caught mis-typed values crashing the gateway; custom mode had the exact
same bug one branch away (HERMES_HUMAN_DELAY_MIN_MS=oops in custom mode
still crashed). Same try/except/fallback pattern, scoped to the two
int() calls that feed random.uniform().
2026-05-05 06:11:38 -07:00
Krionex
3b16c590e0 fix(gateway): ignore malformed custom delay env vars in natural mode 2026-05-05 06:11:38 -07:00
novax635
4e6f51167d fix(cli): fall back on invalid HERMES_MAX_ITERATIONS 2026-05-05 06:11:03 -07:00
Leon
19eebf6e0d fix(openrouter): treat xiaomi models as reasoning-capable 2026-05-05 06:07:44 -07:00
JC的AI分身
80b386a472 fix(feishu): refresh bot identity during hydration 2026-05-05 06:04:20 -07:00
Teknium
314361733f test(api_server): _run_agent result now carries session_id for #16938 2026-05-05 06:01:03 -07:00
briandevans
c1a2710a32 test(aux): cover effort: 0 fallback in Codex reasoning translation
Copilot review on PR #17012 noted the docstring/comment lists `0`
among the falsy effort values that fall back to `medium`, but the
existing regression tests only cover `None` and `""`. Add the third
case to lock in the full contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:47:50 -07:00
briandevans
9e893d16d1 fix(aux): default Codex reasoning effort to medium when extra_body.reasoning.effort is falsy
auxiliary.<task>.extra_body.reasoning, but the new translation path in
_CodexCompletionsAdapter.create() reads the effort with
``reasoning_cfg.get("effort", "medium")``.  That returns the configured
value verbatim when the key is present, so ``effort: null`` /
``effort: ""`` (both common YAML shapes) flow through as
``{"effort": null, "summary": "auto"}`` and Codex rejects the request
with "Invalid value for parameter ``reasoning.effort``".

agent/transports/codex.py::build_kwargs() — which the new adapter is
documented to mirror — uses a truthy check (``elif
reasoning_config.get("effort"):``) so the same falsy values keep the
"medium" default.  Switch the auxiliary adapter to the same
``or "medium"`` truthy form so identical config produces identical
requests on both paths.

- [x] Two new regression tests cover ``effort: None`` and
  ``effort: ""`` and assert the request goes out as
  ``{"effort": "medium", "summary": "auto"}``.
- [x] Old behaviour fails the new tests (``{'effort': None} !=
  {'effort': 'medium'}``); fixed behaviour passes all 11 tests in the
  ``TestCodexAdapterReasoningTranslation`` class.
- [x] Adjacent suites green: ``tests/agent/test_auxiliary_client.py``
  (108 passed) and ``tests/agent/transports/test_codex_transport.py +
  test_chat_completions.py`` (73 passed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 05:47:50 -07:00
beardthelion
f15b0fbb4f fix: add PLATFORM_HINTS entry for api_server platform
The API server is a documented, first-class messaging platform with its own
gateway adapter, docs pages, and toolset. But it's the only messaging
platform missing from PLATFORM_HINTS in agent/prompt_builder.py.

Without a platform hint, the agent has no context about the API server's
rendering environment and defaults to markdown-heavy document-style outputs
(code fences, bold, bullet points) — which break on the plain-text frontends
most API server consumers wrap (Open WebUI, custom agents, third-party
bridges).

Adds a generic api_server entry that describes the medium (unknown rendering,
assume plain text) without encoding any specific use case. Individual consumers
can layer additional style guidance via ephemeral system prompts.

Before (DeepSeek V4 Pro via API server, no hint):
  **Sendblue bridge** at /opt/sendblue-bridge - **68MB** on disk

After (same prompt, with hint):
  Sendblue bridge at /opt/sendblue-bridge, 68MB on disk

No breaking changes — new dict entry only. Existing API server consumers see
no behavioral change except for models that previously defaulted to markdown
formatting, which now produce cleaner plain-text output.
2026-05-05 05:46:16 -07:00
Teknium
b10e38e392
fix(skills): pin protects against deletion only, not edits (#20220)
Previously, pinning a skill blocked every skill_manage write action
(edit, patch, delete, write_file, remove_file). The 'hard fence'
design conflated two concerns:

  1. Pin as deletion protection — don't let the curator archive
     or the agent delete a stable skill.
  2. Pin as content freeze — don't let the agent rewrite it mid-conversation.

In practice (1) is what users pin for: they want a skill to survive
curator passes. (2) created friction — agents finding a new pitfall
in a pinned skill had to ask the user to unpin, then the agent
patches, then the user re-pins. The dance discouraged skill
maintenance and pinned skills went stale.

This narrows the _pinned_guard to skill_manage(action='delete') only.
Patches, edits, and supporting-file writes go through on pinned
skills so the agent can keep improving them. The curator's own
pinned-skip behavior (agent/curator.py:271 for auto-archive,
line 349 for the LLM review prompt) is unchanged — curator still
never touches pinned skills.

Changes:
- tools/skill_manager_tool.py: remove _pinned_guard calls from
  _edit_skill, _patch_skill, _write_file, _remove_file; keep on
  _delete_skill. Updated _pinned_guard docstring and error message.
- tools/skill_manager_tool.py: updated skill_manage model-facing tool
  description to reflect the new semantic.
- website/docs/user-guide/features/curator.md: updated pinning
  section.
- tests/tools/test_skill_manager_tool.py: flipped refuses-pinned
  tests for edit/patch/write_file/remove_file into allowed-when-pinned;
  kept test_delete_refuses_pinned (strengthened assertion to check the
  'cannot be deleted' wording).

Closes #18354
2026-05-05 05:43:10 -07:00
Teknium
fe8560fc12
feat(api-server): X-Hermes-Session-Key header for long-term memory scoping (#20199)
* feat(api-server): X-Hermes-Session-Key header for long-term memory scoping

API Server integrations (Open WebUI, custom web UIs) can now pass a stable
per-channel identifier via X-Hermes-Session-Key that scopes long-term memory
(Honcho, etc.) independently of the transcript-scoped X-Hermes-Session-Id.
This matches the native gateway's session_key / session_id split: one stable
key per assistant channel, many independent transcripts that rotate on /new.

- _create_agent and _run_agent accept gateway_session_key and pass it to
  AIAgent(gateway_session_key=...), which is already honored by the Honcho
  memory provider (plugins/memory/honcho/client.py resolve_session_name).
- New shared helper _parse_session_key_header applies the same API-key
  gate, control-character sanitization, and a 256-char length cap as the
  existing session-id header.
- All three agent endpoints honor the header: /v1/chat/completions,
  /v1/responses, /v1/runs. JSON and SSE responses echo it back.
- /v1/capabilities advertises session_key_header so clients can
  feature-detect.

Closes #20060.

Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com>

* chore: AUTHOR_MAP entry for manateelazycat

---------

Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com>
2026-05-05 05:34:47 -07:00
Teknium
436672de0e
feat(curator): add archive and prune subcommands (#20200)
* fix(curator): protect hub skills by frontmatter name

* test(skill_usage): add mark_agent_created to regression test

The cherry-picked test predates #19618/#19621 which rewrote
list_agent_created_skill_names() to require an explicit
created_by: 'agent' provenance marker. Without mark_agent_created(),
my-skill is excluded from the list and the positive assertion fails.

* feat(curator): add archive and prune subcommands

Adds 'hermes curator archive <skill>' and 'hermes curator prune
[--days N] [--yes] [--dry-run]' alongside the existing status, run,
pause, resume, pin, unpin, restore, backup, rollback verbs.

These are the two genuinely new user-facing verbs requested in #19384.
The other verbs proposed there ('stats' and 'restore') already exist
as 'curator status' and 'curator restore', so no duplicate surface is
added — all skill lifecycle commands live under the single 'hermes
curator' namespace.

- archive: manual archive of an agent-created skill. Refuses pinned
  skills with a hint pointing at 'hermes curator unpin'.
- prune: bulk-archive unpinned skills idle for >= N days (default 90).
  Falls back to created_at when last_activity_at is null so never-used
  skills can still be pruned. --dry-run previews, --yes skips prompt.

Adapted from @elmatadorgh's PR #19454 which placed the same verbs
under 'hermes skills' with a separate hermes_cli/skills_config.py
handler and rich table for stats. The 'stats' and 'restore' parts of
that PR duplicated existing surface, so only archive and prune are
kept, rewritten to match hermes_cli/curator.py's existing plain-text
handler style. Tests rewritten from scratch against the new handlers.

Closes #19384

Co-authored-by: elmatadorgh <coktinbaran5@gmail.com>

---------

Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: elmatadorgh <coktinbaran5@gmail.com>
2026-05-05 05:15:54 -07:00
Teknium
9e0ef2a1bc test: pin per-turn reasoning extraction semantics
Covers four scenarios for the reasoning-box extraction loop:
 - simple turn with reasoning
 - simple turn with no reasoning
 - tool-calling turn where reasoning lives on the tool-call step
 - prior turn had reasoning, current turn does not (the stale-display
   bug the fix exists for)
 - tool-calling turn where reasoning lives on BOTH steps (latest wins)
 - empty-string reasoning treated as missing

Also updates the four inline replica loops in tests/cli/test_reasoning_command.py
to match the new turn-boundary shape so the test file reflects
production semantics.
2026-05-05 05:00:05 -07:00
Asher Morse
6b76ea4707 fix(gateway): load reply_to_mode from config.yaml for Discord and Telegram
The YAML-to-env-var bridge in load_gateway_config() mapped every Discord
and Telegram config key (require_mention, auto_thread, reactions, etc.)
except reply_to_mode. Users setting discord.reply_to_mode or
telegram.reply_to_mode in ~/.hermes/config.yaml got no effect — the
adapter only read the env var, which nothing populated from YAML.

Add the missing bridge for both platforms, following the existing pattern.
Top-level <platform>.reply_to_mode preferred, falls back to
<platform>.extra.reply_to_mode, env var never overwritten. Handles YAML
1.1 bare `off` → Python False coercion.

This is a re-submission of the work from #9837 and #13930, which both
implemented the same fix but neither landed (see co-authors below).

Co-authored-by: Matteo De Agazio <hypnosis.mda@gmail.com>
Co-authored-by: ishardo <239075732+ishardo@users.noreply.github.com>
2026-05-05 04:58:23 -07:00
LeonSGP43
354502ee48 fix(kanban): preserve dashboard completion summaries 2026-05-05 04:57:38 -07:00
Teknium
4d0f59fa5a test(skill_usage): add mark_agent_created to regression test
The cherry-picked test predates #19618/#19621 which rewrote
list_agent_created_skill_names() to require an explicit
created_by: 'agent' provenance marker. Without mark_agent_created(),
my-skill is excluded from the list and the positive assertion fails.
2026-05-05 04:55:22 -07:00
LeonSGP43
68c1a08ad1 fix(curator): protect hub skills by frontmatter name 2026-05-05 04:55:22 -07:00
Teknium
5168226d60
feat(file_tools): post-write delta lint on write_file + patch, add JSON/YAML/TOML/Python in-process linters (#20191)
Closes the gap where write_file skipped the post-edit syntax check that
patch already ran, so silent file corruption (bad quote escaping,
truncated writes, etc.) would persist on disk until a later read.

## Changes

tools/file_operations.py:
- Add in-process linters for .py, .json, .yaml, .toml (LINTERS_INPROC).
  Python uses ast.parse, JSON/YAML/TOML use stdlib/PyYAML parsers.
  Zero subprocess overhead; preferred over shell linters when both apply.
- _check_lint() now accepts optional content and routes to in-process
  linter first. Shell linter (py_compile, node --check, tsc, go vet,
  rustfmt) remains the fallback for languages without an in-process
  equivalent.
- New _check_lint_delta() implements the post-first/pre-lazy pattern
  borrowed from Cline and OpenCode: lint post-write state first; only
  if errors are found AND pre-content was captured does it lint the
  pre-state and diff. If the pre-existing file had the SAME errors the
  edit didn't introduce anything new, so the file is reported as 'still
  broken, pre-existing' with success=False but a message explaining the
  errors were pre-existing. If the edit introduced genuinely new errors,
  those are surfaced and pre-existing ones are filtered out.
- WriteResult gains a lint field.
- write_file() captures pre-content for in-process-lintable extensions
  and calls _check_lint_delta after a successful write.
- patch_replace() switches from _check_lint to _check_lint_delta,
  reusing the pre-edit content it already has in scope.

tools/file_tools.py:
- Update write_file schema description to mention the post-write lint.

tests/tools/test_file_operations_edge_cases.py:
- Update existing brace-path tests to use .js (shell linter) now that
  .py is in-process.
- Add TestCheckLintInproc (9 tests) covering Python/JSON/YAML/TOML
  in-process linters.
- Add TestCheckLintDelta (5 tests) covering the post-first/pre-lazy
  short-circuit, new-file path, and the single-error-parser caveat.

## Performance

In-process linters are microseconds per call (ast.parse, json.loads).
The hot path (clean write) runs exactly one lint — matches main's cost
for patch. Pre-state capture is skipped when the file has no applicable
linter. Measured 4.89ms/write average over 100 .py writes including lint.

## Inspiration

- Cline's DiffViewProvider.getNewDiagnosticProblems() — filters pre-write
  diagnostics from post-write diagnostics (src/integrations/editor/DiffViewProvider.ts).
- OpenCode's WriteTool — runs lsp.diagnostics() after write and appends
  errors to tool output (packages/opencode/src/tool/write.ts).
- Claude Code's DiagnosticTrackingService — captures baseline via
  beforeFileEdited() and returns new-diagnostics-only from
  getNewDiagnostics() (src/services/diagnosticTracking.ts).

## Validation

- tests/tools/test_file_operations.py + test_file_operations_edge_cases.py
  + test_file_tools.py + test_file_tools_live.py + test_file_write_safety.py
  + test_write_deny.py + test_patch_parser.py + test_file_ops_cwd_tracking.py:
  228 passed locally.
- Live E2E reproduction of the tips.py corruption incident: broken
  content written; lint field surfaces 'SyntaxError: invalid syntax.
  Perhaps you forgot a comma? (line 6, column 5)' — the exact error
  that would have self-corrected the bug on the next turn.
2026-05-05 04:54:17 -07:00
wmagev
2eef395e1c fix(compaction): mark end of context summary in role=user fallback
When the head ends with assistant/tool and the tail starts with assistant,
the summary is inserted as a standalone role="user" message. The body's
verbatim "## Active Task" quote then gets read as fresh user input by
weak/local models (#11475, #14521).

The merge-into-tail path already appends an explicit end-of-summary marker
for this reason. Mirror it on the standalone path so both insertion routes
give the model the same "summary above, not new input" signal.
2026-05-05 04:51:29 -07:00
LeonSGP43
1a03e3b1c6 fix(kanban): detect darwin zombie workers 2026-05-05 04:43:40 -07:00
0xsir0000
f6b68f0f50 fix(gateway): keep DoH-confirmed Telegram IPs that match system DNS (#14520)
discover_fallback_ips() filtered out any DoH-resolved IP that also appeared
in the system resolver's answer set, on the assumption that the system IP
was unreachable. When DoH and system DNS agreed (a common case), the
function returned the hardcoded _SEED_FALLBACK_IPS list instead — and on
networks where those seed addresses are not routable, the Telegram fallback
transport had nothing usable to retry against and polling failed.

Drop the system_ips exclusion so DoH-confirmed IPs are preserved regardless
of system DNS overlap. The TelegramFallbackTransport already tries the
primary path first via system DNS, then falls through to the IP-rewrite
path on connect failure; including the same IP in both lanes lets a
transient primary failure recover via the explicit IP route instead of
escalating to seed addresses.

Update the two tests that codified the old exclusion to reflect the new,
inclusion-by-default behaviour.

Fixes #14520
2026-05-05 04:42:59 -07:00
revaraver
aacf36e943 fix(cli): persist manual compress handoff 2026-05-05 04:42:48 -07:00
revaraver
4a3e3e20e5 fix(compression): preserve iterative summary continuity 2026-05-05 04:42:44 -07:00
Teknium
f8a6db68ca test(kanban): isolate HERMES_KANBAN_BOARD writes in pin-env tests
The helper under test writes to os.environ directly, bypassing
monkeypatch tracking. Without an explicit snapshot/restore fixture,
the mutation leaks into subsequent tests and breaks TestSharedBoardPaths
(kanban path resolution reads HERMES_KANBAN_BOARD and routes through
boards/<leaked-slug>/ instead of the test's own HERMES_HOME).

Add an autouse fixture that snapshots the env var before the test and
restores (or pops) it after, regardless of what the helper did.
2026-05-05 04:37:47 -07:00
0xDevNinja
b22b3f506a fix(cli): pin HERMES_KANBAN_BOARD at chat boot to stop subprocess board drift
Without an explicit pin, in-process kanban tools and shelled-out
`hermes kanban …` subprocesses resolve the active board on different
paths: the env var when set, otherwise the global `<root>/kanban/current`
file. When a concurrent session toggles the current-board pointer
mid-turn, the same chat ends up routing tool calls to board A while its
shell calls hit board B, surfacing as phantom "no such task" errors.

Pin the resolved board into env once at `cmd_chat` boot when
HERMES_KANBAN_BOARD isn't already set. Mirrors what the dispatcher does
for spawned workers (kanban_db.py:2622-2623). Idempotent and a no-op
when the env is already pinned by the caller.

Closes #20074
2026-05-05 04:37:47 -07:00
Steve Kelly
8c82d0664d fix(kanban): ignore stale current board pointers 2026-05-05 04:34:45 -07:00
Teknium
2a285d5ec2
fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924) (#20184)
* revert(gateway): remove stale-code self-check and auto-restart

Removes the _detect_stale_code / _trigger_stale_code_restart mechanism
introduced in #17648 and iterated in #19740. On every incoming message
the gateway compared the boot-time git HEAD SHA to the current SHA on
disk, and if they differed it would reply with

    Gateway code was updated in the background --
    restarting this gateway so your next message runs
    on the new code. Please retry in a moment.

and then kick off a graceful restart. This is unwanted behaviour:
users who run a long-lived gateway and do their own ad-hoc git
operations on the checkout end up with their chat interrupted and
the current message dropped every time HEAD moves, with no way to
opt out.

If an operator really needs the old protection against stale
sys.modules after "hermes update", the SIGKILL-survivor sweep in
hermes update (hermes_cli/main.py, also tagged #17648) already
handles the supervisor-respawn case on its own.

Removed:
  gateway/run.py:
    - _STALE_CODE_SENTINELS, _GIT_SHA_CACHE_TTL_SECS
    - _read_git_head_sha(), _compute_repo_mtime() module helpers
    - class-level _boot_wall_time / _boot_repo_mtime / _boot_git_sha /
      _stale_code_restart_triggered defaults
    - __init__ boot-snapshot block (_boot_*, _cached_current_sha*,
      _repo_root_for_staleness, _stale_code_notified)
    - _current_git_sha_cached(), _detect_stale_code(),
      _trigger_stale_code_restart() methods
    - stale-code check + user-facing restart notice at the top of
      _handle_message()
  tests/gateway/test_stale_code_self_check.py (deleted, 412 lines)

No new logic added. Zero remaining references to any removed
symbol. Gateway test suite passes the same 4589 tests it passed
before; the 3 pre-existing unrelated failures (discord free-channel,
feishu bot admission, teams typing) are unchanged by this commit.

* fix(agent): stateful streaming scrubber for reasoning-block leaks (#17924)

Per-delta _strip_think_blocks ran at _fire_stream_delta and destroyed
downstream state. When MiniMax-M2.7 / DeepSeek / Qwen3 streamed a tag
split across deltas (delta1='<think>', delta2='Let me check'), the
regex case-2 match erased delta1 entirely, so CLI/gateway state
machines never learned a block was open and leaked delta2 as content.
Raw consumers (ACP, api_server, TTS) had no downstream defense at all.

Replace the per-delta regex with a stateful StreamingThinkScrubber
that survives delta boundaries:
  - Closed <tag>X</tag> pairs always stripped (matches _strip_think_blocks
    case 1).
  - Unterminated open at block boundary enters a block; content
    discarded until close tag arrives.  At end-of-stream, held
    content is dropped.
  - Orphan close tags stripped without boundary gating.
  - Partial tags at delta boundaries held back until resolved.
  - Block-boundary rule (start-of-stream, after \n, or
    whitespace-only since last \n) preserves prose that mentions
    tag names.

Reset at turn start alongside the existing context scrubber; flush at
turn end so a benign '<' held back at end-of-stream reaches the UI.

E2E-verified on live OpenRouter->MiniMax-m2 streams: closed pairs
strip cleanly, first word of post-block content is preserved, pure
content passes through unchanged.  Stefan's screenshot case (#17924)
— 'Let me check' getting chopped to ' me check' — no longer happens.

Final _strip_think_blocks calls on completed strings (final_response,
replay, compression) are preserved; only the streaming per-delta call
site switched to the scrubber.
2026-05-05 04:33:38 -07:00
Chris Danis
28f4d6db63 fix(tool-schemas): reactive strip of pattern/format on llama.cpp grammar 400s
MCP servers commonly emit JSON Schema `pattern` (e.g. `\\d{4}-\\d{2}-\\d{2}`
for date-time params) and `format` keywords. llama.cpp's
`json-schema-to-grammar` converter rejects regex escape classes
(\\d/\\w/\\s) and most format values, returning HTTP 400
"parse: error parsing grammar: unknown escape at \\d" — the whole request
fails.

Cloud providers (OpenAI, Anthropic, OpenRouter, Gemini) accept these
keywords fine and use them as prompting hints. Stripping unconditionally
loses useful hints for every cloud user to fix a llama.cpp-only bug.

Approach: classify the llama.cpp grammar-parse 400 in the error
classifier, and on match do a one-shot in-place strip of pattern/format
from `self.tools`, then retry. Follows the existing
`thinking_signature` recovery pattern. Cloud users hit zero overhead;
llama.cpp users pay one failed request per session.

Changes
- agent/error_classifier.py: new `FailoverReason.llama_cpp_grammar_pattern`
  + narrow HTTP-400 branch matching "error parsing grammar",
  "json-schema-to-grammar", or "unable to generate parser ... template".
- tools/schema_sanitizer.py: new `strip_pattern_and_format()` helper —
  reactive, walks schema nodes, skips property names (search_files.pattern
  survives). Returns strip count for logging.
- run_agent.py: new one-shot recovery block in the retry loop. Strips,
  logs, continues. Falls through to normal retry if nothing to strip.
- tests: 4 classifier tests (3 variants + 1 non-400 negative), 7 strip
  tests including the property-name preservation and idempotency checks.

Co-authored-by: Chris Danis <cdanis@gmail.com>
2026-05-05 04:25:18 -07:00
Interstellar-code
542e06c789 fix: include default profile in kanban assignees 2026-05-05 04:25:05 -07:00
Brecht-H
f25d3ec917 fix(kanban): suppress dispatcher stuck-warn when ready queue holds only non-spawnable assignees
After PR #20105 (dispatcher skips ready tasks whose assignee fails
``profile_exists()`` to prevent the orion-cc/orion-research crash
loop), the gateway and CLI emit a spurious "kanban dispatcher stuck:
ready queue non-empty for N consecutive ticks but 0 workers spawned"
warning every 5 minutes on multi-lane setups where the queue is
steadily full of human-pulled work assigned to terminal lanes.

The warn is intended to catch real failure modes (broken PATH,
missing venv, credential loss for a real Hermes profile). On a
multi-lane host it fires forever even though everything is healthy:
the dispatcher correctly chose not to spawn, and there is nothing
for the operator to fix.

Changes:

* ``DispatchResult`` gains a ``skipped_nonspawnable`` field
  (separate from ``skipped_unassigned``) so callers can distinguish
  "task missing an owner — operator should route it" from "task
  owned by a control-plane lane — terminal will pull it".
* ``dispatch_once`` routes the ``not profile_exists(assignee)`` skip
  into the new bucket (was lumped into ``skipped_unassigned``).
* New helper ``has_spawnable_ready(conn)`` returns True iff at least
  one ready+assigned+unclaimed task in the DB has an assignee that
  maps to a real Hermes profile. Falls back to legacy "any
  ready+assigned" when ``profile_exists`` is unimportable so degraded
  installs still surface the original warn.
* The gateway dispatcher (``gateway/run.py``) and the CLI standalone
  daemon (``hermes_cli/kanban.py``) both swap their cheap
  ``ready_nonempty`` probe to use ``has_spawnable_ready``. Stuck-warn
  now fires only when there is genuine spawnable work the dispatcher
  failed to start.
* CLI dispatch output prints ``Skipped (non-spawnable assignee —
  terminal lane, OK)`` for visibility without alarm.

Tests:

* New ``has_spawnable_ready`` cases (empty queue, terminal-lane
  only, mixed real+terminal).
* New ``test_dispatch_skips_nonspawnable_into_separate_bucket``
  verifies the bucketing change.
* Updated ``test_dispatch_skips_unassigned`` to assert no
  cross-leak.
* Added ``all_assignees_spawnable`` fixture in
  ``tests/hermes_cli/conftest.py`` and threaded it through dispatcher
  tests that use synthetic assignees ("alice", "bob"). PR #20105
  (the parent commit) silently broke 8 such tests by routing those
  assignees into ``skipped_nonspawnable`` instead of spawning; this
  PR repairs them as part of the same code area.

Verified locally: 246/246 kanban-suite tests pass.

Stacks on top of fix/kanban-dispatcher-skip-missing-profile-2026-05-05
(PR #20105). Reviewer: this PR is meant to merge AFTER #20105.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-05 04:13:12 -07:00
Teknium
91ce8fc000 fix(setup): offer Keep/Replace/Clear when API key already exists
hermes setup / hermes model used to silently skip the key prompt when
any value was present in .env — even a malformed paste — leaving users
with a stuck '✓' and no way to recover without hand-editing .env.

Replace the silent acknowledgement at all three API-key provider flows
(Kimi, Stepfun, generic) with a single [K]eep / [R]eplace / [C]lear
menu via a shared `_prompt_api_key` helper.

- K / Enter / Ctrl-C / unknown input → keep (never destroys the key)
- R → getpass for new key; empty input cancels and preserves existing
- C → clears the env var, tells user to rerun hermes setup, aborts flow

LM Studio's no-auth-placeholder substitution stays on first-time entry
only; on Replace an empty input means 'cancel', not 'overwrite with
dummy key'.

11 unit tests cover all branches incl. garbage-input-keeps-key, Ctrl-C
at the choice prompt, Replace-cancel preserving the old key, Clear
wiping only the target env var, and lmstudio placeholder semantics.

Fixes #16394
Reshapes #18355 — original PR pasted the menu inline at 3 sites with
no tests; this consolidates to one helper (+88/-66) with coverage.

Co-authored-by: Feranmi10 <89228157+Feranmi10@users.noreply.github.com>
2026-05-05 04:08:11 -07:00
simbam99
8ad5e98f8d fix(gateway): preserve pending update prompts across restarts 2026-05-05 03:59:39 -07:00
Aamir Jawaid
2333b7a7ec fix(tests): patch TypingActivityInput after mock on Python <3.12
The SDK requires Python >=3.12 so CI (3.11) falls to the except
ImportError branch, leaving TypingActivityInput=None. After loading
the adapter module, explicitly restore it from the mock so
test_send_typing doesn't silently no-op.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 20:59:18 -07:00
Stephen Schoettler
1d938832a7 test(kanban): patch dashboard websocket token stub 2026-05-04 20:50:24 -07:00
Stephen Schoettler
f7918c9349 test(teams): mock ClientOptions in adapter tests 2026-05-04 20:50:24 -07:00
Jeffrey Quesnelle
d12f59aa53
Merge pull request #19866 from NousResearch/fix/clarify-placeholder-credential
clarify placeholder telegram credential in tests
2026-05-04 22:24:52 -04:00
helix4u
b632290166 fix(gateway): handle planned service stops 2026-05-04 16:00:49 -07:00
brooklyn!
20428f5e60
fix(tui): respect voice.record_key config (supersedes #19028, #19339) (#19835)
* fix(tui): respect voice.record_key config instead of hardcoded Ctrl+B

Classic CLI loaded ``voice.record_key`` from config.yaml and bound the
prompt-toolkit handler dynamically (``cli.py`` paths). The new TUI hard-
coded ``Ctrl+B`` everywhere — ``isVoiceToggleKey`` (input handler),
``/voice status`` ("Record key: Ctrl+B"), and ``/voice on`` ("Ctrl+B to
start/stop recording"). A user who set ``voice.record_key: ctrl+o``
(or any other key) saw the documented config silently ignored — only
Ctrl+B worked, the displayed shortcut lied about it.

Wire the configured key end to end through the existing channels:

* **Backend** (``tui_gateway/server.py``): ``voice.toggle`` action=status
  AND action=on/off responses now include ``record_key``, sourced from
  ``config.get('voice', {}).get('record_key', 'ctrl+b')``.
* **Backend types** (``ui-tui/src/gatewayTypes.ts``): ``ConfigFullResponse``
  now exposes ``config.voice.record_key`` and ``VoiceToggleResponse``
  carries ``record_key`` so the TUI can both bind and display it.
* **Frontend parser/formatter** (``ui-tui/src/lib/platform.ts``):
  ``parseVoiceRecordKey()`` accepts ``ctrl+b`` / ``alt+r`` / ``cmd+space``
  and the common aliases (``option``, ``cmd``, ``win``, …); falls back to
  the documented Ctrl+B for empty / multi-character / malformed input so
  a typo never silently disables the shortcut. ``formatVoiceRecordKey()``
  renders for status text. ``isVoiceToggleKey`` now takes a parsed
  ``ParsedVoiceRecordKey`` argument; the hardcoded ``ch === 'b'`` is
  gone. Default arg keeps existing call sites back-compat.
* **Hydration** (``ui-tui/src/app/useConfigSync.ts``,
  ``useMainApp.ts``): startup ``config.get full`` already runs; extract
  ``cfg.voice.record_key`` from it, parse, push into a new
  ``voiceRecordKey`` state, and forward to the input handler ctx
  (``InputHandlerContext.voice.recordKey``). Mtime-poll path also
  re-applies the parsed key so a hand-edit of config.yaml takes effect
  the next tick — matches existing behaviour for display options.
* **Input handler** (``ui-tui/src/app/useInputHandlers.ts``):
  ``isVoiceToggleKey(key, ch, voice.recordKey)`` so the configured
  binding fires.
* **Slash command** (``ui-tui/src/app/slash/commands/session.ts``):
  ``/voice status`` and ``/voice on`` use ``formatVoiceRecordKey`` on
  the response's ``record_key`` instead of the hardcoded label.

Tests:
* ``parseVoiceRecordKey`` covers ctrl/alt/cmd/super aliases, multi-char
  rejection, and empty fallback.
* ``formatVoiceRecordKey`` covers the doc examples (``Ctrl+B``,
  ``Ctrl+O``, ``Alt+R``, ``Cmd+B``).
* ``isVoiceToggleKey`` regression: ``ctrl+o`` configured → only ``o``
  matches, not ``b``; ``alt+r`` matches both alt-bit and meta-bit
  encodings (terminal protocol parity); omitted-arg call still binds
  Ctrl+B for back-compat.

Full TUI suite (555 tests) passes; ``tsc --noEmit`` clean.

Fixes #18994

Co-authored-by: asheriif <ahmedsherif95@gmail.com>

* fix(tui): support named-key tokens in voice.record_key (space, enter, …)

Reviewer caught that the round-1 parser in #18994 rejected every
multi-character token, so a config value like ``ctrl+space`` (which the
CLI happily binds via prompt_toolkit's ``c-space`` rewrite in
``cli.py``) silently fell back to the documented Ctrl+B default —
re-introducing the same false-shortcut bug the PR was meant to fix,
just at a different surface.

Add explicit named-key support that mirrors what the CLI accepts:

* ``space``         (alias: ``spc``)        → matches ``ch === ' '``
* ``enter``         (alias: ``return``, ``ret``) → matches ``key.return``
* ``tab``                                   → matches ``key.tab``
* ``escape``        (alias: ``esc``)        → matches ``key.escape``
* ``backspace``     (alias: ``bs``)         → matches ``key.backspace``
* ``delete``        (alias: ``del``)        → matches ``key.delete``

``ParsedVoiceRecordKey`` gains an optional ``named`` field; ``ch``
holds either a single char (back-compat) or the canonical named token,
and the runtime matcher dispatches on ``named`` before checking the
modifier shape. Aliases collapse to one canonical name so
``ctrl+esc`` and ``ctrl+escape`` behave identically.

Unrecognised multi-character tokens (e.g. ``ctrl+spcae`` typo, or
unsupported keys like ``ctrl+f5``) still fall back to the Ctrl+B
default rather than silently disabling the binding — keeps the "typo
never silently kills the shortcut" guarantee.

Tests:

* ``parseVoiceRecordKey`` parametrised over every named token + each
  alias variant.
* New ``isVoiceToggleKey`` cases for space (ch-based match), enter
  (``key.return``), tab, escape, backspace, delete, including
  modifier-mismatch negatives.
* ``formatVoiceRecordKey`` renders named keys in title case
  (``Ctrl+Space``, ``Ctrl+Enter``).
* Existing fall-back-to-Ctrl+B contract preserved for empty input
  AND unrecognised multi-char tokens.

Full TUI suite: 559/559 pass; ``tsc --noEmit`` clean.

Refs #18994 (round-1 review feedback)

Co-authored-by: asheriif <ahmedsherif95@gmail.com>

* test(tui): assert voice.toggle returns configured record_key

Salvage the backend regression from #19339 — asserts ``voice.toggle``
action=on AND action=status responses carry the configured
``voice.record_key`` end-to-end through ``_load_cfg()``. Keeps the
CLI→TUI parity contract visible in the Python test suite alongside
the existing frontend parser/matcher/formatter coverage from #19028.

* fix(tui): address Copilot review on #19835 voice.record_key wiring

Five tightenings on the parser + matcher + hydration surface, all
caught by the Copilot review on the PR — each one turns a silent
false-fire or display/binding skew into a deterministic behaviour.

* **isVoiceToggleKey ctrl branch was too permissive for named keys.**
  The doc-default macOS Cmd+B muscle-memory fallback
  (``isActionMod(key)`` on top of ``key.ctrl``) fired for every
  configured key, so bare Esc — which hermes-ink reports with
  ``key.meta`` on some macOS terminals — triggered ``ctrl+escape``,
  and Alt+Space / Alt+Tab triggered ``ctrl+space`` / ``ctrl+tab``.
  Gate the fallback to the literal ``ctrl+b`` binding so any custom
  chord requires the real Ctrl bit.
* **Alt branch guarded against Ctrl/Cmd co-press.** Without this,
  Ctrl+Alt+<letter> and Cmd+Alt+<letter> also fired ``alt+<letter>``.
* **Dropped the ``meta`` modifier variant and its alias.** In
  hermes-ink ``key.meta`` is Alt on xterm-style terminals and Cmd on
  legacy macOS ones, so a literal ``meta+b`` config displayed as
  ``Cmd+B`` while matching Alt+B — exactly the kind of false
  shortcut the PR was meant to remove. ``cmd`` / ``command`` now
  collapse onto ``super`` (kitty-style ``key.super``, with a macOS
  ``key.meta`` fallback) and render as ``Cmd+B``. Unknown modifier
  tokens fall back to the documented Ctrl+B default rather than
  silently coercing to Ctrl.
* **Slash-command display/binding skew.** ``/voice status`` and
  ``/voice on`` rendered from the fresh gateway ``record_key``
  response, but ``useInputHandlers()`` still bound the old key
  until the next 5s mtime poll. Thread ``setVoiceRecordKey``
  through ``SlashHandlerContext.voice`` and push the parsed spec
  into frontend state on every response so text and binding stay
  consistent.
* **Test coverage for the two paths Copilot flagged.** Added
  vitest coverage for (a) the three-case ``/voice`` slash output
  in ``createSlashHandler.test.ts`` and (b) the
  ``applyDisplay → voice.record_key`` hydration + omit-setter
  back-compat paths in ``useConfigSync.test.ts``. Plus regression
  cases for every false-fire scenario above.

Suite: 575/575 green, tsc --noEmit clean.

* fix(tui): address Copilot round-2 review on #19835

Three tightenings on the surface introduced in the round-1 fix:

* **``/voice tts`` reset custom bindings to Ctrl+B.** The ``tts`` branch
  of ``voice.toggle`` omitted ``record_key`` from its response, so the
  frontend's ``r.record_key ?? 'ctrl+b'`` coerced a user's custom
  binding back to the default on every TTS toggle. Two-sided fix:
  the backend now includes ``record_key`` on the ``tts`` branch (parity
  with ``status``/``on``/``off``), and the slash handler only pushes
  frontend state when the response actually carries ``record_key`` —
  belt-and-suspenders against any future branch forgetting to include
  it.

* **``super+b`` / ``win+b`` / ``cmd+b`` displayed "Cmd+B" on Linux and
  Windows.** ``formatVoiceRecordKey`` rendered ``mod === 'super'`` as
  ``Cmd`` universally, which told non-mac users the wrong modifier to
  press even though ``isVoiceToggleKey`` matched the right event bits.
  Gate the label to ``isMac`` so non-mac renders ``Super+B``.

* **``control+b`` / ``ctrl + b`` lost the macOS Cmd+B fallback.**
  ``_isDefaultVoiceKey`` keyed off ``parsed.raw`` — so
  semantically-equal aliases of the documented default dropped into
  the strict branch even though they bind Ctrl+B. Compare on the
  parsed spec (mod + ch + named) instead.

Coverage added: Linux ``Super+B`` rendering (and macOS ``Cmd+B``),
``control+b`` / ``ctrl + b`` accepting the Cmd+B fallback on darwin,
``/voice tts`` without ``record_key`` not clobbering cached binding,
and a backend regression asserting every ``voice.toggle`` branch
carries the configured key.

Suite: 579/579 TUI vitest green, 2/2 backend voice tests green,
tsc --noEmit clean.

* fix(tui): address Copilot round-3 review on #19835

Three classes of robustness issue caught on the second pass — all
revolve around malformed YAML tipping ``parseVoiceRecordKey`` or
``_voice_record_key`` into a crash instead of the documented
fallback.

* **Parser crashed on non-string YAML scalars.** ``config.get full``
  returns raw ``yaml.safe_load`` output, so ``voice.record_key: 1``
  or ``voice.record_key: true`` in a hand-edited config would hit
  ``.trim()`` on a number/bool and throw, breaking startup and
  every mtime re-apply. Accept ``unknown`` at the signature, guard
  with ``typeof raw !== 'string'``, and fall back to the default.

* **Backend blew up on non-dict ``voice:``.** Same YAML hazard on
  the gateway side: ``voice: true`` / ``voice: cmd+b`` left
  ``_load_cfg().get("voice")`` as a bool/str, so ``.get("record_key")``
  raised AttributeError and took every ``voice.toggle`` branch down
  with it. Centralised the lookup in a single
  ``_voice_record_key()`` helper that ``isinstance``-guards both
  ``voice`` and ``record_key`` and falls back to ``ctrl+b``.

* **Multi-modifier chords silently dropped extras.** The previous
  validator only checked the first modifier token, so ``ctrl+alt+r``
  silently parsed as ``ctrl+r`` and ``cmd+ctrl+b`` as ``super+b`` —
  a typo bound a different shortcut than the user configured.
  Reject multi-modifier spellings outright; the classic CLI only
  supports single-modifier bindings via prompt_toolkit's ``c-x`` /
  ``a-x`` rewrite, so this matches CLI parity.

Coverage added:

* ``parseVoiceRecordKey`` fallback on ``1`` / ``true`` / ``null`` /
  ``undefined`` / ``{}``.
* ``parseVoiceRecordKey`` fallback on ``ctrl+alt+r`` /
  ``cmd+ctrl+b`` / ``alt+ctrl+space``.
* ``test_voice_toggle_handles_non_dict_voice_cfg`` exercises
  every non-dict ``voice:`` shape (bool, str, None, int, list) and
  asserts each falls back to ``record_key: 'ctrl+b'``.

Suite: 581/581 TUI vitest green, 3/3 backend voice tests green,
tsc --noEmit clean.

* fix(tui): address Copilot round-4 review on #19835

Four final corners of the voice.record_key surface:

* **Bare-char configs silently coerced to ``ctrl+<key>``.** A config
  like ``voice.record_key: o`` / ``space`` / ``escape`` fell through
  to the default ``mod = 'ctrl'`` and silently bound Ctrl+O, while
  the classic CLI's prompt_toolkit would bind the raw key (no
  rewrite) — so the two runtimes silently disagreed on what "o"
  means. Require an explicit modifier; bare-char configs fall back
  to the documented Ctrl+B default.

* **Reserved ctrl+<letter> bindings would never fire.**
  ``useInputHandlers()`` intercepts ``ctrl+c`` (interrupt),
  ``ctrl+d`` (quit), and ``ctrl+l`` (clear screen) before the voice
  check runs, so those configs would be advertised in /voice
  status but the advertised shortcut never actually triggers
  push-to-talk. Added ``_RESERVED_CTRL_CHARS`` at parse time so
  the user gets the documented default instead of a dead shortcut.
  (``alt+c``, ``cmd+l``, etc. are not intercepted and stay usable.)

* **``_load_cfg()`` root itself may be a non-dict.**
  ``_voice_record_key()`` isinstance-guarded the ``voice`` subkey
  but not the root — a malformed config.yaml that collapsed to a
  scalar/list at the top level (``config.yaml: true`` or ``[]``)
  would still raise on ``.get("voice")``. Added the top-level
  guard too so every malformed shape falls back to ``ctrl+b``.

* **Stale header comment on ``isVoiceToggleKey``.** The doc-comment
  still claimed "On macOS we additionally accept the platform
  action modifier (Cmd) for the configured letter" even though the
  implementation gates the Cmd fallback to the documented default
  only. Rewrote to match.

Coverage added:

* ``parseVoiceRecordKey`` fallback on bare chars (``o``, ``b``,
  ``space``, ``escape``).
* ``parseVoiceRecordKey`` fallback on ``ctrl+c`` / ``ctrl+d`` /
  ``ctrl+l``; positive case for ``alt+c`` / ``cmd+l`` still usable.
* Backend ``test_voice_toggle_handles_non_dict_voice_cfg`` now
  exercises 5 non-dict shapes at the YAML root too.

Suite: 583/583 TUI vitest green, 3/3 backend voice tests green,
tsc --noEmit clean.

* fix(tui): address Copilot round-5 review on #19835

Three follow-ups on the voice matcher's modifier + shift discipline:

* **``super`` branch falsely fired on Alt+<key> / bare Esc on macOS.**
  ``isVoiceToggleKey`` accepted ``isMac && key.meta`` as a Cmd
  fallback for the ``super`` modifier — but hermes-ink sets
  ``key.meta`` for plain Alt/Option AND for bare Escape on some
  macOS terminals. A ``cmd+b`` config silently fired on Alt+B;
  ``cmd+space`` on Alt+Space; ``cmd+escape`` on bare Esc. Drop the
  fallback and require the literal ``key.super`` bit. Legacy-
  terminal users who need Cmd should upgrade to a kitty-protocol
  terminal or bind ``alt+X`` explicitly.

* **Shift bit was never checked.** The parser rejects multi-
  modifier configs like ``ctrl+shift+tab``, but the runtime
  matcher didn't check ``key.shift`` — so ``ctrl+tab`` also fired
  on Ctrl+Shift+Tab and ``alt+enter`` on Alt+Shift+Enter.
  Early-return on ``key.shift === true`` so the runtime only fires
  the exact chord the user configured.

* **Test leaked ``HERMES_VOICE=1`` into later tests.**
  ``voice.toggle`` action=on writes to ``os.environ`` directly
  (CLI parity, runtime-only flag); ``test_voice_toggle_returns_
  configured_record_key`` dispatched action=on without letting
  monkeypatch take ownership of the var first. Any later test
  that read voice mode in the same Python process could inherit a
  stale enabled state. Added ``monkeypatch.setenv("HERMES_VOICE",
  "0")`` up front so monkeypatch restores the original value at
  teardown.

Coverage added:

* ``cmd+b`` / ``cmd+space`` / ``cmd+escape`` do NOT fire on
  ``key.meta``-only events on darwin.
* ``ctrl+tab`` / ``alt+enter`` / ``ctrl+o`` reject matches when
  ``key.shift`` is held; sanity cases without Shift still fire.

Suite: 585/585 TUI vitest green, 3/3 backend voice tests green,
tsc --noEmit clean.

* fix(tui): address Copilot round-6 review on #19835

Three classes of modifier-discipline tightening + one config-surface
honesty fix:

* **Default ``ctrl+b`` Cmd fallback leaked Alt+B.** The default's
  macOS Cmd+B muscle-memory path used ``isActionMod(key)``, which
  returns ``key.meta || key.super`` on darwin. hermes-ink also
  reports plain Alt as ``key.meta``, so Alt+B silently fired the
  default binding. Replaced with strict ``isMac && key.super ===
  true`` — kitty-style Cmd+B still works, Alt+B correctly
  rejected. Legacy-terminal mac users (Terminal.app without
  CSI-u) now get raw Ctrl+B only; the documented default still
  works everywhere.

* **ctrl / super branches accepted extra modifier bits.** The
  parser rejects multi-modifier configs like ``ctrl+alt+o``, but
  the runtime matcher was permissive — ``ctrl+o`` fired on
  Ctrl+Alt+O / Ctrl+Cmd+O, and ``super+b`` fired on Cmd+Alt+B /
  Ctrl+Cmd+B. Added strict ``!key.alt && !key.meta && key.super
  !== true`` on ctrl, and ``!key.ctrl && !key.alt && !key.meta``
  on super, so the runtime only fires the exact chord the parser
  would let you configure.

* **Dropped ``cmd`` / ``command`` aliases.** They parsed to
  ``super`` and rendered as ``Cmd+X``, but legacy macOS terminals
  report Cmd as ``key.meta`` (same signal as Alt), so a
  ``cmd+o`` config was advertised as working but never actually
  fired on Terminal.app-without-CSI-u. That recreated the
  "displayed shortcut does not work" problem this PR was meant to
  remove. Users who want the platform action modifier spell it
  ``super`` / ``win`` — that matches the unambiguous ``key.super``
  bit, and kitty-style macOS terminals render it as ``Cmd+X`` via
  platform-aware formatter.

Coverage updated:

* Default ctrl+b no longer fires on Alt+B via ``key.meta`` leak;
  raw Ctrl+B and kitty-style Cmd+B still fire.
* ``ctrl+o`` rejects Ctrl+Alt+O / Ctrl+Cmd+O / Ctrl+Meta+O chords.
* ``super+b`` rejects Cmd+Alt+B / Cmd+Meta+B / Ctrl+Cmd+B chords.
* ``cmd+b`` / ``command+b`` / ``meta+b`` all fall back to the
  documented default at parse time (joined the ambiguous-mac-mod
  rejection class).
* Round-2 expectations that asserted ``cmd+b`` parsed as super
  and accepted ``key.meta`` on darwin updated to reflect the new
  stricter contract.

Suite: 588/588 TUI vitest green, 3/3 backend voice tests green,
tsc --noEmit clean.

* fix(tui): address Copilot follow-up on wire typing + escape precedence

Two follow-ups from the latest Copilot pass:

* **Config wire typing honesty (`gatewayTypes.ts`)**
  `config.get full` forwards raw `yaml.safe_load()` output, so
  `voice.record_key` can be any scalar/container when hand-edited.
  Typing it as `string` suggests a normalized contract that the
  backend does not guarantee and makes unsafe callers more likely.
  Change `ConfigVoiceConfig.record_key` to `unknown` with an
  explicit comment that callers must normalize at runtime.

* **Escape-based voice bindings were swallowed before voice check**
  `useInputHandlers()` handled `key.escape` for queue-edit cancel and
  selection clear before `isVoiceToggleKey(...)`, so configured
  `ctrl+escape` / `alt+escape` / `super+escape` chords were advertised
  but never toggled recording in those UI states.
  Add an early escape+voice check before generic Esc handlers so
  escape-based voice bindings win when configured, while plain Esc
  behavior remains unchanged.

Also updated PR #19835 description text to remove stale cmd/command
alias claims and match the current parser contract.

* fix(tui): pass configured voice shortcut through TextInput layer

Thread the live parsed voiceRecordKey into TextInput so configured voice.record_key chords bubble to useInputHandlers instead of being consumed as editor input. This removes the last hardcoded Ctrl+B pass-through in the composer path while preserving existing global control chord behavior.

* fix(tui): require explicit alt bit for escape-based alt chords

Hermes-ink reports bare Escape as meta=true+escape=true on some terminals, so a configured alt+escape binding was firing on bare Esc. Require an explicit key.alt bit when the configured named key is escape so plain Esc stays plain Esc; kitty-style alt+escape still fires.

* fix(tui): harden voice.record + TextInput paste + super-mod reserved list

Three round-7 Copilot follow-ups on #19835:

- voice.record start handler used _load_cfg().get('voice', {}).get(...) without
  shape checks, so malformed YAML (bool/scalar/list) returned 5025 instead of
  using VAD defaults. Centralized _voice_cfg_dict() helper and type-guarded
  silence_threshold/silence_duration with numeric fallbacks.
- TextInput pass-through check moved above paste/copy handling so configured
  voice chords (ctrl+v / alt+v / cmd+v) beat the composer's paste/copy
  defaults.
- parser now also rejects super+{c,d,l,v} — on macOS those are
  copy/exit/clear/paste and would be advertised in /voice status but never
  actually toggle recording.

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix(tui): round-8 Copilot review — allow ctrl+x, gate super reservations to macOS, preserve voice key on transient RPC failure

Three round-8 Copilot follow-ups on #19835:

- Revert ctrl+x addition to _RESERVED_CTRL_CHARS (landed via Copilot Autofix
  commit 731ec86): ctrl+x is only claimed during queue-edit
  (queueEditIdx !== null), so voice works the rest of the session and
  matches CLI ctrl+<letter> parity.
- Gate super+{c,d,l,v} reservation to isMac. Linux/Windows TUI globals key
  off Ctrl, so kitty/CSI-u super+<letter> configs don't collide on non-mac
  and should stay usable.
- applyDisplay() now skips setVoiceRecordKey when cfg is null so one
  transient quietRpc() failure after a config edit doesn't clobber the
  cached binding back to Ctrl+B until the next successful poll.

New coverage:
- parseVoiceRecordKey preserves ctrl+x on linux
- super+{c,d,l,v} rejected on darwin, allowed on linux
- applyDisplay(null, ...) leaves voiceRecordKey untouched

* fix(cli,tui): normalize voice.record_key aliases across CLI + TUI for parity

Round-9 Copilot review on #19835: TUI accepted control+/option+/opt+/super+/win+ aliases but the classic CLI only rewrote literal ctrl+/alt+ before handing to prompt_toolkit, so a TUI-valid config silently bound a different (or no) shortcut in the CLI.

- Added normalize_voice_record_key_for_prompt_toolkit() in hermes_cli/voice.py with a single alias table (ctrl/control/alt/option/opt → c-/a-).
- Wired it into all three cli.py sites (_enable_voice_mode hint, _show_voice_status display, and the prompt_toolkit binding in _register_voice_handler).
- /voice status display now renders control+x as Ctrl+X and option+x as Alt+X (canonical casing) to match TUI formatVoiceRecordKey.
- super/win/windows are intentionally left unchanged: prompt_toolkit has no super modifier, so the CLI will reject them loudly at startup rather than silently binding Ctrl+B. Documented this split at both the TUI _MOD_ALIASES comment and the CLI normalizer docstring.
- Added tests covering ctrl/control/alt/option/opt mapping, case-insensitivity, non-string fallback, empty-string fallback, and super/win pass-through.

* fix(cli): port TUI parser contract into CLI voice.record_key normalizer

Round-10 Copilot review on #19835.

hermes_cli/voice.py's normalize_voice_record_key_for_prompt_toolkit() previously did blind substring replacement with no trim/validate step, so the CLI diverged from the TUI parser on:
- whitespace ('ctrl + b' -> 'c- b' instead of 'c-b')
- typoed named keys ('ctrl+spcae' passed through as 'c-spcae' and prompt_toolkit would reject at startup)
- bare-char configs ('o' should fall back, not pass through as 'o')
- multi-modifier chords ('ctrl+alt+r')
- reserved ctrl chars ('ctrl+c/d/l')
- unknown modifiers ('meta+b' / 'shift+b')
- named-key aliases ('return'/'esc'/'bs'/'del' not collapsed to prompt_toolkit canonicals)

Port the TUI parser contract into Python (_VOICE_MOD_ALIASES, _VOICE_NAMED_KEYS, _VOICE_RESERVED_CTRL_CHARS) so one config value binds the same shortcut in both runtimes.

Also added format_voice_record_key_for_status() shared between the PTT hint and /voice status display. Non-string scalars (voice.record_key: true / 1) now surface as 'Ctrl+B' instead of the raw scalar — /voice status no longer advertises a shortcut that can never bind.

Tests: 29/29 in test_voice_wrapper.py, including 11 new regressions covering whitespace, named-key aliases, typos, bare-char, multi-modifier, reserved ctrl, unknown mods, non-string fallback, and formatter contract.

* fix(cli): shape-safe voice config read + graceful super/win fallback

Round-11 Copilot review on #19835.

Two remaining cross-runtime gaps:

1. load_config().get('voice', {}) still assumed voice was a dict, so a hand-edited voice: true / voice: cmd+b at the top level raised AttributeError before the voice UI could start. Added voice_record_key_from_config(cfg) to hermes_cli/voice.py that isinstance-guards both the root and the voice subkey. All three cli.py read sites (_enable_voice_mode hint, _show_voice_status, PTT binding) now use it.

2. The CLI normalizer previously passed super+/win+/windows+ through unrewritten so prompt_toolkit would reject them loudly at startup — but that crash was a worse UX than a silent fallback. Normalizer now returns c-b for those spellings, and the PTT binding site logs a warning so users see why their TUI-only shortcut isn't binding in the CLI.

Coverage: 34/34 in tests/hermes_cli/test_voice_wrapper.py (5 new cases for voice_record_key_from_config + malformed-root + malformed-voice + extractor/normalizer composition).

* fix(cli): self-audit cleanup — remaining voice-config shape safety + doc drift

Self-review of the voice.record_key change set turned up four remaining items Copilot would very likely flag next round:

1. cli.py _voice_start_continuous still read load_config().get('voice', {}).get('silence_threshold') without an isinstance guard, so a hand-edited voice: true / voice: cmd+b (non-dict) raised AttributeError on VAD recording start. Shape-safe coerce the voice dict and numeric-guard silence_threshold/silence_duration.

2. cli.py _enable_voice_mode's auto_tts check had the same bug — fixed with the same isinstance guard.

3. hermes_cli/voice.py module comment on _VOICE_MOD_ALIASES still said super/win/windows 'pass through unchanged and prompt_toolkit's add() call loudly rejects them at startup'. Round 11 changed the normalizer to silently fall back to c-b with a warning at the binding site; updated the comment to match.

4. ui-tui/src/lib/platform.ts header comment had the same stale 'CLI will loudly reject them at startup' claim; updated to 'falls back to the documented default and logs a warning'.

No behavior change on the code paths already covered by test_voice_wrapper.py; the two cli.py fixes are defensive against malformed YAML that previous rounds already hardened in tui_gateway/server.py but missed in the classic CLI.

* fix(cli,tui): round-12 Copilot review — alt-collide on mac, bool-in-int guards, voice UI hardcodes, mtime-reload test

Five round-12 Copilot review items on #19835:

1. platform.ts: hermes-ink reports Alt as key.meta on many terminals; isActionMod on darwin accepts key.meta as the action modifier. So alt+c/d/l get claimed by isCopyShortcut / isAction('d')/'l') before the voice check. Reject those configs at parse time on macOS only (non-mac keeps them usable).

2. cli.py: four remaining hardcoded 'Ctrl+B' sites in voice-facing UI (_get_voice_status_fragments status bar, _voice_start_recording hints, _get_placeholder composer text) were still lying about non-default configs. Added self._voice_record_key_label() shared helper and wired it into all three sites.

3. server.py + cli.py: bool is a subclass of int, so isinstance(silence_threshold, (int, float)) accepted True/False from malformed YAML and forwarded 1/0 to the VAD engine. Exclude bool explicitly so boolean typos fall back to the documented 200 / 3.0 defaults.

4. useConfigSync.ts: extracted the config.get-full fetch+apply body into a shared hydrateFullConfig() helper. Both the initial hydration and mtime-reload paths now use it, so the polling/RPC wiring is exercised by direct unit tests (4 new cases: fresh apply, reapply on new value, transient RPC failure preserves cache, back-compat without voice setter).

5. Added alt+{c,d,l} rejection regressions on darwin + allow on linux, and bool-leak regressions for both silence_threshold and silence_duration in tests/test_tui_gateway_server.py.

Suite: 602/602 TUI vitest, 38/38 backend voice tests, typecheck + lints clean.

* fix(cli): cache voice record-key label at binding time + status-bar coverage

Round-13 Copilot review on #19835.

_voice_record_key_label() was reading live config on every render, which caused two problems:

1. prompt_toolkit registers the push-to-talk binding once at session start (@kb.add(_voice_key)); the binding does NOT re-read config. Editing voice.record_key mid-session would switch the status-bar / placeholder / recording-hint label to the new shortcut while the actual keybinding stayed on the startup chord — reintroducing the display/binding drift this whole PR is fighting.

2. Hot render path: during recording the UI is invalidated every 150ms, so re-loading + deep-merging config on every call added avoidable UI overhead.

Fix: cache the label at the same site that registers the prompt_toolkit binding via new set_voice_record_key_cache(raw_key). _voice_record_key_label() now just returns the cached value (falls back to 'Ctrl+B' before startup). Status/placeholder/hint are always in sync with the live binding; no config reload per render.

Also added 4 regression cases to tests/cli/test_cli_status_bar.py: configured ctrl+<letter> renders in both wide and compact status bars, configured named key (ctrl+space) renders in the recording hint, pre-startup absent cache falls back to Ctrl+B, and malformed configs (bool True) fall through the formatter to Ctrl+B.

Suite: 60/60 test_cli_status_bar + test_voice_wrapper, typecheck + lints clean.

* fix(cli): route /voice on + /voice status through startup-pinned label; mac alt+cdl parity

Round-14 Copilot review on #19835. All three comments legit:

1. _enable_voice_mode still formatted label from live load_config() — mid-session config edit would make /voice on announce the new shortcut while the prompt_toolkit binding stayed the startup chord. Use self._voice_record_key_label() (cached at binding time, round-13) so /voice on cannot drift from the live binding.

2. _show_voice_status had the same bug — /voice status reported live config instead of the pinned startup binding. Fixed the same way.

3. CLI normalizer accepted alt+c/alt+d/alt+l even though the TUI parser rejects them on macOS (Copilot round-12 — hermes-ink reports Alt as key.meta, isActionMod on darwin accepts it, collides with isCopyShortcut / isAction). Added _VOICE_RESERVED_ALT_CHARS_MAC = {c,d,l} gated to sys.platform == 'darwin' so a shared config like option+c falls back to c-b on both runtimes on macOS; non-mac still binds a-c.

Coverage: 4 new tests in test_voice_wrapper.py covering mac alt+cdl rejection, linux alt+cdl allowed, option/opt alias forms, and mac-specific exclusions for other alt letters. 62/62 in voice wrapper + status bar suites.

---------

Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
Co-authored-by: asheriif <ahmedsherif95@gmail.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-04 15:49:28 -07:00
briandevans
9fa3a093f2 fix(local): test root as ancestor candidate; use real pipe for fake stdout
Address Copilot review on PR #17569:

1. _resolve_safe_cwd never tested the filesystem root because the loop
   exited when `os.path.dirname(parent) == parent`, which is true once
   `parent == '/'`. Restructure so the root is checked before the
   self-equal exit. Adds `test_returns_root_when_only_root_exists` —
   regression-guarded by reverting the loop and watching it fail.

2. The fake `Popen.stdout` was a `MagicMock`; `BaseEnvironment._wait_for_process`
   calls `proc.stdout.fileno()` then `select.select`/`os.read` against it,
   which raised `TypeError: fileno() returned a non-integer` (visible as a
   thread exception in test output) and could in theory read from an
   unrelated real fd. Hand `fake_popen` a real `os.pipe()` with the write
   end pre-closed so the drain loop sees EOF immediately. Helper records
   each fd so the test cleans up after itself.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:31:47 -07:00
briandevans
9644b8ae67 fix(local): recover when persistent_shell cwd is deleted (#17558)
When a tool call deletes its own working directory (`cd /tmp/foo &&
rm -rf /tmp/foo`), the next `subprocess.Popen(args, cwd=self.cwd)` raised
`FileNotFoundError: [Errno 2]` before bash even started — every subsequent
terminal/file-tool call hit the same wedge until the gateway restarted.

Fix in `LocalEnvironment._run_bash`: before handing `self.cwd` to Popen,
resolve a safe alternative when the path is gone (walk up to the nearest
existing ancestor, falling back to `tempfile.gettempdir()` only as a last
resort). Log a warning so the recovery is visible — not silent — and
update `self.cwd` so the next call doesn't repeat the message.

Defense in depth in `LocalEnvironment._update_cwd`: only adopt the new
cwd when it still exists as a directory. `pwd -P` from a deleted cwd can
leave a stale value in the marker file; refusing to store a missing path
keeps `self.cwd` valid by construction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 15:31:47 -07:00
briandevans
81cd678291 fix(google-workspace): restore required_credential_files in SKILL.md (#16452)
PR #9931 ("feat(google-workspace): add --from flag for custom sender display name")
accidentally removed the required_credential_files frontmatter block that tells
hermes to bind-mount google_token.json and google_client_secret.json into Docker
and Modal remote terminals before running setup.py.

Without this header the credential files are never registered in the session-scoped
ContextVar, so get_credential_file_mounts() returns an empty list at container
creation time and the OAuth files are invisible inside the sandbox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:43:14 -07:00
briandevans
60b143e9df fix(tui_gateway): guard sys.path against local package shadowing (#15989)
When the TUI backend (tui_gateway/entry.py) is spawned by Node.js with the
user's CWD containing a local utils/ directory, that directory shadows the
installed utils module, causing ImportError in run_agent and hermes_cli.

Strip '' and '.' from sys.path and prepend HERMES_PYTHON_SRC_ROOT (already
set by hermes_cli before spawning the subprocess) so installed packages
always win over CWD artifacts.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-04 12:42:43 -07:00
briandevans
eadf34633e fix(models): strip :cloud/-cloud suffix from models.dev Ollama Cloud IDs
models.dev appends :cloud and -cloud suffixes to Ollama Cloud model IDs
(e.g. kimi-k2.6:cloud, qwen3-coder:480b-cloud) that the live Ollama Cloud
API does not use. Without normalisation, these suffixed IDs bypass the
dedup check and appear alongside the correct clean IDs, causing 400/404
errors when users select them in /model or hermes model.

Add _strip_ollama_cloud_suffix() and apply it to mdev entries before the
dedup merge in fetch_ollama_cloud_models() so all model IDs stored in the
disk cache use the canonical form the API accepts.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:38:15 -07:00
Yoimex
c050ee6573 fix(file_ops): resolve search_files path/line collision for hyphenated numeric filenames 2026-05-04 12:37:47 -07:00
Ricardo-M-L
fbc477df71 fix(run_agent): acquire lock in IterationBudget.used property
The `used` property was reading `self._used` without holding the lock,
while `consume()`, `refund()`, and `remaining` all properly acquire
`self._lock` before accessing `_used`. This means a concurrent call to
`used` during `consume()` or `refund()` could observe a partially-
updated value, leading to incorrect iteration budget metrics reported
to the gateway, or in extreme cases a ValueError from CPython's list
implementation when the internal array resizes during iteration.

Fix: acquire the lock in `used` just like `remaining` does.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-04 12:37:28 -07:00
ClawdIA
64ad7dec0d fix(file-ops): allow file search in hidden roots 2026-05-04 12:37:09 -07:00
briandevans
9e2628ee7c test(discord): annotate make_attachment content_type as Optional[str]
Copilot review: the helper accepted None in one test but was annotated str.
Matches actual usage where no-content-type attachments are a tested scenario.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:36:47 -07:00
Ioodu
1c7f47a58c fix(cron): add concurrency regression test for parallel job state writes
get_due_jobs() called load_jobs() and save_jobs() without holding
_jobs_file_lock, creating a race with the locked mark_job_run() and
advance_next_run(). Wrap get_due_jobs() with the lock (delegating to a
new _get_due_jobs_locked() inner function) so all load→modify→save
cycles are serialised. Add two regression tests: one verifying 3
concurrent mark_job_run() calls each land their correct last_status and
last_run_at without overwrites, and a stress test confirming 10 parallel
calls each increment their job's completed count to exactly 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:36:29 -07:00
lhysdl
6875471916 fix(tts): update MiniMax API endpoint to v1/text_to_speech
MiniMax deprecated the old v1/t2a_v2 endpoint (api.minimax.io) and
moved to v1/text_to_speech (api.minimax.chat). The new API:

- Uses a flat payload: {model, text, voice_id} instead of nested
  voice_setting / audio_setting objects
- Returns raw audio bytes (Content-Type: audio/mpeg) instead of
  JSON with hex-encoded audio
- Uses model 'speech-01' instead of 'speech-2.8-hd'
- Updated default voice_id to 'female-shaonv' for Chinese TTS

The implementation detects Content-Type to handle both old and new
API responses, maintaining backward compatibility for any users who
manually configured the legacy base_url.
2026-05-04 12:36:09 -07:00
briandevans
75bce317a3 fix(cron): expand \${VAR} refs in config.yaml during job execution (#15890)
The cron scheduler's run_job() loaded config.yaml with yaml.safe_load()
but never called _expand_env_vars(), so ${HERMES_MODEL} and similar
references in model:, fallback_providers:, and other config.yaml fields
were forwarded to the LLM API as literal strings, causing HTTP 400 errors.

The normal CLI path has always called _expand_env_vars() via load_config(),
so this was a cron-only gap. The .env load at the top of run_job() already
populates os.environ before config.yaml is read, so the expansion sees the
correct values.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:35:46 -07:00
Albert.Zhou
fd9c32c0f2 fix(email): drop non-allowlisted senders before dispatch to prevent mail loops
Add EMAIL_ALLOWED_USERS check in EmailAdapter._dispatch_message()
to silently discard emails from senders not in the allowlist.  This
prevents the adapter from creating thread context and dispatching a
MessageEvent for unauthorized senders, which could race with the
gateway authorization check and result in SMTP replies being sent
despite the handler returning None.

Test: tests/gateway/test_email.py::TestDispatchMessage::test_non_allowlisted_sender_dropped
Test: tests/gateway/test_email.py::TestDispatchMessage::test_allowlisted_sender_proceeds
Test: tests/gateway/test_email.py::TestDispatchMessage::test_empty_allowlist_allows_all
2026-05-04 12:35:22 -07:00
briandevans
20edca75e9 fix(update): sync bundled skills to all profiles, including active (#16176)
`hermes update` iterated only non-active profiles when seeding bundled
skills. `seed_profile_skills()` uses a subprocess with an explicit
HERMES_HOME so it correctly targets any profile path; the `p.name !=
active` filter was the only thing preventing the active profile from
being included, leaving it silently on stale skill content after every
update.

Drop the filter and update the header line from "other profiles" to
"all profiles". The active profile is now seeded on the same path as
every other profile. The earlier `sync_skills()` call (module-level
HERMES_HOME) remains for backward compatibility; the subprocess-based
loop is reliable regardless of which HERMES_HOME the CLI was invoked
with.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 12:34:53 -07:00
jjjojoj
103f51ad34 fix(doctor): check gh auth status when GITHUB_TOKEN absent
hermes doctor showed 'No GITHUB_TOKEN (60 req/hr)' warning even when
users had authenticated via gh auth login. Now falls back to
gh auth status --json authenticated when GITHUB_TOKEN and GH_TOKEN
are both unset.

Fixes #16115
2026-05-04 12:34:31 -07:00
fiver
8ab9f61dcf fix(gateway): preserve WSL interop PATH in systemd units 2026-05-04 12:34:06 -07:00
Teknium
d90f73bcec
fix(gateway): use git HEAD SHA, not file mtimes, for stale-code check (#19740)
The stale-code self-check (Issue #17648) used sentinel-file mtimes to
decide whether the gateway survived a `hermes update` with stale
`sys.modules`. That signal false-positives on any write to the
sentinel files — including agent-driven edits during Hermes-on-Hermes
dev sessions. Telling the agent to patch `run_agent.py` would flip
the check to True on the next user message and force a gateway
restart even though no update happened.

Switch the signal to `git rev-parse HEAD`. Agent file edits don't
move HEAD; `hermes update` (git pull) always does. Reading .git/HEAD
directly (no subprocess) with a 5s cache keeps the overhead negligible
on bursty chats. Non-git installs short-circuit to False — the
stale-modules class can't occur without a git-backed update path, so
there's nothing to detect.

The legacy `_compute_repo_mtime` helper is kept but unused by
detection, reserved as a fallback hook for future pip-install update
paths.

- _read_git_head_sha(): resolves HEAD across main checkout, worktree
  (follows `gitdir:` + `commondir` pointers), and packed-refs layouts.
- _current_git_sha_cached(): per-runner 5s SHA cache.
- _detect_stale_code(): boot SHA vs current SHA, returns False when
  either is unavailable.
- Tests cover all four layouts, the agent-edits-don't-trigger
  regression, and cache behavior.

Refs #17648.
2026-05-04 12:33:21 -07:00
Teknium
1c7c7c3c5f
feat(kanban-dashboard): per-platform home-channel notification toggles (#19864)
* revert: auto-subscribe gateway chat on tool-driven kanban_create (#19718)

Reverts ff3d2773e2. Teknium reviewed the merged PR and decided this
behavior isn't wanted — tool-driven kanban_create should not mirror
the slash-command path's auto-subscribe. Orchestrators that want
their originating chat notified can call kanban_notify-subscribe
explicitly; we're not going to make it implicit.

* feat(kanban-dashboard): per-platform home-channel notification toggles

Adds a "Notify home channels" section to the task drawer in the kanban
dashboard plugin. Each platform where the user has set a home channel
(/sethome, TELEGRAM_HOME_CHANNEL env var, gateway.platforms.<p>.home_channel
in config.yaml) gets a toggle pill. Toggling on writes a kanban_notify_subs
row keyed to that platform's home (chat_id + thread_id); toggling off
removes it. The existing gateway notifier watcher delivers completed /
blocked / gave_up events without any new plumbing — this is purely a GUI
surface over existing machinery.

Replaces the reverted auto-subscribe behavior from #19718 with an explicit,
per-task, per-platform, user-controlled opt-in. No implicit subscription
on tool-driven kanban_create; no CLI commands; no slash commands. Just a
toggle in the drawer.

Backend (plugins/kanban/dashboard/plugin_api.py):
- GET  /api/plugins/kanban/home-channels[?task_id=X]
  Returns every platform with a configured home, plus a per-entry
  subscribed: bool relative to task_id (false when task_id omitted).
  Reads the live GatewayConfig via load_gateway_config() so env-var
  overlays stay honored.
- POST /api/plugins/kanban/tasks/:id/home-subscribe/:platform
  Idempotent add_notify_sub keyed to the platform's home.
- DELETE /api/plugins/kanban/tasks/:id/home-subscribe/:platform
  remove_notify_sub for the same tuple.
- 404 when the platform has no home configured, or task_id doesn't
  exist (POST only).

Frontend (plugins/kanban/dashboard/dist/index.js):
- TaskDrawer fetches /home-channels on open, keyed on task_id.
- HomeSubsSection renders nothing when zero platforms have a home (so
  users who haven't set one up don't see an empty UI block).
- Optimistic toggle with busy flag + revert-on-failure. One pill per
  platform; ✓ prefix and --on class indicate the subscribed state.

CSS (plugins/kanban/dashboard/dist/style.css):
- .hermes-kanban-home-subs flex row + .hermes-kanban-home-sub pill
  style + --on subscribed variant (subtle ring-colored background).

Live-tested against a dashboard with TELEGRAM + DISCORD_BOT_TOKEN /
HOME_CHANNEL env vars set: drawer shows both pills, toggling each
flips its visual state AND writes/removes the correct kanban_notify_subs
row (verified via direct DB read).

Tests (tests/plugins/test_kanban_dashboard_plugin.py, 11 new, 53/53
pass total):
- home-channels lists only platforms with a home (slack with a
  token but no home is excluded)
- no task_id -> all subscribed=false
- subscribe creates notify_sub row with correct chat/thread/platform
- subscribed=true reflected in subsequent GET
- idempotent re-subscribe
- unknown platform -> 404
- unknown task -> 404
- unsubscribe removes the row
- telegram + discord subscribe/unsubscribe independent
- zero homes -> empty list
2026-05-04 12:31:21 -07:00
emozilla
2bc82bb504 clarify placeholder telegram credential in tests 2026-05-04 15:31:15 -04:00
Teknium
3db6b9cc87
feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern) (#19709)
* feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern)

Adds a no_agent=True option to the cronjob system. When enabled, the
scheduler runs the attached script on schedule and delivers its stdout
directly to the job's target — no LLM, no agent loop, no token spend.
This is the classic bash-watchdog pattern (memory alert every 5 min,
disk alert every 15 min, CI ping) reimplemented as a first-class Hermes
primitive instead of a systemd timer + curl + bot token triplet living
outside the system.

## What

  hermes cron create "every 5m" \
    --no-agent \
    --script memory-watchdog.sh \
    --deliver telegram \
    --name memory-watchdog

Agent tool:

  cronjob(action='create',
          schedule='every 5m',
          script='memory-watchdog.sh',
          no_agent=True,
          deliver='telegram')

Semantics:
- Script stdout (trimmed) → delivered verbatim as the message
- Empty stdout          → silent tick (no delivery; watchdog pattern)
- wakeAgent=false gate  → silent tick (same gate LLM jobs use)
- Non-zero exit/timeout → delivered as an error alert
                          (broken watchdogs shouldn't fail silently)
- No LLM ever invoked; no tokens spent; no provider fallback applied

## Implementation

cron/jobs.py
  * create_job gains no_agent: bool = False
  * prompt becomes Optional (no_agent jobs don't need one)
  * Validation: no_agent=True requires a script at create time
  * Field roundtrips via load_jobs / save_jobs / update_job

cron/scheduler.py
  * run_job: new short-circuit branch at the top that runs the script,
    wraps its output into the (success, doc, final_response, error)
    tuple downstream delivery already expects, and returns before any
    AIAgent import or construction
  * _run_job_script: picks interpreter by extension — .sh/.bash run
    under /bin/bash, anything else under sys.executable (Python).
    Shell support unlocks the bash-watchdog pattern without wrapping
    scripts in Python. Extension is explicit; we deliberately do NOT
    trust the file's own shebang. Path-containment guard (scripts dir)
    unchanged.

tools/cronjob_tools.py
  * Schema: new no_agent boolean property with clear trigger guidance
  * cronjob() accepts no_agent and validates mode-specific shape:
    - no_agent=True requires script; prompt/skills optional
    - no_agent=False keeps the existing 'prompt or skill required' rule
  * update path rejects flipping no_agent=True on a job without a script
  * _format_job surfaces no_agent in list output
  * Handler lambda forwards no_agent from tool args

hermes_cli/main.py, hermes_cli/cron.py
  * 'hermes cron create --no-agent' and edit's --no-agent / --agent
    pair for toggling at CLI parity with the agent tool
  * Existing --script help text updated to describe both modes
  * List / create / edit output now shows 'Mode: no-agent (...)' when set

## Tests

tests/cron/test_cron_no_agent.py — 18 tests covering:
  * create_job: no_agent shape, validation, field persistence
  * update_job: flag roundtrip across reload
  * cronjob tool: schema validation, update toggling, mode-specific
    requirements, prompt-relaxation rule
  * run_job short-circuit:
    - success path delivers stdout verbatim
    - empty stdout → SILENT_MARKER (no delivery downstream)
    - wakeAgent=false gate → silent
    - script failure → error alert
    - run_job does NOT import AIAgent (verified via mock)
  * _run_job_script:
    - .sh executes via bash (no shebang required)
    - .bash executes via bash
    - .py still runs via sys.executable (regression)
    - path-traversal still blocked (security regression)

All 18 new tests pass. 341/342 pre-existing cron tests still pass; the
one failure (test_script_empty_output_noted) was already broken on main
and is unrelated to this change.

## Docs

website/docs/guides/cron-script-only.md — new dedicated guide covering
the watchdog pattern, interpreter rules, delivery mapping, worked
examples (memory / disk alerts), and the comparison table vs hermes send,
regular LLM cron jobs, and OS-level cron.

website/docs/user-guide/features/cron.md — new 'No-agent mode' section
in the cron feature reference, cross-linked to the guide.

website/docs/guides/automate-with-cron.md — new tip box pointing users
to no-agent mode when they don't need LLM reasoning.

## Compatibility

- Existing jobs: unchanged. no_agent defaults to False, existing code
  paths untouched until the flag is set.
- Schema additive only; older jobs.json without the field load fine
  via .get() with False default.
- New CLI flags are opt-in and don't alter existing flag behavior.

* fix(cron): lazy-import AIAgent + SessionDB so no_agent ticks pay zero

The unconditional `from run_agent import AIAgent` + SessionDB() init at
the top of run_job() meant every no_agent tick still paid the full agent
module load cost (~300ms + transitive imports + DB open) even though it
never touched any of that machinery.

Move both to live under the default (LLM) path, after the no_agent
short-circuit has returned. Now a no_agent tick's sys.modules stays
clean — verified end-to-end:

    assert 'run_agent' not in sys.modules  # before
    run_job(no_agent_job)
    assert 'run_agent' not in sys.modules  # after

The existing mock-based unit test (test_run_job_no_agent_never_invokes_aiagent)
kept passing because patch() replaces the class AFTER import; the leak
was only visible via real subprocess-style verification. End-to-end
demo confirmed: agent calls cronjob(no_agent=True) → script runs →
stdout delivered → no LLM machinery loaded.

* docs(cron): tighten no_agent tool schema — defaults, silent semantics, pick rule

Previous description buried the important bits in one long sentence.
Agents could plausibly miss three things an LLM-facing schema should
make unmissable:

1. What the default is — now first sentence + JSON Schema `default: false`
2. What 'silent run' actually means for the user — now spelled out:
   'nothing is sent to the user and they won't see anything happened'
3. When to pick True vs False — now a concrete decision rule with
   examples on both sides (watchdogs/metrics/pollers → True;
   summarize/draft/pick/rephrase → False)

Also adds explicit 'prompt and skills are ignored when True' since the
agent could otherwise still pass them out of habit.

No behavior change — schema text only.
2026-05-04 12:31:01 -07:00
teknium1
d35efb9898 feat(telegram): /topic off + help + auth gate + screenshot debounce
Four production-readiness additions to topic mode:

1. /topic off — clean disable path. Flips telegram_dm_topic_mode.enabled
   to 0 and clears telegram_dm_topic_bindings for this chat. Previously
   users had to edit state.db with sqlite3 to turn the feature off.
   Idempotent: calling /topic off when the chat was never enabled
   returns a friendly no-op message.

2. /topic help — inline usage printed in the DM so users don't have to
   visit docs to discover /topic off, /topic <session-id>, etc.

3. Authorization gate. /topic mutates SQLite side tables and flips the
   root DM into a lobby, so the action must be authorized. Now calls
   self._is_user_authorized(source); unauthorized DMs get a refusal
   instead of activation. Defense in depth on top of the gateway's
   existing pre-route auth.

4. BotFather screenshot debounce. A user repeatedly running /topic
   while Threads Settings is still disabled would previously re-upload
   the same screenshot every time. Now rate-limited to one send per
   5 minutes per chat. /topic off resets the counter so re-enabling
   starts fresh.

Command-def args hint updated: /topic [off|help|session-id].

Docs:
- New /topic subcommands table at the top of the multi-session section
- Disable instructions updated to recommend /topic off first, with the
  raw SQL fallback kept for bulk cleanup
- Under-the-hood list extended with the capability-hint debounce and
  the authorization gate

Tests (6 new):
- /topic help returns usage and doesn't create topic tables
- /topic off disables mode AND clears bindings
- /topic off is idempotent when never enabled
- Unauthorized users get refusal, no tables created
- Capability-hint debounce is per-chat
- /topic off resets both lobby and capability debounce counters

All 402 targeted tests pass. Full gateway sweep: 4809/4810
(pre-existing test_teams::test_send_typing unrelated).
2026-05-04 12:07:17 -07:00
teknium1
1381c89e56 fix(telegram): polish topic mode — CASCADE, General-topic handling, rename guard, debounce
Five follow-ups to topic mode based on integration audit:

1. ON DELETE CASCADE on telegram_dm_topic_bindings.session_id. Session
   pruning (manual /delete, auto-cleanup, any future prune job) would
   have thrown 'FOREIGN KEY constraint failed' for sessions bound to a
   topic. Migration bumped to v2, rebuilds the bindings table in place
   if FK lacks CASCADE. Idempotent; only runs once per DB.

2. Never auto-rename operator-declared topics. If an operator has
   extra.dm_topics configured AND a user runs /topic, messages in those
   pre-declared topics would previously trigger auto-rename and silently
   mutate operator config. _rename_telegram_topic_for_session_title now
   early-returns when _get_dm_topic_info returns a dict for this
   (chat_id, thread_id). Uses class-based lookup (not hasattr) so
   MagicMock test fixtures don't accidentally trip the guard.

3. General topic handling. Telegram's General (pinned top) topic in a
   forum-enabled private chat may send messages with message_thread_id=1
   or omit thread_id entirely depending on client. Both are now treated
   as the root lobby, not a topic lane. Prevents users from
   accidentally burning a session on the General topic.

4. Debounce the root-lobby reminder. 30-second cooldown per chat so a
   user who forgets topic mode is enabled and types ten messages in the
   root gets one reminder, not ten. Explicit command replies
   (/new-in-lobby, /topic <session-id>) still land every time.

5. Docs: added under-the-hood invariants for the above, plus a
   Downgrade section explaining that rolling back to a pre-/topic
   Hermes build leaves the DB tables orphaned but harmless — DMs just
   revert to native per-thread isolation.

Tests:
- test_operator_declared_topic_is_not_auto_renamed
- test_general_topic_is_treated_as_root_lobby
- test_lobby_reminder_is_debounced_per_chat
- test_binding_survives_session_deletion_via_cascade
- test_migration_rebuilds_v1_binding_table_with_cascade_fk

Validated: 4803/4804 tests pass (tests/gateway/ + tests/test_hermes_state.py).
Sole failure is a pre-existing test_teams::test_send_typing flake
unrelated to this PR.
2026-05-04 12:07:17 -07:00
teknium1
a7683d04a9 fix(telegram): harden DM topic binding — persist through switch_session, rebind on /new
Follow-up on @EmelyanenkoK's feat: add Telegram DM topic-mode sessions.

Three issues:

1. Split-brain session state. After get_or_create_session() returned a
   SessionEntry for a topic lane, the handler was mutating
   .session_id in place to the binding's target, but never persisting
   the switch through SessionStore. The sessions.json session_key →
   session_id map kept pointing at the lane's natural id; any reader
   that reloaded from disk saw the wrong id. Fixed by routing through
   SessionStore.switch_session(), which _save()s the mapping and ends
   the old session in SQLite like /resume does.

2. /new inside a topic was a one-message no-op. Reset created a new
   session but left the telegram_dm_topic_bindings row pointing at the
   old session_id, so the next message's binding lookup switched right
   back. Now _handle_reset_command rebinds the topic to the new
   session_id after reset.

3. is_telegram_session_linked_to_topic and
   list_unlinked_telegram_sessions_for_user both called
   apply_telegram_topic_migration() on read, contradicting the PR's
   own invariant that migration only runs on explicit /topic opt-in.
   They now tolerate missing topic tables and return empty/False.

Also: _telegram_topic_mode_enabled() now only treats True as enabled
(not any truthy return), so test fixtures with MagicMock session_db
don't accidentally flip every DM into lobby mode — this was breaking
4 pre-existing test_status_command tests.

Tests:
- New regression: /new inside a topic must update the binding row
  (test_new_inside_telegram_topic_rewrites_binding_to_new_session).
- _make_runner now stubs switch_session so existing restore tests
  still exercise the new code path.

Validated end-to-end with real SessionDB + SessionStore:
readers on fresh DB don't create topic tables; enable creates them;
binding override persists across SessionStore restart; /new rebinds
and the new id survives a restart.

Co-authored-by: EmelyanenkoK <emelyanenko.kirill@gmail.com>
2026-05-04 12:07:17 -07:00
EmelyanenkoK
25065283b3 fix: improve telegram topic mode setup 2026-05-04 12:07:17 -07:00
EmelyanenkoK
d6615d8ec7 feat: add Telegram DM topic-mode sessions 2026-05-04 12:07:17 -07:00
brooklyn!
d9c090fe36
Merge pull request #19338 from asheriif/fix/tui-plugin-slash-exec-live
fix(tui): run plugin slash commands live
2026-05-04 09:57:45 -07:00
kshitijk4poor
54e78cadb2 test: add regression test for Teams interactive_setup import fix
Adapted from PR #19188 by @LeonSGP43 — mocks cli_output helpers and
verifies interactive_setup persists credentials to .env without
crashing. Also adds megastary to AUTHOR_MAP.
2026-05-04 06:54:27 -07:00
bobashopcashier
d89e7a3cd4 fix(anthropic): restrict fast mode to Opus 4.6 (Anthropic API contract)
Per https://platform.claude.com/docs/en/build-with-claude/fast-mode:
"Fast mode is currently supported on Opus 4.6 only. Sending speed: fast
with an unsupported model returns an error."

Pre-fix, _is_anthropic_fast_model() returned True for any claude-* model,
so /fast on Opus 4.7 (or Sonnet/Haiku) would persist agent.service_tier=fast
in config.yaml and the adapter would inject extra_body["speed"] = "fast"
on every subsequent request. Opus 4.7 returns:

  HTTP 400: 'claude-opus-4-7' does not support the `speed` parameter.

This wedged sessions across model upgrades (a user who ran /fast on Opus 4.6
and later switched the default model to 4.7 hit a hard 400 on every turn
until they manually edited config.yaml).

Changes:
- _is_anthropic_fast_model: gate on "opus-4-6" / "opus-4.6" only
- anthropic_adapter: add _supports_fast_mode predicate as defensive guard
  so stale request_overrides on an unsupported model are dropped silently
  instead of 400'ing
- Tests: flip the assertions that mirrored the bug (Sonnet/Haiku/Opus 4.7
  asserting fast-mode support) to match the documented API contract
2026-05-04 06:23:52 -07:00
briandevans
ce22301dc6 test(sms): use clear=True in test_missing_phone_number_is_non_retryable
Prevents pre-existing TWILIO_PHONE_NUMBER or SMS_WEBHOOK_URL values in
the outer test environment from leaking into the assertion context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:25:09 -07:00
0668001438
83080772f2 fix(delegation): honor provider override for subagents
Clear inherited provider preference filters when delegation.provider is set so delegated children do not route back to the parent provider. Add a regression test for cross-provider delegation with parent OpenRouter filters.

Closes #10653
2026-05-04 05:22:35 -07:00
Pratik Rai
7a8ee8b29d fix(gateway): deduplicate Weixin messages by content fingerprint 2026-05-04 05:20:13 -07:00
briandevans
0b5fd40a01 fix(delegate): correct _spawn_child → _build_child_agent in comments
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:18:45 -07:00
VinVC
5d6431c114 fix(doctor): resolve merge conflicts, add kimi-coding-cn test
- Rebased on upstream/main to resolve conflicts
- Added test_run_doctor_accepts_kimi_coding_cn_provider test
- All 30 tests pass
2026-05-04 05:12:42 -07:00
阿泥豆
0e9416036a test: add unit tests for heartbeat stale threshold increase 2026-05-04 05:08:51 -07:00
briandevans
6b4ccb9b14 fix(session-search): report source from resolved parent, not FTS5 child session (#15909)
When a delegation child session (e.g. source='telegram') contains the
FTS5 hit but _resolve_to_parent() maps it to a different root session
(source='api_server'), the result entry was still reporting the child's
source because the loop discarded session_meta as `_` and fell back to
match_info.get('source'), which carries the child session's value.

Use the resolved parent's session_meta for source, model, and started_at
with match_info as a fallback, so the output accurately reflects the
session the user actually interacted with.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:07:40 -07:00
briandevans
b46b0c9888 fix(backup): floor pre-update backup_keep to 1 so the new backup survives
`updates.backup_keep: 0` (or any negative value) wiped the freshly-
created pre-update zip:

  _prune_pre_update_backups(backup_dir, keep=0):
      backups = sorted(..., reverse=True)   # newest first, includes
                                            # the zip we just wrote
      for p in backups[0:]:                 # = all of them
          p.unlink()

The wrapper in `main.py` then printed `Saved: <path>` for a file that
no longer existed (the size lookup is wrapped in `try/except OSError`
which silently degrades to "0 B"), leaving operators believing they had
a recovery point when they had none.

This is a real footgun because some config systems treat 0 as "keep
unlimited"; here it does the opposite — every backup is destroyed
right after creation.

Fix: clamp `keep` to a minimum of 1 inside `_prune_pre_update_backups`
since that helper is only invoked immediately after a fresh backup
is written.  Operators who genuinely want no backups should set
`updates.pre_update_backup: false` (which gates creation entirely)
rather than relying on `backup_keep: 0`.

Also extends the `backup_keep` config docstring to spell out the floor
and point at `pre_update_backup: false` as the off-switch.

## Tests

Three regression tests added in `TestPreUpdateBackup`:

  - `test_keep_zero_does_not_delete_freshly_created_backup` —
    asserts the file persists after `keep=0`
  - `test_keep_negative_does_not_delete_freshly_created_backup` —
    same for negative values
  - `test_keep_zero_still_prunes_older_backups` — proves the floor
    only protects the new backup; older ones are still rotated out

Verified the new tests fail on origin/main (without the floor) and
pass with it; full `tests/hermes_cli/test_backup.py` suite green
(84 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:07:13 -07:00
Sanhu Li
ef8c213e88 fix(model-switch): soft-accept unlisted openai-codex models 2026-05-04 05:06:53 -07:00
0xsir0000
52882dade6 fix(agent): include name field on every role:tool message for Gemini compatibility (#16478)
Gemini's OpenAI-compatibility endpoint strictly requires the `name` field
on `role: tool` messages — it returns HTTP 400 ("Request contains an
invalid argument") when the function name is missing. OpenAI/Anthropic/
ollama tolerate the absence, so the gap stays invisible until the
conversation accumulates a tool turn and the user routes it through Gemini
(direct API or via ollama-cloud proxy).

Fix: add a `_get_tool_call_name_static()` helper alongside the existing
`_get_tool_call_id_static()`, and populate `name` at every site that
constructs a `role: tool` message — the pre-call sanitizer stub, the
tool-call args repair marker, both interrupt-skip paths, both
result-append paths (parallel + sequential), the invalid-tool-name
recovery, the invalid-JSON-args recovery, and the exception fallback.

Each call site was already in scope of the function name (`function_name`,
`skipped_name`, `name`, or a dict tool_call), so the change is local —
no new lookups, no behavior change for providers that already worked.

Fixes #16478
2026-05-04 05:06:33 -07:00
OpenClaw Bot
0443484115 fix(qqbot): honor proxy env vars for websocket 2026-05-04 05:06:09 -07:00
陈运波0668001438
6cf7a9e330 fix(vision): preserve explicit provider auth with custom base_url
Keep the configured vision provider when base_url is overridden so credential-pool lookup still resolves provider-specific API keys (e.g. ZAI_API_KEY), and add a regression test for this path.
2026-05-04 05:05:43 -07:00
swithek
b7bbc62503 fix(compressor): _prune_old_tool_results boundary direction 2026-05-04 05:05:18 -07:00
Dejie Guo
d29f90e89d fix(error_classifier): avoid large-context false overflow heuristics
Generic 400 and server-disconnect heuristics used absolute token/message-count fallbacks that are too aggressive for 1M context sessions. Gate those absolute fallbacks to smaller context windows while preserving relative pressure checks.

Fixes #16351
2026-05-04 05:04:56 -07:00
giwaov
026a5e47df fix(cli): preserve Windows hidden-dir paths in markdown 2026-05-04 05:04:36 -07:00
Teknium
3fb35520c6
revert: auto-subscribe gateway chat on tool-driven kanban_create (#19718) (#19721)
Reverts ff3d2773e2. Teknium reviewed the merged PR and decided this
behavior isn't wanted — tool-driven kanban_create should not mirror
the slash-command path's auto-subscribe. Orchestrators that want
their originating chat notified can call kanban_notify-subscribe
explicitly; we're not going to make it implicit.
2026-05-04 05:04:01 -07:00
Teknium
ff3d2773e2
feat(kanban): auto-subscribe gateway chat on tool-driven kanban_create (#19718)
Closes #19479.

When an orchestrator agent calls kanban_create from a gateway session
(e.g. a Telegram user delegating to an orchestrator profile), auto-
subscribe the originating (platform, chat, thread, user) to the new
task's terminal events. Mirrors the behavior of the /kanban create
slash command in gateway/run.py so tool-driven creation is at parity
with human-driven creation.

Without this, a user who interacts with an orchestrator exclusively
via the gateway never receives blocked / completed / gave_up
notifications for tasks the orchestrator created on their behalf —
silently breaking the gateway-first multi-agent flow the reporter
describes.

Reads the context-local HERMES_SESSION_* vars via get_session_env()
(not os.environ — those are contextvars for asyncio concurrency
safety). Falls through cleanly in CLI / cron contexts with no
session active (subscribed=False in the response). Best-effort: if
the gateway module isn't importable (test rigs stubbing gateway.*),
the task still creates, we just skip the subscription.

Response gains a 'subscribed' bool so the orchestrator knows whether
terminal events will land back in the originating chat or whether it
needs to poll / unblock manually.

Tests: 4 new in tests/tools/test_kanban_tools.py covering
CLI/no-subscribe, telegram/gateway-auto-subscribe, discord-DM/no-
thread subscribe, and partial-ctx/no-chat_id no-subscribe. 40/40
kanban tool tests pass.
2026-05-04 05:02:23 -07:00
Nikolay Gusev
fdf9343c51 fix(tools): wrap bare scalars in single-element list for array-typed args
Open-weight models (DeepSeek, Qwen, GLM) sometimes emit tool calls like
`{"urls": "https://a.com"}` when the tool schema declares
`type: array`.  The call was JSON-valid but semantically wrong, and
`coerce_tool_args` would pass the bare string through — the tool then
failed with a confusing type error.

`coerce_tool_args` now wraps non-list, non-null values in a
single-element list when the schema declares `array`.  Strings still go
through `_coerce_value` first so JSON-encoded arrays
(`'["a","b"]'`) parse correctly and nullable `"null"` still
becomes `None`.  `None` itself is preserved — tools with sensible
defaults already handle it, and we don't want to silently mask a
deliberate null.

Salvaged from #19652 (NikolayGusev-astra) — the broader validate-then-
repair layer had several issues (duplicated existing coercion,
mis-classified `old_string` as a path field, prepended non-JSON
prefixes to tool results that break downstream JSON parsing, hardcoded
offset/limit defaults unsuitable for non-read_file tools).  The one
genuinely new capability is wrapping bare scalars, which is implemented
here directly inside the existing coercion path.

Co-authored-by: Nikolay Gusev <ngusev@astralinux.ru>
2026-05-04 05:00:37 -07:00
Teknium
a175f39577
feat(nous): persist Nous OAuth across profiles via shared token store (#19712)
Mirrors the Codex auto-import UX. On successful Nous login (either
`hermes auth add nous --type oauth` or `hermes login nous`), tokens are
mirrored to `$HERMES_SHARED_AUTH_DIR/nous_auth.json` (default
`~/.hermes/shared/nous_auth.json`, outside any named profile's
HERMES_HOME). On next login in a new profile, the flow offers to import
those credentials ("Import these credentials? [Y/n]") and rehydrates via
a forced refresh+mint instead of running the full device-code flow.

Runtime refresh in any profile syncs the rotated refresh_token back to
the shared store so sibling profiles don't hit stale-token fallback
after rotation.

The volatile 24h agent_key is NOT persisted to the shared store —
only the long-lived OAuth tokens are cross-profile useful.

- `HERMES_SHARED_AUTH_DIR` env var for tests + custom layouts
- Pytest seat belt mirrors the existing `_auth_file_path` guard so
  forgetting to redirect the store in a test fails loudly
- File mode 0600 where platform supports it
- Runtime credential resolution is unchanged — shared store is only
  consulted during the login flow, so profile isolation at runtime is
  preserved
- Stale refresh_token + portal-down cases gracefully fall back to
  device-code

Addresses a user report from Mike Nguyen: running
`hermes --profile <name> auth add nous --type oauth` for every new
profile is unnecessary friction now that Codex has a shared-import
flow via `~/.codex/auth.json`.
2026-05-04 04:54:55 -07:00
Teknium
d3b22b76d8
fix(kanban): enforce worker task-ownership on destructive tool calls (#19713)
Closes #19534 (security).

A worker spawned by the kanban dispatcher has HERMES_KANBAN_TASK set
to its own task id. The destructive tools (kanban_complete,
kanban_block, kanban_heartbeat) resolved task_id via
_default_task_id() which preferred an explicit arg over the env var,
with no ownership check — so a buggy or prompt-injected worker could
complete / block / heartbeat any OTHER task (sibling, cross-tenant,
anything) by supplying its id. Reporter's repro: worker for t_A
passed task_id=t_B to kanban_complete and got {"ok": true}.

Fix: add _enforce_worker_task_ownership(tid). If HERMES_KANBAN_TASK
is set and tid doesn't match, return a structured tool error with
guidance to use kanban_comment (for information handoff across tasks)
or kanban_create (for follow-up work). Orchestrator profiles (no env
var, but kanban toolset enabled per #18968) are exempt — their job
is routing and sometimes includes closing out child tasks.

Kept unrestricted (deliberately):
- kanban_show — workers legitimately read parent/sibling handoff context
- kanban_comment — cross-task comments are the handoff mechanism
- kanban_create — orchestrator fan-out, worker follow-up spawning
- kanban_link — parent/child linking

Tests: 5 new regression tests in tests/tools/test_kanban_tools.py
covering the grid (worker-attacks-foreign ×3 tools, worker-own-task
preserved, orchestrator-unrestricted). 36/36 pass.
2026-05-04 04:54:02 -07:00
h0tp-ftw
8c8f95bc8e fix(gateway): show friendly error when service is not installed
Instead of an unhelpful CalledProcessError traceback when running
`hermes gateway start/stop/restart` without first installing the service,
check for the unit file and exit with an actionable install hint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 04:49:51 -07:00
Teknium
a8b689f0c2 test(kanban): regression for status=running rejection at dashboard PATCH
Reporter of #19535 explicitly asked for a regression test — covers it
here so a future refactor of _set_status_direct can't silently re-enable
the direct ready/todo -> running bypass.

Asserts both: (a) HTTP 400 with 'running' in the detail message, and
(b) the task's status is unchanged after the rejected PATCH (pre-request
status preserved, no partial mutation).
2026-05-04 04:46:47 -07:00
vominh1919
652f8e6f3e fix(test): correct _coerce_number inf/nan test assertions
The test 'test_inf_stays_string_for_integer_only' incorrectly asserted
that _coerce_number('inf') returns float('inf'), but the function
correctly returns the original string 'inf' because infinity is not
JSON-serializable.

Fixed the assertion to expect the string 'inf', and added two new tests
for negative infinity and NaN edge cases to improve coverage of the
non-JSON-serializable number guard in _coerce_number().
2026-05-04 04:45:55 -07:00
Yoimex
edf9c75621 fix(env): pass -- to cd for hyphen-prefixed workdirs 2026-05-04 04:45:03 -07:00
Teknium
ae40fca955 fix(profiles): keep validate_profile_name strict; callers normalize first
Follow-up to @changchun989's cherry-pick: reverts the validate-via-
normalize change so validate_profile_name remains a strict regex check
on the input AS-GIVEN. Callers that accept mixed-case user input
(dashboard UI, CLI args, import flows) call normalize_profile_name()
first, then validate the result. This keeps validate honest about
what the on-disk directory name must look like — e.g. '  jules '
(trailing whitespace) is now rejected instead of silently trimmed
and accepted.

- validate_profile_name: strict lowercase/regex check again, 'UPPER'
  back in the invalid-names parametrize
- 8 call sites in profiles.py (create_profile, delete_profile,
  set_active_profile, export_profile, import_profile, rename_profile,
  resolve_profile_env, plus the clone_from branch): swap the
  normalize-then-validate order
- scripts/release.py: add changchun989@proton.me -> changchun989 to
  AUTHOR_MAP so CI doesn't block on the unmapped contributor email

All kanban + profile tests pass (268 across test_profiles.py +
test_kanban_db.py + test_kanban_core_functionality.py, plus 73 in
test_kanban_tools.py + test_kanban_dashboard_plugin.py).

Closes #18498.
2026-05-04 04:44:37 -07:00
changchun989
a31477dabb fix(profiles): normalize profile IDs for Kanban assignees and lookups
- Add normalize_profile_name() for lowercase canonical IDs and Default alias
- Use canonical names in create/delete/rename/export/import/set_active paths
- Canonicalize Kanban assignee on create/assign, list filter, and worker spawn
- Tests for mixed-case assignees and profile resolution (fixes #18498)
2026-05-04 04:44:37 -07:00
Yuyang Xu
60c4bc96fd fix(security): restore .env/auth.json/state.db with 0600 perms
`hermes import` was creating secret files with the process umask
(typically 0644) instead of 0600. zipfile.open() does not honor the
Unix mode bits stored in zip member external_attr; the restore loop
used open(target, "wb") which always falls back to umask.

Threat: silent privilege downgrade after a routine restore on
multi-user systems (shared dev boxes, CI runners, jump hosts) — any
local user could read API keys and OAuth tokens from ~/.hermes/.

Fix mirrors the convention already used at file creation
(hermes_cli/auth.py: stat.S_IRUSR | stat.S_IWUSR for auth.json).
The quick-snapshot restore path (restore_quick_snapshot) is
unaffected — it uses shutil.copy2 which preserves perms via
copystat().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 04:43:53 -07:00
Cameron Aragon
239ea1bdea fix(image-gen): preserve xAI API error status 2026-05-04 04:43:07 -07:00
Teknium
5ec6baa400
feat(kanban): multi-project boards — one install, many kanbans (#19653)
Adds first-class board support to kanban so users can separate unrelated
streams of work (projects, repos, domains) into isolated queues. Single-
project users stay on the 'default' board and see no UI change.

Isolation model
---------------
- Each board is a directory at `~/.hermes/kanban/boards/<slug>/` with
  its own `kanban.db`, `workspaces/`, and `logs/`. The 'default' board
  keeps its legacy path (`~/.hermes/kanban.db`) for back-compat — fresh
  installs and pre-boards users get zero migration.
- Workers spawned by the dispatcher have `HERMES_KANBAN_BOARD` pinned in
  their env alongside the existing `HERMES_KANBAN_DB` /
  `HERMES_KANBAN_WORKSPACES_ROOT` pins, so workers physically cannot see
  other boards' tasks.
- The gateway's single dispatcher loop now sweeps every board per tick;
  per-tick cost is a few extra filesystem stats.
- CAS concurrency guarantees are preserved per-board (each board is its
  own SQLite DB, same WAL+IMMEDIATE machinery as before).

CLI
---
  hermes kanban boards list|create|switch|show|rename|rm
  hermes kanban --board <slug> <any-subcommand>

Board resolution order: `--board` flag → `HERMES_KANBAN_BOARD` env →
`~/.hermes/kanban/current` file → `default`. Slug validation is strict:
lowercase alphanumerics + hyphens + underscores, 1-64 chars, starts with
alphanumeric. Uppercase is auto-downcased; slashes / dots / `..` /
control chars are rejected so boards can't name their way out of the
boards/ directory.

Passive discoverability: when more than one board exists, `hermes kanban
list` prints a one-line header ("Board: foo (2 other boards …)") so
users who stumble across multi-project never have to hunt for the
feature. Invisible for single-board installs.

Dashboard
---------
- New `BoardSwitcher` component at the top of the Kanban tab: dropdown
  with all boards + task counts, `+ New board` button, `Archive`
  button (non-default only). Hidden entirely when only `default` exists
  and is empty — single-project users never see it.
- New `NewBoardDialog` modal: slug / display name / description / icon
  + "switch to this board after creating" checkbox.
- Selected board persists to `localStorage` so browser users don't
  shift the CLI's active board out from under a terminal they left open.
- New `?board=<slug>` query param on every existing endpoint plus a
  new `/boards` CRUD surface (`GET /boards`, `POST /boards`,
  `PATCH /boards/<slug>`, `DELETE /boards/<slug>`,
  `POST /boards/<slug>/switch`).
- Events WebSocket is pinned to a board at connection time; switching
  opens a fresh WS against the new board.

Also fixes a pre-existing bug in the plugin's tenant / assignee
filters: the SDK's `Select` uses `onValueChange(value)`, not
native `onChange(event)`, so those filters silently didn't work.
New `selectChangeHandler` helper wires both signatures.

Tests
-----
49 new tests in `tests/hermes_cli/test_kanban_boards.py` covering:
slug validation (valid / invalid / auto-downcase), path resolution
(default = legacy path, named = `boards/<slug>/`, env var override),
current-board resolution chain (env > file > default), board CRUD +
archive / hard-delete, per-board connection isolation (tasks don't
leak), worker spawn env injection (`HERMES_KANBAN_BOARD`,
`HERMES_KANBAN_DB`, `HERMES_KANBAN_WORKSPACES_ROOT` all point at the
right board), and end-to-end CLI surface.

Regression surface: all 264 pre-existing kanban tests continue to pass.

Live-tested via the dashboard: created 3 boards (default,
hermes-agent, atm10-server), created tasks on each via both CLI
(`--board <slug> create`) and dashboard (inline create on the Ready
column), confirmed zero cross-board leakage, confirmed `BoardSwitcher`
+ `NewBoardDialog` work end-to-end in the browser.
2026-05-04 04:42:38 -07:00
vominh1919
135b4c8b35 fix(mcp): decouple AnyUrl import from mcp dependency
AnyUrl was imported inside the same try block as mcp.client.auth, so
when the mcp package was not installed, AnyUrl was undefined and
_build_client_metadata raised NameError at runtime.

Moved the AnyUrl import to its own try/except block so it's available
whenever pydantic is installed (which is a core dependency), regardless
of whether the mcp SDK is present.

Also added pytest.importorskip('mcp') to the three
test_build_client_metadata tests that exercise _build_client_metadata,
since that function depends on OAuthClientMetadata from the mcp package.
2026-05-04 04:42:18 -07:00
vominh1919
0d563621fb fix(test): skip bedrock adapter tests when botocore is not installed
Six tests in test_bedrock_adapter.py import botocore.exceptions
directly (ConnectionClosedError, EndpointConnectionError,
ReadTimeoutError, ClientError) without guarding the import. When
botocore is not installed (it's an optional dependency), these tests
fail with ModuleNotFoundError instead of being gracefully skipped.

Added pytest.importorskip('botocore') to each affected test function,
following the same pattern used elsewhere in the test suite (e.g.
test_voice_mode.py for numpy, test_mcp_oauth.py for mcp).

Tests affected:
- TestIsStaleConnectionError: 3 tests
- TestCallConverseInvalidatesOnStaleError: 3 tests

Before: 6 FAIL with ModuleNotFoundError
After:  6 SKIP with reason message
2026-05-04 04:41:55 -07:00
vominh1919
d1d2d43387 fix(test): add skip marker for transcription tests requiring faster_whisper
TestTranscribeLocalExtended patches faster_whisper.WhisperModel, which
triggers an ImportError when the faster_whisper package is not installed.
Added a pytest.mark.skipif marker using importlib.util.find_spec so
these tests are gracefully skipped instead of failing with
ModuleNotFoundError.
2026-05-04 04:41:36 -07:00
Siddharth Balyan
af6f9bc2a1
fix: refresh systemd unit on gateway boot (not just start/restart) (#19684)
The resilient restart settings from PR #18639 only took effect when
the gateway was started via `hermes gateway start` or `hermes gateway
restart` — both of which call refresh_systemd_unit_if_needed() which
writes the new unit and runs daemon-reload.

However, when the gateway self-restarts via exit-code-75 (stale-code
detection after `hermes update`, or the /restart command), systemd
respawns the process directly without going through any CLI function.
The unit file on disk stays stale, and systemd keeps using the old
cached settings (StartLimitBurst=5, RestartSec=30) until someone
manually runs `hermes gateway restart`.

This meant that after PR #18639 was deployed, users who never ran
`hermes gateway restart` manually were still vulnerable to the
permanent-death-on-network-outage bug.

Fix: call refresh_systemd_unit_if_needed() at the top of run_gateway()
(the foreground entry point that systemd's ExecStart invokes). This
ensures that on every boot — whether triggered by systemd restart,
exit-75 respawn, or manual foreground run — the unit definition and
daemon state are current. The call is best-effort (exceptions caught)
and a no-op when the unit is already current (one stat + string compare).
2026-05-04 16:27:51 +05:30
Ioodu
e50809b771 fix(file-tools): cap read_file result size to prevent context window overflow
Set max_result_size_chars=100_000 on the read_file registry entry (was
float('inf')), closing the Layer 2 defense-in-depth gap in
tool_result_storage.py. The existing Layer 1 guard inside
_handle_read_file already returns a JSON error for oversized reads;
this aligns the registry cap with every other tool.

Update test_read_file_never_persisted → test_read_file_result_size_cap
to assert 100_000, and add test_read_file_registry_cap_is_100k as an
explicit regression guard against re-introducing float('inf').

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 03:14:59 -07:00
Teknium
5b6d413476 fix(cli,gateway): surface title errors from /new <name>
The contributor's PR silently swallowed ValueError from
SessionDB.set_session_title() with bare except Exception: pass.
Users typing /new <title> with an already-in-use title got an
untitled session and no feedback.

Changes:
- cli.py: catch ValueError from both sanitize_title() and
  set_session_title(); print the error and mark the session
  untitled in the banner (never echo the rejected title back).
- gateway/run.py: append a warning note to the reset reply on
  title rejection; reflect the accepted title in the header.
- Add regression tests for the duplicate-title path in CLI and
  gateway.

Also map exx@example.com -> @exxmen in scripts/release.py.
2026-05-04 03:14:50 -07:00
Exx
f720751d79 feat(cli,gateway): /new accepts optional session name argument
Allow users to start a fresh session and immediately set its title by
passing a name to /new (or /reset):

    /new Refactor auth module

Changes:
- hermes_cli/commands.py: add args_hint='[name]' to /new command
- cli.py: parse title argument in process_command(), pass to new_session()
- cli.py: new_session() accepts title=None, sets title via SessionDB
- gateway/run.py: _handle_reset_command() parses title, sets on new entry
- gateway/session.py: reset_session() accepts optional display_name
- tests: add test_new_session_with_title, test_reset_command_with_title,
  test_new_command_in_help_output

All 36 affected tests pass.
2026-05-04 03:14:50 -07:00
xyiy001
e69d11d30c fix(browser): allow CDP override to pass requirement checks
Treat explicit CDP override mode as a valid browser backend even when agent-browser is absent, and add a regression test to prevent false-negative availability gating.
2026-05-04 03:12:30 -07:00
ChanlerDev
e3461e0b2a fix(cli): remove dead 'q' check from quit command resolution
The 'q' alias is defined for 'queue' command in commands.py:93.
The hardcoded 'q' in cli.py:5910 was dead code - resolve_command('q')
returns the queue CommandDef, so canonical would never be 'q'.

Removes the misleading check without changing any behavior:
- /quit and /exit still exit (defined aliases)
- /q still maps to queue (as intended)
2026-05-04 03:11:30 -07:00
DaniuXie
a45bd28598 fix(wecom): set SUPPORTS_MESSAGE_EDITING=False to prevent broken streaming 2026-05-04 03:10:36 -07:00