Commit Graph

3441 Commits

Author SHA1 Message Date
Teknium
1c7c7c3c5f
feat(kanban-dashboard): per-platform home-channel notification toggles (#19864)
* revert: auto-subscribe gateway chat on tool-driven kanban_create (#19718)

Reverts ff3d2773e2. Teknium reviewed the merged PR and decided this
behavior isn't wanted — tool-driven kanban_create should not mirror
the slash-command path's auto-subscribe. Orchestrators that want
their originating chat notified can call kanban_notify-subscribe
explicitly; we're not going to make it implicit.

* feat(kanban-dashboard): per-platform home-channel notification toggles

Adds a "Notify home channels" section to the task drawer in the kanban
dashboard plugin. Each platform where the user has set a home channel
(/sethome, TELEGRAM_HOME_CHANNEL env var, gateway.platforms.<p>.home_channel
in config.yaml) gets a toggle pill. Toggling on writes a kanban_notify_subs
row keyed to that platform's home (chat_id + thread_id); toggling off
removes it. The existing gateway notifier watcher delivers completed /
blocked / gave_up events without any new plumbing — this is purely a GUI
surface over existing machinery.

Replaces the reverted auto-subscribe behavior from #19718 with an explicit,
per-task, per-platform, user-controlled opt-in. No implicit subscription
on tool-driven kanban_create; no CLI commands; no slash commands. Just a
toggle in the drawer.

Backend (plugins/kanban/dashboard/plugin_api.py):
- GET  /api/plugins/kanban/home-channels[?task_id=X]
  Returns every platform with a configured home, plus a per-entry
  subscribed: bool relative to task_id (false when task_id omitted).
  Reads the live GatewayConfig via load_gateway_config() so env-var
  overlays stay honored.
- POST /api/plugins/kanban/tasks/:id/home-subscribe/:platform
  Idempotent add_notify_sub keyed to the platform's home.
- DELETE /api/plugins/kanban/tasks/:id/home-subscribe/:platform
  remove_notify_sub for the same tuple.
- 404 when the platform has no home configured, or task_id doesn't
  exist (POST only).

Frontend (plugins/kanban/dashboard/dist/index.js):
- TaskDrawer fetches /home-channels on open, keyed on task_id.
- HomeSubsSection renders nothing when zero platforms have a home (so
  users who haven't set one up don't see an empty UI block).
- Optimistic toggle with busy flag + revert-on-failure. One pill per
  platform; ✓ prefix and --on class indicate the subscribed state.

CSS (plugins/kanban/dashboard/dist/style.css):
- .hermes-kanban-home-subs flex row + .hermes-kanban-home-sub pill
  style + --on subscribed variant (subtle ring-colored background).

Live-tested against a dashboard with TELEGRAM + DISCORD_BOT_TOKEN /
HOME_CHANNEL env vars set: drawer shows both pills, toggling each
flips its visual state AND writes/removes the correct kanban_notify_subs
row (verified via direct DB read).

Tests (tests/plugins/test_kanban_dashboard_plugin.py, 11 new, 53/53
pass total):
- home-channels lists only platforms with a home (slack with a
  token but no home is excluded)
- no task_id -> all subscribed=false
- subscribe creates notify_sub row with correct chat/thread/platform
- subscribed=true reflected in subsequent GET
- idempotent re-subscribe
- unknown platform -> 404
- unknown task -> 404
- unsubscribe removes the row
- telegram + discord subscribe/unsubscribe independent
- zero homes -> empty list
2026-05-04 12:31:21 -07:00
emozilla
2bc82bb504 clarify placeholder telegram credential in tests 2026-05-04 15:31:15 -04:00
Teknium
3db6b9cc87
feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern) (#19709)
* feat(cron): add no_agent mode for script-only cron jobs (watchdog pattern)

Adds a no_agent=True option to the cronjob system. When enabled, the
scheduler runs the attached script on schedule and delivers its stdout
directly to the job's target — no LLM, no agent loop, no token spend.
This is the classic bash-watchdog pattern (memory alert every 5 min,
disk alert every 15 min, CI ping) reimplemented as a first-class Hermes
primitive instead of a systemd timer + curl + bot token triplet living
outside the system.

## What

  hermes cron create "every 5m" \
    --no-agent \
    --script memory-watchdog.sh \
    --deliver telegram \
    --name memory-watchdog

Agent tool:

  cronjob(action='create',
          schedule='every 5m',
          script='memory-watchdog.sh',
          no_agent=True,
          deliver='telegram')

Semantics:
- Script stdout (trimmed) → delivered verbatim as the message
- Empty stdout          → silent tick (no delivery; watchdog pattern)
- wakeAgent=false gate  → silent tick (same gate LLM jobs use)
- Non-zero exit/timeout → delivered as an error alert
                          (broken watchdogs shouldn't fail silently)
- No LLM ever invoked; no tokens spent; no provider fallback applied

## Implementation

cron/jobs.py
  * create_job gains no_agent: bool = False
  * prompt becomes Optional (no_agent jobs don't need one)
  * Validation: no_agent=True requires a script at create time
  * Field roundtrips via load_jobs / save_jobs / update_job

cron/scheduler.py
  * run_job: new short-circuit branch at the top that runs the script,
    wraps its output into the (success, doc, final_response, error)
    tuple downstream delivery already expects, and returns before any
    AIAgent import or construction
  * _run_job_script: picks interpreter by extension — .sh/.bash run
    under /bin/bash, anything else under sys.executable (Python).
    Shell support unlocks the bash-watchdog pattern without wrapping
    scripts in Python. Extension is explicit; we deliberately do NOT
    trust the file's own shebang. Path-containment guard (scripts dir)
    unchanged.

tools/cronjob_tools.py
  * Schema: new no_agent boolean property with clear trigger guidance
  * cronjob() accepts no_agent and validates mode-specific shape:
    - no_agent=True requires script; prompt/skills optional
    - no_agent=False keeps the existing 'prompt or skill required' rule
  * update path rejects flipping no_agent=True on a job without a script
  * _format_job surfaces no_agent in list output
  * Handler lambda forwards no_agent from tool args

hermes_cli/main.py, hermes_cli/cron.py
  * 'hermes cron create --no-agent' and edit's --no-agent / --agent
    pair for toggling at CLI parity with the agent tool
  * Existing --script help text updated to describe both modes
  * List / create / edit output now shows 'Mode: no-agent (...)' when set

## Tests

tests/cron/test_cron_no_agent.py — 18 tests covering:
  * create_job: no_agent shape, validation, field persistence
  * update_job: flag roundtrip across reload
  * cronjob tool: schema validation, update toggling, mode-specific
    requirements, prompt-relaxation rule
  * run_job short-circuit:
    - success path delivers stdout verbatim
    - empty stdout → SILENT_MARKER (no delivery downstream)
    - wakeAgent=false gate → silent
    - script failure → error alert
    - run_job does NOT import AIAgent (verified via mock)
  * _run_job_script:
    - .sh executes via bash (no shebang required)
    - .bash executes via bash
    - .py still runs via sys.executable (regression)
    - path-traversal still blocked (security regression)

All 18 new tests pass. 341/342 pre-existing cron tests still pass; the
one failure (test_script_empty_output_noted) was already broken on main
and is unrelated to this change.

## Docs

website/docs/guides/cron-script-only.md — new dedicated guide covering
the watchdog pattern, interpreter rules, delivery mapping, worked
examples (memory / disk alerts), and the comparison table vs hermes send,
regular LLM cron jobs, and OS-level cron.

website/docs/user-guide/features/cron.md — new 'No-agent mode' section
in the cron feature reference, cross-linked to the guide.

website/docs/guides/automate-with-cron.md — new tip box pointing users
to no-agent mode when they don't need LLM reasoning.

## Compatibility

- Existing jobs: unchanged. no_agent defaults to False, existing code
  paths untouched until the flag is set.
- Schema additive only; older jobs.json without the field load fine
  via .get() with False default.
- New CLI flags are opt-in and don't alter existing flag behavior.

* fix(cron): lazy-import AIAgent + SessionDB so no_agent ticks pay zero

The unconditional `from run_agent import AIAgent` + SessionDB() init at
the top of run_job() meant every no_agent tick still paid the full agent
module load cost (~300ms + transitive imports + DB open) even though it
never touched any of that machinery.

Move both to live under the default (LLM) path, after the no_agent
short-circuit has returned. Now a no_agent tick's sys.modules stays
clean — verified end-to-end:

    assert 'run_agent' not in sys.modules  # before
    run_job(no_agent_job)
    assert 'run_agent' not in sys.modules  # after

The existing mock-based unit test (test_run_job_no_agent_never_invokes_aiagent)
kept passing because patch() replaces the class AFTER import; the leak
was only visible via real subprocess-style verification. End-to-end
demo confirmed: agent calls cronjob(no_agent=True) → script runs →
stdout delivered → no LLM machinery loaded.

* docs(cron): tighten no_agent tool schema — defaults, silent semantics, pick rule

Previous description buried the important bits in one long sentence.
Agents could plausibly miss three things an LLM-facing schema should
make unmissable:

1. What the default is — now first sentence + JSON Schema `default: false`
2. What 'silent run' actually means for the user — now spelled out:
   'nothing is sent to the user and they won't see anything happened'
3. When to pick True vs False — now a concrete decision rule with
   examples on both sides (watchdogs/metrics/pollers → True;
   summarize/draft/pick/rephrase → False)

Also adds explicit 'prompt and skills are ignored when True' since the
agent could otherwise still pass them out of habit.

No behavior change — schema text only.
2026-05-04 12:31:01 -07:00
teknium1
d35efb9898 feat(telegram): /topic off + help + auth gate + screenshot debounce
Four production-readiness additions to topic mode:

1. /topic off — clean disable path. Flips telegram_dm_topic_mode.enabled
   to 0 and clears telegram_dm_topic_bindings for this chat. Previously
   users had to edit state.db with sqlite3 to turn the feature off.
   Idempotent: calling /topic off when the chat was never enabled
   returns a friendly no-op message.

2. /topic help — inline usage printed in the DM so users don't have to
   visit docs to discover /topic off, /topic <session-id>, etc.

3. Authorization gate. /topic mutates SQLite side tables and flips the
   root DM into a lobby, so the action must be authorized. Now calls
   self._is_user_authorized(source); unauthorized DMs get a refusal
   instead of activation. Defense in depth on top of the gateway's
   existing pre-route auth.

4. BotFather screenshot debounce. A user repeatedly running /topic
   while Threads Settings is still disabled would previously re-upload
   the same screenshot every time. Now rate-limited to one send per
   5 minutes per chat. /topic off resets the counter so re-enabling
   starts fresh.

Command-def args hint updated: /topic [off|help|session-id].

Docs:
- New /topic subcommands table at the top of the multi-session section
- Disable instructions updated to recommend /topic off first, with the
  raw SQL fallback kept for bulk cleanup
- Under-the-hood list extended with the capability-hint debounce and
  the authorization gate

Tests (6 new):
- /topic help returns usage and doesn't create topic tables
- /topic off disables mode AND clears bindings
- /topic off is idempotent when never enabled
- Unauthorized users get refusal, no tables created
- Capability-hint debounce is per-chat
- /topic off resets both lobby and capability debounce counters

All 402 targeted tests pass. Full gateway sweep: 4809/4810
(pre-existing test_teams::test_send_typing unrelated).
2026-05-04 12:07:17 -07:00
teknium1
1381c89e56 fix(telegram): polish topic mode — CASCADE, General-topic handling, rename guard, debounce
Five follow-ups to topic mode based on integration audit:

1. ON DELETE CASCADE on telegram_dm_topic_bindings.session_id. Session
   pruning (manual /delete, auto-cleanup, any future prune job) would
   have thrown 'FOREIGN KEY constraint failed' for sessions bound to a
   topic. Migration bumped to v2, rebuilds the bindings table in place
   if FK lacks CASCADE. Idempotent; only runs once per DB.

2. Never auto-rename operator-declared topics. If an operator has
   extra.dm_topics configured AND a user runs /topic, messages in those
   pre-declared topics would previously trigger auto-rename and silently
   mutate operator config. _rename_telegram_topic_for_session_title now
   early-returns when _get_dm_topic_info returns a dict for this
   (chat_id, thread_id). Uses class-based lookup (not hasattr) so
   MagicMock test fixtures don't accidentally trip the guard.

3. General topic handling. Telegram's General (pinned top) topic in a
   forum-enabled private chat may send messages with message_thread_id=1
   or omit thread_id entirely depending on client. Both are now treated
   as the root lobby, not a topic lane. Prevents users from
   accidentally burning a session on the General topic.

4. Debounce the root-lobby reminder. 30-second cooldown per chat so a
   user who forgets topic mode is enabled and types ten messages in the
   root gets one reminder, not ten. Explicit command replies
   (/new-in-lobby, /topic <session-id>) still land every time.

5. Docs: added under-the-hood invariants for the above, plus a
   Downgrade section explaining that rolling back to a pre-/topic
   Hermes build leaves the DB tables orphaned but harmless — DMs just
   revert to native per-thread isolation.

Tests:
- test_operator_declared_topic_is_not_auto_renamed
- test_general_topic_is_treated_as_root_lobby
- test_lobby_reminder_is_debounced_per_chat
- test_binding_survives_session_deletion_via_cascade
- test_migration_rebuilds_v1_binding_table_with_cascade_fk

Validated: 4803/4804 tests pass (tests/gateway/ + tests/test_hermes_state.py).
Sole failure is a pre-existing test_teams::test_send_typing flake
unrelated to this PR.
2026-05-04 12:07:17 -07:00
teknium1
a7683d04a9 fix(telegram): harden DM topic binding — persist through switch_session, rebind on /new
Follow-up on @EmelyanenkoK's feat: add Telegram DM topic-mode sessions.

Three issues:

1. Split-brain session state. After get_or_create_session() returned a
   SessionEntry for a topic lane, the handler was mutating
   .session_id in place to the binding's target, but never persisting
   the switch through SessionStore. The sessions.json session_key →
   session_id map kept pointing at the lane's natural id; any reader
   that reloaded from disk saw the wrong id. Fixed by routing through
   SessionStore.switch_session(), which _save()s the mapping and ends
   the old session in SQLite like /resume does.

2. /new inside a topic was a one-message no-op. Reset created a new
   session but left the telegram_dm_topic_bindings row pointing at the
   old session_id, so the next message's binding lookup switched right
   back. Now _handle_reset_command rebinds the topic to the new
   session_id after reset.

3. is_telegram_session_linked_to_topic and
   list_unlinked_telegram_sessions_for_user both called
   apply_telegram_topic_migration() on read, contradicting the PR's
   own invariant that migration only runs on explicit /topic opt-in.
   They now tolerate missing topic tables and return empty/False.

Also: _telegram_topic_mode_enabled() now only treats True as enabled
(not any truthy return), so test fixtures with MagicMock session_db
don't accidentally flip every DM into lobby mode — this was breaking
4 pre-existing test_status_command tests.

Tests:
- New regression: /new inside a topic must update the binding row
  (test_new_inside_telegram_topic_rewrites_binding_to_new_session).
- _make_runner now stubs switch_session so existing restore tests
  still exercise the new code path.

Validated end-to-end with real SessionDB + SessionStore:
readers on fresh DB don't create topic tables; enable creates them;
binding override persists across SessionStore restart; /new rebinds
and the new id survives a restart.

Co-authored-by: EmelyanenkoK <emelyanenko.kirill@gmail.com>
2026-05-04 12:07:17 -07:00
EmelyanenkoK
25065283b3 fix: improve telegram topic mode setup 2026-05-04 12:07:17 -07:00
EmelyanenkoK
d6615d8ec7 feat: add Telegram DM topic-mode sessions 2026-05-04 12:07:17 -07:00
brooklyn!
d9c090fe36
Merge pull request #19338 from asheriif/fix/tui-plugin-slash-exec-live
fix(tui): run plugin slash commands live
2026-05-04 09:57:45 -07:00
kshitijk4poor
54e78cadb2 test: add regression test for Teams interactive_setup import fix
Adapted from PR #19188 by @LeonSGP43 — mocks cli_output helpers and
verifies interactive_setup persists credentials to .env without
crashing. Also adds megastary to AUTHOR_MAP.
2026-05-04 06:54:27 -07:00
bobashopcashier
d89e7a3cd4 fix(anthropic): restrict fast mode to Opus 4.6 (Anthropic API contract)
Per https://platform.claude.com/docs/en/build-with-claude/fast-mode:
"Fast mode is currently supported on Opus 4.6 only. Sending speed: fast
with an unsupported model returns an error."

Pre-fix, _is_anthropic_fast_model() returned True for any claude-* model,
so /fast on Opus 4.7 (or Sonnet/Haiku) would persist agent.service_tier=fast
in config.yaml and the adapter would inject extra_body["speed"] = "fast"
on every subsequent request. Opus 4.7 returns:

  HTTP 400: 'claude-opus-4-7' does not support the `speed` parameter.

This wedged sessions across model upgrades (a user who ran /fast on Opus 4.6
and later switched the default model to 4.7 hit a hard 400 on every turn
until they manually edited config.yaml).

Changes:
- _is_anthropic_fast_model: gate on "opus-4-6" / "opus-4.6" only
- anthropic_adapter: add _supports_fast_mode predicate as defensive guard
  so stale request_overrides on an unsupported model are dropped silently
  instead of 400'ing
- Tests: flip the assertions that mirrored the bug (Sonnet/Haiku/Opus 4.7
  asserting fast-mode support) to match the documented API contract
2026-05-04 06:23:52 -07:00
briandevans
ce22301dc6 test(sms): use clear=True in test_missing_phone_number_is_non_retryable
Prevents pre-existing TWILIO_PHONE_NUMBER or SMS_WEBHOOK_URL values in
the outer test environment from leaking into the assertion context.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:25:09 -07:00
0668001438
83080772f2 fix(delegation): honor provider override for subagents
Clear inherited provider preference filters when delegation.provider is set so delegated children do not route back to the parent provider. Add a regression test for cross-provider delegation with parent OpenRouter filters.

Closes #10653
2026-05-04 05:22:35 -07:00
Pratik Rai
7a8ee8b29d fix(gateway): deduplicate Weixin messages by content fingerprint 2026-05-04 05:20:13 -07:00
briandevans
0b5fd40a01 fix(delegate): correct _spawn_child → _build_child_agent in comments
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:18:45 -07:00
VinVC
5d6431c114 fix(doctor): resolve merge conflicts, add kimi-coding-cn test
- Rebased on upstream/main to resolve conflicts
- Added test_run_doctor_accepts_kimi_coding_cn_provider test
- All 30 tests pass
2026-05-04 05:12:42 -07:00
阿泥豆
0e9416036a test: add unit tests for heartbeat stale threshold increase 2026-05-04 05:08:51 -07:00
briandevans
6b4ccb9b14 fix(session-search): report source from resolved parent, not FTS5 child session (#15909)
When a delegation child session (e.g. source='telegram') contains the
FTS5 hit but _resolve_to_parent() maps it to a different root session
(source='api_server'), the result entry was still reporting the child's
source because the loop discarded session_meta as `_` and fell back to
match_info.get('source'), which carries the child session's value.

Use the resolved parent's session_meta for source, model, and started_at
with match_info as a fallback, so the output accurately reflects the
session the user actually interacted with.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:07:40 -07:00
briandevans
b46b0c9888 fix(backup): floor pre-update backup_keep to 1 so the new backup survives
`updates.backup_keep: 0` (or any negative value) wiped the freshly-
created pre-update zip:

  _prune_pre_update_backups(backup_dir, keep=0):
      backups = sorted(..., reverse=True)   # newest first, includes
                                            # the zip we just wrote
      for p in backups[0:]:                 # = all of them
          p.unlink()

The wrapper in `main.py` then printed `Saved: <path>` for a file that
no longer existed (the size lookup is wrapped in `try/except OSError`
which silently degrades to "0 B"), leaving operators believing they had
a recovery point when they had none.

This is a real footgun because some config systems treat 0 as "keep
unlimited"; here it does the opposite — every backup is destroyed
right after creation.

Fix: clamp `keep` to a minimum of 1 inside `_prune_pre_update_backups`
since that helper is only invoked immediately after a fresh backup
is written.  Operators who genuinely want no backups should set
`updates.pre_update_backup: false` (which gates creation entirely)
rather than relying on `backup_keep: 0`.

Also extends the `backup_keep` config docstring to spell out the floor
and point at `pre_update_backup: false` as the off-switch.

## Tests

Three regression tests added in `TestPreUpdateBackup`:

  - `test_keep_zero_does_not_delete_freshly_created_backup` —
    asserts the file persists after `keep=0`
  - `test_keep_negative_does_not_delete_freshly_created_backup` —
    same for negative values
  - `test_keep_zero_still_prunes_older_backups` — proves the floor
    only protects the new backup; older ones are still rotated out

Verified the new tests fail on origin/main (without the floor) and
pass with it; full `tests/hermes_cli/test_backup.py` suite green
(84 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 05:07:13 -07:00
Sanhu Li
ef8c213e88 fix(model-switch): soft-accept unlisted openai-codex models 2026-05-04 05:06:53 -07:00
0xsir0000
52882dade6 fix(agent): include name field on every role:tool message for Gemini compatibility (#16478)
Gemini's OpenAI-compatibility endpoint strictly requires the `name` field
on `role: tool` messages — it returns HTTP 400 ("Request contains an
invalid argument") when the function name is missing. OpenAI/Anthropic/
ollama tolerate the absence, so the gap stays invisible until the
conversation accumulates a tool turn and the user routes it through Gemini
(direct API or via ollama-cloud proxy).

Fix: add a `_get_tool_call_name_static()` helper alongside the existing
`_get_tool_call_id_static()`, and populate `name` at every site that
constructs a `role: tool` message — the pre-call sanitizer stub, the
tool-call args repair marker, both interrupt-skip paths, both
result-append paths (parallel + sequential), the invalid-tool-name
recovery, the invalid-JSON-args recovery, and the exception fallback.

Each call site was already in scope of the function name (`function_name`,
`skipped_name`, `name`, or a dict tool_call), so the change is local —
no new lookups, no behavior change for providers that already worked.

Fixes #16478
2026-05-04 05:06:33 -07:00
OpenClaw Bot
0443484115 fix(qqbot): honor proxy env vars for websocket 2026-05-04 05:06:09 -07:00
陈运波0668001438
6cf7a9e330 fix(vision): preserve explicit provider auth with custom base_url
Keep the configured vision provider when base_url is overridden so credential-pool lookup still resolves provider-specific API keys (e.g. ZAI_API_KEY), and add a regression test for this path.
2026-05-04 05:05:43 -07:00
swithek
b7bbc62503 fix(compressor): _prune_old_tool_results boundary direction 2026-05-04 05:05:18 -07:00
Dejie Guo
d29f90e89d fix(error_classifier): avoid large-context false overflow heuristics
Generic 400 and server-disconnect heuristics used absolute token/message-count fallbacks that are too aggressive for 1M context sessions. Gate those absolute fallbacks to smaller context windows while preserving relative pressure checks.

Fixes #16351
2026-05-04 05:04:56 -07:00
giwaov
026a5e47df fix(cli): preserve Windows hidden-dir paths in markdown 2026-05-04 05:04:36 -07:00
Teknium
3fb35520c6
revert: auto-subscribe gateway chat on tool-driven kanban_create (#19718) (#19721)
Reverts ff3d2773e2. Teknium reviewed the merged PR and decided this
behavior isn't wanted — tool-driven kanban_create should not mirror
the slash-command path's auto-subscribe. Orchestrators that want
their originating chat notified can call kanban_notify-subscribe
explicitly; we're not going to make it implicit.
2026-05-04 05:04:01 -07:00
Teknium
ff3d2773e2
feat(kanban): auto-subscribe gateway chat on tool-driven kanban_create (#19718)
Closes #19479.

When an orchestrator agent calls kanban_create from a gateway session
(e.g. a Telegram user delegating to an orchestrator profile), auto-
subscribe the originating (platform, chat, thread, user) to the new
task's terminal events. Mirrors the behavior of the /kanban create
slash command in gateway/run.py so tool-driven creation is at parity
with human-driven creation.

Without this, a user who interacts with an orchestrator exclusively
via the gateway never receives blocked / completed / gave_up
notifications for tasks the orchestrator created on their behalf —
silently breaking the gateway-first multi-agent flow the reporter
describes.

Reads the context-local HERMES_SESSION_* vars via get_session_env()
(not os.environ — those are contextvars for asyncio concurrency
safety). Falls through cleanly in CLI / cron contexts with no
session active (subscribed=False in the response). Best-effort: if
the gateway module isn't importable (test rigs stubbing gateway.*),
the task still creates, we just skip the subscription.

Response gains a 'subscribed' bool so the orchestrator knows whether
terminal events will land back in the originating chat or whether it
needs to poll / unblock manually.

Tests: 4 new in tests/tools/test_kanban_tools.py covering
CLI/no-subscribe, telegram/gateway-auto-subscribe, discord-DM/no-
thread subscribe, and partial-ctx/no-chat_id no-subscribe. 40/40
kanban tool tests pass.
2026-05-04 05:02:23 -07:00
Nikolay Gusev
fdf9343c51 fix(tools): wrap bare scalars in single-element list for array-typed args
Open-weight models (DeepSeek, Qwen, GLM) sometimes emit tool calls like
`{"urls": "https://a.com"}` when the tool schema declares
`type: array`.  The call was JSON-valid but semantically wrong, and
`coerce_tool_args` would pass the bare string through — the tool then
failed with a confusing type error.

`coerce_tool_args` now wraps non-list, non-null values in a
single-element list when the schema declares `array`.  Strings still go
through `_coerce_value` first so JSON-encoded arrays
(`'["a","b"]'`) parse correctly and nullable `"null"` still
becomes `None`.  `None` itself is preserved — tools with sensible
defaults already handle it, and we don't want to silently mask a
deliberate null.

Salvaged from #19652 (NikolayGusev-astra) — the broader validate-then-
repair layer had several issues (duplicated existing coercion,
mis-classified `old_string` as a path field, prepended non-JSON
prefixes to tool results that break downstream JSON parsing, hardcoded
offset/limit defaults unsuitable for non-read_file tools).  The one
genuinely new capability is wrapping bare scalars, which is implemented
here directly inside the existing coercion path.

Co-authored-by: Nikolay Gusev <ngusev@astralinux.ru>
2026-05-04 05:00:37 -07:00
Teknium
a175f39577
feat(nous): persist Nous OAuth across profiles via shared token store (#19712)
Mirrors the Codex auto-import UX. On successful Nous login (either
`hermes auth add nous --type oauth` or `hermes login nous`), tokens are
mirrored to `$HERMES_SHARED_AUTH_DIR/nous_auth.json` (default
`~/.hermes/shared/nous_auth.json`, outside any named profile's
HERMES_HOME). On next login in a new profile, the flow offers to import
those credentials ("Import these credentials? [Y/n]") and rehydrates via
a forced refresh+mint instead of running the full device-code flow.

Runtime refresh in any profile syncs the rotated refresh_token back to
the shared store so sibling profiles don't hit stale-token fallback
after rotation.

The volatile 24h agent_key is NOT persisted to the shared store —
only the long-lived OAuth tokens are cross-profile useful.

- `HERMES_SHARED_AUTH_DIR` env var for tests + custom layouts
- Pytest seat belt mirrors the existing `_auth_file_path` guard so
  forgetting to redirect the store in a test fails loudly
- File mode 0600 where platform supports it
- Runtime credential resolution is unchanged — shared store is only
  consulted during the login flow, so profile isolation at runtime is
  preserved
- Stale refresh_token + portal-down cases gracefully fall back to
  device-code

Addresses a user report from Mike Nguyen: running
`hermes --profile <name> auth add nous --type oauth` for every new
profile is unnecessary friction now that Codex has a shared-import
flow via `~/.codex/auth.json`.
2026-05-04 04:54:55 -07:00
Teknium
d3b22b76d8
fix(kanban): enforce worker task-ownership on destructive tool calls (#19713)
Closes #19534 (security).

A worker spawned by the kanban dispatcher has HERMES_KANBAN_TASK set
to its own task id. The destructive tools (kanban_complete,
kanban_block, kanban_heartbeat) resolved task_id via
_default_task_id() which preferred an explicit arg over the env var,
with no ownership check — so a buggy or prompt-injected worker could
complete / block / heartbeat any OTHER task (sibling, cross-tenant,
anything) by supplying its id. Reporter's repro: worker for t_A
passed task_id=t_B to kanban_complete and got {"ok": true}.

Fix: add _enforce_worker_task_ownership(tid). If HERMES_KANBAN_TASK
is set and tid doesn't match, return a structured tool error with
guidance to use kanban_comment (for information handoff across tasks)
or kanban_create (for follow-up work). Orchestrator profiles (no env
var, but kanban toolset enabled per #18968) are exempt — their job
is routing and sometimes includes closing out child tasks.

Kept unrestricted (deliberately):
- kanban_show — workers legitimately read parent/sibling handoff context
- kanban_comment — cross-task comments are the handoff mechanism
- kanban_create — orchestrator fan-out, worker follow-up spawning
- kanban_link — parent/child linking

Tests: 5 new regression tests in tests/tools/test_kanban_tools.py
covering the grid (worker-attacks-foreign ×3 tools, worker-own-task
preserved, orchestrator-unrestricted). 36/36 pass.
2026-05-04 04:54:02 -07:00
h0tp-ftw
8c8f95bc8e fix(gateway): show friendly error when service is not installed
Instead of an unhelpful CalledProcessError traceback when running
`hermes gateway start/stop/restart` without first installing the service,
check for the unit file and exit with an actionable install hint.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-05-04 04:49:51 -07:00
Teknium
a8b689f0c2 test(kanban): regression for status=running rejection at dashboard PATCH
Reporter of #19535 explicitly asked for a regression test — covers it
here so a future refactor of _set_status_direct can't silently re-enable
the direct ready/todo -> running bypass.

Asserts both: (a) HTTP 400 with 'running' in the detail message, and
(b) the task's status is unchanged after the rejected PATCH (pre-request
status preserved, no partial mutation).
2026-05-04 04:46:47 -07:00
vominh1919
652f8e6f3e fix(test): correct _coerce_number inf/nan test assertions
The test 'test_inf_stays_string_for_integer_only' incorrectly asserted
that _coerce_number('inf') returns float('inf'), but the function
correctly returns the original string 'inf' because infinity is not
JSON-serializable.

Fixed the assertion to expect the string 'inf', and added two new tests
for negative infinity and NaN edge cases to improve coverage of the
non-JSON-serializable number guard in _coerce_number().
2026-05-04 04:45:55 -07:00
Yoimex
edf9c75621 fix(env): pass -- to cd for hyphen-prefixed workdirs 2026-05-04 04:45:03 -07:00
Teknium
ae40fca955 fix(profiles): keep validate_profile_name strict; callers normalize first
Follow-up to @changchun989's cherry-pick: reverts the validate-via-
normalize change so validate_profile_name remains a strict regex check
on the input AS-GIVEN. Callers that accept mixed-case user input
(dashboard UI, CLI args, import flows) call normalize_profile_name()
first, then validate the result. This keeps validate honest about
what the on-disk directory name must look like — e.g. '  jules '
(trailing whitespace) is now rejected instead of silently trimmed
and accepted.

- validate_profile_name: strict lowercase/regex check again, 'UPPER'
  back in the invalid-names parametrize
- 8 call sites in profiles.py (create_profile, delete_profile,
  set_active_profile, export_profile, import_profile, rename_profile,
  resolve_profile_env, plus the clone_from branch): swap the
  normalize-then-validate order
- scripts/release.py: add changchun989@proton.me -> changchun989 to
  AUTHOR_MAP so CI doesn't block on the unmapped contributor email

All kanban + profile tests pass (268 across test_profiles.py +
test_kanban_db.py + test_kanban_core_functionality.py, plus 73 in
test_kanban_tools.py + test_kanban_dashboard_plugin.py).

Closes #18498.
2026-05-04 04:44:37 -07:00
changchun989
a31477dabb fix(profiles): normalize profile IDs for Kanban assignees and lookups
- Add normalize_profile_name() for lowercase canonical IDs and Default alias
- Use canonical names in create/delete/rename/export/import/set_active paths
- Canonicalize Kanban assignee on create/assign, list filter, and worker spawn
- Tests for mixed-case assignees and profile resolution (fixes #18498)
2026-05-04 04:44:37 -07:00
Yuyang Xu
60c4bc96fd fix(security): restore .env/auth.json/state.db with 0600 perms
`hermes import` was creating secret files with the process umask
(typically 0644) instead of 0600. zipfile.open() does not honor the
Unix mode bits stored in zip member external_attr; the restore loop
used open(target, "wb") which always falls back to umask.

Threat: silent privilege downgrade after a routine restore on
multi-user systems (shared dev boxes, CI runners, jump hosts) — any
local user could read API keys and OAuth tokens from ~/.hermes/.

Fix mirrors the convention already used at file creation
(hermes_cli/auth.py: stat.S_IRUSR | stat.S_IWUSR for auth.json).
The quick-snapshot restore path (restore_quick_snapshot) is
unaffected — it uses shutil.copy2 which preserves perms via
copystat().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 04:43:53 -07:00
Cameron Aragon
239ea1bdea fix(image-gen): preserve xAI API error status 2026-05-04 04:43:07 -07:00
Teknium
5ec6baa400
feat(kanban): multi-project boards — one install, many kanbans (#19653)
Adds first-class board support to kanban so users can separate unrelated
streams of work (projects, repos, domains) into isolated queues. Single-
project users stay on the 'default' board and see no UI change.

Isolation model
---------------
- Each board is a directory at `~/.hermes/kanban/boards/<slug>/` with
  its own `kanban.db`, `workspaces/`, and `logs/`. The 'default' board
  keeps its legacy path (`~/.hermes/kanban.db`) for back-compat — fresh
  installs and pre-boards users get zero migration.
- Workers spawned by the dispatcher have `HERMES_KANBAN_BOARD` pinned in
  their env alongside the existing `HERMES_KANBAN_DB` /
  `HERMES_KANBAN_WORKSPACES_ROOT` pins, so workers physically cannot see
  other boards' tasks.
- The gateway's single dispatcher loop now sweeps every board per tick;
  per-tick cost is a few extra filesystem stats.
- CAS concurrency guarantees are preserved per-board (each board is its
  own SQLite DB, same WAL+IMMEDIATE machinery as before).

CLI
---
  hermes kanban boards list|create|switch|show|rename|rm
  hermes kanban --board <slug> <any-subcommand>

Board resolution order: `--board` flag → `HERMES_KANBAN_BOARD` env →
`~/.hermes/kanban/current` file → `default`. Slug validation is strict:
lowercase alphanumerics + hyphens + underscores, 1-64 chars, starts with
alphanumeric. Uppercase is auto-downcased; slashes / dots / `..` /
control chars are rejected so boards can't name their way out of the
boards/ directory.

Passive discoverability: when more than one board exists, `hermes kanban
list` prints a one-line header ("Board: foo (2 other boards …)") so
users who stumble across multi-project never have to hunt for the
feature. Invisible for single-board installs.

Dashboard
---------
- New `BoardSwitcher` component at the top of the Kanban tab: dropdown
  with all boards + task counts, `+ New board` button, `Archive`
  button (non-default only). Hidden entirely when only `default` exists
  and is empty — single-project users never see it.
- New `NewBoardDialog` modal: slug / display name / description / icon
  + "switch to this board after creating" checkbox.
- Selected board persists to `localStorage` so browser users don't
  shift the CLI's active board out from under a terminal they left open.
- New `?board=<slug>` query param on every existing endpoint plus a
  new `/boards` CRUD surface (`GET /boards`, `POST /boards`,
  `PATCH /boards/<slug>`, `DELETE /boards/<slug>`,
  `POST /boards/<slug>/switch`).
- Events WebSocket is pinned to a board at connection time; switching
  opens a fresh WS against the new board.

Also fixes a pre-existing bug in the plugin's tenant / assignee
filters: the SDK's `Select` uses `onValueChange(value)`, not
native `onChange(event)`, so those filters silently didn't work.
New `selectChangeHandler` helper wires both signatures.

Tests
-----
49 new tests in `tests/hermes_cli/test_kanban_boards.py` covering:
slug validation (valid / invalid / auto-downcase), path resolution
(default = legacy path, named = `boards/<slug>/`, env var override),
current-board resolution chain (env > file > default), board CRUD +
archive / hard-delete, per-board connection isolation (tasks don't
leak), worker spawn env injection (`HERMES_KANBAN_BOARD`,
`HERMES_KANBAN_DB`, `HERMES_KANBAN_WORKSPACES_ROOT` all point at the
right board), and end-to-end CLI surface.

Regression surface: all 264 pre-existing kanban tests continue to pass.

Live-tested via the dashboard: created 3 boards (default,
hermes-agent, atm10-server), created tasks on each via both CLI
(`--board <slug> create`) and dashboard (inline create on the Ready
column), confirmed zero cross-board leakage, confirmed `BoardSwitcher`
+ `NewBoardDialog` work end-to-end in the browser.
2026-05-04 04:42:38 -07:00
vominh1919
135b4c8b35 fix(mcp): decouple AnyUrl import from mcp dependency
AnyUrl was imported inside the same try block as mcp.client.auth, so
when the mcp package was not installed, AnyUrl was undefined and
_build_client_metadata raised NameError at runtime.

Moved the AnyUrl import to its own try/except block so it's available
whenever pydantic is installed (which is a core dependency), regardless
of whether the mcp SDK is present.

Also added pytest.importorskip('mcp') to the three
test_build_client_metadata tests that exercise _build_client_metadata,
since that function depends on OAuthClientMetadata from the mcp package.
2026-05-04 04:42:18 -07:00
vominh1919
0d563621fb fix(test): skip bedrock adapter tests when botocore is not installed
Six tests in test_bedrock_adapter.py import botocore.exceptions
directly (ConnectionClosedError, EndpointConnectionError,
ReadTimeoutError, ClientError) without guarding the import. When
botocore is not installed (it's an optional dependency), these tests
fail with ModuleNotFoundError instead of being gracefully skipped.

Added pytest.importorskip('botocore') to each affected test function,
following the same pattern used elsewhere in the test suite (e.g.
test_voice_mode.py for numpy, test_mcp_oauth.py for mcp).

Tests affected:
- TestIsStaleConnectionError: 3 tests
- TestCallConverseInvalidatesOnStaleError: 3 tests

Before: 6 FAIL with ModuleNotFoundError
After:  6 SKIP with reason message
2026-05-04 04:41:55 -07:00
vominh1919
d1d2d43387 fix(test): add skip marker for transcription tests requiring faster_whisper
TestTranscribeLocalExtended patches faster_whisper.WhisperModel, which
triggers an ImportError when the faster_whisper package is not installed.
Added a pytest.mark.skipif marker using importlib.util.find_spec so
these tests are gracefully skipped instead of failing with
ModuleNotFoundError.
2026-05-04 04:41:36 -07:00
Siddharth Balyan
af6f9bc2a1
fix: refresh systemd unit on gateway boot (not just start/restart) (#19684)
The resilient restart settings from PR #18639 only took effect when
the gateway was started via `hermes gateway start` or `hermes gateway
restart` — both of which call refresh_systemd_unit_if_needed() which
writes the new unit and runs daemon-reload.

However, when the gateway self-restarts via exit-code-75 (stale-code
detection after `hermes update`, or the /restart command), systemd
respawns the process directly without going through any CLI function.
The unit file on disk stays stale, and systemd keeps using the old
cached settings (StartLimitBurst=5, RestartSec=30) until someone
manually runs `hermes gateway restart`.

This meant that after PR #18639 was deployed, users who never ran
`hermes gateway restart` manually were still vulnerable to the
permanent-death-on-network-outage bug.

Fix: call refresh_systemd_unit_if_needed() at the top of run_gateway()
(the foreground entry point that systemd's ExecStart invokes). This
ensures that on every boot — whether triggered by systemd restart,
exit-75 respawn, or manual foreground run — the unit definition and
daemon state are current. The call is best-effort (exceptions caught)
and a no-op when the unit is already current (one stat + string compare).
2026-05-04 16:27:51 +05:30
Ioodu
e50809b771 fix(file-tools): cap read_file result size to prevent context window overflow
Set max_result_size_chars=100_000 on the read_file registry entry (was
float('inf')), closing the Layer 2 defense-in-depth gap in
tool_result_storage.py. The existing Layer 1 guard inside
_handle_read_file already returns a JSON error for oversized reads;
this aligns the registry cap with every other tool.

Update test_read_file_never_persisted → test_read_file_result_size_cap
to assert 100_000, and add test_read_file_registry_cap_is_100k as an
explicit regression guard against re-introducing float('inf').

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 03:14:59 -07:00
Teknium
5b6d413476 fix(cli,gateway): surface title errors from /new <name>
The contributor's PR silently swallowed ValueError from
SessionDB.set_session_title() with bare except Exception: pass.
Users typing /new <title> with an already-in-use title got an
untitled session and no feedback.

Changes:
- cli.py: catch ValueError from both sanitize_title() and
  set_session_title(); print the error and mark the session
  untitled in the banner (never echo the rejected title back).
- gateway/run.py: append a warning note to the reset reply on
  title rejection; reflect the accepted title in the header.
- Add regression tests for the duplicate-title path in CLI and
  gateway.

Also map exx@example.com -> @exxmen in scripts/release.py.
2026-05-04 03:14:50 -07:00
Exx
f720751d79 feat(cli,gateway): /new accepts optional session name argument
Allow users to start a fresh session and immediately set its title by
passing a name to /new (or /reset):

    /new Refactor auth module

Changes:
- hermes_cli/commands.py: add args_hint='[name]' to /new command
- cli.py: parse title argument in process_command(), pass to new_session()
- cli.py: new_session() accepts title=None, sets title via SessionDB
- gateway/run.py: _handle_reset_command() parses title, sets on new entry
- gateway/session.py: reset_session() accepts optional display_name
- tests: add test_new_session_with_title, test_reset_command_with_title,
  test_new_command_in_help_output

All 36 affected tests pass.
2026-05-04 03:14:50 -07:00
xyiy001
e69d11d30c fix(browser): allow CDP override to pass requirement checks
Treat explicit CDP override mode as a valid browser backend even when agent-browser is absent, and add a regression test to prevent false-negative availability gating.
2026-05-04 03:12:30 -07:00
ChanlerDev
e3461e0b2a fix(cli): remove dead 'q' check from quit command resolution
The 'q' alias is defined for 'queue' command in commands.py:93.
The hardcoded 'q' in cli.py:5910 was dead code - resolve_command('q')
returns the queue CommandDef, so canonical would never be 'q'.

Removes the misleading check without changing any behavior:
- /quit and /exit still exit (defined aliases)
- /q still maps to queue (as intended)
2026-05-04 03:11:30 -07:00
DaniuXie
a45bd28598 fix(wecom): set SUPPORTS_MESSAGE_EDITING=False to prevent broken streaming 2026-05-04 03:10:36 -07:00
LeonSGP43
0df7e61d2c fix(cli): omit empty api_mode when probing custom models 2026-05-04 02:46:41 -07:00
Teknium
3c070f9f9d
fix(curator): only mark agent-created for background-review sediment (#19621)
Tighten the provenance semantics added in #19618: skills a user asks a
foreground agent to write via skill_manage(create) now stay invisible to
the curator. Only skills the background self-improvement review fork
sediments through skill_manage get the created_by=agent marker.

- tools/skill_provenance.py — new ContextVar module mirroring the
  _approval_session_key pattern: set_current_write_origin / reset /
  get / is_background_review. Default origin is 'foreground'; the
  review fork sets 'background_review'.
- run_agent.py — run_conversation() binds the ContextVar from
  self._memory_write_origin at the top of each call. The review fork
  runs on its own thread (fresh context), so foreground and review
  contexts never cross-contaminate.
- tools/skill_manager_tool.py — skill_manage(action='create') now
  only calls mark_agent_created() when is_background_review(). All
  other cases (foreground create, patch, edit, write_file, delete)
  continue as before.
- tests: test_skill_provenance.py (6 tests covering the ContextVar
  surface), split test_full_create_via_dispatcher into foreground
  vs. review-fork variants, curator status tests now mark-first.

Why: the agent routinely edits existing user skills on the user's
behalf; those writes must never flip provenance. And when a user
explicitly asks the foreground agent to create a skill, that skill
belongs to the user. The curator should only be cleaning up after
its own autonomous sediment from the review nudge loop.
2026-05-04 02:42:16 -07:00
nftpoetrist
4e2b20b705 fix(cli): sync use_gateway in _reconfigure_provider for tts, browser, and web
_reconfigure_provider() updates cloud_provider/backend/tts.provider when
switching tool providers via "hermes setup tools → Reconfigure", but did
not update the matching use_gateway flag. _configure_provider() (the
initial-setup path) sets use_gateway on all three tool categories. The
omission in _reconfigure_provider leaves a stale value in config.yaml:
switching from a Nous-managed provider (use_gateway=True) to a self-hosted
one keeps use_gateway=True, continuing to route requests through the Nous
gateway; switching the other way leaves use_gateway unset so the managed
feature does not activate.

Fix: mirror _configure_provider's use_gateway = bool(managed_feature)
assignment in the tts, browser, and web blocks of _reconfigure_provider.
Symmetric across all three tool categories. No behavior change for any
provider that does not set tts_provider, browser_provider, or web_backend.

Fixes #15229
2026-05-04 02:33:55 -07:00
LeonSGP43
abcaf05229 fix(skills): keep manual skills out of curator 2026-05-04 02:19:28 -07:00
asheriif
21c7c9f0ca fix(tui): harden plugin slash exec errors 2026-05-04 09:07:37 +00:00
Teknium
cac4f2c0e6 test(kanban): update worker-prompt header assertion to match #19427
PR #19427 dropped the 'You are a Kanban worker' identity line from
KANBAN_GUIDANCE so SOUL.md stays authoritative for profile identity.
This test assertion was stale against that change; update it to the
new protocol-only header.
2026-05-04 02:00:42 -07:00
nftpoetrist
9faaa292b4 fix(delegate): inherit parent fallback_chain in _build_child_agent
_build_child_agent constructed child AIAgents without passing
fallback_model, leaving _fallback_chain=[] for every subagent.
When a subagent hit a rate-limit or credential exhaustion the
runtime fallback check (run_agent.py:7486 / 12267) found an empty
chain and failed immediately — even though the parent agent was
configured with fallback_providers and would have recovered.

The cron scheduler already propagates fallback_model correctly
(scheduler.py:1038). Fix closes the parity gap by reading the
parent's _fallback_chain (the normalised list form accepted by
AIAgent's fallback_model parameter) and threading it through.

Empty chains coerce to None so AIAgent initialises _fallback_chain=[]
as usual rather than iterating an empty list.
2026-05-04 01:48:56 -07:00
molvikar
cb33c73418 fix(run_agent): gate iteration-limit provider routing to OpenRouter 2026-05-04 01:45:59 -07:00
Asunfly
8a364df2c8 fix: inherit reasoning config in API server runs 2026-05-04 01:44:16 -07:00
ai-ag2026
8bdec80882 fix(agent): surface preflight compression status
Preflight compression can run synchronously before the first model call when a loaded session exceeds the active context threshold. Gateway users saw no visible progress while the compression LLM call was in flight, which can look like a dropped message during long compactions.\n\nEmit the existing lifecycle status through _emit_status before starting preflight compression so CLI, gateway, and WebUI status callbacks all get immediate feedback.\n\nAdds a regression assertion for the preflight path.
2026-05-04 01:41:51 -07:00
Teknium
06031229e8
fix(tests): tolerate ps ancestor-walk in find_gateway_pids fallback test (#19590)
Follow-up to #19586 (@cixuuz salvage): _get_ancestor_pids walks ps -o ppid=
up the process tree, which the pre-existing mock in
test_find_gateway_pids_falls_back_to_pid_file_when_process_scan_fails didn't
expect. Return empty stdout so the ancestor loop terminates cleanly and the
original fallback assertion still passes.
2026-05-04 01:40:39 -07:00
Hermes Agent
74c997d985 fix(gateway): move quick-command dispatch before built-in handlers
Quick commands of type "alias" that target built-in slash commands
(e.g. /h -> /model) were processed too late in _handle_message — after
the if-canonical=="model" checks. This meant alias expansion never
reached the target handler and fell through to the LLM as raw text.

Two fixes:
1. Move the quick_commands block before built-in dispatch so alias
   targets (like /model) hit the correct handler after expansion.
2. Extract bare command name from target_command via .split()[0] to
   feed _resolve_cmd() correctly (was using the full arg-string).
2026-05-04 01:39:23 -07:00
nftpoetrist
e89376d66f fix(setup): add missing SLACK_HOME_CHANNEL prompt to _setup_slack()
_setup_slack() was the only platform setup function that did not prompt
for a home channel. All four sibling setups (_setup_telegram,
_setup_discord, _setup_mattermost, _setup_bluebubbles) close with an
identical home-channel block, and setup_gateway() already checks for
SLACK_HOME_CHANNEL presence at the end of the wizard — but the value
was never collected, leaving cron delivery and cross-platform
notifications silently broken for Slack after a fresh hermes setup run.

Add the standard home-channel prompt at the end of _setup_slack(),
symmetric with the Discord implementation. Add two unit tests that
verify the prompt is saved when provided and skipped when left blank.
2026-05-04 01:37:18 -07:00
wanazhar
df88375f0d fix: treat ctrl-c as curses cancel 2026-05-04 01:36:44 -07:00
leavr
ccb5d87076 test: cover max-iterations summary message sanitization 2026-05-04 01:36:27 -07:00
tmdgusya
a1cb811cb8 fix(cli): avoid voice TTS restart race 2026-05-04 01:36:07 -07:00
ethan
645b99aadd test(cron): cover null next_run_at recovery and non-dict origin tolerance
Adds four regression tests guarding the bugfix in the previous commit:
- TestGetDueJobs::test_broken_cron_without_next_run_is_recovered exercises
  cron schedules whose next_run_at was lost; expects compute_next_run to
  repopulate it within get_due_jobs() rather than silently skipping the job.
- TestGetDueJobs::test_broken_interval_without_next_run_is_recovered does
  the same for interval schedules.
- TestResolveOrigin::test_string_origin_is_tolerated and
  test_non_dict_origin_is_tolerated confirm _resolve_origin() returns None
  for legacy/hand-edited origins (string, list, int) instead of raising.

Co-Authored-By: Claude <noreply@anthropic.com>
2026-05-04 01:32:58 -07:00
Teknium
91ea3ae4b2 test(skills): add bytes-vs-str equivalence and on-disk hash parity tests
Follow-up on #9925 cherry-pick adding two additional tests:
- bytes content hashes identically to its str-decoded form
- mixed bytes+str bundle hash equals the on-disk content_hash from
  skills_guard (the production invariant used to detect drift)

Also map dodofun@126.com and 1615063567@qq.com in AUTHOR_MAP so the
CI contributor check passes for the cherry-picked commit.

Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: zhao0112 <1615063567@qq.com>
2026-05-04 01:28:12 -07:00
dh
3072e5543b skills-hub: hash binary skill bundle files correctly 2026-05-04 01:28:12 -07:00
daixin1204
744079ffe6 fix(curator): prevent false-positive consolidation from substring matching
_classify_removed_skills used naive 'in' substring matching to detect
whether a removed skill's name appeared in skill_manage arguments.
Short/common skill names (api, git, test, foo, etc.) matched
incorrectly when they appeared as substrings of longer words in file
paths (references/api-design.md) or content (latest, testing).

Replace with field-aware matching:
- file_path: needle must match a complete filename stem or directory
  name, with -/_ normalised for variant tolerance
- content fields: word-boundary regex (\b) prevents embedding in
  longer words

Also add 3 regression tests covering the false-positive scenarios.
2026-05-04 01:21:23 -07:00
Clooooode
1964b0565b test(kanban): add failing test for list_profiles_on_disk with custom HERMES_HOME
list_profiles_on_disk() hardcodes Path.home() / ".hermes" / "profiles",
ignoring HERMES_HOME when set to a custom root (e.g. /opt/data).

Add test_list_profiles_on_disk_custom_root to cover this case.

Related to #18442, #18985.
2026-05-04 01:21:14 -07:00
Siddharth Balyan
a11aed1acc
fix(cli): local backend CLI always uses launch directory, stops .env sync of TERMINAL_CWD (#19334)
The old CWD heuristic was fooled by:
1. TERMINAL_CWD persisted to .env by `hermes config set terminal.cwd`
2. Inherited TERMINAL_CWD from parent hermes processes
3. Only resolved when config had a placeholder value (not explicit paths)

Fix:
- load_cli_config() unconditionally uses os.getcwd() for local backend
- TERMINAL_CWD always force-exported in CLI mode (overrides stale values)
- Gateway sets _HERMES_GATEWAY=1 marker so lazy cli.py imports don't clobber
- Remove terminal.cwd from config-set .env sync map (prevents re-poisoning)
- Clarify setup wizard label as 'Gateway working directory'

Closes #19214
2026-05-04 11:36:19 +05:30
Ben
2f2998bb1b fix(tui): tolerate npm's peer-flag drop in lockfile comparison
`_tui_need_npm_install()` compares the canonical `package-lock.json` against
the hidden `node_modules/.package-lock.json` to decide whether `npm install`
needs to re-run. npm 9 drops the `"peer": true` field from the hidden lock
on dev-deps that are *also* declared as peers (the canonical lock preserves
the dual annotation). That made the check flag 16 packages (`@babel/core`,
`@types/node`, `@types/react`, `@typescript-eslint/*`, `react`, `vite`,
`tsx`, `typescript`, …) as mismatched on every launch, triggering a runtime
`npm install`.
Inside the Docker image, that runtime install then fails with EACCES because
`/opt/hermes/ui-tui/node_modules/` is root-owned from build time, so
`docker run … hermes-agent --tui` prints:
    Installing TUI dependencies…
    npm install failed.
…and exits 1, with no preview. The empty preview is a second bug: the
launcher captured only stderr, but npm 9 writes EACCES to stdout, which
was DEVNULL'd.
Fixes:
 - Add `"peer"` to `_NPM_LOCK_RUNTIME_KEYS` so the comparison ignores the
   non-deterministic field, alongside the existing `"ideallyInert"`.
 - Capture stdout as well as stderr in the install subprocess so future
   failures surface a useful preview instead of a bare "failed." line.
Regression tests:
 - `test_no_install_when_only_peer_annotation_differs` — the exact scenario
 - `test_install_when_version_differs_even_with_peer_drop` — guards against
   the peer-drop tolerance masking a real version skew
On-host impact: the same false-positive was firing on every `hermes --tui`
invocation from a normal checkout, silently running a no-op `npm install`
each time (it converged because the host's `node_modules/` is writable).
Startup time on the TUI should drop noticeably.
2026-05-04 14:13:38 +10:00
Chris Danis
363cc93674 fix(cron): bump skill usage when cron jobs load skills
Cron jobs that reference skills via their skills: config never bumped
the usage counters in .usage.json, so the curator could auto-archive
skills actively used by cron jobs based on stale timestamps.

Now _build_job_prompt() calls bump_use(skill_name) for each
successfully loaded skill so the curator sees them as active.
2026-05-03 17:06:48 -07:00
nftpoetrist
808fee151d fix(auxiliary): propagate explicit_api_key to _try_anthropic()
_try_anthropic() lacked the explicit_api_key parameter added to
_try_openrouter() in #18768. When resolve_provider_client() is called
with provider="anthropic" and an explicit key (e.g. from a fallback_model
entry with api_key set), the key was silently ignored — _try_anthropic()
always fell back to resolve_anthropic_token(), so the fallback returned
None,None for users without a default Anthropic credential configured.

Fix: add explicit_api_key: str = None to _try_anthropic() and use
explicit_api_key or <pool/env fallback> in both the pool-present and
no-pool paths. Pass explicit_api_key=explicit_api_key at the call site
in resolve_provider_client(). Symmetric with the _try_openrouter() fix.
No behavior change when explicit_api_key is None.
2026-05-03 17:00:55 -07:00
molvikar
74636f9c4a fix(gateway): clear queued reload-skills notes on new/resume/branch 2026-05-03 17:00:31 -07:00
Kenny Wang
222767e5e8 fix: sanitize Telegram help command mentions 2026-05-03 17:00:09 -07:00
konsisumer
6fda92aa7f fix(gateway): bridge top-level require_mention to Telegram config
Users commonly place `require_mention: true` at the top level of
config.yaml alongside `group_sessions_per_user`, expecting it to gate
Telegram group messages. The key was silently ignored because the
config loader only checked `yaml_cfg["telegram"]["require_mention"]`.

When `require_mention` is found at the top level and no telegram-specific
value is set, the fix now:
- adds it to platforms_data["telegram"]["extra"] so _telegram_require_mention()
  picks it up via the primary config.extra path
- sets TELEGRAM_REQUIRE_MENTION env var for the secondary fallback path

A telegram-specific value (telegram.require_mention) still takes
precedence over the top-level shorthand.

Also corrects telegram.md: bare /cmd without @botname is rejected when
require_mention is enabled; only /cmd@botname (bot-menu form) passes.

Fixes #3979
2026-05-03 16:59:46 -07:00
clawbot
1bd975c0ba fix(gateway): suppress duplicate voice transcripts
Deduplicate exact and near-exact Discord voice STT transcripts per guild/user over a short window to avoid duplicate delayed agent replies.

Adds regression tests for exact and near-duplicate voice transcript suppression.
2026-05-03 16:59:21 -07:00
LeonSGP43
6713274a42 fix(file): strip leaked terminal fences from reads 2026-05-03 16:58:50 -07:00
0xKingBack
3c42024539 fix(curator): pass auxiliary curator api_key/base_url into runtime resolution
Curator review fork now forwards per-slot credentials from auxiliary.curator
and legacy curator.auxiliary to resolve_runtime_provider, matching the
canonical aux task schema. Add regression tests for binding and main fallback.
2026-05-03 16:55:16 -07:00
MrBob
86e64c1d3b fix(gateway): hide required-arg commands from Telegram menu 2026-05-03 15:29:06 -07:00
Zyproth
a5cae16496 fix(api_server): fall back to default port on malformed API_SERVER_PORT 2026-05-03 15:27:03 -07:00
Zyproth
dfdd7b6e6f fix(codex-transport): preserve request override headers for xai responses 2026-05-03 15:25:45 -07:00
LeonSGP43
4a2f822137 fix(mcp): reconnect on terminated sessions 2026-05-03 15:23:33 -07:00
teknium1
2658494e81 fix(kanban): add per-path env overrides + dispatcher env injection
Layers defense-in-depth on top of the shared-root anchoring (base commit).

Changes in hermes_cli/kanban_db.py:
- kanban_db_path() now honours HERMES_KANBAN_DB first, then falls through
  to kanban_home()/kanban.db.
- workspaces_root() now honours HERMES_KANBAN_WORKSPACES_ROOT first, then
  falls through to kanban_home()/kanban/workspaces.
- All three overrides (HERMES_KANBAN_HOME, HERMES_KANBAN_DB,
  HERMES_KANBAN_WORKSPACES_ROOT) now call .expanduser() for consistency.
- _default_spawn() injects HERMES_KANBAN_DB and
  HERMES_KANBAN_WORKSPACES_ROOT into the worker subprocess env. Even
  when the worker's get_default_hermes_root() resolution somehow
  disagrees with the dispatcher's (symlinks, unusual Docker layouts),
  the two processes still open the same SQLite file.

Module docstring updated to describe all three overrides and the
dispatcher env-injection contract.

Tests (tests/hermes_cli/test_kanban_db.py, TestSharedBoardPaths):
- test_hermes_kanban_db_pin_beats_kanban_home
- test_hermes_kanban_workspaces_root_pin_beats_kanban_home
- test_empty_per_path_overrides_fall_through
- test_dispatcher_spawn_injects_kanban_db_and_workspaces_root
  (monkeypatches subprocess.Popen, asserts both env vars reach the
  child even after HERMES_HOME is rewritten by `hermes -p <profile>`.)

Docs: website/docs/reference/environment-variables.md gets entries
for the three kanban env vars.

This fusion is built on the cleanest of the seven competing PRs that
targeted issue #18442:

* Base commit (from PR #19350 by @GodsBoy): add `kanban_home()` helper
  anchored at `get_default_hermes_root()`, reroute all 5 kanban path
  sites through it (including the 3 sibling log-dir sites that the
  other six PRs missed), 8-test regression class.
* Dispatcher env-var injection approach drawn from PRs #18300
  (@quocanh261997) and #19100 (@cg2aigc).
* Per-path env overrides drawn from PR #19100 (@cg2aigc).
* get_default_hermes_root() resolution direction first proposed in
  PR #18503 (@beibi9966) and PR #18985 (@Gosuj).

Closes the duplicate/competing PRs: #18300, #18503, #18670, #18985,
#19037, #19056, #19100. Fixes #18442 and #19348.

Co-authored-by: quocanh261997 <17986614+quocanh261997@users.noreply.github.com>
Co-authored-by: cg2aigc <232694053+cg2aigc@users.noreply.github.com>
Co-authored-by: beibi9966 <beibei1988@proton.me>
Co-authored-by: Gosuj <123411271+Gosuj@users.noreply.github.com>
Co-authored-by: LeonSGP43 <154585401+LeonSGP43@users.noreply.github.com>
2026-05-03 15:13:39 -07:00
GodsBoy
f5bd77b3e1 fix(kanban): anchor board, workspaces, and worker logs at the shared Hermes root
The Kanban board is documented as shared across all Hermes profiles, but
`kanban_db_path()` and `workspaces_root()` resolved through `get_hermes_home()`,
which returns the active profile's HERMES_HOME. When the dispatcher spawned a
worker with `hermes -p <profile> --skills kanban-worker chat -q "work kanban
task <id>"`, the worker rewrote HERMES_HOME to the profile subdirectory before
kanban_db.py imported, opening a profile-local `kanban.db` that did not contain
the dispatcher's task. `kanban_show` and `kanban_complete` failed; the
dispatcher's row stayed `running` and was retried/crashed. The same defect
applied to `_default_spawn`'s log directory and `worker_log_path`, so
`hermes kanban tail` did not see the worker's output.

Add `kanban_home()` in `hermes_cli/kanban_db.py` that resolves through
`HERMES_KANBAN_HOME` (explicit override) then `get_default_hermes_root()`,
which already understands the `<root>/profiles/<name>` and Docker / custom
HERMES_HOME shapes. Reroute `kanban_db_path`, `workspaces_root`, the
`_default_spawn` log directory, `gc_worker_logs`, and `worker_log_path`
through it. Profile-specific config, `.env`, memory, and sessions stay
isolated as before; only the kanban surface is shared.

Add a `TestSharedBoardPaths` regression class to `tests/hermes_cli/test_kanban_db.py`
covering: default install, profile-worker convergence, Docker custom HERMES_HOME,
Docker profile layout, explicit `HERMES_KANBAN_HOME` override, and a real
SQLite round-trip across dispatcher and worker HERMES_HOME perspectives.
The dispatcher/worker convergence tests fail on origin/main and pass after
the fix.

Update the `kanban.md` user-guide page and the misleading docstrings in
`kanban_db.py` to describe the shared-root behavior.

Fixes #19348
2026-05-03 15:13:39 -07:00
asheriif
7e780f4832 fix(tui): run plugin slash commands live 2026-05-03 19:42:16 +00:00
Siddharth Balyan
167b5648ea
Revert "fix(cli): CLI/TUI on local backend always uses launch directory, ignores terminal.cwd (#19242)" (#19329)
This reverts commit 9eaddfafa3.
2026-05-04 00:43:58 +05:30
Siddharth Balyan
9eaddfafa3
fix(cli): CLI/TUI on local backend always uses launch directory, ignores terminal.cwd (#19242)
CLI/TUI sessions on the local backend now unconditionally use
os.getcwd() as the working directory. The terminal.cwd config value is
only consumed by gateway/cron/delegation modes (where there's no shell
to cd from).

Previously, 'hermes setup' would write an absolute path (e.g. $HOME)
into terminal.cwd which then pinned the CLI to that directory regardless
of where the user launched hermes from. This was a silent foot-gun —
the user's 'cd' was being ignored.

Changes:

1. cli.py: Restructured CWD resolution — if TERMINAL_CWD is not already
   set by the gateway, and the backend is local, always use os.getcwd().
   Config terminal.cwd is irrelevant for interactive CLI/TUI sessions.

2. setup.py: Moved the cwd prompt from setup_terminal_backend() to
   setup_gateway(). It now only appears when configuring messaging
   platforms and is labeled 'Gateway working directory'.

3. Tests: Rewrote test_cwd_env_respect.py to validate the new behavior:
   explicit config paths are ignored for CLI, gateway pre-set values are
   preserved, non-local backends keep their config paths.

4. Docs: Updated configuration.md, profiles.md, and
   environment-variables.md to clarify that terminal.cwd only affects
   gateway/cron mode on local backend.

Closes #19214
2026-05-04 00:14:36 +05:30
GodsBoy
b8ae8cc801 fix(debug): redact log content at upload time in hermes debug share
Apply agent.redact.redact_sensitive_text with force=True to log content
captured by _capture_log_snapshot before it reaches upload_to_pastebin.
On-disk logs are untouched. Compatible with the off-by-default local
redaction policy from #16794: this is upload-time-only and applies
regardless of security.redact_secrets because the public paste service
is the leak surface. A visible banner is prepended to each uploaded log
paste so reviewers know redaction was applied. --no-redact preserves
deliberate unredacted sharing for maintainer-coordinated cases.

The bug-report, setup-help, and feature-request issue templates direct
users to run hermes debug share and paste the resulting public URLs.
With redaction off by default per #16794, those uploads have been
carrying credentials onto paste.rs and dpaste.com.

force=True is non-negotiable: without it, redact_sensitive_text
short-circuits at agent/redact.py:322 when the env var is unset, so the
fix would silently be a no-op for its target audience. A regression
test pins this down.

Fixes #19316
2026-05-03 11:42:20 -07:00
Siddharth Balyan
c9a3f36f56
feat: add video_analyze tool for native video understanding (#19301)
* feat: add video_analyze tool for native video understanding

Adds a video_analyze tool that sends video files to multimodal LLMs
(e.g. Gemini) for analysis via the OpenRouter-compatible video_url
content type. Mirrors vision_analyze in structure, error handling,
and registration pattern.

Key design:
- Base64 encodes entire video (no frame extraction, no ffmpeg dep)
- Uses 'video_url' content block type (OpenRouter standard)
- Supports mp4, webm, mov, avi, mkv, mpeg formats
- 50 MB hard cap, 20 MB warning threshold
- 180s minimum timeout (videos take longer than images)
- AUXILIARY_VIDEO_MODEL env override, falls back to AUXILIARY_VISION_MODEL
- Same SSRF protection, retry logic, and cleanup as vision_analyze

Default disabled: registered in 'video' toolset (not in _HERMES_CORE_TOOLS).
Users opt in via: hermes tools enable video, or enabled_toolsets=['video'].

* feat(video): add models.dev capability pre-check + CONFIGURABLE_TOOLSETS entry

- Pre-checks model video capability via models.dev modalities.input
  before expensive base64 encoding. Fails early with helpful message
  suggesting video-capable alternatives (gemini, mimo-v2.5-pro).
- Passes optimistically if model unknown or lookup fails.
- Adds ModelInfo.supports_video_input() helper.
- Adds 'video' to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS
  so 'hermes tools enable video' works from CLI.
- 8 new tests for the capability check (37 total).

* refactor(video): remove models.dev capability pre-check

Removes _check_video_model_capability and ModelInfo.supports_video_input.
The vision_analyze tool doesn't pre-check image capability either — both
tools rely on the same pattern: send request, handle API errors gracefully
with categorized user-facing messages. The pre-check was inconsistent
(only worked for some providers/models) so drop it for parity.

* cleanup: compress comments, fix fragile timeout coupling

- Replace _VISION_DOWNLOAD_TIMEOUT * 2 with hardcoded 60s (no silent
  breakage if vision timeout changes independently)
- Strip verbose comments and redundant log lines throughout
- No behavioral changes
2026-05-04 00:04:36 +05:30
Bartok9
e527240b27 fix(tools): write_file handler now rejects missing 'content'/'path' args instead of silently writing zero-byte files (#19096)
Under context pressure, frontier models sometimes emit tool calls with
required fields dropped. Previously _handle_write_file() used
args.get('content', '') which substituted an empty string for the missing
key, returned success with bytes_written=0, and created a zero-byte file
on disk. The model had no way to detect the failure.

Changes:
- Reject calls where 'path' is absent or not a non-empty string
- Reject calls where 'content' key is entirely absent (key-presence check,
  not truthiness) — distinguishing a legitimately empty file from a dropped arg
- Reject calls where 'content' is a non-string type
- All error messages include guidance to re-emit the tool call or switch
  to execute_code with hermes_tools.write_file() for large payloads
- Explicit empty string content (file truncation) continues to work

Regression tests added for all four cases: missing path, missing content,
explicit-empty content, and wrong content type.

Fixes #19096
2026-05-03 08:52:41 -07:00
Tranquil-Flow
6b4fb9f878 fix(cron): treat non-dict origin as missing instead of crashing tick
``_resolve_origin`` called ``origin.get('platform')`` on whatever
``job.get('origin')`` returned. The leading ``if not origin: return None``
short-circuited the falsy cases (None, empty dict, "") but a non-empty
string passed that guard and then crashed with
``AttributeError: 'str' object has no attribute 'get'`` on every fire
attempt. Observed in the wild after a migration script tagged jobs with
free-form provenance strings (e.g.
``"combined-digest-replaces-x-and-y-20260503"``).

``mark_job_run`` did record ``last_status: error,
last_error: "'str' object has no attribute 'get'"`` once, but the next
tick re-loaded the same poisoned origin and crashed identically. The
job stayed enabled, fired every tick, and accumulated cascading errors
in the log until ``origin`` was patched manually.

Replace the falsy guard with ``isinstance(origin, dict)``. Non-dict
origins (string, int, list, tuple, float — anything that survived a
hand-edit, JSON-script write, or migration) are now treated the same
as a missing origin: the job continues with ``deliver`` falling back
through its normal home-channel path instead of crashing the scheduler
loop.

Test parametrises the non-dict shapes that can appear in jobs.json
through external writers and asserts ``_resolve_origin`` returns None
for each.

Note: this fix scope is the non-dict-``origin`` crash only. The
``next_run_at: null`` recurring-job recovery (the second sub-bug in
#18722) is independently addressed by the in-flight #18825, which
extends the never-silently-disable defense from #16265 to
``get_due_jobs()`` — that approach is well-aligned with the existing
recovery pattern and ships fine without a competing change here.

Fixes #18722 (non-dict origin crash; recurring-job recovery covered by #18825)
2026-05-03 08:51:50 -07:00
leprincep35700
b59bb4e351 fix(gateway): preserve home-channel thread targets across restart notifications 2026-05-03 08:47:49 -07:00
Teknium
d87fd9f039
fix(goals): make /goal work in TUI and fix gateway verdict delivery (#19209)
/goal was silently broken outside the classic CLI.

TUI: /goal was routed through the HermesCLI slash-worker subprocess,
which set the goal row in SessionDB but then called
_pending_input.put(state.goal) — the subprocess has no reader for that
queue, so the kickoff message was discarded. No post-turn judge was
wired into prompt.submit either, so even a manual kickoff would not
continue the goal loop. Intercept /goal in command.dispatch instead,
drive GoalManager directly, and return {type: send, notice, message}
so the TUI client renders the Goal-set notice and fires the kickoff.
Run the judge in _run_prompt_submit after message.complete, surface
the verdict via status.update {kind: goal}, and chain the continuation
turn after the running guard is released.

Gateway: _post_turn_goal_continuation was gated on
hasattr(adapter, 'send_message'), but adapters only expose send().
That branch was dead on every platform — users never saw
'✓ Goal achieved', 'Continuing toward goal', or budget-exhausted
messages. Replace the dead call with adapter.send(chat_id, content,
metadata) and drop a broken reference to self._loop.

Tests:
- tests/tui_gateway/test_goal_command.py — full /goal dispatch matrix
  (set / status / pause / resume / clear / stop / done / whitespace)
  plus regressions for slash.exec → 4018 and 'goal' staying in
  _PENDING_INPUT_COMMANDS.
- tests/gateway/test_goal_verdict_send.py — locks in the adapter.send
  path for done / continue / budget-exhausted and verifies the hook
  no-ops when no goal is set or the adapter lacks send().
2026-05-03 05:49:12 -07:00
kshitijk4poor
6f2dab248a fix: update tests for resume_pending semantics + add AUTHOR_MAP entries
Tests updated to reflect suspend_recently_active now setting
resume_pending=True (preserves session) instead of suspended=True
(wipes session history).

AUTHOR_MAP entries: millerc79 (#19033), shellybotmoyer (#18915)
2026-05-03 03:54:03 -07:00
nftpoetrist
6c1322b997 fix(slack): close previous handler in connect() to prevent zombie Socket Mode connections
SlackAdapter.connect() overwrote self._handler, self._app, and
self._socket_mode_task without closing the prior AsyncSocketModeHandler
first. If connect() was called a second time on the same adapter (e.g.
during a gateway restart or in-process reconnect attempt), the old Socket
Mode websocket stayed alive. Both the old and new connections received
every Slack event and dispatched it twice — producing double responses
with different wording, the same bug that affected DiscordAdapter (#18187,
fixed in #18758).

Fix: add a close-before-reassign guard at the start of the connection
setup path, mirroring the guard DiscordAdapter.connect() already has.
When self._handler is None (fresh adapter, first connect()) the block is
a harmless no-op. Scoped to the handler/app fields only — no behavior
change for any path that does not call connect() twice.

Fixes #18980
2026-05-03 03:47:49 -07:00
0xyg3n
19ba9e43b6 fix(gateway/discord): require allowlist auth on slash commands
Slash commands (_run_simple_slash, _handle_thread_create_slash) bypassed
every DISCORD_ALLOWED_* gate enforced by on_message. Any guild member
could invoke /background (RCE via terminal), /restart, /model, /skill,
etc. CVSS 9.8 Critical.

- _evaluate_slash_authorization mirrors on_message gates (user, role,
  channel, ignored channel) with fail-closed semantics
- _check_slash_authorization sends ephemeral reject + logs + admin alert
- Auth gate runs before defer() so rejections are ephemeral
- /skill autocomplete returns [] for unauthorized users (no catalog leak)
- Component views (ExecApproval, SlashConfirm, UpdatePrompt, ModelPicker)
  now honor role allowlists via shared _component_check_auth helper
- Optional DISCORD_HIDE_SLASH_COMMANDS defense-in-depth
- Cross-platform admin alert (Telegram/Slack fallback) on unauthorized attempts

Based on PR #18125 by @0xyg3n.
2026-05-03 03:44:55 -07:00
kshitijk4poor
5d5b8912be test: add tests for cmd_key preservation through name clamping
- TestClampCommandNamesTriples: unit tests for 3-tuple support in
  _clamp_command_names (short names, long names, collisions, multiple
  entries, backward compat with 2-tuples)
- TestDiscordSkillCmdKeyDispatch: integration test through the full
  discord_skill_commands pipeline verifying long skill names retain
  their original cmd_key after clamping
- Add contributor CharlieKerfoot to AUTHOR_MAP
2026-05-03 03:25:45 -07:00
kshitij
457c7b76cd
feat(openrouter): add response caching support (#19132)
Enable OpenRouter's response caching feature (beta) via X-OpenRouter-Cache
headers. When enabled, identical API requests return cached responses for
free (zero billing), reducing both latency and cost.

Configuration via config.yaml:
  openrouter:
    response_cache: true       # default: on
    response_cache_ttl: 300    # 1-86400 seconds

Changes:
- Add openrouter config section to DEFAULT_CONFIG (response_cache + TTL)
- Add build_or_headers() in auxiliary_client.py that builds attribution
  headers plus optional cache headers based on config
- Replace inline _OR_HEADERS dicts with build_or_headers() at all 5 sites:
  run_agent.py __init__, _apply_client_headers_for_base_url(), and
  auxiliary_client.py _try_openrouter() + _to_async_client()
- Add _check_openrouter_cache_status() method to AIAgent that reads
  X-OpenRouter-Cache-Status from streaming response headers and logs
  HIT/MISS status
- Document in cli-config.yaml.example
- Add 28 tests (22 unit + 6 integration)

Ref: https://openrouter.ai/docs/guides/features/response-caching
2026-05-03 01:54:24 -07:00
Henkey
9987f3d824 fix(acp): compact Zed tool replay rendering 2026-05-03 01:44:23 -07:00
Henkey
19854c7cd2 Schedule ACP history replay and fence file output 2026-05-03 01:44:23 -07:00
Henkey
eb612f5574 fix(acp): keep web extract rendering compact 2026-05-03 01:44:23 -07:00
Henkey
b294d1d022 fix(acp): keep read-file starts compact 2026-05-03 01:44:23 -07:00
Henkey
72c8037a24 fix(acp): polish common tool rendering 2026-05-03 01:44:23 -07:00
Henkey
ef9a08a872 fix(acp): polish Zed context and tool rendering 2026-05-03 01:44:23 -07:00
Henkey
e26f9b2070 fix(acp): route Zed thoughts to reasoning callbacks 2026-05-03 01:44:23 -07:00
helix4u
4f37669170 fix(tools): reconfigure enabled unconfigured toolsets 2026-05-03 00:33:02 -07:00
helix4u
d409a4409c fix(model): avoid bedrock credential probe in provider picker 2026-05-03 00:32:55 -07:00
liuhao1024
af98122793 fix(auxiliary): propagate explicit_api_key to _try_openrouter()
When resolve_provider_client() passes explicit_api_key for OpenRouter auxiliary
tasks, _try_openrouter() now accepts and honors this parameter instead of
silently ignoring it and falling back to OPENROUTER_API_KEY env var.

Root cause: _try_openrouter() had no explicit_api_key parameter, so even
when callers wanted to pass a runtime credential pool key, it could not be used.

Fix:
- Add explicit_api_key: str = None parameter to _try_openrouter()
- Prioritize explicit_api_key over pool key and env var
- Update resolve_provider_client() call site to pass explicit_api_key

Regression coverage:
- Test that explicit_api_key is passed to OpenAI client when provided
- Test that fallback to OPENROUTER_API_KEY still works when explicit_api_key is None

Closes #18338
2026-05-02 02:27:49 -07:00
teknium1
762eb79f1e fix(gateway): tighten httpx keepalive and close whatsapp typing-response leak (#18451)
Two mitigations for the CLOSE_WAIT accumulation reported against QQ Bot
+ Feishu on macOS behind Cloudflare Warp.

1. Shared httpx.Limits helper (gateway/platforms/_http_client_limits.py).
   Every long-lived platform adapter now constructs httpx.AsyncClient
   with max_keepalive_connections=10 and keepalive_expiry=2.0, vs httpx's
   default of unbounded keepalive pool and 5.0s expiry. On macOS/Warp the
   default 5s window let idle keepalive sockets sit in CLOSE_WAIT long
   enough for seven persistent adapters (QQ Bot, WeCom, DingTalk, Signal,
   BlueBubbles, WeCom-callback, plus the transient Feishu helper) to
   compound to the 256-fd ulimit. Tunable via
   HERMES_GATEWAY_HTTPX_KEEPALIVE_EXPIRY and
   HERMES_GATEWAY_HTTPX_MAX_KEEPALIVE env vars.

2. whatsapp.send_typing aiohttp leak. The call was
   'await self._http_session.post(...)' with no 'async with' and no
   variable capture — the ClientResponse went out of scope unclosed,
   holding its TCP socket in CLOSE_WAIT until GC. Fixed by wrapping in
   'async with'. This was the only bare-await aiohttp leak in the
   gateway/tools/plugins tree per audit; all other aiohttp sites use
   the context-manager pattern correctly.

The underlying reporter also saw Feishu SDK (lark-oapi) connections in
CLOSE_WAIT — those are inside the SDK and out of our direct control, but
tightening httpx keepalive across adapters reduces the aggregate pool
pressure regardless of which individual adapter leaks.
2026-05-02 02:23:37 -07:00
beibi9966
38dd057e91 fix(feishu): finalize remote document downloads inside httpx.AsyncClient context (#18502)
Snapshot Content-Type and body while the client context is still
active so pooled connections fully release on exit. Previously the
read happened after `async with httpx.AsyncClient(...)` returned —
which works today only because httpx eagerly buffers non-streaming
responses; a future refactor to `.stream()` would silently read-
after-close.

Part of the #18451 connection-hygiene audit. Salvage of #18502.
2026-05-02 02:23:37 -07:00
luyao618
13f344c5ce fix(agent): try fallback providers at init when primary credential pool is exhausted (#17929)
When a provider's credential pool has a single entry in 429-cooldown,
resolve_provider_client returns None and AIAgent.__init__ raises a
misleading RuntimeError suggesting the API key is missing — even when
valid fallback_providers are configured.

This patch makes __init__ iterate the fallback chain before raising,
mirroring the existing in-flight fallback logic in the request loop.
If a fallback resolves, the agent initializes against it and sets
_fallback_activated=True so _restore_primary_runtime can pick the
primary back up after cooldown.

Closes #17929
2026-05-02 02:09:46 -07:00
Teknium
1dce908930
fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log) (#18761)
* fix(gateway): config.yaml wins over .env for agent/display/timezone settings

Regression from the silent config→env bridge. The bridge at module import
time is correct for max_turns (unconditional overwrite), but every other
agent.*, display.*, timezone, and security bridge key was guarded by
'if X not in os.environ' — so a stale .env entry from an old 'hermes setup'
run would shadow the user's current config.yaml indefinitely.

Symptom: agent.max_turns: 500 in config.yaml, HERMES_MAX_ITERATIONS=60
in .env from an old setup, and the gateway silently capped at 60
iterations per turn. Gateway logs confirmed api_calls never exceeded 60.

Three changes:

1. gateway/run.py: drop the 'not in os.environ' guards for all agent.*,
   display.*, timezone, and security.* bridge keys. config.yaml is now
   authoritative for these settings — same semantics already in place
   for max_turns, terminal.*, and auxiliary.*. Also surface the bridge
   failure (previously 'except Exception: pass') to stderr so operators
   see bridge errors instead of silently falling back to .env.

2. gateway/run.py: INFO-log the resolved max_iterations at gateway
   start so operators can verify the config→env bridge did the right
   thing instead of chasing a phantom budget ceiling.

3. hermes_cli/setup.py: stop writing HERMES_MAX_ITERATIONS to .env in
   the setup wizard. config.yaml is the single source of truth. Also
   clean up any stale .env entry left behind by pre-fix setups.

Regression tests in tests/gateway/test_config_env_bridge_authority.py
guard each config→env key against the 'stale .env shadows config' bug.

* fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log)

Three issues observed in production gateway.log during a rapid restart
chain on 2026-05-02, all fixed here.

1. _send_restart_notification logged unconditional success
   adapter.send() catches provider errors (e.g. Telegram 'Chat not found')
   and returns SendResult(success=False); it never raises. The caller
   ignored the return value and always logged 'Sent restart notification
   to <chat>' at INFO, producing a misleading success line directly
   below the 'Failed to send Telegram message' traceback on every boot.
   Now inspects result.success and logs WARNING with the error otherwise.

2. WhatsApp bridge SIGTERM on shutdown classified as fatal error
   _check_managed_bridge_exit() saw the bridge's returncode -15 (our own
   SIGTERM from disconnect()) and fired the full fatal-error path,
   producing 'ERROR ... WhatsApp bridge process exited unexpectedly' plus
   'Fatal whatsapp adapter error (whatsapp_bridge_exited)' on every
   planned shutdown, immediately before the normal '✓ whatsapp
   disconnected'. Adds a _shutting_down flag that disconnect() sets
   before the terminate, and _check_managed_bridge_exit() returns None
   for returncode in {0, -2, -15} while shutting down. OOM-kill (137)
   and other non-signal exits still hit the fatal path.

3. restart_drain_timeout default 60s → 180s
   On 2026-05-02 01:43:27 a user /restart fired while three agents were
   mid-API-call (82s, 112s, 154s into their turns). The 60s drain budget
   expired and all three were force-interrupted. 180s covers realistic
   in-flight agent turns; users on very-long-reasoning models can still
   raise it further via agent.restart_drain_timeout in config.yaml.
   Existing explicit user values are preserved by deep-merge.

Tests
- tests/gateway/test_restart_notification.py: two new tests assert INFO
  is only logged on SendResult(success=True) and WARNING with the error
  string is logged on SendResult(success=False).
- tests/gateway/test_whatsapp_connect.py: parametrized test for
  returncode in {0, -2, -15} proves shutdown-time exits are suppressed;
  separate test proves returncode 137 (SIGKILL/OOM) still surfaces as
  fatal even when _shutting_down is set.
- _check_managed_bridge_exit() reads _shutting_down via getattr-with-
  default so existing _make_adapter() test helpers that bypass __init__
  (pitfall #17 in AGENTS.md) keep working unmodified.
2026-05-02 02:08:06 -07:00
Teknium
5eac6084bc
fix(discord): warn on 32-char clamp collisions in the /skill collector (#18759)
Discord's per-command name limit is 32 chars. When two skill slugs
share the same first 32 chars (or a skill slug clamps onto a reserved
gateway command name), only the first seen wins — the second is
dropped from the /skill autocomplete. The old behavior incremented a
``hidden`` counter silently, so skill authors had no way to discover
the drop short of noticing their skill was missing from the picker.

Not an actively-biting bug today (no collisions on the default catalog
as of 2026-05), but a landmine the moment someone ships a skill with a
long name. The earlier series in #18745 / #18753 / #18754 dropped the
other silent data-loss paths in the Discord /skill collector; this one
lights up the last remaining one.

Fix: promote ``_names_used`` from a set to a dict keyed by the clamped
name, mapping to the source cmd_key (or a ``"<reserved>"`` sentinel
for names inherited via ``reserved_names``). On collision, log a
WARNING naming both sides — the winner, the loser, the clamped name,
and what to rename.

Two phrasings:

* skill-vs-skill — "both clamp to X on Discord's 32-char command-name
  limit; only the winner appears in /skill. Rename one skill's
  frontmatter ``name:`` to differ in its first 32 chars."
* skill-vs-reserved — "collides with a reserved gateway command name;
  the skill will not appear in /skill. Rename the skill's frontmatter
  ``name:``."

Tests: three cases in
``tests/hermes_cli/test_discord_skill_clamp_warning.py`` —
skill-vs-skill collision (warning names both cmd_keys + clamped prefix),
skill-vs-reserved collision (warning uses the distinct phrasing), and a
no-collision negative (zero warnings emitted).
2026-05-02 02:05:01 -07:00
teknium1
e363ced3c3 test(discord): regression coverage for zombie-websocket guard in connect()
Covers PR #18224 fix for issue #18187 — when DiscordAdapter.connect() is
called a second time without an intervening disconnect(), the previous
commands.Bot must be closed before a new one is created. Otherwise both
websockets stay connected to Discord's gateway and both fire on_message,
producing double responses with different wording.
2026-05-02 02:04:14 -07:00
teknium1
0a6865b328 test(credential_pool): regression coverage for .env vs os.environ precedence
Covers PR #18256 fix for issue #18254 — when OPENROUTER_API_KEY is set in
BOTH os.environ (stale from parent shell) and ~/.hermes/.env (fresh),
_seed_from_env must prefer the .env value. Also guards the fallback case
where .env omits the key entirely (Docker/K8s/systemd deployments that
only inject via runtime env).
2026-05-02 02:00:32 -07:00
Teknium
10297fa23c
fix(discord): /reload-skills now refreshes the /skill autocomplete live (#18754)
`_register_skill_group` captured the skill catalog in closure variables
(`entries` and `skill_lookup`) so the single `tree.add_command` call at
startup owned the only live copy. The closure is never re-entered after
startup, so `/reload-skills` — which rescans the on-disk skills dir and
refreshes the in-process `_skill_commands` registry — had no way to
propagate results into the `/skill` autocomplete on Discord. New skills
stayed invisible in the dropdown, and deleted skills returned
"Unknown skill" when the stale autocomplete entry was clicked.

The fix is purely a dataflow change: promote `entries` and `skill_lookup`
to instance attributes (`_skill_entries`, `_skill_lookup`), split the
collector-driven rebuild into a helper (`_refresh_skill_catalog_state`),
and add a public `refresh_skill_group()` method that re-runs the helper
and is safe to call at any point after the initial registration.

The gateway's `_handle_reload_skills_command` then iterates
`self.adapters` and calls `refresh_skill_group()` on any adapter that
exposes it (currently only Discord). Both sync and async implementations
are supported; adapters that don't override the method (Telegram's
BotCommand menu, Slack subcommand map, etc.) are silently skipped — the
in-process `reload_skills()` call covers them.

No `tree.sync()` is required because Discord fetches autocomplete
options dynamically on every keystroke — mutating the instance state the
callbacks already read from is sufficient. That sidesteps the per-app
command-bucket rate limit (~5 writes / 20 s) that made the previous
bulk-sync-on-reload approach unusable (#16713 context).

Tests: tests/gateway/test_reload_skills_discord_resync.py — five cases
covering (1) refresh replaces entries, (2) entries stay sorted after
refresh, (3) collector exception leaves cached state intact, (4)
`_refresh_skill_catalog_state` populates the instance attrs, (5)
orchestrator calls `refresh_skill_group()` on sync + async adapters and
skips adapters that don't expose it.
2026-05-02 02:00:11 -07:00
Teknium
6ec74aec07
fix(gateway): match disabled/optional skills by frontmatter slug, not dir name (#18753)
_check_unavailable_skill is meant to turn a typed "/foo" command that
doesn't resolve into a specific hint — "disabled, enable with hermes
skills config" or "available but not installed, install with hermes
skills install …" — instead of the generic "unknown command" reply.

It was doing the match with `skill_md.parent.name.lower().replace("_", "-")`,
comparing that to the typed command. For every skill whose directory name
drifted from its declared frontmatter `name:`, that comparison failed and
the user got the unhelpful generic path. On a standard install today 19
skills have this drift, e.g.:

  dir: mlops/stable-diffusion
  frontmatter: name: Stable Diffusion Image Generation
  registered slug (what the user types): /stable-diffusion-image-generation

  dir: mlops/qdrant
  frontmatter: name: Qdrant Vector Search
  registered slug: /qdrant-vector-search

  dir: mlops/flash-attention
  frontmatter: name: Optimizing Attention Flash
  registered slug: /optimizing-attention-flash

In every case, _check_unavailable_skill would fall through because
"stable-diffusion" != "stable-diffusion-image-generation", even with the
skill sitting right there on disk.

Fix: extract a small `_skill_slug_from_frontmatter` helper that reads the
SKILL.md frontmatter and normalizes exactly like scan_skill_commands
(lower, spaces/underscores → hyphens, strip non-[a-z0-9-], collapse
runs of hyphens, strip edges). Use it in both the
disabled-skills branch and the optional-skills branch. The disabled-set
membership check now uses the declared frontmatter name (which is what
`hermes skills config` writes into skills.disabled / platform_disabled),
not the slug.

Tests: five cases in tests/gateway/test_unavailable_skill_hint.py —
the drift case for the disabled branch, unknown-command negative,
matched-but-not-disabled negative, non-alnum stripping, and the drift
case for the optional-skills branch. All five fail against main and
pass with the fix.
2026-05-02 02:00:09 -07:00
Teknium
8825e9044c
fix(discord): complete #18741 for /skill autocomplete and drop legacy 25x25 caps (#18745)
``discord_skill_commands_by_category`` was lagging the flat
``discord_skill_commands`` collector on two counts. Both were actively
dropping skills from Discord's ``/skill`` autocomplete dropdown.

1. External-dir skills were filtered out. #18741 widened the flat
   collector to accept ``SKILLS_DIR + skills.external_dirs`` but left
   this sibling collector — the one ``_register_skill_group`` actually
   uses on Discord — still matching ``SKILLS_DIR`` only. External
   skills were visible in ``hermes skills list`` and the agent's
   ``/skill-name`` dispatch but silently absent from Discord's
   ``/skill`` picker. Widen the accepted roots to match, and derive
   categories from whichever root the skill lives under so
   ``<ext>/mlops/foo/SKILL.md`` still lands in the ``mlops`` group.

2. 25-group × 25-subcommand caps were still applied. PR #11580
   refactored ``/skill`` to a flat autocomplete (whose options Discord
   fetches dynamically — no per-command payload concern) and its
   docstring promises "no hidden skills." The collector kept the old
   nested-layout caps anyway, silently dropping anything past the 25th
   alphabetical category. On installs with 29 category dirs today (real
   example: tail categories ``social-media``, ``software-development``,
   ``yuanbao`` going missing) this was biting immediately. Remove the
   caps; ``hidden`` now reports only 32-char name-clamp collisions
   against reserved names.

Tests: guard both behaviors. ``test_no_legacy_25x25_cap`` builds 30
categories × 30 skills each and asserts all 900 are returned.
``test_external_dirs_skills_included`` monkeypatches
``get_external_skills_dirs`` and asserts an external-dir skill makes
it into the result grouped under its own top-level directory.
2026-05-02 02:00:06 -07:00
Jacob Lizarraga
2470434d60 fix(telegram): probe polling liveness after reconnect to detect wedged Updater
After a transient Telegram 502, _handle_polling_network_error's
stop()+start_polling() cycle can leave PTB's Updater with `running=True`
but a wedged consumer task that never makes progress. No error_callback
fires in that state, so the reconnect ladder never advances past attempt
1, the MAX_NETWORK_RETRIES fatal-error path is never reached, and the
gateway sits silent indefinitely.

Schedule a heartbeat probe (60s after a successful reconnect) that
verifies Updater.running is still True and bot.get_me() responds within
a tight asyncio.wait_for timeout. Either failure feeds back into the
reconnect ladder so the existing escalation path fires.

No PTB-internal coupling, no Application rebuild — minimal additive
defense inside the existing reconnect abstraction.

Tests cover healthy / Updater non-running / probe timeout / probe
network error / already-fatal cases, plus an integration check that the
probe is actually scheduled after a successful start_polling().

Closes the silent-wedge case observed in the wild after a transient
Telegram 502; existing reconnect tests updated to mock bot.get_me() now
that the success path schedules a heartbeat probe.
2026-05-02 01:55:04 -07:00
liuhao1024
9bf260472b fix(tools): deduplicate tool names at API boundary for Vertex/Azure/Bedrock
Providers like Google Vertex, Azure, and Amazon Bedrock reject API
requests with duplicate tool names (HTTP 400: 'Tool names must be
unique').  The upstream injection paths in run_agent.py already dedup
after PR #17335, but two API-boundary functions pass tools through
without checking:

- agent/auxiliary_client.py: _build_call_kwargs() (all non-Anthropic
  providers in chat_completions mode)
- agent/anthropic_adapter.py: convert_tools_to_anthropic() (Anthropic
  Messages API path)

Add defensive dedup guards at both sites.  Duplicates are dropped with
a warning log, converting a hard 400 failure into a recoverable
condition.  This is intentionally conservative — the root-cause dedup
in run_agent.py is the primary defense; these guards add resilience
against future injection-path regressions.

Includes 8 new tests covering unique passthrough, duplicate removal,
empty/None edge cases.

Closes #18478
2026-05-02 01:51:51 -07:00
Teknium
699b3679bc
fix(constants): warn once when get_hermes_home() falls back under an active profile (#18746)
When HERMES_HOME is unset but ~/.hermes/active_profile names a non-default
profile, any data this process writes lands in the default profile — not the
one the operator expects. Before this change the fallback was silent, so
cross-profile contamination (#18594) was invisible until a user noticed
their memory/state ended up in the wrong place.

Now we emit a one-shot warning to stderr the first time this happens in
a process. No raise — there are 30+ module-level callers of get_hermes_home()
and raising from any of them would brick import. Behavior is otherwise
unchanged; subprocess spawners (systemd template, kanban dispatcher, docker
entrypoint) already propagate HERMES_HOME correctly.

Bypasses logging.getLogger() because this runs before logging is configured
in a significant fraction of callers (module import time).

Refs #18594. Credit to @liuhao1024 for surfacing the silent-fallback case
in PR #18600; we kept the diagnostic signal without the import-time raise.
2026-05-02 01:49:55 -07:00
CoreyNoDream
c5e3a6fb5b fix(cli): decode .env as UTF-8 to avoid GBK crash on Windows
Path.read_text() uses the system locale by default. On Windows CN/JP/KR
locales (GBK/CP932/CP949), reading a UTF-8 .env raises UnicodeDecodeError
as soon as it contains any non-ASCII byte (e.g. an em dash).

Pin encoding="utf-8" on every .env read in hermes_cli to match how the
rest of the codebase (load_dotenv at doctor.py:26) already decodes it.

Adds a regression test that monkeypatches Path.read_text to simulate a
GBK locale and asserts 'hermes doctor' no longer raises.

Refs #18637
2026-05-02 01:40:31 -07:00
Teknium
e2cea6eeba
fix(gateway): include external_dirs skills in Telegram/Discord slash commands (#18741)
Skills configured through `skills.external_dirs` in config.yaml were
visible via `hermes skills list`, `get_skill_commands()`, and the
agent's `/skill-name` dispatch, but silently excluded from the
Telegram and Discord slash-command menus. The filter in
`_collect_gateway_skill_entries` only accepted skills whose
`skill_md_path` started with `SKILLS_DIR`, so anything under an
external directory fell through.

Widen the accepted-prefix set to include all configured external
dirs alongside the local skills dir. Every prefix is now
slash-terminated so `/my-skills` cannot also admit
`/my-skills-extra`. Also guard against empty `skill_md_path`
values so they can't accidentally match.

Fixes #8110

Salvages #8790 by luyao618.

Co-authored-by: Yao <34041715+luyao618@users.noreply.github.com>
2026-05-02 01:36:57 -07:00
Teknium
c73594fe41
fix(skills): rescan skill_commands cache when platform scope changes (#18739)
The process-global `_skill_commands` dict in agent/skill_commands.py
was seeded by whichever platform scanned first, and
`get_skill_commands()` only rescanned when the cache was empty. In a
long-lived gateway process serving multiple platforms (Telegram +
Discord + Slack), the first platform's
`skills.platform_disabled` view was silently inherited by the
others — so a skill disabled for Telegram would also disappear from
Discord's slash menu, and vice versa.

Track the platform scope the cache was populated for
(`_skill_commands_platform`) and rescan in `get_skill_commands()`
when the currently-active platform no longer matches. Platform
resolution uses the same precedence as `_is_skill_disabled`:
`HERMES_PLATFORM` env var then `HERMES_SESSION_PLATFORM` from the
gateway session context.

Fixes #14536

Salvages #14570 by LeonSGP43.

Co-authored-by: LeonSGP <leon@sgp43.com>
2026-05-02 01:36:53 -07:00
Teknium
97acd66b4c
fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671) (#18731)
* fix(curator): authoritative absorbed_into declarations on skill delete

Closes #18671. The classification pipeline that feeds cron-ref rewriting
used to infer consolidation vs pruning from two brittle signals: the
curator model's post-hoc YAML summary block, and a substring heuristic
scanning other tool calls for the removed skill's name. Both miss in
real consolidations — the model forgets the YAML under reasoning
pressure, and the heuristic misses when the umbrella's patch content
describes the absorbed behavior abstractly instead of naming the old
slug. When both miss, the skill falls through to 'no-evidence fallback'
pruned, and #18253's cron rewriter drops the cron ref entirely instead
of mapping it to the umbrella. Same observable symptom as pre-#18253:
'Skill(s) not found and skipped' at the next cron run.

The fix makes the model declare intent at the moment of deletion.
skill_manage(action='delete') now accepts absorbed_into:
  - absorbed_into='<umbrella>'  -> consolidated, target must exist on disk
  - absorbed_into=''            -> explicit prune, no forwarding target
  - missing                     -> legacy path, falls through to heuristic/YAML

The curator reconciler reads these declarations off llm_meta.tool_calls
BEFORE either the YAML block or the substring heuristic. Declaration
wins. Fallback logic stays intact for backward compat with any caller
(human or older curator conversation) that doesn't populate the arg.

Changes
- tools/skill_manager_tool.py: add absorbed_into param to skill_manage
  + _delete_skill. Validate target exists when non-empty. Reject
  absorbed_into=<self>. Wire through dispatcher + registry + schema.
- agent/curator.py: new _extract_absorbed_into_declarations() walks
  tool calls for skill_manage(delete) with the arg. _reconcile_classification
  accepts absorbed_declarations= and treats them as authoritative. Curator
  prompt updated to require the arg on every delete.
- Tests: 7 new skill_manager tests covering the tool contract (valid
  target, empty string, nonexistent target, self-reference, whitespace,
  backward compat, dispatcher plumbing). 11 new curator tests covering
  the extractor + authoritative reconciler path + mixed-legacy-and-
  declared runs.

Validation
- 307/307 targeted tests pass (curator + cron + skill_manager suites).
- E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing
  all 3. Model emits NO YAML block. Heuristic misses (patch prose
  doesn't name old slugs). Delete calls carry absorbed_into. Result:
  both PR skills correctly classified 'consolidated' + cron rewritten
  ['pr-review-format', 'pr-review-checklist', 'stale-junk'] ->
  ['hermes-agent-dev']; stale-junk pruned via absorbed_into=''.
- E2E backward-compat: delete without absorbed_into, model emits YAML
  -> routed via existing 'model' source, cron still rewritten correctly.

* feat(curator): capture + restore cron skill links across snapshot/rollback

Before this, rolling back a curator run restored the skills tree but cron
jobs still pointed at the umbrella skills the curator had rewritten them
to. The user would see their old narrow skills back on disk but their
cron jobs still configured with the merged umbrella — not actually 'back
to how it was'.

Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json
alongside the skills tarball, as cron-jobs.json. The manifest gets a new
'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI
confirm dialog) can surface what's in the snapshot. If jobs.json is
missing/unreadable/malformed, snapshot proceeds without cron data — the
skills backup is the core guarantee; cron is additive.

Rollback side: after the skills extract succeeds, the new
_restore_cron_skill_links() reconciles the backed-up jobs into the live
jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and
only on jobs matched by id. Everything else about a cron job — schedule,
last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live
state the user or scheduler has modified since the snapshot; overwriting
it would regress unrelated activity.

Reconciliation rules:
- Job in backup AND live, skills differ  → skills restored.
- Job in backup AND live, skills match   → no-op.
- Job in backup, NOT in live             → skipped (user deleted it
                                              after snapshot; their choice
                                              is later than the snapshot).
- Job in live, NOT in backup             → untouched (user created it
                                              after snapshot).
- Snapshot missing cron-jobs.json at all → rollback still succeeds,
                                              reports 'not captured'
                                              (older pre-feature snapshots
                                              keep working).

Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the
scheduler uses, so rollback doesn't race tick().

Also:
- hermes_cli/curator.py: rollback confirm dialog now shows
  'cron jobs: N (will be restored for skill-link fields only)' when the
  snapshot has cron data, or 'not in snapshot (<reason>)' otherwise.
- rollback()'s message string includes a 'cron links: ...' clause
  summarizing the reconciliation outcome.

Tests
- 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json
  captured-as-raw, full rollback-restores-skills-and-cron, rollback
  touches only skill fields, rollback skips user-deleted jobs, rollback
  leaves user-created jobs untouched, rollback still works with
  pre-feature snapshot that has no cron-jobs.json, standalone unit test
  on _restore_cron_skill_links exercising the full report shape.

Validation
- 484/484 targeted tests pass (curator + cron + skill_manager suites).
- E2E: real snapshot_skills, real cron rewrite, real rollback. Before:
  ['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage'].
  After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format',
  'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id,
  name, prompt) preserved across the round trip.
2026-05-02 01:29:57 -07:00
Amr Essam
d05a87e686 fix(gateway): clear slack assistant thread status 2026-05-01 14:01:26 -07:00
hinotoi-agent
a147164d3c fix(slack): preserve per-user slash-command session isolation 2026-05-01 14:01:26 -07:00
nightq
5cdc39e29a fix(gateway): preserve case-sensitive chat IDs in DeliveryTarget.parse
Fixes NousResearch/hermes-agent#11768

Root cause: target.strip().lower() was lowercasing the entire target string,
corrupting case-sensitive chat IDs like Slack C123ABC and Matrix !RoomABC.

Fix: Only lowercase the platform prefix for case-insensitive matching;
preserve the original case for chat_id and thread_id values.
2026-05-01 14:01:26 -07:00
YAMAGUCHI Seiji
2b3923ff13 fix(gateway): coerce scalar free_response_channels to str before split
YAML loads a bare numeric value such as
    discord:
      free_response_channels: 1491973769726791812
as an int.  _discord_free_response_channels() / _slack_free_response_channels()
checked `isinstance(raw, list)` and `isinstance(raw, str)` in that order and
then fell through to `return set()`, so a single-channel config that happened
to be unquoted was silently dropped with no log line — the bot kept demanding
@mentions even though the channel was configured to free-response.

A multi-channel value like `1234567890,9876543210` does not trip this because
the comma forces YAML to parse it as a string.  Single-channel configs are
the only case that breaks, which is exactly the footgun that's hardest to
diagnose (the config "looks right" and the feature just doesn't activate).

Note that the old-schema env-var bridge at gateway/config.py:614+ already
runs `str(frc)` when forwarding to SLACK_/DISCORD_FREE_RESPONSE_CHANNELS,
so the env-var fallback worked.  The bug only surfaces on the
`config.extra["free_response_channels"]` path populated by the `platforms:`
bridge at gateway/config.py:576, which passes the raw YAML value through
unchanged.

Fix at the reader: treat any non-list value as a scalar, coerce with str(),
then apply the same CSV split semantics.  This keeps the public contract
stable (list or str-like continues to work identically) while accepting
the ints that the YAML loader is free to hand us.

Added tests for both Discord and Slack covering:
  - bare int value in config.extra
  - list of ints in config.extra
2026-05-01 14:01:26 -07:00
Prive FE Coder
a717199bbf fix(slack): exclude reserved Slack commands from native slash manifest
Slack has built-in slash commands (e.g. /status, /me, /join) that apps
cannot register. When running `hermes slack manifest --write`, the
generated manifest included /status, causing Slack to reject the entire
manifest with a reserved-command error.

Add _SLACK_RESERVED_COMMANDS frozenset of all known Slack built-ins and
skip them in slack_native_slashes(). Affected commands remain reachable
via /hermes <command>.

Tests updated:
- New test_excludes_slack_reserved_commands validates no leaks
- test_includes_canonical_commands no longer asserts /status
- test_telegram_parity accounts for expected Slack-only exclusions
2026-05-01 14:01:26 -07:00
kshitijk4poor
8fcc160f6b fix(gateway/slack): review fixes — scope ephemeral to commands, user isolation
Self-review fixes for the slash ephemeral ack:

- Only stash response_url when text starts with '/' (gateway command).
  Free-form questions via '/hermes <question>' must produce public agent
  replies visible to the whole channel, not ephemeral.
- Use a ContextVar (_slash_user_id) to thread the invoking user's ID
  from _handle_slash_command through to send().  _pop_slash_context now
  matches the exact (channel_id, user_id) key when the ContextVar is
  set, preventing concurrent users on the same channel from stealing
  each other's ephemeral context.  ContextVars propagate to child
  asyncio.Tasks, so the value survives through handle_message →
  _process_message_background → _send_with_retry → send().
- Add truncate_message() in _send_slash_ephemeral to prevent silent
  failures on long responses (response_url has the same ~40k limit).
- Log send_private_notice failures at debug level instead of bare
  except/pass — aids diagnostics without spamming.
- Document app_mention dedup dependency on shared event ts.
- Add tests: free-form question must NOT stash context, concurrent
  users on the same channel get isolated contexts, non-slash send()
  path fallback behavior.
2026-05-01 13:33:06 -07:00
probepark
0ab2d752ff feat(gateway): private notice delivery and Slack format_message fixes
Adds platform-level private notice delivery abstraction so operational
messages (e.g. sethome prompt) can be sent ephemerally on Slack when
configured with `slack.notice_delivery: private`.

Changes:
- gateway/config.py: _normalize_notice_delivery() + GatewayConfig.get_notice_delivery()
  with per-platform config bridging
- gateway/platforms/base.py: send_private_notice() default implementation
  (falls through to send())
- gateway/platforms/slack.py: send_private_notice() via chat_postEphemeral
- gateway/run.py: _deliver_platform_notice() helper replaces direct
  adapter.send() for the sethome notice, with private→public fallback
- gateway/platforms/slack.py: app_mention handler now forwards to
  _handle_slack_message (safe due to ts-based dedup) instead of no-op pass,
  fixing edge-case Slack configs where mentions arrive only as app_mention
- gateway/platforms/slack.py format_message: negative lookbehind prevents
  markdown images (![]()) from becoming broken Slack links; italic regex
  now requires non-whitespace boundaries so 'a * b * c' stays literal

Based on PR #9340 by @probepark.
2026-05-01 13:33:06 -07:00
kshitijk4poor
7cda0e5224 fix(gateway/slack): ephemeral ack and routing for slash commands
Slack slash commands (/q, /btw, /stop, /model, etc.) previously showed
no user-visible acknowledgement and posted command replies as public
channel messages.  This diverged from Discord, which uses ephemeral
deferred responses for slash commands.

Changes:
- handle_hermes_command now passes response_type='ephemeral' and a
  'Running /cmd…' text to ack(), giving the user immediate 'Only visible
  to you' feedback when they invoke any native slash command.
- _handle_slash_command stashes the Slack response_url from the command
  payload in a per-channel context dict before dispatching to
  handle_message.
- send() checks for a pending slash context and, when found, POSTs to
  the response_url with replace_original=true to swap the initial ack
  with the real command reply (e.g. 'Queued for the next turn.'),
  keeping it ephemeral.
- Stale slash contexts are garbage-collected on lookup (120s TTL).
- The response_url POST is non-fatal: if it fails, the user already saw
  the initial ack, and send() returns success=True.

Fixes #18182
2026-05-01 13:33:06 -07:00
Teknium
f99676e315
fix(gateway): auto-restart when source files change out from under us (#17648) (#18409)
Long-running gateway processes that survive 'hermes update' keep
pre-update modules cached in sys.modules. When new tool files on
disk then try to 'from hermes_cli.config import cfg_get' (added in
PR #17304), the import resolves against the stale module object
and raises ImportError — hitting users on Matrix, Telegram, Feishu,
and other platforms.

Two defenses:

1. Gateway self-check (gateway/run.py). On __init__, snapshot the
   newest mtime across sentinel source files (hermes_cli/config.py,
   run_agent.py, gateway/run.py, etc.). On every inbound message,
   re-read those mtimes; if any is newer than boot time + 2s slack,
   request a graceful restart via the normal drain path and return
   a one-line ack to the user. Idempotent, works regardless of how
   the update happened (hermes update, manual git pull, installer).

2. Post-restart survivor sweep ('hermes update'). After the existing
   restart loop, sleep 3s, rescan for gateway PIDs we already tried
   to kill, and SIGKILL any survivors. The detached profile watchers
   and systemd then relaunch with fresh code instead of waiting out
   the 120s watcher timeout.

Closes #17648.
2026-05-01 09:50:08 -07:00
Teknium
77c0bc6b13
fix(curator): defer first run and add --dry-run preview (#18373) (#18389)
* fix(curator): defer first run and add --dry-run preview (#18373)

Curator was meant to run 7 days after install, not on the very first
gateway tick. On a fresh install (no .curator_state), should_run_now()
returned True immediately because last_run_at was None — so the gateway
cron ticker fired Curator against a fresh skill library moments after
'hermes update'. Combined with the binary 'agent-created' provenance
model (anything not bundled and not hub-installed), this consolidated
hand-authored user workflow skills without consent.

Changes:
- should_run_now(): first observation seeds last_run_at='now' and returns
  False. The next real pass fires one full interval_hours later (7 days
  by default), matching the original design intent.
- hermes curator run --dry-run: produces the same review report without
  applying automatic transitions OR permitting the LLM to call
  skill_manage / terminal mv. A DRY-RUN banner is prepended to the
  prompt and the caller skips apply_automatic_transitions. State is
  NOT advanced so a preview doesn't defer the next scheduled real pass.
- hermes update: prints a one-liner on fresh installs pointing at
  --dry-run, pause, and the docs. Silent on steady state.
- Docs: curator.md and cli-commands.md explain the deferred first-run
  behavior and warn that hand-written SKILL.md files share the
  'agent-created' bucket, with guidance to pin or preview before the
  first pass.

Tests:
- test_first_run_defers replaces the old 'first run always eligible'
  assertion — same fixture, inverted expectation.
- test_maybe_run_curator_defers_on_fresh_install covers the gateway tick
  path end-to-end.
- Three new dry-run tests cover state-advance suppression, prompt
  banner injection, and apply_automatic_transitions skipping.

Fixes #18373.

* feat(curator): pre-run backup + rollback (#18373)

Every real curator pass now snapshots ~/.hermes/skills/ into
~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling
apply_automatic_transitions or the LLM review. If a run consolidates or
archives something the user didn't want touched, 'hermes curator
rollback' restores the tree in one command. Dry-run is skipped — no
mutation means no snapshot needed.

Changes:
- agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The
  snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed
  by the skills hub). Extract refuses absolute paths and .. components,
  and uses tarfile's filter='data' on Python 3.12+. Rollback takes a
  pre-rollback safety snapshot FIRST, stages the current tree into
  .rollback-staging-<ts>/ so the extract lands in an empty dir, and
  cleans the staging dir on success. A failed extract restores the
  staged contents.
- agent/curator.py: run_curator_review() calls curator_backup.
  snapshot_skills(reason='pre-curator-run') before apply_automatic_
  transitions. Best-effort — a failed snapshot logs at debug and the
  run continues (a transient disk issue shouldn't silently disable
  curator forever).
- hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator
  rollback' subcommands. rollback supports --list, --id <ts>, -y.
- hermes_cli/config.py: curator.backup.{enabled, keep} config block
  with sane defaults (enabled=true, keep=5).
- Docs: curator.md gets a 'Backups and rollback' section; cli-commands
  .md table gets the new rows.

Tests (new file tests/agent/test_curator_backup.py, 16 cases):
- snapshot creates tarball + manifest with correct counts
- snapshot excludes .curator_backups/ (recursion guard) and .hub/
- snapshot disabled via config returns None without creating anything
- snapshot uniquifies ids within the same second (-01 suffix)
- prune honors keep count, newest-first
- list_backups + _resolve_backup cover newest-default and unknown-id
- rollback restores a deleted skill with content intact
- rollback is itself undoable — safety snapshot shows up in list_backups
- rollback with no snapshots returns an error
- rollback refuses tarballs with absolute paths or .. components
- real curator runs take a 'pre-curator-run' snapshot; dry-runs do not

All curator tests: 210 passing locally.
2026-05-01 09:49:59 -07:00
Siddharth Balyan
c5b4c48165
fix: lazy session creation — defer DB row until first message (#18370)
Prevents ghost sessions from accumulating in state.db when the TUI/web
dashboard is opened and closed without sending a message.

Changes:
- run_agent.py: Add _ensure_db_session() gate method, called at
  run_conversation() entry. Remove eager create_session() from __init__.
  Handle compression rotation flag correctly.
- tui_gateway/server.py: Remove eager db.create_session() in
  _start_agent_build(). Add post-first-message pending_title re-apply.
- hermes_state.py: Extract _insert_session_row() shared helper (DRY).
  Add prune_empty_ghost_sessions() for one-time migration.
- cli.py: One-time ghost session prune on startup. Fix _pending_title
  to call _ensure_db_session() before set_session_title().
- hermes_cli/main.py: Guard TUI exit summary on message_count > 0.
- tests: Update test_860_dedup to call _ensure_db_session() before
  direct _flush_messages_to_session_db() calls.

Closes: ghost session clutter in hermes sessions list and web dashboard.
2026-05-01 18:39:12 +05:30
teknium1
2af8b8ff37 fix(moonshot): also strip nullable/enum after anyOf collapse
The anyOf collapse in _repair_schema returned early, skipping the
nullable-strip and enum-cleanup steps. When a schema had anyOf
[{enum: [..., null, '']}, {type: null}] alongside a parent-level
'nullable: true', collapsing to the single non-null branch produced a
merged node that still had both 'nullable' and the bad enum values —
Moonshot would still 400 on it.

Fix: fall through to Rules 1/3 when the collapse produces a single
merged node; only return early for the multi-branch case (pure
anyOf preservation) or when there was no null branch to remove.

Adds a test that locks in the combined-case expectation.
2026-04-30 23:14:31 -07:00
Hendrix
9ca72a69a7 fix(moonshot): fill missing type before enum cleanup to handle anyOf branches without explicit type
When a schema node inside anyOf has enum values but no explicit 'type',
Rule 3 (enum cleanup) ran before _fill_missing_type, so node_type was
None and the enum was never cleaned. Moonshot then rejected the schema
with 'enum value (<nil>) does not match any type in [string]'.

Fix: reorder operations — fill missing type first, strip nullable,
then clean enum. This ensures enum cleanup always has a type to check.

Also fixes test expectation: empty string in enum is now correctly
stripped (Moonshot rejects it too).

Closes #16875
2026-04-30 23:14:31 -07:00
Mikey O'Brien
1be3b74cfb fix(gateway): honor MATRIX_HOME_ROOM in onboarding 2026-04-30 23:13:34 -07:00
Teknium
265bd59c1d
feat: /goal — persistent cross-turn goals (Ralph loop) (#18262)
Add a standing-goal slash command that keeps Hermes working toward a
user-stated objective across turns until it is achieved, paused, or
the turn budget runs out. Our take on the Ralph loop — cf. Codex CLI
0.128.0's /goal.

After each turn, a lightweight auxiliary-model judge call asks 'is
this goal satisfied by the assistant's last response?'. If not, and
we're under the turn budget (default 20), Hermes feeds a continuation
prompt back into the same session as a normal user message. Any real
user message preempts the continuation loop automatically.

Judge failures fail OPEN (continue) so a flaky judge never wedges
progress — the turn budget is the real backstop.

### Commands

- `/goal <text>`    — set a standing goal (kicks off the first turn)
- `/goal` or `/goal status` — show current state
- `/goal pause`    — pause the continuation loop
- `/goal resume`   — resume (resets turn counter)
- `/goal clear`    — drop the goal

Works on both CLI and gateway platforms via the central CommandDef
registry.

### Design invariants preserved

- **Prompt cache**: continuation prompts are regular user-role
  messages appended to history. No system-prompt mutation, no toolset
  swap.
- **Role alternation**: continuation is a user turn, never injected
  mid-tool-loop.
- **Session persistence**: goal state lives in SessionDB.state_meta
  keyed by `goal:<session_id>`, so `/resume` picks it up.
- **Mid-run safety**: on the gateway, `/goal status|pause|clear` are
  allowed mid-run (control-plane only); setting a new goal requires
  `/stop` first so we don't race a second continuation prompt against
  the current turn.

### Files

- `hermes_cli/goals.py` (new, 380 lines) — GoalManager + judge + state
- `hermes_cli/commands.py` — CommandDef entry
- `hermes_cli/config.py` — `goals.max_turns` default
- `hermes_cli/web_server.py` — dashboard category merge
- `cli.py` — /goal handler + post-turn continuation hook in
  process_loop
- `gateway/run.py` — /goal handler + post-turn continuation hook
  wrapping _handle_message_with_agent
- `tests/hermes_cli/test_goals.py` (new, 26 tests) — judge parsing,
  fail-open semantics, lifecycle, persistence, budget exhaustion
- `website/docs/reference/slash-commands.md` — docs entry
2026-04-30 23:10:20 -07:00
Teknium
50c046331d
feat(update): add --yes/-y flag to skip interactive prompts (#18261)
hermes update had two interactive [Y/n] prompts with no bypass:
  1. Config migration (after new env/config options are added)
  2. Autostash restore (when uncommitted work was stashed before pull)

hermes uninstall already has --yes/-y; mirrors that.

Under --yes:
  - Config-migrate prompt → auto-yes, migrate_config(interactive=False)
    so new config fields are applied but API-key prompts are skipped
    (user runs 'hermes config migrate' later for those). Matches
    gateway-mode semantics.
  - Stash-restore prompt → auto-yes, git stash apply runs automatically.

Closes the 'can I hermes update -y, No ! Fix' gap reported by @murelux.
2026-04-30 23:06:32 -07:00
Teknium
4caad285a6
feat(gateway): auto-delete slash-command system notices after TTL (#18266)
Adds opt-in auto-deletion for slash-command reply messages like
"New session started!", "Restarting gateway…", "Stopped.", and
YOLO toggles.  After the TTL elapses the gateway calls the adapter's
delete_message; on platforms without a delete API (everything except
Telegram today) the TTL is silently ignored and the message stays.

Requested on Twitter by @charlesmcdowell — tool-call bubbles are useful
real-time, but system notices clutter the thread once the agent finishes.

Implementation:

- EphemeralReply(str) sentinel in gateway/platforms/base.py.  Subclasses
  str so existing 'X' in response / response.startswith(...) checks in
  tests and call sites keep working unchanged; isinstance() still
  distinguishes it for the send path.
- _process_message_background and both busy-session bypass paths
  (in base.py) call _unwrap_ephemeral() on the handler return, send
  the unwrapped text, and schedule a detached delete task when the
  TTL > 0 AND the adapter class overrides delete_message.
- display.ephemeral_system_ttl (default 0 = disabled) in DEFAULT_CONFIG.
  Handler can pass ttl_seconds explicitly to override.
- Wrapped the highest-noise return sites: /new, /reset, /stop,
  /yolo on/off, /restart success + "already in progress".  Draining
  notices and /help output left as plain strings — those are
  informational and users want to read them.

Backward-compat: default TTL 0 → no scheduling, no behavior change
for existing users.  Platforms without delete_message silently no-op.
2026-04-30 23:05:48 -07:00
Teknium
e2eb561e8e
fix(curator): rewrite cron job skill refs after consolidation (#18253)
When the curator consolidates skill X into umbrella Y, any cron job
that listed X in its skills field would fail to load X at run time —
the scheduler logs a warning and skips it, so the scheduled job runs
without the instructions it was scheduled to follow.

cron.jobs.rewrite_skill_refs(consolidated, pruned) now updates jobs
in-place: consolidated names route to the umbrella target (dedup
when umbrella is already present), pruned names are dropped.
agent.curator._write_run_report calls it after classification,
best-effort so a cron-side failure never breaks the curator itself.

Results are recorded in run.json (counts.cron_jobs_rewritten + full
cron_rewrites payload), a separate cron_rewrites.json for convenience
when jobs were touched, and a section in REPORT.md.

Reported by @tombielecki.
2026-04-30 23:04:50 -07:00
IMHaoyan
bfb704684e fix(deepseek): use non-empty reasoning_content placeholder for V4 Pro thinking mode
DeepSeek V4 Pro tightened thinking-mode validation and rejects empty-string
reasoning_content with HTTP 400:

    The reasoning content in the thinking mode must be passed back to the API.

run_agent.py injected "" at three fallback sites — the tool-call pad in
_build_assistant_message and both injection branches of
_copy_reasoning_content_for_api (cross-provider poison guard + unconditional
thinking pad). All three now emit " " (single space), which satisfies the
non-empty check on V4 Pro without leaking fabricated reasoning.

Also upgrades stale empty-string placeholders on replay: sessions persisted
before this change have reasoning_content="" pinned at creation time; when
the active provider enforces thinking-mode echo, the replay path now rewrites
"" -> " " so existing users don't 400 on their first V4 Pro turn after
updating. Non-thinking providers still round-trip "" verbatim.

Updates 9 existing assertions + adds 2 regression tests (stale-placeholder
upgrade, non-thinking verbatim preservation).

Refs #15250, #17400.
Closes #17341.
2026-04-30 23:04:23 -07:00
Teknium
f0dc919f92
fix(compression): include system prompt + tool schemas in token estimates (#18265)
The user-visible /compress banner and the post-compression last_prompt_tokens
writeback both counted only the raw message transcript (chars/4). With a 15KB
system prompt and 30 tool schemas (~26KB), a 4-message transcript that looks
like ~45 tokens to the transcript-only estimator is really ~10.5K tokens of
request pressure — a 234x gap.

Two user-facing consequences:
- Banner shows 'Compressing … (~45 tokens)…' while compression is actually
  firing on 10K+ tokens of real pressure, confusing users about why
  compression triggered (reported by @codecovenant on X; #6217).
- Post-compression last_prompt_tokens writeback omits tool schemas, so the
  next should_compress() check compares real usage against a stale
  underestimate — compression triggers late, potentially past the model's
  context limit on small-context models (#14695).

Swap estimate_messages_tokens_rough() for estimate_request_tokens_rough()
at every user-visible banner and at the post-compression writeback.
estimate_request_tokens_rough() already existed for exactly this purpose
and includes system prompt + tool schemas.

Touched call sites:
- run_agent.py: post-compression last_prompt_tokens writeback, post-tool
  call should_compress() fallback when provider usage is missing
- cli.py: /compress banner + summary
- gateway/run.py: gateway /compress banner + summary
- tui_gateway/server.py: TUI /compress status + summary
- acp_adapter/server.py: ACP /compact before/after

Left intentionally alone:
- Session-hygiene fallback and the 'no agent' /status path in gateway/run.py
  — no agent instance is in scope to query for system prompt/tools, and the
  existing 30-50% overestimate wobble on hygiene is safety-accepted.
- Verbose-mode 'Request size' logging — informational only, already counts
  system prompt via api_messages[0].

Also relabels the feedback line from 'Rough transcript estimate' to
'Approx request size' so the metric label matches what it actually measures.

Credits: diagnoses from @devilardis (#14695) and @Jackten (#6217);
user report @codecovenant on X (2026-04-30).

Closes #14695
Closes #6217
2026-04-30 23:03:54 -07:00
Teknium
41fa1f1b5c
fix(acp): run /steer as a regular prompt on idle sessions (#18258)
When a user types /steer <text> on an ACP session that isn't actively
running a turn (and there's no interrupted-prompt salvage available),
_cmd_steer silently appended to state.queued_prompts and replied
"No active turn — queued for the next turn". That looks identical to
/queue output even though the user never typed /queue — @EddyLeeKhane
reported this as "/steer never works, gets queued instead".

Rewrite the payload to a plain user prompt before the slash-intercept
fires, matching the gateway's idle-/steer fallthrough in
gateway/run.py ~L4898.
2026-04-30 22:45:14 -07:00
Henkey
ec1443b9f1 fix(acp): normalize Windows cwd for WSL tool execution 2026-04-30 20:55:14 -07:00
Henkey
78886365c2 fix(acp): replay interrupted prompts for steer 2026-04-30 20:54:37 -07:00
Henkey
e27b0b7651 feat(acp): add steer and queue slash commands 2026-04-30 20:54:37 -07:00
Mind-Dragon
0704589ceb fix(agent): make tool loop guardrails warning-first 2026-04-30 20:43:15 -07:00
Mind-Dragon
58b89965c8 fix(agent): add tool-call loop guardrails 2026-04-30 20:43:15 -07:00
Oxidane-bot
8d7500d80d fix(gateway): snapshot callback generation after agent binds it, not before
_process_message_background snapshotted callback_generation from the
interrupt event at the TOP of the task — before the handler ran.
_hermes_run_generation is only set on the event by
GatewayRunner._bind_adapter_run_generation during
_handle_message_with_agent, which runs DURING the handler await. The
early snapshot always captured None, which then flowed into
pop_post_delivery_callback(..., generation=None) in the finally block.

In pop_post_delivery_callback, generation=None with a tuple-registered
entry (generation, callback) bypasses the ownership check — it pops and
fires the callback regardless of which run owns it. Result: a stale run
could fire a fresher run's post-delivery callback (e.g. a
background-review notification attributed to the wrong turn).

Fix: move the snapshot into the finally block, after the handler has
run and _hermes_run_generation has been bound to the current run.

Regression test added: simulates a stale handler at generation=1 and a
fresher callback registered at generation=2. Pre-fix: snapshot=None →
pop fires the generation=2 callback under generation=1's ownership
("newer" fires). Post-fix: snapshot=1 → pop skips the mismatched
entry, callback stays in the dict for the correct run to claim.

Verified: test FAILS on current main (captures "newer" in fired list),
PASSES with this fix.

Salvaged from PR #12565 (the callback-ownership portion only; the
/status totals portion was already fixed on main in 7abc9ce4d via #17158).

Co-authored-by: Oxidane-bot <1317078257maroon@gmail.com>
2026-04-30 20:41:18 -07:00
Teknium
27ec74c68a fix: coerce show_reasoning and guard_agent_created config bools
Widens #16528 to two sibling sites that had the same quoted-boolean
bug: a YAML string "false" (or "0", "no", "off") silently evaluated
truthy under bool() / if-check.

- gateway/run.py _load_show_reasoning: is_truthy_value wrap
- tools/skill_manager_tool.py _guard_agent_created_enabled: is_truthy_value wrap
- regression tests for both
2026-04-30 20:40:46 -07:00
johnncenae
bb706c3f38 fix(gateway): coerce tool_progress_command as a real boolean 2026-04-30 20:40:46 -07:00
Teknium
a94841eaa0 fix(state): include finish_reason in conversation replay
SELECT in get_messages_as_conversation() was missing finish_reason, so
assistant messages round-tripped through replay (including /branch copies)
silently dropped the provider's stop signal. Adds it to the SELECT, restores
it on assistant rows, and locks it in with a round-trip test.
2026-04-30 20:40:28 -07:00
simbam99
7ba1a2b3df fix(gateway): preserve assistant metadata when branching sessions 2026-04-30 20:40:28 -07:00
Yukipukii1
55366510e5 fix(auth): make provider config writes atomic 2026-04-30 20:39:41 -07:00
Mind-Dragon
ab6c629ccc fix(terminal): skip sudo prompt when local NOPASSWD sudo works
When running on a host with sudoers NOPASSWD configured for the current
user, interactive Hermes sessions were unnecessarily entering the
password prompt path before executing sudo commands. Outside Hermes,
`sudo -n true` exits 0 for that user.

Add `_sudo_nopasswd_works()` that probes `sudo -n true` and, when it
succeeds, lets `_transform_sudo_command()` return the command unchanged
with no stdin password. The probe:

- Is scoped to the `local` terminal backend only, so Docker/SSH/Modal
  and other remote backends do not inherit host sudo state.
- Re-probes every call (no process-lifetime cache) so an expired sudo
  timestamp cannot silently make a later command block waiting for a
  password that Hermes never prompts for.
- Is bypassed entirely when `SUDO_PASSWORD` is configured or a cached
  password already exists, preserving existing explicit-password flows.

Co-authored-by: Junting Wu <juntingpublic@gmail.com>
2026-04-30 20:38:09 -07:00
simbam99
ccfe6a47c3 fix(gateway): coerce StreamingConfig booleans and malformed numerics safely 2026-04-30 20:37:49 -07:00
hharry11
24130b7e53 fix(approval): harden YOLO mode env parsing against quoted-bool strings 2026-04-30 20:37:37 -07:00
hharry11
158eb32686 fix(gateway): preserve document type when merging queued events 2026-04-30 20:37:27 -07:00
sprmn24
adaee2c72c test(skill_utils): add regression tests for non-dict metadata in extract_skill_conditions
The fix for this bug (isinstance guard) was merged via commit 3ff9e010,
but test coverage was not included. Adding 4 tests:
- dict metadata with hermes keys (normal case)
- string metadata (bug case — previously caused AttributeError)
- None metadata
- missing metadata key
2026-04-30 20:37:15 -07:00
teknium1
e21898ea98 test(discord_tool): add regression test for per-token capability cache
Proves token A's detected capabilities do not leak to token B after the
fix in the preceding commit. Before the fix this test would have seen
both tokens return token A's cached value.
2026-04-30 20:37:12 -07:00
Teknium
82b5786721 test(browser_supervisor): cover cache-hit healthcheck on dead thread/loop
Pure unit tests for _SupervisorRegistry — no Chrome required. Verified
to fail when the fix is reverted, pass with it in place.
2026-04-30 20:33:33 -07:00
Roy-oss1
b94cb8e2c4 feat(feishu): operator-configurable bot admission and mention policy
Add two operator-facing toggles for inbound Feishu admission, enabling
bot-to-bot scenarios such as A2A orchestration and inter-bot
notifications:

  FEISHU_ALLOW_BOTS=none|mentions|all   (default: none)
    Accept messages from other bots. `mentions` requires the peer
    bot to @-mention Hermes; `all` admits every peer-bot message.

  FEISHU_REQUIRE_MENTION=true|false     (default: true)
    Whether group messages must @-mention the bot. Override per-chat
    via `group_rules.<chat_id>.require_mention` in config.yaml.

Defaults preserve prior behavior. Self-echo protection is always on:
when the bot's identity is unresolved (auto-detection failed and
FEISHU_BOT_OPEN_ID unset), peer-bot messages are rejected fail-closed
to avoid feedback loops.

Admitted peer bots bypass the human-user allowlist
(FEISHU_ALLOWED_USERS) to match existing Discord behavior; humans
still need an explicit allowlist entry. yaml feishu.allow_bots is
bridged to the env var so the adapter and gateway auth layer share
one source of truth.

Resolving peer-bot display names requires the
application:bot.basic_info:read scope; without it, peers still route
but appear as their open_id.

Test: tests/gateway/test_feishu_bot_admission.py covers the admission
pipeline, group-policy bot-bypass, hydration, and event-dispatch
plumbing as a parametrized matrix.

Change-Id: I363cccb578c2a5c8b8bf0f0a890c01c89909e256
2026-04-30 20:30:31 -07:00
buray
fa9fd26acb fix(gateway): re-inject topic-bound skill after /new or /reset
reset_session() creates a fresh SessionEntry with created_at == updated_at,
but get_or_create_session() bumps updated_at on the next inbound message,
causing _is_new_session in _handle_message_with_agent to evaluate False.
The topic/channel skill auto-load gate (group_topics, channel_skill_bindings)
silently skips the first message after a manual reset.

Add an is_fresh_reset flag on SessionEntry, set by reset_session() and
consumed once by the message handler. Kept distinct from was_auto_reset
because that flag also drives a 'session expired due to inactivity'
user-facing notice and a context-note prepend — both wrong for an
explicit /new or /reset.

Persisted through to_dict/from_dict so the flag survives gateway
restart between /reset and the next message.

Fixes #6508

Co-authored-by: warabe1122 <45554392+warabe1122@users.noreply.github.com>
Co-authored-by: willy-scr <187001140+willy-scr@users.noreply.github.com>
2026-04-30 20:29:19 -07:00
Jezza Hehn
7abc9ce4df fix(gateway): read /status token totals from SessionDB (#17158)
/status was reading session_entry.total_tokens from the in-memory
SessionStore (gateway/session.py), which the agent never writes to —
so the token count was always 0.

The agent already persists token deltas to the SQLite SessionDB
(run_agent.py:11497) for every platform with a session_id. Route
/status through that single source of truth instead of duplicating
token writes into a second store.

Fix:
- gateway/run.py: _handle_status_command now calls
  self._session_db.get_session(session_id) and sums the five token
  component columns (input/output/cache_read/cache_write/reasoning).
  Falls back to 0 when no SessionDB is configured or no row exists.
- Two new regression tests covering the populated-row and
  missing-row paths.

Co-authored-by: Hermes <127238744+teknium1@users.noreply.github.com>
2026-04-30 20:28:50 -07:00
Yukipukii1
bdb7edd89e fix(gateway): isolate pending native image paths by session 2026-04-30 20:26:35 -07:00
Teknium
531ac20408 fix(state): JSON-encode multimodal message content for sqlite
sqlite3 can only bind str/bytes/int/float/None to query parameters.
Multimodal message content is a list of parts (text + image_url), which
raised 'Error binding parameter 3: type list is not supported' in
append_message and replace_messages.

In the CLI/TUI this surfaced as a visible crash when users pasted
screenshots. In the gateway it was silently swallowed by a bare except
in append_to_transcript, causing multimodal turns to be lost from the
session transcript.

Fix at the DB layer: _encode_content wraps lists/dicts as
'\\x00json:' + json.dumps(...) on write, _decode_content unwraps on
read. Plain strings are untouched, so existing FTS search, previews,
and JSONL compat are unaffected. Paired decode in get_messages,
get_messages_as_conversation, and search_messages context previews.

Regression test covers: list content round-trip, dict content
round-trip, string content stored unchanged, replace_messages with
multimodal content.

Also included: aligned fix #17522 for TUI image attachment with
paths containing spaces (see previous commit).
2026-04-30 20:25:52 -07:00
Harry Riddle
cc340c4a4d fix(tui): always call input.detect_drop for reliable image attachment
Remove frontend regex pre-check that truncated paths containing spaces,
quotes, or Windows drive letters. Backend _detect_file_drop correctly
handles these patterns. This fixes image attachment for common filenames
like "Screenshot 2026-04-29.png".

Add tests:
- test_input_detect_drop_path_with_spaces: attaches image with spaces in name
- test_input_detect_drop_path_with_spaces_and_remainder: remainder handling

Also restored missing  in test_rollback_restore_resolves_number_and_file_path.

Scope: tui, vision, tests
2026-04-30 20:25:52 -07:00
Yukipukii1
25cbe3e1d6 fix(gateway): preserve thread routing for /update progress and prompts 2026-04-30 20:19:23 -07:00
Teknium
0ddc8aba68 fix(fallback): let custom_providers shadow built-in aliases
When a user defines `custom_providers: [{name: kimi, ...}]` and references
`provider: kimi` from fallback_model or the main config, the built-in alias
rewriting (`kimi` → `kimi-coding`) was hijacking the request before the
named-custom lookup ran.  `_get_named_custom_provider` also refused to
return a match when the raw name resolved to any built-in (including aliases),
so the custom endpoint was unreachable.

Fix at both layers of the resolution chain so every caller benefits, not
just `_try_activate_fallback`:

- hermes_cli/runtime_provider.py: narrow `_get_named_custom_provider`'s
  built-in-wins guard to canonical provider names only.  An alias like
  `kimi` that resolves to a different canonical (`kimi-coding`) no longer
  blocks the custom lookup; a canonical name like `nous` still does.

- agent/auxiliary_client.py: in `resolve_provider_client`, try the named-
  custom lookup with the original (pre-alias-normalization) name before the
  alias-normalized one, so aliased requests reach the user's custom entry.
  Also honour `explicit_base_url` and `explicit_api_key` in the API-key
  provider branch so callers that pass explicit hints (e.g. fallback
  activation) can override the registered defaults.

Tests added for:
- custom `kimi` shadowing built-in alias (regression for #15743)
- custom `nous` NOT shadowing canonical built-in (behaviour preserved)
- bare `kimi` without any custom entry still routing to built-in
- explicit base_url/api_key override on the API-key provider branch

Original PR #17827 by @Feranmi10 identified the same bug class and
implemented a narrower fix in `_try_activate_fallback`; this reshapes the
fix to live in the shared resolution layer so all callers benefit.

Fixes #15743
Co-authored-by: Feranmi10 <89228157+Feranmi10@users.noreply.github.com>
2026-04-30 20:18:44 -07:00
Yukipukii1
38875d00a7 fix(gateway): ensure platform configs honor home_channel env overrides 2026-04-30 20:18:33 -07:00
Teknium
5089c55e0b refactor(state): compute last_active ordering at SQL level via recursive CTE
Follow-up to the previous commit. Replace the post-fetch Python re-sort (which
required dropping LIMIT/OFFSET from SQL and scanning every session row) with a
recursive CTE that walks compression-continuation chains and computes
effective_last_active per root at SQL level. The outer query can then ORDER BY
+ LIMIT efficiently, and the Python projection loop no longer has to handle
ordering.

This preserves the correctness win (old compression roots whose live tip was
touched recently surface correctly) without the O(N) scan, which matters for
users with thousands of sessions.

Adds a regression test pinning the compression-tip case at limit=1 — the
stress case that any bounded-oversample shortcut would get wrong.

Co-authored-by: simbam99 <simbamax99@gmail.com>
2026-04-30 20:17:15 -07:00
simbam99
142b4bf3ce fix(session_search): order recent mode by last activity instead of start time
- order session_search recent-mode results by last activity instead of session start time
- add an opt-in `order_by_last_active` path to `SessionDB.list_sessions_rich`
- add regression coverage for both the database ordering and recent-mode call path
2026-04-30 20:17:15 -07:00
Teknium
96691268df fix(gateway): drain manual profile gateways via SIGUSR1 before respawn
The PR wired in a detached watcher that respawns manual profile gateways
after they exit.  Pair that with a SIGUSR1 graceful drain (same path
systemd/launchd use) so in-flight agent runs finish instead of getting
SIGTERM'd.  Fall back to SIGTERM if SIGUSR1 isn't wired or the gateway
doesn't exit within the drain budget — the watcher sees the exit and
relaunches either way.

Tested end-to-end against an orphaned gateway: graceful drain exits in
0.5s and the watcher fires the relaunch command.
2026-04-30 20:00:31 -07:00
Michael Nguyen
77fe7ab6b2 feat(gateway): restart manual profile gateways after update 2026-04-30 20:00:31 -07:00
Teknium
8b7b074df9 test(context_compressor): regression test for PR #17025 tail-protection off-by-one
When len(messages) <= protect_tail_count and a token budget is set, the
previous formula min(protect_tail_count, len(result) - 1) under-protected
the tail by one, allowing the oldest message to be summarized.

The test fails on the buggy formula (pruned == 1) and passes on the fix
(pruned == 0, tool content preserved verbatim).
2026-04-30 20:00:01 -07:00
hharry11
2997ef9446 fix(api-server): use session-scoped task IDs for tool isolation 2026-04-30 19:59:38 -07:00
johnncenae
a83d579d5b fix(telegram): enforce gateway auth for inline approval callbacks 2026-04-30 19:59:31 -07:00
johnncenae
9ae1fa9e39 fix(delegate): honor runtime default model during provider resolution 2026-04-30 19:58:55 -07:00
Stephen Schoettler
b29b709a71 fix(agent): sanitize Codex tool-call history summaries 2026-04-30 19:58:46 -07:00
johnncenae
1ef9e88549 fix(gateway): write restart markers atomically and fix Windows lock collisions 2026-04-30 19:58:16 -07:00
teknium1
447a2bba3a fix(plugins): bound async plugin command await with 30s timeout
Follow-up to #17963. The threaded branch of resolve_plugin_command_result
previously called Event.wait() with no timeout — a hung async plugin
handler would wedge the terminal indefinitely. Cap the wait at 30s and
raise TimeoutError instead. Added a regression test covering the hung
handler path.
2026-04-30 19:56:18 -07:00
hharry11
ca9a61ae38 fix(plugins): await async handlers in CLI and TUI dispatch 2026-04-30 19:56:18 -07:00
johnncenae
79cffa9232 auth: coerce tls insecure flag safely instead of using Python truthiness 2026-04-30 19:55:48 -07:00
johnncenae
2bf73fbe2c fix(cli): coerce tls insecure flag safely in auth state 2026-04-30 19:55:48 -07:00
Yukipukii1
2110a3a0c4 fix(tui): return JSON-RPC errors for invalid request shapes 2026-04-30 19:47:00 -07:00
Yukipukii1
5f3f456784 fix(approval): wake blocked gateway approvals on session cleanup 2026-04-30 19:46:27 -07:00
Yukipukii1
75483b6db1 fix(curator): preserve last_report_path in state 2026-04-30 19:45:59 -07:00
Mind-Dragon
aab5bcc6ac test(model_switch): cover private user_providers override 2026-04-30 19:44:26 -07:00
briandevans
97d6f25008 test(toolsets): include kanban in expected post-#17805 toolset assertions
The kanban PR (#17805, c86842546) added the `kanban` toolset and
`tools/kanban_tools.py`, but didn't update three pre-existing test
assertions that bake the full toolset/tool inventory:

* `tests/tools/test_registry.py::test_matches_previous_manual_builtin_tool_set`
  hard-codes the manual list of builtin tool modules. `tools.kanban_tools`
  was missing.
* `tests/test_tui_gateway_server.py::test_load_enabled_toolsets_rejects_disabled_mcp_env`
  and `test_load_enabled_toolsets_falls_back_when_tui_env_invalid` both
  expect `["memory"]` from `_load_enabled_toolsets()`. With kanban now
  auto-recovered by `_get_platform_tools` (its tools live in hermes-cli's
  universe but are not in CONFIGURABLE_TOOLSETS), the resolver returns
  `["kanban", "memory"]`.
* `tests/hermes_cli/test_tools_config.py::test_get_platform_tools_preserves_explicit_empty_selection`
  asserts `set()` for an explicit empty list. The recovery loop now also
  surfaces `kanban`. Reframed to assert the contract the test name
  describes — no CONFIGURABLE toolset gets re-enabled when the user
  explicitly saved an empty list — which stays correct as more
  non-configurable platform toolsets are added.

Verified the failures reproduce on clean origin/main (180a7036b) with
`.[all,dev]`-equivalent extras (fastapi, starlette, httpx, pytest-asyncio)
and that all four pass with this commit applied. CI on main itself is
currently red on these tests; this restores green for everyone's PRs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 19:43:03 -07:00
Chris Danis
f61695ee73 fix(signal): skip contentless envelopes (profile key updates, empty messages)
Signal-cli sends dataMessage wrappers for profile key updates and other
metadata events that have no actual text content. These were reaching the
gateway as msg='' and triggering full agent turns for nothing.

Add early return in _handle_envelope() when both message field is empty/
missing/whitespace AND there are no attachments. Messages with media
attachments but no text still flow through.

- 12 lines added to gateway/platforms/signal.py
- 5 new tests in TestSignalContentlessEnvelope class
2026-04-30 19:42:59 -07:00
Teknium
e5dad4ac57
fix(agent): propagate ContextVars to concurrent tool worker threads (#18123)
Propagates ContextVars (notably `tools.approval._approval_session_key`) into concurrent tool worker threads via `copy_context().run` — mirrors `asyncio.to_thread` semantics.

Fixes approval-card cross-session misrouting in concurrent gateway traffic. Repro'd on Slack: session A's dangerous-command approval was delivered to channel B (@syahidfrd).

Salvages #16660 — core 4-LOC fix preserved, unrelated `tests/eval_018/` scope contamination dropped. Adds 5 regression guards including an AST-level source check on the real call site.

Closes #16660.

Co-authored-by: firefly <promptsiren@gmail.com>
Co-authored-by: banditburai <banditburai@users.noreply.github.com>
2026-04-30 16:26:26 -07:00
Teknium
bbbce92651 feat(tui): render self-improvement review summaries in the transcript
The Ink TUI (\`hermes --tui\` + dashboard \`/chat\`) had no wiring for the
background self-improvement review. When the review fired and patched
a skill or saved a memory entry, the change landed but the user had
no visual indication it happened — only the CLI had a print surface
for the '💾 Self-improvement review: …' line.

Changes:

- tui_gateway/server.py: in _init_session, attach
  agent.background_review_callback to an _emit('review.summary',
  sid, {text}) closure. Wrapped in try/except so agents with locked
  attribute slots don't break session startup.
- ui-tui/src/app/createGatewayEventHandler.ts: handle 'review.summary'
  by routing ev.payload.text through sys(…), matching the existing
  'background.complete' pattern. Empty / whitespace payloads are
  ignored so the transcript never gets a blank system line.
- ui-tui/src/gatewayTypes.ts: extend the GatewayEvent discriminated
  union with { type: 'review.summary', payload?: { text?: string } }.

Gateway platforms (Telegram, Discord, Slack, …) already route the
review summary via background_review_callback → post-delivery queue
in gateway/run.py, so they pick up the new 'Self-improvement review:'
prefix from the companion run_agent change with no platform edits.

Tests:
- tests/tui_gateway/test_review_summary_callback.py (Python, 2 tests):
  _init_session attaches a callback that emits the right event; the
  callback path survives agents that can't accept the attribute.
- ui-tui/src/__tests__/createGatewayEventHandler.test.ts (vitest, 2
  new cases): review.summary events feed sys(...) with the full text;
  empty / missing payloads are no-ops.
- TypeScript type-check passes.
- tui_gateway suite: 64/64 pass.
2026-04-30 14:07:22 -07:00
Teknium
80a676658c fix(cli): surface self-improvement review summaries from bg thread
When the self-improvement background review fires after a turn, it runs
in a bg thread and emits a '  💾 <summary>' line to announce what it
saved to memory or skills. Two problems made this invisible to users
even when the review successfully modified a skill:

1. The print went through `_cprint` (prompt_toolkit's print_formatted_text)
   on a bg thread while the CLI's PromptSession was live. Direct
   print_formatted_text races with the input-area redraw and the line
   can land behind/above the prompt, scrolled off without the user
   seeing it.

2. The message said only '💾 Skill created.' / '💾 Memory updated'
   with no indication that the self-improvement loop was the one doing
   this. Users who did catch the line couldn't tell the background
   review from some other agent action.

Fixes:

- `_cprint` now detects when it's called from a non-app thread with a
  running prompt_toolkit Application, and routes through
  `run_in_terminal` via `loop.call_soon_threadsafe`. That pauses the
  input, prints the line above the prompt, and redraws — the normal
  prompt_toolkit contract for bg-thread output. Direct-print fallback
  preserved for the no-app / same-thread / import-error paths. Affects
  every bg-thread emission, not just the review summary (curator
  summaries and auxiliary failure prints benefit too).

- The summary now reads '  💾 Self-improvement review: <summary>' in
  both the CLI and the gateway `background_review_callback` path, so
  the origin is unambiguous.

Tests:
- New `tests/cli/test_cprint_bg_thread.py` covers all five routing
  branches (no app, app-not-running, cross-thread schedule, same-thread
  direct, app-loop-attribute-error, import-error).
- New case in `tests/run_agent/test_background_review.py` asserts the
  attributed prefix shows up in both `_safe_print` and
  `background_review_callback`.

Live E2E: exercised _cprint from a bg thread inside a real Application
event loop; confirmed get_app_or_none() sees the app, call_soon_threadsafe
schedules run_in_terminal, and the inner _pt_print runs.
2026-04-30 14:07:22 -07:00
Teknium
c868425467
feat(kanban): durable multi-profile collaboration board (#17805)
Salvage of PR #16100 onto current main (after emozilla's #17514 fix
that unblocks plugin Pydantic body validation). History preserved on
the standing `feat/kanban-standing` branch; this squashes the 22
iterative commits into one clean landing.

What this lands:
- SQLite kernel (hermes_cli/kanban_db.py) — durable task board with
  tasks, task_links, task_runs, task_comments, task_events,
  kanban_notify_subs tables. WAL mode, atomic claim via CAS,
  tenant-namespaced, skills JSON array per task, max-runtime timeouts,
  worker heartbeats, idempotency keys, circuit breaker on repeated
  spawn failures, crash detection via /proc/<pid>/status, run history
  preserved across attempts.
- Dispatcher — runs inside the gateway by default
  (`kanban.dispatch_in_gateway: true`). Ticks every 60s, reclaims
  stale claims, promotes ready tasks, spawns `hermes -p <assignee>
  chat -q "work kanban task <id>"` with HERMES_KANBAN_TASK +
  HERMES_KANBAN_WORKSPACE env. Auto-loads `--skills kanban-worker`
  plus any per-task skills. Health telemetry warns on stuck ready
  queue.
- Structured tool surface (tools/kanban_tools.py) — 7 tools
  (kanban_show, kanban_complete, kanban_block, kanban_heartbeat,
  kanban_comment, kanban_create, kanban_link). Gated on
  HERMES_KANBAN_TASK via check_fn so zero schema footprint in normal
  sessions.
- System-prompt guidance (agent/prompt_builder.py KANBAN_GUIDANCE)
  injected only when kanban tools are active.
- Dashboard plugin (plugins/kanban/dashboard/) — Linear-style board
  UI: triage/todo/ready/running/blocked/done columns, drag-drop,
  inline create, task drawer with markdown, comments, run history,
  dependency editor, bulk ops, lanes-by-profile grouping, WS-driven
  live refresh. Matches active dashboard theme via CSS variables.
- CLI — `hermes kanban init|create|list|show|assign|link|unlink|
  claim|comment|complete|block|unblock|archive|tail|dispatch|context|
  init|gc|watch|stats|notify|log|heartbeat|runs|assignees` +
  `/kanban` slash in-session.
- Worker + orchestrator skills (skills/devops/kanban-worker +
  kanban-orchestrator) — pattern library for good summary/metadata
  shapes, retry diagnostics, block-reason examples, fan-out patterns.
- Per-task force-loaded skills — `--skill <name>` (repeatable),
  stored as JSON, threaded through to dispatcher argv as one
  `--skills X` pair per skill alongside the built-in kanban-worker.
  Dashboard + CLI + tool parity.
- Deprecation of standalone `hermes kanban daemon` — stub exits 2
  with migration guidance; `--force` escape hatch for headless hosts.
- Docs (website/docs/user-guide/features/kanban.md + kanban-tutorial.md)
  with 11 dashboard screenshots walking through four user stories
  (Solo Dev, Fleet Farming, Role Pipeline, Circuit Breaker).
- Tests (251 passing): kernel schema + migration + CAS atomicity,
  dispatcher logic, circuit breaker, crash detection, max-runtime
  timeouts, claim lifecycle, tenant isolation, idempotency keys, per-
  task skills round-trip + validation + dispatcher argv, tool surface
  (7 tools × round-trip + error paths), dashboard REST (CRUD + bulk
  + links + warnings), gateway-embedded dispatcher (config gate, env
  override, graceful shutdown), CLI deprecation stub, migration from
  legacy schemas.

Gateway integration:
- GatewayRunner._kanban_dispatcher_watcher — new asyncio background
  task, symmetric with _kanban_notifier_watcher. Runs dispatch_once
  via asyncio.to_thread so SQLite WAL never blocks the loop. Sleeps
  in 1s slices for snappy shutdown. Respects HERMES_KANBAN_DISPATCH_IN_GATEWAY=0
  env override for debugging.
- Config: new `kanban` section in DEFAULT_CONFIG with
  `dispatch_in_gateway: true` (default) + `dispatch_interval_seconds: 60`.
  Additive — no \_config_version bump needed.

Forward-compat:
- workflow_template_id / current_step_key columns on tasks (v1 writes
  NULL; v2 will use them for routing).
- task_runs holds claim machinery (claim_lock, claim_expires,
  worker_pid, last_heartbeat_at) so multi-attempt history is first-
  class from day one.

Closes #16102.

Co-authored-by: emozilla <emozilla@nousresearch.com>
2026-04-30 13:36:47 -07:00
Austin Pickett
6bc5d72271
Merge pull request #16419 from vincez-hms-coder/feat/dashboard-profiles-hms-coder
feat(dashboard): add profiles management page
2026-04-30 12:09:23 -07:00
ethernet
b737af8226
Merge pull request #18047 from stephenschoettler/fix/acp-persist-user-message-test-mocks
test(acp): accept prompt persistence kwargs in MCP E2E mocks
2026-04-30 14:43:26 -04:00
Teknium
76edc40ab0 fix(agent): extend thinking-mode reasoning_content pad to Kimi/Moonshot
Builds on #16855 (@lsdsjy) which fixed DeepSeek v4 reasoning_content
replay via model_extra fallback + capturing tool_calls at method entry.
Kimi / Moonshot thinking mode enforces the same echo-back contract and
hits the same 400 when a tool-call turn is persisted without
reasoning_content.

- _build_assistant_message: pad branch now uses _needs_thinking_reasoning_pad()
  (DeepSeek OR Kimi) instead of _needs_deepseek_tool_reasoning() alone.
- Extract _needs_thinking_reasoning_pad() and reuse it in
  _copy_reasoning_content_for_api so both sites share one predicate.
- tests/run_agent/test_deepseek_reasoning_content_echo.py: add
  TestBuildAssistantMessagePadsStrictProviders parametrized over DeepSeek
  (attr=None, attr-absent), Kimi (attr=None), Moonshot (via base_url),
  and an OpenRouter negative control that must NOT pad. Proven to fail
  2/5 cases on Kimi/Moonshot without this change.
- scripts/release.py: add AUTHOR_MAP entries for lsdsjy and season179.

Refs #17400.

Co-authored-by: season179 <season.saw@gmail.com>
2026-04-30 11:18:39 -07:00
lsdsjy
b9b9ee3e6c fix(deepseek): preserve v4 reasoning_content on replay 2026-04-30 11:18:39 -07:00
Stephen Schoettler
699a9c11a9 test(acp): accept prompt persistence kwargs in mocks 2026-04-30 10:47:23 -07:00
Teknium
d60a9917d3
feat(curator): show most-used and least-used skills in hermes curator status (#18033)
Alongside the existing 'least recently used' section, surface two more
rankings so users can see which of their agent-created skills actually
get exercised:

- 'most used (top 5)' — sorted by use_count descending. Hidden when every
  skill has use_count=0 (noise suppression on fresh installs).
- 'least used (top 5)' — sorted by use_count ascending. Always shown
  when the catalog is non-empty.

use_count started tracking real agent skill activation in PR #17932
(bump_use wired into skill_view tool + slash invocation + --skill
preload), so these rankings are now meaningful.

Tests: 3 new in tests/hermes_cli/test_curator_status.py — happy path
with mixed use_counts, zero-use suppression of the most-used section,
and the no-skills clean-empty case.
2026-04-30 10:37:33 -07:00
y0shualee
f4b76fa272 fix: use skill activity in curator status
Treat skill views and edits as activity when curator reports and applies lifecycle transitions, so recently loaded or patched skills are not displayed or transitioned as never used.\n\nAdds regression tests for activity derivation, automatic transitions, and CLI status output.
2026-04-30 10:31:47 -07:00
0xDevNinja
564a649e6a fix(curator): scan nested archive subdirs in restore_skill
restore_skill() in tools/skill_usage.py used archive_root.iterdir(), which
only walked the top level of .archive/. Skills archived under nested layouts
(e.g. .archive/openclaw-imports/<skill>/ from older archive paths or
external imports) were invisible to both the exact-match and prefix-match
candidate scans, surfacing as a misleading "skill '<name>' not found in
archive" error even though the directory existed on disk.

Switch both candidate scans to archive_root.rglob('*') so the lookup
descends into category subdirectories.

Fixes #17942
2026-04-30 10:31:44 -07:00
Teknium
8b290a5908
feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941)
* fix(curator): split 'archived' into consolidated vs pruned in run reports

Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).

Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
  (evidenced by a skill_manage write_file/patch/create/edit whose target
  is a different skill AND whose file_path/content references the
  removed skill's name)
- pruned: archived without consolidation evidence (truly stale)

REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives

run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat

Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.

Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.

E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.

* feat(curator): hybrid model-declared + heuristic classification

Extend the consolidated-vs-pruned split with LLM-authored intent:

1. Curator prompt now requires a structured YAML block at the end of the
   final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
   malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
   heuristic:
   - Model wins on rationale when its umbrella exists post-run
   - Model hallucination (umbrella doesn't exist) is downgraded to the
     heuristic's finding, or pruned if there's no evidence either
   - Heuristic catches model omission — consolidations the model
     enumerated tools for but forgot to list get surfaced with a
     '(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
   and flags audit-only rows so the user knows why no reason is shown.

Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.

Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
2026-04-30 10:31:23 -07:00
Henkey
cdf9793d6d fix(acp): advertise and forward image prompts 2026-04-30 10:31:16 -07:00
Brooklyn Nicholson
b9d9fa7df8 fix(tui): respect max turns config
Co-authored-by: YuShu <24110240104@m.fudan.edu.cn>
2026-04-30 12:26:57 -05:00
ethernet
d499d17271
Merge pull request #17969 from stephenschoettler/fix/current-main-test-regressions
fix(ci): stabilize current main test regressions
2026-04-30 13:23:38 -04:00
brooklyn!
285e9efb3f
Merge pull request #17701 from NousResearch/bb/mouse-mode-self-heal
fix(cli): recover leaked mouse tracking terminal state
2026-04-30 10:09:39 -07:00
Stephen Schoettler
407dfbb021 fix(ci): stabilize current main test regressions 2026-04-30 06:36:50 -07:00
Teknium
4c792865b4 test(gateway): pin cleanup invariants for #17758 in-band drain hand-off
Belt-and-suspenders on top of @briandevans' #17758 fix.  The in-band
drain hand-off (await->create_task + session-guard preservation)
changed cleanup semantics in three places that the original PR
reasoned about but didn't test directly.  Pin each invariant so a
future refactor can't silently regress them:

1. Normal single-message path still releases _active_sessions[sk] and
   _session_tasks[sk] through end-of-finally.  The #17758 follow-up
   moved _release_session_guard under
     if current_task is self._session_tasks.get(session_key)
   For the 99%-common case current_task IS the stored task, so the
   guard must still fire.  Test would fail if the conditional were
   ever tightened in a way that dropped the normal path.

2. Drain-task cancellation releases the session.  If the drain task
   spawned by the in-band hand-off is cancelled mid-handler (e.g.
   /stop fired while draining a follow-up), its own finally must
   fire _release_session_guard.  Without this a cancel would leave
   the session permanently pinned busy.

3. Late-arrival drain still spawns when no in-band drain preceded
   it.  Pre-existing path, but the #17758 follow-up added a
   re-queue branch that only fires when ownership was already
   handed off.  When no handoff happened the else branch must still
   spawn a fresh drain task — otherwise a message arriving during
   stop_typing gets silently dropped.

All three tests pass against current main.  Zero production code
changes.
2026-04-30 05:00:25 -07:00
konsisumer
d1d0ef6dbd fix(gateway): persist user message on transient agent failures (#7100)
The #1630 fix introduced a blanket ``agent_failed_early`` transcript skip
to prevent context-overflow sessions from looping.  That guard also
triggers for unrelated transient failures (429 rate limits, read
timeouts, connection resets, provider 5xx) which have nothing to do with
session size — and it silently drops the user's message, so the agent
has no memory of the last turn on retry.

Split the failure classification in ``GatewayRunner._run_agent``:

* Context-overflow (``compression_exhausted`` flag, explicit
  context-length phrases, or generic 400 with a long history) → keep
  the existing skip, preserving the #1630/#9893 fix.
* Anything else that failed → persist just the user message so the
  conversation survives a retry.

Use specific multi-word phrases (``context length``, ``token limit``,
``prompt is too long``, etc.) to match ``run_agent.py``'s own
classifier; bare ``exceed`` false-positively flagged "rate limit
exceeded" as context overflow.

Covered by new tests in ``tests/gateway/test_7100_transient_failure_transcript.py``
and the existing #1630 suite still passes.
2026-04-30 04:32:33 -07:00
Teknium
87f5e1a25a test(ssh): update tar pipe assertion for --no-overwrite-dir
Existing test_tar_pipe_commands asserted the literal substring
'tar xf - -C /' in ssh_str, which is no longer present after the
#17767 fix adds --no-overwrite-dir between 'tar xf -' and '-C /'.

Split the one substring check into three independent assertions for
the tar stdin mode, the new --no-overwrite-dir flag (regression guard
for #17767), and the extract target.
2026-04-30 04:32:28 -07:00
Teknium
b50bc13ef9 fix(config): preserve YAML lists in hermes config set (#17876)
_set_nested unconditionally replaced any non-dict value with an empty
dict when walking the dotted path, which silently destroyed list-typed
config nodes the moment someone set a value with a numeric index
(e.g. 'hermes config set custom_providers.0.api_key NEW'). Any sibling
entries and any fields inside the targeted entry that the user didn't
write were lost.

Fix:
- _set_nested now detects list nodes and navigates by numeric index,
  and preserves both dicts AND lists at intermediate positions (scalars
  are still replaced so bare-scalar -> nested overrides keep working).
- set_config_value drops its duplicated navigation logic and calls
  _set_nested instead -- single source of truth for the rules.

Regression tests (tests/hermes_cli/test_set_config_value.py):
- test_indexed_set_preserves_sibling_list_entries -- exact #17876 repro
- test_indexed_set_preserves_non_targeted_fields -- inner-dict fields survive
- test_deeper_nesting_through_list -- dict -> list -> dict -> scalar path

35/35 existing + new tests pass.

E2E-verified with the issue's repro against a real on-disk config.yaml --
list stays a list, entry 0 updated, entry 1 intact.

Closes #17876
2026-04-30 04:32:17 -07:00
Teknium
3fc4c63d38 test(model_switch): update regression to reflect bare-custom guard 2026-04-30 04:32:11 -07:00
Sanjays2402
e0fa2cf972 fix(tools): isolate get_tool_definitions quiet_mode cache + dedup LCM injection (#17335)
Long-lived Gateway processes were sending duplicate tool names to
providers that enforce uniqueness:

  - DeepSeek:        'Tool names must be unique.'
  - Xiaomi MiMo:     'tools contains duplicate names: lcm_expand'
  - Moonshot/Kimi:   'function name lcm_grep is duplicated'

TUI was unaffected because TUI runs with quiet_mode=False and skips the
cache entirely.

Root cause (two layered bugs)
- model_tools.get_tool_definitions(quiet_mode=True) memoizes its result
  in _tool_defs_cache. The cache-hit path returned list(cached) (safe),
  but the FIRST uncached call stored and returned the SAME object.
  run_agent.py mutates self.tools (memory + LCM context-engine schemas)
  in-place, so the very first agent init in a Gateway process
  poisoned the cache, and every subsequent init appended LCM schemas
  again on top of the already-polluted list.
- run_agent.py's context-engine injection (lcm_grep / lcm_describe /
  lcm_expand) had no dedup, unlike the memory-tools injection right
  above it which already skips already-present names.

Fix (defense in depth, per the issue's suggested fix)
- model_tools.get_tool_definitions: on the uncached branch, cache the
  computed list but return list(result) to the caller. Same pattern as
  the cache-hit path.
- run_agent.py: build _existing_tool_names from self.tools and skip
  schemas whose names are already present, mirroring the memory-tools
  block. This also defends against plugin paths that may register the
  same schemas via ctx.register_tool().

Tests (tests/test_get_tool_definitions_cache_isolation.py)
- test_first_uncached_call_returns_fresh_list \u2014 pins the fix; without
  it, first-call alias caused all the symptoms.
- test_cache_hit_returns_fresh_list \u2014 pre-existing behavior stays.
- test_caller_mutation_does_not_poison_cache \u2014 simulates run_agent
  appending lcm_grep / lcm_expand to the returned list and asserts the
  next call doesn't see them.
- test_repeated_caller_mutation_does_not_accumulate \u2014 reproduces the
  long-lived Gateway accumulation pattern across 5 agent inits.
- test_non_quiet_mode_does_not_use_cache \u2014 sanity, explains why TUI
  was fine.

5/5 pass on the new file; 23/23 still pass on tests/test_model_tools.py.
2026-04-30 04:32:06 -07:00
Rob Moen
0dd373ec43 fix(context): honor model.context_length for Ollama num_ctx and all display paths
When a user sets model.context_length in config.yaml, the value was only
used for Hermes' internal compression decisions (context_compressor) but
NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF
metadata (often 256K+) and allocates that much VRAM regardless of the
user's config — causing OOM on smaller GPUs like the P100 (16GB).

Root cause: two separate context values existed independently:
  - context_compressor.context_length = config value (e.g. 65536) ✓
  - _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config

Changes:

1. Cap Ollama num_ctx to config context_length (run_agent.py)
   When model.context_length is explicitly set and no explicit
   ollama_num_ctx override exists, cap the auto-detected GGUF value
   to the user's context_length. This is the core fix — it prevents
   Ollama from allocating more VRAM than the user budgeted.

2. Pass config_context_length through all secondary call sites
   Several paths called get_model_context_length() without the config
   override, falling through to the 256K default fallback:
   - cli.py: @-reference expansion and /model switch display
   - gateway/run.py: @-reference expansion and /model switch display
   - tui_gateway/server.py: @-reference expansion
   - hermes_cli/model_switch.py: resolve_display_context_length()

3. Normalize root-level context_length in config (hermes_cli/config.py)
   _normalize_root_model_keys() now migrates root-level context_length
   into the model section, matching existing behavior for provider and
   base_url. Users who wrote `context_length: 65536` at the YAML root
   instead of under `model:` had it silently ignored.

4. Fix misleading comments (agent/model_metadata.py)
   DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K
   as two comments stated.

Tests: 3 new tests for root-level context_length normalization.
All existing context_length tests pass (96 tests).
2026-04-30 04:31:23 -07:00
Bartok9
fbb3775770 fix(gateway): enforce auth check in busy-session path to prevent unauthorized injection (#17775)
The busy-session handler (_handle_active_session_busy_message) bypassed the
authorization gate that the cold path enforces via _is_user_authorized(). In
shared-thread contexts (Slack threads, Telegram forum topics, Discord threads)
where thread_sessions_per_user=False (the default), all participants share one
session_key. An unauthorized user posting in the same thread as an authorized
user would hit the active-session branch, skip the auth check, and have their
text merged into _pending_messages or injected via agent.interrupt().

This commit adds the same _is_user_authorized() check at the top of the busy
handler, before any message queuing, steering, or interrupt logic. Unauthorized
messages are silently dropped (return True) with a warning log — matching the
cold-path behavior.

Affected platforms: Slack, Telegram, Discord, any adapter with shared-session
thread contexts.

Closes #17775
2026-04-30 04:29:15 -07:00
briandevans
cc5b9fb581 fix(transport): omit thinking_config for Gemma on the gemini provider (#17426)
The `gemini` provider also serves Gemma (e.g. `gemma-4-31b-it`) and
historically other Google models like PaLM. Those reject
`extra_body.thinking_config` with HTTP 400:

    Unknown name "thinking_config": Cannot find field

`_build_gemini_thinking_config()` was unconditionally producing a
config dict for any model on the `gemini` / `google-gemini-cli`
provider, which `ChatCompletionsTransport.build_kwargs` then dropped
into `extra_body["thinking_config"]`. The result: every chat turn for
Gemma users on the gemini provider blew up at the API edge.

The fix is the same shape Hermes already uses for the Gemini-2.5 vs
Gemini-3 family clamping: normalise the model id, strip an
`OpenRouter`-style `google/` prefix, and short-circuit early when the
result doesn't start with `gemini`. We return `None` rather than
`{"includeThoughts": False}`, because the API rejects the field name
itself — even the polite "off" form trips the same 400.

Three regression tests cover Gemma with reasoning enabled, Gemma with
reasoning disabled, and the `google/gemma-…` OpenRouter-style id; the
existing Gemini-2.5 / Gemini-3 / `google/gemini-…` cases keep passing
because the Gemini guard fires after the prefix strip.

Fixes #17426

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 04:29:04 -07:00
Teknium
3de8e21683 feat(gateway): native send_multiple_images for Telegram, Discord, Slack, Mattermost, Email
Ports PR #17888's send_multiple_images ABC to every gateway platform that
has a native multi-attachment API, so images arrive as a single bundled
message instead of N separate ones.

Native overrides:
- Telegram: send_media_group (10 photos per album, chunks over); animated
  GIFs peeled off and routed through send_animation (albums don't support
  animations)
- Discord: channel.send(files=[...]) (10 attachments per message, chunks
  over); URL images downloaded into BytesIO so they render inline; forum
  channels use create_thread with files=[...]
- Slack: files_upload_v2(file_uploads=[...]) (10 per call, chunks over);
  respects thread_ts; records thread participation
- Mattermost: single post with file_ids list (5 per post — Mattermost cap,
  chunks over)
- Email: single SMTP message with multiple MIME attachments (no chunk cap,
  SMTP size governs); remote URLs remain linked in body (parity with
  existing send_image)

All platforms fall back to the base per-image loop on any failure, so a
single bad image in a batch never loses the rest.

Matrix, WhatsApp, and single-attachment platforms (BlueBubbles, Feishu,
WeCom, WeChat, DingTalk) continue to use the base default loop — their
server APIs only accept one attachment per message anyway.

Tests: adds tests/gateway/test_send_multiple_images.py with 19 targeted
tests covering base default loop, chunking, animation peel-off, fallback
paths, and empty-batch no-ops across all five new overrides.

Co-authored-by: Maxence Groine <maxence@groine.fr>
2026-04-30 04:28:08 -07:00
Maxence Groine
04ea895ffb feat(gateway/signal): add support for multiple images sending
Adds a new `send_multiple_images` method to the ``BasePlatformAdapter``
that implements the default "One image per message" loop and allows for
platform-specific overriding.

Implements such an override for the Signal adapter, batching images
and trying (best-effort) to work around rate-limits for voluminous
batches using a specific scheduler.

Also implements batching + rate-limit handling in the `send_message`
tool.

New tests added for the Signal adapter, its rate-limit scheduler and the
`send_message` tool
2026-04-30 04:28:08 -07:00
VinceZ-Hms-Coder
ca7f46beb5 Merge upstream/main and address Copilot review feedback
Merge resolved conflicts in web/src/{i18n/{en,zh,types}.ts,lib/api.ts}
by keeping both this branch's `profiles` additions and upstream's new
`models` page additions.

Copilot review feedback:
- Implement POST /api/profiles/{name}/open-terminal endpoint (already
  present); align Windows branch to `cmd.exe /c start "" <cmd>` so it
  matches the new test and spawns a fresh window instead of /k reusing
  the parent console.
- Move backslash escaping out of the macOS AppleScript f-string
  expression (Python <3.12 disallows backslashes inside f-string
  expression parts).
- Patch `_get_wrapper_dir` via monkeypatch in
  test_profiles_create_creates_wrapper_alias_when_safe so the test no
  longer writes to the real `~/.local/bin`.
- Extend test_dashboard_browser_safe_imports to scan `.ts` files in
  addition to `.tsx`.
- Switch upstream's new ModelsPage.tsx away from the `@nous-research/ui`
  root barrel onto per-component subpaths to satisfy the stricter scan.
- Fix NouiTypography `leading-1.4` -> `leading-[1.4]` so Tailwind
  actually emits the line-height for the `sm` variant.
- Guard ProfilesPage.openSoulEditor against out-of-order responses by
  tracking the latest requested profile via a ref.
- Replace ProfilesPage's hand-rolled setup command with a fetch to
  `/api/profiles/{name}/setup-command` so the copied command always
  matches what the backend would actually run (handles wrapper-alias
  collisions and reserved names correctly).
- Wire SOUL.md textarea label `htmlFor` -> textarea `id` so screen
  readers and clicking the label work as expected.
2026-04-30 06:43:22 -04:00
Heltman
19f9be1dff fix(tools): serialize concurrent hermes_tools RPC calls from execute_code
The sandbox-side `_call()` in both the UDS and file-based transports was
not thread-safe, so scripts that call tools from multiple threads (e.g.
`ThreadPoolExecutor` over `terminal()`) inside a single `execute_code`
run could silently receive each other's responses.

Root cause:

* UDS transport — a single module-level `_sock` was shared across all
  threads; the newline-framed protocol has no request-id; and the
  server-side RPC loop handles one connection serially. With concurrent
  callers, each thread would `sendall()` then race to `recv()` the next
  newline-terminated response from the shared buffer, so responses got
  delivered to the wrong caller.

* File transport — `_seq += 1` is a non-atomic read-modify-write, so
  two threads could allocate the same sequence number and clobber each
  other's request/response files.

Fix: guard `_call()` with a `threading.Lock` in the UDS case (covering
send+recv), and guard `_seq` allocation with a lock in the file case.
No protocol change.

Regression tests cover both the generated-source level (lock is present
and used) and an end-to-end concurrency test: running a sandboxed
ThreadPoolExecutor of 10 `terminal()` calls against a slow mock
dispatcher, asserting every caller sees its own tagged response. The
test fails without the fix (10/10 mismatched, matching real-world
repro) and passes with it.
2026-04-30 03:31:16 -07:00
Rylen Anil
3858f9419e fix: handle gateway Ctrl+C shutdown cleanly 2026-04-30 03:29:57 -07:00
Sebastian B
362996e269 fix(runtime_provider): _get_named_custom_provider must honour transport field on v12+ providers dict
The v11→v12 migrate_config step writes the API mode for every entry
under the new transport: field (per the v12+ schema in
_normalize_custom_provider_entry).  _get_named_custom_provider
read the legacy api_mode: spelling only, so for every migrated
config the lookup returned None for the api mode.

Downstream, _resolve_named_custom_runtime then falls back through
custom_provider.get("api_mode") or _detect_api_mode_for_url(base_url)
or "chat_completions".  For loopback URLs (proxies, local servers)
or unknown hostnames, the URL detector returns None and the resolver
silently downgrades the configured codex_responses /
anthropic_messages transport to chat_completions.  Requests
get sent to /v1/chat/completions instead of /v1/responses or
/v1/messages and the provider 404s — or worse, returns a usable
chat_completions response while skipping the model's reasoning /
caching surface.

Fix: read both field names — entry.get("api_mode") or
entry.get("transport") — at the two match-by-key + match-by-name
branches in _get_named_custom_provider.  The runtime normaliser
_normalize_custom_provider_entry already accepts both spellings;
this lifts the same compat into the direct-dict reader so v12+
configs work without going through the shim.

Adds three regression tests under
tests/hermes_cli/test_user_providers_model_switch.py:
- transport field is read on the match-by-key branch
- legacy api_mode spelling still works for hand-edited configs
- transport is read on the match-by-display-name branch
2026-04-30 03:29:48 -07:00
briandevans
f54935738c fix(cron): surface agent run_conversation failure flags as job failure
run_job() ignored the result's `failed=True` / `completed=False` flags
that agent.run_conversation populates on API exhaustion, mid-run
interrupts, and model aborts. Because final_response on those paths is
often a non-empty error string ("API call failed after 3 retries:
Request timed out."), the existing empty-response soft-fail in
_process_job did not trip either: the error text was delivered as if it
were the agent's reply and last_status was set to "ok" with no error
notification. Detect those flags right after the dict-shape guard and
raise so the existing except handler builds the proper failure tuple,
preserving the agent's error message via result["error"].

Adds a parametrized regression covering: API-retry-exhausted with error
text in final_response, completed=False with no final_response,
completed=False without an explicit failed flag, and the partial-reply
plus failed=True case. Plus a guard that a normal completed=True success
result is still treated as success.

Fixes #17855

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 03:27:37 -07:00
briandevans
f44f1f9615 fix(gateway): preserve session guard across in-band drain handoff
When the in-band pending-message drain spawns a fresh task and
transfers ownership via _session_tasks[session_key] = drain_task,
the original task still unwinds through the finally block.  The
drain task picks up the same interrupt_event in its own
_process_message_background entry, so an unconditional
_release_session_guard(session_key, guard=interrupt_event) at the
end of the finally matches and deletes _active_sessions[session_key]
while the drain task is still pending its first await.

A concurrent inbound message arriving in that handoff window passes
the Level-1 guard (no entry exists) and spawns a second
_process_message_background for the same session — two agents on
one session_key, duplicate responses, duplicate tool calls.

Fix: only call _release_session_guard when the current task still
owns _session_tasks[session_key].  When ownership has been
transferred to a drain task, leave _active_sessions populated; the
drain task's own lifecycle releases it.  This mirrors the
late-arrival drain path in the same finally block, which already
leaves both entries alone after handing off.

Also reorder stdlib imports in the new regression test file to
match the gateway test convention (stdlib before third-party).

Regression test: capture _active_sessions[sk] identity at every
handler entry across a 2-step in-band drain chain and assert the
guard Event identity stays the same.  Pre-fix, the original task's
finally deletes the entry, the drain task falls through to the
`or asyncio.Event()` branch, and a fresh Event is installed —
identity diverges.  Post-fix, the entry is preserved and the drain
task reuses the original Event.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 03:27:08 -07:00
briandevans
663ba9a58f fix(gateway): drain pending messages via fresh task, not recursion (#17758)
`_process_message_background` finished a turn, found a queued
follow-up, and drained it via `await
self._process_message_background(pending_event, session_key)`.  Each
chained follow-up added a frame to the call stack instead of starting
fresh.  Under sustained pending-queue activity (e.g. a user sending
follow-ups faster than the agent finishes turns) the C stack would
exhaust at ~2000 nested frames and SIGSEGV the process.

Mirror the late-arrival drain pattern that already exists in the same
function: spawn a new `asyncio.create_task(...)` for the pending event
and return so the current frame can unwind.  The new task takes
ownership via `_session_tasks[session_key]`.

The late-arrival drain in `finally` could now race with the in-band
drain across the `await typing_task` / `await stop_typing` window, so
add a guard: if `_session_tasks[session_key]` is no longer the current
task, an in-band drain already spawned a follow-up task — re-queue the
late-arrival event so that task picks it up after its current event,
instead of spawning a second concurrent task for the same session_key.

Regression test (`test_pending_drain_no_recursion.py`) chains 12
follow-ups and asserts the recorded
`_process_message_background` stack depth stays bounded at handler
entry.  Pre-fix: depths grow linearly `[1,2,3,…,12]`.  Post-fix: all
depths are `1`.

`test_duplicate_reply_suppression::test_stale_response_suppressed_when_interrupted`
called `_process_message_background` directly and implicitly relied on
the old recursive `await` semantic — updated to wait for the spawned
drain task before checking the sent list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 03:27:08 -07:00
Teknium
8d302e37a8
feat(tts): add Piper as a native local TTS provider (closes #8508) (#17885)
Piper (OHF-Voice/piper1-gpl) is a fast, local neural TTS engine from the
Home Assistant project that supports 44 languages with zero API keys.
Adds it as a native built-in provider alongside edge/neutts/kittentts,
installable via 'hermes tools' with one keystroke.

What ships:

- New 'piper' built-in provider in tools/tts_tool.py
  - Lazy import via _import_piper()
  - Module-level voice cache keyed on (model_path, use_cuda) so switching
    voices doesn't invalidate older cached voices
  - _resolve_piper_voice_path() accepts either an absolute .onnx path or a
    voice name (auto-downloaded on first use via 'python -m
    piper.download_voices --download-dir <cache>')
  - Voice cache at ~/.hermes/cache/piper-voices/ (profile-aware via
    get_hermes_dir)
  - Optional SynthesisConfig knobs: length_scale, noise_scale,
    noise_w_scale, volume, normalize_audio, use_cuda — passed through
    only when configured, so older piper-tts versions aren't broken
  - WAV output then ffmpeg conversion path (same as neutts/kittentts) so
    Telegram voice bubbles work when ffmpeg is present
  - Piper added to BUILTIN_TTS_PROVIDERS so a user's
    tts.providers.piper.command cannot shadow the native provider
    (regression test included)

- 'hermes tools' wizard entry
  - Piper appears under Voice and TTS as local free, with
    'pip install piper-tts' auto-install via post_setup handler
  - Prints voice-catalog URL and default-voice info after install

- config.yaml defaults
  - tts.piper.voice defaults to en_US-lessac-medium
  - Commented advanced knobs for discoverability

- Docs
  - New 'Piper (local, 44 languages)' section in features/tts.md
    explaining install path, voice switching, pre-downloaded voices,
    and advanced knobs
  - Piper listed in the ten-provider table and ffmpeg table
  - Custom-command-providers section updated to drop the Piper example
    (now native) and add a piper-custom example for users with their own
    trained .onnx models
  - overview.md bumps provider count to ten

- Tests (tests/tools/test_tts_piper.py, 16 tests)
  - Registration (BUILTIN_TTS_PROVIDERS, PROVIDER_MAX_TEXT_LENGTH)
  - _resolve_piper_voice_path across every branch: direct .onnx path,
    cached voice name, fresh download with correct CLI args, download
    failure, successful-exit-but-missing-files, empty voice to default
  - _generate_piper_tts: loads voice once, reuses cache, voice-name
    download wiring, advanced knobs flow through SynthesisConfig
  - text_to_speech_tool end-to-end dispatch and missing-package error
  - check_tts_requirements: piper availability toggles the return value
  - Regression guard: piper cannot be shadowed by a command provider
    with the same name
  - Pre-existing test_tts_mistral test broadened to mock the new
    piper/kittentts/command-provider checks (otherwise it false-passes
    when piper is installed in the test venv)

E2E verification (live):

Actual pip install piper-tts, config piper + en_US-lessac-low,
text_to_speech_tool call, voice auto-downloaded from HuggingFace,
WAV synthesized, ffmpeg-converted to Ogg/Opus. Second call hits the
cache (~60ms). Cache dir populated with .onnx and .onnx.json.

This caught a real bug during development: the first pass used '-d' as
the download-dir flag; the actual piper.download_voices CLI wants
'--download-dir'. Fixed before PR opened.
2026-04-30 02:53:20 -07:00
Teknium
2662bfb756
fix(tests): make test_update_stale_dashboard immune to hermes_cli.main reload (#17881)
Six tests in this file failed in CI (-n auto) after #17832 landed because
other tests on the same xdist worker reload hermes_cli.main:

  tests/hermes_cli/test_env_loader.py:85-86
    sys.modules.pop('hermes_cli.main', None)
    importlib.import_module('hermes_cli.main')

  tests/hermes_cli/test_skills_subparser.py:24-25
    del sys.modules['hermes_cli.main']

When either ran first on a worker, our top-of-file
'from hermes_cli.main import _kill_stale_dashboard_processes' captured a
stale function object whose __globals__ points at the old module dict.
patch('hermes_cli.main._find_stale_dashboard_pids', ...) then patched the
new module, but the stale function resolved the dependency via its stale
__globals__, so every patch became a no-op: pids=[] → early return → no
signals, no output, assertions failed.

Fix: add an autouse fixture that rebinds the three module-level names to
whatever is currently live in sys.modules['hermes_cli.main'] before each
test runs. The pollutants in the other two files are load-bearing for
their own tests, so fixing it on the consumer side is correct.

Repro: pytest tests/hermes_cli/test_env_loader.py tests/hermes_cli/test_update_stale_dashboard.py
2026-04-30 02:46:56 -07:00
Teknium
0da968e521
fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868)
Voscko reported curator.auxiliary.provider/model was advertised in the
docs but ignored — the review fork read only model.provider/default. The
narrow fix would wire the one-off key through, but that leaves curator
as a parallel system: not in `hermes model` → auxiliary picker, not in
the dashboard Models tab, missing per-task base_url/api_key/timeout/
extra_body.

Unify curator with the rest of the aux task system so `hermes model`
and the dashboard configure it like every other aux task.

Four sources of truth updated:
- hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary
  (timeout=600 since reviews run long), drop the one-off curator.auxiliary
  block from DEFAULT_CONFIG.curator.
- hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass')
  to _AUX_TASKS so the CLI picker offers it.
- hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the
  dashboard REST endpoint accepts it.
- web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard
  Models tab renders the task.

agent/curator.py _resolve_review_model() now reads auxiliary.curator
first (canonical), falls back to legacy curator.auxiliary (with an info
log asking users to migrate), then falls back to the main chat model.
Pre-unification users keep working.

Docs updated: docs/user-guide/features/curator.md now points at
`hermes model` → auxiliary → Curator and the dashboard Models tab.

Tests: 6 unit tests on _resolve_review_model (auto default, canonical
slot honored, partial override fallback, legacy fallback with
deprecation log assertion, new-wins-over-legacy, empty-config safety)
plus a cross-registry test that curator is wired into all four sources
of truth. test_aux_tasks_keys_all_exist_in_default_config already
covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant.

Reported by Voscko on Discord.
2026-04-30 02:46:01 -07:00
teknium1
658947480a fix(acp): drop dead message_id kwarg from replay chunks
UserMessageChunk and AgentMessageChunk do not have a message_id field
in the ACP schema. Passing it silently dropped the kwarg (pydantic
does not raise on unknown init kwargs here) and the subsequent test
assertions on .message_id raised AttributeError. Strip the dead
plumbing (uuid import, message_id= kwarg on both chunk types, unused
session_id/index parameters) and remove the matching .message_id
asserts from the test.
2026-04-30 02:45:54 -07:00
Henkey
d2536a72bf fix(acp): replay session history on load 2026-04-30 02:45:54 -07:00
teknium1
5d253e65b7 fix(openviking): pre-check fs/stat to route file URIs before hitting directory-only endpoints
Adds a deterministic pre-check on top of htsh's exception-based fallback:
before calling /content/abstract or /content/overview on a non-pseudo URI,
probe /api/v1/fs/stat. If the server says the URI is a file, route straight
to /content/read instead of eating a failing 500 round-trip.

This is the same idea pty819 and chennest independently landed in PRs
#12757 and #12937 — merged here on top of htsh's broader fix so we keep
pseudo-URI normalization and v0.3.3 browse-shape handling while avoiding
the slow exception path on servers that return a raised 500 every time.

The exception fallback from #5886 stays in place for environments where
fs/stat is unavailable or returns an unfamiliar shape.

Also credits pty819, chennest, and htsh in AUTHOR_MAP so future release
notes attribute them correctly.
2026-04-30 02:35:29 -07:00
hitesh
10e43edc09 fix(openviking): fallback summary reads to content/read for file URIs
OpenViking returns 500 for /content/abstract and /content/overview when URI points to mem_*.md files.
Add resilient fallback to /content/read for non-pseudo summary file URIs while preserving pseudo summary normalization.
Also add regression tests for fallback behavior.
2026-04-30 02:35:29 -07:00
hitesh
bff8ab0311 test(openviking): add helper regression coverage 2026-04-30 02:35:29 -07:00
Teknium
0ad4f55aa8
feat(dashboard): add --stop and --status flags (#17840)
`hermes dashboard` is a long-lived foreground server that users often
start and forget about, sometimes in a shell they've since closed.  We
didn't have a way to stop it — users had to find the PID manually.

Adds two lifecycle flags that reuse the same detection + termination
path the post-`hermes update` cleanup (PR #17832) uses:

  hermes dashboard --status
    List running hermes dashboard processes with PID + cmdline.
    Exit 0, informational.

  hermes dashboard --stop
    Terminate all running dashboards (3s grace then force-kill survivors).
    Exit 0 if none remain, 1 if any couldn't be stopped.
    Windows uses `taskkill /F` as before.

Both flags short-circuit before any fastapi/uvicorn import so they work
even on installations where the dashboard extras aren't installed —
useful when you're cleaning up after uninstalling.

The kill helper gained an optional `reason=...` param so the output
reads "(requested via --stop)" instead of the post-update-specific
"running backend no longer matches the updated frontend" wording.

E2E: `hermes dashboard --status` with nothing running prints the
empty message; with a fake `hermes dashboard ...` cmdline spawned via
`exec -a`, `--status` lists it, `--stop` terminates it (exit -15),
and a follow-up `--status` returns empty.
2026-04-30 02:30:20 -07:00
Teknium
2facea7f71
feat(tts): add command-type provider registry under tts.providers.<name> (#17843)
Reshape of PR #17211 (@versun). Lets users wire any local or external
TTS CLI into Hermes without adding engine-specific Python code. Users
declare any number of named providers in config.yaml and switch between
them with tts.provider: <name>, alongside the built-ins (edge, openai,
elevenlabs, …).

Config shape:

  tts:
    provider: piper-en
    providers:
      piper-en:
        type: command
        command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
        output_format: wav

Placeholders: {input_path}, {text_path}, {output_path}, {format},
{voice}, {model}, {speed}. Use {{ / }} for literal braces.

Key behavior:
- Built-in provider names always win — a tts.providers.openai entry
  cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double
  context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true
  opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to
  the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override
  under tts.providers.<name>.max_text_length.

Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover
provider resolution, shell-quote context, placeholder rendering with
injection payloads, timeout, non-zero exit, empty output, voice_compatible
opt-in, and end-to-end dispatch through text_to_speech_tool. All 88
pre-existing TTS tests still pass.

Docs: new "Custom command providers" section in
website/docs/user-guide/features/tts.md with three worked examples
(Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys,
behavior notes, and security caveat.

E2E-verified live: isolated HERMES_HOME, command provider declared in
config.yaml, text_to_speech_tool dispatches through the registered
shell command and the output file is produced as expected.

Co-authored-by: Versun <me+github7604@versun.org>
2026-04-30 02:29:08 -07:00
Teknium
5b85a7d351
fix(update): kill stale dashboard processes instead of warning (#17832)
`hermes update` previously just printed a warning when it detected a
running `hermes dashboard` process from the previous version, telling
the user to kill and restart it themselves.  In practice dashboards get
started and forgotten, so the warning was routinely ignored and users
ended up with a silent frontend/backend mismatch (new JS bundle served
against the old in-memory Python backend, e.g. new auth headers the old
code doesn't recognise → every API call 401s).

The dashboard has no service manager, no PID file, and we don't record
the original launch args (--host, --port, --insecure, --tui, --no-open)
so we can't auto-restart it.  But we CAN stop it, which is what the
user wants — the failure mode when the stale process is left alive is
worse than the dashboard just being down.

- POSIX: SIGTERM, poll for ~3s, SIGKILL any survivors.
- Windows: `taskkill /PID <pid> /F`.
- Print each PID's outcome plus a one-line restart hint.
- Detection logic is unchanged (same ps / wmic scan, same guards
  against the `pgrep -f` greedy-match trap from #16872 and the
  #17049 wmic UnicodeDecodeError fix).

Also split the old monolithic `_warn_stale_dashboard_processes` into
`_find_stale_dashboard_pids` (scan) + `_kill_stale_dashboard_processes`
(kill), keeping the old name as an alias so any external callers still
work.

E2E verified: spawned a fake `hermes dashboard` cmdline via
`exec -a 'hermes dashboard …' sleep 300`, ran
`_kill_stale_dashboard_processes()`, confirmed SIGTERM exit (-15)
and that a post-scan returns an empty PID list.
2026-04-30 01:34:34 -07:00
Teknium
fd0796947f
fix: stabilize CI — TS widen, sys.modules restore, WS subscriber race (#17836)
Three narrow fixes targeting the remaining red checks after #17828:

1. ui-tui/src/app/slash/commands/ops.ts (Docker Build):
   /reload-mcp's local params type annotated session_id: string
   while ctx.sid is string | null. Widen to string | null —
   matches every other rpc call site and the test harness which passes
   { session_id: null }. Fixes TS2322 on line 86. The rpc signature
   itself is Record<string, unknown>, so this is purely a local
   typing fix, no behavioral change.

2. tests/plugins/test_achievements_plugin.py (13 cascading test failures):
   _install_fake_session_db did a raw sys.modules['hermes_state'] =
   fake_module without restoration, leaking the fake across xdist
   worker boundaries. Downstream tests doing from hermes_state import
   SessionDB got a module whose SessionDB was lambda: fake_db
   — 6 test_hermes_state.py tests failed with AttributeError: 'function'
   object has no attribute '_sanitize_fts5_query' / _contains_cjk,
   and 7 test_860_dedup.py tests failed with TypeError: got unexpected
   keyword argument 'db_path' (real code calls SessionDB(db_path=...)).

   Fix: stash monkeypatch on the plugin_api module object in the
   fixture, and have the helper do monkeypatch.setitem(sys.modules,
   'hermes_state', fake_module) for auto-restoration at test teardown.

3. tests/hermes_cli/test_web_server.py (WS race):
   TestPtyWebSocket::test_pub_broadcasts_to_events_subscribers hit the
   30s test timeout on CI. websocket_connect returns after
   ws.accept() — but /api/events registers the subscriber in
   _event_channels on the NEXT await (inside _event_lock). A
   publish immediately after connect could race ahead of registration
   and be dropped, and the subsequent receive_text() blocked until
   SIGALRM killed the test. Fix: poll _event_channels after the
   subscriber connects, before publishing.

Validation:
scripts/run_tests.sh tests/plugins/test_achievements_plugin.py
                     tests/run_agent/test_860_dedup.py
                     tests/test_hermes_state.py
                     tests/hermes_cli/test_web_server.py    338 passed
cd ui-tui && npm run type-check                             clean
cd ui-tui && npm run build                                  clean

Remaining red checks are pure infra (Nix ubuntu hits
TwirpErrorResponse ResourceExhausted on the GH Actions cache API; Nix
macos bounces between npm build openssl-legacy and cache rate-limits)
and cannot be fixed in the codebase.
2026-04-30 01:34:08 -07:00
Teknium
aa7bf329bc
feat(gateway): centralize audio routing + FLAC support + Telegram doc fallback (#17833)
Extracted from PR #17211 (@versun) so it can land independently of the
local_command TTS provider redesign.

- Add should_send_media_as_audio(platform, ext, is_voice) in
  gateway/platforms/base.py; single source of truth for audio routing.
- Add .flac to recognized audio extensions (MEDIA regex, weixin audio
  set, send_message audio set).
- Telegram send_voice() now falls back to send_document for formats
  Telegram's Bot API can't play natively (.wav, .flac, ...) instead of
  raising; MP3/M4A still go to sendAudio, Opus/OGG still go to sendVoice.
- Route _send_telegram() in send_message_tool through a narrower
  _TELEGRAM_SEND_AUDIO_EXTS = {.mp3, .m4a} set.
- cron.scheduler._send_media_via_adapter now delegates the audio
  decision to should_send_media_as_audio so it matches the gateway.
- Update the cron live-adapter ogg test to flag [[audio_as_voice]] so
  it still routes to sendVoice under the new Telegram-specific policy.
- Tests: unit coverage for should_send_media_as_audio across platforms,
  end-to-end MEDIA routing via _process_message_background and
  GatewayRunner._deliver_media_from_response, TelegramAdapter.send_voice
  fallback for FLAC/WAV.

Co-authored-by: Versun <me+github7604@versun.org>
2026-04-30 01:32:31 -07:00
Teknium
26787ce638 test(gateway): isolate plugin adapter imports and guard the anti-pattern
Fixes the xdist collision that broke CI on PR #17764, and structurally
prevents future plugin-adapter tests from reintroducing it.

Problem
-------
tests/gateway/test_teams.py (new in this PR) and tests/gateway/test_irc_adapter.py
(already on main) both followed the same anti-pattern:

  sys.path.insert(0, str(_REPO_ROOT / 'plugins' / 'platforms' / '<name>'))
  from adapter import <Adapter>

Every platform plugin ships its own adapter.py, so the bare
'from adapter import ...' races for sys.modules['adapter']. Whichever test
collected first in a given xdist worker won; the other crashed at
collection with ImportError, and the polluted sys.path cascaded into 19
unrelated test failures across tools/, hermes_cli/, and run_agent/ in the
same worker.

Fix
---
1. tests/gateway/_plugin_adapter_loader.py (new): shared helper
   load_plugin_adapter('<name>') that imports plugins/platforms/<name>/adapter.py
   via importlib.util under the unique module name plugin_adapter_<name>.
   Zero sys.path mutation, no possibility of collision.

2. tests/gateway/test_irc_adapter.py and tests/gateway/test_teams.py:
   migrated to the helper. All 'from adapter import ...' statements
   (including the ones inside test methods) are replaced with module-level
   attribute access on the loaded module.

3. tests/gateway/conftest.py: new pytest_configure guard that AST-scans
   every test_*.py under tests/gateway/ at session start and fails the
   run with a pointer to the helper if any test uses sys.path.insert into
   plugins/platforms/ OR a bare 'import adapter' / 'from adapter import'.
   Runs on the xdist controller only (skipped in workers). The next plugin
   adapter test that tries to reintroduce this pattern gets rejected at
   collection time with a clear remediation message.

4. scripts/release.py: add aamirjawaid@microsoft.com -> heyitsaamir to
   AUTHOR_MAP so the check-attribution workflow passes.

Validation
----------
scripts/run_tests.sh tests/gateway/                    4194 passed
scripts/run_tests.sh tests/gateway/test_{teams,irc}*   72 passed (both orderings)
scripts/run_tests.sh <11 prev-failing test files>      398 passed
Guard triggers correctly on both Path-operator and string-literal forms
of the anti-pattern.
2026-04-30 01:19:34 -07:00
Aamir Jawaid
b3137d758c feat(teams): add Microsoft Teams platform adapter as a plugin
Hello! I am the maintainer of the microsoft-teams-apps Python SDK and
I built this Teams adapter to integrate Microsoft Teams into Hermes.

Adds a `plugins/platforms/teams` platform plugin using the new
PlatformRegistry system from #17751. The adapter self-registers via
`register(ctx)` — no hardcoding in run.py, toolsets.py, or any
other core file.

Key features:
- Supports personal DMs, group chats, and channel posts
- Adaptive Card approval prompts with in-place button replacement
  (Allow Once / Allow Session / Always Allow / Deny)
- aiohttp webhook server bridged from the Teams SDK to avoid
  the fastapi/uvicorn dependency
- ConversationReference caching for correct proactive sends in
  non-DM chats
- `interactive_setup()` for `hermes gateway setup` integration
- `platform_hint` for LLM context (Teams markdown subset)
- 34 tests covering adapter init, send, message handling, and
  plugin registration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-30 01:19:34 -07:00
Teknium
21e695fcb6
fix: clean up defensive shims and finish CI stabilization from #17660 (#17801)
PR #17660 landed a sweep of CI fixes but left three loose ends:

1. tests/cli/test_cli_loading_indicator.py::test_reload_mcp_sets_busy_state_
   and_prints_status — /reload-mcp gained a prompt-cache-invalidation
   confirmation (commit 4d7fc0f37) that was never wired into this test.
   The test exercises the loading-indicator path, so pre-approve via
   config and go straight into _reload_mcp().

2. tools/mcp_tool.py _make_tool_handler — the added
   getattr(server, '_rpc_lock', None) + 'skip the lock if missing'
   branch is inconsistent with four sibling call sites that still
   direct-access server._rpc_lock. The lock is guaranteed by
   MCPServerTask.__init__; falling through to an unlocked
   session.call_tool would silently serialize-strip RPCs if the guard
   ever triggered. Restore direct access.

3. tui_gateway/server.py _messages_as_conversation — the helper
   existed only to catch 'TypeError: include_ancestors unexpected'
   from mocked SessionDBs that don't actually exist. The real
   SessionDB.get_messages_as_conversation has accepted
   include_ancestors since introduction, and every test FakeDB in
   the repo already declares the kwarg. Remove the shim, inline the
   two call sites.
2026-04-29 23:53:17 -07:00
Teknium
62a5d7207d
feat(plugins): bundle hermes-achievements + scan full session history (#17754)
* feat(plugins): bundle hermes-achievements, scan full session history

Ships @PCinkusz's hermes-achievements dashboard plugin (https://github.com/PCinkusz/hermes-achievements) as a bundled plugin at plugins/hermes-achievements/ and fixes a bug in the scan path that made the plugin only see the first 200 sessions — making lifetime badges (50k tool calls, 75k errors, etc.) unreachable on long-running installs.

Changes:

- plugins/hermes-achievements/: vendor v0.3.1 verbatim (manifest, dist/, plugin_api.py, tests, docs, README).
- plugins/hermes-achievements/dashboard/plugin_api.py:
  * scan_sessions(): limit=None now scans ALL sessions via SQLite LIMIT -1. Previously capped at 200, so users with 8000+ sessions saw ~2% of their history.
  * evaluate_all(): first-ever scans run in a background thread so the dashboard request path never blocks. Stale snapshots serve immediately while a background refresh runs. force=True still blocks synchronously for manual /rescan.
  * _build_pending_snapshot(), _start_background_scan(), _run_scan_and_update_cache(): supporting plumbing + idempotent thread spawn.
- tests/plugins/test_achievements_plugin.py: new tests covering the 200-cap regression, the background-scan first-run flow, stale-serve-plus-background-refresh, forced sync rescan, and scan-thread idempotency.
- website/docs/user-guide/features/built-in-plugins.md: lists hermes-achievements in the bundled-plugins table and documents API endpoints, state files, and performance characteristics.

E2E validated against a real 8564-session ~6.4GB state.db:
  * Cold scan: 13m 19s (one-time, backgrounded — UI never blocks)
  * Warm rescan: 1.47s (8563/8564 sessions reused from checkpoint cache)
  * 57/60 achievements unlocked, 3 discovered — aggregates like total_tool_calls=259958, total_errors=164213, skill_events=368243 correctly surface lifetime badges that the 200-cap made unreachable.

Original credit: @PCinkusz (MIT-licensed). Upstream repo remains the staging ground for new badges; this bundle keeps the dashboard feature parity with Hermes core changes.

* feat(achievements): publish partial snapshots during cold scan

Previously a cold scan on a large session DB (13min on 8564 sessions)
showed zero badges for the entire duration, then every badge at once
when the scan completed. A dashboard refresh mid-scan was indistinguishable
from a fresh install with no history.

Now the scanner publishes a partial snapshot to _SNAPSHOT_CACHE every
250 sessions, so each refresh during a cold scan surfaces more badges
incrementally.

Mechanism:
- scan_sessions() takes an optional progress_callback fired every
  progress_every sessions with (sessions_so_far, scanned, total).
- _compute_from_scan() is extracted from compute_all() and gains an
  is_partial flag that skips writing to state.json — we don't want
  to record unlocked_at based on a half-complete aggregate that a
  later session might rebalance.
- _run_scan_and_update_cache() installs a publisher callback that
  builds a partial snapshot, marks it mode='in_progress', and writes
  it to the cache with age=0 so the UI keeps polling /scan-status
  and picks up the final snapshot when the scan completes.
- Manual /rescan (force=True) disables partial publishing — the
  caller is blocking on the final result anyway.

E2E against real 8564-session state.db (polled cache every 10s):
  t=10s: cache empty
  t=20s: 250/8564 scanned, 35 unlocked, 25 discovered
  t=40s: 500/8564 scanned, 42 unlocked, 18 discovered
  t=60s: 1000/8564 scanned, 49 unlocked, 11 discovered
  ...

Tests: 9/9 pass (2 new — partial snapshot publication + no-persist-on-partial).
Upstream unittest suite: 10/10 pass.

* feat(achievements): in-progress scan banner with live % progress

Previously the dashboard showed zero badges silently during long cold
scans (13min on 8564 sessions). The backend was publishing partial
snapshots every 250 sessions, but the bundled UI didn't surface any
indicator that a scan was running — it just rendered the main page
with whatever counts were currently published and no way for the user
to know more progress was coming.

UI changes (dist/index.js, dist/style.css):

- Added a scan-in-progress banner rendered between the hero and stats
  when scan_meta.mode is 'pending' or 'in_progress'. Shows:
    BUILDING ACHIEVEMENT PROFILE…
    Scanned 1,750 of 8,564 sessions · 20%. Badges unlock as more history streams in.
  with a pulsing teal indicator and a filling teal/cyan progress bar.
  Disappears the moment the backend flips to 'full' or 'incremental'.

- Added an auto-poller via useEffect — while scanInFlight is true the
  page re-fetches /achievements every 4s WITHOUT toggling the loading
  skeleton, so unlock counts tick up visibly without the user refreshing.
  The effect cleans itself up when the scan finishes.

- Added refresh() (re-fetch, no loading flip) alongside the existing
  load() (full reload, used by the Rescan button).

Attribution preserved:

- Added a header comment to index.js crediting @PCinkusz
  (https://github.com/PCinkusz/hermes-achievements, MIT) as the
  original author, noting the banner is a layered addition on top
  of the original dist bundle.
- Matching header comment in style.css, flagging the new
  .ha-scan-banner* rules as the local addition.

Live-verified end to end:

- Spun up `hermes dashboard --port 9229 --no-open` against a fresh
  HERMES_HOME symlinked to the real 8564-session state.db.
- Opened /achievements in a browser, confirmed the banner renders with
  live progress: 'Scanned 1,000 of 8,564 sessions · 11%' → updates to
  '1,250 ... · 14%' → '1,750 ... · 20%' without user interaction,
  matching the backend's partial publications.
- Stats row simultaneously climbed from 35 → 49 → 53 unlocked as
  more history streamed in.
- Vision analysis of the rendered page confirms the banner styling
  matches the rest of the dashboard (dark card bg, teal accent, same
  small-caps typography, pulsing indicator reusing ha-pulse keyframes).
2026-04-29 23:23:57 -07:00
Teknium
ce0c3ae493
fix(aux): remove hardcoded Codex fallback model, drop Codex from auto chain (#17765)
The _CODEX_AUX_MODEL constant had already rotated twice in 6 weeks
(gpt-5.3-codex -> gpt-5.2-codex -> now broken again at gpt-5.2-codex)
because ChatGPT-account Codex gates which models it accepts via an
undocumented, shifting allow-list that OpenAI publishes no changelog
for.  Any pinned default will keep going stale.  Issue #17533 reports
the current breakage: every ChatGPT-account auxiliary fallback fails
with HTTP 400 "model is not supported" and the 60s pause loop degrades
long sessions.

Rather than reset the clock with another stale pin (PR #17544 proposes
gpt-5.2-codex -> gpt-5.4), remove the hardcoded second-order Codex
fallback entirely:

- Delete `_CODEX_AUX_MODEL`.
- Drop `_try_codex` from `_get_provider_chain()` (the auto chain now
  ends at api-key providers; 4 rungs instead of 5).
- Rename `_try_codex() -> _build_codex_client(model)` and require an
  explicit model from the caller.  No more guessing.
- `resolve_provider_client("openai-codex", model=None)` now warns and
  returns (None, None) instead of silently guessing a stale model ID.
- Remove `_try_codex` from the `provider="custom"` fallback ladder
  (same stale-constant trap).
- `_resolve_strict_vision_backend("openai-codex")` routes through
  `resolve_provider_client` so the caller's explicit model is honored.

Codex-main users are unaffected: Step 1 of `_resolve_auto` already
uses `main_provider` + `main_model` directly and passes the user's
configured Codex model through `resolve_provider_client`, which never
touched `_CODEX_AUX_MODEL`.  Per-task overrides (`auxiliary.<task>.provider/model`)
continue to work and are the supported way to route specific aux tasks
through Codex.

Users whose main provider fails with a payment/connection error and
who have ONLY ChatGPT-account Codex auth will now see the 60s pause
without a stale-model-rejection noise line in between -- same outcome,
cleaner failure.

Closes #17533.  Supersedes #17544 (which resets the clock on the
same stale-constant problem).
2026-04-29 23:23:50 -07:00
Stephen Schoettler
f73364b1c4
fix(ci): stabilize main test suite regressions (#17660)
* fix: stabilize main test suite regressions

* test(agent): update MiniMax normalization expectation

* test: stabilize remaining CI assertions

* test: harden config helper monkeypatching

* test: harden CI-only assertions

* fix(agent): propagate fast streaming interrupts
2026-04-29 23:18:55 -07:00
Teknium
828d3a320b
fix(anthropic): reactive recovery for OAuth 1M-context beta rejection (#17752)
Keep context-1m-2025-08-07 in OAuth requests by default so 1M-capable
subscriptions retain full context. When Anthropic rejects a request with
400 'long context beta is not yet available for this subscription',
disable the beta for the rest of the session, rebuild the client, and
retry once.

Addresses #17680 (thanks @JayGwod for the clean reproduction) without
forcing every OAuth user off the 1M context window.

Changes:
- agent/error_classifier.py: new FailoverReason.oauth_long_context_beta_forbidden;
  pattern matches 400 + 'long context beta' + 'not yet available'. Narrow
  enough that the existing 429 tier-gate pattern keeps its own reason.
- agent/anthropic_adapter.py: _common_betas_for_base_url,
  build_anthropic_client, build_anthropic_kwargs gain drop_context_1m_beta
  kwarg. Default=False (1M stays). OAuth OAUTH_ONLY_BETAS unchanged.
- agent/transports/anthropic.py: build_kwargs forwards the flag.
- run_agent.py: self._oauth_1m_beta_disabled flag, retry-once guard,
  recovery branch next to the image-shrink path. _rebuild_anthropic_client
  honors the flag. The main build_kwargs call site threads it through for
  fast-mode extra_headers.
- hermes_cli/doctor.py, hermes_cli/models.py: sibling OAuth /v1/models
  probes get the same reactive retry — previously they'd falsely report
  the Anthropic API as unreachable for affected subscriptions.

Tests: 2190 tests/agent/ + 94 adjacent integration tests pass. New unit
tests cover the classifier pattern (including the collision guard against
the 429 tier-gate) and the drop_context_1m_beta adapter behavior (default
keeps 1M, flag strips only 1M while preserving every other beta).
2026-04-29 21:56:54 -07:00
Teknium
4d363499db feat(plugins): bundled platform plugins auto-load by default
Platform plugins shipped in-repo under plugins/platforms/ should be
available out of the box — users shouldn't have to add 'irc-platform'
to plugins.enabled before they can pick IRC from the gateway setup menu.

Adds a new ``kind: platform`` plugin type that mirrors the existing
``kind: backend`` auto-load semantics:

- Bundled (shipped in the hermes-agent repo): auto-load unconditionally.
- User-installed (~/.hermes/plugins/): still opt-in via plugins.enabled
  so untrusted code doesn't silently run.

Changes:

* hermes_cli/plugins.py: add 'platform' to _VALID_PLUGIN_KINDS, document
  the new kind in the PluginManifest docstring, extend the bundled auto-
  load rule from 'backend only' to 'backend or platform'.

* plugins/platforms/irc/plugin.yaml: declare kind: platform.

* hermes_cli/gateway.py: remove the now-redundant
  _load_bundled_platform_plugins_for_enumeration() helper and the
  _enable_plugin_for_platform() helper. The setup menu's _all_platforms()
  just calls discover_plugins() and reads the registry — bundled
  platforms are already loaded at that point. Drops the 'needs_enable'
  flag and the 'plugin disabled — select to enable' status string.

* hermes_cli/setup.py: relax the "gateway is configured" detector used
  during OpenClaw migration. Switching to _platform_status() in an
  earlier commit tightened the check to require an exact "configured"
  match, dropping platforms whose status is "enabled, not paired",
  "partially configured", "configured + E2EE", etc. Now any non-"not
  configured" status counts — the user has already started setup there
  and we shouldn't force the section to rerun.

* tests/hermes_cli/test_setup_irc.py: drop the TestIRCPluginDisabledFlow
  class and test_configure_platform_enables_disabled_plugin_first — the
  no-longer-existent flow they were testing.

* tests/hermes_cli/test_setup_openclaw_migration.py: patch both
  setup.get_env_value and gateway.get_env_value in the 4 gateway-section
  tests that reach _platform_status() through the unified setup flow;
  switch WHATSAPP_ENABLED to the literal "true" in the registry-parity
  test so WhatsApp's value-shape validator matches.

Verified via fresh-install smoke (empty plugins.enabled, no env vars):
IRC plugin loads, Platform('irc') resolves, _all_platforms() lists IRC
with status 'not configured'. 160 targeted tests pass.
2026-04-29 21:56:51 -07:00
Teknium
71c8ca17dc chore(salvage): strip duplicated/merge-corrupted blocks from PR #17664
Removes drive-by duplication that accumulated during the contributor
branch's multiple rebases. All runtime-benign (dict last-wins,
redefinition last-wins) but left dead source that would confuse
reviewers and maintainers.

Surgical in-place de-duplication (kept PR's intentional additions,
removed only the doubled copy):

* hermes_cli/auth.py: duplicate "gmi" + "azure-foundry" ProviderConfig
* hermes_cli/models.py: duplicate "gmi" entry in _PROVIDER_MODELS
* hermes_cli/config.py: duplicate NOTION/LINEAR/AIRTABLE/TENOR skill env
  block + duplicate get_custom_provider_context_length definition
* hermes_cli/gateway.py: duplicate _setup_yuanbao
* gateway/platforms/base.py: duplicate is_host_excluded_by_no_proxy
* gateway/platforms/telegram.py: duplicate delete_message
* gateway/stream_consumer.py: duplicate _should_send_fresh_final and
  _try_fresh_final
* gateway/run.py: duplicate _parse_reasoning_command_args /
  _resolve_session_reasoning_config / _set_session_reasoning_override,
  duplicate "Drain silently when interrupted" interrupt check
* run_agent.py: duplicate HERMES_AGENT_HELP_GUIDANCE append, duplicate
  codex_message_items capture, duplicate custom_providers resolution
* tools/approval.py: duplicate HARDLINE_PATTERNS section and duplicate
  hardline call in check_dangerous_command
* tools/mcp_tool.py: duplicate _orphan_stdio_pids module-level decl
* cron/scheduler.py: duplicate "not configured/enabled" check — kept
  the new early-rejection, removed the stale late-path copy

Full-file resets to origin/main (all PR additions were duplicates of
content already on main):

* ui-tui/packages/hermes-ink/index.d.ts
* ui-tui/packages/hermes-ink/src/entry-exports.ts
* ui-tui/packages/hermes-ink/src/ink/selection.ts
* ui-tui/src/app/interfaces.ts
* ui-tui/src/app/slash/commands/core.ts
* ui-tui/src/components/thinking.tsx
* ui-tui/src/lib/memoryMonitor.ts
* ui-tui/src/types.ts
* ui-tui/src/types/hermes-ink.d.ts
* tests/hermes_cli/test_doctor.py
* tests/hermes_cli/test_api_key_providers.py
* tests/hermes_cli/test_model_validation.py
* tests/plugins/memory/test_hindsight_provider.py
* tests/run_agent/test_run_agent.py
* tests/gateway/test_email.py
* tests/tools/test_dockerfile_pid1_reaping.py
* hermes_cli/commands.py (slack_native_slashes block — full duplicate)
2026-04-29 21:56:51 -07:00
Ari Lotter
868bc1c242 feat(irc): add interactive setup
feat(gateway): refine Platform._missing_ and platform-connected dispatch

Restricts plugin-name acceptance to bundled plugin scan + registry
(no arbitrary string -> enum-pollution), pulls per-platform connectivity
checks into a _PLATFORM_CONNECTED_CHECKERS lambda map with a clean
_is_platform_connected method, and adds tests covering the checker map,
plugin platform interface, and IRC setup wizard.
2026-04-29 21:56:51 -07:00
Ari Lotter
1f1608067c feat(gateway): unify setup flows, load platforms dynamically from registry
Merge the two gateway setup paths (hermes setup gateway + hermes gateway
setup) to use a single _unified_platforms() list that merges built-in
_PLATFORMS with dynamically registered plugin entries from
platform_registry.

- Add setup_fn field to PlatformEntry for plugin setup flows
- _unified_platforms() merges built-ins with registry entries by key
- setup_gateway() now uses unified list instead of hardcoded
  _GATEWAY_PLATFORMS tuple list
- gateway_setup() uses same unified list, plugin entries appear
  alongside built-ins with no [plugin] suffix
- _platform_status() handles plugin platforms via registry check_fn
- Plugin platforms with setup_fn get called directly; plugins without
  get a generic env-var display fallback

IRC and other plugin platforms now appear automatically in the setup
menu when registered via platform_registry.register().

feat(gateway): surface disabled platform plugins in setup and auto-enable on select

Platform plugins under plugins/platforms/* (IRC, etc.) were gated behind
plugins.enabled, so `hermes gateway setup` wouldn't list them until the
user ran `hermes plugins enable <name>` first. Now the setup menu always
surfaces them as "plugin disabled — select to enable", and picking one
adds it to plugins.enabled before running its setup flow.

Along the way, unify the two gateway setup flows so `hermes setup gateway`
and `hermes gateway setup` both read from the same platform list (built-in
_PLATFORMS + platform_registry entries), dispatch through a single
_configure_platform() helper, and share _platform_status(). Deletes the
dead bespoke wrappers in setup.py (_setup_whatsapp, _setup_weixin,
_setup_email, etc.) that duplicated logic now covered by the registry
path or _setup_standard_platform.

Also:
- PlatformEntry gains a plugin_name field so the registry knows which
  plugin owns each entry (required for auto-enable).
- PluginContext.register_platform auto-stamps plugin_name from the
  manifest so plugins don't have to pass it explicitly.
- PluginManager now scans plugins/platforms/* as its own category root,
  one level below the bundled plugin scan.
- Fix IRC plugin discovery: rename PLUGIN.yaml → plugin.yaml (the
  scanner is case-sensitive) and add the missing __init__.py that
  _load_directory_module requires.
2026-04-29 21:56:51 -07:00
Teknium
2e20f6ae2d feat: complete plugin platform parity — all 12 integration points
Extends the platform plugin interface from Phase 1 to cover every
touchpoint where built-in platforms have hardcoded behavior.

- allowed_users_env / allow_all_env: per-platform auth env vars
- max_message_length: smart-chunking for send_message tool
- pii_safe: session PII redaction flag
- emoji: CLI/gateway display
- allow_update_command: /update access control

send_message tool (tools/send_message_tool.py):
- Replaced hardcoded platform_map dict with Platform() call
- Added _send_via_adapter() for plugin platforms — routes through
  live gateway adapter when available
- Registry-aware max message length for smart chunking

Cron delivery (cron/scheduler.py):
- Replaced hardcoded 15-entry platform_map with Platform() call
- Plugin platforms now work as cron delivery targets

User authorization (gateway/run.py _is_user_authorized):
- Registry fallback: checks PlatformEntry.allowed_users_env and
  allow_all_env when platform not in hardcoded maps
- Plugin platforms get per-platform auth support

_UPDATE_ALLOWED_PLATFORMS: checks registry allow_update_command flag
Channel directory: includes plugin platforms in session enumeration
Orphaned config warning: descriptive message when plugin platform is
  in config but no plugin registered it
Gateway weakref: _gateway_runner_ref for cross-module adapter access

hermes status: shows plugin platforms with (plugin) tag
hermes gateway setup: plugin platforms appear in menu with setup hints
hermes_cli/platforms.py: get_all_platforms() merges with registry,
  platform_label() falls back to registry for plugin names

- 8 new tests (extended fields, cron resolution, platforms merge)
- Updated 3 tests for new Platform() based resolution
- 2829 passed, 24 pre-existing failures, zero new failures
2026-04-29 21:56:51 -07:00
Teknium
8f144fe36b feat: pluggable platform adapter registry + IRC reference implementation
Adds a platform adapter plugin interface so anyone can create new gateway
platforms (IRC, Viber, Line, etc.) as drop-in plugins without modifying
core gateway code.

- PlatformEntry dataclass: name, label, adapter_factory, check_fn,
  validate_config, required_env, install_hint, source
- PlatformRegistry singleton with register/unregister/create_adapter
- _create_adapter() in gateway/run.py checks registry first, falls
  through to existing if/elif chain for built-in platforms

- Platform._missing_() accepts unknown string values, creating cached
  pseudo-members so Platform('irc') is Platform('irc') holds true
- GatewayConfig.from_dict() now parses plugin platform names from
  config.yaml without rejecting them
- get_connected_platforms() delegates to registry for unknown platforms

- PluginContext.register_platform() for plugin authors
- Mirrors the existing register_tool() / register_hook() pattern

- Full async IRC adapter using stdlib asyncio (zero external deps)
- Connects via TLS, handles PING/PONG, nick collision, NickServ auth
- Channel messages require addressing (nick: msg), DMs always dispatch
- Markdown stripping for IRC-clean output, message splitting for
  512-byte line limit
- Config via config.yaml extra dict or IRC_* env vars

- Platform enum dynamic members (identity stability, case normalization)
- PlatformRegistry (register, unregister, create, validation, factory)
- GatewayConfig integration (from_dict parsing, get_connected_platforms)
- IRC adapter (init, send, protocol parsing, markdown, requirements)

No existing platform adapters were migrated — the if/elif chain is
untouched. This is Phase 1: prove the interface with a real plugin.
2026-04-29 21:56:51 -07:00
Teknium
4d7fc0f37c feat(gateway,cli): confirm /reload-mcp to warn about prompt cache invalidation
Reloading MCP servers rebuilds the tool set for the active session, which
invalidates the provider prompt cache (tool schemas are baked into the
system prompt). The next message re-sends full input tokens — can be
expensive on long-context or high-reasoning models.

To surface that cost, /reload-mcp now routes through a new slash-confirm
primitive with three options: Approve Once / Always Approve / Cancel.
'Always Approve' persists approvals.mcp_reload_confirm: false so future
reloads run silently.

Coverage:

* Classic CLI (cli.py) — interactive numbered prompt.
* TUI (tui_gateway + Ink ops.ts) — text warning on first call; `now` /
  `always` args skip the gate; `always` also persists the opt-out.
* Messenger gateway — button UI on Telegram (inline keyboard), Discord
  (discord.ui.View), Slack (Block Kit actions); text fallback on every
  other platform via /approve /always /cancel replies intercepted in
  gateway/run.py _handle_message.
* Config key: approvals.mcp_reload_confirm (default true).
* Auto-reload paths (CLI file watcher, TUI config-sync mtime poll) pass
  confirm=true so they do NOT prompt.

Implementation:

* tools/slash_confirm.py — module-level pending-state store used by all
  adapters and by the CLI prompt. Thread-safe register/resolve/clear.
* gateway/platforms/base.py — send_slash_confirm hook (default 'Not
  supported' → text fallback).
* gateway/run.py — _request_slash_confirm helper + text intercept in
  _handle_message (yields to in-progress tool-exec approvals so
  dangerous-command /approve still unblocks the tool thread first).

Tests:

* tests/tools/test_slash_confirm.py — primitive lifecycle + async
  resolution + double-click atomicity (16 tests).
* tests/hermes_cli/test_mcp_reload_confirm_gate.py — default-config
  shape + deep-merge preserves user opt-out (5 tests).

Targeted runs (hermetic): 89 passed (slash-confirm, config gate,
existing agent cache, existing telegram approval buttons).
2026-04-29 21:56:47 -07:00
helix4u
7fae87bc00 fix(gateway): refresh cached agents after MCP tool changes 2026-04-29 21:56:47 -07:00
simbam99
ffa65291d1 fix(cron): clear auto-delivery thread context between jobs 2026-04-29 21:08:59 -07:00
teknium1
763aadd6bf fix(telegram): preserve pre-#17686 chat-ID-in-_USERS configs + doc split
PR #15027 (5 days ago) shipped TELEGRAM_GROUP_ALLOWED_USERS as a chat-ID
allowlist. #17686 correctly renames that to sender user IDs and moves
chat IDs to TELEGRAM_GROUP_ALLOWED_CHATS. Without a shim, any user on
PR #15027's guidance would silently start rejecting group traffic on
upgrade.

- gateway/run.py: in _is_user_authorized, if TELEGRAM_GROUP_ALLOWED_USERS
  contains values starting with '-' (chat-ID-shaped), honor them as chat
  IDs and log a one-shot deprecation warning pointing users at the new
  TELEGRAM_GROUP_ALLOWED_CHATS var.
- tests/gateway/test_unauthorized_dm_behavior.py: three new tests cover
  legacy chat-ID values authorizing the listed chat, not crossing to
  other chats, and mixed sender/chat values in the same var.
- website/docs/user-guide/messaging/telegram.md: rewrite the Group
  Allowlisting section to document the new user/chat split + migration
  note. Remove stale '/thread_id' suffix claim (code never parsed it).
- website/docs/reference/environment-variables.md: document all three
  Telegram allowlist env vars.
2026-04-29 21:07:55 -07:00
Anders Bell
1f712173b2 fix(telegram): support group user allowlist 2026-04-29 21:07:55 -07:00
teknium1
dd2d1ba5e6 refactor(reload-skills): queue note for next turn, drop cache invalidation + agent tool
Salvage-follow-up to @shannonsands's /reload-skills PR. Trims the feature to
match the design: user-initiated rescan, no prompt-cache reset, no new
schema surface, no phantom user turn, and the next-turn note carries each
added/removed skill's 60-char description (not just its name).

Changes vs the original PR:

* Drop the in-process skills prompt-cache clear in reload_skills(). Skills
  are invoked at runtime via /skill-name, skills_list, or skill_view —
  they don't need to live in the system prompt for the model to use them.
  Keeping the cache intact preserves prefix caching across the reload so
  /reload-skills pays no cache-reset cost. (MCP has to break the cache
  because tool schemas must be known at conversation start; skills do not.)

* Drop the skills_reload agent tool and SKILLS_RELOAD_SCHEMA from
  tools/skills_tool.py, plus the four skills_reload enumerations in
  toolsets.py. No new schema surface — agents can already see a freshly-
  installed skill via skill_view / skills_list the moment it's on disk.

* Replace the phantom 'role: user' turn injection with a one-shot queued
  note. CLI uses self._pending_skills_reload_note (same pattern as
  _pending_model_switch_note, prepended to the next API call and cleared).
  Gateway uses self._pending_skills_reload_notes[session_key]. The note
  is prepended to the NEXT real user message in this session, so message
  alternation stays intact and nothing out-of-band is persisted to the
  transcript.

* reload_skills() now returns added/removed as
  [{'name': str, 'description': str}, ...] (description truncated to 60
  chars — matches the curator / gateway adapter budget). The injected
  next-turn note formats each entry as 'name — description' so the model
  can actually reason about which new skills to call without running
  skills_list first.

* Only emit the note when the diff is non-empty. On empty diff, print
  'No new skills detected' and do nothing else.

* Tests rewritten to cover the queue semantics, the description payload,
  and a regression guard that the prompt-cache snapshot is preserved.
2026-04-29 21:07:47 -07:00
Shannon Sands
7966560fb5 feat(skills): /reload-skills slash command + skills_reload agent tool
Adds a public reload path for the in-process skill caches so newly
installed (or removed) skills become visible mid-session without a
gateway restart. Mirrors the shape of /reload-mcp.

Three surfaces:
* /reload-skills slash command — CLI (cli.py) and gateway (gateway/run.py),
  with /reload_skills alias for Telegram autocomplete and an explicit
  Discord registration.
* skills_reload agent tool (tools/skills_tool.py) — lets agents/subagents
  pick up freshly-installed skills via tool call.
* agent.skill_commands.reload_skills() — shared helper that clears
  _skill_commands, _SKILLS_PROMPT_CACHE (in-process LRU), and the
  on-disk .skills_prompt_snapshot.json, then returns an added/removed
  diff plus the new total count.

Tested:
* tests/agent/test_skill_commands_reload.py (9 cases)
* tests/cli/test_cli_reload_skills.py       (3 cases)
* tests/gateway/test_reload_skills_command.py (4 cases)

Use case: NemoClaw / OpenShell-style sandboxed orchestrators that drop
skills into ~/.hermes/skills mid-session, plus agentic flows where the
agent itself installs a skill via the shell tool and needs it bound
without a gateway restart. The Python helper
clear_skills_system_prompt_cache(clear_snapshot=True) already exists
internally — this PR just exposes it via slash command and tool.
2026-04-29 21:07:47 -07:00
ethernet
7d48a16f14 remove relaunch_chat
not needed
2026-04-29 20:33:29 -07:00
ethernet
3c673468b4 refactor(cli): derive relaunch flag table from argparse introspection
Pull the top-level + chat parser construction out of main() into
hermes_cli/_parser.py so relaunch.py can introspect parser._actions to
discover which flags exist and whether they take values, instead of
maintaining a parallel hand-rolled (flag, takes_value) tuple list.

- _parser.py: build_top_level_parser() returns (parser, subparsers,
  chat_parser); side-effect-free import.
- main.py: ~290 lines of inline parser construction collapsed to a
  helper call. Other subparsers stay inline (dispatch is bound to
  module-level cmd_* functions).
- _parser._inherited_flag(parser, ...): wraps parser.add_argument and
  sets action.inherit_on_relaunch = True. Used in place of
  parser.add_argument for the 25 flags (top-level + chat) that need to
  carry over.
- _parser.PRE_ARGPARSE_INHERITED_FLAGS: holds --profile/-p, which
  isn't on argparse (consumed earlier by main._apply_profile_override).
- relaunch.py: drops _CRITICAL_DESTS and _PRE_ARGPARSE_FLAGS; the table
  builder now filters by getattr(action, 'inherit_on_relaunch', False).
- test_ignore_user_config_flags.py: brittle inspect.getsource grep
  replaced with proper parser introspection.
- test_relaunch.py: introspection sanity tests added.

Salvaged from PR #17549; added top-level -t/--toolsets flag to
_parser.py so #17623 (fix(tui): honor launch toolsets) behavior is
preserved on current main.

Co-authored-by: ethernet <arilotter@gmail.com>
2026-04-29 20:33:29 -07:00
ethernet
95f2802f84 feat(cli): preserve --tui and other flags across internal relaunches
Extract all os.execvp('hermes', ...) calls into a utility so flags like
--tui, --dev, --profile, --model, --provider, et al. survive session
resume and post-setup relaunch.

- resolve_hermes_bin: prefers sys.argv[0] when callable, then PATH,
  then falls back to '${sys.executable} -m hermes_cli.main' (fixes nix
run relaunches)
- build_relaunch_argv: allowlists critical flags so they carry over
- cmd_sessions browse now calls relaunch(['--resume', <id>])
- _apply_profile_override skips redundant work when HERMES_HOME is
  already set (child inherits parent profile)
- setup.py replaces _resolve_hermes_chat_argv with relaunch_chat()
- added comprehensive tests for flag extraction and binary resolution

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 20:33:29 -07:00
Brooklyn Nicholson
8dcab19d02 fix(gateway): fail closed when session.delete can't enumerate active sessions
If a concurrent RPC mutates _sessions while session.delete is iterating
it (e.g. a parallel session.create on the thread pool), the bare except
swallowed the RuntimeError and let the delete proceed against a row
that may still be live.  Snapshot via list(_sessions.values()) and
return an error when even that raises, instead of treating "couldn't
check" as "no active sessions."
2026-04-29 20:21:16 -07:00
Brooklyn Nicholson
24b5279f43 feat(tui): delete sessions from /resume picker with d
Pressing `d` on the highlighted row in the resume picker prompts
`delete? y/n`; `y` deletes the session (DB row + on-disk transcript
files), anything else cancels.  The active session is excluded from
deletion server-side.

Adds a new `session.delete` JSON-RPC handler that wraps
`SessionDB.delete_session`, forwarding the per-profile `sessions/`
directory so transcripts get cleaned up alongside the row.
2026-04-29 20:21:16 -07:00
Brooklyn Nicholson
87e259a678 fix(cli): tighten mouse leak sanitizer
Handle unbounded SGR mouse report coordinates and avoid regex work on ordinary prompt-buffer edits by short-circuiting before sanitizer passes.
2026-04-29 22:10:18 -05:00
Teknium
31f70d1f2a
fix(ci): recover 38 failing tests on main (#17642)
CI Tests workflow has been red on main for 40+ consecutive runs. This
commit recovers every failure visible in run 25130722163 (most recent
completed run prior to this PR).

Root causes, by group:

Test-mock drift after product landed (fix: update mocks)
- test_mcp_structured_content / test_mcp_dynamic_discovery (6 tests):
  product added _rpc_lock (#02ae15222) and _schedule_tools_refresh
  (#1350d12b0) without updating sibling test files. Install a real
  asyncio.Lock inside the fake run-loop and patch at _schedule_tools_refresh.
- test_session.py: renamed normalize_whatsapp_identifier → canonical_
  whatsapp_identifier upstream; keep a local alias so the legacy tests
  keep working.
- test_run_progress_topics Slack DM test: PR #8006 made Slack default
  tool_progress=off; explicitly set it to 'all' in the test fixture so
  the progress-callback path still runs. Also read tool_progress_callback
  at call time rather than freezing it in FakeAgent.__init__ — production
  assigns it AFTER construction.
- test_tui_gateway_server session-create/close race: session.create now
  defers _start_agent_build behind a 50ms timer — wait for the build
  thread to enter _make_agent before closing, otherwise the orphan-
  cleanup path never runs.
- test_protocol session.resume: product get_messages_as_conversation now
  takes include_ancestors kwarg; accept **_kwargs in the test stub.
- test_copilot_acp_client redaction: redactor is OFF by default (snapshots
  HERMES_REDACT_SECRETS at import); patch agent.redact._REDACT_ENABLED=True
  for the duration of the test.
- test_minimax_provider: after #17171, dots in non-Anthropic model names
  stay dots even with preserve_dots=False. Assert the new invariant
  rather than the old 'broken for MiniMax' behavior.
- test_update_autostash: updater now scans `ps -A` for dashboard PIDs;
  the test's catch-all subprocess.run stub needed stdout/stderr fields.
- test_accretion_caps: read_timestamps dict is populated lazily when
  os.path.getmtime succeeds. Use .get("read_timestamps", {}) to tolerate
  CI filesystems where the stat races file creation.

Change-detector tests (fix: rewrite as structural invariants)
- test_credential_sources_registry_has_expected_steps: was a frozen set
  comparison that broke when minimax-oauth was added. Rewrite as an
  invariant check (every step has description, no dupes, core steps
  present) per AGENTS.md 'don't write change-detector tests'.

xdist ordering / test pollution (fix: reset state, use module-local patches)
- test_setup vercel: sibling test saved VERCEL_PROJECT_ID='project' to
  os.environ via save_env_value() and never cleared it. monkeypatch.delenv
  the VERCEL_* vars in the link-file test.
- test_clipboard TestIsWsl: GitHub Actions is on Azure VMs whose real
  /proc/version often contains 'microsoft'. Patching builtins.open with
  mock_open didn't reliably intercept hermes_constants.is_wsl's call in
  xdist workers that had already cached _wsl_detected=True from an
  earlier test. Patch hermes_constants.open directly and add
  teardown_method to reset the cache after each test.

Pytest-asyncio cancellation hangs (fix: bound product await with timeout)
- test_session_split_brain_11016 (3 params) + test_gateway_shutdown
  cancel-inflight: under pytest-asyncio 1.3.0, 'await task' and
  'asyncio.gather(cancelled_tasks)' can stall for 30s when the cancelled
  task's finally block awaits typing-task cleanup. Bound both with
  asyncio.wait_for(..., timeout=5.0) and asyncio.shield — the stragglers
  are released from adapter tracking and allowed to finish unwinding in
  the background. This is also a legitimate hardening: a wedged finally
  shouldn't stall the caller's dispatch or a gateway shutdown.

Orphan UI config (fix: merge tiny tab into messaging category)
- test_web_server test_no_single_field_categories: the telegram.reactions
  config field lived in its own 'telegram' schema category with no
  siblings. Fold it under 'discord' via _CATEGORY_MERGE so the dashboard
  doesn't render an orphan single-field tab.

Local verification: 38/38 originally-failing tests pass; 4044/4044
gateway tests pass; 684/684 targeted subset (all 16 touched test files)
passes.
2026-04-29 20:05:32 -07:00
Brooklyn Nicholson
98a428fd61 fix(cli): recover from leaked mouse tracking escapes
Detect leaked SGR mouse-report fragments in CLI input, strip them, and reset terminal modes in-place so scroll and typing recover without reopening the tab. Add regression tests for escaped, visible, and bare leak forms.
2026-04-29 21:35:47 -05:00
brooklyn!
fc7f55f490
fix(tui): responsive /compress with live progress + CLI-parity feedback (#17661)
* fix(tui): offload manual compaction RPC

Route TUI session compression through the existing long-handler pool so slow compaction does not block other gateway RPCs.

* fix(tui): show compaction progress immediately

Print a local status line before the compress RPC starts so slow manual compaction does not look like a no-op.

* feat(tui): rich /compress feedback parity with CLI

Show pre-compaction message count and rough token estimate immediately, emit a status update so the bottom bar reflects ongoing compaction, and report a multi-line summary (headline + token delta + optional note) using the shared summarize_manual_compression helper.

* fix(tui): show live compaction estimate in transcript

Mirror compression progress status into the transcript so users see the backend message count and token estimate while /compress is still running.

* fix(tui): single live compaction line with spinner glyph

Drop the redundant local "compressing context..." placeholder and prefix the live backend status line with a braille spinner glyph so /compress reads as a single in-progress row.

* fix(tui): address review nits on /compress feedback

Reuse the precomputed token estimate inside _compress_session_history so the gateway does not redo the O(n) work while holding history_lock, keep the status bar pinned during long manual compactions instead of auto-restoring after 4s, and drop the redundant noop bullet that doubled with the system role glyph.

* fix(tui): release history_lock during compaction LLM call

Move the snapshot/commit pattern into _compress_session_history so the lock is held only across the in-memory bookkeeping, not during agent._compress_context. Also emit a final neutral status update from session.compress so the pinned compressing indicator clears even on errors.

* fix(tui): rebuild prompt cleanly + sync session_key after compress

Pass system_message=None so AIAgent._compress_context rebuilds the system prompt without nesting the cached identity block. Reuse the handler's pre-snapshotted history inside _compress_session_history to avoid a second O(n) copy under the lock. After compaction, when AIAgent._compress_context rotates session_id, sync the gateway session_key, migrate approval notify + yolo state, restart the slash worker, and clear the stale pending title. Mirrors HermesCLI._manual_compress.

* Avoid /compress lock re-entry in slash side effects.

Stop pre-locking history before _compress_session_history in slash command mirroring, keep session-key sync parity with manual compression, and add a regression test that asserts /compress is invoked without holding history_lock.
2026-04-29 18:01:18 -07:00
brooklyn!
5e6e8b6af3
fix(tui): honor launch toolsets (#17623)
* fix(tui): honor launch toolsets

Carry chat --toolsets through the TUI launcher so TUI sessions use the same per-session tool scope as the classic CLI.

* fix(tui): parse top-level toolsets flag

Allow top-level hermes --tui --toolsets to reach the implicit chat session, matching chat subcommand behavior.

* fix(tui): validate launch toolsets

Filter invalid HERMES_TUI_TOOLSETS entries and fall back to configured CLI toolsets when the override contains no valid toolsets.

* fix(tui): avoid config load for builtin toolsets

Honor built-in HERMES_TUI_TOOLSETS values before loading config and treat all/* as the all-toolsets sentinel.

* fix(cli): honor toolsets in oneshot mode

Forward top-level --toolsets into oneshot agent construction so the flag is not silently ignored outside the TUI path.

* fix(cli): validate oneshot toolsets

Reject invalid-only oneshot toolset overrides before output redirection and clarify TUI fallback warnings.

* fix(cli): preserve all-toolsets sentinel

Map explicit all/* oneshot toolset overrides to the all-toolsets sentinel and replace locals() checks in TUI toolset loading.

* fix(cli): warn on extra all-toolset entries

Warn when all/* toolset overrides include additional ignored entries so typos are still visible.

* fix(tui): honor plugin toolset overrides

Discover plugin toolsets before rejecting unresolved explicit toolset overrides and read raw config for MCP name validation.

* fix(tui): reuse toolset argument normalizer

Share top-level TUI toolset argument parsing with the oneshot path to avoid duplicate normalization logic.

* fix(cli): reject disabled mcp toolsets

Validate explicit toolset overrides against enabled MCP servers only and clarify top-level toolset flag help.

* fix(cli): distinguish disabled mcp from unknown toolsets

Report disabled MCP servers separately from unknown toolset entries and stub plugin discovery in invalid-name tests for determinism.
2026-04-29 16:55:27 -07:00
brooklyn!
d9bf093728
Merge pull request #17638 from NousResearch/bb/tui-details-persist
fix(tui): persist global details mode sections
2026-04-29 15:15:37 -07:00
brooklyn!
2a9a5fffa5
Merge pull request #17625 from NousResearch/bb/tui-reasoning-hide
fix(tui): hide reasoning panels immediately
2026-04-29 14:49:20 -07:00
Brooklyn Nicholson
c2cb6d1071 fix(tui): persist global details mode sections
Pin all detail sections when /details sets a global mode so config sync does not restore built-in section defaults.
2026-04-29 16:46:42 -05:00
hejuntt1014
528e7dc176 fix(cli): exclude profiles/ from profile create --clone-all
shutil.copytree from default ~/.hermes duplicated ~/.hermes/profiles into
the new profile, causing nested profiles/.../profiles/... and huge disk use.
Match export behavior (_DEFAULT_EXPORT_EXCLUDE_ROOT) by ignoring the sibling
profiles tree at the source root.

Made-with: Cursor
2026-04-29 14:21:35 -07:00
Brooklyn Nicholson
d8afafd22b fix(tui): hide reasoning panels immediately
Make /reasoning hide update the thinking section visibility so existing and live reasoning blocks disappear without waiting for config sync.
2026-04-29 15:23:14 -05:00
Teknium
9be3ab1a5b
fix(plugins): stop firing pre_tool_call hook twice per tool execution (#17611)
The skip_pre_tool_call_hook flag was added to prevent double-firing of
pre_tool_call when run_agent._invoke_tool pre-checks for a block
directive and then dispatches via handle_function_call. But the
implementation added an else: branch that fired invoke_hook again for
'observers', without noticing that get_pre_tool_call_block_message() in
hermes_cli.plugins already fires invoke_hook('pre_tool_call', ...) as
part of its block-directive poll.

Result: every tool call ran through the run_agent loop fired the hook
twice — reported by community users whose observer / audit plugins
logged each tool invocation twice with identical timestamps.

Fix: delete the else: branch. The single-fire contract is now:
  - skip=False (direct handle_function_call): hook fires once inside
    get_pre_tool_call_block_message().
  - skip=True (run_agent._invoke_tool path): caller fires the hook
    once via get_pre_tool_call_block_message(); handle_function_call
    must not fire it again.

Tightened the existing skip-flag test (renamed to
test_skip_flag_prevents_double_fire) to assert pre_tool_call fires
zero times when skip=True, and added
test_run_agent_pattern_fires_pre_tool_call_exactly_once to lock in
end-to-end that the full block-check + dispatch sequence fires the
hook exactly once.
2026-04-29 12:43:39 -07:00
Nanako0129
2e991770fc fix(gemini): pass base_url into chat transport 2026-04-29 12:10:40 -07:00
Nanako0129
c5a5e586d7 fix(gemini): nest OpenAI-compat thinking config under google 2026-04-29 12:10:40 -07:00
Teknium
c61b2e0af7
feat(skills): refuse skill_manage writes on pinned skills (#17562)
Extend curator's pin flag from 'skip auto-transitions' to 'no agent
edits at all'. All five skill_manage mutation actions (edit, patch,
delete, write_file, remove_file) now refuse pinned skills with a
message pointing the user at `hermes curator unpin <name>`.

Motivation: pin used to only stop the curator's own maintenance pass
from touching a skill. Nothing prevented the main agent from editing
or deleting a pinned skill via skill_manage in-session. This gives
users a hard fence against unwanted agent edits — same semantics as
curator pinning, extended to the write tool.

Create is unaffected (you can't pin a name that doesn't exist yet,
and name collisions already error out). Broken sidecars fail open
rather than lock the agent out.

The schema description advertises the new refusal so models know
not to route around it with rename/recreate tricks.
2026-04-29 10:28:25 -07:00
Adam Manning
f3aa989b1b test(cli): cover minimax-oauth resolution, refresh, menu wiring
Add and extend tests for the minimax-oauth provider across three test
modules.

New file: tests/test_minimax_oauth.py (15 tests)
  - test_pkce_pair_produces_valid_s256: verifies PKCE verifier/challenge
    pair produces a valid S256 hash and correct lengths
  - test_request_user_code_happy_path: mocks httpx, verifies correct
    POST parameters and response parsing
  - test_request_user_code_state_mismatch_raises: verifies CSRF guard
  - test_request_user_code_non_200_raises: verifies HTTP error handling
  - test_poll_token_pending_then_success: verifies polling loop retries
    on 'pending' and returns on 'success'
  - test_poll_token_error_raises: verifies 'error' status raises AuthError
  - test_poll_token_timeout_raises: verifies deadline expiry raises
  - test_refresh_skip_when_not_expired: verifies no HTTP call when token
    is fresh
  - test_refresh_updates_access_token: verifies new access/refresh tokens
    stored on successful refresh
  - test_refresh_reuse_triggers_relogin_required: verifies
    relogin_required=True on invalid_grant/refresh_token_reused
  - test_resolve_credentials_requires_login: verifies AuthError when no
    stored state
  - test_provider_registry_contains_minimax_oauth: PROVIDER_REGISTRY key
  - test_minimax_oauth_alias_resolves: portal/global/underscore aliases
  - test_get_minimax_oauth_auth_status_not_logged_in
  - test_get_minimax_oauth_auth_status_logged_in

Extended: tests/hermes_cli/test_runtime_provider_resolution.py
  - test_minimax_oauth_runtime_returns_anthropic_messages_mode
  - test_minimax_oauth_runtime_uses_inference_base_url

Extended: tests/hermes_cli/test_api_key_providers.py
  - TestMinimaxOAuthProvider class (8 tests) covering registry keys,
    auth_type, endpoints, client_id, aliases, CANONICAL_PROVIDERS
    listing, _PROVIDER_MODELS entries, and aux model
2026-04-29 09:53:42 -07:00
beenherebefore
e0c0167428 fix(cron): use last_run_at as croniter base for cron jobs
compute_next_run() ignored the last_run_at parameter for cron-type
schedules, always computing from _hermes_now() instead. This was
inconsistent with interval jobs which DO use last_run_at as the anchor.

After a crash or restart, cron jobs would compute next_run_at from
the arbitrary restart time rather than the actual last execution time.
While the stale detection in get_due_jobs() catches most cases, using
last_run_at as the croniter base eliminates edge cases and makes the
behavior consistent across schedule types.

Salvaged from #9014 (authored by @beenherebefore) onto current main.
The original PR branch was 2+ weeks stale and would have reverted
substantial unrelated work (jobs_file_lock, workdir/context_from/
enabled_toolsets, issue #16265 state=error recovery). Kept just the
7-line substantive fix and the regression test.
2026-04-29 08:24:48 -07:00
yeyitech
ec27f0a3fa fix(cron): fall back gracefully when HERMES_CRON_TIMEOUT is invalid
Bare `float(os.getenv("HERMES_CRON_TIMEOUT", 600))` in `run_job()` raises
a `ValueError` when the env var is set to a non-numeric string (e.g. "abc").
Replace it with the same defensive try/except pattern already used by
`_get_script_timeout()` for `HERMES_CRON_SCRIPT_TIMEOUT`: log a warning
and fall back to the 600 s default instead of crashing.

Also update the existing env-var tests to exercise the new code path and
add two new tests — one for an invalid value, one for an empty string.

Fixes #11319

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-29 08:21:04 -07:00
Teknium
8c8fc6c1ec
fix(skills): let skill_manage patch/edit/delete skills in external_dirs in place (#17512)
Closes #4759, closes #4381.

Mutating actions (patch, edit, write_file, remove_file, delete) used to
refuse skills that lived under `skills.external_dirs` with 'Skill X is in
an external directory and cannot be modified. Copy it to your local skills
directory first.'  Faced with that error, the agent would fall back to
action='create', which always writes under ~/.hermes/skills/ — producing
a silent duplicate of the external skill in the local store.

Fix: drop the read-only gate.  `skills.external_dirs` is configured by the
user; if they pointed it at a directory, they already said 'these are my
skills, treat them the same.'  Filesystem permissions handle the genuine
read-only case (write fails, agent sees the error).

- New _containing_skills_root() resolves whichever dir actually contains
  the skill; _delete_skill uses it to bound empty-category cleanup so an
  external root is never rmdir'd.
- _create_skill behavior is unchanged: new skills still land in local
  SKILLS_DIR only.  Fewer moving parts.
- Seven new TestExternalSkillMutations tests covering patch/edit/write_file/
  remove_file/delete/create against a mocked two-root layout + a category
  rmdir-safety check.
2026-04-29 08:16:52 -07:00
Teknium
e120cd5941
fix(model_switch): dedup /model picker rows when custom provider endpoint matches a built-in (#16970) (#17511)
When a user authenticates a built-in provider via env var (e.g. DASHSCOPE_API_KEY
triggers the built-in 'alibaba' row) AND defines a custom_providers entry
pointing at the same endpoint, the picker previously emitted two rows for one
endpoint. The built-in row already carries the canonical slug, curated model
list, and correct auth wiring, so the shadow custom entry is redundant.

Adds a _builtin_endpoints set populated as sections 1/2/2b emit rows. Each
entry is the provider's effective base URL (env override via base_url_env_var
wins over the static inference_base_url, so DASHSCOPE_BASE_URL-overridden
endpoints dedup correctly). Section 4 skips any grouped custom entry whose
base_url matches.

Intentionally does NOT repurpose model_catalog.enabled as a 'hide built-ins'
flag. That config controls the remote curated-manifest fetch (documented on
the model-catalog reference page) and overloading it would silently change
behavior for users who disable it for network/privacy reasons.

Three new tests:
- shadow dedup fires when endpoint matches static inference_base_url
- dedup does NOT hide custom entries on genuinely distinct endpoints
- dedup honors the base_url_env_var override path
2026-04-29 08:11:05 -07:00
teknium1
fa3338c171 test(anthropic): regression guard for DeepSeek /anthropic thinking replay
Covers the #16748 fix:
- unsigned thinking blocks synthesised from reasoning_content survive replay
- non-latest assistant turns keep their thinking (DeepSeek validates every turn)
- signed Anthropic blocks are stripped (DeepSeek can't validate them)
- cache_control is stripped from thinking blocks
- OpenAI-compat base (api.deepseek.com without /anthropic) is NOT matched
- non-DeepSeek third parties (minimax) keep the generic strip-all behaviour
2026-04-29 08:10:29 -07:00
刘昊
60c6b07128 fix(cron): keep SOUL.md identity when workdir is unset 2026-04-29 08:10:25 -07:00
teknium1
0a5ee01e48 fix(hindsight): route flush-on-switch through writer queue, not raw thread
Follow-up to the cherry-picked PR #17447. The original flush spawned a
bare threading.Thread for the buffer-flush path, overwriting
self._sync_thread — which is aliased to the long-lived writer thread.
Two consequences:

1. No serialization with the writer queue. If old-session retains were
   still queued in _retain_queue, the flush ran concurrently with the
   writer and both threads could call aretain_batch against the same
   document_id.
2. The pre-spawn 'self._sync_thread.join(timeout=5.0)' tried to join the
   long-lived writer, which never exits, so the join was a no-op that
   just timed out — never actually serialized anything.

Fix: enqueue the flush closure on _retain_queue via _ensure_writer +
put(). Natural FIFO ordering behind any pending retains, no new thread,
no broken join. Shutdown-aware so it doesn't enqueue after teardown.

Tests updated to drain via _retain_queue.join() instead of the stale
_sync_thread.join(). Added regression guard
test_flush_serializes_behind_pending_retains_via_writer_queue that
blocks the writer mid-retain to prove the flush waits in FIFO behind
the old retain.

Also seeds _retain_queue / _shutting_down / stubbed _ensure_writer on
the bare-object test helper in test_memory_session_switch.py so that
path doesn't blow up under the new queue-enqueue.

tests/plugins/memory/test_hindsight_provider.py + tests/agent/test_memory_session_switch.py: 103/103 passing.
2026-04-29 08:09:03 -07:00
Nicolò Boschi
c38dac742b fix(hindsight): flush buffered turns and drop stale prefetch on session switch
Two data-loss / leak gaps in HindsightMemoryProvider.on_session_switch
introduced by #17409.

1. Buffered turns silently lost when retain_every_n_turns > 1.
   on_session_switch unconditionally cleared _session_turns without
   flushing. Users who batched every N>1 turns and switched mid-batch
   (/reset, /new, /resume, /branch, or context compression) had those
   buffered turns disappear. Same data-loss class as the shutdown race,
   different lifecycle event.

   Note commit_memory_session() -> on_session_end() runs *before*
   on_session_switch on /reset, but Hindsight doesn't implement
   on_session_end so the buffer survives that step and dies at clear
   time. /resume, /branch, and compression skip commit_memory_session
   entirely so an on_session_end impl wouldn't help them anyway.

   Fix: snapshot the old _session_id, _document_id, _parent_session_id,
   _turn_index, and _session_turns; spawn one final retain that lands
   under the OLD document_id; then rotate state. Metadata is built
   synchronously against the old self._* so session_id / lineage tags
   on the flushed item all reference the prior session consistently.

2. Stale _prefetch_result leaks across switch.
   If queue_prefetch ran in the old session and the result hadn't been
   consumed by prefetch() yet, on_session_switch left the cached recall
   text in place. The next session's first prefetch() call would return
   text mined from the prior session's bank/query.

   Fix: join any in-flight _prefetch_thread (3s bounded — matches
   shutdown()), then clear _prefetch_result under _prefetch_lock before
   rotating session_id.

Tests
-----
- tests/plugins/memory/test_hindsight_provider.py (TestSessionSwitchBufferFlush):
    - buffered turns flushed under OLD document_id with OLD lineage tags
    - empty buffer => no spurious retain
    - _prefetch_result cleared on switch
    - in-flight prefetch thread is awaited before clear (no race)
- tests/agent/test_memory_session_switch.py: factory extended to seed the
  attrs the new flush path reads (_retain_source, _platform, _bank_id,
  prefetch state, etc.) and stub _run_hindsight_operation so existing
  switch-state assertions keep passing without network setup.
2026-04-29 08:09:03 -07:00
Teknium
1bedc836b5
docs(onboarding): lead OpenClaw residue banner with migrate, warn that cleanup breaks OpenClaw (#17507)
The ~/.openclaw/ detection banner (#16327) had two problems flagged in #16629:

1. It only pitched 'hermes claw cleanup' (destructive archive) and never
   mentioned 'hermes claw migrate' — the actual non-destructive path that
   ports config/memory/skills into Hermes.
2. The copy anthropomorphized the bug ('the agent can still get confused',
   'dutifully reads') and framed OpenClaw as a competitor to eliminate
   ('instead of Hermes's').

Rewrite so migrate leads, cleanup is a clearly-labelled follow-up with a
warning that archiving breaks OpenClaw for users still running it.

Closes #16629
2026-04-29 08:08:36 -07:00
briandevans
e0a03f3f40 fix(api-server): collapse tool start/lifecycle into a single SSE event
Address Copilot review on PR #16666:

1. **Duplicate event on every tool start** — both ``tool_progress_callback``
   and ``tool_start_callback`` fire side-by-side in ``run_agent.py``, so
   wiring both into chat completions emitted *two* ``hermes.tool.progress``
   events per real tool call. Drop the legacy ``_on_tool_progress`` emit
   entirely; ``_on_tool_start`` now produces a single unified event that
   carries the legacy ``tool``/``emoji``/``label`` fields plus the new
   ``toolCallId``/``status`` correlation fields. Label is computed inline
   via ``build_tool_preview`` so callers do not need to pre-format it.

2. **Weak per-event correlation in the regression test** — the previous
   assertion checked that a ``toolCallId`` appeared *somewhere* in the
   aggregate, which would have passed even if ``running`` lacked the id.
   Collect ``(status, toolCallId)`` per event and assert each event
   carries the correct pair, plus exactly two events on the wire (no
   silent duplication regression).

The two existing chat-completions tool-progress tests are updated to fire
``tool_start_callback`` instead of ``tool_progress_callback``, matching
production reality where ``run_agent`` always pairs them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 08:08:16 -07:00
kshitijk4poor
13c238327e fix: address self-review findings for Vercel Sandbox salvage
- Add vercel_sandbox to hardline blocklist container bypass test
- Add vercel_sandbox to skills_tool remote backend parametrize test
- Deduplicate runtime set: doctor.py and setup.py now import
  _SUPPORTED_VERCEL_RUNTIMES from terminal_tool.py
- Add docstring to _run_bash explaining timeout/stdin_data discards
- Always stop sandbox during cleanup (unconditional, matching Modal/Daytona)
- Update security.md: container bypass text, production tip, comparison table
- Update environment-variables.md: TERMINAL_ENV list, Vercel auth vars,
  TERMINAL_VERCEL_RUNTIME
- Update inline comments in cli.py and config.py to include vercel_sandbox
2026-04-29 07:22:33 -07:00
Scott Trinh
5a1d4f6804 feat: add Vercel Sandbox backend
Adds Vercel Sandbox as a supported Hermes terminal backend alongside
existing providers (Local, Docker, Modal, SSH, Daytona, Singularity).

Uses the Vercel Python SDK to create/manage cloud microVMs, supports
snapshot-based filesystem persistence keyed by task_id, and integrates
with the existing BaseEnvironment shell contract and FileSyncManager
for credential/skill syncing.

Based on #17127 by @scotttrinh, cherry-picked onto current main.
2026-04-29 07:22:33 -07:00
Magaav
810d98e892 feat(api_server): expose run status for external UIs (#17085)
Adds two API server endpoints for external UIs and orchestrators:

- GET /v1/capabilities — machine-readable feature discovery so clients
  can detect which Runs API / SSE / auth features this Hermes version
  supports before depending on them.
- GET /v1/runs/{run_id} — pollable run status so dashboards can check
  queued/running/completed/failed/cancelled/stopping state without
  holding an SSE connection open.

Also moves request validation ahead of run allocation so invalid
payloads no longer leave orphaned entries in _run_streams waiting for
the TTL sweep.

task_id is intentionally kept as "default" for the Runs API to
preserve the shared-sandbox model used by CLI, gateway, and the
existing _run_agent_with_callbacks path. session_id is surfaced in
run status for external-UI correlation only.

Salvage of PR #17085 by @Magaav.
2026-04-29 06:38:10 -07:00
Teknium
83c288da01
fix(anthropic): broaden Kimi thinking-suppression to custom endpoints (#17455)
The guard that drops Anthropic's `thinking` kwarg for Kimi endpoints was
matched on `https://api.kimi.com/coding` only.  Users configuring a
custom Kimi-compatible gateway (or an official Moonshot host) with
`api_mode: anthropic_messages` fall through to the generic third-party
path, which strips thinking blocks AND still sends
`thinking={enabled,...}` → upstream rejects with HTTP 400
"reasoning_content is missing in assistant tool call message at index N"
on the next request after a tool call.

Replace `_is_kimi_coding_endpoint` callers (history replay + thinking
kwarg gate) with `_is_kimi_family_endpoint(base_url, model)` that also
matches the `api.kimi.com` / `moonshot.ai` / `moonshot.cn` hosts and
Kimi/Moonshot family model names (`kimi-`, `moonshot-`, `k1.`, `k2.`,
…) for custom / proxied endpoints.  Keeps the UA-header check in
`build_anthropic_client` URL-only — the `claude-code/0.1.0` header is
an official-Kimi contract.

Plumbs optional `model` through `convert_messages_to_anthropic` so
the unsigned reasoning_content→thinking block synthesised for Kimi's
history validation survives the third-party signature-stripping pass
on custom hosts too.

Closes #17057.
2026-04-29 06:35:42 -07:00
Teknium
398945e7b1
fix(cron): accept list-form deliver values so deliver=['telegram'] works (#17456)
The cron schema contracts deliver as a string ("local", "origin",
"telegram", "telegram:chat_id[:thread_id]", or comma-separated combos),
but MCP clients and scripts sometimes pass an array like ['telegram'].

Before this change, the list was written to jobs.json verbatim, and
the scheduler's str(deliver).split(',') then tried to resolve the
literal string "['telegram']" as a platform — returning None and
logging 'no delivery target resolved for deliver=[\'telegram\']'.

Fix on both ends:
- tools/cronjob_tools.py: normalize deliver at the API boundary on
  create and update, so storage is always a string.
- cron/scheduler.py: normalize deliver in _resolve_delivery_targets,
  so existing jobs.json entries with list-form deliver are handled
  gracefully without requiring users to edit the file.

Closes #17139
2026-04-29 06:35:34 -07:00
Nicolò Boschi
0565497dcc fix(hindsight): drain retain queue cleanly on shutdown
The plugin used to spawn one daemon thread per sync_turn() to do the
aretain_batch network write. On CLI exit, that pattern raced interpreter
shutdown — the last retain could reach aiohttp after asyncio's
"cannot schedule new futures" guard had fired, producing noisy logs and
silently losing the final unsaved turn:

    WARNING ... Hindsight sync failed: cannot schedule new futures after
            interpreter shutdown
    ERROR asyncio: Unclosed client session
            client_session: <aiohttp.client.ClientSession object at 0x...>

Switch to a single-writer model: each provider owns one long-lived
writer thread plus a queue. sync_turn() snapshots state and enqueues a
job; the writer drains sequentially. Once shutdown() is called:

  - new sync_turn() / queue_prefetch() calls are dropped, not enqueued
  - a sentinel wakes the writer so it finishes in-flight work
  - shutdown joins the writer (10s) before nulling the client

Also register an idempotent atexit hook from the first sync_turn(), so
exit paths that don't go through MemoryManager.shutdown_all() (Ctrl-C,
abrupt exit) still get a chance to drain.

Tests: keep _sync_thread as a legacy alias to the writer, swap join()
calls to _retain_queue.join() (canonical wait-for-drain), add a new
TestShutdownRace suite covering single-writer reuse, post-shutdown drop,
queue draining, and shutdown idempotency.
2026-04-29 06:34:24 -07:00
briandevans
835f9adec0 fix(update,test): clarify wmic comment; switch tests to monkeypatch sys.platform
Two fix-ups for #17123:

1. Reword the inline comment in `_warn_stale_dashboard_processes` to
   accurately describe the failure mode (locale-dependent decoder, not a
   "default UTF-8 decoder") and identify `errors="ignore"` as the
   load-bearing protection. Per Copilot's review.

2. Switch `TestWindowsWmicEncoding` from `patch("hermes_cli.main.sys")`
   to `monkeypatch.setattr(sys, "platform", "win32")` — the codebase's
   canonical pattern (e.g. `tests/hermes_cli/test_auth_ssl_macos.py`).
   The MagicMock-replacement approach passed locally on Python 3.12 but
   the platform-equality check failed under CI's xdist+Python 3.11,
   leaving both new tests red despite the fix being present.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:34:13 -07:00
briandevans
b85fff9495 fix(update): protect dashboard wmic scan against UnicodeDecodeError on Windows non-UTF-8 locales (#17049)
`hermes update` calls `_warn_stale_dashboard_processes()` to warn about
dashboard processes still running the pre-update Python backend. On
Windows, that scan shells out to `wmic process get ProcessId,CommandLine
/FORMAT:LIST` with `text=True` and no explicit encoding.

`wmic` emits text in the system code page (e.g. cp936 on zh-CN locales),
not UTF-8. Without an explicit `encoding=`, Python's default UTF-8
decoder crashes the subprocess reader thread with
`UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd0 ...`. In
Python 3.11 that crash is silently absorbed: `subprocess.run()` returns
a `CompletedProcess` with `result.stdout = None`, the next line calls
`result.stdout.split("\n")`, and `hermes update` aborts with the
exact `AttributeError: 'NoneType' object has no attribute 'split'`
trace reported in #17049.

Fix: pass `encoding="utf-8", errors="ignore"` so undecodable bytes
cannot take down the reader thread (the parsing only matches the ASCII
prefixes `CommandLine=` and `ProcessId=`, so dropping non-UTF-8 bytes
is safe), and short-circuit when `result.stdout is None` as a defensive
guard for environments where the reader thread still fails for other
reasons.

This is the same root cause as #17074 (which patches
`hermes_cli/gateway._scan_gateway_pids` for the `hermes setup` path).
That PR does not touch `_warn_stale_dashboard_processes`, so
`hermes update` remains broken on the same locales until this lands.

Regression test in `tests/hermes_cli/test_update_stale_dashboard.py`:
- `test_wmic_invoked_with_utf8_ignore_errors` asserts the explicit
  encoding/errors kwargs reach `subprocess.run`.
- `test_wmic_returns_none_stdout_does_not_crash` simulates the
  reader-thread-crashed `result.stdout=None` aftermath and asserts the
  function returns silently instead of raising AttributeError.

Both new tests fail against clean origin/main (7d4648461) reproducing
the original AttributeError; both pass with this patch. The remaining
3 failures in `tests/hermes_cli/test_cmd_update.py` and
`test_update_autostash.py` are pre-existing baselines on origin/main —
they reproduce identically without this change and are unrelated to
the wmic scan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:34:13 -07:00
teknium1
9e63062b6c fix(stt): resolve API keys from ~/.hermes/.env via get_env_value (#17140)
Widen #17163 to the sibling file tools/transcription_tools.py, which had
the same class of bug. STT provider call sites and the _get_provider
selection gate called os.getenv(...) directly and missed keys that only
lived in ~/.hermes/.env.

Same pattern as tts_tool.py: one guarded top-level import of
get_env_value (falls back to os.getenv on ImportError), then every
API-key and paired-base-URL lookup swapped over.

Call sites migrated:
- _transcribe_groq    — GROQ_API_KEY
- _transcribe_mistral — MISTRAL_API_KEY
- _transcribe_xai     — XAI_API_KEY, XAI_STT_BASE_URL
- _get_provider       — GROQ/MISTRAL/XAI_API_KEY in explicit + auto branches

Module-level defaults (DEFAULT_STT_MODEL, GROQ_BASE_URL, etc.) stay on
os.getenv — they're import-time constants, not runtime config, and the
dotenv fallback would add no value there.

New regression tests in tests/tools/test_transcription_dotenv_fallback.py
(8 cases) mirror briandevans' TTS tests: per-provider dotenv-key
forwarding, selection-gate dotenv visibility, and an end-to-end probe
that patches hermes_cli.config.load_env to simulate ~/.hermes/.env
carrying the key while os.environ does not.
2026-04-29 06:25:20 -07:00
briandevans
33967b4e52 fix(tts): tolerate missing hermes_cli.config in tts_tool import
Wrap the new top-level `from hermes_cli.config import get_env_value`
in try/except ImportError and fall back to a thin os.getenv shim, so
importing tools.tts_tool keeps working in environments where
hermes_cli.config is unavailable. This matches the existing tolerance
in `_load_tts_config()` (tools/tts_tool.py) and the same
import-fallback pattern in tools/tool_backend_helpers.py::fal_key_is_configured.

Also update the TestDotenvFallbackPerProvider docstring to accurately
describe the mocking strategy: per-provider tests patch
`tools.tts_tool.get_env_value` directly, while the regression-guard
tests cover the lower-level `hermes_cli.config.load_env` integration.

Addresses Copilot review on #17163.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:25:20 -07:00
briandevans
40d25e125b fix(tts): resolve API keys from ~/.hermes/.env via get_env_value (#17140)
TTS provider tools (elevenlabs, xai, minimax, mistral, gemini) called
os.getenv("X_API_KEY") directly, which bypassed Hermes's dotenv bridge in
hermes_cli.config. Users who keep their TTS keys only in ~/.hermes/.env saw
"X_API_KEY not set" errors even though the rest of the stack
(agent/credential_pool, hermes_cli/auth) already resolves keys through
get_env_value() — same class of bug as #15914 fixed for those modules.

Switch every TTS env-var lookup (API keys, base URLs, and
check_tts_requirements gates) to get_env_value, which checks os.environ
first and then ~/.hermes/.env. Behaviour for users with keys exported in
the shell is unchanged; users with dotenv-only keys now succeed. The two
diagnostics prints in __main__ are migrated for consistency.

Regression test (tests/tools/test_tts_dotenv_fallback.py):
  - per-provider: each backend reads the dotenv key when only
    ~/.hermes/.env carries it (5 providers).
  - end-to-end: with hermes_cli.config.load_env returning the key and
    os.environ empty, _generate_minimax_tts and check_tts_requirements
    both succeed; reverting tools/tts_tool.py back to os.getenv makes all
    7 tests fail with "MINIMAX_API_KEY not set" / similar.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-29 06:25:20 -07:00
Teknium
ff687c019e
fix(aux): skip kimi-coding in vision auto-detect (closes #17076) (#17451)
* docs(anthropic): correct OAuth scope to Max plan + extra usage credits only

The previous docs pass (#17399) overstated what Anthropic OAuth works
with. In practice Hermes can only route against a Claude Max plan that
has purchased extra usage credits — the base Max allowance is not
consumed, and Claude Pro is not supported at all. Without Max + extra
credits, users must fall back to an ANTHROPIC_API_KEY (pay-per-token).

Updates the four pages touched in #17399:
- integrations/providers.md
- user-guide/features/credential-pools.md
- reference/environment-variables.md
- getting-started/quickstart.md

* fix(aux): skip kimi-coding in vision auto-detect (closes #17076)

Kimi Coding Plan's /coding endpoint (Anthropic Messages wire) has no
image_in capability — Kimi's own docs confirm and suggest switching to
a vision-capable model. Vision lives on the separate Kimi Platform
(api.moonshot.ai, OpenAI-wire, pay-as-you-go). When the user has
kimi-coding as main provider and auxiliary.vision.provider=auto,
resolve_vision_provider_client was handing back an AnthropicAuxiliaryClient
wrapped around /coding which 404'd on every vision request.

Add a _PROVIDERS_WITHOUT_VISION frozenset ({kimi-coding, kimi-coding-cn})
and gate the main-provider vision branch on membership. On a skip the
auto-detect falls through to OpenRouter → Nous like any other
main-provider-unavailable case.

Explicit per-task overrides (auxiliary.vision.provider=kimi-coding) are
unaffected — the skip only applies when the caller is in auto mode.

Tests: 4 new targeted tests in TestVisionAutoSkipsKimiCoding covering
the skip path, CN variant, explicit-override passthrough, and a guard
against accidental skip-list widening.
2026-04-29 06:10:23 -07:00
teknium1
258755a24f test(weixin): cover _is_stale_session_ret helper (#17228)
Regression test for the ret=-2 / errmsg='unknown error' disambiguation:
- ret=-2 or errcode=-2 with 'unknown error' → stale session (True)
- ret=-2 with 'freq limit' or other errmsg → rate limit (False)
- ret=-14 → not matched here (handled by SESSION_EXPIRED_ERRCODE path)
- Success codes and missing errmsg → False
2026-04-29 05:44:44 -07:00
teknium1
b0435cc164 fix(model_tools): cancel coroutine on timeout so worker thread exits + log full traceback
_run_async() bridges sync tool handlers to async code. When the handler
is invoked from inside a running event loop (gateway / nested async),
it spawns a worker thread and blocks on future.result(timeout=300).

Before this change, a coroutine that ran past 300s leaked its worker
thread:

  - future.cancel() is a no-op on a running ThreadPoolExecutor future
    (cancel only works on not-yet-started work).
  - pool.shutdown(wait=False, cancel_futures=True) let the caller
    proceed but the worker kept running the coroutine until it
    returned on its own.

Every tool timeout leaked one thread. In long-lived gateway / RL
sessions this is cumulative.

The fix replaces bare asyncio.run() with a worker wrapper that
creates its own event loop. On timeout, _run_async schedules
task.cancel() on that loop via call_soon_threadsafe, then shuts the
pool down with wait=False so the caller returns immediately. The
coroutine observes CancelledError at its next await and the worker
thread exits cleanly.

Also switches logger.error() to logger.exception() in the top-level
handle_function_call() except block so tool failures produce full
stack traces in errors.log instead of just the message.

Related: #17420 (contributor flagged the leak; the original fix used
pool.shutdown(wait=True) which would have converted the leak into a
hang — caller blocks forever on the same stuck coroutine). Credit
for identifying the leak goes to the contributor.

Co-authored-by: 0z! <162235745+0z1-ghb@users.noreply.github.com>
2026-04-29 05:00:40 -07:00
tmimmanuel
3606414ec7 fix(gateway): isolate platform connect failures with per-platform timeout
Wrap each adapter.connect() in asyncio.wait_for() so one platform hanging
during startup or reconnect cannot block the others. Telegram's 8-retry
connect loop (~140s worst case) previously prevented Feishu from ever
starting when Telegram was network-restricted — common for users in
regions where Telegram is blocked.

Default timeout is 30s; override via HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT
(0 disables). Applied to both startup and the reconnect watcher so a
platform that hangs mid-retry also does not stall retries for others.

Fixes #17242
2026-04-29 05:00:37 -07:00
Teknium
20b759cd02
fix(process): reconcile session.exited against real child exit in poll/wait (#17430)
When a background terminal process spawns a descendant daemon that
inherits the stdout pipe (e.g. 'hermes update' triggering a gateway
systemctl restart), the reader thread's stdout.read() never returns EOF
and its finally: block never runs. session.exited stays False forever,
so process(action='poll') returns 'running' indefinitely even though
the direct child exited long ago.

Issue #17327: Feishu user polled 74 times over 7 minutes before killing
the gateway manually.

Fix: add _reconcile_local_exit() that checks the direct Popen.poll()
before trusting session.exited. If the direct child has exited, drain
any immediately-readable bytes non-blocking and flip session.exited.
Called from poll() and wait(). The stuck reader thread remains blocked
but is a daemon thread and gets reaped with the process.

Safe no-op for env/PTY sessions, already-exited sessions, and live
children (returns None from Popen.poll()).
2026-04-29 04:59:21 -07:00
Teknium
13683c0842
feat(memory): notify providers on mid-process session_id rotation (#17409)
Fixes #6672

Memory providers now receive on_session_switch() whenever AIAgent.session_id
rotates mid-process — /resume, /branch, /reset, /new, and context
compression. Before this, providers that cached per-session state in
initialize() (Hindsight's _session_id, _document_id, accumulated
_session_turns, _turn_counter) kept writing into the old session's
record after the agent had moved on.

MemoryProvider ABC
------------------
- New optional hook on_session_switch(new_session_id, *,
  parent_session_id='', reset=False, **kwargs) with no-op default for
  backward compat. reset=True signals /reset or /new — providers should
  flush accumulated per-session buffers. reset=False for /resume,
  /branch, compression where the logical conversation continues.

MemoryManager
-------------
- on_session_switch() fans the hook out to every registered provider.
  Isolated try/except per provider — one bad provider can't block others.
- Empty/None new_session_id is a no-op to avoid corrupting provider state
  during shutdown paths.

run_agent.py
------------
- _sync_external_memory_for_turn now passes session_id=self.session_id
  into sync_all() and queue_prefetch_all(). Providers with defensive
  session_id updates in sync_turn (Hindsight already had this at
  plugins/memory/hindsight/__init__.py:1199) now actually receive the
  current id.
- Compression block at ~L8884 already notified the context engine of
  the rollover; now also calls
  _memory_manager.on_session_switch(reason='compression').

cli.py
------
- new_session() fires reset=True, reason='new_session' so providers
  flush buffers.
- _handle_resume_command fires reset=False, reason='resume' with the
  previous session as parent_session_id.
- _handle_branch_command fires reset=False, reason='branch' with the
  parent session_id already captured for the DB parent link.

gateway/run.py
--------------
- _handle_resume_command now evicts the cached AIAgent, mirroring
  /branch and /reset. The next message rebuilds a fresh agent whose
  memory provider initialize() runs with the correct session_id —
  matches the pattern the gateway already uses for provider state
  cross-session transitions.

Hindsight reference implementation
----------------------------------
- plugins/memory/hindsight/__init__.py adds on_session_switch that:
  updates _session_id, mints a fresh _document_id (prevents
  vectorize-io/hindsight#1303 overwrite), and clears _session_turns /
  _turn_counter / _turn_index so in-flight batches don't flush under
  the new document id. parent_session_id only overwritten when provided
  (avoids clobbering on a bare switch).

Tests
-----
- tests/agent/test_memory_session_switch.py: new dedicated file. ABC
  default no-op, manager fan-out, failure isolation, empty-id no-op,
  session_id propagation through sync_all/queue_prefetch_all, Hindsight
  state transitions for every reset/non-reset case, parent preservation.
- tests/cli/test_branch_command.py: new test verifying /branch fires
  the hook with correct parent_session_id + reset=False + reason.
- tests/gateway/test_resume_command.py: new test verifying /resume
  evicts the cached agent.
- tests/run_agent/test_memory_sync_interrupted.py: updated existing
  assertions to account for the session_id kwarg on sync_all and
  queue_prefetch_all.

E2E verified (real imports, tmp HERMES_HOME):
- /resume: session_id updates, doc_id fresh, buffers cleared, parent set
- /branch: session_id forks, parent links to original
- /new: reset=True clears accumulated state
- compression: reason='compression' propagated, lineage preserved
- Empty id: no-op, state preserved
- Legacy provider without on_session_switch: no crash

Reported by @nicoloboschi (Hindsight maintainer); related scope-widening
comment by @kidonng extending coverage to compression.
2026-04-29 04:57:22 -07:00
Rylen Anil
37d107e03d [verified] fix(gateway): accept user systemd private socket during preflight 2026-04-29 04:57:01 -07:00
Teknium
df0e97a168
fix(minimax): enable Anthropic prompt caching for MiniMax's own models (#17425)
MiniMax's /anthropic endpoint documents cache_control support (0.1x read
pricing, 5-min TTL) for MiniMax-M2.7, M2.5, M2.1, M2. PR #12846 gated
third-party Anthropic-wire caching on 'claude' in model name, which left
MiniMax's own model family re-paying full input tokens every turn.

Opt in explicitly via provider id (minimax / minimax-cn) or host match
(api.minimax.io / api.minimaxi.com). Narrow allowlist mirroring the
existing Qwen/Alibaba branch below; leaves room for a capability-based
surface (ProviderConfig.supports_anthropic_cache) if a third provider
needs it.

Closes #17332
2026-04-29 04:56:55 -07:00
exiao
23f5fc6765 feat(gateway/signal): native formatting, reply quotes, and reactions
Three Signal adapter improvements that depend on the no-edit-mode
plumbing from the previous commit.

1. Native formatting (markdown -> Signal bodyRanges)
   Signal renders markdown as literal characters (**bold**, `code`, #
   heading), which looks broken. Added _markdown_to_signal(text) that
   strips markdown syntax and emits Signal-native bodyRanges as
   start:length:STYLE entries. Offsets are computed in UTF-16 code
   units so non-BMP emoji stay aligned. Supports BOLD, ITALIC, STRIKE,
   MONO, and headings mapped to BOLD. Fenced code and inline code are
   handled; link syntax is unwrapped to visible text + URL.

   Includes edge-case fixes reported previously:
   - Bullet lists ("* item") no longer misidentified as italics
   - URLs containing underscores no longer italicized around the dot

2. Reply-quote context
   Parses dataMessage.quote on inbound messages and populates
   MessageEvent.raw_message with sender + timestamp_ms. This lets the
   gateway's existing [Replying to: "..."] injector (gateway/run.py)
   work on Signal, matching Telegram/Matrix behavior.

3. Processing reactions
   Overrides on_processing_start -> hourglass and on_processing_complete
   -> checkmark via the sendReaction JSON-RPC using targetAuthor and
   targetTimestamp pulled from raw_message. Uses the ProcessingOutcome
   enum introduced in the previous commit.

Also sets SUPPORTS_MESSAGE_EDITING = False on SignalAdapter so the
no-edit streaming path activates.

Tests: 40+ new tests in tests/gateway/test_signal_format.py covering
markdown conversion, UTF-16 offset correctness with non-BMP emoji,
bullet-list and URL false-positive regressions, reply-quote extraction,
and reaction payload shape. Regression extensions to test_signal.py.
2026-04-29 04:38:17 -07:00
Teknium
21676e80cc
Revert "fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path (#16957)" (#17397)
This reverts commit 023f5c74b1.
2026-04-29 03:55:03 -07:00
Ben Barclay
58a6171bfb
Merge pull request #17305 from NousResearch/feat/docker-run-as-host-user
feat(docker): run container as host user to avoid root-owned bind mounts
2026-04-29 16:41:55 +10:00
Teknium
bc0d8a941e
feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307)
Every curator pass now emits a dated report directory under
`~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files:

- `run.json` — machine-readable full record (before/after snapshot,
  state transitions, all tool calls, model/provider, timing, full LLM
  final response untruncated, error if any)
- `REPORT.md` — human-readable markdown: model + duration header,
  auto-transition counts, LLM consolidation stats, archived-this-run
  list, new-skills-this-run list, state transitions, the full LLM
  final summary, and a recovery footer pointing at the archive + the
  `hermes curator restore` command

Reports live under `logs/curator/`, not inside `skills/` — they're
operational telemetry, not user-authored skill data, and belong
alongside `agent.log` / `gateway.log`.

Internals:
- `_run_llm_review()` now returns a dict (final, summary, model,
  provider, tool_calls, error) instead of a bare truncated string so
  the reporter has full fidelity
- Report writer is fully best-effort — any failure logs at DEBUG and
  never breaks the curator itself. Same-second rerun gets a numeric
  suffix so reports can't clobber each other
- Report path stamped into `.curator_state` as `last_report_path`
- `hermes curator status` surfaces a "last report:" line so users
  can immediately open the latest run

Tests (all green):
- 7 new tests in tests/agent/test_curator_reports.py covering: report
  location (logs not skills), both files written, run.json shape and
  diff accuracy, markdown structure, error path still writes, state
  transitions captured, same-second runs get unique dirs
- Existing test_run_review_synchronous_invokes_llm_stub updated to
  stub the new dict-returning _run_llm_review signature

Live E2E: ran a synchronous pass against a 1-skill test collection
with a stubbed LLM; report written correctly, state stamped with
last_report_path, markdown human-readable, run.json machine-parseable.
2026-04-28 23:23:11 -07:00
Ben
5531c0df82 feat(docker): run container as host user to avoid root-owned bind mounts
Add opt-in terminal.docker_run_as_host_user config flag that passes
--user $(id -u):$(id -g) to the Docker backend so files written into
bind-mounted directories (/workspace, /root, docker_volumes entries) are
owned by the host user instead of root.

When enabled on POSIX platforms, also drops SETUID/SETGID caps since the
container no longer needs gosu/su to switch users.  Falls back cleanly on
platforms without os.getuid (e.g. native Windows Docker) with a warning.

Wired through all three config.yaml -> TERMINAL_* env-var bridges:
  - cli.py env_mappings        (CLI + TUI startup)
  - gateway/run.py _terminal_env_map (gateway / messaging platforms)
  - hermes_cli/config.py _config_to_env_sync (`hermes config set`)

Also fixes docker_mount_cwd_to_workspace silently failing in gateway
mode -- it was missing from gateway/run.py's _terminal_env_map.

Adds tests/tools/test_terminal_config_env_sync.py to guard against
future drift between the three bridges (same bug class shipped twice
in one month).

Bundled Hermes image won't work with this flag since its entrypoint
expects to start as root for the usermod/gosu hermes flow; works with
the default nikolaik/python-nodejs image and plain Debian/Ubuntu.
2026-04-29 16:16:43 +10:00
brooklyn!
5e68503d2f
Merge pull request #17190 from NousResearch/bb/tui-cold-start-profiling
perf(tui): cut visible cold start ~57% with lazy agent init
2026-04-28 22:45:14 -07:00
vincez-hms-coder
469e4df3c2 fix(profiles): preserve skills on dashboard profile creation 2026-04-29 01:42:51 -04:00
vincez-hms-coder
ae11a31058 feat(profiles): add profile setup command endpoint and wrapper creation 2026-04-29 01:42:51 -04:00
vincez-hms-coder
1745cfc6d7 fix(dashboard): avoid node-only ui imports in browser 2026-04-29 01:42:50 -04:00
vincez-hms-coder
58c07867e3 fix(dashboard): keep profiles list resilient 2026-04-29 01:39:52 -04:00
vincez-hms-coder
4523965de9 feat(dashboard): add profiles management page
Copy profile dashboard changes onto a fresh branch under the vincez-hms-coder account.

Includes:
- Profiles dashboard route and sidebar entry
- Profile lifecycle REST endpoints
- SOUL.md read/write support
- i18n labels and helper text updates
- Targeted profile API tests

Test plan:
- pytest tests/hermes_cli/test_web_server.py -k profile -q
- cd web && npm run build
2026-04-29 01:39:51 -04:00
teknium1
fa9383d27b feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations
Based on three live test runs against 346 agent-created skills on the
author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator
prompt needed three sharpenings before it consistently produced real
umbrella consolidation instead of passive audit output:

**Umbrella-first framing.** The original 'decide keep/patch/archive/
consolidate' framing lets opus default to 'keep' whenever two skills
aren't byte-identical. The new prompt explicitly tells the reviewer
that pairwise distinctness is the wrong bar — the right question is
'would a human maintainer write this as N separate skills, or one
skill with N labeled subsections?' Expect 10-25 prefix clusters; merge
each into an umbrella via one of three methods.

**Three concrete consolidation methods.** (a) Merge into an existing
umbrella (patch the broadest skill, archive siblings); (b) Create a
new umbrella SKILL.md (skill_manage action=create); (c) Demote
session-specific detail into references/, templates/, or scripts/
under the umbrella via skill_manage action=write_file, then archive
the narrow sibling. This matches the support-file vocabulary the
review-prompt side already uses (PR #17213).

**Two observed bailouts pre-empted:** 'usage counters are zero so I
can't judge' (rule 4: judge on content, not use_count) and 'each has
a distinct trigger' (rule 5: pairwise distinctness is the wrong bar).

**Config-aware parent inheritance.** _run_llm_review() was building
AIAgent() without explicit provider/model, hitting an auto-resolve
path that returned empty credentials → HTTP 400 'No models provided'
against OpenRouter. Fork now inherits the user's main provider and
model (via load_config + resolve_runtime_provider) before spawning —
runs on whatever the user is currently on, OAuth-backed or
pool-backed included.

**Unbounded iteration ceiling.** max_iterations=8 was way too low for
an umbrella-build pass over hundreds of skills. A live pass takes
50-100 API calls (scanning, clustering, skill_view'ing candidates,
patching umbrellas, mv'ing siblings). Raised to 9999 — the natural
stopping criterion is 'no more clusters worth processing', not an
arbitrary tool-call budget.

**Tests updated:** test_curator_review_prompt_has_invariants accepts
DO NOT / MUST NOT and drops 'keep' from the required-verb set (the
umbrella-first prompt correctly deemphasizes 'keep' as a first-class
decision label since passive keep-everything is the failure mode
being prevented). Added test_curator_review_prompt_is_umbrella_first
asserting the umbrella framing, class-level thinking, references/
+ templates/ + scripts/ support-file mentions, and the 'use_count
is not evidence of value' pre-emption. Added
test_curator_review_prompt_offers_support_file_actions asserting
skill_manage action=create and action=write_file are both named.

**Live validation on author's setup:**
- Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome
- Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella
- Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.
2026-04-28 22:33:33 -07:00
Teknium
a12f7aa8bb fix(curator): default cycle is every 7 days, not 24 hours
Weekly is closer to how skill churn actually works — most agent-created
skills don't change multiple times per day, so a daily review is pure
cost without benefit. Bumping the default to 7 days reduces aux-model
spend while still catching drift and staleness on the timescales that
matter (30d stale, 90d archive).

Changes:
- DEFAULT_INTERVAL_HOURS: 24 -> 168 (7 days)
- config.yaml default: interval_hours: 24 -> 24 * 7
- CLI status line renders as '7d' when interval is a whole-day multiple
- Test `test_old_run_eligible` decoupled from the exact default: it now
  uses 2 * get_interval_hours() so future tweaks don't break it
2026-04-28 22:33:33 -07:00
Teknium
0d31864e3b fix(curator): defense-in-depth gates against bundled/hub skills
Previous invariants only gated the primary entry points
(apply_automatic_transitions, archive_skill, CLI pin). Several paths
were unprotected:

  - bump_view / bump_use / bump_patch / set_state / set_pinned wrote
    usage records unconditionally, which is confusing noise in
    .usage.json even though the review list filtered them out
  - restore_skill did not check whether a bundled skill now shadows
    the archived name
  - CLI unpin was asymmetric with CLI pin — it had no gate

Fixes:
  - _mutate() (the shared counter / state writer) now drops silently
    when the skill is not agent-created. .usage.json never gains a
    record for a bundled or hub-installed skill.
  - restore_skill() refuses to restore under a name that is now
    bundled or hub-installed (would shadow upstream).
  - CLI unpin gate matches CLI pin.

New tests:
  - 5 provenance-guard tests on skill_usage (one per mutator)
  - 1 end-to-end test that hammers every mutator at a bundled skill
    and a hub skill, asserts both are untouched on disk, and asserts
    the sidecar stays clean
  - 2 CLI tests proving pin/unpin refuse bundled skills symmetrically

64/64 tests passing (29 skill_usage + 27 curator + 8 new guards).
2026-04-28 22:33:33 -07:00
Teknium
c8b7e7268a refactor(curator): point review prompt at existing tools
The LLM review prompt mentioned bespoke `archive_skill` and `pin_skill`
tools that are not registered as model tools. Swap the prompt to rely
on the real surface:

  - skill_manage action=patch  — for patching and consolidation
  - terminal                   — to `mv` skill dirs into .archive/

Also drop `pin` from the model's decision list — pinning is a user
opt-out for `hermes curator pin <skill>`, not something the model
should do autonomously.

Decision list is now: keep / patch / consolidate / archive.

Tests updated: prompt-invariant test now asserts the existing tools
are referenced and that bespoke tool names do NOT appear. New test
prevents `pin` from being re-added as a model decision.
2026-04-28 22:33:33 -07:00
Teknium
bc79e227e6 feat(curator): background skill maintenance (issue #7816)
Adds the Curator — an auxiliary-model background task that periodically
reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage,
transitions unused skills through active → stale → archived, and spawns
a forked AIAgent to consolidate overlaps and patch drift.

Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI
startup and gateway boot when the last run is older than interval_hours
(default 24) AND the agent has been idle for min_idle_hours (default 2).

Invariants (all load-bearing):
- Never touches bundled or hub-installed skills (.bundled_manifest +
  .hub/lock.json double-filter)
- Never auto-deletes — archive only. Archives are recoverable
  via `hermes curator restore <skill>`
- Pinned skills bypass all auto-transitions
- Uses the aux client; never touches the main session's prompt cache

New files:
- tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes,
  provenance filter
- agent/curator.py — orchestrator: config, idle gating, state-machine
  transitions (pure, no LLM), forked-agent review prompt
- hermes_cli/curator.py — `hermes curator {status,run,pause,resume,
  pin,unpin,restore}` subcommand
- tests/tools/test_skill_usage.py — 29 tests
- tests/agent/test_curator.py — 25 tests

Modified files (surgical patches):
- tools/skills_tool.py — bump view_count on successful skill_view
- tools/skill_manager_tool.py — bump patch_count on skill_manage
  patch/edit/write_file/remove_file; forget record on delete
- hermes_cli/config.py — add curator: section to DEFAULT_CONFIG
- hermes_cli/commands.py — add /curator CommandDef with subcommands
- hermes_cli/main.py — register `hermes curator` subparser via
  register_cli() from hermes_cli.curator
- cli.py — /curator slash-command dispatch + startup hook
- gateway/run.py — gateway-boot hook (mirrors CLI)

Validation:
- 54 new tests across skill_usage + curator, all passing in 3s
- 346 tests across all touched files' neighbors green
- 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green
- CLI smoke: `hermes curator status/pause/resume` work end-to-end

Companion to PR #16026 (class-first skill review prompt) — together
they form a loop: the review prompt stops near-duplicate skill creation
at the source, and the curator prunes/consolidates what still accumulates.

Refs #7816.
2026-04-28 22:33:33 -07:00
Lyle Lengyel
80e474f11f fix(gateway,terminal): expand shell tilde in terminal.cwd before subprocess
Commit 3c42064e made config.yaml the single source of truth for
TERMINAL_CWD, but the config bridge passes cwd values verbatim to
os.environ. When a user sets terminal.cwd: ~/ in config.yaml, the
literal string '~/'' reaches subprocess.Popen, which the kernel
rejects because it does not expand shell tilde syntax.

This patch adds three defensive layers:

1. gateway/run.py — expanduser at config bridge time so TERMINAL_CWD
   is always an absolute path.

2. tools/terminal_tool.py — expanduser when reading TERMINAL_CWD in
   _get_env_config(), guarding against stale or manually-set env vars.

3. tools/environments/local.py — expanduser in LocalEnvironment before
   passing cwd to subprocess.Popen, the final safety net.

Includes regression tests in test_config_cwd_bridge.py for nested
terminal.cwd, top-level cwd alias, and precedence ordering.

Refs: 3c42064e
2026-04-28 22:26:09 -07:00
JackJin
88e07c42b4 fix(cli): prevent .env sanitizer from splitting GLM_API_KEY by LM_API_KEY suffix
The known-key splitter in `_sanitize_env_lines` used substring matching
to find concatenated KEY=VALUE pairs. When a registered key was a suffix
of another (LM_API_KEY is a suffix of GLM_API_KEY), the shorter key's
needle would match inside the longer one, causing the sanitizer to
rewrite `GLM_API_KEY=...` as `G\nLM_API_KEY=...` and silently break
Z.AI/GLM auth (and similarly `GLM_BASE_URL` -> `G\nLM_BASE_URL`).

Drop matches whose needle range is fully contained within a longer
overlapping match. Two regression tests cover the suffix-collision case
and confirm a real concatenation that happens to start with the longer
key still splits where it should.

Fixes #17138
2026-04-28 22:22:45 -07:00
Brooklyn Nicholson
4858e26eaa feat(tui): port classic CLI /reload (.env hot-reload) to TUI
Classic CLI exposes ``/reload`` (re-reads ~/.hermes/.env into
``os.environ`` via ``hermes_cli.config.reload_env``) so newly added API
keys take effect without restarting the session.  The TUI was missing
the parity command, so users had to Ctrl+C out and ``hermes --tui``
again whenever they added or rotated a credential.

Three small wires:

* New ``reload.env`` JSON-RPC method in ``tui_gateway/server.py`` that
  delegates to ``hermes_cli.config.reload_env`` and returns the count
  of vars updated.
* New ``/reload`` slash command in ``ui-tui/src/app/slash/commands/ops.ts``
  matching the existing ``/reload-mcp`` pattern (native RPC, no slash
  worker).
* Drop ``cli_only=True`` from the ``reload`` ``CommandDef`` in
  ``hermes_cli/commands.py`` so help/menus surface it in the TUI too.
  ``reload_env`` itself is environment-agnostic.

Same caveat as classic CLI: the *currently constructed* agent's
credential pool / provider routing does not auto-rebuild.  Users who
want a brand-new credential resolution should follow with ``/new``.

Tests:
* New ``test_reload_env_rpc_calls_hermes_cli_reload_env`` confirms
  RPC delegates and reports the count.
* New ``test_reload_env_rpc_surfaces_errors`` confirms exceptions are
  rendered as JSON-RPC errors.
* ``createSlashHandler.test.ts`` slash-parity matrix extended with
  ``['/reload', 'reload.env', {}]`` so we can't regress the routing.

Validation:
  scripts/run_tests.sh tests/test_tui_gateway_server.py — 92/92.
  scripts/run_tests.sh tests/hermes_cli/test_commands.py — 128/128.
  cd ui-tui && npm run type-check — clean; npm test --run — 390/390.
2026-04-28 22:22:30 -07:00
Teknium
dcd7b717f8
fix(gateway): linearize tool-progress bubbles with content messages (#17280)
After PR #7885 (97b0cd51e) added content-side segment breaks for
natural mid-turn assistant messages, the tool-progress task in
gateway/run.py was not updated to match. progress_msg_id and
progress_lines persisted for the whole run, so after a tool batch
produced bubble B1 followed by content bubble C1, the next tool.started
kept editing the OLD bubble B1 above C1 — making the chat appear out
of order on Telegram, Discord, and Slack.

Add on_new_message callback to GatewayStreamConsumer, fired at the
four sites where a fresh content bubble lands on the platform:
  - _send_or_edit first-send branch (NOT edits)
  - _send_commentary
  - _send_new_chunk (overflow split)
  - each successful chunk of _send_fallback_final

Gateway supplies a lambda that enqueues ('__reset__',) into the
progress_queue. send_progress_messages() handles the marker in both
the main loop and the CancelledError drain path, clearing
progress_msg_id, progress_lines, and the dedup state so the next
tool.started opens a fresh bubble below the new content.

Result: each tool batch appears in chronological order below the
preceding content. When no content appears between tool batches,
tools still group in one bubble (CLI-style compactness).

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 22:17:33 -07:00
Tranquil-Flow
ac855bba0e fix(cli): respect terminal.cwd config in local terminal backend
init_session() runs a login shell bootstrap that sources profile scripts
(.bashrc, .bash_profile, etc.) before capturing pwd. If any profile
script changes the working directory, the captured cwd overwrites the
configured terminal.cwd value — so terminal commands run in the wrong
directory despite the TUI banner showing the configured path.

Add an explicit 'builtin cd' to the configured cwd in the bootstrap
script, after profile sourcing but before pwd capture, ensuring the
configured terminal.cwd is always what gets recorded.

Fixes #14044
2026-04-28 22:16:08 -07:00
Brooklyn Nicholson
f95c34f415 fix(browser): address Copilot round-4 on /browser connect
* Reject unsupported schemes (anything outside http/https/ws/wss) in
  cli.py /browser connect before probing or persisting, matching the
  gateway's existing 4015 path.
* Defend gateway browser.manage against `{"url": null}` and
  non-string urls: empty/null falls back to DEFAULT_BROWSER_CDP_URL,
  non-string returns a 4015 instead of slipping into the generic
  5031 catch via TypeError on `"://" in url`.
* Add regression tests for both null-url fallback and non-string
  rejection.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson
679a27498d fix(browser): address Copilot round-3 on /browser connect
* Gate `browser.progress` emit on truthy `session_id`. The TUI
  prints `messages` from the response when there's no session, so
  emitting events too would double-render. Now: with a session →
  events stream live; without one → bundled messages only.
* Resolve `system = platform.system()` once in `_browser_connect`
  and thread it through `try_launch_chrome_debug` and
  `_failure_messages` → `manual_chrome_debug_command`, so the
  generated hint is consistent (and tests are deterministic) on
  any host.
* Add `test_browser_manage_connect_no_session_skips_progress_events`
  to lock in the gating behavior.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson
d1ee4915f3 fix(browser): address Copilot review on /browser connect
Fixes from Copilot's two passes on PR #17238:

* Validate parsed URL once: reject missing host, invalid port, and
  unsupported scheme up front so malformed inputs (e.g. http://:9222
  or http://localhost:abc) don't fall through to a generic 5031.
* Tighten _is_default_local_cdp to require a discovery-style path so
  ws://127.0.0.1:9222/devtools/browser/<id> is not collapsed to bare
  http://127.0.0.1:9222 (which would lose the path and break the
  connect).
* Move browser.manage into _LONG_HANDLERS so the up-to-10s
  launch-and-retry loop runs on the RPC pool instead of blocking the
  main dispatcher.
* try_launch_chrome_debug uses Windows-appropriate detach kwargs
  (creationflags=DETACHED_PROCESS|CREATE_NEW_PROCESS_GROUP) instead
  of POSIX-only start_new_session=True.
* manual_chrome_debug_command uses subprocess.list2cmdline on
  Windows so the printed instruction is cmd.exe-compatible.
* Mirror host/port validation in cli.py /browser connect so the
  classic CLI never persists an invalid BROWSER_CDP_URL.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson
e750829015 fix(tui): stream /browser connect progress as gateway events
Emit browser.progress JSON-RPC notifications during the connect work and render them in the TUI as system transcript lines, so users see the same step-by-step status the base CLI prints instead of nothing for ~1m followed by a final result.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson
7d39a45749 fix(tui): show /browser connect progress like CLI
Return CLI-style browser connect status messages from the gateway and render them in the TUI so local Chrome launch attempts are visible instead of ending in a silent delayed failure.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson
69ff114ee2 fix(browser): avoid bogus Chrome launch fallback
Detect an actual Chrome/Chromium executable before printing a manual CDP launch command, including common WSL-mounted Windows browser paths, so /browser connect does not suggest google-chrome when it is unavailable.
2026-04-28 22:11:10 -07:00
Brooklyn Nicholson
f10a3df632 fix(tui): align /browser connect local CDP handling
Share Chrome CDP launch helpers between the classic CLI and TUI so default /browser connect uses loopback consistently, retries local Chrome launch, and reports a copyable manual-start command instead of claiming a dead connection.
2026-04-28 22:11:10 -07:00
Teknium
1d4218be56
feat(review): active-update bias, loaded-skill-first, support-file variants (#17213)
The background skill-review prompts (_SKILL_REVIEW_PROMPT and the **Skills**
half of _COMBINED_REVIEW_PROMPT) steered the reviewer toward passive
behavior — most passes concluded 'Nothing to save.' even when the session
produced real lessons. User-preference corrections (style, format,
legibility, verbosity) were especially lost: they were read as memory
signals only, so skills never carried the fix.

This rewrite changes the stance:

- **Active-update bias.** The reviewer now treats inaction as a missed
  learning opportunity. 'Nothing to save.' remains an explicit escape
  but is no longer framed as the most-common outcome.

- **User-preference corrections are first-class skill signals.** Style,
  tone, format, legibility, verbosity complaints — and the actual
  phrasings users use ('stop doing X', 'this is too verbose', 'I hate
  when you Y', 'remember this') — now warrant patching the skill that
  governs the task, not just writing to memory.

- **Loaded-skill-first preference order.** When a skill was loaded via
  /skill-name or skill_view during the session, the reviewer patches
  THAT one first. It was in play; it's the right place.

- **Four-step ladder: patch-loaded → patch-umbrella → support-file →
  create.** Support files are explicitly enumerated as three kinds:
    * references/<topic>.md — session-specific detail OR condensed
      knowledge banks (quoted research, API docs excerpts, domain notes)
    * templates/<name>.<ext> — starter files to copy and modify
    * scripts/<name>.<ext>  — statically re-runnable actions

- **Name-veto for CREATE.** New skill names MUST be class-level — no PR
  numbers, error strings, codenames, library-alone names, or session
  artifacts ('fix-X / debug-Y / audit-Z-today'). If the proposed name
  only fits today's task, fall back to one of the patch/support-file
  options.

- **Memory scope clarified.** 'who the user is and what the current
  situation and state of your operations are' — MEMORY.md is
  situational/state, USER.md is identity/preferences.

- **Curator handoff.** Reviewer flags overlap; the background curator
  handles consolidation at scale. Single-session reviewer doesn't
  attempt umbrella-rebalancing.

Tests: tests/run_agent/test_review_prompt_class_first.py upgraded to
assert the new behavioral contracts (active bias, user-correction
signals, loaded-skill-first, support-file kinds, name-veto, memory
framing, curator handoff). 17 tests, all pass.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 21:11:48 -07:00
Brooklyn Nicholson
9e398e1809 perf(tui): avoid importing classic CLI during tool discovery
TUI session readiness was still laggy after the gateway-ready fixes. Profiling
session.create -> session.info showed the slow phase is background AIAgent
construction (~1.1s). A cProfile run of tui_gateway.server::_make_agent showed
model_tools/tool discovery importing tools.code_execution_tool, whose
module-level EXECUTE_CODE_SCHEMA calls _get_execution_mode(), which imported
cli.CLI_CONFIG.

That pulled the classic interactive CLI stack (prompt_toolkit/Rich and REPL
setup) into every agent startup path, including hermes --tui where it is not
used. Replace that with hermes_cli.config.read_raw_config(), which is cached and
reads only the raw code_execution section. Existing defaults still apply when
the key is absent.

Measurements on macOS Terminal.app:
- import run_agent: ~466ms -> ~347ms
- model_tools import: ~418ms -> ~272ms
- _make_agent: ~1452ms -> ~1239ms
- session.create -> session.info: ~1167ms -> ~999ms
- full hermes --tui ready p50: ~1655ms -> ~1537ms

Tests:
- scripts/run_tests.sh tests/tools/test_code_execution_modes.py tests/tools/test_code_execution.py
2026-04-28 22:42:17 -05:00
Scott Trinh
fd943461ca fix(doctor): accept catalog provider aliases
Validate configured providers against both Hermes runtime provider ids and
catalog-normalized provider ids. This keeps providers like ai-gateway from
being rejected after catalog resolution maps them to models.dev ids.

Keep credential checks and vendor-slug warnings anchored to the runtime id
so doctor reports actionable provider names in follow-up diagnostics.
2026-04-28 18:27:42 -07:00
Teknium
9f004b6d94
perf(tools): memoize get_tool_definitions + TTL-cache check_fn results (#17098)
Two amplifying optimizations to per-turn overhead in the gateway:

1. get_tool_definitions() memoization (model_tools.py)
   Keyed on (frozenset(enabled), frozenset(disabled),
   registry._generation, config.yaml mtime+size). Only active when
   quiet_mode=True (which is every hot-path caller — gateway,
   AIAgent.__init__); quiet_mode=False keeps the existing print side
   effects. Cached path returns a shallow-copy list sharing read-only
   schema dicts.

   Measured: 7.5 ms → 0.01 ms per call (~750× speedup). Gateway
   constructs fresh AIAgent per message, so this saves ~7 ms/turn before
   any LLM work.

2. check_fn() TTL cache (tools/registry.py)
   check_fn callables like check_terminal_requirements probe external
   state (Docker daemon, Modal SDK, playwright binary). For a long-lived
   process, hitting them on every get_definitions() pass was pure waste
   — external state changes on human timescales. 30 s TTL so env-var
   flips (hermes tools enable X) propagate within a turn or two without
   explicit invalidation.

   Measured: first call 7.5ms → 1.6ms (check_fn probes now dominate);
   subsequent calls ~0.01ms via the upstream memoization.

Invalidation surface:
- registry._generation bumps on register/deregister/register_toolset_alias,
  invalidating the memoized definitions automatically.
- config.yaml mtime in the cache key captures user-visible config edits
  affecting dynamic schemas (execute_code mode, discord allowlist).
- invalidate_check_fn_cache() exposed for explicit flushes (e.g. after
  hermes tools enable/disable).
- tests/conftest.py autouse fixture clears both caches before every test
  so env-var monkeypatches don't see stale results.

Also fixes a regression from PR #17046 that I missed:
- tools/web_tools.py — Firecrawl was removed from module scope by the
  lazy import, breaking 8 tests that patch 'tools.web_tools.Firecrawl'.
  Applied the same _FirecrawlProxy pattern used in auxiliary_client/
  run_agent for OpenAI (module-level proxy that looks like the class
  but imports the SDK on first call/isinstance; patch() replaces the
  attribute as usual).

Verified:
- 49/49 tests/tools/test_web_tools_config.py pass (was 8 failing on main)
- 68/68 tests/tools/test_homeassistant_tool.py pass (was 1 failing in
  the full suite due to check_fn TTL cross-test pollution; fixed by
  the autouse fixture)
- 3887/3895 tests/tools/ (8 pre-existing fails: 2 delegate, 1 mcp
  dynamic discovery, 5 mcp structured content — all confirmed on main)
- 2973/2976 tests/agent/ + tests/run_agent/ (3 pre-existing fails)
- 868/868 tests/run_agent/ (excluding test_run_agent.py which has
  pre-existing suite-level issues)
- Live smoke: 2 turns + /model switch + tool calls, zero errors in
  agent.log session window.

Co-authored-by: teknium1 <teknium@users.noreply.github.com>
2026-04-28 18:20:17 -07:00
brooklyn!
188eaa57c4
fix(tui): honor documented mouse_tracking config key (#17188)
* fix(tui): honor documented mouse_tracking config key

The TUI runtime was reading display.tui_mouse while docs and user-facing
examples pointed users at display.mouse_tracking. That made persistent
mouse-disable config look like a no-op for users trying to restore native
terminal selection/copy behavior on Linux/SSH/tmux terminals.

Use display.mouse_tracking as the canonical key, keep display.tui_mouse as
a legacy fallback, and have /mouse write the documented key. Both gateway
config.get and client-side config sync now share the same precedence: the
canonical key wins, then the legacy key, then default on.

* review(copilot): align mouse tracking config coercion

- Load gateway config once before deriving display.mouse_tracking state.
- Use key-presence precedence on the TUI client too, so canonical
  mouse_tracking wins over legacy tui_mouse even when the value is null.
- Treat numeric 0 as disabled on both gateway and client, matching the
  existing string "0" handling.
- Widen ConfigDisplayConfig mouse fields because config.get full returns raw
  YAML, not normalized booleans.
2026-04-28 17:39:07 -07:00
brooklyn!
6b09df39be
fix(tui): restore macOS copy behavior and theme polish (#17131)
This PR groups the TUI fixes that restore macOS Terminal usability and clean up the theme/composer regressions:

- copy transcript selections on macOS drag-release so Terminal.app users can copy while mouse tracking is enabled
- copy composer selections on macOS drag-release; composer selection is internal to TextInput and does not use the global Ink selection bus
- keep IDE Cmd+C forwarding setup macOS-only, and make keybinding conflict checks respect simple when-clause overlap/negation
- force truecolor before chalk initializes (unless NO_COLOR / FORCE_COLOR / HERMES_TUI_TRUECOLOR opt-outs apply) so the default banner keeps its gold/amber/bronze gradient in Terminal.app
- move TUI surfaces onto semantic theme tokens and preserve skin prompt symbols as bare tokens with renderer-owned spacing
- render focused placeholders as dim hint text in TTY mode instead of inverse/selected-looking synthetic cursor text
2026-04-28 18:47:14 -05:00
brooklyn!
7d81d76366
feat(tui): pluggable busy-indicator styles (#13610) (#17150)
* feat(tui): pluggable busy-indicator styles (kaomoji/emoji/unicode/ascii)

The status-bar `FaceTicker` rotated through wide-and-variable kaomoji
glyphs (`(。•́︿•̀。)`, `( ͡° ͜ʖ ͡°)`, …) every 2.5s.  Real display widths range
from ~5 to ~16 columns, so the rest of the bar (cwd, ctx %, voice,
bg counter) shifted on every cycle.  Padding the verb alone (#17116)
helped but didn't address the dominant jitter source — the glyph
itself.

Add four indicator styles, configurable + hot-swappable:

* `kaomoji` (default — preserves the existing vibe; verb is now
  pad-stable so the only width churn left is the kaomoji itself).
* `emoji`  — single 2-col emoji frame (`⚕ 🌀 🤔  🍵 🔮`).
* `unicode` — `unicode-animations` braille spinner (1-col, smooth).
* `ascii`  — `| / - \` (1-col, max compat).

Wires:

* `display.tui_status_indicator` in `DEFAULT_CONFIG` (default
  `kaomoji`).
* New JSON-RPC `config.set/get indicator` keys, narrow allow-list.
* `applyDisplay` reads the field and patches `UiState.indicatorStyle`,
  so the existing `mtime` poll picks up `~/.hermes/config.yaml` edits
  within ~5s without a TUI restart.
* `/indicator [style]` slash command (alias `/indicator-style`,
  subcommand completion `kaomoji|emoji|unicode|ascii`).  Bare form
  shows the current style; setter fires `config.set` and
  optimistically `patchUiState({ indicatorStyle })` so the live TUI
  swaps immediately, matching the `/skin` UX.
* `CommandDef("indicator", ..., subcommands=...)` so classic CLI
  autocomplete + TUI `complete.slash` both surface it.
* `FaceTicker` decouples spinner cadence from verb cadence — the
  glyph runs at the spinner's authored interval (or `FACE_TICK_MS`
  for kaomoji), the verb stays on the original 2.5s cycle, and both
  re-arm cleanly when style changes.

Tests:

* `normalizeIndicatorStyle` rejects unknown / non-string input.
* `applyDisplay → tui_status_indicator` covers fan-out + fallback.
* `/indicator <style>` hot-swaps `UiState.indicatorStyle` after a
  successful `config.set`.
* `/indicator sparkle` rejects with the usage hint and never hits
  the gateway.
* Slash-parity matrix gets `'/indicator'` → `config.get`.

Validation:
  cd ui-tui && npm run type-check — clean; npm test --run — 398/398.
  scripts/run_tests.sh tests/test_tui_gateway_server.py
  tests/hermes_cli/test_commands.py — 220/220.

* chore(tui): drop /indicator-style alias to declutter autocomplete

* fix(tui): drop verb-width pad — /indicator handles glyph jitter directly

* fix(tui): unicode indicator style hides the verb (cleanest option)

* refactor(tui): single source of truth for INDICATOR_STYLES; cleaner error format

Round 1 Copilot review on PR #17150:

- Exported `INDICATOR_STYLES` const tuple from `interfaces.ts`;
  `IndicatorStyle` union type is derived from it. `useConfigSync`
  builds its validation Set from the tuple, and `session.ts` uses it
  for both the usage hint and the runtime allow-list — adding/removing
  a style now touches one line.
- Backend `config.set indicator` error message: switched
  `sorted(allowed)` list repr to `pick one of ascii|emoji|kaomoji|unicode`
  (matches the TUI usage hint), and reports the normalized `raw`
  instead of the original `value`. Backend allowed tuple now has a
  comment pointing back at `INDICATOR_STYLES` so the two stay aligned.

Note: kept the verb portion unpadded per design intent — fixed-width
padding was the exact UX the `/indicator` command was added to remove.
Stable width comes from the glyph; verbs cycling is part of the kawaii
aesthetic. Reply on the verb thread will explain.

* fix(tui): drop type collapse + gate verb timer + DEFAULT_INDICATOR_STYLE

Round 2 Copilot review on PR #17150:

- `tui_status_indicator?: 'ascii' | ... | string` collapses to `string`
  in TS — consumers got no narrowing. Documented as plain `string` with
  a comment about runtime validation via `normalizeIndicatorStyle`.
- `FaceTicker` always started a 2.5s verb interval, even for the
  `unicode` style which hides the verb entirely. Now gated on
  `showVerb` from `renderIndicator` — `unicode` stays calm.

Pre-emptive self-review (avoid round 3):
- Three call sites duplicated the literal `'kaomoji'` default
  (uiStore, normalizeIndicatorStyle, slash command). Added
  `DEFAULT_INDICATOR_STYLE` to interfaces.ts and threaded it through
  so changing the default touches one line.

* fix(tui-gateway): normalize config.get indicator output to match TUI render

Round 4 Copilot review on PR #17150: `config.get` for `indicator`
returned the raw `display.tui_status_indicator` value without
validation, so a hand-edited config.yaml with stray casing or an
unknown style would leave `/indicator` printing one thing while
the TUI rendered the kaomoji default (frontend's
`normalizeIndicatorStyle` does this normalization on receive).

Lifted the allow-list to module scope as `_INDICATOR_STYLES` /
`_INDICATOR_DEFAULT`, reused by both `config.set` and `config.get`.
Comment notes the alignment with `INDICATOR_STYLES` /
`DEFAULT_INDICATOR_STYLE` in interfaces.ts so adding/removing a
style is a one-line change on each end.

Tests cover: known value verbatim, casing/whitespace normalize,
unknown→default, unset→default.

* fix(tui-gateway): preserve falsy-input diagnostics in config.set indicator error

Round 5 Copilot review on PR #17150: `raw = str(value or "").strip().lower()`
collapsed any falsy non-string (`0`, `False`, `[]`) to empty string,
so the error message read `unknown indicator: ` with nothing after —
losing the original input.

Switched to `("" if value is None else str(value)).strip().lower()`
so only `None` (the genuine 'no value' case) becomes blank.  Used
`{raw!r}` in the error so the diagnostic is unambiguous (`'0'` vs `0`).

Tests:
- known-value happy path (`'EMOJI'` → `'emoji'`)
- falsy non-string inputs (`0` / `False` / `[]`) surface meaningfully
- `None` keeps the blank-repr error
2026-04-28 18:19:16 -05:00
brooklyn!
1e326c686d
fix(tui-gateway): harden stdio transport against half-closed pipes + SIGTERM races (#17118)
* fix(tui-gateway): harden stdio transport against half-closed pipes + SIGTERM races

`tui_gateway` reports `tui_gateway_crash.log` traces where the main
thread sits in `sys.stdin` while a worker holds `_stdout_lock` mid-
flush, and SIGTERM then calls `sys.exit(0)` while the lock is still
held — the interpreter shutdown stalls behind the wedged write.

Two narrowly scoped hardenings:

**`tui_gateway/transport.py`**

* Move JSON serialisation outside the lock — long messages no longer
  block sibling writers while we serialise.
* Treat `BrokenPipeError`, `ValueError` ("I/O on closed file") and
  generic `OSError` from both `write` and `flush` as "peer is gone":
  return `False` instead of bubbling, matching what `write_json`'s
  callers in `entry.py` already expect.
* Split `flush` into its own try block so a stuck flush never strands
  a partial write or holds the lock indefinitely on its way out.
* Optional `HERMES_TUI_GATEWAY_NO_FLUSH=1` env knob to skip explicit
  `flush()` entirely on environments where a half-closed read pipe
  produces an indefinite kernel-level block.  Default unchanged.

**`tui_gateway/entry.py`**

* `_log_signal` now spawns a 1-second daemon timer that calls
  `os._exit(0)` if the orderly `sys.exit(0)` path is itself stuck
  behind a wedged worker.  Atexit handlers run inside the grace
  window when they can; the timer is the safety net so a deadlocked
  flush no longer strands the gateway process.

Tests:

* `test_write_json_closed_stream_returns_false` — ValueError path.
* `test_write_json_oserror_on_flush_returns_false` — OSError on flush
  must not strand the lock; the write portion still landed before the
  flush failure.
* `test_write_json_no_flush_env_skips_flush` — env knob bypass.

Validation: `scripts/run_tests.sh tests/tui_gateway/test_protocol.py`
(42/42 pass; one pre-existing failure on
`test_session_resume_returns_hydrated_messages` is unrelated to this
change — same `include_ancestors` mock kwarg issue tracked elsewhere).
`scripts/run_tests.sh tests/test_tui_gateway_server.py` 90/90 pass.

* review(copilot): tighten transport hardening comments + test cleanup

* review(copilot): narrow exception capture, configurable grace, simpler no-flush test

* fix(tui-gateway): narrow ValueError to closed-stream; surface UnicodeEncodeError

Copilot review on PR #17118: `UnicodeEncodeError` is a ValueError
subclass, so a non-UTF-8 stdout (mismatched PYTHONIOENCODING / locale)
would have been silently swallowed as 'peer gone' under
`except ValueError`.  That hides a real environment bug.

Now:
- UnicodeEncodeError → log with exc_info (warning) and drop the frame
- ValueError where str(e) contains 'closed file' → peer gone, return False
- Any other ValueError → log loudly, drop frame (defensive, but visible)

Same shape applied to flush.  Adds two regression tests.

* fix(tui-gateway): reserve write() False for peer-gone; re-raise programming errors

Round 2 Copilot review on PR #17118: `Transport.write()` returning
`False` is documented as 'peer is gone', and `entry.py` reacts by
calling `sys.exit(0)`.  But the implementation also returned False
for non-IO conditions (non-JSON-safe payloads, UnicodeEncodeError,
unrelated ValueErrors), so a programming error or local env bug would
present as a clean disconnect — exactly the diagnosis pain we wanted
to eliminate.

Now:
- `json.dumps` failure → re-raises (TypeError/ValueError surfaces in crash log)
- `BrokenPipeError` → False (peer gone)
- `ValueError('...closed file...')` → False (peer gone)
- `UnicodeEncodeError` and any other ValueError → re-raise
- `OSError` → False (existing IO-failure semantics, debug-logged)

Tests updated to assert the re-raise behaviour and added a
non-serializable-payload regression test.

* fix(tui-gateway): narrow OSError to peer-gone errnos; honest test naming

Round 3 Copilot review on PR #17118:

- Docstring claimed False = peer gone, but generic OSError on write/flush
  also returned False — meaning ENOSPC/EACCES/EIO would silently exit.
  Added `_PEER_GONE_ERRNOS = {EPIPE, ECONNRESET, EBADF, ESHUTDOWN, +WSA}`
  and narrowed the OSError handlers; non-peer-gone errnos re-raise.
  Docstring now lists OSError as peer-gone branch with the errno set.
- The `_DISABLE_FLUSH` test was named after the env var but actually
  patched the module constant. Renamed it to reflect the contract being
  tested (skips flush when constant is true) AND added a real
  end-to-end test that sets the env var, reloads transport.py, and
  asserts the constant flips. Cleanup reload restores defaults so
  parallel tests stay isolated.

Self-review (avoid round 4):
- Verified TeeTransport's secondary-swallow stays intentional.
- _log_signal grace path already covered by separate tests.
2026-04-28 17:54:06 -05:00