hermes-agent-features

Author	SHA1	Message	Date
Mibayy	ebf2ea584a	feat(terminal,cli): docker_extra_args + display.timestamps Two independent opt-in QoL toggles, both off by default. terminal.docker_extra_args: - List of extra flags appended verbatim to docker run after security defaults. Useful for adding capabilities (e.g. --cap-add SETUID) or other docker run options not exposed by existing config keys. - Non-string entries are logged and skipped. - Also available via TERMINAL_DOCKER_EXTRA_ARGS='[...]' env var. display.timestamps: - Appends [HH:MM] to user input bullet and the assistant response box header. Single hub in _format_submitted_user_message_preview() covers both single-line and multi-line user previews; assistant response label gets the timestamp at box-open time. Closes #1569 (timestamps). Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>	2026-05-10 22:43:39 -07:00
Teknium	404640a2b7	feat(goals): /goal checklist + /subgoal user controls (#23456 ) * feat(goals): /goal checklist + /subgoal user controls Two-phase judge for /goal — Phase A decomposes the goal into a detailed checklist on first turn; Phase B evaluates each pending item harshly against the agent's most recent response. The goal completes only when every item is in a terminal status (completed or impossible). Adds /subgoal so the user can append, complete, mark impossible, undo, remove, or clear items the judge missed or got wrong. Mechanics: - GoalState gains `checklist` and `decomposed` fields, both backwards compatible (old state_meta rows load unchanged). - Phase A: aux call writes a harsh, exhaustive checklist; biased toward more items not fewer. Falls through to legacy freeform judge when decompose fails. - Phase B: judge gets the checklist + last-response snippet + path to a per-session conversation dump at <HERMES_HOME>/goals/<sid>.json. A bounded read_file tool (max 5 calls per turn, restricted to that one file) lets the judge inspect history when the snippet is ambiguous. Stickiness in code: terminal items are frozen, only the user can revert via /subgoal undo. - Continuation prompt shows checklist progress when non-empty; reverts to old prompt when empty. - Status line shows M/N done counts. CLI + gateway + TUI gateway all pass the agent reference into evaluate_after_turn so the dump can be written. Gateway-side /subgoal is allowed mid-run since it only modifies the checklist the judge consults at turn boundaries. Tests: 24 new cases — backcompat round-trip, Phase A decompose, Phase B updates + new_items + stickiness, user override flows, conversation dump (incl. unsafe-sid sanitization), judge read_file restriction. Existing freeform-mode tests updated to patch the renamed `judge_goal_freeform` and skip Phase A explicitly. * fix(goals): off-by-one in judge index, message-list plumbing, prompt tuning Three live-test findings from running /goal end-to-end against gemini-3-flash-preview as the judge: 1. Off-by-one bug — the judge sees the checklist rendered with 1-based indices ('1. [ ] foo, 2. [ ] bar') but the apply layer indexed state.checklist as 0-based. Result: every judge update landed on the wrong item, evidence got attached to neighbouring rows, and the genuine 'first pending' item (usually #1) never got marked. Fix: convert 1 → 0 in _parse_evaluate_response. Also tightened the user prompt to call out the 1-based scheme explicitly. New tests cover the parser conversion + an end-to-end fake-judge round-trip. 2. Conversation dump never happened — _extract_agent_messages tried common AIAgent attribute names (.messages, .conversation_history, etc.) but AIAgent doesn't expose the message list as an instance attribute; it lives inside run_conversation()'s scope. Result: the judge's read_file tool always saw history_path=unavailable. Fix: added an explicit messages= kwarg to evaluate_after_turn that all three call sites (CLI, gateway, TUI gateway) now pass directly. Agent-attribute extraction kept as back-compat fallback. 3. Prompt was too harsh on simple goals. The original 'be HARSH, default to leaving items pending' wording made the judge refuse to mark 'file exists' completed even after the agent ran ls, test -f, os.path.isfile, and find — burning the entire 8-turn budget on a fizzbuzz task. Softened to 'strict but not absurd' with explicit guidance on what counts as evidence and a directive not to require re-proving items already established earlier. Re-tested live with the same fizzbuzz goal: now terminates in 2 turns with all 8 checklist items correctly attributed to their own evidence. /subgoal user-action flow (add / complete / undo / impossible) verified live as well.	2026-05-10 16:56:51 -07:00
Teknium	c5f1f863ac	fix(cli): drive _prompt_text_input directly when off main thread (#23454 ) Slash commands (/clear, /new, /undo, /reload-mcp) are dispatched from the process_loop daemon thread. prompt_toolkit.run_in_terminal returns a coroutine that only the main-thread event loop can drive, so calling it from a daemon thread orphans the coroutine — the input prompt never renders and user keystrokes leak into the composer instead of the confirmation prompt (issue #23185). Mirror the thread-aware guard already in _run_curses_picker: when off the main thread, fall back to a direct input() call. Also wrap run_in_terminal in try/except so WSL / Warp / other emulators that silently drop the scheduled coroutine fall back to input() too. Tests: tests/cli/test_prompt_text_input_thread_safety.py covers main thread (run_in_terminal path), daemon thread (direct input fallback), no-app, run_in_terminal-raises, and EOF handling.	2026-05-10 16:16:10 -07:00
teknium1	00ce5f04d9	feat(session): make /handoff actually transfer the session live Builds on @kshitijk4poor's CLI handoff stub. The original PR's flow deferred everything to whenever a real user happened to message the target platform; this rewrites it so the gateway picks up handoffs immediately and the destination chat just starts working. State machine on sessions table replaces the boolean flag: None -> 'pending' -> 'running' -> ('completed' \| 'failed') plus handoff_error for failure reasons. CLI request_handoff / get_handoff_state / list_pending_handoffs / claim_handoff / complete_handoff / fail_handoff helpers wrap the transitions. CLI side (cli.py): /handoff <platform> validates the platform's home channel via load_gateway_config, refuses if the agent is mid-turn, flips the row to 'pending', and poll-blocks (60s) on terminal state. On 'completed' it prints the /resume hint and exits the CLI like /quit. On 'failed' or timeout it surfaces the reason and the CLI session stays intact. Gateway side (gateway/run.py): new _handoff_watcher background task scans state.db every 2s, atomically claims pending rows, and runs _process_handoff for each. _process_handoff: 1. Resolves the platform's home channel. 2. Asks the adapter for a fresh thread via the new create_handoff_thread(parent_chat_id, name) capability so the handed-off conversation gets its own scrollback. Adapters that don't support threads (or fail) return None and the watcher falls back to the home channel directly. 3. Constructs a SessionSource keyed as 'thread' when a thread was created, 'dm' otherwise, then session_store.switch_session re-binds the destination key to the CLI session_id. The full role-aware transcript replays via load_transcript on the next turn (no flat-text injection into context_prompt). 4. Forges a synthetic MessageEvent(internal=True) with the handoff notice and dispatches through _handle_message; the agent runs against the loaded transcript and adapter.send delivers the reply. 5. Marks the row 'completed' on success, 'failed' (+error) on any exception. Adapter capability (gateway/platforms/base.py): create_handoff_thread default returns None. Three overrides: - Telegram (gateway/platforms/telegram.py): wraps _create_dm_topic so DM topics (Bot API 9.4+) and forum supergroups both work. - Discord (gateway/platforms/discord.py): parent.create_thread on text channels with a seed-message + message.create_thread fallback for permission edge cases. Skips DMs and other non-thread-capable parents. - Slack (gateway/platforms/slack.py): posts a seed message and returns its ts as the thread anchor — Slack threads are message-anchored. In thread mode, build_session_key keys the destination without user_id (thread_sessions_per_user defaults to False) so the synthetic turn and any later real-user message in the thread share the same session_key — seamless takeover without race. CommandDef stays cli_only=True (handoff is initiated from the CLI; gateway exposes /resume for the reverse direction). Removed the original PR's _handle_message_with_agent handoff hook (transcript-as-text injection into context_prompt) and the send_message_tool notification — both replaced by the watcher path. Tests rewritten around the new state machine: 13/13 pass. E2E-validated thread + no-thread paths and the failure path against real worktree imports with mocked adapters.	2026-05-10 13:06:25 -07:00
kshitijk4poor	878611a79d	feat(session): add /handoff command for cross-platform session transfer Adds /handoff <platform> CLI command that queues the current session for resume on the configured home channel of any messaging platform. CLI side: - /handoff telegram — marks session in shared DB, sends summary to the Telegram home channel via send_message - /handoff discord — same for Discord - Supports telegram, discord, slack, whatsapp, signal, matrix Gateway side: - On new session creation, checks for pending handoffs for the incoming message's platform - If found, loads the CLI session's full conversation history and injects it into the context prompt as a handoff transcript - Agent continues the conversation seamlessly Files: - hermes_state.py: handoff_pending, handoff_platform columns + helpers - cli.py: _handle_handoff_command dispatch + handler - hermes_cli/commands.py: CommandDef entry - gateway/run.py: handoff detection in _handle_message_with_agent - tests/hermes_cli/test_session_handoff.py: 8 tests	2026-05-10 13:06:25 -07:00
Teknium	68e44642c8	fix(stream-retry): collapse two-line drop status, name provider, and let agent.log capture diagnostics (#22993 ) Subagent stream drops were spamming the parent terminal with two lines per blip ('Connection dropped...' + 'Reconnected...') while leaving zero breadcrumb in agent.log to debug them. Two underlying bugs, fixed together: 1. quiet_mode raised the run_agent/tools/etc. loggers to ERROR, which filters records before root-logger file handlers see them. The comment claimed 'File handlers still capture everything' — that was wrong. Removed in both run_agent.py and cli.py; console quietness already comes from hermes_logging not installing a console StreamHandler in non-verbose mode. 2. The stream-retry blocks emitted two _emit_status calls per drop ('⚠️ Connection dropped... Reconnecting...' + '🔄 Reconnected — resuming…') with no provider name, so multi-provider sessions had to dig through agent.log to attribute a drop. Replaced both call sites with a single _emit_stream_drop helper that emits ONE line naming the provider and error class, and always writes a structured WARNING to agent.log with subagent_id, depth, provider, base_url, error_type. Net UX change: 6 lines per triple-subagent drop → 3 lines, each naming the provider. agent.log now has a structured breadcrumb per retry that didn't exist before. Tests: 6 new tests in tests/run_agent/test_stream_drop_logging.py covering the logger-level guard, structured WARNING content, single status line per drop (no Reconnected follow-up), and provider naming.	2026-05-09 22:35:35 -07:00
Teknium	b67ea7ff47	perf(cli): skip welcome banner on `chat -q` single-query mode (#22904 ) `hermes chat -q "..."` printed the full welcome banner before running the query — kawaii ASCII logo, available toolsets list, available skills list, model name, session ID, working directory, update-available notice. Building it took ~420 ms on cold start (~200 ms version-update probe, the rest is toolset / skill enumeration plus Rich panel rendering). For a one-shot `-q` query the banner is noise: the user already picked the prompt, doesn't need a toolset reference, and gets the session ID + resume hint from `_print_exit_summary()` after the response prints. The fully-quiet `-Q` / `--quiet` machine-readable path was already banner-free; this brings the human-facing single-query path in line so all non-interactive invocations are fast. Measured impact (`hermes chat -q "ok" --max-turns 1`, 10-run percentiles, 9950X3D): median: 1.90 → 1.75 s (-150 ms) min: 1.80 → 1.73 s ( -70 ms) P25: 1.82 → 1.74 s ( -80 ms) Wider variance than expected; the banner cost overlaps with API latency on real `chat -q` runs. Min-time delta of 70 ms is the cleanest signal — that's the deterministic banner-build cost gone. The 150 ms median delta picks up cases where the version-update probe also finishes during the wait. Interactive mode (`hermes` with no `-q`) and the `--list-tools` / `--list-toolsets` one-shot listing commands still show the banner — those are the contexts where it's actually wanted. Tests: 656/656 `tests/cli/` pass on top of latest main (modulo 5 pre- existing flakes in `test_cli_save_config_value.py` that fail with `No module named 'ruamel'` both with and without this change).	2026-05-09 18:20:28 -07:00
ming	85383c6363	fix(cli): preserve config comments on setting writes	2026-05-09 17:55:12 -07:00
Wesley Simplicio	116a1446a4	fix(terminal): bridge docker_env config to TERMINAL_DOCKER_ENV Problem: terminal.docker_env set in config.yaml was silently ignored. Docker containers never received the user-specified env vars. Root cause: docker_env was missing from all three config→env bridging maps (cli.py env_mappings, gateway/run.py _terminal_env_map, hermes_cli/config.py _config_to_env_sync) and from the terminal_tool _get_env_config() reader. _create_environment() consumed the key from container_config correctly, but it was always {} because TERMINAL_DOCKER_ENV was never set. Also extend the list-serialisation branches in cli.py and gateway/run.py to handle dict values via json.dumps (lists already used json.dumps; plain str() on a dict produces undecodable output). Fix: - cli.py: add "docker_env": "TERMINAL_DOCKER_ENV" to env_mappings; serialise dict values with json.dumps alongside existing list path - gateway/run.py: same additions to _terminal_env_map and serialisation - hermes_cli/config.py: add "terminal.docker_env": "TERMINAL_DOCKER_ENV" to _config_to_env_sync so `hermes config set terminal.docker_env …` persists to .env correctly - tools/terminal_tool.py: add docker_env key to _get_env_config() reading TERMINAL_DOCKER_ENV via _parse_env_var with default "{}" Tests: add test_docker_env_is_bridged_everywhere to tests/tools/test_terminal_config_env_sync.py — stash-verified: fails on origin/main, passes with fix. Fixes #20537	2026-05-09 17:53:35 -07:00
Teknium	c7f0aab949	feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838 ) Pick openrouter/pareto-code as your model and OpenRouter auto-routes each request to the cheapest model meeting your coding-quality bar (ranked by Artificial Analysis). The new openrouter.min_coding_score config key (0.0-1.0, default 0.65) tunes the floor. - hermes_cli/models.py: add openrouter/pareto-code to OPENROUTER_MODELS so it shows up in the picker with a description - hermes_cli/config.py: add openrouter.min_coding_score (default 0.65 — lands on a mid-tier coder on the current Pareto frontier) - plugins/model-providers/openrouter: emit extra_body.plugins = [{id: pareto-router, min_coding_score: X}] when model is openrouter/pareto-code AND the score is a valid float in [0.0, 1.0] - agent/transports/chat_completions.py: same emission on the legacy flag path (when no provider profile is loaded) - run_agent.py: openrouter_min_coding_score kwarg + storage; plumbed into both build_kwargs() invocations and the context-summary extra_body path - cli.py: read openrouter.min_coding_score once at init, validate float in [0,1], pass to AIAgent constructions (CLI + background-task paths) - cron/scheduler.py, batch_runner.py, tools/delegate_tool.py, tui_gateway/server.py: propagate the kwarg (mirrors providers_order plumbing — subagents inherit, cron/batch read from config) - tests: profile-level + transport-level coverage of the model gating, unset/empty/out-of-range handling, and the legacy flag path - docs: new 'OpenRouter Pareto Code Router' section in providers.md Verified end-to-end against api.openrouter.ai: at score=0.65 we land on a mid-tier coder, at omission we get the strongest. Score is silently dropped on any model other than openrouter/pareto-code, so it's safe to leave set.	2026-05-09 14:47:00 -07:00
Teknium	70bc52e408	fix(cli): make Ctrl+Enter insert newline on WSL/SSH/Windows Terminal (#22777 ) Native Windows, WSL, SSH sessions, and Windows Terminal all send Ctrl+Enter as bare LF (c-j). Hermes was binding c-j as submit on every POSIX platform, so Ctrl+Enter submitted instead of inserting a newline on those terminals. Reported in #22379. Add _preserve_ctrl_enter_newline() predicate that detects the environments where Ctrl+Enter must produce a newline (sys.platform == 'win32', SSH_CONNECTION/SSH_CLIENT/SSH_TTY env, WT_SESSION, WSL_DISTRO_NAME, /proc/version 'microsoft' marker). Gate the c-j-as-submit binding off in those environments and gate the c-j-as-newline handler on. Local POSIX TTYs without those markers (docker exec, plain ssh from a Mac) keep c-j as submit so plain Enter still works on thin PTYs. Add install_ctrl_enter_alias() in hermes_cli/pt_input_extras.py mapping the three CSI-u / modifyOtherKeys variants of Ctrl+Enter ('\x1b[13;5u', '\x1b[27;5;13~', '\x1b[27;5;13u') to the (Escape, ControlM) tuple Alt+Enter produces. This lets Kitty / mintty / xterm-with-modifyOtherKeys users over SSH get a Ctrl+Enter newline through the existing Alt+Enter handler. 9 new tests + extended existing test_lf_enter_binds_to_submit_handler_posix to cover bare-local vs SSH branches. Closes #22379.	2026-05-09 12:48:14 -07:00
Teknium	b9c001116e	feat: confirm prompt for destructive slash commands (#4069 ) (#22687 ) /clear, /new, /reset, and /undo now ask the user to confirm before discarding conversation state — three-option prompt routed through the existing tools.slash_confirm primitive. Native yes/no buttons render on Telegram, Discord, and Slack (their adapters already implement send_slash_confirm); other platforms get a text-fallback prompt and reply with /approve, /always, or /cancel. The classic prompt_toolkit CLI uses the same three-option flow via the established _prompt_text_input pattern (see _confirm_and_reload_mcp). TUI keeps its existing modal overlay (#12312). Gated by new config key approvals.destructive_slash_confirm (default true). Picking 'Always Approve' flips the gate to false so subsequent destructive commands run silently — matches the established mcp_reload_confirm UX. Out of scope: /cron remove (separate domain — scheduled jobs, not session history). Existing TUI overlay env-var (HERMES_TUI_NO_CONFIRM) left unchanged; cosmetic unification can come later. Closes #4069.	2026-05-09 11:04:46 -07:00
kshitij	2a7047c2ed	fix(sqlite): fall back to journal_mode=DELETE on NFS/SMB/FUSE (#22043 ) SQLite's WAL mode requires shared-memory (mmap) coordination and fcntl byte-range locks that don't reliably work on network filesystems. Upstream documents this explicitly: https://www.sqlite.org/wal.html#sometimes_queries_return_sqlite_busy_in_wal_mode On NFS / SMB / some FUSE mounts / WSL1, 'PRAGMA journal_mode=WAL' raises 'sqlite3.OperationalError: locking protocol' (SQLITE_PROTOCOL). Before this change, every feature backed by state.db or kanban.db broke silently: - /resume, /title, /history, /branch returned 'Session database not available.' with no cause - gateway logged the init failure at DEBUG (invisible in errors.log) - kanban dispatcher crashed every 60s, driving the known migration race (duplicate column name: consecutive_failures, #21708 / #21374) Changes: - hermes_state.apply_wal_with_fallback(): shared helper that tries WAL and falls back to DELETE on SQLITE_PROTOCOL-style errors with one WARNING explaining why - hermes_state.get_last_init_error() + format_session_db_unavailable(): capture the init failure cause and surface it in user-facing strings (with an NFS/SMB pointer for 'locking protocol') - hermes_cli/kanban_db.connect(): use the shared helper - gateway/run.py: bump SessionDB init failure log DEBUG -> WARNING (matches cli.py's existing correct behavior) - cli.py (4 sites) + gateway/run.py (5 sites): replace bare 'Session database not available.' with format_session_db_unavailable() Tests: 12 new tests in tests/test_hermes_state_wal_fallback.py + 1 new test in tests/hermes_cli/test_kanban_db.py. Existing suites (state, kanban, gateway, cli) remain green for all tests unrelated to pre-existing failures on main. Evidence: real-world user on NFSv3 mount (172.26.224.200:d2dfac12/home, local_lock=none) reporting 'Session database not available.' on /resume; 'locking protocol' appears in 4 distinct log entries across backup, kanban, TUI, and CLI paths in the same session. closes #22032	2026-05-09 02:09:35 -07:00
Syed Abdur Rehman Ali	f5b635f6ab	feat(cli): recognise Shift+Enter as a newline key Closes #5346. Most terminals send the same byte sequence for `Enter` and `Shift+Enter` by default, so the application can't tell them apart — this is a terminal protocol limitation, not something Hermes can paper over. But terminals that implement the Kitty keyboard protocol (Kitty / foot / WezTerm / Ghostty by default; iTerm2 / Alacritty / VS Code terminal / Warp once the protocol is enabled) DO emit a distinct sequence for `Shift+Enter`: - `\x1b[13;2u` — Kitty / CSI-u, modifier=2 - `\x1b[27;2;13~` — xterm modifyOtherKeys=2 Stock prompt_toolkit doesn't have the CSI-u sequence in its `ANSI_SEQUENCES` table at all, and it maps the modifyOtherKeys variant to plain `Keys.ControlM` (Enter) — i.e. it strips the Shift modifier, which is the bug users actually hit on iTerm2 and friends. This PR adds `hermes_cli/pt_input_extras.install_shift_enter_alias()`, called once at CLI startup from `cli.py`, which inserts/overwrites those sequences in `ANSI_SEQUENCES` so they decode to `(Keys.Escape, Keys.ControlM)` — the same key tuple `Alt+Enter` produces. The existing Alt+Enter newline handler (`@kb.add('escape', 'enter')` in `cli.py`) then fires unchanged, so there is no new keybinding to register and no behavioral change for terminals that don't emit the distinct sequences. Files ===== * `hermes_cli/pt_input_extras.py` — new module hosting the helper. Lives outside `cli.py` so it's importable in tests without dragging in the full CLI runtime (which depends on `fire`, `rich`, etc.). * `cli.py` — calls `install_shift_enter_alias()` once at module import. Wrapped in try/except so prompt_toolkit version drift can't break CLI startup. * `tests/cli/test_cli_shift_enter_newline.py` — 6 tests: - registration of all three byte sequences - overwrite of stock prompt_toolkit's broken modifyOtherKeys mapping - idempotency - parser equivalence: CSI-u Shift+Enter == Alt+Enter - parser equivalence: modifyOtherKeys Shift+Enter == Alt+Enter - plain Enter remains a single key (submit), distinct from the two-key Alt+Enter / Shift+Enter tuple * `website/docs/user-guide/cli.md` — keybinding table updated; new "Shift+Enter compatibility" subsection with a per-terminal status table noting macOS Terminal / stock Windows Terminal cannot distinguish the keystroke at the protocol level. * `website/docs/getting-started/quickstart.md`, `website/docs/guides/tips.md` — short mention pointing readers at the full compatibility note in `cli.md`. Tested ====== pytest tests/cli/test_cli_shift_enter_newline.py # 6 passed Live-tested by triggering `\x1b[13;2u` against the running Vt100Parser (see test). Not exercised in a real terminal end-to-end because that requires a Kitty-protocol-capable host; the test exercises the parser path that drives the live terminal too.	2026-05-08 16:26:51 -07:00
Teknium	26bac67ef9	fix(entry-points): guard hermes_bootstrap import so partial updates don't brick hermes (#22091 ) teknium1 hit ModuleNotFoundError: No module named 'hermes_bootstrap' after a code update, on both his Windows machine AND his Linux workstation. The failure mode is real and affects every user who updates hermes by any path OTHER than a fully-successful ``hermes update``. ## What happens hermes_bootstrap.py is a top-level module registered via pyproject.toml's ``py-modules`` list (added by Brooklyn's Windows UTF-8 stdio work). It must be registered in the venv's editable-install .pth file before Python can find it as a bare ``import hermes_bootstrap``. ``hermes update`` handles this correctly: (1) git reset --hard, (2) clear __pycache__, (3) uv pip install -e . (re-registers the package including the new py-modules list), (4) restart. BUT if any step AFTER (1) fails — network blip during pip install, PEP 668 on a system Python, venv locked, uv not in PATH, a crash mid-update — the user is left with new code that references hermes_bootstrap and a venv that doesn't know about it. Every hermes invocation after that crashes with ModuleNotFoundError, including ``hermes update`` itself. No recovery path without manual `uv pip install -e .`. Also affects users who ``git pull`` the repo directly without running hermes update — relatively common for developers. ## Fix Wrap ``import hermes_bootstrap`` in a try/except ModuleNotFoundError across all 6 entry points (hermes_cli/main, run_agent, gateway/run, acp_adapter/entry, cli, batch_runner). On Windows, missing bootstrap means the UTF-8 stdio setup doesn't run — degraded behavior (Unicode chars may fail to print) but NOT a crash. POSIX is unaffected either way since the bootstrap is a no-op there. Once hermes is running again, the user can ``hermes update`` to fully recover. ## Test update tests/test_hermes_bootstrap.py::test_entry_point_imports_bootstrap scans for the first top-level import in each entry point and asserts it is hermes_bootstrap. Extended the check to accept a Try block whose body is a lone Import of hermes_bootstrap — that's the recovery-friendly form we just introduced. Verified behavior by ``mv hermes_bootstrap.py hermes_bootstrap.py.bak`` and confirming ``python -c "import hermes_cli.main"`` succeeds. 82/82 tests pass (hermes_bootstrap + windows-native + windows-compat).	2026-05-08 14:43:13 -07:00
Teknium	0ba1e12abc	fix(windows): browser tool + spurious SIGINT from subprocess spawning Three related Windows-only fixes that together make the browser toolset actually usable on Windows. Symptom chain: user invokes browser_navigate -> tool returns {"success": false, "error": "Daemon process exited during startup with no error output"} and the CLI exits mid-turn with the session summary. Root cause (3 layers): 1. tools/browser_tool.py::_find_agent_browser() resolved node_modules/.bin/agent-browser to the extensionless POSIX shell shim via Path.exists(). On Windows, CreateProcessW cannot execute that script (WinError 193 "not a valid Win32 application"). Fix: delegate to shutil.which with path=node_modules/.bin so PATHEXT picks up agent-browser.CMD on Windows and the extensionless shim stays correct on POSIX. 2. Windows Terminal / Win32 delivers a spurious CTRL_C_EVENT to the parent hermes.exe whenever a background thread spawns a .cmd subprocess. Python 3.11's default SIGINT handler raises KeyboardInterrupt in MainThread, which unwinds prompt_toolkit's app.run() -> cli.py::run()'s finally block calls _run_cleanup() -> _emergency_cleanup_all_sessions -> spawns a concurrent _run_browser_command("close", ...) on the same session the agent thread just opened. Two agent-browser processes race on the same --session name, the daemon startup loses, and the tool returns the "Daemon process exited during startup" error. Fix: install a Windows-only SIGINT handler that absorbs the signal silently. Real user Ctrl+C still routes through prompt_toolkit's own c-c keybinding at the TUI layer, which is how Claude Code handles the same quirk (driving cancellation via the TUI key handler, not signals). 3. In tools/browser_tool.py, both Popen sites now pass creationflags=CREATE_NO_WINDOW \| STARTF_USESTDHANDLES with close_fds=True on Windows. CREATE_NO_WINDOW suppresses the .cmd console flash; STARTF_USESTDHANDLES + close_fds ensures the child inherits only our three chosen handles (DEVNULL stdin, temp-file stdout/stderr) and no leaked parent console handles that could confuse agent-browser's native daemon spawn. Notably we do NOT add CREATE_NEW_PROCESS_GROUP - on Python 3.11 Windows the flag interacts badly with asyncio's ProactorEventLoop and makes things worse. Verified end-to-end on Windows 10 / Windows Terminal / PowerShell: browser_navigate to https://example.com returns {"success": true, "title": "Example Domain"} and the CLI stays alive for follow-up tool calls and assistant turns. Refs: earlier Windows quirks commits 1cebb3bad (Ctrl+Enter newline), 26f5af52a (environment hints), aefd1a37f (Playwright Chromium).	2026-05-08 14:27:40 -07:00
Teknium	d1838041e5	feat: Ctrl+Enter inserts newline on Windows Terminal Windows Terminal intercepts Alt+Enter for its fullscreen shortcut, leaving Windows users with no Enter-involving way to insert a newline in the Hermes prompt. Fix it by reclaiming c-j on Windows only: - _bind_prompt_submit_keys now binds c-j (LF) to submit only on POSIX, where thin PTYs (docker exec, some SSH configs) deliver Enter as LF. On Windows plain Enter is always c-m, so c-j is free. - Windows-only prompt binding: c-j inserts a newline. Windows Terminal sends Ctrl+Enter as LF, so the user-facing keystroke is Ctrl+Enter — no terminal settings changes required. - Alt+Enter binding unchanged; still works on mac/Linux/WSL. - Test TestPromptToolkitTerminalCompatibility::test_lf_enter_binds_to_submit_handler split into platform-aware assertions for POSIX vs win32. - Fixed the Ctrl+J claim in hermes_cli/tips.py (was wrong before this commit even on POSIX) to point Windows users at Ctrl+Enter. Tradeoff: on Windows, raw Ctrl+J (without Enter) also inserts a newline, since WT collapses Ctrl+Enter and Ctrl+J to the same c-j keycode. No conflicting Hermes binding existed for Ctrl+J, so this is a harmless side effect.	2026-05-08 14:27:40 -07:00
Teknium	cbce5e93fc	codebase: add encoding='utf-8' to all bare open() calls (PLW1514) Closes the last Python-on-Windows UTF-8 exposure by making every text-mode open() call explicit about its encoding. Before: on Windows, bare open(path, 'r') defaults to the system locale encoding (cp1252 on US-locale installs). That means reading any config/yaml/markdown/json file with non-ASCII content either crashes with UnicodeDecodeError or silently mis-decodes bytes. After: all 89 affected call sites in production code now pass encoding='utf-8' explicitly. Works identically on every platform and every locale, no surprise behavior. Mechanical sweep via: ruff check --preview --extend-select PLW1514 --unsafe-fixes --fix --exclude 'tests,venv,.venv,node_modules,website,optional-skills, skills,tinker-atropos,plugins' . All 89 fixes have the same shape: open(x) or open(x, mode) became open(x, encoding='utf-8') or open(x, mode, encoding='utf-8'). Nothing else changed. Every modified file still parses and the Windows/sandbox test suite is still green (85 passed, 14 skipped, 0 failed across tests/tools/test_code_execution_windows_env.py + tests/tools/test_code_execution_modes.py + tests/tools/test_env_passthrough.py + tests/test_hermes_bootstrap.py). Scope notes: - tests/ excluded: test fixtures can use locale encoding intentionally (exercising edge cases). If we want to tighten tests later that's a separate PR. - plugins/ excluded: plugin-specific conventions may differ; plugin authors own their code. - optional-skills/ and skills/ excluded: skill scripts are user-authored and we don't want to mass-edit them. - website/ and tinker-atropos/ excluded: vendored / generated content. 46 files touched, 89 +/- lines (symmetric replacement). No behavior change on POSIX or on Windows when the file is ASCII; bug fix on Windows when the file contains non-ASCII.	2026-05-08 14:27:40 -07:00
Teknium	d94fb47717	hermes_bootstrap: Windows-only UTF-8 stdio shim for all entry points Codebase-wide fix for Python-on-Windows UTF-8 footguns, complementing the earlier execute_code sandbox fixes (which remain load-bearing for when the sandbox explicitly scrubs child env). Problem: Python on Windows has two long-standing text-encoding pitfalls: 1. sys.stdout/stderr are bound to the console code page (cp1252 on US-locale installs) — print('café') crashes with UnicodeEncodeError. 2. Subprocess children don't know to use UTF-8 unless PYTHONUTF8 and/or PYTHONIOENCODING are set in their env — so any Python we spawn (linters, sandbox children, delegation workers) hits the same bug. Solution: A tiny bootstrap module (hermes_bootstrap.py) imported as the first statement of every Hermes entry point: - hermes_cli/main.py (hermes / hermes-agent console_script) - run_agent.py (hermes-agent direct) - acp_adapter/entry.py (hermes-acp) - gateway/run.py (messaging gateway) - batch_runner.py (parallel batch mode) - cli.py (legacy direct-launch CLI) On Windows, the bootstrap: - os.environ.setdefault('PYTHONUTF8', '1') (PEP 540 UTF-8 mode) - os.environ.setdefault('PYTHONIOENCODING', 'utf-8') - sys.stdout/stderr/stdin.reconfigure(encoding='utf-8', errors='replace') Children inherit the env vars → they run in UTF-8 mode. Current process's stdio is reconfigured → print('café') works now. On POSIX (Linux/macOS), the bootstrap is a complete no-op. We don't touch LANG, LC_, or anything else — users who have intentionally configured a non-UTF-8 locale aren't affected. POSIX systems are already UTF-8 by default in 99% of modern setups, so there's nothing to fix. setdefault() (not overwrite) means users who explicitly set PYTHONUTF8=0 or PYTHONIOENCODING=cp1252 in their environment are respected. What this does NOT fix: bare open(path, 'w') calls in the parent* process still default to locale encoding because PYTHONUTF8 is only read at interpreter init. A ruff PLW1514 sweep (separate follow-up) will add explicit encoding='utf-8' at those ~219 call sites for belt-and-suspenders. Tests (17): 16 passed, 1 skipped on Windows. - Windows: env vars set, stdio reconfigured, child inherits UTF-8 mode - POSIX: complete no-op (verified on fake POSIX + skipped on real POSIX since we don't have a Linux box in this session) - Idempotence: multiple calls safe - Graceful degradation: non-reconfigurable streams don't crash - User opt-out: explicit PYTHONUTF8=0 is respected - Load order: every entry point's FIRST top-level import is hermes_bootstrap, enforced by an AST-level parametrized test pyproject.toml: added hermes_bootstrap to py-modules so it ships with pip installs.	2026-05-08 14:27:40 -07:00
Teknium	e93bfc6c93	feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags Second pass on native Windows support, driven by a systematic audit across five areas: POSIX-only primitives (signal.SIGKILL/SIGHUP/SIGPIPE, os.WNOHANG, os.setsid), path translation bugs (/c/Users → C:\Users), subprocess patterns (npm.cmd batch shims, start_new_session no-op on Windows), subsystem health (cron, gateway daemon, update flow), and module-level import guards. Every change is platform-gated — POSIX (Linux/macOS) behaviour is preserved bit-identical. Explicit "do no harm" test: test_posix_path_preserved_on_linux, test_posix_noop, test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix. ## New module - hermes_cli/_subprocess_compat.py — shared helpers (resolve_node_command, windows_detach_flags, windows_hide_flags, windows_detach_popen_kwargs). All no-ops on non-Windows. ## CRITICAL fixes (would crash or silently break on Windows) - tui_gateway/entry.py: SIGPIPE/SIGHUP referenced at module top level would AttributeError on import on Windows, breaking `hermes --tui` entirely (it spawns this module as a subprocess). Guard each signal.signal() call with hasattr() and add SIGBREAK as Windows' SIGHUP equivalent. - hermes_cli/kanban_db.py: os.waitpid(-1, os.WNOHANG) in dispatcher tick was unguarded. os.WNOHANG doesn't exist on Windows. Gate the whole reap loop behind `os.name != "nt"` — Windows has no zombies anyway. - tools/code_execution_tool.py: AF_UNIX socket for execute_code RPC fails on most Windows builds. Fall back to loopback TCP (AF_INET on 127.0.0.1:0 ephemeral port) when _IS_WINDOWS. HERMES_RPC_SOCKET env var now accepts either a filesystem path (POSIX) or `tcp://127.0.0.1:<port>` (Windows). Generated sandbox client parses both. - cron/scheduler.py: `argv = ["/bin/bash", str(path)]` hardcoded. Use shutil.which("bash") so Windows (Git Bash via MinGit) works, with a readable error when bash is genuinely absent. - 6 bare npm/npx spawn sites: tools_config.py x2, doctor.py, whatsapp.py (npm install + node version probe), browser_tool.py x2. On Windows npm is npm.cmd / npx is npx.cmd (batch shims); subprocess.Popen(["npm", ...]) fails with WinError 193. shutil.which(...) returns the absolute .cmd path which CreateProcessW accepts because the extension routes through cmd.exe /c. POSIX behaviour unchanged (shutil.which still returns the same path subprocess would resolve itself). ## HIGH fixes (silent misbehaviour on Windows) - tools/environments/local.py get_temp_dir: hardcoded /tmp returned on Windows meant `_cwd_file = "/tmp/hermes-cwd-*.txt"`, which bash wrote via MSYS2's virtual /tmp but native Python couldn't open. Result: cwd tracking silently broken — `cd` in terminal tool did nothing. Windows branch now returns `%HERMES_HOME%/cache/terminal` with forward slashes (works in both bash and Python, guaranteed no spaces). - tools/environments/local.py _make_run_env PATH injection: `/usr/bin not in split(":")` heuristic mangles Windows PATH (";" separator). Gate the injection behind `not _IS_WINDOWS`. - hermes_cli/gateway.py launch_detached_profile_gateway_restart: outer Popen + watcher-script Popen both used start_new_session=True, which Windows silently ignores. Watcher stayed attached to CLI's console, died when user closed terminal after `hermes update`, left gateway stale. Now branches through windows_detach_popen_kwargs() helper (CREATE_NEW_PROCESS_GROUP \| DETACHED_PROCESS \| CREATE_NO_WINDOW on Windows, start_new_session=True on POSIX — identical to main). ## MEDIUM fixes - gateway/run.py /restart and /update handlers: hardcoded bash/setsid chain crashes on Windows when user triggers /update in-gateway. Now has sys.platform=="win32" branch using sys.executable + a tiny Python watcher with proper detach flags. POSIX path is unchanged. - cli.py _git_repo_root: Git on Windows sometimes returns /c/Users/... style paths that break subprocess.Popen(cwd=...) and Path().resolve(). Added _normalize_git_bash_path() helper that translates /c/Users, /cygdrive/c, /mnt/c variants to native C:\Users form. POSIX no-op. _git_repo_root() now routes every result through it. - cli.py worktree .worktreeinclude: os.symlink on directories failed hard on Windows (requires admin or Developer Mode). Falls back to shutil.copytree with a warning log. ## Tests - 29 new tests in tests/tools/test_windows_native_support.py covering: subprocess_compat helpers, TUI entry signal guards, kanban waitpid guard, code_execution TCP fallback source-level invariants, cron bash resolution, npm/npx bare-spawn lint per-file, local env Windows temp dir, PATH injection gating, git bash path normalization, symlink fallback, gateway detached watcher flags. - One existing test assertion adjusted in test_browser_homebrew_paths: it compared captured Popen argv to the BARE `"npx"` literal; after the shutil.which() change argv[0] is the absolute path. New assertion checks the shape (two items, second is `agent-browser`) rather than the exact first-item string. Behaviour unchanged; test was too strict. All 56 tests pass on Linux (30 from previous commits + 26 new). 267 tests from the affected files/dirs (browser, code_exec, local_env, process_registry, kanban_db, windows_compat) all pass — zero regressions. tests/hermes_cli/ (3909 pass) and tests/gateway/ (5021 pass) unchanged; all pre-existing test failures confirmed unrelated via `git stash` re-run. ## What's still deferred (LOW priority) - Visible cmd-window flashes on short-lived console apps (~14 sites) — cosmetic, needs a follow-up pass once we have user reports. - agent/file_safety.py POSIX-only security deny patterns — separate hardening task. - tools/process_registry.py returning "/tmp" as fallback — theoretical; reachable only when all env-var candidates fail.	2026-05-08 14:27:40 -07:00
Teknium	9de893e3b0	feat(windows): close native-Windows install gaps — crash-free startup, UTF-8 stdio, tzdata dep, docs Native Windows (with Git for Windows installed) can now run the Hermes CLI and gateway end-to-end without crashing. install.ps1 already existed and the Git Bash terminal backend was already wired up — this PR fills the remaining gaps discovered by auditing every Windows-unsafe primitive (`signal.SIGKILL`, `os.kill(pid, 0)` probes, bare `fcntl`/`termios` imports) and by comparing hermes against how Claude Code, OpenCode, Codex, and Cline handle native Windows. ## What changed ### UTF-8 stdio (new module) - `hermes_cli/stdio.py` — single `configure_windows_stdio()` entry point. Flips the console code page to CP_UTF8 (65001), reconfigures `sys.stdout`/`stderr`/`stdin` to UTF-8, sets `PYTHONIOENCODING` + `PYTHONUTF8` for subprocesses. No-op on non-Windows. Opt out via `HERMES_DISABLE_WINDOWS_UTF8=1`. - Called early in `cli.py::main`, `hermes_cli/main.py::main`, and `gateway/run.py::main` so Unicode banners (box-drawing, geometric symbols, non-Latin chat text) don't `UnicodeEncodeError` on cp1252 consoles. ### Crash sites fixed - `hermes_cli/main.py:7970` (hermes update → stuck gateway sweep): raw `os.kill(pid, _signal.SIGKILL)` → `gateway.status.terminate_pid(pid, force=True)` which routes through `taskkill /T /F` on Windows. - `hermes_cli/profiles.py::_stop_gateway_process`: same fix — also converted SIGTERM path to `terminate_pid()` and widened OSError catch on the intermediate `os.kill(pid, 0)` probe. - `hermes_cli/kanban_db.py:2914, 3041`: raw `signal.SIGKILL` → `getattr(signal, "SIGKILL", signal.SIGTERM)` fallback (matches the pattern already used in `gateway/status.py`). ### OSError widening on `os.kill(pid, 0)` probes Windows raises `OSError` (WinError 87) for a gone PID instead of `ProcessLookupError`. Widened the catch at: - `gateway/run.py:15101` (`--replace` wait-for-exit loop — without this, the loop busy-spins the full 10s every Windows gateway start) - `hermes_cli/gateway.py:228, 460, 940` - `hermes_cli/profiles.py:777` - `tools/process_registry.py::_is_host_pid_alive` - `tools/browser_tool.py:1170, 1206` ### Dashboard PTY graceful degradation `hermes_cli/pty_bridge.py` depends on `fcntl`/`termios`/`ptyprocess`, none of which exist on native Windows. Previously a Windows dashboard would crash on `import hermes_cli.web_server` because of a top-level import. Now: - `hermes_cli/web_server.py` wraps the pty_bridge import in `try/except ImportError` and sets `_PTY_BRIDGE_AVAILABLE=False`. - The `/api/pty` WebSocket handler returns a friendly "use WSL2 for this tab" message instead of exploding. - Every other dashboard feature (sessions, jobs, metrics, config editor) runs natively on Windows. ### Dependency - `pyproject.toml`: add `tzdata>=2023.3; sys_platform == 'win32'` so Python's `zoneinfo` works on Windows (which has no IANA tzdata shipped with the OS). Credits @sprmn24 (PR #13182). ### Docs - README.md: removed "Native Windows is not supported"; added PowerShell one-liner and Git-for-Windows prerequisite note. - `website/docs/getting-started/installation.md`: new Windows section with capability matrix (everything native except the dashboard `/chat` PTY tab, which is WSL2-only). - `website/docs/user-guide/windows-wsl-quickstart.md`: reframed as "WSL2 as an alternative to native" rather than "the only way". - `website/docs/developer-guide/contributing.md`: updated cross-platform guidance with the `signal.SIGKILL` / `OSError` rules we enforce now. - `website/docs/user-guide/features/web-dashboard.md`: acknowledged native Windows works for everything except the embedded PTY pane. ## Why this shape Pulled from a survey of how other agent codebases handle native Windows (Claude Code, OpenCode, Codex, Cline): - All four treat Git Bash as the canonical shell on Windows, same as hermes already does in `tools/environments/local.py::_find_bash()`. - None of them force `SetConsoleOutputCP` — but they don't have to, Node/Rust write UTF-16 to the Win32 console API. Python does not get that for free, so we flip CP_UTF8 via ctypes. - None of them ship PowerShell-as-primary-shell (Claude Code exposes PS as a secondary tool; scope creep for this PR). - All of them use `taskkill /T /F` for force-kill on Windows, which is exactly what `gateway.status.terminate_pid(force=True)` does. ## Non-goals (deliberate scope limits) - No PowerShell-as-a-second-shell tool — worth designing separately. - No terminal routing rewrite (#12317, #15461, #19800 cluster) — that's the hardest design call and needs a separate doc. - No wholesale `open()` → `open(..., encoding="utf-8")` sweep (Tianworld cluster) — will do as follow-up if users hit actual breakage; most modern code already specifies it. ## Validation - 28 new tests in `tests/tools/test_windows_native_support.py` — all platform-mocked, pass on Linux CI. Cover: - `configure_windows_stdio` idempotency, opt-out, env-preservation - `terminate_pid` taskkill routing, failure → OSError, FileNotFoundError fallback - `getattr(signal, "SIGKILL", …)` fallback shape - `_is_host_pid_alive` OSError widening (Windows-gone-PID behavior) - Source-level checks that all entry points call `configure_windows_stdio` - pty_bridge import-guard present in `web_server.py` - README no longer says "not supported" - 12 pre-existing tests in `tests/tools/test_windows_compat.py` still pass. - `tests/hermes_cli/` ran fully (3909 passed, 9 failures — all confirmed pre-existing on main by stash-test). - `tests/gateway/` ran fully (5021 passed, 1 pre-existing failure). - `tests/tools/test_process_registry.py` + `test_browser_*` pass. - Manual smoke: `import hermes_cli.stdio; import gateway.run; import hermes_cli.web_server` — all clean, `_PTY_BRIDGE_AVAILABLE=True` on Linux (as expected). ## Files - New: `hermes_cli/stdio.py`, `tests/tools/test_windows_native_support.py` - Modified: `cli.py`, `gateway/run.py`, `hermes_cli/main.py`, `hermes_cli/profiles.py`, `hermes_cli/gateway.py`, `hermes_cli/kanban_db.py`, `hermes_cli/pty_bridge.py`, `hermes_cli/web_server.py`, `tools/browser_tool.py`, `tools/process_registry.py`, `pyproject.toml`, `README.md`, and 4 docs pages. Credits to everyone whose prior PR work informed these fixes — see the co-author trailers. All of the PRs listed in `~/.hermes/plans/windows-support-prs.md` fixing `os.kill` / `signal.SIGKILL` / UTF-8 stdio / tzdata / README patterns found the same issues; this PR consolidates them. Co-authored-by: Philip D'Souza <9472774+PhilipAD@users.noreply.github.com> Co-authored-by: Arecanon <42595053+ArecaNon@users.noreply.github.com> Co-authored-by: XiaoXiao0221 <263113677+XiaoXiao0221@users.noreply.github.com> Co-authored-by: Lars Hagen <1360677+lars-hagen@users.noreply.github.com> Co-authored-by: Luan Dias <65574834+luandiasrj@users.noreply.github.com> Co-authored-by: Ruzzgar <ruzzgarcn@gmail.com> Co-authored-by: sprmn24 <oncuevtv@gmail.com> Co-authored-by: adybag14-cyber <252811164+adybag14-cyber@users.noreply.github.com> Co-authored-by: Prasanna28Devadiga <54196612+Prasanna28Devadiga@users.noreply.github.com>	2026-05-08 14:27:40 -07:00
Teknium	850413f120	feat(computer-use): cua-driver backend, universal any-model schema Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. - CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl\|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919` — deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions — the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic- native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.	2026-05-08 11:07:38 -07:00
Teknium	674fad1483	fix(goals): Ctrl+C during /goal loop auto-pauses the goal (#21888 ) Reported: Ctrl+C during an active /goal loop felt like it did nothing — the agent would interrupt the current turn, then immediately queue another continuation and keep going until the session ended or the 20-turn budget ran out. Root cause: cli.py's _maybe_continue_goal_after_turn() ran in the finally: block around self.chat(...) unconditionally. Whether the turn completed normally, got interrupted, or returned an empty string, the judge ran on whatever was in conversation_history and — because the judge is fail-open — a "continue" verdict pushed another CONTINUATION_PROMPT onto _pending_input. Ctrl+C was invisible to the hook. Fix: - chat() now captures result['interrupted'] onto self._last_turn_interrupted (resets to False at entry so early-returns don't leak prior state). - _maybe_continue_goal_after_turn() checks the flag first: on interrupt, auto-pause via mgr.pause(reason='user-interrupted (Ctrl+C)') and print a one-liner pointing the user at /goal resume or /goal clear. No judge call, no continuation enqueued. - Also added an empty-response guard that mirrors gateway/run.py's _handle_message logic (empty reply → transient failure → skip judging so we don't trip the consecutive-parse-failures backstop unnecessarily). The goal stays in the DB as paused, so /goal resume recovers it after the user has sorted out whatever made them cancel. /goal clear still works as before for a full stop. Tests: tests/cli/test_cli_goal_interrupt.py covers: - interrupted turn pauses + doesn't queue + judge is NOT called - paused goal is resumable - empty / whitespace / missing assistant reply skips judging - healthy turn still enqueues continuation / marks done - chat() resets _last_turn_interrupted at entry (anti-leak guard) All 55 existing goal tests still pass.	2026-05-08 06:53:13 -07:00
kshitij	7338e5d9ba	fix(model-switch): prevent stale Ollama credentials after provider switch (#21703 ) When switching from a custom local provider (e.g. ollama-launch) to a cloud provider, two bugs caused the CLI to misbehave: 1. _explicit_api_key/_explicit_base_url were only updated when the switch result had non-empty values (guarded by `if result.api_key:` etc.). If the previous provider set these to Ollama values ("ollama", "http://127.0.0.1:11434/v1"), those stale values leaked into the next turn's _ensure_runtime_credentials() call and were forwarded to the new provider's API endpoint, causing authentication/routing failures. Fix: unconditionally write result.api_key/base_url into the explicit fields after every successful switch. An empty string is the correct sentinel — it tells _ensure_runtime_credentials to re-resolve from the auth store / config rather than forwarding a stale override. 2. In AIAgent.switch_model(), `self.base_url = base_url or self.base_url` kept the old Ollama localhost URL whenever the incoming base_url was an empty string. For providers that use a native SDK (not an OpenAI-compat endpoint), the caller passes base_url="" and expects the agent to clear the field — not silently inherit Ollama's address. Fix: only update self.base_url when base_url is truthy. 3. _handle_model_picker_selection() was called from the prompt_toolkit Enter key binding without any exception guard. Any unexpected error in the model-selection code path propagated through prompt_toolkit's key-binding dispatcher and caused the entire TUI to exit — which the user sees as "the terminal exits when I switch providers". Fix: wrap the call in try/except and close the picker on failure.	2026-05-08 14:28:54 +05:30
Austin Pickett	d87c7b99e2	fix(analytics): prevent silent token loss and add Claude 4.5–4.7 pricing (#21455 ) - Add pricing entries for Claude Opus 4.5/4.6/4.7, Sonnet 4.5/4.6, and Haiku 4.5 with updated source URLs (platform.claude.com) - Add _normalize_anthropic_model_name() to handle dot-notation variants (e.g. claude-opus-4.7 → claude-opus-4-7) for pricing lookups - Fix silent token loss: ensure session row exists before UPDATE in both run_agent.py and hermes_state.py (INSERT OR IGNORE is idempotent) - Log token persistence failures at DEBUG level instead of swallowing them silently — makes undercounted analytics diagnosable - Surface reasoning tokens in CLI /usage and TUI usage panel - Add 'reasoning' and 'cost_status' fields to TUI Usage type	2026-05-07 13:24:31 -07:00
oluwadareab12	edbbc96b55	fix(cli): replace get_event_loop() with get_running_loop() to silence RuntimeWarning in process_loop thread (#19285 )	2026-05-07 06:35:54 -07:00
Teknium	6e46f99e7e	fix(tui): surface backend error as visible text when final_response is empty (#21245 ) When the provider rejects a request (e.g. invalid model slug like '--provider nous --model kimi-k2.6' where the valid slug is 'moonshotai/kimi-k2.6'), run_conversation() returns {failed: True, error: <detail>, final_response: None}. The TUI gateway and one-shot CLI mode both dropped the error on the floor and emitted an empty turn, so the user saw a blank response with no indication that anything went wrong. Mirror the interactive CLI's existing pattern (cli.py:9832): when final_response is empty AND (failed\|partial) is set AND error is populated, surface 'Error: <detail>' as the visible text. Leaves the None-with-no-error path and the '(empty)' sentinel path untouched — an empty successful turn still renders empty, and existing sentinel handlers keep owning their lane. Reported by @counterposition in PR #20873; taking a minimal fix rather than the broader structured-failure refactor proposed there.	2026-05-07 05:53:19 -07:00
Sofia Yang	f5a232af84	refactor: replace 'cmp' text with 🗜️ emoji in status bar Address review feedback to use the clamp emoji (��️) instead of the plain text 'cmp' prefix for the compression count indicator. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-07 05:27:45 -07:00
Sofia Yang	103e11926f	feat(cli): show context compression count in status bar Display the number of context compressions in the CLI status bar when compressions > 0, helping users understand conversation compression pressure during long sessions. - Wide layout (>=76 cols): shows 'cmp N' between context percent and duration - Medium layout (52-75 cols): shows 'cmp N' between percent and duration - Narrow layout (<52 cols): omitted to save space - Color-coded: dim for 1-4, warn for 5-9, bad for 10+ - Hidden when zero to keep the bar clean for new sessions Closes #18564	2026-05-07 05:27:45 -07:00
Teknium	fb1ce793e6	feat(security): enable secret redaction by default (#17691 , #20785 ) (#21193 ) Flip the default for HERMES_REDACT_SECRETS from off to on so the redactor already wired into send_message_tool, logs, and tool output actually runs on a fresh install. - agent/redact.py: env-var default "" → "true" - hermes_cli/config.py: DEFAULT_CONFIG security.redact_secrets True; two config-template comments rewritten - gateway/run.py + cli.py: startup log / banner warning when the user has explicitly opted out, so the downgrade is visible in agent.log and at CLI banner time - docs/reference/environment-variables.md: description reconciled - tests: flipped the default-pin, restructured the force=True regression test to explicit-false instead of unset Users who need raw credential values (redactor development) can still opt out via security.redact_secrets: false in config.yaml or HERMES_REDACT_SECRETS=false in .env. Closes #17691. Addresses #20785 (short-term output-pipeline recommendation).	2026-05-07 05:10:33 -07:00
brooklyn!	5044e1cbf1	fix(cli): submit LF enter in thin PTYs (#20896 )	2026-05-06 13:51:13 -07:00
Cleo	906881c38b	fix(cli): catch OSError in _resolve_attachment_path to prevent ENAMETOOLONG dropping long slash commands When the user pastes a long slash command like \`/goal <long prose>\` into \`hermes chat\`, the input flows into \`_detect_file_drop()\`, whose \`starts_like_path\` prefilter accepts anything starting with \`/\` and forwards it to \`_resolve_attachment_path()\`. That helper calls \`Path.exists()\` which invokes \`os.stat()\`, which raises \`OSError(errno=ENAMETOOLONG)\` — 63 on macOS, 36 on Linux — when the candidate exceeds NAME_MAX (typically 255 bytes). The OSError propagates up to the broad \`except Exception\` in \`process_loop\` (cli.py:11798), gets logged at WARNING level, and the user's input is silently dropped. From the user's POV the chat prompt hangs — the only signal is in agent.log: WARNING cli: process_loop unhandled error (msg may be lost): [Errno 63] File name too long: "/goal Drive the space board..." This affects any slash command with prose-length arguments — \`/goal\` in particular but also \`/skill\`, \`/cron\`, custom user commands. Fix: wrap the \`exists()\`/\`is_file()\` calls in try/except OSError so structurally-invalid path candidates cleanly return None. The slash- command dispatch path downstream (cli.py:11718) then handles the input correctly. Tests: two new regression cases in test_cli_file_drop.py cover the original \`/goal\` reproducer and a synthetic long path. All 35 file- drop tests pass. Reproducer (without the fix): python -c "from cli import _detect_file_drop; _detect_file_drop('/goal ' + 'a'*300)" → OSError: [Errno 63] File name too long	2026-05-06 06:34:48 -07:00
Teknium	a0fedfbb1b	feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709 ) Replaces the per-directory shadow-repo design with a single shared shadow git store at ~/.hermes/checkpoints/store/. Object DB is now deduplicated across every working directory the agent has ever touched; a dozen worktrees of the same project cost near-zero in additional disk. Why --- Pre-v2 design had three compounding problems that let ~/.hermes/checkpoints/ grow to multi-GB on active machines: 1. Each working directory got its own full shadow git repo — no object dedup across projects or across worktrees of the same project. 2. _prune() was a documented no-op: max_snapshots only limited the /rollback listing. Loose objects accumulated forever. 3. Defaults: enabled=True, auto_prune=False — users paid the disk cost without ever asking for /rollback. Field report on a single workstation: 847 MB across 47 shadow repos, mostly redundant clones of the hermes-agent source tree. Changes ------- - tools/checkpoint_manager.py: full rewrite. Single bare store, per-project refs (refs/hermes/<hash>), per-project indexes (store/indexes/<hash>), per-project metadata (store/projects/<hash>.json with workdir + created_at + last_touch). On first v2 init, any pre-v2 per-directory shadow repos are auto-migrated into legacy-<timestamp>/ so the new store starts clean. _prune() now actually rewrites the per-project ref to the last max_snapshots commits and runs git gc --prune=now. New _enforce_size_cap() drops oldest commits round-robin across projects when the store exceeds max_total_size_mb. _drop_oversize_from_index() filters any single file larger than max_file_size_mb out of the snapshot. - hermes_cli/checkpoints.py: new 'hermes checkpoints' CLI (status / list / prune / clear / clear-legacy) for managing the store outside a session. - hermes_cli/config.py: flipped defaults — enabled=False, max_snapshots=20, auto_prune=True. Added max_total_size_mb=500, max_file_size_mb=10. Tightened DEFAULT_EXCLUDES (added target/, .so/.dylib/.dll, .mp4/.mov, .zip/*.tar.gz, .worktrees/, .mypy_cache/, etc.). - run_agent.py / cli.py / gateway/run.py: thread the new kwargs through AIAgent and the startup auto_prune hooks. - Tests rewritten to match v2 storage while keeping backwards-compat coverage for the pre-v2 prune path (per-directory shadow repos under base/ are still swept correctly for anyone mid-migration). - Docs updated: user-guide/checkpoints-and-rollback.md explains the shared store, new defaults, migration, and the new CLI; reference/cli-commands.md documents 'hermes checkpoints'. E2E validated ------------- - Legacy migration: pre-v2 shadow repos auto-archived into legacy-<ts>/. - Object dedup: two projects with an identical shared.py blob resolve to 7 total objects in the store (v1 would have stored the blob twice). - max_snapshots=3 actually enforced: after 6 commits, list shows 3. - Orphan prune: deleting a project's workdir + 'hermes checkpoints prune --retention-days 0' removes its ref, index, and metadata; GC reclaims the objects. - max_file_size_mb=1 excludes a 2 MB weights.bin while keeping the tracked source code files. - hermes checkpoints {status,prune,clear,clear-legacy} all work from the CLI without an agent running. Breaking / migration -------------------- No in-place data migration — legacy per-directory shadow repos are moved into legacy-<timestamp>/ on first run. Old /rollback history is still accessible by inspecting the archive with git; run 'hermes checkpoints clear-legacy' to reclaim the space when ready. Users relying on /rollback must now set checkpoints.enabled=true (or pass --checkpoints) explicitly.	2026-05-06 05:44:35 -07:00
helix4u	76074d9ee6	fix(cli): recover classic CLI output after resize	2026-05-06 04:20:54 -07:00
adybag14-cyber	e45df2e81e	fix(ui): reduce status-line jitter while scrolling	2026-05-06 04:02:09 -07:00
Teknium	e70e49016f	fix(cli): guard logger.debug in signal handler (#13710 regression) (#20673 ) CPython's logging module is not reentrant-safe. `Logger.isEnabledFor` caches level results in `Logger._cache`; under shutdown races the cache can be cleared (`Logger._clear_cache`, triggered by logging config changes from another thread) or mid-mutation when a signal fires, raising `KeyError: <level_int>` (e.g. `KeyError: 10` for DEBUG) inside the signal handler. When that happens, the KeyError escapes before the `raise KeyboardInterrupt()` on the next line can fire, which bypasses prompt_toolkit's normal interrupt unwind and surfaces as the EIO cascade originally reported in #13710. Issue #13710 shipped two defenses (asyncio exception handler + outer `except (KeyError, OSError)` with EIO suppression) that cover the EIO unwind path. This patch closes the remaining escape hatch: the `logger.debug` call at the top of `_signal_handler` itself. Wrap it in a bare `try/except Exception: pass` so logging can never raise through a signal handler. Observed in the wild: debug report on 0.12.0 (commit `8163d371`) shows the exact stack — KeyError: 10 at logging/__init__.py:1742 inside the signal handler's `logger.debug`, followed by the EIO cascade from prompt_toolkit's emergency flush. Tests: adds `TestSignalHandlerLoggingRace` to `tests/hermes_cli/test_suppress_eio_on_interrupt.py` with 6 new cases: - normal path still raises KeyboardInterrupt - KeyError(10) from logger.debug does not escape - any Exception from logger.debug is swallowed - agent.interrupt still fires when logger.debug raises - agent.interrupt raising also does not escape - BaseException (SystemExit) is NOT swallowed — guard uses `except Exception` deliberately so real shutdown signals still propagate Closes #13710 regression.	2026-05-06 03:55:47 -07:00
kshitijk4poor	395dbcc873	feat(browser): add Lightpanda engine support with automatic Chrome fallback Add Lightpanda as an optional browser engine for local mode. Lightpanda is a headless browser built from scratch in Zig -- faster navigation than Chrome with significantly less memory. One config line to enable: browser: engine: lightpanda New functions in browser_tool.py: - _get_browser_engine() -- config/env reader with validation + caching - _should_inject_engine() -- only inject in local non-cloud mode - _needs_lightpanda_fallback() -- detect empty/failed LP results - _chrome_fallback_screenshot() -- temporary Chrome session for screenshots - Engine injection in _run_browser_command (--engine flag) - browser_vision pre-routes screenshots to Chrome when engine=lightpanda Config: - browser.engine in DEFAULT_CONFIG (auto/lightpanda/chrome) - AGENT_BROWSER_ENGINE in OPTIONAL_ENV_VARS - /browser status shows engine info in local mode Rebased from PR #7144 onto current main. All existing code preserved -- pure additions only (+520/-2). 25 new tests + 81 total browser tests pass (0 failures).	2026-05-06 03:23:19 -07:00
Siddharth Balyan	3b750715a3	fix: resolve lazy session creation regressions (#18370 fallout) (#20363 ) Fix three regressions introduced by PR #18370 (lazy session creation): 1. _finalize_session() uses stale session_key after compression (#20001) 2. session_key not synced after auto-compression in run_conversation (#20001) 3. pending_title ValueError leaves title wedged forever (#19029) 4. Gateway silently swallows null responses when agent did work (#18765) 5. One-time cleanup for accumulated ghost compression continuations (#20001) Changes: - tui_gateway/server.py: _finalize_session() now uses agent.session_id (falls back to session_key when agent is None). Refactor _sync_session_key_after_compress() with clear_pending_title and restart_slash_worker policy flags. Call it post-run_conversation() to sync session_key after auto-compression. Add ValueError handler to pending_title flush. - gateway/run.py: Extract _normalize_empty_agent_response() helper that consolidates failed/partial/null response handling. Surfaces user-facing error when agent did work (api_calls > 0) but returned no text. - hermes_state.py: Add finalize_orphaned_compression_sessions() — marks ghost continuation sessions as ended (non-destructive, preserves data). - cli.py: One-time startup migration for orphaned compression sessions. Test changes: - tests/test_tui_gateway_server.py: Update pending_title ValueError test for post-#18370 architecture (title applied post-message, not at create). - tests/test_lazy_session_regressions.py: 14 new regression tests covering all fixed paths.	2026-05-06 01:11:49 +05:30
Justin Kausel	e805380b82	Discover plugin commands during CLI dispatch	2026-05-05 09:58:37 -07:00
novax635	4e6f51167d	fix(cli): fall back on invalid HERMES_MAX_ITERATIONS	2026-05-05 06:11:03 -07:00
revaraver	aacf36e943	fix(cli): persist manual compress handoff	2026-05-05 04:42:48 -07:00
brooklyn!	20428f5e60	fix(tui): respect voice.record_key config (supersedes #19028 , #19339 ) (#19835 ) * fix(tui): respect voice.record_key config instead of hardcoded Ctrl+B Classic CLI loaded ``voice.record_key`` from config.yaml and bound the prompt-toolkit handler dynamically (``cli.py`` paths). The new TUI hard- coded ``Ctrl+B`` everywhere — ``isVoiceToggleKey`` (input handler), ``/voice status`` ("Record key: Ctrl+B"), and ``/voice on`` ("Ctrl+B to start/stop recording"). A user who set ``voice.record_key: ctrl+o`` (or any other key) saw the documented config silently ignored — only Ctrl+B worked, the displayed shortcut lied about it. Wire the configured key end to end through the existing channels: * Backend (``tui_gateway/server.py``): ``voice.toggle`` action=status AND action=on/off responses now include ``record_key``, sourced from ``config.get('voice', {}).get('record_key', 'ctrl+b')``. * Backend types (``ui-tui/src/gatewayTypes.ts``): ``ConfigFullResponse`` now exposes ``config.voice.record_key`` and ``VoiceToggleResponse`` carries ``record_key`` so the TUI can both bind and display it. * Frontend parser/formatter (``ui-tui/src/lib/platform.ts``): ``parseVoiceRecordKey()`` accepts ``ctrl+b`` / ``alt+r`` / ``cmd+space`` and the common aliases (``option``, ``cmd``, ``win``, …); falls back to the documented Ctrl+B for empty / multi-character / malformed input so a typo never silently disables the shortcut. ``formatVoiceRecordKey()`` renders for status text. ``isVoiceToggleKey`` now takes a parsed ``ParsedVoiceRecordKey`` argument; the hardcoded ``ch === 'b'`` is gone. Default arg keeps existing call sites back-compat. * Hydration (``ui-tui/src/app/useConfigSync.ts``, ``useMainApp.ts``): startup ``config.get full`` already runs; extract ``cfg.voice.record_key`` from it, parse, push into a new ``voiceRecordKey`` state, and forward to the input handler ctx (``InputHandlerContext.voice.recordKey``). Mtime-poll path also re-applies the parsed key so a hand-edit of config.yaml takes effect the next tick — matches existing behaviour for display options. * Input handler (``ui-tui/src/app/useInputHandlers.ts``): ``isVoiceToggleKey(key, ch, voice.recordKey)`` so the configured binding fires. * Slash command (``ui-tui/src/app/slash/commands/session.ts``): ``/voice status`` and ``/voice on`` use ``formatVoiceRecordKey`` on the response's ``record_key`` instead of the hardcoded label. Tests: * ``parseVoiceRecordKey`` covers ctrl/alt/cmd/super aliases, multi-char rejection, and empty fallback. * ``formatVoiceRecordKey`` covers the doc examples (``Ctrl+B``, ``Ctrl+O``, ``Alt+R``, ``Cmd+B``). * ``isVoiceToggleKey`` regression: ``ctrl+o`` configured → only ``o`` matches, not ``b``; ``alt+r`` matches both alt-bit and meta-bit encodings (terminal protocol parity); omitted-arg call still binds Ctrl+B for back-compat. Full TUI suite (555 tests) passes; ``tsc --noEmit`` clean. Fixes #18994 Co-authored-by: asheriif <ahmedsherif95@gmail.com> * fix(tui): support named-key tokens in voice.record_key (space, enter, …) Reviewer caught that the round-1 parser in #18994 rejected every multi-character token, so a config value like ``ctrl+space`` (which the CLI happily binds via prompt_toolkit's ``c-space`` rewrite in ``cli.py``) silently fell back to the documented Ctrl+B default — re-introducing the same false-shortcut bug the PR was meant to fix, just at a different surface. Add explicit named-key support that mirrors what the CLI accepts: * ``space`` (alias: ``spc``) → matches ``ch === ' '`` * ``enter`` (alias: ``return``, ``ret``) → matches ``key.return`` * ``tab`` → matches ``key.tab`` * ``escape`` (alias: ``esc``) → matches ``key.escape`` * ``backspace`` (alias: ``bs``) → matches ``key.backspace`` * ``delete`` (alias: ``del``) → matches ``key.delete`` ``ParsedVoiceRecordKey`` gains an optional ``named`` field; ``ch`` holds either a single char (back-compat) or the canonical named token, and the runtime matcher dispatches on ``named`` before checking the modifier shape. Aliases collapse to one canonical name so ``ctrl+esc`` and ``ctrl+escape`` behave identically. Unrecognised multi-character tokens (e.g. ``ctrl+spcae`` typo, or unsupported keys like ``ctrl+f5``) still fall back to the Ctrl+B default rather than silently disabling the binding — keeps the "typo never silently kills the shortcut" guarantee. Tests: * ``parseVoiceRecordKey`` parametrised over every named token + each alias variant. * New ``isVoiceToggleKey`` cases for space (ch-based match), enter (``key.return``), tab, escape, backspace, delete, including modifier-mismatch negatives. * ``formatVoiceRecordKey`` renders named keys in title case (``Ctrl+Space``, ``Ctrl+Enter``). * Existing fall-back-to-Ctrl+B contract preserved for empty input AND unrecognised multi-char tokens. Full TUI suite: 559/559 pass; ``tsc --noEmit`` clean. Refs #18994 (round-1 review feedback) Co-authored-by: asheriif <ahmedsherif95@gmail.com> * test(tui): assert voice.toggle returns configured record_key Salvage the backend regression from #19339 — asserts ``voice.toggle`` action=on AND action=status responses carry the configured ``voice.record_key`` end-to-end through ``_load_cfg()``. Keeps the CLI→TUI parity contract visible in the Python test suite alongside the existing frontend parser/matcher/formatter coverage from #19028. * fix(tui): address Copilot review on #19835 voice.record_key wiring Five tightenings on the parser + matcher + hydration surface, all caught by the Copilot review on the PR — each one turns a silent false-fire or display/binding skew into a deterministic behaviour. * isVoiceToggleKey ctrl branch was too permissive for named keys. The doc-default macOS Cmd+B muscle-memory fallback (``isActionMod(key)`` on top of ``key.ctrl``) fired for every configured key, so bare Esc — which hermes-ink reports with ``key.meta`` on some macOS terminals — triggered ``ctrl+escape``, and Alt+Space / Alt+Tab triggered ``ctrl+space`` / ``ctrl+tab``. Gate the fallback to the literal ``ctrl+b`` binding so any custom chord requires the real Ctrl bit. * Alt branch guarded against Ctrl/Cmd co-press. Without this, Ctrl+Alt+<letter> and Cmd+Alt+<letter> also fired ``alt+<letter>``. * Dropped the ``meta`` modifier variant and its alias. In hermes-ink ``key.meta`` is Alt on xterm-style terminals and Cmd on legacy macOS ones, so a literal ``meta+b`` config displayed as ``Cmd+B`` while matching Alt+B — exactly the kind of false shortcut the PR was meant to remove. ``cmd`` / ``command`` now collapse onto ``super`` (kitty-style ``key.super``, with a macOS ``key.meta`` fallback) and render as ``Cmd+B``. Unknown modifier tokens fall back to the documented Ctrl+B default rather than silently coercing to Ctrl. * Slash-command display/binding skew. ``/voice status`` and ``/voice on`` rendered from the fresh gateway ``record_key`` response, but ``useInputHandlers()`` still bound the old key until the next 5s mtime poll. Thread ``setVoiceRecordKey`` through ``SlashHandlerContext.voice`` and push the parsed spec into frontend state on every response so text and binding stay consistent. * Test coverage for the two paths Copilot flagged. Added vitest coverage for (a) the three-case ``/voice`` slash output in ``createSlashHandler.test.ts`` and (b) the ``applyDisplay → voice.record_key`` hydration + omit-setter back-compat paths in ``useConfigSync.test.ts``. Plus regression cases for every false-fire scenario above. Suite: 575/575 green, tsc --noEmit clean. * fix(tui): address Copilot round-2 review on #19835 Three tightenings on the surface introduced in the round-1 fix: * ``/voice tts`` reset custom bindings to Ctrl+B. The ``tts`` branch of ``voice.toggle`` omitted ``record_key`` from its response, so the frontend's ``r.record_key ?? 'ctrl+b'`` coerced a user's custom binding back to the default on every TTS toggle. Two-sided fix: the backend now includes ``record_key`` on the ``tts`` branch (parity with ``status``/``on``/``off``), and the slash handler only pushes frontend state when the response actually carries ``record_key`` — belt-and-suspenders against any future branch forgetting to include it. * ``super+b`` / ``win+b`` / ``cmd+b`` displayed "Cmd+B" on Linux and Windows. ``formatVoiceRecordKey`` rendered ``mod === 'super'`` as ``Cmd`` universally, which told non-mac users the wrong modifier to press even though ``isVoiceToggleKey`` matched the right event bits. Gate the label to ``isMac`` so non-mac renders ``Super+B``. * ``control+b`` / ``ctrl + b`` lost the macOS Cmd+B fallback. ``_isDefaultVoiceKey`` keyed off ``parsed.raw`` — so semantically-equal aliases of the documented default dropped into the strict branch even though they bind Ctrl+B. Compare on the parsed spec (mod + ch + named) instead. Coverage added: Linux ``Super+B`` rendering (and macOS ``Cmd+B``), ``control+b`` / ``ctrl + b`` accepting the Cmd+B fallback on darwin, ``/voice tts`` without ``record_key`` not clobbering cached binding, and a backend regression asserting every ``voice.toggle`` branch carries the configured key. Suite: 579/579 TUI vitest green, 2/2 backend voice tests green, tsc --noEmit clean. * fix(tui): address Copilot round-3 review on #19835 Three classes of robustness issue caught on the second pass — all revolve around malformed YAML tipping ``parseVoiceRecordKey`` or ``_voice_record_key`` into a crash instead of the documented fallback. * Parser crashed on non-string YAML scalars. ``config.get full`` returns raw ``yaml.safe_load`` output, so ``voice.record_key: 1`` or ``voice.record_key: true`` in a hand-edited config would hit ``.trim()`` on a number/bool and throw, breaking startup and every mtime re-apply. Accept ``unknown`` at the signature, guard with ``typeof raw !== 'string'``, and fall back to the default. * Backend blew up on non-dict ``voice:``. Same YAML hazard on the gateway side: ``voice: true`` / ``voice: cmd+b`` left ``_load_cfg().get("voice")`` as a bool/str, so ``.get("record_key")`` raised AttributeError and took every ``voice.toggle`` branch down with it. Centralised the lookup in a single ``_voice_record_key()`` helper that ``isinstance``-guards both ``voice`` and ``record_key`` and falls back to ``ctrl+b``. * Multi-modifier chords silently dropped extras. The previous validator only checked the first modifier token, so ``ctrl+alt+r`` silently parsed as ``ctrl+r`` and ``cmd+ctrl+b`` as ``super+b`` — a typo bound a different shortcut than the user configured. Reject multi-modifier spellings outright; the classic CLI only supports single-modifier bindings via prompt_toolkit's ``c-x`` / ``a-x`` rewrite, so this matches CLI parity. Coverage added: * ``parseVoiceRecordKey`` fallback on ``1`` / ``true`` / ``null`` / ``undefined`` / ``{}``. * ``parseVoiceRecordKey`` fallback on ``ctrl+alt+r`` / ``cmd+ctrl+b`` / ``alt+ctrl+space``. * ``test_voice_toggle_handles_non_dict_voice_cfg`` exercises every non-dict ``voice:`` shape (bool, str, None, int, list) and asserts each falls back to ``record_key: 'ctrl+b'``. Suite: 581/581 TUI vitest green, 3/3 backend voice tests green, tsc --noEmit clean. * fix(tui): address Copilot round-4 review on #19835 Four final corners of the voice.record_key surface: * Bare-char configs silently coerced to ``ctrl+<key>``. A config like ``voice.record_key: o`` / ``space`` / ``escape`` fell through to the default ``mod = 'ctrl'`` and silently bound Ctrl+O, while the classic CLI's prompt_toolkit would bind the raw key (no rewrite) — so the two runtimes silently disagreed on what "o" means. Require an explicit modifier; bare-char configs fall back to the documented Ctrl+B default. * Reserved ctrl+<letter> bindings would never fire. ``useInputHandlers()`` intercepts ``ctrl+c`` (interrupt), ``ctrl+d`` (quit), and ``ctrl+l`` (clear screen) before the voice check runs, so those configs would be advertised in /voice status but the advertised shortcut never actually triggers push-to-talk. Added ``_RESERVED_CTRL_CHARS`` at parse time so the user gets the documented default instead of a dead shortcut. (``alt+c``, ``cmd+l``, etc. are not intercepted and stay usable.) * ``_load_cfg()`` root itself may be a non-dict. ``_voice_record_key()`` isinstance-guarded the ``voice`` subkey but not the root — a malformed config.yaml that collapsed to a scalar/list at the top level (``config.yaml: true`` or ``[]``) would still raise on ``.get("voice")``. Added the top-level guard too so every malformed shape falls back to ``ctrl+b``. * Stale header comment on ``isVoiceToggleKey``. The doc-comment still claimed "On macOS we additionally accept the platform action modifier (Cmd) for the configured letter" even though the implementation gates the Cmd fallback to the documented default only. Rewrote to match. Coverage added: * ``parseVoiceRecordKey`` fallback on bare chars (``o``, ``b``, ``space``, ``escape``). * ``parseVoiceRecordKey`` fallback on ``ctrl+c`` / ``ctrl+d`` / ``ctrl+l``; positive case for ``alt+c`` / ``cmd+l`` still usable. * Backend ``test_voice_toggle_handles_non_dict_voice_cfg`` now exercises 5 non-dict shapes at the YAML root too. Suite: 583/583 TUI vitest green, 3/3 backend voice tests green, tsc --noEmit clean. * fix(tui): address Copilot round-5 review on #19835 Three follow-ups on the voice matcher's modifier + shift discipline: * ``super`` branch falsely fired on Alt+<key> / bare Esc on macOS. ``isVoiceToggleKey`` accepted ``isMac && key.meta`` as a Cmd fallback for the ``super`` modifier — but hermes-ink sets ``key.meta`` for plain Alt/Option AND for bare Escape on some macOS terminals. A ``cmd+b`` config silently fired on Alt+B; ``cmd+space`` on Alt+Space; ``cmd+escape`` on bare Esc. Drop the fallback and require the literal ``key.super`` bit. Legacy- terminal users who need Cmd should upgrade to a kitty-protocol terminal or bind ``alt+X`` explicitly. * Shift bit was never checked. The parser rejects multi- modifier configs like ``ctrl+shift+tab``, but the runtime matcher didn't check ``key.shift`` — so ``ctrl+tab`` also fired on Ctrl+Shift+Tab and ``alt+enter`` on Alt+Shift+Enter. Early-return on ``key.shift === true`` so the runtime only fires the exact chord the user configured. * Test leaked ``HERMES_VOICE=1`` into later tests. ``voice.toggle`` action=on writes to ``os.environ`` directly (CLI parity, runtime-only flag); ``test_voice_toggle_returns_ configured_record_key`` dispatched action=on without letting monkeypatch take ownership of the var first. Any later test that read voice mode in the same Python process could inherit a stale enabled state. Added ``monkeypatch.setenv("HERMES_VOICE", "0")`` up front so monkeypatch restores the original value at teardown. Coverage added: * ``cmd+b`` / ``cmd+space`` / ``cmd+escape`` do NOT fire on ``key.meta``-only events on darwin. * ``ctrl+tab`` / ``alt+enter`` / ``ctrl+o`` reject matches when ``key.shift`` is held; sanity cases without Shift still fire. Suite: 585/585 TUI vitest green, 3/3 backend voice tests green, tsc --noEmit clean. * fix(tui): address Copilot round-6 review on #19835 Three classes of modifier-discipline tightening + one config-surface honesty fix: * Default ``ctrl+b`` Cmd fallback leaked Alt+B. The default's macOS Cmd+B muscle-memory path used ``isActionMod(key)``, which returns ``key.meta \|\| key.super`` on darwin. hermes-ink also reports plain Alt as ``key.meta``, so Alt+B silently fired the default binding. Replaced with strict ``isMac && key.super === true`` — kitty-style Cmd+B still works, Alt+B correctly rejected. Legacy-terminal mac users (Terminal.app without CSI-u) now get raw Ctrl+B only; the documented default still works everywhere. * ctrl / super branches accepted extra modifier bits. The parser rejects multi-modifier configs like ``ctrl+alt+o``, but the runtime matcher was permissive — ``ctrl+o`` fired on Ctrl+Alt+O / Ctrl+Cmd+O, and ``super+b`` fired on Cmd+Alt+B / Ctrl+Cmd+B. Added strict ``!key.alt && !key.meta && key.super !== true`` on ctrl, and ``!key.ctrl && !key.alt && !key.meta`` on super, so the runtime only fires the exact chord the parser would let you configure. * Dropped ``cmd`` / ``command`` aliases. They parsed to ``super`` and rendered as ``Cmd+X``, but legacy macOS terminals report Cmd as ``key.meta`` (same signal as Alt), so a ``cmd+o`` config was advertised as working but never actually fired on Terminal.app-without-CSI-u. That recreated the "displayed shortcut does not work" problem this PR was meant to remove. Users who want the platform action modifier spell it ``super`` / ``win`` — that matches the unambiguous ``key.super`` bit, and kitty-style macOS terminals render it as ``Cmd+X`` via platform-aware formatter. Coverage updated: * Default ctrl+b no longer fires on Alt+B via ``key.meta`` leak; raw Ctrl+B and kitty-style Cmd+B still fire. * ``ctrl+o`` rejects Ctrl+Alt+O / Ctrl+Cmd+O / Ctrl+Meta+O chords. * ``super+b`` rejects Cmd+Alt+B / Cmd+Meta+B / Ctrl+Cmd+B chords. * ``cmd+b`` / ``command+b`` / ``meta+b`` all fall back to the documented default at parse time (joined the ambiguous-mac-mod rejection class). * Round-2 expectations that asserted ``cmd+b`` parsed as super and accepted ``key.meta`` on darwin updated to reflect the new stricter contract. Suite: 588/588 TUI vitest green, 3/3 backend voice tests green, tsc --noEmit clean. * fix(tui): address Copilot follow-up on wire typing + escape precedence Two follow-ups from the latest Copilot pass: * Config wire typing honesty (`gatewayTypes.ts`) `config.get full` forwards raw `yaml.safe_load()` output, so `voice.record_key` can be any scalar/container when hand-edited. Typing it as `string` suggests a normalized contract that the backend does not guarantee and makes unsafe callers more likely. Change `ConfigVoiceConfig.record_key` to `unknown` with an explicit comment that callers must normalize at runtime. * Escape-based voice bindings were swallowed before voice check `useInputHandlers()` handled `key.escape` for queue-edit cancel and selection clear before `isVoiceToggleKey(...)`, so configured `ctrl+escape` / `alt+escape` / `super+escape` chords were advertised but never toggled recording in those UI states. Add an early escape+voice check before generic Esc handlers so escape-based voice bindings win when configured, while plain Esc behavior remains unchanged. Also updated PR #19835 description text to remove stale cmd/command alias claims and match the current parser contract. * fix(tui): pass configured voice shortcut through TextInput layer Thread the live parsed voiceRecordKey into TextInput so configured voice.record_key chords bubble to useInputHandlers instead of being consumed as editor input. This removes the last hardcoded Ctrl+B pass-through in the composer path while preserving existing global control chord behavior. * fix(tui): require explicit alt bit for escape-based alt chords Hermes-ink reports bare Escape as meta=true+escape=true on some terminals, so a configured alt+escape binding was firing on bare Esc. Require an explicit key.alt bit when the configured named key is escape so plain Esc stays plain Esc; kitty-style alt+escape still fires. * fix(tui): harden voice.record + TextInput paste + super-mod reserved list Three round-7 Copilot follow-ups on #19835: - voice.record start handler used _load_cfg().get('voice', {}).get(...) without shape checks, so malformed YAML (bool/scalar/list) returned 5025 instead of using VAD defaults. Centralized _voice_cfg_dict() helper and type-guarded silence_threshold/silence_duration with numeric fallbacks. - TextInput pass-through check moved above paste/copy handling so configured voice chords (ctrl+v / alt+v / cmd+v) beat the composer's paste/copy defaults. - parser now also rejects super+{c,d,l,v} — on macOS those are copy/exit/clear/paste and would be advertised in /voice status but never actually toggle recording. * Potential fix for pull request finding Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> * fix(tui): round-8 Copilot review — allow ctrl+x, gate super reservations to macOS, preserve voice key on transient RPC failure Three round-8 Copilot follow-ups on #19835: - Revert ctrl+x addition to _RESERVED_CTRL_CHARS (landed via Copilot Autofix commit 731ec86): ctrl+x is only claimed during queue-edit (queueEditIdx !== null), so voice works the rest of the session and matches CLI ctrl+<letter> parity. - Gate super+{c,d,l,v} reservation to isMac. Linux/Windows TUI globals key off Ctrl, so kitty/CSI-u super+<letter> configs don't collide on non-mac and should stay usable. - applyDisplay() now skips setVoiceRecordKey when cfg is null so one transient quietRpc() failure after a config edit doesn't clobber the cached binding back to Ctrl+B until the next successful poll. New coverage: - parseVoiceRecordKey preserves ctrl+x on linux - super+{c,d,l,v} rejected on darwin, allowed on linux - applyDisplay(null, ...) leaves voiceRecordKey untouched * fix(cli,tui): normalize voice.record_key aliases across CLI + TUI for parity Round-9 Copilot review on #19835: TUI accepted control+/option+/opt+/super+/win+ aliases but the classic CLI only rewrote literal ctrl+/alt+ before handing to prompt_toolkit, so a TUI-valid config silently bound a different (or no) shortcut in the CLI. - Added normalize_voice_record_key_for_prompt_toolkit() in hermes_cli/voice.py with a single alias table (ctrl/control/alt/option/opt → c-/a-). - Wired it into all three cli.py sites (_enable_voice_mode hint, _show_voice_status display, and the prompt_toolkit binding in _register_voice_handler). - /voice status display now renders control+x as Ctrl+X and option+x as Alt+X (canonical casing) to match TUI formatVoiceRecordKey. - super/win/windows are intentionally left unchanged: prompt_toolkit has no super modifier, so the CLI will reject them loudly at startup rather than silently binding Ctrl+B. Documented this split at both the TUI _MOD_ALIASES comment and the CLI normalizer docstring. - Added tests covering ctrl/control/alt/option/opt mapping, case-insensitivity, non-string fallback, empty-string fallback, and super/win pass-through. * fix(cli): port TUI parser contract into CLI voice.record_key normalizer Round-10 Copilot review on #19835. hermes_cli/voice.py's normalize_voice_record_key_for_prompt_toolkit() previously did blind substring replacement with no trim/validate step, so the CLI diverged from the TUI parser on: - whitespace ('ctrl + b' -> 'c- b' instead of 'c-b') - typoed named keys ('ctrl+spcae' passed through as 'c-spcae' and prompt_toolkit would reject at startup) - bare-char configs ('o' should fall back, not pass through as 'o') - multi-modifier chords ('ctrl+alt+r') - reserved ctrl chars ('ctrl+c/d/l') - unknown modifiers ('meta+b' / 'shift+b') - named-key aliases ('return'/'esc'/'bs'/'del' not collapsed to prompt_toolkit canonicals) Port the TUI parser contract into Python (_VOICE_MOD_ALIASES, _VOICE_NAMED_KEYS, _VOICE_RESERVED_CTRL_CHARS) so one config value binds the same shortcut in both runtimes. Also added format_voice_record_key_for_status() shared between the PTT hint and /voice status display. Non-string scalars (voice.record_key: true / 1) now surface as 'Ctrl+B' instead of the raw scalar — /voice status no longer advertises a shortcut that can never bind. Tests: 29/29 in test_voice_wrapper.py, including 11 new regressions covering whitespace, named-key aliases, typos, bare-char, multi-modifier, reserved ctrl, unknown mods, non-string fallback, and formatter contract. * fix(cli): shape-safe voice config read + graceful super/win fallback Round-11 Copilot review on #19835. Two remaining cross-runtime gaps: 1. load_config().get('voice', {}) still assumed voice was a dict, so a hand-edited voice: true / voice: cmd+b at the top level raised AttributeError before the voice UI could start. Added voice_record_key_from_config(cfg) to hermes_cli/voice.py that isinstance-guards both the root and the voice subkey. All three cli.py read sites (_enable_voice_mode hint, _show_voice_status, PTT binding) now use it. 2. The CLI normalizer previously passed super+/win+/windows+ through unrewritten so prompt_toolkit would reject them loudly at startup — but that crash was a worse UX than a silent fallback. Normalizer now returns c-b for those spellings, and the PTT binding site logs a warning so users see why their TUI-only shortcut isn't binding in the CLI. Coverage: 34/34 in tests/hermes_cli/test_voice_wrapper.py (5 new cases for voice_record_key_from_config + malformed-root + malformed-voice + extractor/normalizer composition). * fix(cli): self-audit cleanup — remaining voice-config shape safety + doc drift Self-review of the voice.record_key change set turned up four remaining items Copilot would very likely flag next round: 1. cli.py _voice_start_continuous still read load_config().get('voice', {}).get('silence_threshold') without an isinstance guard, so a hand-edited voice: true / voice: cmd+b (non-dict) raised AttributeError on VAD recording start. Shape-safe coerce the voice dict and numeric-guard silence_threshold/silence_duration. 2. cli.py _enable_voice_mode's auto_tts check had the same bug — fixed with the same isinstance guard. 3. hermes_cli/voice.py module comment on _VOICE_MOD_ALIASES still said super/win/windows 'pass through unchanged and prompt_toolkit's add() call loudly rejects them at startup'. Round 11 changed the normalizer to silently fall back to c-b with a warning at the binding site; updated the comment to match. 4. ui-tui/src/lib/platform.ts header comment had the same stale 'CLI will loudly reject them at startup' claim; updated to 'falls back to the documented default and logs a warning'. No behavior change on the code paths already covered by test_voice_wrapper.py; the two cli.py fixes are defensive against malformed YAML that previous rounds already hardened in tui_gateway/server.py but missed in the classic CLI. * fix(cli,tui): round-12 Copilot review — alt-collide on mac, bool-in-int guards, voice UI hardcodes, mtime-reload test Five round-12 Copilot review items on #19835: 1. platform.ts: hermes-ink reports Alt as key.meta on many terminals; isActionMod on darwin accepts key.meta as the action modifier. So alt+c/d/l get claimed by isCopyShortcut / isAction('d')/'l') before the voice check. Reject those configs at parse time on macOS only (non-mac keeps them usable). 2. cli.py: four remaining hardcoded 'Ctrl+B' sites in voice-facing UI (_get_voice_status_fragments status bar, _voice_start_recording hints, _get_placeholder composer text) were still lying about non-default configs. Added self._voice_record_key_label() shared helper and wired it into all three sites. 3. server.py + cli.py: bool is a subclass of int, so isinstance(silence_threshold, (int, float)) accepted True/False from malformed YAML and forwarded 1/0 to the VAD engine. Exclude bool explicitly so boolean typos fall back to the documented 200 / 3.0 defaults. 4. useConfigSync.ts: extracted the config.get-full fetch+apply body into a shared hydrateFullConfig() helper. Both the initial hydration and mtime-reload paths now use it, so the polling/RPC wiring is exercised by direct unit tests (4 new cases: fresh apply, reapply on new value, transient RPC failure preserves cache, back-compat without voice setter). 5. Added alt+{c,d,l} rejection regressions on darwin + allow on linux, and bool-leak regressions for both silence_threshold and silence_duration in tests/test_tui_gateway_server.py. Suite: 602/602 TUI vitest, 38/38 backend voice tests, typecheck + lints clean. * fix(cli): cache voice record-key label at binding time + status-bar coverage Round-13 Copilot review on #19835. _voice_record_key_label() was reading live config on every render, which caused two problems: 1. prompt_toolkit registers the push-to-talk binding once at session start (@kb.add(_voice_key)); the binding does NOT re-read config. Editing voice.record_key mid-session would switch the status-bar / placeholder / recording-hint label to the new shortcut while the actual keybinding stayed on the startup chord — reintroducing the display/binding drift this whole PR is fighting. 2. Hot render path: during recording the UI is invalidated every 150ms, so re-loading + deep-merging config on every call added avoidable UI overhead. Fix: cache the label at the same site that registers the prompt_toolkit binding via new set_voice_record_key_cache(raw_key). _voice_record_key_label() now just returns the cached value (falls back to 'Ctrl+B' before startup). Status/placeholder/hint are always in sync with the live binding; no config reload per render. Also added 4 regression cases to tests/cli/test_cli_status_bar.py: configured ctrl+<letter> renders in both wide and compact status bars, configured named key (ctrl+space) renders in the recording hint, pre-startup absent cache falls back to Ctrl+B, and malformed configs (bool True) fall through the formatter to Ctrl+B. Suite: 60/60 test_cli_status_bar + test_voice_wrapper, typecheck + lints clean. * fix(cli): route /voice on + /voice status through startup-pinned label; mac alt+cdl parity Round-14 Copilot review on #19835. All three comments legit: 1. _enable_voice_mode still formatted label from live load_config() — mid-session config edit would make /voice on announce the new shortcut while the prompt_toolkit binding stayed the startup chord. Use self._voice_record_key_label() (cached at binding time, round-13) so /voice on cannot drift from the live binding. 2. _show_voice_status had the same bug — /voice status reported live config instead of the pinned startup binding. Fixed the same way. 3. CLI normalizer accepted alt+c/alt+d/alt+l even though the TUI parser rejects them on macOS (Copilot round-12 — hermes-ink reports Alt as key.meta, isActionMod on darwin accepts it, collides with isCopyShortcut / isAction). Added _VOICE_RESERVED_ALT_CHARS_MAC = {c,d,l} gated to sys.platform == 'darwin' so a shared config like option+c falls back to c-b on both runtimes on macOS; non-mac still binds a-c. Coverage: 4 new tests in test_voice_wrapper.py covering mac alt+cdl rejection, linux alt+cdl allowed, option/opt alias forms, and mac-specific exclusions for other alt letters. 62/62 in voice wrapper + status bar suites. --------- Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com> Co-authored-by: asheriif <ahmedsherif95@gmail.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>	2026-05-04 15:49:28 -07:00
Teknium	b8fb9270c4	refactor(cli): drop dead c-S-c key binding (follow-up to #19895 ) (#19919 ) #19884 added a prompt_toolkit key binding for Ctrl+Shift+C to "prevent Hermes from intercepting the keystroke as an interrupt signal." #19895 then wrapped the binding in try/except after discovering it crashed startup with ValueError on every platform. Both PRs were based on a misreading of how terminal key events propagate: 1. Terminal emulators (GNOME Terminal, iTerm2, kitty, Windows Terminal, etc.) intercept Ctrl+Shift+C before the keystroke reaches the application's stdin. prompt_toolkit never sees it. The binding could never have intercepted anything. 2. prompt_toolkit's key spec parser doesn't recognise 'c-S-c' on any platform — the Shift modifier is meaningless on control-sequence keys. Verified: every prompt_toolkit version raises 'Invalid key: c-S-c' at registration time. The handler is dead code. Delete it and leave a comment explaining why no binding is needed here. Ctrl+Q alias (#19884's other addition) stays — that's a real prompt_toolkit key and a legitimate interrupt shortcut. Verified the CLI starts cleanly — key binding phase no longer raises and the subsequent chat flow reaches the provider setup check without error.	2026-05-04 14:49:38 -07:00
nftpoetrist	429b8eceb4	fix(cli): guard c-S-c key binding with try/except to prevent startup crash (#19895 ) PR #19884 added @kb.add('c-S-c') unconditionally. prompt_toolkit raises ValueError("Invalid key: c-S-c") during HermesCLI.__init__ on platforms where this key spec is not recognised — the process exits before reaching the prompt loop. Reported on macOS (#19894) and Linux (#19896) immediately after #19884 landed. Fix: wrap the registration in try/except ValueError so that startup continues cleanly on any platform/version that rejects the spec. Where the spec is accepted the binding is registered normally as a no-op, allowing the terminal to handle Ctrl+Shift+C natively as before. Fixes #19894 Fixes #19896	2026-05-04 14:45:01 -07:00
Harry Riddle	645a2f482d	fix(cli): fix shortcut config conflict in hermes_cli	2026-05-04 12:41:05 -07:00
giwaov	026a5e47df	fix(cli): preserve Windows hidden-dir paths in markdown	2026-05-04 05:04:36 -07:00
Teknium	5b6d413476	fix(cli,gateway): surface title errors from /new <name> The contributor's PR silently swallowed ValueError from SessionDB.set_session_title() with bare except Exception: pass. Users typing /new <title> with an already-in-use title got an untitled session and no feedback. Changes: - cli.py: catch ValueError from both sanitize_title() and set_session_title(); print the error and mark the session untitled in the banner (never echo the rejected title back). - gateway/run.py: append a warning note to the reset reply on title rejection; reflect the accepted title in the header. - Add regression tests for the duplicate-title path in CLI and gateway. Also map exx@example.com -> @exxmen in scripts/release.py.	2026-05-04 03:14:50 -07:00
Exx	f720751d79	feat(cli,gateway): /new accepts optional session name argument Allow users to start a fresh session and immediately set its title by passing a name to /new (or /reset): /new Refactor auth module Changes: - hermes_cli/commands.py: add args_hint='[name]' to /new command - cli.py: parse title argument in process_command(), pass to new_session() - cli.py: new_session() accepts title=None, sets title via SessionDB - gateway/run.py: _handle_reset_command() parses title, sets on new entry - gateway/session.py: reset_session() accepts optional display_name - tests: add test_new_session_with_title, test_reset_command_with_title, test_new_command_in_help_output All 36 affected tests pass.	2026-05-04 03:14:50 -07:00
ChanlerDev	e3461e0b2a	fix(cli): remove dead 'q' check from quit command resolution The 'q' alias is defined for 'queue' command in commands.py:93. The hardcoded 'q' in cli.py:5910 was dead code - resolve_command('q') returns the queue CommandDef, so canonical would never be 'q'. Removes the misleading check without changing any behavior: - /quit and /exit still exit (defined aliases) - /q still maps to queue (as intended)	2026-05-04 03:11:30 -07:00
ms-alan	c659a16899	fix(cli): detect quoted relative paths in _detect_file_drop Closes #15197	2026-05-04 02:48:20 -07:00
tmdgusya	a1cb811cb8	fix(cli): avoid voice TTS restart race	2026-05-04 01:36:07 -07:00
Siddharth Balyan	a11aed1acc	fix(cli): local backend CLI always uses launch directory, stops .env sync of TERMINAL_CWD (#19334 ) The old CWD heuristic was fooled by: 1. TERMINAL_CWD persisted to .env by `hermes config set terminal.cwd` 2. Inherited TERMINAL_CWD from parent hermes processes 3. Only resolved when config had a placeholder value (not explicit paths) Fix: - load_cli_config() unconditionally uses os.getcwd() for local backend - TERMINAL_CWD always force-exported in CLI mode (overrides stale values) - Gateway sets _HERMES_GATEWAY=1 marker so lazy cli.py imports don't clobber - Remove terminal.cwd from config-set .env sync map (prevents re-poisoning) - Clarify setup wizard label as 'Gateway working directory' Closes #19214	2026-05-04 11:36:19 +05:30
Siddharth Balyan	167b5648ea	Revert "fix(cli): CLI/TUI on local backend always uses launch directory, ignores terminal.cwd (#19242 )" (#19329 ) This reverts commit `9eaddfafa3`.	2026-05-04 00:43:58 +05:30
Siddharth Balyan	9eaddfafa3	fix(cli): CLI/TUI on local backend always uses launch directory, ignores terminal.cwd (#19242 ) CLI/TUI sessions on the local backend now unconditionally use os.getcwd() as the working directory. The terminal.cwd config value is only consumed by gateway/cron/delegation modes (where there's no shell to cd from). Previously, 'hermes setup' would write an absolute path (e.g. $HOME) into terminal.cwd which then pinned the CLI to that directory regardless of where the user launched hermes from. This was a silent foot-gun — the user's 'cd' was being ignored. Changes: 1. cli.py: Restructured CWD resolution — if TERMINAL_CWD is not already set by the gateway, and the backend is local, always use os.getcwd(). Config terminal.cwd is irrelevant for interactive CLI/TUI sessions. 2. setup.py: Moved the cwd prompt from setup_terminal_backend() to setup_gateway(). It now only appears when configuring messaging platforms and is labeled 'Gateway working directory'. 3. Tests: Rewrote test_cwd_env_respect.py to validate the new behavior: explicit config paths are ignored for CLI, gateway pre-set values are preserved, non-local backends keep their config paths. 4. Docs: Updated configuration.md, profiles.md, and environment-variables.md to clarify that terminal.cwd only affects gateway/cron mode on local backend. Closes #19214	2026-05-04 00:14:36 +05:30
ambition0802	7696ddc59e	fix(cli): robust paste file expansion and process_loop error handling (#17666 ) Two narrow fixes for long pasted messages silently disappearing: 1. _expand_paste_references: replace path.exists() + read_text() with try/except (OSError, IOError). Closes the TOCTOU window where a paste file deleted between check and read raised FileNotFoundError, bubbled up through process_loop's outer except, and silently dropped the user's input. Failures now return the placeholder text and log a warning. 2. process_loop outer except: logger.warning() instead of print(). prompt_toolkit's TUI swallows stdout, so 'Error: …' was invisible to the user. Logged errors are discoverable via hermes logs. Dropped the larger interrupt_queue→pending_input drain that was part of the original PR — that's a separate class of input-drop (in-progress interrupt handling) unrelated to the paste-file TOCTOU reported in the issue, and worth its own review. Salvage of #17939.	2026-05-02 02:07:14 -07:00
Siddharth Balyan	c5b4c48165	fix: lazy session creation — defer DB row until first message (#18370 ) Prevents ghost sessions from accumulating in state.db when the TUI/web dashboard is opened and closed without sending a message. Changes: - run_agent.py: Add _ensure_db_session() gate method, called at run_conversation() entry. Remove eager create_session() from __init__. Handle compression rotation flag correctly. - tui_gateway/server.py: Remove eager db.create_session() in _start_agent_build(). Add post-first-message pending_title re-apply. - hermes_state.py: Extract _insert_session_row() shared helper (DRY). Add prune_empty_ghost_sessions() for one-time migration. - cli.py: One-time ghost session prune on startup. Fix _pending_title to call _ensure_db_session() before set_session_title(). - hermes_cli/main.py: Guard TUI exit summary on message_count > 0. - tests: Update test_860_dedup to call _ensure_db_session() before direct _flush_messages_to_session_db() calls. Closes: ghost session clutter in hermes sessions list and web dashboard.	2026-05-01 18:39:12 +05:30
Teknium	265bd59c1d	feat: /goal — persistent cross-turn goals (Ralph loop) (#18262 ) Add a standing-goal slash command that keeps Hermes working toward a user-stated objective across turns until it is achieved, paused, or the turn budget runs out. Our take on the Ralph loop — cf. Codex CLI 0.128.0's /goal. After each turn, a lightweight auxiliary-model judge call asks 'is this goal satisfied by the assistant's last response?'. If not, and we're under the turn budget (default 20), Hermes feeds a continuation prompt back into the same session as a normal user message. Any real user message preempts the continuation loop automatically. Judge failures fail OPEN (continue) so a flaky judge never wedges progress — the turn budget is the real backstop. ### Commands - `/goal <text>` — set a standing goal (kicks off the first turn) - `/goal` or `/goal status` — show current state - `/goal pause` — pause the continuation loop - `/goal resume` — resume (resets turn counter) - `/goal clear` — drop the goal Works on both CLI and gateway platforms via the central CommandDef registry. ### Design invariants preserved - Prompt cache: continuation prompts are regular user-role messages appended to history. No system-prompt mutation, no toolset swap. - Role alternation: continuation is a user turn, never injected mid-tool-loop. - Session persistence: goal state lives in SessionDB.state_meta keyed by `goal:<session_id>`, so `/resume` picks it up. - Mid-run safety: on the gateway, `/goal status\|pause\|clear` are allowed mid-run (control-plane only); setting a new goal requires `/stop` first so we don't race a second continuation prompt against the current turn. ### Files - `hermes_cli/goals.py` (new, 380 lines) — GoalManager + judge + state - `hermes_cli/commands.py` — CommandDef entry - `hermes_cli/config.py` — `goals.max_turns` default - `hermes_cli/web_server.py` — dashboard category merge - `cli.py` — /goal handler + post-turn continuation hook in process_loop - `gateway/run.py` — /goal handler + post-turn continuation hook wrapping _handle_message_with_agent - `tests/hermes_cli/test_goals.py` (new, 26 tests) — judge parsing, fail-open semantics, lifecycle, persistence, budget exhaustion - `website/docs/reference/slash-commands.md` — docs entry	2026-04-30 23:10:20 -07:00
Teknium	f0dc919f92	fix(compression): include system prompt + tool schemas in token estimates (#18265 ) The user-visible /compress banner and the post-compression last_prompt_tokens writeback both counted only the raw message transcript (chars/4). With a 15KB system prompt and 30 tool schemas (~26KB), a 4-message transcript that looks like ~45 tokens to the transcript-only estimator is really ~10.5K tokens of request pressure — a 234x gap. Two user-facing consequences: - Banner shows 'Compressing … (~45 tokens)…' while compression is actually firing on 10K+ tokens of real pressure, confusing users about why compression triggered (reported by @codecovenant on X; #6217). - Post-compression last_prompt_tokens writeback omits tool schemas, so the next should_compress() check compares real usage against a stale underestimate — compression triggers late, potentially past the model's context limit on small-context models (#14695). Swap estimate_messages_tokens_rough() for estimate_request_tokens_rough() at every user-visible banner and at the post-compression writeback. estimate_request_tokens_rough() already existed for exactly this purpose and includes system prompt + tool schemas. Touched call sites: - run_agent.py: post-compression last_prompt_tokens writeback, post-tool call should_compress() fallback when provider usage is missing - cli.py: /compress banner + summary - gateway/run.py: gateway /compress banner + summary - tui_gateway/server.py: TUI /compress status + summary - acp_adapter/server.py: ACP /compact before/after Left intentionally alone: - Session-hygiene fallback and the 'no agent' /status path in gateway/run.py — no agent instance is in scope to query for system prompt/tools, and the existing 30-50% overestimate wobble on hygiene is safety-accepted. - Verbose-mode 'Request size' logging — informational only, already counts system prompt via api_messages[0]. Also relabels the feedback line from 'Rough transcript estimate' to 'Approx request size' so the metric label matches what it actually measures. Credits: diagnoses from @devilardis (#14695) and @Jackten (#6217); user report @codecovenant on X (2026-04-30). Closes #14695 Closes #6217	2026-04-30 23:03:54 -07:00
hharry11	24130b7e53	fix(approval): harden YOLO mode env parsing against quoted-bool strings	2026-04-30 20:37:37 -07:00
Teknium	9a75743496	fix(gateway): apply agent.disabled_toolsets in gateway message loop Widens the cherry-picked fix from @jatingodnani (#17343) to the gateway path. On main, user_config.agent.disabled_toolsets was only honored by _get_platform_tools' name-level subtraction — it did not catch tools pulled in implicitly by a composite toolset (browser includes web_search, hermes-* platforms include most tools). Changes: - gateway/run.py: resolve disabled_toolsets alongside enabled_toolsets and pass to AIAgent at both user-facing construction sites (normal message loop + single-turn cron-like path). Hygiene/compression agents (fixed enabled_toolsets=[memory]) are intentionally untouched. - gateway/run.py: add (agent, disabled_toolsets) to _CACHE_BUSTING_CONFIG_KEYS so editing the list in config.yaml invalidates the cached AIAgent on the next message. - cli.py: drop unused 'import platform' left over from PR #17343's import churn; restore 'import sys' used throughout the file. - model_tools.py: drop unused 'import os, sys' added by PR #17343; fix comment reference from #15291 (unrelated OAuth issue) to #17309. Co-authored-by: jatin godnani <godnanijatin@gmail.com>	2026-04-30 20:24:39 -07:00
jatin godnani	e3624e00db	fix: enforce strictly subtractive toolset filtration Refactor tool resolution logic in model_tools.py to ensure that disabled_toolsets are always subtracted at the end, preventing composite toolsets (e.g. 'browser') from implicitly enabling tools that should be hidden. - Added 'disabled_toolsets' to DEFAULT_CONFIG in hermes_cli/config.py - Updated HermesCLI in cli.py to load and propagate disabled toolsets to AIAgent - Implemented robust two-phase resolution (additive then subtractive) in model_tools.py	2026-04-30 20:24:39 -07:00
hharry11	ca9a61ae38	fix(plugins): await async handlers in CLI and TUI dispatch	2026-04-30 19:56:18 -07:00
Teknium	80a676658c	fix(cli): surface self-improvement review summaries from bg thread When the self-improvement background review fires after a turn, it runs in a bg thread and emits a ' 💾 <summary>' line to announce what it saved to memory or skills. Two problems made this invisible to users even when the review successfully modified a skill: 1. The print went through `_cprint` (prompt_toolkit's print_formatted_text) on a bg thread while the CLI's PromptSession was live. Direct print_formatted_text races with the input-area redraw and the line can land behind/above the prompt, scrolled off without the user seeing it. 2. The message said only '💾 Skill created.' / '💾 Memory updated' with no indication that the self-improvement loop was the one doing this. Users who did catch the line couldn't tell the background review from some other agent action. Fixes: - `_cprint` now detects when it's called from a non-app thread with a running prompt_toolkit Application, and routes through `run_in_terminal` via `loop.call_soon_threadsafe`. That pauses the input, prints the line above the prompt, and redraws — the normal prompt_toolkit contract for bg-thread output. Direct-print fallback preserved for the no-app / same-thread / import-error paths. Affects every bg-thread emission, not just the review summary (curator summaries and auxiliary failure prints benefit too). - The summary now reads ' 💾 Self-improvement review: <summary>' in both the CLI and the gateway `background_review_callback` path, so the origin is unambiguous. Tests: - New `tests/cli/test_cprint_bg_thread.py` covers all five routing branches (no app, app-not-running, cross-thread schedule, same-thread direct, app-loop-attribute-error, import-error). - New case in `tests/run_agent/test_background_review.py` asserts the attributed prefix shows up in both `_safe_print` and `background_review_callback`. Live E2E: exercised _cprint from a bg thread inside a real Application event loop; confirmed get_app_or_none() sees the app, call_soon_threadsafe schedules run_in_terminal, and the inner _pt_print runs.	2026-04-30 14:07:22 -07:00
Teknium	c868425467	feat(kanban): durable multi-profile collaboration board (#17805 ) Salvage of PR #16100 onto current main (after emozilla's #17514 fix that unblocks plugin Pydantic body validation). History preserved on the standing `feat/kanban-standing` branch; this squashes the 22 iterative commits into one clean landing. What this lands: - SQLite kernel (hermes_cli/kanban_db.py) — durable task board with tasks, task_links, task_runs, task_comments, task_events, kanban_notify_subs tables. WAL mode, atomic claim via CAS, tenant-namespaced, skills JSON array per task, max-runtime timeouts, worker heartbeats, idempotency keys, circuit breaker on repeated spawn failures, crash detection via /proc/<pid>/status, run history preserved across attempts. - Dispatcher — runs inside the gateway by default (`kanban.dispatch_in_gateway: true`). Ticks every 60s, reclaims stale claims, promotes ready tasks, spawns `hermes -p <assignee> chat -q "work kanban task <id>"` with HERMES_KANBAN_TASK + HERMES_KANBAN_WORKSPACE env. Auto-loads `--skills kanban-worker` plus any per-task skills. Health telemetry warns on stuck ready queue. - Structured tool surface (tools/kanban_tools.py) — 7 tools (kanban_show, kanban_complete, kanban_block, kanban_heartbeat, kanban_comment, kanban_create, kanban_link). Gated on HERMES_KANBAN_TASK via check_fn so zero schema footprint in normal sessions. - System-prompt guidance (agent/prompt_builder.py KANBAN_GUIDANCE) injected only when kanban tools are active. - Dashboard plugin (plugins/kanban/dashboard/) — Linear-style board UI: triage/todo/ready/running/blocked/done columns, drag-drop, inline create, task drawer with markdown, comments, run history, dependency editor, bulk ops, lanes-by-profile grouping, WS-driven live refresh. Matches active dashboard theme via CSS variables. - CLI — `hermes kanban init\|create\|list\|show\|assign\|link\|unlink\| claim\|comment\|complete\|block\|unblock\|archive\|tail\|dispatch\|context\| init\|gc\|watch\|stats\|notify\|log\|heartbeat\|runs\|assignees` + `/kanban` slash in-session. - Worker + orchestrator skills (skills/devops/kanban-worker + kanban-orchestrator) — pattern library for good summary/metadata shapes, retry diagnostics, block-reason examples, fan-out patterns. - Per-task force-loaded skills — `--skill <name>` (repeatable), stored as JSON, threaded through to dispatcher argv as one `--skills X` pair per skill alongside the built-in kanban-worker. Dashboard + CLI + tool parity. - Deprecation of standalone `hermes kanban daemon` — stub exits 2 with migration guidance; `--force` escape hatch for headless hosts. - Docs (website/docs/user-guide/features/kanban.md + kanban-tutorial.md) with 11 dashboard screenshots walking through four user stories (Solo Dev, Fleet Farming, Role Pipeline, Circuit Breaker). - Tests (251 passing): kernel schema + migration + CAS atomicity, dispatcher logic, circuit breaker, crash detection, max-runtime timeouts, claim lifecycle, tenant isolation, idempotency keys, per- task skills round-trip + validation + dispatcher argv, tool surface (7 tools × round-trip + error paths), dashboard REST (CRUD + bulk + links + warnings), gateway-embedded dispatcher (config gate, env override, graceful shutdown), CLI deprecation stub, migration from legacy schemas. Gateway integration: - GatewayRunner._kanban_dispatcher_watcher — new asyncio background task, symmetric with _kanban_notifier_watcher. Runs dispatch_once via asyncio.to_thread so SQLite WAL never blocks the loop. Sleeps in 1s slices for snappy shutdown. Respects HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 env override for debugging. - Config: new `kanban` section in DEFAULT_CONFIG with `dispatch_in_gateway: true` (default) + `dispatch_interval_seconds: 60`. Additive — no \_config_version bump needed. Forward-compat: - workflow_template_id / current_step_key columns on tasks (v1 writes NULL; v2 will use them for routing). - task_runs holds claim machinery (claim_lock, claim_expires, worker_pid, last_heartbeat_at) so multi-attempt history is first- class from day one. Closes #16102. Co-authored-by: emozilla <emozilla@nousresearch.com>	2026-04-30 13:36:47 -07:00
Brooklyn Nicholson	e30de51ee9	fix(cli): tighten terminal leak fast path	2026-04-30 12:16:04 -05:00
brooklyn!	285e9efb3f	Merge pull request #17701 from NousResearch/bb/mouse-mode-self-heal fix(cli): recover leaked mouse tracking terminal state	2026-04-30 10:09:39 -07:00
Brooklyn Nicholson	cad7944b92	fix(tui): reset extended keyboard modes	2026-04-30 12:05:15 -05:00
Rob Moen	0dd373ec43	fix(context): honor model.context_length for Ollama num_ctx and all display paths When a user sets model.context_length in config.yaml, the value was only used for Hermes' internal compression decisions (context_compressor) but NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF metadata (often 256K+) and allocates that much VRAM regardless of the user's config — causing OOM on smaller GPUs like the P100 (16GB). Root cause: two separate context values existed independently: - context_compressor.context_length = config value (e.g. 65536) ✓ - _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config Changes: 1. Cap Ollama num_ctx to config context_length (run_agent.py) When model.context_length is explicitly set and no explicit ollama_num_ctx override exists, cap the auto-detected GGUF value to the user's context_length. This is the core fix — it prevents Ollama from allocating more VRAM than the user budgeted. 2. Pass config_context_length through all secondary call sites Several paths called get_model_context_length() without the config override, falling through to the 256K default fallback: - cli.py: @-reference expansion and /model switch display - gateway/run.py: @-reference expansion and /model switch display - tui_gateway/server.py: @-reference expansion - hermes_cli/model_switch.py: resolve_display_context_length() 3. Normalize root-level context_length in config (hermes_cli/config.py) _normalize_root_model_keys() now migrates root-level context_length into the model section, matching existing behavior for provider and base_url. Users who wrote `context_length: 65536` at the YAML root instead of under `model:` had it silently ignored. 4. Fix misleading comments (agent/model_metadata.py) DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K as two comments stated. Tests: 3 new tests for root-level context_length normalization. All existing context_length tests pass (96 tests).	2026-04-30 04:31:23 -07:00
Teknium	4d7fc0f37c	feat(gateway,cli): confirm /reload-mcp to warn about prompt cache invalidation Reloading MCP servers rebuilds the tool set for the active session, which invalidates the provider prompt cache (tool schemas are baked into the system prompt). The next message re-sends full input tokens — can be expensive on long-context or high-reasoning models. To surface that cost, /reload-mcp now routes through a new slash-confirm primitive with three options: Approve Once / Always Approve / Cancel. 'Always Approve' persists approvals.mcp_reload_confirm: false so future reloads run silently. Coverage: * Classic CLI (cli.py) — interactive numbered prompt. * TUI (tui_gateway + Ink ops.ts) — text warning on first call; `now` / `always` args skip the gate; `always` also persists the opt-out. * Messenger gateway — button UI on Telegram (inline keyboard), Discord (discord.ui.View), Slack (Block Kit actions); text fallback on every other platform via /approve /always /cancel replies intercepted in gateway/run.py _handle_message. * Config key: approvals.mcp_reload_confirm (default true). * Auto-reload paths (CLI file watcher, TUI config-sync mtime poll) pass confirm=true so they do NOT prompt. Implementation: * tools/slash_confirm.py — module-level pending-state store used by all adapters and by the CLI prompt. Thread-safe register/resolve/clear. * gateway/platforms/base.py — send_slash_confirm hook (default 'Not supported' → text fallback). * gateway/run.py — _request_slash_confirm helper + text intercept in _handle_message (yields to in-progress tool-exec approvals so dangerous-command /approve still unblocks the tool thread first). Tests: * tests/tools/test_slash_confirm.py — primitive lifecycle + async resolution + double-click atomicity (16 tests). * tests/hermes_cli/test_mcp_reload_confirm_gate.py — default-config shape + deep-merge preserves user opt-out (5 tests). Targeted runs (hermetic): 89 passed (slash-confirm, config gate, existing agent cache, existing telegram approval buttons).	2026-04-29 21:56:47 -07:00
teknium1	dd2d1ba5e6	refactor(reload-skills): queue note for next turn, drop cache invalidation + agent tool Salvage-follow-up to @shannonsands's /reload-skills PR. Trims the feature to match the design: user-initiated rescan, no prompt-cache reset, no new schema surface, no phantom user turn, and the next-turn note carries each added/removed skill's 60-char description (not just its name). Changes vs the original PR: * Drop the in-process skills prompt-cache clear in reload_skills(). Skills are invoked at runtime via /skill-name, skills_list, or skill_view — they don't need to live in the system prompt for the model to use them. Keeping the cache intact preserves prefix caching across the reload so /reload-skills pays no cache-reset cost. (MCP has to break the cache because tool schemas must be known at conversation start; skills do not.) * Drop the skills_reload agent tool and SKILLS_RELOAD_SCHEMA from tools/skills_tool.py, plus the four skills_reload enumerations in toolsets.py. No new schema surface — agents can already see a freshly- installed skill via skill_view / skills_list the moment it's on disk. * Replace the phantom 'role: user' turn injection with a one-shot queued note. CLI uses self._pending_skills_reload_note (same pattern as _pending_model_switch_note, prepended to the next API call and cleared). Gateway uses self._pending_skills_reload_notes[session_key]. The note is prepended to the NEXT real user message in this session, so message alternation stays intact and nothing out-of-band is persisted to the transcript. * reload_skills() now returns added/removed as [{'name': str, 'description': str}, ...] (description truncated to 60 chars — matches the curator / gateway adapter budget). The injected next-turn note formats each entry as 'name — description' so the model can actually reason about which new skills to call without running skills_list first. * Only emit the note when the diff is non-empty. On empty diff, print 'No new skills detected' and do nothing else. * Tests rewritten to cover the queue semantics, the description payload, and a regression guard that the prompt-cache snapshot is preserved.	2026-04-29 21:07:47 -07:00
Shannon Sands	7966560fb5	feat(skills): /reload-skills slash command + skills_reload agent tool Adds a public reload path for the in-process skill caches so newly installed (or removed) skills become visible mid-session without a gateway restart. Mirrors the shape of /reload-mcp. Three surfaces: * /reload-skills slash command — CLI (cli.py) and gateway (gateway/run.py), with /reload_skills alias for Telegram autocomplete and an explicit Discord registration. * skills_reload agent tool (tools/skills_tool.py) — lets agents/subagents pick up freshly-installed skills via tool call. * agent.skill_commands.reload_skills() — shared helper that clears _skill_commands, _SKILLS_PROMPT_CACHE (in-process LRU), and the on-disk .skills_prompt_snapshot.json, then returns an added/removed diff plus the new total count. Tested: * tests/agent/test_skill_commands_reload.py (9 cases) * tests/cli/test_cli_reload_skills.py (3 cases) * tests/gateway/test_reload_skills_command.py (4 cases) Use case: NemoClaw / OpenShell-style sandboxed orchestrators that drop skills into ~/.hermes/skills mid-session, plus agentic flows where the agent itself installs a skill via the shell tool and needs it bound without a gateway restart. The Python helper clear_skills_system_prompt_cache(clear_snapshot=True) already exists internally — this PR just exposes it via slash command and tool.	2026-04-29 21:07:47 -07:00
Brooklyn Nicholson	87e259a678	fix(cli): tighten mouse leak sanitizer Handle unbounded SGR mouse report coordinates and avoid regex work on ordinary prompt-buffer edits by short-circuiting before sanitizer passes.	2026-04-29 22:10:18 -05:00
Brooklyn Nicholson	98a428fd61	fix(cli): recover from leaked mouse tracking escapes Detect leaked SGR mouse-report fragments in CLI input, strip them, and reset terminal modes in-place so scroll and typing recover without reopening the tab. Add regression tests for escaped, visible, and bare leak forms.	2026-04-29 21:35:47 -05:00
kshitijk4poor	13c238327e	fix: address self-review findings for Vercel Sandbox salvage - Add vercel_sandbox to hardline blocklist container bypass test - Add vercel_sandbox to skills_tool remote backend parametrize test - Deduplicate runtime set: doctor.py and setup.py now import _SUPPORTED_VERCEL_RUNTIMES from terminal_tool.py - Add docstring to _run_bash explaining timeout/stdin_data discards - Always stop sandbox during cleanup (unconditional, matching Modal/Daytona) - Update security.md: container bypass text, production tip, comparison table - Update environment-variables.md: TERMINAL_ENV list, Vercel auth vars, TERMINAL_VERCEL_RUNTIME - Update inline comments in cli.py and config.py to include vercel_sandbox	2026-04-29 07:22:33 -07:00
Scott Trinh	5a1d4f6804	feat: add Vercel Sandbox backend Adds Vercel Sandbox as a supported Hermes terminal backend alongside existing providers (Local, Docker, Modal, SSH, Daytona, Singularity). Uses the Vercel Python SDK to create/manage cloud microVMs, supports snapshot-based filesystem persistence keyed by task_id, and integrates with the existing BaseEnvironment shell contract and FileSyncManager for credential/skill syncing. Based on #17127 by @scotttrinh, cherry-picked onto current main.	2026-04-29 07:22:33 -07:00
Teknium	13683c0842	feat(memory): notify providers on mid-process session_id rotation (#17409 ) Fixes #6672 Memory providers now receive on_session_switch() whenever AIAgent.session_id rotates mid-process — /resume, /branch, /reset, /new, and context compression. Before this, providers that cached per-session state in initialize() (Hindsight's _session_id, _document_id, accumulated _session_turns, _turn_counter) kept writing into the old session's record after the agent had moved on. MemoryProvider ABC ------------------ - New optional hook on_session_switch(new_session_id, , parent_session_id='', reset=False, *kwargs) with no-op default for backward compat. reset=True signals /reset or /new — providers should flush accumulated per-session buffers. reset=False for /resume, /branch, compression where the logical conversation continues. MemoryManager ------------- - on_session_switch() fans the hook out to every registered provider. Isolated try/except per provider — one bad provider can't block others. - Empty/None new_session_id is a no-op to avoid corrupting provider state during shutdown paths. run_agent.py ------------ - _sync_external_memory_for_turn now passes session_id=self.session_id into sync_all() and queue_prefetch_all(). Providers with defensive session_id updates in sync_turn (Hindsight already had this at plugins/memory/hindsight/__init__.py:1199) now actually receive the current id. - Compression block at ~L8884 already notified the context engine of the rollover; now also calls _memory_manager.on_session_switch(reason='compression'). cli.py ------ - new_session() fires reset=True, reason='new_session' so providers flush buffers. - _handle_resume_command fires reset=False, reason='resume' with the previous session as parent_session_id. - _handle_branch_command fires reset=False, reason='branch' with the parent session_id already captured for the DB parent link. gateway/run.py -------------- - _handle_resume_command now evicts the cached AIAgent, mirroring /branch and /reset. The next message rebuilds a fresh agent whose memory provider initialize() runs with the correct session_id — matches the pattern the gateway already uses for provider state cross-session transitions. Hindsight reference implementation ---------------------------------- - plugins/memory/hindsight/__init__.py adds on_session_switch that: updates _session_id, mints a fresh _document_id (prevents vectorize-io/hindsight#1303 overwrite), and clears _session_turns / _turn_counter / _turn_index so in-flight batches don't flush under the new document id. parent_session_id only overwritten when provided (avoids clobbering on a bare switch). Tests ----- - tests/agent/test_memory_session_switch.py: new dedicated file. ABC default no-op, manager fan-out, failure isolation, empty-id no-op, session_id propagation through sync_all/queue_prefetch_all, Hindsight state transitions for every reset/non-reset case, parent preservation. - tests/cli/test_branch_command.py: new test verifying /branch fires the hook with correct parent_session_id + reset=False + reason. - tests/gateway/test_resume_command.py: new test verifying /resume evicts the cached agent. - tests/run_agent/test_memory_sync_interrupted.py: updated existing assertions to account for the session_id kwarg on sync_all and queue_prefetch_all. E2E verified (real imports, tmp HERMES_HOME): - /resume: session_id updates, doc_id fresh, buffers cleared, parent set - /branch: session_id forks, parent links to original - /new: reset=True clears accumulated state - compression: reason='compression' propagated, lineage preserved - Empty id: no-op, state preserved - Legacy provider without on_session_switch: no crash Reported by @nicoloboschi (Hindsight maintainer); related scope-widening comment by @kidonng extending coverage to compression.	2026-04-29 04:57:22 -07:00
Ben Barclay	58a6171bfb	Merge pull request #17305 from NousResearch/feat/docker-run-as-host-user feat(docker): run container as host user to avoid root-owned bind mounts	2026-04-29 16:41:55 +10:00
Ben	5531c0df82	feat(docker): run container as host user to avoid root-owned bind mounts Add opt-in terminal.docker_run_as_host_user config flag that passes --user $(id -u):$(id -g) to the Docker backend so files written into bind-mounted directories (/workspace, /root, docker_volumes entries) are owned by the host user instead of root. When enabled on POSIX platforms, also drops SETUID/SETGID caps since the container no longer needs gosu/su to switch users. Falls back cleanly on platforms without os.getuid (e.g. native Windows Docker) with a warning. Wired through all three config.yaml -> TERMINAL_* env-var bridges: - cli.py env_mappings (CLI + TUI startup) - gateway/run.py _terminal_env_map (gateway / messaging platforms) - hermes_cli/config.py _config_to_env_sync (`hermes config set`) Also fixes docker_mount_cwd_to_workspace silently failing in gateway mode -- it was missing from gateway/run.py's _terminal_env_map. Adds tests/tools/test_terminal_config_env_sync.py to guard against future drift between the three bridges (same bug class shipped twice in one month). Bundled Hermes image won't work with this flag since its entrypoint expects to start as root for the usermod/gosu hermes flow; works with the default nikolaik/python-nodejs image and plain Debian/Ubuntu.	2026-04-29 16:16:43 +10:00
Teknium	bc79e227e6	feat(curator): background skill maintenance (issue #7816 ) Adds the Curator — an auxiliary-model background task that periodically reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage, transitions unused skills through active → stale → archived, and spawns a forked AIAgent to consolidate overlaps and patch drift. Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI startup and gateway boot when the last run is older than interval_hours (default 24) AND the agent has been idle for min_idle_hours (default 2). Invariants (all load-bearing): - Never touches bundled or hub-installed skills (.bundled_manifest + .hub/lock.json double-filter) - Never auto-deletes — archive only. Archives are recoverable via `hermes curator restore <skill>` - Pinned skills bypass all auto-transitions - Uses the aux client; never touches the main session's prompt cache New files: - tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes, provenance filter - agent/curator.py — orchestrator: config, idle gating, state-machine transitions (pure, no LLM), forked-agent review prompt - hermes_cli/curator.py — `hermes curator {status,run,pause,resume, pin,unpin,restore}` subcommand - tests/tools/test_skill_usage.py — 29 tests - tests/agent/test_curator.py — 25 tests Modified files (surgical patches): - tools/skills_tool.py — bump view_count on successful skill_view - tools/skill_manager_tool.py — bump patch_count on skill_manage patch/edit/write_file/remove_file; forget record on delete - hermes_cli/config.py — add curator: section to DEFAULT_CONFIG - hermes_cli/commands.py — add /curator CommandDef with subcommands - hermes_cli/main.py — register `hermes curator` subparser via register_cli() from hermes_cli.curator - cli.py — /curator slash-command dispatch + startup hook - gateway/run.py — gateway-boot hook (mirrors CLI) Validation: - 54 new tests across skill_usage + curator, all passing in 3s - 346 tests across all touched files' neighbors green - 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green - CLI smoke: `hermes curator status/pause/resume` work end-to-end Companion to PR #16026 (class-first skill review prompt) — together they form a loop: the review prompt stops near-duplicate skill creation at the source, and the curator prunes/consolidates what still accumulates. Refs #7816.	2026-04-28 22:33:33 -07:00
Brooklyn Nicholson	f95c34f415	fix(browser): address Copilot round-4 on /browser connect * Reject unsupported schemes (anything outside http/https/ws/wss) in cli.py /browser connect before probing or persisting, matching the gateway's existing 4015 path. * Defend gateway browser.manage against `{"url": null}` and non-string urls: empty/null falls back to DEFAULT_BROWSER_CDP_URL, non-string returns a 4015 instead of slipping into the generic 5031 catch via TypeError on `"://" in url`. * Add regression tests for both null-url fallback and non-string rejection.	2026-04-28 22:11:10 -07:00
Brooklyn Nicholson	d1ee4915f3	fix(browser): address Copilot review on /browser connect Fixes from Copilot's two passes on PR #17238: * Validate parsed URL once: reject missing host, invalid port, and unsupported scheme up front so malformed inputs (e.g. http://:9222 or http://localhost:abc) don't fall through to a generic 5031. * Tighten _is_default_local_cdp to require a discovery-style path so ws://127.0.0.1:9222/devtools/browser/<id> is not collapsed to bare http://127.0.0.1:9222 (which would lose the path and break the connect). * Move browser.manage into _LONG_HANDLERS so the up-to-10s launch-and-retry loop runs on the RPC pool instead of blocking the main dispatcher. * try_launch_chrome_debug uses Windows-appropriate detach kwargs (creationflags=DETACHED_PROCESS\|CREATE_NEW_PROCESS_GROUP) instead of POSIX-only start_new_session=True. * manual_chrome_debug_command uses subprocess.list2cmdline on Windows so the printed instruction is cmd.exe-compatible. * Mirror host/port validation in cli.py /browser connect so the classic CLI never persists an invalid BROWSER_CDP_URL.	2026-04-28 22:11:10 -07:00
Brooklyn Nicholson	69ff114ee2	fix(browser): avoid bogus Chrome launch fallback Detect an actual Chrome/Chromium executable before printing a manual CDP launch command, including common WSL-mounted Windows browser paths, so /browser connect does not suggest google-chrome when it is unavailable.	2026-04-28 22:11:10 -07:00
Brooklyn Nicholson	f10a3df632	fix(tui): align /browser connect local CDP handling Share Chrome CDP launch helpers between the classic CLI and TUI so default /browser connect uses loopback consistently, retries local Chrome launch, and reports a copyable manual-start command instead of claiming a dead connection.	2026-04-28 22:11:10 -07:00
Rugved Somwanshi	214ca943ac	feat(agent): add lmstudio integration	2026-04-28 12:27:36 -07:00
Teknium	b5128a751b	perf(startup): lazy-import OpenAI, Anthropic, Firecrawl, account_usage (#17046 ) * perf(startup): lazy-import OpenAI, Anthropic, Firecrawl, account_usage Four heavy SDK/module imports are now deferred off the hot startup path. Net savings on cold module imports: cli 1200 → 958 ms (-242) run_agent 1220 → 901 ms (-319) tools.web_tools 711 → 423 ms (-288) agent.anthropic_adapter 230 → 15 ms (-215) agent.auxiliary_client 253 → 68 ms (-185) Four independent changes in one PR since they all use the same pattern and share the same risk profile (heavy SDK import → lazy proxy or function-local import): 1. tools/web_tools.py: 'from firecrawl import Firecrawl' moved into _get_firecrawl_client(), which is only called when backend='firecrawl'. Users on Exa/Tavily/ Parallel pay zero firecrawl cost. 2. cli.py + gateway/run.py: 'from agent.account_usage import ...' moved into the /limits handlers. account_usage transitively pulls the OpenAI SDK chain; only needed when the user runs /limits. 3. agent/anthropic_adapter.py: 'try: import anthropic as _anthropic_sdk' replaced with a cached '_get_anthropic_sdk()' accessor. The three usage sites (build_anthropic_client, build_anthropic_bedrock_client, read_claude_code_credentials_from_keychain) now resolve via the accessor. All pre-existing test patches of 'agent.anthropic_adapter._anthropic_sdk' keep working because the accessor respects any value already in module globals. 4. agent/auxiliary_client.py AND run_agent.py: 'from openai import OpenAI' replaced with an '_OpenAIProxy()' module- level object that looks like the OpenAI class but imports the SDK on first call/isinstance check. This preserves: - 15+ in-module OpenAI(...) construction sites in auxiliary_client and the single site in run_agent's _create_openai_client (Python's function-scope name lookup finds the proxy, forwards the call); - 'patch("agent.auxiliary_client.OpenAI", ...)' and 'patch("run_agent.OpenAI", ...)' test patterns used by 28+ test files (patch replaces the module attribute as usual). Tried two alternatives first: - 'from openai._client import OpenAI' — doesn't skip openai/__init__.py (the audit's hypothesis here was wrong). - Module-level __getattr__ — works for external access but Python function-scope name resolution skips __getattr__, so in-module OpenAI(...) calls NameError. Note: 'openai' still loads on 'import cli' because cli.py -> neuter_async_httpx_del() -> openai._base_client, and run_agent.py -> code_execution_tool.py (module-level build_execute_code_schema) -> _load_config() -> 'from cli import CLI_CONFIG'. Deferring those is a separate, larger change — out of scope for this PR. The savings above all come from avoiding the openai/, anthropic/, and firecrawl/* top-level type-tree imports on paths that don't need them. Verified: - 302/302 tests in tests/agent/{test_anthropic_adapter, test_bedrock_1m_context, test_minimax_provider, test_anthropic_keychain} pass. Two pre-existing failures on main unchanged. - 106/106 tests/agent/test_auxiliary_client.py pass (1 pre-existing fail). - 97/97 tests/run_agent/test_create_openai_client_kwargs_isolation.py, test_plugin_context_engine_init.py, test_invalid_context_length_warning.py, test_api_max_retries_config.py, tests/hermes_cli/test_gemini_provider.py, test_ollama_cloud_provider.py pass (1 pre-existing fail). - Live hermes chat smoke: 2 turns + /model switch + tool calls, zero errors in the 57-line agent.log window. - Module-level import of run_agent + auxiliary_client + anthropic_adapter no longer pulls 'anthropic' or 'firecrawl' at all. * fix(gateway): restore top-level account_usage import for test-patch surface CI caught two failures in tests/gateway/test_usage_command.py that I missed locally: AttributeError: 'module' object at gateway.run has no attribute 'fetch_account_usage' The test uses monkeypatch.setattr('gateway.run.fetch_account_usage', ...) to inject a fake account-fetch call. Moving the import inside the handler deleted that module-level attribute, breaking the patch surface. Restoring the top-level import in gateway/run.py gives up the ~230 ms gateway-boot savings from that one lazy, but: 1. the gateway is a long-running daemon — boot cost is paid once per install, not per turn; 2. the other four lazy-imports (firecrawl, openai, anthropic, cli's account_usage) remain in place and still account for the bulk of the savings reported in the PR body; 3. preserving the patch surface keeps the established 'gateway.run.fetch_account_usage' monkeypatch pattern working without touching tests. Verified: tests/gateway/test_usage_command.py — 8 passed, 0 failed. Full targeted sweep (2336 tests across agent/gateway/hermes_cli/run_agent): 2332 passed, 4 failed — all 4 pre-existing on main. --------- Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 09:38:42 -07:00
Teknium	e123f4ecf0	feat(gateway): opt-in runtime-metadata footer on final replies (#17026 ) Append a compact 'model · 68% · ~/projects/hermes' footer to the FINAL message of each turn, disabled by default (display.runtime_footer.enabled). Answers the Telegram-side parity ask: runtime context that the CLI status bar already shows is now available in messaging replies when enabled. Wiring: - gateway/runtime_footer.py: resolve_footer_config + format_runtime_footer + build_footer_line. Pure-function renderer; per-platform overrides under display.platforms.<platform>.runtime_footer. - gateway/run.py: appends footer to response right after reasoning prepend so it lands only on the final message (never tool progress or streaming chunks). When streaming already delivered the body (already_sent), the footer is sent as a small trailing message instead. - agent_result now exposes context_length alongside last_prompt_tokens so the footer can compute the pct; both gateway return paths updated. - /footer [on\|off\|status] slash command, wired in CLI (cli.py) and gateway (gateway/run.py both running-agent bypass and main dispatch). Global toggle only; per-platform overrides via config.yaml. Graceful degradation: - Missing context_length (unknown model) → pct field silently dropped (no '?%' artifact). - Empty final_response → no footer appended. - Unknown field names in config → silently ignored. Tests: 25-case unit suite (tests/gateway/test_runtime_footer.py) plus E2E harness covering streaming vs non-streaming branches, per-platform override, and the exact argument contract gateway/run.py uses. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 06:50:04 -07:00
ygd58	fb112d6a73	fix(cli): pass None as system_message in manual compress to prevent duplication _manual_compress() passed self.agent._cached_system_prompt to _compress_context() as the system_message argument. _compress_context calls _build_system_prompt(system_message), which appends system_message to prompt_parts that already contain the agent identity block — causing the identity to appear twice in the new session's system prompt (20,957 -> 42,303 chars, +102% as reported in issue #15281). Fix: pass None instead of _cached_system_prompt. _build_system_prompt(None) rebuilds the system prompt correctly from scratch without appending a pre-built prompt on top of the identity layers. Fixes #15281	2026-04-28 05:21:49 -07:00
Teknium	e63364b8df	revert: computer-use cua-driver (PR #16919 ) (#16927 ) Reverts PR #16919 (commits `dad10a78d`, `413ee1a28`, `b4a8031b2`, `afb958829`) which was merged prematurely. Restoring the pre-merge state so #14817 and #15328 can be revisited as standing PRs. Reverted commits: - `afb958829` fix(computer-use): harden image-rejection fallback + AUTHOR_MAP - `b4a8031b2` fix(computer-use): unwrap _multimodal tool results - `413ee1a28` feat(computer-use): background focus-safe backend - `dad10a78d` feat(computer-use): cua-driver backend, universal any-model schema Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 01:57:21 -07:00
crayfish-ai	f3371c39a4	fix(auxiliary): custom provider URL rewrite + main_runtime model for title gen - auxiliary_client: apply _to_openai_base_url() to custom base_url (fixes /anthropic → /v1 rewrite missing for provider="custom") - auxiliary_client: use main_runtime.get("model") instead of _read_main_model() so auxiliary tasks follow system default model changes - title_generator: thread main_runtime through generate_title → auto_title_session → maybe_auto_title - cli.py / gateway/run.py: pass main_runtime to maybe_auto_title - tests: update mock assertions for new main_runtime parameter	2026-04-28 01:47:25 -07:00
Teknium	dad10a78d0	feat(computer-use): cua-driver backend, universal any-model schema Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. - CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl\|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919` — deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions — the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic- native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.	2026-04-28 01:46:36 -07:00
helix4u	49fb75463f	fix(gateway): keep env-token Slack enabled	2026-04-27 18:19:14 -07:00
Teknium	ac0325c257	diagnostic(cli): log slow bracketed-paste handler (>500ms) for #16263 (#16575 ) When a paste takes longer than 500ms to process on the prompt_toolkit event-loop thread, emit a logger.warning with elapsed time, byte size, line count, and sys.platform. Gives us concrete repro data for the recurring 'CLI freezes after paste on macOS' class of reports (issue #16263, plus sibling reports across Claude Code / Cursor / Lightroom against macOS Tahoe 26). Pure diagnostic — no behavior change. Two time.perf_counter() calls and one conditional per paste event. Log line only fires when the handler is actually slow, so normal pastes add no log noise.	2026-04-27 06:44:36 -07:00
Teknium	a59a98b180	fix(cli): pass session messages to shutdown_memory_provider (#15165 sibling) The gateway fix in the previous commit forwards _session_messages on gateway session teardown. The CLI exit cleanup path had the same bug: it read getattr(agent, 'conversation_history', None) or [] — but AIAgent has no conversation_history attribute, so providers always received []. Switch to _session_messages (same attribute the gateway now uses), guarded by isinstance(..., list) to preserve the no-arg fallback for MagicMock-based CLI test stubs. Adds tests/cli/test_cli_shutdown_memory_messages.py (4 cases mirroring the gateway suite).	2026-04-27 06:41:16 -07:00
Teknium	ec671c4154	feat(image-input): native multimodal routing based on model vision capability (#16506 ) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto \| native \| text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). * feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.	2026-04-27 06:27:59 -07:00
Teknium	bb00b783fb	fix(cli): eliminate ghost status-bar + DSR input leaks from terminal drift The CLI renders through prompt_toolkit in non-full-screen mode, so every repaint uses the renderer's tracked _cursor_pos.y to cursor_up() + erase before drawing the new frame. Any time that tracked position drifts from terminal reality, redraws stack on top of stale content instead of overwriting it. Four user-visible bugs share this root cause. Fixes: - #5474 (SIGWINCH ghosts): the resize wrapper previously only handled column-shrink reflow. Generalize it to force a full screen-clear (erase_screen + cursor_goto(0,0)) and renderer.reset() on every resize — covers widen, row-shrink, and multiplexer SIGWINCH-less redraws. - #8688 (cmux/tmux tab switch): no SIGWINCH fires on focus regain, so prompt_toolkit has no signal to recover. Add a _force_full_redraw() helper, bound to Ctrl+L (standard bash/zsh/vim convention) and exposed as /redraw. Users can manually clear drift without restarting Hermes. - #14692 (DSR response leaks — ^[[53;1R): resize storms make prompt_toolkit's CSI 6n queries race past the input parser; the terminal's reply ends up as literal input text. Add a sibling of the bracketed-paste sanitizer that strips \x1b[<row>;<col>R and the caret-escape visible form from paste text, buffer text-filter, and the input-processing loop. The idle-redraw removal (#12641) is in the preceding commit from @foxion37 — keeping them as separate commits preserves attribution.	2026-04-27 05:31:47 -07:00
Q	5e92b67807	fix: stop idle CLI redraws	2026-04-27 05:31:47 -07:00
Teknium	4a2ee6c162	fix(title-gen): surface auxiliary failures via _emit_auxiliary_failure Closes #15775. Title generation swallowed exceptions at debug level and returned None, so a depleted auxiliary provider (e.g. OpenRouter 402) silently left sessions with NULL titles. Reporter observed 45 untitled sessions accumulated over 19 days with no user-visible indication. - agent/title_generator.py: accept optional failure_callback, bump log to WARNING, invoke callback on call_llm exception (swallowing callback errors so nothing can crash the fire-and-forget worker thread). - cli.py, gateway/run.py: pass agent._emit_auxiliary_failure as the callback so failures route through the existing user-visible warning channel. - tests: cover callback fires / errors are swallowed / no-callback legacy behavior / maybe_auto_title forwards kwarg to worker.	2026-04-26 21:49:34 -07:00
romanornr	a0fe73bada	fix(cli): strip leaked bracketed-paste wrappers	2026-04-26 21:47:40 -07:00
Teknium	6c87371815	fix(openclaw-migration): case-preserving brand rewrite + one-time ~/.openclaw residue banner (#16327 ) Two related fixes for OpenClaw-residue problems after an OpenClaw→Hermes migration (especially migrations done via OpenClaw's own tool, which doesn't archive the source directory). 1. optional-skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py: rebrand_text() was rewriting ~/.openclaw/config.yaml → ~/.Hermes/config.yaml (capital H — a directory that doesn't exist). Now case-preserving: "OpenClaw" → "Hermes" (prose), but "openclaw" → "hermes" (so filesystem paths land on the real Hermes home). Regex logic unchanged — replacement function now checks if the matched text was all-lowercase and emits the replacement in the matching case. 2. agent/onboarding.py + cli.py: one-time startup banner the first time Hermes launches and finds ~/.openclaw/. Tells the user to run `hermes claw cleanup` to archive it, gated on the existing onboarding seen-flag framework (onboarding.seen.openclaw_residue_cleanup in config.yaml). Fires once per install; re-running requires wiping that flag or running cleanup directly. Tests: - 4 new TestDetectOpenclawResidue tests (present / absent / file-instead- of-dir / default-home smoke) - 2 TestOpenclawResidueHint tests (content check) - 2 TestOpenclawResidueSeenFlag tests (flag isolation + round-trip) - test_rebrand_text_preserves_filesystem_path_casing regression test with 4 scenarios including the exact ~/.openclaw/config.yaml case - Existing test_rebrand_text_* tests updated to the new case-preserving contract (lowercase input → lowercase output) Co-authored-by: teknium1 <teknium@noreply.github.com>	2026-04-26 20:57:26 -07:00
Teknium	478444c262	feat(checkpoints): auto-prune orphan and stale shadow repos at startup (#16303 ) Every working dir hermes ever touches gets its own shadow git repo under ~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/. The per-repo _prune is a no-op (comment in CheckpointManager._prune says so), so abandoned repos from deleted/moved projects or one-off tmp dirs pile up forever. Field reports put the typical offender at 1000+ repos / ~12 GB on active contributor machines. Adds an opt-in startup sweep that mirrors the sessions.auto_prune pattern from #13861 / #16286: - tools/checkpoint_manager.py: new prune_checkpoints() and maybe_auto_prune_checkpoints() helpers. Deletes shadow repos that are orphan (HERMES_WORKDIR marker points to a path that no longer exists) or stale (newest in-repo mtime older than retention_days). Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only runs once per min_interval_hours regardless of how many hermes processes start up. - hermes_cli/config.py: new checkpoints.auto_prune / retention_days / delete_orphans / min_interval_hours knobs. Default auto_prune: false so users who rely on /rollback against long-ago sessions never lose data silently. - cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune, called right next to the existing state.db maintenance block. - Docs updated with the new config knobs. - 11 regression tests: orphan/stale deletion, precedence, byte-freed tracking, non-shadow dir skip, interval gating, corrupt marker recovery. Refs #3015 (session-file disk growth was fixed in #16286; this covers the checkpoint side noted out-of-scope there).	2026-04-26 19:05:52 -07:00

1 2 3 4 5 ...

733 Commits