Commit Graph

342 Commits

Author SHA1 Message Date
0xbyt4
68fbcdaa06 fix: add browser_console to browser toolset and core tools list (#1084)
browser_console was registered in the tool registry but missing from
all toolset definitions (TOOLSETS, _HERMES_CORE_TOOLS, _LEGACY_TOOLSET_MAP),
so the agent could never discover or use it.

Added to all 4 locations + 4 wiring tests.

Cherry-picked from PR #1084 by @0xbyt4 (authorship preserved in tests).
2026-03-17 02:02:57 -07:00
teknium1
7d91b436e4 fix: exclude hidden directories from find/grep search backends (#1558)
The primary injection vector in #1558 was search_files discovering
catalog cache files in .hub/index-cache/ via find or grep, which
don't skip hidden directories like ripgrep does by default.

Three-layer fix:

1. _search_files (find): add -not -path '*/.*' to exclude hidden
   directories, matching ripgrep's default behavior.

2. _search_with_grep: add --exclude-dir='.*' to skip hidden
   directories in the grep fallback path.

3. _write_index_cache: write a .ignore file to .hub/ so ripgrep
   also skips it even when invoked with --hidden (belt-and-suspenders).

This makes all three search backends (rg, grep, find) consistently
exclude hidden directories, preventing the agent from discovering
and reading unvetted community content in hub cache files.
2026-03-17 02:02:57 -07:00
Teknium
4cb6735541
fix(approval): show full command in dangerous command approval (#1553)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

* fix(approval): show full command in dangerous command approval (#1553)

Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:

- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
  in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests

Cherry-picked from PR #1566 by crazywriter1.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
2026-03-17 02:02:33 -07:00
Teknium
12afccd9ca
fix(tools): chunk long messages in send_message_tool before dispatch (#1552)
* fix: prevent infinite 400 failure loop on context overflow (#1630)

When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message.  This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error.  Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.

Three-layer fix:

1. run_agent.py — Fallback heuristic: when a 400 error has a very short
   generic message AND the session is large (>40% of context or >80
   messages), treat it as a probable context overflow and trigger
   compression instead of aborting.

2. run_agent.py + gateway/run.py — Don't persist failed messages:
   when the agent returns failed=True before generating any response,
   skip writing the user's message to the transcript/DB. This prevents
   the session from growing on each failure.

3. gateway/run.py — Smarter error messages: detect context-overflow
   failures and suggest /compact or /reset specifically, instead of a
   generic 'try again' that will fail identically.

* fix(skills): detect prompt injection patterns and block cache file reads

Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):

1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
   (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
   was the original injection vector — untrusted skill descriptions
   in the catalog contained adversarial text that the model executed.

2. skill_view: warns when skills are loaded from outside the trusted
   ~/.hermes/skills/ directory, and detects common injection patterns
   in skill content ("ignore previous instructions", "<system>", etc.).

Cherry-picked from PR #1562 by ygd58.

* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)

Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.

- Apply truncate_message() chunking in _send_to_platform() before
  dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
  in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement

Cherry-picked from PR #1557 by llbn.

---------

Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
2026-03-17 01:52:43 -07:00
Teknium
81f76111b0
Merge pull request #1560 from eren-karakus0/fix/singularity-preflight-check
fix(terminal): add Singularity/Apptainer preflight availability check
2026-03-17 01:52:03 -07:00
teknium1
19eaf5d956 test: fix telegram mock to include ParseMode constant
The MarkdownV2 formatting change imports telegram.constants.ParseMode,
which the test mock didn't provide. Add ParseMode to the mock so
existing tests continue working.
2026-03-17 01:44:11 -07:00
Teknium
949fac192f
fix(tools): remove unnecessary crontab requirement from cronjob tool (#1638)
* fix(tools): remove unnecessary crontab requirement from cronjob tool

The hermes cron system is internal — it uses a JSON-based scheduler
ticked by the gateway (cron/scheduler.py), not system crontab.

The check for shutil.which('crontab') was preventing the cronjob tool
from being available in environments without crontab installed (e.g.
minimal Ubuntu containers).

Changes:
- Remove shutil.which('crontab') check from check_cronjob_requirements()
- Remove unused shutil import
- Update docstring to clarify internal scheduler is used
- Update tests to reflect new behavior and add coverage for all
  session modes (interactive, gateway, exec_ask)

Fixes #1589

* test: add HERMES_EXEC_ASK coverage for cronjob requirements

Adds missing test for the exec_ask session mode, complementing
the cherry-picked fix from PR #1633.

---------

Co-authored-by: Bartok9 <bartokmagic@proton.me>
2026-03-17 01:40:02 -07:00
Teknium
e3f9894caf
fix: send_animation metadata, MarkdownV2 inline code splitting, tirith cosign-free install (#1626)
* fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting

Anthropic routes OAuth/subscription requests based on Claude Code's
identity markers. Without them, requests get intermittent 500 errors
(~25% failure rate observed). This matches what pi-ai (clawdbot) and
OpenCode both implement for OAuth compatibility.

Changes (OAuth tokens only — API key users unaffected):

1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli'
2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI'
3. System prompt sanitization: replace Hermes/Nous references
4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools)
5. Tool name stripping: remove 'mcp_' prefix from response tool calls

Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate)
After: 16/16 OK, 0 failures, 0 retries (0% error rate)

* fix: three gateway issues from user error logs

1. send_animation missing metadata kwarg (base.py)
   - Base class send_animation lacked the metadata parameter that the
     call site in base.py line 917 passes. Telegram's override accepted
     it, but any platform without an override (Discord, Slack, etc.)
     hit TypeError. Added metadata to base class signature.

2. MarkdownV2 split-inside-inline-code (base.py truncate_message)
   - truncate_message could split at a space inside an inline code span
     (e.g. `function(arg1, arg2)`), leaving an unpaired backtick and
     unescaped parentheses in the chunk. Telegram rejects with
     'character ( is reserved'. Added inline code awareness to the
     split-point finder — detects odd backtick counts and moves the
     split before the code span.

3. tirith auto-install without cosign (tirith_security.py)
   - Previously required cosign on PATH for auto-install, blocking
     install entirely with a warning if missing. Now proceeds with
     SHA-256 checksum verification only when cosign is unavailable.
     Cosign is still used for full supply chain verification when
     present. If cosign IS present but verification explicitly fails,
     install is still aborted (tampered release).
2026-03-16 23:39:41 -07:00
Muhammet Eren Karakuş
43b8ecd172 fix(tests): use case-insensitive regex in singularity preflight tests
pytest.raises(match=...) is case-sensitive by default. The error
message starts with "Neither" (capital N) but the regex used lowercase
"neither", causing CI failures on Linux.
2026-03-16 19:01:39 +03:00
Muhammet Eren Karakuş
606f57a3ab fix(terminal): add Singularity/Apptainer preflight availability check
When neither apptainer nor singularity is installed, the Singularity
backend silently defaults to "singularity" and fails with a cryptic
FileNotFoundError inside _start_instance().  Add a preflight check
that resolves the executable and verifies it responds, raising a
clear RuntimeError with install instructions on failure.

Closes #1511
2026-03-16 18:25:20 +03:00
teknium1
b72f522e30 test: fake minisweagent for docker cwd mount regressions
Make the new Docker cwd-mount tests pass in CI environments that do not have the minisweagent package installed by injecting a fake module instead of monkeypatching an import path that may not exist.
2026-03-16 05:40:05 -07:00
teknium1
780ddd102b fix(docker): gate cwd workspace mount behind config
Keep Docker sandboxes isolated by default. Add an explicit terminal.docker_mount_cwd_to_workspace opt-in, thread it through terminal/file environment creation, and document the security tradeoff and config.yaml workflow clearly.
2026-03-16 05:20:56 -07:00
Bartok9
8cdbbcaaa2 fix(docker): auto-mount host CWD to /workspace
Fixes #1445 — When using Docker backend, the user's current working
directory is now automatically bind-mounted to /workspace inside the
container. This allows users to run `cd my-project && hermes` and have
their project files accessible to the agent without manual volume config.

Changes:
- Add host_cwd and auto_mount_cwd parameters to DockerEnvironment
- Capture original host CWD in _get_env_config() before container fallback
- Pass host_cwd through _create_environment() to Docker backend
- Add TERMINAL_DOCKER_NO_AUTO_MOUNT env var to disable if needed
- Skip auto-mount when /workspace is already explicitly mounted
- Add tests for auto-mount behavior
- Add documentation for the new feature

The auto-mount is skipped when:
1. TERMINAL_DOCKER_NO_AUTO_MOUNT=true is set
2. User configured docker_volumes with :/workspace
3. persistent_filesystem=true (persistent sandbox mode)

This makes the Docker backend behave more intuitively — the agent
operates on the user's actual project directory by default.
2026-03-16 05:20:21 -07:00
Teknium
c1da1fdcd5
feat: auto-detect provider when switching models via /model (#1506)
When typing /model deepseek-chat while on a different provider, the
model name now auto-resolves to the correct provider instead of
silently staying on the wrong one and causing API errors.

Detection priority:
1. Direct provider with credentials (e.g. DEEPSEEK_API_KEY set)
2. OpenRouter catalog match with proper slug remapping
3. Direct provider without creds (clear error beats silent failure)

Also adds DeepSeek as a first-class API-key provider — just set
DEEPSEEK_API_KEY and /model deepseek-chat routes directly.

Bare model names get remapped to proper OpenRouter slugs:
  /model gpt-5.4 → openai/gpt-5.4
  /model claude-opus-4.6 → anthropic/claude-opus-4.6

Salvages the concept from PR #1177 by @virtaava with credential
awareness and OpenRouter slug mapping added.

Co-authored-by: virtaava <virtaava@users.noreply.github.com>
2026-03-16 04:34:45 -07:00
Teknium
dd7921d514
fix(honcho): isolate session routing for multi-user gateway (#1500)
Salvaged from PR #1470 by adavyas.

Core fix: Honcho tool calls in a multi-session gateway could route to
the wrong session because honcho_tools.py relied on process-global
state. Now threads session context through the call chain:
  AIAgent._invoke_tool() → handle_function_call() → registry.dispatch()
  → handler **kw → _resolve_session_context()

Changes:
- Add _resolve_session_context() to prefer per-call context over globals
- Plumb honcho_manager + honcho_session_key through handle_function_call
- Add sync_honcho=False to run_conversation() for synthetic flush turns
- Pass honcho_session_key through gateway memory flush lifecycle
- Harden gateway PID detection when /proc cmdline is unreadable
- Make interrupt test scripts import-safe for pytest-xdist
- Wrap BibTeX examples in Jekyll raw blocks for docs build
- Fix thread-order-dependent assertion in client lifecycle test
- Expand Honcho docs: session isolation, lifecycle, routing internals

Dropped from original PR:
- Indentation change in _create_request_openai_client that would move
  client creation inside the lock (causes unnecessary contention)

Co-authored-by: adavyas <adavyas@users.noreply.github.com>
2026-03-16 00:23:47 -07:00
teknium1
1f72ce71b7 fix: restore local STT fallback for gateway voice notes
Restore local STT command fallback for voice transcription, detect whisper and ffmpeg in common local install paths, and avoid bogus no-provider messaging when only a backend-specific key is missing.
2026-03-15 21:51:40 -07:00
teknium1
01e62c067b merge: resolve conflicts with origin/main (SSH preflight check) 2026-03-15 21:13:40 -07:00
Teknium
ceb970c559
fix(terminal): add SSH preflight check (#1486) 2026-03-15 21:09:07 -07:00
teknium1
210d5ade1e feat(tools): centralize tool emoji metadata in registry + skin integration
- Add 'emoji' field to ToolEntry and 'get_emoji()' to ToolRegistry
- Add emoji= to all 50+ registry.register() calls across tool files
- Add get_tool_emoji() helper in agent/display.py with 3-tier resolution:
  skin override → registry default → hardcoded fallback
- Replace hardcoded emoji maps in run_agent.py, delegate_tool.py, and
  gateway/run.py with centralized get_tool_emoji() calls
- Add 'tool_emojis' field to SkinConfig so skins can override per-tool
  emojis (e.g. ares skin could use swords instead of wrenches)
- Add 11 tests (5 registry emoji, 6 display/skin integration)
- Update AGENTS.md skin docs table

Based on the approach from PR #1061 by ForgingAlex (emoji centralization
in registry). This salvage fixes several issues from the original:
- Does NOT split the cronjob tool (which would crash on missing schemas)
- Does NOT change image_generate toolset/requires_env/is_async
- Does NOT delete existing tests
- Completes the centralization (gateway/run.py was missed)
- Hooks into the skin system for full customizability
2026-03-15 20:21:21 -07:00
teknium1
33ebedc76d feat: enable persistent shell by default for SSH, add config option
SSH persistent shell now defaults to true — non-local backends benefit
most from state persistence across execute() calls. Local backend
remains opt-in via TERMINAL_LOCAL_PERSISTENT env var.

New config.yaml option: terminal.persistent_shell (default: true)
Controls the default for non-local backends. Users can disable with:
  hermes config set terminal.persistent_shell false

Precedence: per-backend env var > TERMINAL_PERSISTENT_SHELL > default.

Wired through cli.py, gateway/run.py, and hermes_cli/config.py so the
config.yaml value reaches terminal_tool via env var bridge.
2026-03-15 20:17:13 -07:00
teknium1
5b80654198 feat(tools): add persistent shell mode to local and SSH backends
Cherry-picked from PR #1067 by alt-glitch.
Adds PersistentShellMixin with file-based IPC protocol for long-lived
bash shells. LocalEnvironment and SSHEnvironment gain persistent=True
option. Controlled via TERMINAL_LOCAL_PERSISTENT / TERMINAL_SSH_PERSISTENT
env vars. Fixes latent stderr pipe buffer deadlock.

Co-authored-by: alt-glitch <balyan.sid@gmail.com>
2026-03-15 20:13:02 -07:00
Teknium
471c663fdf
fix(cli): silence tirith prefetch install warnings at startup (#1452) 2026-03-15 18:07:03 -07:00
Teknium
64d333204b
Merge pull request #1242 from NousResearch/fix/file-tool-log-noise
fix: reduce file tool log noise
2026-03-15 11:11:18 -07:00
alt-glitch
4511322f56 Merge origin/main into sid/persistent-backend
Resolve conflict in local.py: keep refactored _make_run_env helper
over inline _sanitize_subprocess_env logic.
2026-03-15 21:08:11 +05:30
teyrebaz33
20f381cfb6 fix: preserve thread context for cronjob deliver=origin
When a cronjob is created from within a Telegram or Slack thread,
deliver=origin was posting to the parent channel instead of the thread.

Root cause: the gateway never set HERMES_SESSION_THREAD_ID in the
session environment, so cronjob_tools.py could not capture thread_id
into the job's origin metadata — even though the scheduler already
reads origin.get('thread_id').

Fix:
- gateway/run.py: set HERMES_SESSION_THREAD_ID when thread_id is
  present on the session context, and clear it in _clear_session_env
- tools/cronjob_tools.py: read HERMES_SESSION_THREAD_ID into origin

Closes #1219
2026-03-15 06:57:00 -07:00
teknium1
b177b4abad fix(security): block gateway and tool env vars in subprocesses
Extend subprocess env sanitization beyond provider credentials by blocking Hermes-managed tool, messaging, and related gateway runtime vars. Reuse a shared sanitizer in LocalEnvironment and ProcessRegistry so background and PTY processes honor the same blocklist and _HERMES_FORCE_ escape hatch. Add regression coverage for local env execution and process_registry spawning.
2026-03-15 02:51:04 -07:00
Teknium
fd0e1aac72
Merge pull request #1400 from NousResearch/hermes/hermes-45b79a59-clawhub-search
fix: harden ClawHub skill search exact matches
2026-03-14 23:17:24 -07:00
teknium1
8ccd14a0d4 fix: improve clawhub skill search matching 2026-03-14 23:15:04 -07:00
teknium1
df9020dfa3 fix: harden clawhub skill search exact matches 2026-03-14 22:31:09 -07:00
Teknium
c6fb7f6463
Merge pull request #1399 from NousResearch/hermes/hermes-629f8bde
fix(#1002): expand environment blocklist for terminal isolation
2026-03-14 22:30:05 -07:00
teknium1
672dc1666f test: cover extra provider env blocklist vars 2026-03-14 22:29:35 -07:00
Teknium
5b11570517
Merge pull request #1398 from NousResearch/hermes/hermes-1b6f4583
fix(cron): support per-job runtime overrides
2026-03-14 22:29:30 -07:00
Synergy
28b3764d1e fix(cron): support per-job runtime overrides
Salvaged from PR #1292 onto current main. Preserve per-job model,
provider, and base_url overrides in cron execution, persist them in
job records, expose them through the cronjob tool create/update paths,
and add regression coverage. Deliberately does not persist per-job
api_key values.
2026-03-14 22:22:31 -07:00
Teknium
62f1c2b622
Merge pull request #1397 from NousResearch/hermes/hermes-629f8bde
fix: escape parens and braces in fork bomb regex pattern
2026-03-14 22:17:16 -07:00
Teknium
84d99f7754
Merge pull request #1394 from NousResearch/hermes/hermes-eca4a640
fix: honor stt.enabled false across gateway transcription
2026-03-14 22:11:47 -07:00
teknium1
d5b64ebdb3 fix: preserve legacy approval keys after pattern key migration 2026-03-14 22:10:39 -07:00
teknium1
f8ceadbad0 fix: propagate STT disable through shared transcription config
- add stt.enabled to the default user config
- make transcription_tools respect the disabled flag globally
- surface disabled state cleanly in voice mode diagnostics
- add regression coverage for disabled STT provider selection
2026-03-14 22:09:59 -07:00
0xbyt4
4a93cfd889 fix: use description as pattern_key to prevent approval collisions
pattern_key was derived by splitting the regex on \b and taking [1],
so patterns starting with the same word (e.g. find -exec rm and
find -delete) produced the same key "find". Approving one silently
approved the other. Using the unique description string as the key
eliminates all collisions.
2026-03-14 22:07:58 -07:00
0xbyt4
e6417cb7bc fix: escape parens and braces in fork bomb regex pattern
The fork bomb regex used `()` (empty capture group) and unescaped `{}`
instead of literal `\(\)` and `\{\}`. This meant the classic fork bomb
`:(){ :|:& };:` was never detected. Also added `\s*` between `:` and
`&` and between `;` and trailing `:` to catch whitespace variants.
2026-03-14 22:06:44 -07:00
Teknium
f9a61a0d9e
Merge pull request #1383 from NousResearch/hermes/hermes-7ef7cb6a
fix: add project root to PYTHONPATH in execute_code sandbox
2026-03-14 21:41:50 -07:00
teknium1
0614969f7b test: cover repo-root imports in execute_code sandbox 2026-03-14 21:41:12 -07:00
teknium1
f6ff6639e8 fix: complete salvaged cronjob dependency check
Add regression coverage for cronjob availability and import shutil for the crontab PATH check added from PR #1380.
2026-03-14 21:39:59 -07:00
teknium1
9f6bccd76a feat: add direct endpoint overrides for auxiliary and delegation
Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.
2026-03-14 21:11:37 -07:00
Teknium
88a48037d1
Merge pull request #1367 from NousResearch/hermes/hermes-aa701810
refactor: unify vision backend gating
2026-03-14 20:31:58 -07:00
teknium1
dc11b86e4b refactor: unify vision backend gating 2026-03-14 20:22:13 -07:00
teknium1
3229e434b8 Merge origin/main into hermes/hermes-5d160594 2026-03-14 19:34:05 -07:00
teknium1
2536ff328b fix: prefer prompt names for multi-skill cron jobs 2026-03-14 19:28:52 -07:00
teknium1
c3ea620796 feat: add multi-skill cron editing and docs 2026-03-14 19:18:10 -07:00
teknium1
7b140b31e6 fix: suppress duplicate cron sends to auto-delivery targets
Allow cron runs to keep using send_message for additional destinations, but
skip same-target sends when the scheduler will already auto-deliver the final
response there. Add prompt/tool guidance, docs, and regression coverage for
origin/home-channel resolution and thread-aware comparisons.
2026-03-14 19:07:50 -07:00
alt-glitch
879b7d3fbf fix(tests): update mock stdout in env blocklist tests
The fake_popen mock used iter([]) for proc.stdout which doesn't
support .close(). Use MagicMock with __iter__ instead, since
_drain_stdout now calls proc.stdout.close() in its finally block.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-15 02:48:05 +05:30
balyan.sid@gmail.com
9001b34146 simplify docstrings, fix some bugs 2026-03-15 01:20:42 +05:30
balyan.sid@gmail.com
861202b56c wip: add persistent shell to ssh and local terminal backends 2026-03-15 01:20:42 +05:30
teknium1
df5c61b37c feat: compress cron management into one tool 2026-03-14 12:21:50 -07:00
Teknium
1114841a2c
Merge pull request #1329 from NousResearch/hermes/hermes-2f2b4807
fix: tighten memory and session recall guidance
2026-03-14 11:38:54 -07:00
teknium1
5319bb6ac4 fix: tighten memory and session recall guidance
Remove diary-style memory framing from the system prompt and memory tool
schema, explicitly steer task/session logs to session_search, and clarify
that session_search is for cross-session recall after checking the current
conversation first. Add regression tests for the updated guidance text.
2026-03-14 11:36:47 -07:00
Teknium
80a243efe6
Merge pull request #1333 from NousResearch/hermes/hermes-1fc28d17
fix: improve browser cleanup, local browser PATH setup, and screenshot recovery
2026-03-14 11:36:09 -07:00
Dave Tist
895fe5a5d3 Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.

This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.

The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.

Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.

Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-14 11:28:26 -07:00
Stable Genius
3325e51e53 fix(skills): honor policy table for dangerous verdicts
Salvaged from PR #1007 by stablegenius49.

- let INSTALL_POLICY decide dangerous verdict handling for builtin skills
- allow --force to override blocked dangerous decisions for trusted and community sources
- accept --yes / -y as aliases for --force in /skills install
- update regression tests to match the intended policy precedence
2026-03-14 11:27:02 -07:00
Teknium
681f1068ea
Merge pull request #1303 from NousResearch/hermes/hermes-aa653753
feat(skills): integrate skills.sh as a hub source
2026-03-14 09:48:18 -07:00
teknium1
05770520af test(skills): isolate well-known cache in adapter tests
Prevent the mocked well-known adapter tests from sharing index-cache state across runs or xdist workers.
2026-03-14 08:24:59 -07:00
teknium1
43d25af964 feat(skills): add update checks and well-known support
Round out the skills hub integration with:
- richer skills.sh metadata and security surfacing during inspect/install
- generic check/update flows for hub-installed skills
- support for well-known Agent Skills endpoints via /.well-known/skills/index.json

Also persist upstream bundle metadata in the lock file and add
regression coverage plus live-compatible path handling for both
skills.sh aliases and well-known endpoints.
2026-03-14 08:21:16 -07:00
Teknium
707f3ff41f
refactor: tighten MoA traceback logging scope (#1307)
* improve: add exc_info to MoA error logging

* refactor: tighten MoA traceback logging scope

Follow up on salvaged PR #998 by limiting exc_info logging to terminal
failure paths, avoiding duplicate aggregator errors, and refreshing the
MoA default OpenRouter model lineup to current frontier options.

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-14 07:53:56 -07:00
teknium1
02c307b004 fix(skills): resolve skills.sh alias installs
Harden the skills.sh hub adapter by parsing skill detail pages when
search slugs do not map cleanly onto GitHub skill folder names.

This adds detail-page resolution for alias-style skills, improves
inspect metadata from the page itself, and covers the behavior with
regression tests plus live smoke validation for json-render-react.
2026-03-14 06:50:25 -07:00
Teknium
95c0bee7f8
Merge pull request #1299 from NousResearch/hermes/hermes-f5fb1d3b
fix: salvage PR #327 voice mode onto current main
2026-03-14 06:45:20 -07:00
teknium1
483a0b5233 feat(skills): integrate skills.sh as a hub source
Add a skills.sh-backed source adapter for the Hermes Skills Hub.

The new adapter uses skills.sh search results for discovery, falls back to
featured homepage links for browse-style queries, and resolves installs /
inspects through the underlying GitHub repo using common Agent Skills
layout conventions. Also expose skills-sh in CLI source filters and add
regression coverage for search, alias resolution, and source routing.
2026-03-14 06:23:36 -07:00
teknium1
04e151714f feat(mcp): make selective tool loading capability-aware
Extend the salvaged MCP filtering work so utility tools are also governed by policy and server capabilities. Store the registered tool subset per server so rediscovery and status reporting stay accurate after filtering.
2026-03-14 06:22:02 -07:00
teyrebaz33
3198cc8fd9 feat(mcp): per-server tool filtering via include/exclude and enabled flag
Add optional config keys under each mcp_servers entry:
- tools.include: whitelist, only listed tools are registered
- tools.exclude: blacklist, all tools except listed are registered
- enabled: false: skip server entirely, no connection attempt

Backward-compatible: no config keys = all tools registered as before.

Tests: TestMCPSelectiveToolLoading (4 tests), 134 passed total.
2026-03-14 06:12:17 -07:00
Oktay Aydin
00a0f18544 fix: clearer terminal backend requirement errors
Salvaged from PR #979 onto current main.

Preserve the current terminal backend checks while surfacing actionable
preflight errors for unknown TERMINAL_ENV values, missing SSH host/user
configuration, and missing Modal credentials/config. Tighten the modal
regression test so it deterministically exercises the config-missing
path.
2026-03-14 06:04:39 -07:00
teknium1
523a1b6faf merge: salvage PR #327 voice mode branch
Merge contributor branch feature/voice-mode onto current main for follow-up fixes.
2026-03-14 06:03:07 -07:00
Teknium
b646440ca0
fix(mcp): resolve npx stdio connection failures (#1291)
Salvaged from PR #977 onto current main.
Preserves the MCP stdio command resolution and improved error diagnostics,
with deterministic regression tests for the npx/node PATH cases.

Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-14 05:44:00 -07:00
0xbyt4
eb34c0b09a fix: voice pipeline hardening — 7 bug fixes with tests
1. Anthropic + ElevenLabs TTS silence: forward full response to TTS
   callback for non-streaming providers (choices first, then native
   content blocks fallback).

2. Subprocess timeout kill: play_audio_file now kills the process on
   TimeoutExpired instead of leaving zombie processes.

3. Discord disconnect cleanup: leave all voice channels before closing
   the client to prevent leaked state.

4. Audio stream leak: close InputStream if stream.start() fails.

5. Race condition: read/write _on_silence_stop under lock in audio
   callback thread.

6. _vprint force=True: show API error, retry, and truncation messages
   even during streaming TTS.

7. _refresh_level lock: read _voice_recording under _voice_lock.
2026-03-14 14:27:21 +03:00
0xbyt4
35748a2fb0 fix: address PR review round 4 — remove web UI, fix audio/import/interface issues
Remove web UI gateway (web.py, tests, docs, toolset, env vars, Platform.WEB
enum) per maintainer request — Nous is building their own official chat UI.

Fix 1: Replace sd.wait() with polling pattern in play_audio_file() to prevent
indefinite hang when audio device stalls (consistent with play_beep()).

Fix 2: Use importlib.util.find_spec() for faster_whisper/openai availability
checks instead of module-level imports that trigger heavy native library
loading (CUDA/cuDNN) at import time.

Fix 3: Remove inspect.signature() hack in _send_voice_reply() — add **kwargs
to Telegram send_voice() so all adapters accept metadata uniformly.

Fix 4: Make session loading resilient to removed platform enum values — skip
entries with unknown platforms instead of crashing the entire gateway.
2026-03-14 14:27:21 +03:00
0xbyt4
69cb373864 fix: update /voice status to show correct STT provider
Voice status was hardcoded to check API keys only. Now uses the actual
provider resolution (local/groq/openai) so it correctly shows
"local faster-whisper" when installed instead of "Groq" or "MISSING".
2026-03-14 14:27:21 +03:00
0xbyt4
b8f8d3ef9e feat: integrate faster-whisper local STT with three-provider fallback
Merge main's faster-whisper (local, free) with our Groq support into a
unified three-provider STT pipeline: local > groq > openai.

Provider priority ensures free options are tried first. Each provider
has its own transcriber function with model auto-correction, env-
overridable endpoints, and proper error handling.

74 tests cover the full provider matrix, fallback chains, model
correction, config loading, validation edge cases, and dispatch.
2026-03-14 14:27:21 +03:00
0xbyt4
2c84979d77 refactor: extract get_stt_model_from_config helper to eliminate DRY violation
Duplicated YAML config parsing for stt.model existed in gateway/run.py
and gateway/platforms/discord.py. Moved to a single helper in
transcription_tools.py and added 5 tests covering all edge cases.
2026-03-14 14:27:21 +03:00
0xbyt4
34c324ff59 fix(test): use real _strip_markdown_for_tts instead of duplicated copy
- Import from tools.tts_tool instead of reimplementing the logic
- Fix test_truncates_long_text: truncation is the caller's job, not the function's
- Remove unused re import
2026-03-14 14:27:20 +03:00
0xbyt4
eb79dda04b fix: persistent audio stream and silence detection improvements
- Keep InputStream alive across recordings to avoid CoreAudio hang on
  repeated open/close cycles on macOS.  New _ensure_stream() creates the
  stream once; start()/stop()/cancel() only toggle frame collection.
- Add _close_stream_with_timeout() with daemon thread to prevent
  stream.stop()/close() from blocking indefinitely.
- Add generation counter to detect stale stream-open completions after
  cancel or restart.
- Run recorder.cancel() in background thread from Ctrl+C handler to
  keep the event loop responsive.
- Add shutdown() method called on /voice off to release audio resources.
- Fix silence timer reset during active speech: use dip tolerance for
  _resume_start tracker so natural speech pauses (< 0.3s) don't prevent
  the silence timer from being reset.
- Update tests to match persistent stream behavior.
2026-03-14 14:27:20 +03:00
0xbyt4
eec04d180a fix(test): update play_beep test to match polling-based implementation
play_beep was changed from sd.wait() to a poll loop + sd.stop() in
302e1fe but the test was not updated. Now asserts sd.stop() instead
of sd.wait().
2026-03-14 14:27:20 +03:00
0xbyt4
9d58cafec9 fix: move process_loop voice restart to daemon thread, use _cprint consistently
- process_loop's continuous mode restart called _voice_start_recording()
  directly, blocking the loop if play_beep/sd.wait hangs — queued user
  input would stall silently. Dispatch to daemon thread like Ctrl+B handler.
- Replace print() with _cprint() in _handle_voice_command for consistency
  with the rest of the voice mode code.
2026-03-14 14:27:20 +03:00
0xbyt4
d0e3b39e69 fix: prevent Ctrl+B key handler from blocking prompt_toolkit event loop
The handle_voice_record key binding runs in prompt_toolkit's event-loop
thread. When silence auto-stopped recording, _voice_recording was False
but recorder.stop() still held AudioRecorder._lock. A concurrent Ctrl+B
press entered the START path and blocked on that lock, freezing all
keyboard input.

Three changes:
- Set _voice_processing atomically with _voice_recording=False in
  _voice_stop_and_transcribe to close the race window
- Add _voice_processing guard in the START path to prevent starting
  while stop/transcribe is still running
- Dispatch _voice_start_recording to a daemon thread so play_beep
  (sd.wait) and AudioRecorder.start (lock acquire) never block the
  event loop
2026-03-14 14:27:20 +03:00
0xbyt4
ecc3dd7c63 test: add comprehensive voice mode test coverage (86 tests)
- Add TestStreamingApiCall (11 tests) for _streaming_api_call in test_run_agent.py
- Add regression tests for all 7 bug fixes (edge_tts lazy import, output_stream
  cleanup, ctrl+c continuous reset, disable stops TTS, config key, chat cleanup,
  browser_tool signal handler removal)
- Add real behavior tests for CLI voice methods via _make_voice_cli() fixture:
  TestHandleVoiceCommandReal (7), TestEnableVoiceModeReal (7),
  TestDisableVoiceModeReal (6), TestVoiceSpeakResponseReal (7),
  TestVoiceStopAndTranscribeReal (12)
2026-03-14 14:27:20 +03:00
0xbyt4
6e51729c4c fix: remove browser_tool signal handlers that cause voice mode deadlock
browser_tool.py registered SIGINT/SIGTERM handlers that called sys.exit()
at module import time. When a signal arrived during a lock acquisition
(e.g. AudioRecorder._lock in voice mode), SystemExit was raised inside
prompt_toolkit's async event loop, corrupting coroutine state and making
the process unkillable (required SIGKILL).

atexit handler already ensures browser sessions are cleaned up on any
normal exit path, so the signal handlers were redundant and harmful.
2026-03-14 14:27:20 +03:00
0xbyt4
ddfd6e0c59 fix: resolve 6 voice mode bugs found during audit
- edge_tts NameError: _generate_edge_tts now calls _import_edge_tts()
  instead of referencing bare module name (tts_tool.py)
- TTS thread leak: chat() finally block sends sentinel to text_queue,
  sets stop_event, and joins tts_thread on exception paths (cli.py)
- output_stream leak: moved close() into finally block so audio device
  is released even on exception (tts_tool.py)
- Ctrl+C continuous mode: cancel handler now resets _voice_continuous
  to prevent auto-restart after user cancels recording (cli.py)
- _disable_voice_mode: now calls stop_playback() and sets
  _voice_tts_done so TTS stops when voice mode is turned off (cli.py)
- _show_voice_status: reads record key from config instead of
  hardcoding Ctrl+B (cli.py)
2026-03-14 14:27:20 +03:00
0xbyt4
a78249230c fix: address voice mode PR review (streaming TTS, prompt cache, _vprint)
Bug A: Replace stale _HAS_ELEVENLABS/_HAS_AUDIO boolean imports with
lazy import function calls (_import_elevenlabs, _import_sounddevice).
The old constants no longer exist in tts_tool -- the try/except
silently swallowed the ImportError, leaving streaming TTS dead.

Bug B: Use user message prefix instead of modifying system prompt for
voice mode instruction. Changing ephemeral_system_prompt mid-session
invalidates the prompt cache. Now the concise-response hint is
prepended to the user_message passed to run_conversation while
conversation_history keeps the original text.

Minor: Add force parameter to _vprint so critical error messages
(max retries, non-retryable errors, API failures) are always shown
even during streaming TTS playback.

Tests: 15 new tests in test_voice_cli_integration.py covering all
three fixes -- lazy import activation, message prefix behavior,
history cleanliness, system prompt stability, and AST verification
that all critical _vprint calls use force=True.
2026-03-14 14:27:20 +03:00
0xbyt4
b859dfab16 fix: address voice mode review feedback
1. Fully lazy imports: sounddevice, numpy, elevenlabs, edge_tts, and
   openai are never imported at module level. Each is imported only when
   the feature is explicitly activated, preventing crashes in headless
   environments (SSH, Docker, WSL, no PortAudio).

2. No core agent loop changes: streaming TTS path extracted from
   _interruptible_api_call() into separate _streaming_api_call() method.
   The original method is restored to its upstream form.

3. Configurable key binding: push-to-talk key changed from Ctrl+R
   (conflicts with readline reverse-search) to Ctrl+B by default.
   Configurable via voice.push_to_talk_key in config.yaml.

4. Environment detection: new detect_audio_environment() function checks
   for SSH, Docker, WSL, and missing audio devices before enabling voice
   mode. Auto-disables with clear warnings in incompatible environments.

5. Graceful degradation: every audio touchpoint (sd.play, sd.InputStream,
   sd.OutputStream) wrapped in try/except with ImportError/OSError
   handling. Failures produce warnings, not crashes.
2026-03-14 14:27:20 +03:00
0xbyt4
dad865e920 fix: fix silence detection bugs and add Phase 4 voice mode features
Fix 3 critical bugs in silence detection:
- Micro-pause tolerance now tracks dip duration (not time since speech start)
- Peak RMS check in stop() prevents discarding recordings with real speech
- Reduced min_speech_duration from 0.5s to 0.3s for reliable speech confirmation

Phase 4 features: configurable silence params, visual audio level indicator,
voice system prompt, tool call audio cues, TTS interrupt, continuous mode
auto-restart, interruptable playback via Popen tracking.
2026-03-14 14:26:30 +03:00
0xbyt4
32b033c11c feat: add silence filter, hallucination guard, and continuous mode control
- Skip silent recordings before STT call (RMS check in AudioRecorder.stop)
- Filter known Whisper hallucinations ("Thank you.", "Bye." etc.)
- Continuous mode: Ctrl+R starts loop, Ctrl+R during recording exits it
- Wait for TTS to finish before auto-restart to avoid recording speaker
- Silence timeout increased to 3s for natural pauses
- Tests: hallucination filter, silent recording skip, real speech passthrough
2026-03-14 14:25:28 +03:00
0xbyt4
bfd9c97705 feat: add Phase 4 low-latency features for voice mode
- Audio cues: beep on record start (880Hz), double beep on stop (660Hz)
- Silence detection: auto-stop recording after 3s of silence (RMS-based)
- Continuous mode: auto-restart recording after agent responds
  - Ctrl+R starts continuous mode, Ctrl+R during recording exits it
  - Waits for TTS to finish before restarting to avoid recording speaker
- Tests: 7 new tests for beep generation and silence detection
2026-03-14 14:25:28 +03:00
0xbyt4
a69bd55b5a fix: isolate GROQ_API_KEY in test_missing_stt_key test
The test was failing because GROQ_API_KEY leaked from the environment.
Now both VOICE_TOOLS_OPENAI_KEY and GROQ_API_KEY are removed to
properly test the "no STT key" scenario.
2026-03-14 14:25:28 +03:00
0xbyt4
c23928d089 fix: improve voice mode robustness and add integration tests
- Show TTS errors to user instead of silently logging
- Improve markdown stripping: code blocks, URLs, links, horizontal rules
- Fix stripping order: process markdown links before removing URLs
- Add threading.Lock for voice state variables (cross-thread safety)
- Add 14 CLI integration tests (markdown stripping, command parsing, thread safety)
- Total: 47 voice-related tests
2026-03-14 14:25:28 +03:00
0xbyt4
37b01ab964 test: add transcription_tools tests for multi-provider STT
- Provider resolution: OpenAI priority, Groq fallback, no keys
- Model auto-correction: Groq corrects OpenAI models and vice versa
- Success path: transcription, API errors, whitespace stripping
- 12 new tests, 33 total voice-related tests
2026-03-14 14:25:28 +03:00
0xbyt4
1a6fbef8a9 feat: add voice mode with push-to-talk and TTS output for CLI
Implements Issue #314 Phase 2 & 3:
- /voice command to toggle voice mode (on/off/tts/status)
- Ctrl+Space push-to-talk recording via sounddevice
- Whisper STT transcription via existing transcription_tools
- Optional TTS response playback via existing tts_tool
- Visual indicators in prompt (recording/transcribing/voice)
- 21 unit tests, all mocked (no real mic/API)
- Optional deps: sounddevice, numpy (pip install hermes-agent[voice])
2026-03-14 14:25:28 +03:00
teknium1
5c9a84219d fix: complete send_message MEDIA delivery salvage
- prevent raw MEDIA tag leakage outside the gateway pipeline
- make extract_media handle quoted/backticked paths and optional whitespace
- send Telegram media natively with explicit error/warning handling
- add regression tests for Telegram media dispatch and MEDIA parsing
2026-03-14 04:02:03 -07:00
teknium1
96c250e538 test: cover pipe characters in v4a patch apply
Add a regression test for apply_v4a_operations when read content contains a literal pipe character outside a line-number prefix.
2026-03-14 03:54:46 -07:00
Teknium
6036793f60
fix: clearer docker backend preflight errors (#1276)
* feat: improve context compaction handoff summaries

Adapt PR #916 onto current main by replacing the old context summary marker
with a clearer handoff wrapper, updating the summarization prompt for
resume-oriented summaries, and preserving the current call_llm-based
compression path.

* fix: clearer error when docker backend is unavailable

* fix: preserve docker discovery in backend preflight

Follow up on salvaged PR #940 by reusing find_docker() during the new
availability check so non-PATH Docker Desktop installs still work. Add
a regression test covering the resolved executable path.

---------

Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
2026-03-14 02:53:02 -07:00
teknium1
6f1889b0fa fix: preserve current approval semantics for tirith guard
Restore gateway/run.py to current main behavior while keeping tirith startup
and pattern_keys replay, preserve yolo and non-interactive bypass semantics in
the combined guard, and add regression tests for yolo and view-full flows.
2026-03-14 00:17:04 -07:00
sheeki003
375ce8a881 feat(security): add tirith pre-exec command scanning
Integrate tirith as a pre-execution security scanner that detects
homograph URLs, pipe-to-interpreter patterns, terminal injection,
zero-width Unicode, and environment variable manipulation — threats
the existing 50-pattern dangerous command detector doesn't cover.

Architecture: gather-then-decide — both tirith and the dangerous
command detector run before any approval prompt, preventing gateway
force=True replay from bypassing one check when only the other was
shown to the user.

New files:
- tools/tirith_security.py: subprocess wrapper with auto-installer,
  mandatory cosign provenance verification, non-blocking background
  download, disk-persistent failure markers with retryable-cause
  tracking (cosign_missing auto-clears when cosign appears on PATH)
- tests/tools/test_tirith_security.py: 62 tests covering exit code
  mapping, fail_open, cosign verification, background install,
  HERMES_HOME isolation, and failure recovery
- tests/tools/test_command_guards.py: 21 integration tests for the
  combined guard orchestration

Modified files:
- tools/approval.py: add check_all_command_guards() orchestrator,
  add allow_permanent parameter to prompt_dangerous_approval()
- tools/terminal_tool.py: replace _check_dangerous_command with
  consolidated check_all_command_guards
- cli.py: update _approval_callback for allow_permanent kwarg,
  call ensure_installed() at startup
- gateway/run.py: iterate pattern_keys list on replay approval,
  call ensure_installed() at startup
- hermes_cli/config.py: add security config defaults, split
  commented sections for independent fallback
- cli-config.yaml.example: document tirith security config
2026-03-14 00:11:27 -07:00
Teknium
a20d373945
fix: worktree-aware minisweagent path discovery + clean up requirements check (#1248)
Salvage of PR #1246 by ChatGPT (teknium1 session), resolved against
current main which already includes #1239.

Changes:
- Add minisweagent_path.py: worktree-aware helper that finds
  mini-swe-agent/src from either the current checkout or the main
  checkout behind a git worktree
- Use the helper in tools/terminal_tool.py and mini_swe_runner.py
  instead of naive path-relative lookup that fails in worktrees
- Clean up check_terminal_requirements():
  - local: return True (no minisweagent dep, per #1239)
  - singularity/ssh: remove unnecessary minisweagent imports
  - docker/modal: use importlib.util.find_spec with clear error
- Add regression tests for worktree path discovery and tool resolution
2026-03-13 23:39:51 -07:00
Teknium
21422dba44
Merge pull request #1239 from NousResearch/hermes/hermes-07d947aa
fix: stop local terminal warning without minisweagent
2026-03-13 22:14:44 -07:00
teknium1
b59da08730 fix: reduce file tool log noise
- treat git diff --cached --quiet rc=1 as an expected checkpoint state
  instead of logging it as an error
- downgrade expected write PermissionError/EROFS/EACCES failures out of
  error logging while keeping unexpected exceptions at error level
- add regression tests for both logging behaviors
2026-03-13 22:14:00 -07:00
teknium1
329f83ff2d fix: stop local terminal warning without minisweagent 2026-03-13 22:00:36 -07:00
0xIbra
437ec17125 fix(cli): respect HERMES_HOME in all remaining hardcoded ~/.hermes paths
Several files resolved paths via Path.home() / ".hermes" or
os.path.expanduser("~/.hermes/..."), bypassing the HERMES_HOME
environment variable. This broke isolation when running multiple
Hermes instances with distinct HERMES_HOME directories.

Replace all hardcoded paths with calls to get_hermes_home() from
hermes_cli.config, consistent with the rest of the codebase.

Files fixed:
- tools/process_registry.py (processes.json)
- gateway/pairing.py (pairing/)
- gateway/sticker_cache.py (sticker_cache.json)
- gateway/channel_directory.py (channel_directory.json, sessions.json)
- gateway/config.py (gateway.json, config.yaml, sessions_dir)
- gateway/mirror.py (sessions/)
- gateway/hooks.py (hooks/)
- gateway/platforms/base.py (image_cache/, audio_cache/, document_cache/)
- gateway/platforms/whatsapp.py (whatsapp/session)
- gateway/delivery.py (cron/output)
- agent/auxiliary_client.py (auth.json)
- agent/prompt_builder.py (SOUL.md)
- cli.py (config.yaml, images/, pastes/, history)
- run_agent.py (logs/)
- tools/environments/base.py (sandboxes/)
- tools/environments/modal.py (modal_snapshots.json)
- tools/environments/singularity.py (singularity_snapshots.json)
- tools/tts_tool.py (audio_cache)
- hermes_cli/status.py (cron/jobs.json, sessions.json)
- hermes_cli/gateway.py (logs/, whatsapp session)
- hermes_cli/main.py (whatsapp/session)

Tests updated to use HERMES_HOME env var instead of patching Path.home().

Closes #892

(cherry picked from commit 78ac1bba43b8b74a934c6172f2c29bb4d03164b9)
2026-03-13 21:32:53 -07:00
Teknium
07927f6bf2
feat(stt): add free local whisper transcription via faster-whisper (#1185)
* fix: Home Assistant event filtering now closed by default

Previously, when no watch_domains or watch_entities were configured,
ALL state_changed events passed through to the agent, causing users
to be flooded with notifications for every HA entity change.

Now events are dropped by default unless the user explicitly configures:
- watch_domains: list of domains to monitor (e.g. climate, light)
- watch_entities: list of specific entity IDs to monitor
- watch_all: true (new option — opt-in to receive all events)

A warning is logged at connect time if no filters are configured,
guiding users to set up their HA platform config.

All 49 gateway HA tests + 52 HA tool tests pass.

* docs: update Home Assistant integration documentation

- homeassistant.md: Fix event filtering docs to reflect closed-by-default
  behavior. Add watch_all option. Replace Python dict config example with
  YAML. Fix defaults table (was incorrectly showing 'all'). Add required
  configuration warning admonition.
- environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section.
- messaging/index.md: Add Home Assistant to description, architecture
  diagram, platform toolsets table, and Next Steps links.

* fix(terminal): strip provider env vars from background and PTY subprocesses

Extends the env var blocklist from #1157 to also cover the two remaining
leaky paths in process_registry.py:

- spawn_local() PTY path (line 156)
- spawn_local() background Popen path (line 197)

Both were still using raw os.environ, leaking provider vars to background
processes and interactive PTY sessions. Now uses the same dynamic
_HERMES_PROVIDER_ENV_BLOCKLIST from local.py.

Explicit env_vars passed to spawn_local() still override the blocklist,
matching the existing behavior for callers that intentionally need these.

Gap identified by PR #1004 (@PeterFile).

* feat(delegate): add observability metadata to subagent results

Enrich delegate_task results with metadata from the child AIAgent:

- model: which model the child used
- exit_reason: completed | interrupted | max_iterations
- tokens.input / tokens.output: token counts
- tool_trace: per-tool-call trace with byte sizes and ok/error status

Tool trace uses tool_call_id matching to correctly pair parallel tool
calls with their results, with a fallback for messages without IDs.

Cherry-picked from PR #872 by @omerkaz, with fixes:
- Fixed parallel tool call trace pairing (was always updating last entry)
- Removed redundant 'iterations' field (identical to existing 'api_calls')
- Added test for parallel tool call trace correctness

Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>

* feat(stt): add free local whisper transcription via faster-whisper

Replace OpenAI-only STT with a dual-provider system mirroring the TTS
architecture (Edge TTS free / ElevenLabs paid):

  STT: faster-whisper local (free, default) / OpenAI Whisper API (paid)

Changes:
- tools/transcription_tools.py: Full rewrite with provider dispatch,
  config loading, local faster-whisper backend, and OpenAI API backend.
  Auto-downloads model (~150MB for 'base') on first voice message.
  Singleton model instance reused across calls.
- pyproject.toml: Add faster-whisper>=1.0.0 as core dependency
- hermes_cli/config.py: Expand stt config to match TTS pattern with
  provider selection and per-provider model settings
- agent/context_compressor.py: Fix .strip() crash when LLM returns
  non-string content (dict from llama.cpp, None). Fixes #1100 partially.
- tests/: 23 new tests for STT providers + 2 for compressor fix
- docs/: Updated Voice & TTS page with STT provider table, model sizes,
  config examples, and fallback behavior

Fallback behavior:
- Local not installed → OpenAI API (if key set)
- OpenAI key not set → local whisper (if installed)
- Neither → graceful error message to user

Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>

---------

Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
2026-03-13 11:11:05 -07:00
Teknium
02a819b16e
feat(delegate): add observability metadata to subagent results (#1175)
* fix: Home Assistant event filtering now closed by default

Previously, when no watch_domains or watch_entities were configured,
ALL state_changed events passed through to the agent, causing users
to be flooded with notifications for every HA entity change.

Now events are dropped by default unless the user explicitly configures:
- watch_domains: list of domains to monitor (e.g. climate, light)
- watch_entities: list of specific entity IDs to monitor
- watch_all: true (new option — opt-in to receive all events)

A warning is logged at connect time if no filters are configured,
guiding users to set up their HA platform config.

All 49 gateway HA tests + 52 HA tool tests pass.

* docs: update Home Assistant integration documentation

- homeassistant.md: Fix event filtering docs to reflect closed-by-default
  behavior. Add watch_all option. Replace Python dict config example with
  YAML. Fix defaults table (was incorrectly showing 'all'). Add required
  configuration warning admonition.
- environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section.
- messaging/index.md: Add Home Assistant to description, architecture
  diagram, platform toolsets table, and Next Steps links.

* fix(terminal): strip provider env vars from background and PTY subprocesses

Extends the env var blocklist from #1157 to also cover the two remaining
leaky paths in process_registry.py:

- spawn_local() PTY path (line 156)
- spawn_local() background Popen path (line 197)

Both were still using raw os.environ, leaking provider vars to background
processes and interactive PTY sessions. Now uses the same dynamic
_HERMES_PROVIDER_ENV_BLOCKLIST from local.py.

Explicit env_vars passed to spawn_local() still override the blocklist,
matching the existing behavior for callers that intentionally need these.

Gap identified by PR #1004 (@PeterFile).

* feat(delegate): add observability metadata to subagent results

Enrich delegate_task results with metadata from the child AIAgent:

- model: which model the child used
- exit_reason: completed | interrupted | max_iterations
- tokens.input / tokens.output: token counts
- tool_trace: per-tool-call trace with byte sizes and ok/error status

Tool trace uses tool_call_id matching to correctly pair parallel tool
calls with their results, with a fallback for messages without IDs.

Cherry-picked from PR #872 by @omerkaz, with fixes:
- Fixed parallel tool call trace pairing (was always updating last entry)
- Removed redundant 'iterations' field (identical to existing 'api_calls')
- Added test for parallel tool call trace correctness

Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>

---------

Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
2026-03-13 08:07:12 -07:00
Muhammet Eren Karakuş
c92507e53d
fix(terminal): strip Hermes provider env vars from subprocess environment (#1157)
Terminal subprocesses inherit OPENAI_BASE_URL and other provider env
vars loaded from ~/.hermes/.env, silently misrouting external CLIs
like codex.  Build a blocklist dynamically from the provider registry
so new providers are automatically covered.  Callers that truly need
a blocked var can opt in via the _HERMES_FORCE_ prefix.

Closes #1002

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 07:52:03 -07:00
teknium1
06a5cc484c fix: improve gateway secret capture guidance message
The old message referenced 'hermes setup' which doesn't handle
skill-specific env vars. Updated to direct users to load the skill
in the local CLI (which triggers the secure prompt) or add the key
to ~/.hermes/.env manually.
2026-03-13 04:10:22 -07:00
Teknium
0157253145
Merge pull request #1152 from NousResearch/hermes/hermes-f47f71c0
feat: concurrent tool execution with ThreadPoolExecutor
2026-03-13 03:20:38 -07:00
kshitijk4poor
ccfbf42844 feat: secure skill env setup on load (core #688)
When a skill declares required_environment_variables in its YAML
frontmatter, missing env vars trigger a secure TUI prompt (identical
to the sudo password widget) when the skill is loaded. Secrets flow
directly to ~/.hermes/.env, never entering LLM context.

Key changes:
- New required_environment_variables frontmatter field for skills
- Secure TUI widget (masked input, 120s timeout)
- Gateway safety: messaging platforms show local setup guidance
- Legacy prerequisites.env_vars normalized into new format
- Remote backend handling: conservative setup_needed=True
- Env var name validation, file permissions hardened to 0o600
- Redact patterns extended for secret-related JSON fields
- 12 existing skills updated with prerequisites declarations
- ~48 new tests covering skip, timeout, gateway, remote backends
- Dynamic panel widget sizing (fixes hardcoded width from original PR)

Cherry-picked from PR #723 by kshitijk4poor, rebased onto current main
with conflict resolution.

Fixes #688

Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
2026-03-13 03:14:04 -07:00
teknium1
5d0d5b191c feat: concurrent tool execution with ThreadPoolExecutor
When the model returns multiple tool calls in a single response, they are
now executed concurrently using a thread pool instead of sequentially.
This significantly reduces wall-clock time when multiple independent tools
are batched (e.g. parallel web_search, read_file, terminal calls).

Architecture:
- _execute_tool_calls() dispatches to sequential or concurrent path
- Single tool calls and batches containing 'clarify' use sequential path
- Multiple non-interactive tools use ThreadPoolExecutor (max 8 workers)
- Results are collected and appended to messages in original order
- _invoke_tool() extracted as shared tool invocation helper

Safety:
- Pre-flight interrupt check skips all tools if interrupted
- Per-tool exception handling: one failure doesn't crash the batch
- Result truncation (100k char limit) applied per tool
- Budget pressure injection after all tools complete
- Checkpoints taken before file-mutating tools
- CLI spinner shows batch progress, then per-tool completion messages

Tests: 10 new tests covering dispatch logic, ordering, error handling,
interrupt behavior, truncation, and _invoke_tool routing.
2026-03-13 02:51:51 -07:00
Teknium
2a62514d17
feat: add 'View full command' option to dangerous command approval (#887)
When a dangerous command is detected and the user is prompted for
approval, long commands are truncated (80 chars in fallback, 70 chars
in the TUI). Users had no way to see the full command before deciding.

This adds a 'View full command' option across all approval interfaces:

- CLI fallback (tools/approval.py): [v]iew option in the prompt menu.
  Shows the full command and re-prompts for approval decision.
- CLI TUI (cli.py): 'Show full command' choice in the arrow-key
  selection panel. Expands the command display in-place and removes
  the view option after use.
- CLI callbacks (callbacks.py): 'view' choice added to the list when
  the command exceeds 70 characters.
- Gateway (gateway/run.py): 'full', 'show', 'view' responses reveal
  the complete command while keeping the approval pending.

Includes 7 new tests covering view-then-approve, view-then-deny,
short command fallthrough, and double-view behavior.

Closes community feedback about the 80-char cap on dangerous commands.
2026-03-12 06:27:21 -07:00
teknium1
a37fc05171 fix: skip hanging tests + add global test timeout
4 test files spawn real processes or make live API calls that hang
indefinitely in batch/CI runs. Skip them with pytestmark:

- tests/tools/test_code_execution.py (subprocess spawns)
- tests/tools/test_file_tools_live.py (live LocalEnvironment)
- tests/test_413_compression.py (blocks on process)
- tests/test_agent_loop_tool_calling.py (live OpenRouter API calls)

Also added global 30s signal.alarm timeout in conftest.py as a safety
net, and removed stale nous-api test that hung on OAuth browser login.

Suite now runs in ~55s with no hangs.
2026-03-12 01:23:28 -07:00
teknium1
2192b17670 merge: resolve conflicts with origin/main
- gateway/run.py: Take main's _resolve_gateway_model() helper
- hermes_cli/setup.py: Re-apply nous-api removal after merge brought
  it back. Fix provider_idx offset (Custom is now index 3, not 4).
- tests/hermes_cli/test_setup.py: Fix custom setup test index (3→4)
2026-03-12 00:29:04 -07:00
teknium1
29ef69c703 fix: update all test mocks for call_llm migration
Update 14 test files to use the new call_llm/async_call_llm mock
patterns instead of the old get_text_auxiliary_client/
get_vision_auxiliary_client tuple returns.

- vision_tools tests: mock async_call_llm instead of _aux_async_client
- browser tests: mock call_llm instead of _aux_vision_client
- flush_memories tests: mock call_llm instead of get_text_auxiliary_client
- session_search tests: mock async_call_llm with RuntimeError
- mcp_tool tests: fix whitelist model config, use side_effect for
  multi-response tests
- auxiliary_config_bridge: update for model=None (resolved in router)

3251 passed, 2 pre-existing unrelated failures.
2026-03-11 21:06:54 -07:00
teknium1
0aa31cd3cb feat: call_llm/async_call_llm + config slots + migrate all consumers
Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).
2026-03-11 20:52:19 -07:00
0xbyt4
4a8f23eddf fix: correctly track failed MCP server connections in discovery
_discover_one() caught all exceptions and returned [], making
asyncio.gather(return_exceptions=True) redundant. The
isinstance(result, Exception) branch in _discover_all() was dead
code, so failed_count was always 0. This caused:
- No summary printed when all servers fail (silent failure)
- ok_servers always equaling total_servers (misleading count)
- Unused variables transport_desc and transport_type

Fix: let exceptions propagate to gather() so failed_count increments
correctly. Move per-server failure logging to _discover_all(). Remove
dead variables.
2026-03-11 18:24:45 +03:00
dmahan93
59b53f0a23 fix: skip tests when atroposlib/minisweagent unavailable in CI
- test_agent_loop_tool_calling.py: import atroposlib at module level
  to trigger skip (environments.agent_loop is now importable without
  atroposlib due to __init__.py graceful fallback)
- test_modal_sandbox_fixes.py: skip TestToolResolution tests when
  minisweagent not installed
2026-03-11 06:52:55 -07:00
dmahan93
0f53275169 test: skip atropos-dependent tests when atroposlib not installed
Guard all test files that import from environments/ or atroposlib
with try/except + pytest.skip(allow_module_level=True) so they
gracefully skip instead of crashing when deps aren't available.
2026-03-11 06:52:55 -07:00
dmahan93
b03aefaf20 test: 13 tests for Modal sandbox infra fixes 2026-03-11 06:51:42 -07:00
teknium1
184aa5b2b3 fix: tighten exc_info assertion in vision test (from PR #803)
The weaker assertion (r.exc_info is not None) passes even when
exc_info is (None, None, None). Check r.exc_info[0] is not None
to verify actual exception info is present.

The _aux_async_client mock was already applied on main.

Co-authored-by: OutThisLife <nickolasgustafsson@gmail.com>
2026-03-11 06:32:01 -07:00
teknium1
9423fda5cb feat: configurable subagent provider:model with full credential resolution
Adds delegation.model and delegation.provider config fields so subagents
can run on a completely different provider:model pair than the parent agent.

When delegation.provider is set, the system resolves the full credential
bundle (base_url, api_key, api_mode) via resolve_runtime_provider() —
the same path used by CLI/gateway startup. This means all configured
providers work out of the box: openrouter, nous, zai, kimi-coding,
minimax, minimax-cn.

Key design decisions:
- Provider resolution uses hermes_cli.runtime_provider (single source of
  truth for credential resolution across CLI, gateway, cron, and now
  delegation)
- When only delegation.model is set (no provider), the model name changes
  but parent credentials are inherited (for switching models within the
  same provider like OpenRouter)
- When delegation.provider is set, full credentials are resolved
  independently — enabling cross-provider delegation (e.g. parent on
  Nous Portal, subagents on OpenRouter)
- Clear error messages if provider resolution fails (missing API key,
  unknown provider name)
- _load_config() now falls back to hermes_cli.config.load_config() for
  gateway/cron contexts where CLI_CONFIG is unavailable

Based on PR #791 by 0xbyt4 (closes #609), reworked to use proper
provider credential resolution instead of passing provider as metadata.

Co-authored-by: 0xbyt4 <0xbyt4@users.noreply.github.com>
2026-03-11 06:12:21 -07:00
teknium1
09336a6710 Merge PR #795: fix: handle empty choices in MCP sampling callback
Adds defensive guard against empty/None/missing choices in SamplingHandler.__call__
before accessing response.choices[0]. Returns proper ErrorData instead of crashing
with IndexError/TypeError on content filtering, provider errors, or rate limits.

Authored by 0xbyt4.

Co-authored-by: 0xbyt4 <0xbyt4@users.noreply.github.com>
2026-03-11 05:47:51 -07:00
Teknium
fe9da5280f
Merge pull request #766 from spanishflu-est1918/codex/telegram-topic-session-pr
Isolate Telegram forum topic sessions — each topic gets its own independent session key, history, and interrupt tracking. Progress, hygiene, and cron messages all route to the correct topic.
2026-03-11 03:14:43 -07:00
alireza78a
f1510ec33e test(terminal): add tests for env var validation in _get_env_config 2026-03-11 02:59:12 -07:00
SPANISH FLU
0d6b25274c fix(gateway): isolate telegram forum topic sessions 2026-03-11 09:15:34 +01:00
teknium1
a9241f3e3e fix: head+tail truncation for execute_code stdout
Replaces head-only stdout capture with a two-buffer approach (40% head,
60% tail rolling window) so scripts that print() their final results
at the end never lose them. Adds truncation notice between sections.

Cherry-picked from PR #755, conflict resolved (test file additions).

3 new tests for short output, head+tail preservation, and notice format.
2026-03-11 00:26:13 -07:00
teknium1
586fe5d62d Merge PR #724: feat: --yolo flag to bypass all approval prompts
Authored by dmahan93. Adds HERMES_YOLO_MODE env var and --yolo CLI flag
to auto-approve all dangerous command prompts.

Post-merge: renamed --fuck-it-ship-it to --yolo for brevity,
resolved conflict with --checkpoints flag.
2026-03-10 20:56:30 -07:00
Teknium
b76cae94d4
Merge pull request #889 from NousResearch/hermes/hermes-b0162f8d
fix: Docker backend fails when docker is not in PATH (macOS gateway)
2026-03-10 20:45:34 -07:00
teknium1
24479625a2 fix: Docker backend fails when docker is not in PATH (macOS gateway)
On macOS, Docker Desktop installs the CLI to /usr/local/bin/docker, but
when Hermes runs as a gateway service (launchd) or in other non-login
contexts, /usr/local/bin is often not in PATH. This causes the Docker
requirements check to fail with 'No such file or directory: docker' even
though docker works fine from the user's terminal.

Add find_docker() helper that uses shutil.which() first, then probes
common Docker Desktop install paths on macOS (/usr/local/bin,
/opt/homebrew/bin, Docker.app bundle). The resolved path is cached and
passed to mini-swe-agent via its 'executable' parameter.

- tools/environments/docker.py: add find_docker(), use it in
  _storage_opt_supported() and pass to _Docker(executable=...)
- tools/terminal_tool.py: use find_docker() in requirements check
- tests/tools/test_docker_find.py: 4 tests (PATH, fallback, not found, cache)

2877 tests pass.
2026-03-10 20:45:13 -07:00
teknium1
03a4f184e6 fix: call _stop_training_run on early-return failure paths
The 4 early-return paths in _spawn_training_run (API exit, trainer
exit, env not found, env exit) were doing manual process.terminate()
or returning without cleanup, leaking open log file handles. Now all
paths call _stop_training_run() which handles both process termination
and file handle closure.

Also adds 12 tests for _stop_training_run covering file handle
cleanup, process termination, status transitions, and edge cases.

Inspired by PR #715 (0xbyt4) which identified the early-return issue.
Core file handle fix was already on main via e28dc13 (memosr.eth).
2026-03-10 17:09:51 -07:00
teknium1
a458b535c9 fix: improve read-loop detection — consecutive-only, correct thresholds, fix bugs
Follow-up to PR #705 (merged from 0xbyt4). Addresses several issues:

1. CONSECUTIVE-ONLY TRACKING: Redesigned the read/search tracker to only
   warn/block on truly consecutive identical calls. Any other tool call
   in between (write, patch, terminal, etc.) resets the counter via
   notify_other_tool_call(), called from handle_function_call() in
   model_tools.py. This prevents false blocks in read→edit→verify flows.

2. THRESHOLD ADJUSTMENT: Warn on 3rd consecutive (was 2nd), block on
   4th+ consecutive (was 3rd+). Gives the model more room before
   intervening.

3. TUPLE UNPACKING BUG: Fixed get_read_files_summary() which crashed on
   search keys (5-tuple) when trying to unpack as 3-tuple. Now uses a
   separate read_history set that only tracks file reads.

4. WEB_EXTRACT DOCSTRING: Reverted incorrect removal of 'title' from
   web_extract return docs in code_execution_tool.py — the field IS
   returned by web_tools.py.

5. TESTS: Rewrote test_read_loop_detection.py (35 tests) to cover
   consecutive-only behavior, notify_other_tool_call, interleaved
   read/search, and summary-unaffected-by-searches.
2026-03-10 16:25:41 -07:00
teknium1
b53d5dad67 Merge PR #705: fix: detect, warn, and block file re-read/search loops after context compression
Authored by 0xbyt4. Adds read/search loop detection, file history injection after compression, and todo filtering for active items only.
2026-03-10 16:17:03 -07:00
teknium1
c1171fe666 fix: eliminate 3x SQLite message duplication in gateway sessions (#860)
Three separate code paths all wrote to the same SQLite state.db with
no deduplication, inflating session transcripts by 3-4x:

1. _log_msg_to_db() — wrote each message individually after append
2. _flush_messages_to_session_db() — re-wrote ALL new messages at
   every _persist_session() call (~18 exit points), with no tracking
   of what was already written
3. gateway append_to_transcript() — wrote everything a third time
   after the agent returned

Since load_transcript() prefers SQLite over JSONL, the inflated data
was loaded on every session resume, causing proportional token waste.

Fix:
- Remove _log_msg_to_db() and all 16 call sites (redundant with flush)
- Add _last_flushed_db_idx tracking in _flush_messages_to_session_db()
  so repeated _persist_session() calls only write truly new messages
- Reset flush cursor on compression (new session ID)
- Add skip_db parameter to SessionStore.append_to_transcript() so the
  gateway skips SQLite writes when the agent already persisted them
- Gateway now passes skip_db=True for agent-managed messages, still
  writes to JSONL as backup

Verified: a 12-message CLI session with tool calls produces exactly
12 SQLite rows with zero duplicates (previously would be 36-48).

Tests: 9 new tests covering flush deduplication, skip_db behavior,
compression reset, and initialization. Full suite passes (2869 tests).
2026-03-10 15:22:44 -07:00
SHL0MS
0229e6b407 Fix test_analysis_error_logs_exc_info: mock _aux_async_client so download path is reached 2026-03-10 16:03:19 -04:00
0xbyt4
52e3580cd4 refactor: merge new tests into test_code_execution.py
Move all new tests (schema, env filtering, edge cases, interrupt) into
the existing test_code_execution.py instead of a separate file.
Delete the now-redundant test_code_execution_schema.py.
2026-03-10 06:18:27 -07:00
0xbyt4
694a3ebdd5 fix(code_execution): handle empty enabled_sandbox_tools in schema description
build_execute_code_schema(set()) produced "from hermes_tools import , ..."
in the code property description — invalid Python syntax shown to the model.

This triggers when a user enables only the code_execution toolset without
any of the sandbox-allowed tools (e.g. `hermes tools code_execution`),
because SANDBOX_ALLOWED_TOOLS & {"execute_code"} = empty set.

Also adds 29 unit tests covering build_execute_code_schema, environment
variable filtering, execute_code edge cases, and interrupt handling.
2026-03-10 06:18:27 -07:00
teknium1
5e6c7bc205 Merge PR #602: fix: prevent data loss in clipboard PNG conversion when ImageMagick fails
Authored by 0xbyt4. Only deletes temp .bmp after confirmed successful conversion, restores original on failure. Adds 3 tests.
2026-03-10 04:15:05 -07:00
teknium1
c1775de56f feat: filesystem checkpoints and /rollback command
Automatic filesystem snapshots before destructive file operations,
with user-facing rollback.  Inspired by PR #559 (by @alireza78a).

Architecture:
- Shadow git repos at ~/.hermes/checkpoints/{hash}/ via GIT_DIR
- CheckpointManager: take/list/restore, turn-scoped dedup, pruning
- Transparent — the LLM never sees it, no tool schema, no tokens
- Once per turn — only first write_file/patch triggers a snapshot

Integration:
- Config: checkpoints.enabled + checkpoints.max_snapshots
- CLI flag: hermes --checkpoints
- Trigger: run_agent.py _execute_tool_calls() before write_file/patch
- /rollback slash command in CLI + gateway (list, restore by number)
- Pre-rollback snapshot auto-created on restore (undo the undo)

Safety:
- Never blocks file operations — all errors silently logged
- Skips root dir, home dir, dirs >50K files
- Disables gracefully when git not installed
- Shadow repo completely isolated from project git

Tests: 35 new tests, all passing (2798 total suite)
Docs: feature page, config reference, CLI commands reference
2026-03-10 00:49:15 -07:00
teknium1
5212644861 fix(security): prevent shell injection in tilde-username path expansion
Validate that the username portion of ~username paths contains only
valid characters (alphanumeric, dot, hyphen, underscore) before passing
to shell echo for expansion. Previously, paths like '~; rm -rf /'
would be passed unquoted to self._exec(f'echo {path}'), allowing
arbitrary command execution.

The approach validates the username rather than using shlex.quote(),
which would prevent tilde expansion from working at all since
echo '~user' outputs the literal string instead of expanding it.

Added tests for injection blocking and valid ~username/path expansion.

Credit to @alireza78a for reporting (PR #442, issue #442).
2026-03-09 17:33:19 -07:00
0xbyt4
4e3a8a0637 fix: handle empty choices in MCP sampling callback
SamplingHandler.__call__ accessed response.choices[0] without checking
if the list was non-empty. LLM APIs can return empty choices on content
filtering, provider errors, or rate limits, causing an unhandled
IndexError that propagates to the MCP SDK and may crash the connection.

Add a defensive guard that returns a proper ErrorData when choices is
empty, None, or missing. Includes three test cases covering all
variants.
2026-03-10 02:24:53 +03:00
teknium1
2d44ed1c5b test: add comprehensive tests for vision_tools (42 tests)
Covers PR #428 changes and existing vision_tools functionality:
- _validate_image_url: 20 tests for urlparse-based validation
- _determine_mime_type: 6 tests for MIME type detection
- _image_to_base64_data_url: 3 tests for base64 conversion
- _handle_vision_analyze: 5 tests for type hints, prompt building,
  AUXILIARY_VISION_MODEL env var override
- Error logging exc_info: 3 async tests verifying stack traces are
  logged on download failure, analysis error, and cleanup error
- check_vision_requirements & get_debug_session_info: 2 basic tests
- Registry integration: 3 tests for tool registration
2026-03-09 15:32:02 -07:00
Teknium
654e16187e
feat(mcp): add sampling support — server-initiated LLM requests (#753)
Add MCP sampling/createMessage capability via SamplingHandler class.

Text-only sampling + tool use in sampling with governance (rate limits,
model whitelist, token caps, tool loop limits). Per-server audit metrics.

Based on concept from PR #366 by eren-karakus0. Restructured as class-based
design with bug fixes and tests using real MCP SDK types.

50 new tests, 2600 total passing.
2026-03-09 03:37:38 -07:00
0xbyt4
912efe11b5 fix(tests): add content attribute to fake result objects
_FakeReadResult and _FakeSearchResult now expose the attributes
that read_file_tool/search_tool access after the redact_sensitive_text
integration from main.
2026-03-09 13:25:52 +03:00
0xbyt4
4684aaffdc merge: resolve file_tools.py conflict with origin/main
Combine read/search loop detection with main's redact_sensitive_text
and truncation hint features. Add tracker reset to TestSearchHints
to prevent cross-test state leakage.
2026-03-09 13:21:46 +03:00
teknium1
7af33accf1 fix: apply secret redaction to file tool outputs
Terminal output was already redacted via redact_sensitive_text() but
read_file and search_files returned raw content. Now both tools
redact secrets before returning results to the LLM.

Based on PR #372 by @teyrebaz33 (closes #363) — applied manually
due to branch conflicts with the current codebase.
2026-03-09 00:49:46 -07:00
teknium1
a8bf414f4a feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.

## New tool: browser_console

Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.

## Enhanced tool: browser_vision(annotate=True)

New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.

## Config: browser.record_sessions

Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default

## Built-in skill: dogfood

Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
   (Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence

Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template

## Tests

21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation

Addresses #315.
2026-03-08 21:28:12 -07:00
0xbyt4
d8df91dfa8 fix: resolve merge conflict with main in clipboard.py 2026-03-09 03:50:29 +03:00
teknium1
491605cfea feat: add high-value tool result hints for patch and search_files (#722)
Add contextual [Hint: ...] suffixes to tool results where they save
real iterations:

- patch (no match): suggests read_file/search_files to verify content
  before retrying — addresses the common pattern where the agent retries
  with stale old_string instead of re-reading the file.
- search_files (truncated): provides explicit next offset and suggests
  narrowing the search — clearer than relying on total_count inference.

Other hints proposed in #722 (terminal, web_search, web_extract,
browser_snapshot, search zero-results, search content-matches) were
evaluated and found to be low-value: either already covered by existing
mechanisms (read_file pagination, similar-files, schema descriptions)
or guidance the agent already follows from its own reasoning.

5 new tests covering hint presence/absence for both tools.
2026-03-08 17:46:28 -07:00
teknium1
c0520223fd fix: clipboard BMP conversion file loss and broken test
Source code (hermes_cli/clipboard.py):
- _convert_to_png() lost the file when both Pillow and ImageMagick were
  unavailable: path.rename(tmp) moved the file to .bmp, then subprocess.run
  raised FileNotFoundError, but the file was never renamed back. The final
  fallback 'return path.exists()' returned False.
- Fix: restore the original file in both except handlers by renaming tmp
  back to path when the original is missing.

Test (tests/tools/test_clipboard.py):
- test_file_still_usable_when_no_converter expected 'from PIL import Image'
  to raise an Exception, but Pillow is installed so pytest.raises fired
  'DID NOT RAISE'. The test also never called _convert_to_png().
- Fix: properly mock PIL unavailability via patch.dict(sys.modules),
  actually call _convert_to_png(), and assert the correct result.
2026-03-08 17:22:27 -07:00
teknium1
3fb8938cd3 fix: search_files now reports error for non-existent paths instead of silent empty results
Previously, search_files would silently return 0 results when the
search path didn't exist (e.g., /root/.hermes/... when HOME is
/home/user). The path was passed to rg/grep/find which would fail
silently, and the empty stdout was parsed as 'no matches found'.

Changes:
- Add path existence check at the top of search() using test -e.
  Returns SearchResult with a clear error message when path doesn't exist.
- Add exit code 2 checks in _search_with_rg() and _search_with_grep()
  as secondary safety net for other error types (bad regex, permissions).
- Add 4 new tests covering: nonexistent path (content mode), nonexistent
  path (files mode), existing path proceeds normally, rg error exit code.

Tests: 37 → 41 in test_file_operations.py, full suite 2330 passed.
2026-03-08 16:47:20 -07:00
dmahan93
7791174ced feat: add --fuck-it-ship-it flag to bypass dangerous command approvals
Adds a fun alias for skipping all dangerous command approval prompts.
When passed, sets HERMES_YOLO_MODE=1 which causes check_dangerous_command()
to auto-approve everything.

Available on both top-level and chat subcommand:
  hermes --fuck-it-ship-it
  hermes chat --fuck-it-ship-it

Includes 5 tests covering normal blocking, yolo bypass, all patterns,
and edge cases (empty string env var).
2026-03-08 18:36:37 -05:00