The security scanner is meant to protect against hostile external skills
pulled from GitHub via hermes skills install — trusted/community policies
block or ask on dangerous verdicts accordingly. But agent-created skills
(from skill_manage) run in the same process as the agent that wrote them.
The agent can already execute the same code paths via terminal() with no
gate, so the ask-on-dangerous policy adds friction without meaningful
security.
Concrete trigger: an agent writing a PR-review skill that describes
cache-busting or persistence semantics in prose gets blocked because
those words appear in the patterns list. The skill isn't actually doing
anything dangerous — it's just documenting what reviewers should watch
for in other PRs.
Change: agent-created dangerous verdict maps to 'allow' instead of 'ask'.
External hub installs (trusted/community) keep their stricter policies
intact. Tests updated: renamed test_dangerous_agent_created_asks →
test_dangerous_agent_created_allowed; renamed force-override test and
updated assertion since force is now a no-op for agent-created (the allow
branch returns first).
The environment-snapshot login shell was auto-sourcing only ~/.bashrc when
building the PATH snapshot. On Debian/Ubuntu the default ~/.bashrc starts
with a non-interactive short-circuit:
case $- in *i*) ;; *) return;; esac
Sourcing it from a non-interactive shell returns before any PATH export
below that guard runs. Node version managers like n and nvm append their
PATH line under that guard, so Hermes was capturing a PATH without
~/n/bin — and the terminal tool saw 'node: command not found' even when
node was on the user's interactive shell PATH.
Expand the auto-source list (when auto_source_bashrc is on) to:
~/.profile → ~/.bash_profile → ~/.bashrc
~/.profile and ~/.bash_profile have no interactivity guard — installers
that write their PATH there (n's n-install, nvm's curl installer on most
setups) take effect. ~/.bashrc still runs last to preserve behaviour for
users who put PATH logic there without the guard.
Added two tests covering the new behaviour plus an E2E test that spins up
a real LocalEnvironment with a guard-prefixed ~/.bashrc and a ~/.profile
PATH export, and verifies the captured snapshot PATH contains the profile
entry.
When a newly-bundled skill's name collides with a pre-existing user
skill, sync silently kept the user's copy. Users never learned that
a bundled version shipped by that name.
Now (on non-quiet sync only) print:
⚠ <name>: bundled version shipped but you already have a local
skill by this name — yours was kept. Run `hermes skills reset
<name>` to replace it with the bundled version.
No behavior change to manifest writes or to the kept user copy —
purely additive warning on the existing collision-skip path.
When a new bundled skill's name collided with a pre-existing user skill
(from hub, custom, or leftover), sync_skills() recorded the bundled hash
in the manifest even though the on-disk copy was unrelated to bundled.
On the next sync, user_hash != origin_hash (bundled_hash) marked the
skill as "user-modified" permanently, blocking all bundled updates for
that skill until the user ran `hermes skills reset`.
Fix: only baseline the manifest entry when the user's on-disk copy is
byte-identical to bundled (safe to track — this is the reset re-sync or
coincidentally-identical install case). Otherwise skip the manifest
write entirely: the on-disk skill is unrelated to bundled and shouldn't
be tracked as if it were.
This preserves reset_bundled_skill()'s re-baseline flow (its post-delete
sync still writes to the manifest when user copy matches bundled) while
fixing the poisoning scenario for genuinely unrelated collisions.
Adds two tests following the existing test_failed_copy_does_not_poison_manifest
pattern: one verifying the manifest stays clean after a collision with
differing content, one verifying no false user_modified flag on resync.
When _resolve_tirith_path() returns None (e.g. install failed on
unsupported platform or all resolution paths exhausted), the function
passed None directly to subprocess.run(), causing a TypeError instead
of respecting the fail_open config.
Add a None check before the subprocess call that allows or blocks
according to the configured fail_open policy, matching the existing
error handling behavior for OSError and TimeoutExpired.
When override_acp_command was passed to _build_child_agent, it failed to
override effective_provider to 'copilot-acp' and effective_api_mode to
'chat_completions'. This caused the child AIAgent to inherit the parent's
native API configuration (e.g. Anthropic) and attempt real HTTP requests
using the parent's API key, leading to HTTP 401 errors and completely
bypassing the ACP subprocess.
Ensure that if an ACP command override is provided, the child agent
correctly routes through CopilotACPClient.
Refs #2653
Adds MiniMax-AI/cli to the default taps list so the mmx-cli skill
is discoverable and installable out of the box via /skills browse
and /skills install. The skill definition lives upstream at
github.com/MiniMax-AI/cli/skill/SKILL.md, keeping updates decoupled.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace hardcoded 'fr' default with DEFAULT_LOCAL_STT_LANGUAGE ('en')
— removes locale leak, matches other providers
- Drop redundant default=True on is_truthy_value (dict .get already defaults)
- Update auto-detect comment to include 'xai' in the chain
- Fix docstring: 21 languages (match PR body + actual xAI API)
- Update test_sends_language_and_format to set HERMES_LOCAL_STT_LANGUAGE=fr
explicitly, since default is no longer 'fr'
All 18 xAI STT tests pass locally.
In _detect_target(), platform.system() returns "Android" on Termux,
not "Linux". Without this change tirith's auto-installer skips
Android even though the Linux GNU binaries are ABI-compatible.
When an MCP server config has ssl_verify: false (e.g. local dev with
a self-signed cert), the setting was read from config.yaml but never
passed to the httpx client, causing CERTIFICATE_VERIFY_FAILED errors
and silent connection failures.
Fix: read ssl_verify from config and pass it as the 'verify' kwarg to
both code paths:
- New API (mcp >= 1.24.0): httpx.AsyncClient(verify=ssl_verify)
- Legacy API (mcp < 1.24.0): streamablehttp_client(..., verify=ssl_verify)
Fixes local dev setups using ServBay, LocalWP, MAMP, or any stack with
a self-signed TLS certificate.
On Windows, Path.open() defaults to the system ANSI code page (cp1252).
If the .env file contains UTF-8 characters, decoding fails with
'gbk codec can't decode byte 0x94'. Specify encoding='utf-8'
explicitly to ensure consistent behavior across platforms.
The Docker terminal backend runs containers with `--cap-drop ALL`
and re-adds only DAC_OVERRIDE, CHOWN, FOWNER. Since commit fee0e0d3
("run as non-root user, use virtualenv") the image entrypoint drops
from root to the `hermes` user via `gosu`, which requires CAP_SETUID
and CAP_SETGID. Without them every sandbox container exits
immediately with:
Dropping root privileges
error: failed switching to 'hermes': operation not permitted
Breaking every terminal/file tool invocation in `terminal.backend: docker`
mode.
Fix: add SETUID and SETGID to the cap-add list. The `no-new-privileges`
security-opt is kept, so gosu still cannot escalate back to root after
the one-way drop — the hardening posture is preserved.
Reproduction
------------
With any image whose ENTRYPOINT calls `gosu <user>`, the container
exits immediately under the pre-fix cap set. Post-fix, the drop
succeeds and the container proceeds normally.
docker run --rm \
--cap-drop ALL \
--cap-add DAC_OVERRIDE --cap-add CHOWN --cap-add FOWNER \
--security-opt no-new-privileges \
--entrypoint /usr/local/bin/gosu \
hermes-claude:latest hermes id
# -> error: failed switching to 'hermes': operation not permitted
# Same command with SETUID+SETGID added:
# -> uid=10000(hermes) gid=10000(hermes) groups=10000(hermes)
Tests
-----
Added `test_security_args_include_setuid_setgid_for_gosu_drop` that
asserts both caps are present and the overall hardening posture
(cap-drop ALL + no-new-privileges) is preserved.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LocalEnvironment._run_bash() spawned subprocess.Popen without a cwd
argument, so init_session()'s pwd -P ran in the gateway process's
startup directory and overwrote self.cwd. Pass cwd=self.cwd so the
initial snapshot captures the user-configured working directory.
Tested:
- pytest tests/ -q (255 env-related tests passed)
- Full suite: 13,537 passed; 70 pre-existing failures unrelated to local env
The container_config builder in terminal_tool.py was missing
docker_forward_env and docker_env keys, causing config.yaml's
docker_forward_env setting to be silently ignored. Environment
variables listed in docker_forward_env were never injected into
Docker containers.
This fix adds both keys to the container_config dict so they are
properly passed to _create_environment().
Follow-up on helix4u's PR #14211:
- Flip default to true: narrowing toolsets=['web','browser'] expresses
'I want these extras', not 'silently strip MCP'. Parent MCP tools
(registered at runtime) should survive narrowing by default.
- Drop _config_version bump (22->23); additive nested key under
delegation.* is handled by _deep_merge, no migration needed.
- Update tests to reflect new default behavior.
browser_cdp_tool.py registers before browser_tool.py (alphabetical
import order), so its stricter check_fn (requires CDP endpoint) becomes
the toolset-level check for all 11 browser tools. This causes
'hermes doctor' to report the entire browser toolset as unavailable
even when agent-browser is correctly installed.
Move browser_cdp to toolset='browser-cdp' so it is evaluated
independently. browser_navigate et al. only need agent-browser;
browser_cdp additionally requires a reachable CDP endpoint.
Version managers like frum (Ruby), rvm, nvm, and others commonly alias
cd to a wrapper function that runs additional logic after directory
changes. When Hermes captures the shell environment into a session
snapshot, these aliases are preserved. If the wrapper function fails
in the subprocess context (e.g. frum not on PATH), every cd fails,
causing all terminal commands to exit with code 126.
Using builtin cd bypasses any aliases or functions, ensuring the
directory change always uses the real bash builtin regardless of
what version managers are installed.
Make Tavily client respect a TAVILY_BASE_URL environment variable,
defaulting to https://api.tavily.com for backward compatibility.
Consistent with FIRECRAWL_API_URL pattern already used in this module.
The code execution sandbox creates a Unix domain socket in /tmp with
default permissions, allowing any local user to connect and execute
tool calls. Restrict to 0o600 after bind.
Closes#6230
Upgrades agent-browser from 0.13.0 to 0.26.0, picking up 13 releases of
daemon reliability fixes:
- Daemon hang on Linux from waitpid(-1) race in SIGCHLD handler (#1098)
- Chrome killed after ~10s idle due to PR_SET_PDEATHSIG thread tracking (#1157)
- Orphaned Chrome processes via process-group kill on shutdown (#1137)
- Stale daemon after upgrade via .version sidecar and auto-restart (#1134)
- Idle timeout not firing (sleep future recreated each loop) (#1110)
- Navigation hanging on lifecycle events that never fire (#1059, #1092)
- CDP attach hang on Chrome 144+ (#1133)
- Windows daemon TCP bind with Hyper-V port conflicts (#1041)
- Shadow DOM traversal in accessibility tree snapshots
- doctor command for user self-diagnosis
Also wires AGENT_BROWSER_IDLE_TIMEOUT_MS into the browser subprocess
environment so the daemon self-terminates after our configured inactivity
timeout (default 300s). This is the daemon-side counterpart to the
Python-side inactivity reaper — the daemon kills itself and its Chrome
children when no commands arrive, preventing orphan accumulation even
when the Python process dies without running atexit handlers.
Addresses #7343 (daemon socket hangs, shadow DOM) and #13793 (orphan
accumulation from force-killed sessions).
Adds security.allow_private_urls / HERMES_ALLOW_PRIVATE_URLS toggle so
users on OpenWrt routers, TUN-mode proxies (Clash/Mihomo/Sing-box),
corporate split-tunnel VPNs, and Tailscale networks — where DNS resolves
public domains to 198.18.0.0/15 or 100.64.0.0/10 — can use web_extract,
browser, vision URL fetching, and gateway media downloads.
Single toggle in tools/url_safety.py; all 23 is_safe_url() call sites
inherit automatically. Cached for process lifetime.
Cloud metadata endpoints stay ALWAYS blocked regardless of the toggle:
169.254.169.254 (AWS/GCP/Azure/DO/Oracle), 169.254.170.2 (AWS ECS task
IAM creds), 169.254.169.253 (Azure IMDS wire server), 100.100.100.200
(Alibaba), fd00:ec2::254 (AWS IPv6), the entire 169.254.0.0/16
link-local range, and the metadata.google.internal / metadata.goog
hostnames (checked pre-DNS so they can't be bypassed on networks where
those names resolve to local IPs).
Supersedes #3779 (narrower HERMES_ALLOW_RFC2544 for the same class of
users).
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
- delegate_task: use shared tool_error() for the paused-spawn early return
so the error envelope matches the rest of the tool.
- Disk snapshot label: treat orphaned nodes (parentId missing from the
snapshot) as top-level, matching buildSubagentTree / summarizeLabel.
Four real issues Copilot flagged:
1. delegate_tool: `_build_child_agent` never passed `toolsets` to the
progress callback, so the event payload's `toolsets` field (wired
through every layer) was always empty and the overlay's toolsets
row never populated. Thread `child_toolsets` through.
2. event handler: the race-protection on subagent.spawn_requested /
subagent.start only preserved `completed`, so a late-arriving queued
event could clobber `failed` / `interrupted` too. Preserve any
terminal status (`completed | failed | interrupted`).
3. SpawnHud: comment claimed concurrency was approximated by "widest
level in the tree" but code used `totals.activeCount` (total across
all parents). `max_concurrent_children` is a per-parent cap, so
activeCount over-warns for multi-orchestrator runs. Switch to
`max(widthByDepth(tree))`; the label now reads `⚡W/cap+extra` where
W is the widest level (drives the ratio) and `+extra` is the rest.
4. spawn_tree.list: comment said "peek header without parsing full list"
but the code json.loads()'d every snapshot. Adds a per-session
`_index.jsonl` sidecar written on save; list() reads only the index
(with a full-scan fallback for pre-index sessions). O(1) per
snapshot now vs O(file-size).
Adds a live + post-hoc audit surface for recursive delegate_task fan-out.
None of cc/oc/oclaw tackle nested subagent trees inside an Ink overlay;
this ships a view-switched dashboard that handles arbitrary depth + width.
Python
- delegate_tool: every subagent event now carries subagent_id, parent_id,
depth, model, tool_count; subagent.complete also ships input/output/
reasoning tokens, cost, api_calls, files_read/files_written, and a
tail of tool-call outputs
- delegate_tool: new subagent.spawn_requested event + _active_subagents
registry so the overlay can kill a branch by id and pause new spawns
- tui_gateway: new RPCs delegation.status, delegation.pause,
subagent.interrupt, spawn_tree.save/list/load (disk under
\$HERMES_HOME/spawn-trees/<session>/<ts>.json)
TUI
- /agents overlay: full-width list mode (gantt strip + row picker) and
Enter-to-drill full-width scrollable detail mode; inverse+amber
selection, heat-coloured branch markers, wall-clock gantt with tick
ruler, per-branch rollups
- Detail pane: collapsible accordions (Budget, Files, Tool calls, Output,
Progress, Summary); open-state persists across agents + mode switches
via a shared atom
- /replay [N|last|list|load <path>] for in-memory + disk history;
/replay-diff <a> <b> for side-by-side tree comparison
- Status-bar SpawnHud warns as depth/concurrency approaches caps;
overlay auto-follows the just-finished turn onto history[1]
- Theme: bump DARK dim #B8860B → #CC9B1F for readable secondary text
globally; keep LIGHT untouched
Tests: +29 new subagentTree unit tests; 215/215 passing.
* feat(plugins): pluggable image_gen backends + OpenAI provider
Adds a ImageGenProvider ABC so image generation backends register as
bundled plugins under `plugins/image_gen/<name>/`. The plugin scanner
gains three primitives to make this work generically:
- `kind:` manifest field (`standalone` | `backend` | `exclusive`).
Bundled `kind: backend` plugins auto-load — no `plugins.enabled`
incantation. User-installed backends stay opt-in.
- Path-derived keys: `plugins/image_gen/openai/` gets key
`image_gen/openai`, so a future `tts/openai` cannot collide.
- Depth-2 recursion into category namespaces (parent dirs without a
`plugin.yaml` of their own).
Includes `OpenAIImageGenProvider` as the first consumer (gpt-image-1.5
default, plus gpt-image-1, gpt-image-1-mini, DALL-E 3/2). Base64
responses save to `$HERMES_HOME/cache/images/`; URL responses pass
through.
FAL stays in-tree for this PR — a follow-up ports it into
`plugins/image_gen/fal/` so the in-tree `image_generation_tool.py`
slims down. The dispatch shim in `_handle_image_generate` only fires
when `image_gen.provider` is explicitly set to a non-FAL value, so
existing FAL setups are untouched.
- 41 unit tests (scanner recursion, kind parsing, gate logic,
registry, OpenAI payload shapes)
- E2E smoke verified: bundled plugin autoloads, registers, and
`_handle_image_generate` routes to OpenAI when configured
* fix(image_gen/openai): don't send response_format to gpt-image-*
The live API rejects it: 'Unknown parameter: response_format'
(verified 2026-04-21 with gpt-image-1.5). gpt-image-* models return
b64_json unconditionally, so the parameter was both unnecessary and
actively broken.
* feat(image_gen/openai): gpt-image-2 only, drop legacy catalog
gpt-image-2 is the latest/best OpenAI image model (released 2026-04-21)
and there's no reason to expose the older gpt-image-1.5 / gpt-image-1 /
dall-e-3 / dall-e-2 alongside it — slower, lower quality, or awkward
(dall-e-2 squares only). Trim the catalog down to a single model.
Live-verified end-to-end: landscape 1536x1024 render of a Moog-style
synth matches prompt exactly, 2.4MB PNG saved to cache.
* feat(image_gen/openai): expose gpt-image-2 as three quality tiers
Users pick speed/fidelity via the normal model picker instead of a
hidden quality knob. All three tier IDs resolve to the single underlying
gpt-image-2 API model with a different quality parameter:
gpt-image-2-low ~15s fast iteration
gpt-image-2-medium ~40s default
gpt-image-2-high ~2min highest fidelity
Live-measured on OpenAI's API today: 15.4s / 40.8s / 116.9s for the
same 1024x1024 prompt.
Config:
image_gen.openai.model: gpt-image-2-high
# or
image_gen.model: gpt-image-2-low
# or env var for scripts/tests
OPENAI_IMAGE_MODEL=gpt-image-2-medium
Live-verified end-to-end with the low tier: 18.8s landscape render of a
golden retriever in wildflowers, vision-confirmed exact match.
* feat(tools_config): plugin image_gen providers inject themselves into picker
'hermes tools' → Image Generation now shows plugin-registered backends
alongside Nous Subscription and FAL.ai without tools_config.py needing
to know about them. OpenAI appears as a third option today; future
backends appear automatically as they're added.
Mechanism:
- ImageGenProvider gains an optional get_setup_schema() hook
(name, badge, tag, env_vars). Default derived from display_name.
- tools_config._plugin_image_gen_providers() pulls the schemas from
every registered non-FAL plugin provider.
- _visible_providers() appends those rows when rendering the Image
Generation category.
- _configure_provider() handles the new image_gen_plugin_name marker:
writes image_gen.provider and routes to the plugin's list_models()
catalog for the model picker.
- _toolset_needs_configuration_prompt('image_gen') stops demanding a
FAL key when any plugin provider reports is_available().
FAL is skipped in the plugin path because it already has hardcoded
TOOL_CATEGORIES rows — when it gets ported to a plugin in a follow-up
PR the hardcoded rows go away and it surfaces through the same path
as OpenAI.
Verified live: picker shows Nous Subscription / FAL.ai / OpenAI.
Picking OpenAI prompts for OPENAI_API_KEY, then shows the
gpt-image-2-low/medium/high model picker sourced from the plugin.
397 tests pass across plugins/, tools_config, registry, and picker.
* fix(image_gen): close final gaps for plugin-backend parity with FAL
Two small places that still hardcoded FAL:
- hermes_cli/setup.py status line: an OpenAI-only setup showed
'Image Generation: missing FAL_KEY'. Now probes plugin providers
and reports '(OpenAI)' when one is_available() — or falls back to
'missing FAL_KEY or OPENAI_API_KEY' if nothing is configured.
- image_generate tool schema description: said 'using FAL.ai, default
FLUX 2 Klein 9B'. Rewrote provider-neutral — 'backend and model are
user-configured' — and notes the 'image' field can be a URL or an
absolute path, which the gateway delivers either way via
extract_local_files().
- Wrap child.run_conversation() in a ThreadPoolExecutor with configurable
timeout (delegation.child_timeout_seconds, default 300s) to prevent
indefinite blocking when a subagent's API call or tool HTTP request hangs.
- Add heartbeat stale detection: if a child's api_call_count doesn't
advance for 5 consecutive heartbeat cycles (~2.5 min), stop touching
the parent's activity timestamp so the gateway inactivity timeout
can fire as a last resort.
- Add 'timeout' as a new exit_reason/status alongside the existing
completed/max_iterations/interrupted states.
- Use shutdown(wait=False) on the timeout executor to avoid the
ThreadPoolExecutor.__exit__ deadlock when a child is stuck on
blocking I/O.
Closes#13768
A single global MAX_TEXT_LENGTH = 4000 truncated every TTS provider at
4000 chars, causing long inputs to be silently chopped even though the
underlying APIs allow much more:
- OpenAI: 4096
- xAI: 15000
- MiniMax: 10000
- ElevenLabs: 5000 / 10000 / 30000 / 40000 (model-aware)
- Gemini: ~5000
- Edge: ~5000
The schema description also told the model 'Keep under 4000 characters',
which encouraged the agent to self-chunk long briefs into multiple TTS
calls (producing 3 separate audio files instead of one).
New behavior:
- PROVIDER_MAX_TEXT_LENGTH table + ELEVENLABS_MODEL_MAX_TEXT_LENGTH
encode the documented per-provider limits.
- _resolve_max_text_length(provider, cfg) resolves:
1. tts.<provider>.max_text_length user override
2. ElevenLabs model_id lookup
3. provider default
4. 4000 fallback
- text_to_speech_tool() and stream_tts_to_speaker() both call the
resolver; old MAX_TEXT_LENGTH alias kept for back-compat.
- Schema description no longer hardcodes 4000.
Tests: 27 new unit + E2E tests; all 53 existing TTS tests and 253
voice-command/voice-cli tests still pass.
* feat(models): hide OpenRouter models that don't advertise tool support
Port from Kilo-Org/kilocode#9068.
hermes-agent is tool-calling-first — every provider path assumes the
model can invoke tools. Models whose OpenRouter supported_parameters
doesn't include 'tools' (e.g. image-only or completion-only models)
cannot be driven by the agent loop and fail at the first tool call.
Filter them out of fetch_openrouter_models() so they never appear in
the model picker (`hermes model`, setup wizard, /model slash command).
Permissive when the field is missing — OpenRouter-compatible gateways
(Nous Portal, private mirrors, older snapshots) don't always populate
supported_parameters. Treat missing as 'unknown → allow' rather than
silently emptying the picker on those gateways. Only hide models
whose supported_parameters is an explicit list that omits tools.
Tests cover: tools present → kept, tools absent → dropped, field
missing → kept, malformed non-list → kept, non-dict item → kept,
empty list → dropped.
* feat(delegate): cross-agent file state coordination for concurrent subagents
Prevents mangled edits when concurrent subagents touch the same file
(same process, same filesystem — the mangle scenario from #11215).
Three layers, all opt-out via HERMES_DISABLE_FILE_STATE_GUARD=1:
1. FileStateRegistry (tools/file_state.py) — process-wide singleton
tracking per-agent read stamps and the last writer globally.
check_stale() names the sibling subagent in the warning when a
non-owning agent wrote after this agent's last read.
2. Per-path threading.Lock wrapped around the read-modify-write
region in write_file_tool and patch_tool. Concurrent siblings on
the same path serialize; different paths stay fully parallel.
V4A multi-file patches lock in sorted path order (deadlock-free).
3. Delegate-completion reminder in tools/delegate_tool.py: after a
subagent returns, writes_since(parent, child_start, parent_reads)
appends '[NOTE: subagent modified files the parent previously
read — re-read before editing: ...]' to entry.summary when the
child touched anything the parent had already seen.
Complements (does not replace) the existing path-overlap check in
run_agent._should_parallelize_tool_batch — batch check prevents
same-file parallel dispatch within one agent's turn (cheap prevention,
zero API cost), registry catches cross-subagent and cross-turn
staleness at write time (detection).
Behavior is warning-only, not hard-failing — matches existing project
style. Errors surface naturally: sibling writes often invalidate the
old_string in patch operations, which already errors cleanly.
Tests: tests/tools/test_file_state_registry.py — 16 tests covering
registry state transitions, per-path locking, per-path-not-global
locking, writes_since filtering, kill switch, and end-to-end
integration through the real read_file/write_file/patch handlers.
Adds role='leaf'|'orchestrator' to delegate_task. With max_spawn_depth>=2,
an orchestrator child retains the 'delegation' toolset and can spawn its
own workers; leaf children cannot delegate further (identical to today).
Default posture is flat — max_spawn_depth=1 means a depth-0 parent's
children land at the depth-1 floor and orchestrator role silently
degrades to leaf. Users opt into nested delegation by raising
max_spawn_depth to 2 or 3 in config.yaml.
Also threads acp_command/acp_args through the main agent loop's delegate
dispatch (previously silently dropped in the schema) via a new
_dispatch_delegate_task helper, and adds a DelegateEvent enum with
legacy-string back-compat for gateway/ACP/CLI progress consumers.
Config (hermes_cli/config.py defaults):
delegation.max_concurrent_children: 3 # floor-only, no upper cap
delegation.max_spawn_depth: 1 # 1=flat (default), 2-3 unlock nested
delegation.orchestrator_enabled: true # global kill switch
Salvaged from @pefontana's PR #11215. Overrides vs. the original PR:
concurrency stays at 3 (PR bumped to 5 + cap 8 — we keep the floor only,
no hard ceiling); max_spawn_depth defaults to 1 (PR defaulted to 2 which
silently enabled one level of orchestration for every user).
Co-authored-by: pefontana <fontana.pedro93@gmail.com>
Adds OpenAI's new GPT Image 2 model via FAL.ai, selectable through
`hermes tools` → Image Generation. SOTA text rendering (including CJK)
and world-aware photorealism.
- FAL_MODELS entry with image_size_preset style
- 4:3 presets on all aspect ratios — 16:9 (1024x576) falls below
GPT-Image-2's 655,360 min-pixel floor and would be rejected
- quality pinned to medium (same rule as gpt-image-1.5) for
predictable Nous Portal billing
- BYOK (openai_api_key) deliberately omitted from supports so all
users stay on shared FAL billing
- 6 new tests covering preset mapping, quality pinning, and
supports-whitelist integrity
- Docs table + aspect-ratio map updated
Live-tested end-to-end: 39.9s cold request, clean 1024x768 PNG
Two related ACP approval issues:
GHSA-96vc-wcxf-jjff — ACP's _run_agent never set HERMES_INTERACTIVE
(or any other flag recognized by tools.approval), so check_all_command_guards
took the non-interactive auto-approve path and never consulted the
ACP-supplied approval callback (conn.request_permission). Dangerous
commands executed in ACP sessions without operator approval despite
the callback being installed. Fix: set HERMES_INTERACTIVE=1 around
the agent run so check_all_command_guards routes through
prompt_dangerous_approval(approval_callback=...) — the correct shape
for ACP's per-session request_permission call. HERMES_EXEC_ASK would
have routed through the gateway-queue path instead, which requires a
notify_cb registered in _gateway_notify_cbs (not applicable to ACP).
GHSA-qg5c-hvr5-hjgr — _approval_callback and _sudo_password_callback
were module-level globals in terminal_tool. Concurrent ACP sessions
running in ThreadPoolExecutor threads each installed their own callback
into the same slot, racing. Fix: store both callbacks in threading.local()
so each thread has its own slot. CLI mode (single thread) is unaffected;
gateway mode uses a separate queue-based approval path and was never
touched.
set_approval_callback is now called INSIDE _run_agent (the executor
thread) rather than before dispatching — so the TLS write lands on the
correct thread.
Tests: 5 new in tests/acp/test_approval_isolation.py covering
thread-local isolation of both callbacks and the HERMES_INTERACTIVE
callback routing. Existing tests/acp/ (159 tests) and tests/tools/
approval-related tests continue to pass.
Fixes GHSA-96vc-wcxf-jjff
Fixes GHSA-qg5c-hvr5-hjgr
A skill declaring `required_environment_variables: [ANTHROPIC_TOKEN]` in
its SKILL.md frontmatter silently bypassed the `execute_code` sandbox's
credential-scrubbing guarantee. `register_env_passthrough` had no
blocklist, so any name a skill chose flipped `is_env_passthrough(name) =>
True`, which shortcircuits the sandbox's secret filter.
Fix: reject registration when the name appears in
`_HERMES_PROVIDER_ENV_BLOCKLIST` (the canonical list of Hermes-managed
credentials — provider keys, gateway tokens, etc.). Log a warning naming
GHSA-rhgp-j443-p4rf so operators see the rejection in logs.
Non-Hermes third-party API keys (TENOR_API_KEY for gif-search,
NOTION_TOKEN for notion skills, etc.) remain legitimately registerable —
they were never in the sandbox scrub list in the first place.
Tests: 16 -> 17 passing. Two old tests that documented the bypass
(`test_passthrough_allows_blocklisted_var`, `test_make_run_env_passthrough`)
are rewritten to assert the new fail-closed behavior. New
`test_non_hermes_api_key_still_registerable` locks in that legitimate
third-party keys are unaffected.
Reported in GHSA-rhgp-j443-p4rf by @q1uf3ng. Hardening; not CVE-worthy
on its own per the decision matrix (attacker must already have operator
consent to install a malicious skill).
Fixes#13027
Previously, `_is_skill_disabled()` only checked the explicit `platform`
argument and `os.getenv('HERMES_PLATFORM')`, missing the gateway session
context (`HERMES_SESSION_PLATFORM`). This caused `skill_view()` to expose
skills that were platform-disabled for the active gateway session.
Add `_get_session_platform()` helper that resolves the platform from
`gateway.session_context.get_session_env`, mirroring the logic in
`agent.skill_utils.get_disabled_skill_names()`.
Now the platform resolution follows the same precedence as skill_utils:
1. Explicit `platform` argument
2. `HERMES_PLATFORM` environment variable
3. `HERMES_SESSION_PLATFORM` from gateway session context
Previously the breaker was only cleared when the post-reconnect retry
call itself succeeded (via _reset_server_error at the end of the try
block). If OAuth recovery succeeded but the retry call happened to
fail for a different reason, control fell through to the
needs_reauth path which called _bump_server_error — adding to an
already-tripped count instead of the fresh count the reconnect
justified. With fix#1 in place this would still self-heal on the
next cooldown, but we should not pay a 60s stall when we already
have positive evidence the server is viable.
Move _reset_server_error(server_name) up to immediately after the
reconnect-and-ready-wait block, before the retry_call. The
subsequent retry still goes through _bump_server_error on failure,
so a genuinely broken server re-trips the breaker as normal — but
the retry starts from a clean count (1 after a failure), not a
stale one.
The MCP circuit breaker previously had no path back to the closed
state: once _server_error_counts[srv] reached _CIRCUIT_BREAKER_THRESHOLD
the gate short-circuited every subsequent call, so the only reset
path (on successful call) was unreachable. A single transient
3-failure blip (bad network, server restart, expired token) permanently
disabled every tool on that MCP server for the rest of the agent
session.
Introduce a classic closed/open/half-open state machine:
- Track a per-server breaker-open timestamp in _server_breaker_opened_at
alongside the existing failure count.
- Add _CIRCUIT_BREAKER_COOLDOWN_SEC (60s). Once the count reaches
threshold, calls short-circuit for the cooldown window.
- After the cooldown elapses, the *next* call falls through as a
half-open probe that actually hits the session. Success resets the
breaker via _reset_server_error; failure re-bumps the count via
_bump_server_error, which re-stamps the open timestamp and re-arms
the cooldown.
The error message now includes the live failure count and an
"Auto-retry available in ~Ns" hint so the model knows the breaker
will self-heal rather than giving up on the tool for the whole
session.
Covers tests 1 (half-opens after cooldown) and 2 (reopens on probe
failure); test 3 (cleared on reconnect) still fails pending fix#2.
Follow-up to PR #2504. The original fix covered the two direct FAL_KEY
checks in image_generation_tool but left four other call sites intact,
including the managed-gateway gate where a whitespace-only FAL_KEY
falsely claimed 'user has direct FAL' and *skipped* the Nous managed
gateway fallback entirely.
Introduce fal_key_is_configured() in tools/tool_backend_helpers.py as a
single source of truth (consults os.environ, falls back to .env for
CLI-setup paths) and route every FAL_KEY presence check through it:
- tools/image_generation_tool.py : _resolve_managed_fal_gateway,
image_generate_tool's upfront check, check_fal_api_key
- hermes_cli/nous_subscription.py : direct_fal detection, selected
toolset gating, tools_ready map
- hermes_cli/tools_config.py : image_gen needs-setup check
Verified by extending tests/tools/test_image_generation_env.py and by
E2E exercising whitespace + managed-gateway composition directly.
Treat whitespace-only FAL_KEY the same as unset so users who export
FAL_KEY=" " (or CI that leaves a blank token) get the expected
'not set' error path instead of a confusing downstream fal_client
failure.
Applied to the two direct FAL_KEY checks in image_generation_tool.py:
image_generate_tool's upfront credential check and check_fal_api_key().
Both keep the existing managed-gateway fallback intact.
Adapted the original whitespace/valid tests to pin the managed gateway
to None so the whitespace assertion exercises the direct-key path
rather than silently relying on gateway absence.
Follow-ups on top of @teyrebaz33's cherry-picked commit:
1. New shared helper format_no_match_hint() in fuzzy_match.py with a
startswith('Could not find') gate so the snippet only appends to
genuine no-match errors — not to 'Found N matches' (ambiguous),
'Escape-drift detected', or 'identical strings' errors, which would
all mislead the model.
2. file_tools.patch_tool suppresses the legacy generic '[Hint: old_string
not found...]' string when the rich 'Did you mean?' snippet is
already attached — no more double-hint.
3. Wire the same helper into patch_parser.py (V4A patch mode, both
_validate_operations and _apply_update) and skill_manager_tool.py so
all three fuzzy callers surface the hint consistently.
Tests: 7 new gating tests in TestFormatNoMatchHint cover every error
class (ambiguous, drift, identical, non-zero match count, None error,
no similar content, happy path). 34/34 test_fuzzy_match, 96/96
test_file_tools + test_patch_parser + test_skill_manager_tool pass.
E2E verified across all four scenarios: no-match-with-similar,
no-match-no-similar, ambiguous, success. V4A mode confirmed
end-to-end with a non-matching hunk.
When patch_replace() cannot find old_string in a file, the error message
now includes the closest matching lines from the file with line numbers
and context. This helps the LLM self-correct without a separate read_file
call.
Implements Phase 1 of #536: enhanced patch error feedback with no
architectural changes.
- tools/fuzzy_match.py: new find_closest_lines() using SequenceMatcher
- tools/file_operations.py: attach closest-lines hint to patch errors
- tests/tools/test_fuzzy_match.py: 5 new tests for find_closest_lines
Builds on @AxDSan's PR #2109 to finish the KittenTTS wiring so the
provider behaves like every other TTS backend end to end.
- tools/tts_tool.py: `_check_kittentts_available()` helper and wire
into `check_tts_requirements()`; extend Opus-conversion list to
include kittentts (WAV → Opus for Telegram voice bubbles); point the
missing-package error at `hermes setup tts`.
- hermes_cli/tools_config.py: add KittenTTS entry to the "Text-to-Speech"
toolset picker, with a `kittentts` post_setup hook that auto-installs
the wheel + soundfile via pip.
- hermes_cli/setup.py: `_install_kittentts_deps()`, new choice + install
flow in `_setup_tts_provider()`, provider_labels entry, and status row
in the `hermes setup` summary.
- website/docs/user-guide/features/tts.md: add KittenTTS to the provider
table, config example, ffmpeg note, and the zero-config voice-bubble tip.
- tests/tools/test_tts_kittentts.py: 10 unit tests covering generation,
model caching, config passthrough, ffmpeg conversion, availability
detection, and the missing-package dispatcher branch.
E2E verified against the real `kittentts` wheel:
- WAV direct output (pcm_s16le, 24kHz mono)
- MP3 conversion via ffmpeg (from WAV)
- Telegram flow (provider in Opus-conversion list) produces
`codec_name=opus`, 48kHz mono, `voice_compatible=True`, and the
`[[audio_as_voice]]` marker
- check_tts_requirements() returns True when kittentts is installed
Add support for KittenTTS - a lightweight, local TTS engine with models
ranging from 25-80MB that runs on CPU without requiring a GPU or API key.
Features:
- Support for 8 built-in voices (Jasper, Bella, Luna, etc.)
- Configurable model size (nano 25MB, micro 41MB, mini 80MB)
- Adjustable speech speed
- Model caching for performance
- Automatic WAV to Opus conversion for Telegram voice messages
Configuration example (config.yaml):
tts:
provider: kittentts
kittentts:
model: KittenML/kitten-tts-nano-0.8-int8
voice: Jasper
speed: 1.0
clean_text: true
Installation:
pip install https://github.com/KittenML/KittenTTS/releases/download/0.8.1/kittentts-0.8.1-py3-none-any.whl
Full AST-based scan of all .py files to find every case where a module
or name is imported locally inside a function body but is already
available at module level. This is the second pass — the first commit
handled the known cases from the lint report; this one catches
everything else.
Files changed (19):
cli.py — 16 removals: time as _time/_t/_tmod (×10),
re / re as _re (×2), os as _os, sys,
partial os from combo import,
from model_tools import get_tool_definitions
gateway/run.py — 8 removals: MessageEvent as _ME /
MessageType as _MT (×3), os as _os2,
MessageEvent+MessageType (×2), Platform,
BasePlatformAdapter as _BaseAdapter
run_agent.py — 6 removals: get_hermes_home as _ghh,
partial (contextlib, os as _os),
cleanup_vm, cleanup_browser,
set_interrupt as _sif (×2),
partial get_toolset_for_tool
hermes_cli/main.py — 4 removals: get_hermes_home, time as _time,
logging as _log, shutil
hermes_cli/config.py — 1 removal: get_hermes_home as _ghome
hermes_cli/runtime_provider.py
— 1 removal: load_config as _load_bedrock_config
hermes_cli/setup.py — 2 removals: importlib.util (×2)
hermes_cli/nous_subscription.py
— 1 removal: from hermes_cli.config import load_config
hermes_cli/tools_config.py
— 1 removal: from hermes_cli.config import load_config, save_config
cron/scheduler.py — 3 removals: concurrent.futures, json as _json,
from hermes_cli.config import load_config
batch_runner.py — 1 removal: list_distributions as get_all_dists
(kept print_distribution_info, not at top level)
tools/send_message_tool.py
— 2 removals: import os (×2)
tools/skills_tool.py — 1 removal: logging as _logging
tools/browser_camofox.py
— 1 removal: from hermes_cli.config import load_config
tools/image_generation_tool.py
— 1 removal: import fal_client
environments/tool_context.py
— 1 removal: concurrent.futures
gateway/platforms/bluebubbles.py
— 1 removal: httpx as _httpx
gateway/platforms/whatsapp.py
— 1 removal: import asyncio
tui_gateway/server.py — 2 removals: from datetime import datetime,
import time
All alias references (_time, _t, _tmod, _re, _os, _os2, _json, _ghh,
_ghome, _sif, _ME, _MT, _BaseAdapter, _load_bedrock_config, _httpx,
_logging, _log, get_all_dists) updated to use the top-level names.
Sweep ~74 redundant local imports across 21 files where the same module
was already imported at the top level. Also includes type fixes and lint
cleanups on the same branch.
* feat(skills): inject absolute skill dir and expand ${HERMES_SKILL_DIR} templates
When a skill loads, the activation message now exposes the absolute
skill directory and substitutes ${HERMES_SKILL_DIR} /
${HERMES_SESSION_ID} tokens in the SKILL.md body, so skills with
bundled scripts can instruct the agent to run them by absolute path
without an extra skill_view round-trip.
Also adds opt-in inline-shell expansion: !`cmd` snippets in SKILL.md
are pre-executed (with the skill directory as CWD) and their stdout is
inlined into the message before the agent reads it. Off by default —
enable via skills.inline_shell in config.yaml — because any snippet
runs on the host without approval.
Changes:
- agent/skill_commands.py: template substitution, inline-shell
expansion, absolute skill-dir header, supporting-files list now
shows both relative and absolute forms.
- hermes_cli/config.py: new skills.template_vars,
skills.inline_shell, skills.inline_shell_timeout knobs.
- tests/agent/test_skill_commands.py: coverage for header, both
template tokens (present and missing session id), template_vars
disable, inline-shell default-off, enabled, CWD, and timeout.
- website/docs/developer-guide/creating-skills.md: documents the
template tokens, the absolute-path header, and the opt-in inline
shell with its security caveat.
Validation: tests/agent/ 1591 passed (includes 9 new tests).
E2E: loaded a real skill in an isolated HERMES_HOME; confirmed
${HERMES_SKILL_DIR} resolves to the absolute path, ${HERMES_SESSION_ID}
resolves to the passed task_id, !`date` runs when opt-in is set, and
stays literal when it isn't.
* feat(terminal): source ~/.bashrc (and user-listed init files) into session snapshot
bash login shells don't source ~/.bashrc, so tools that install themselves
there — nvm, asdf, pyenv, cargo, custom PATH exports — stay invisible to
the environment snapshot Hermes builds once per session. Under systemd
or any context with a minimal parent env, that surfaces as
'node: command not found' in the terminal tool even though the binary
is reachable from every interactive shell on the machine.
Changes:
- tools/environments/local.py: before the login-shell snapshot bootstrap
runs, prepend guarded 'source <file>' lines for each resolved init
file. Missing files are skipped, each source is wrapped with a
'[ -r ... ] && . ... || true' guard so a broken rc can't abort the
bootstrap.
- hermes_cli/config.py: new terminal.shell_init_files (explicit list,
supports ~ and ${VAR}) and terminal.auto_source_bashrc (default on)
knobs. When shell_init_files is set it takes precedence; when it's
empty and auto_source_bashrc is on, ~/.bashrc gets auto-sourced.
- tests/tools/test_local_shell_init.py: 10 tests covering the resolver
(auto-bashrc, missing file, explicit override, ~/${VAR} expansion,
opt-out) and the prelude builder (quoting, guarded sourcing), plus
a real-LocalEnvironment snapshot test that confirms exports in the
init file land in subsequent commands' environment.
- website/docs/reference/faq.md: documents the fix in Troubleshooting,
including the zsh-user pattern of sourcing ~/.zshrc or nvm.sh
directly via shell_init_files.
Validation: 10/10 new tests pass; tests/tools/test_local_*.py 40/40
pass; tests/agent/ 1591/1591 pass; tests/hermes_cli/test_config.py
50/50 pass. E2E in an isolated HERMES_HOME: confirmed that a fake
~/.bashrc setting a marker var and PATH addition shows up in a real
LocalEnvironment().execute() call, that auto_source_bashrc=false
suppresses it, that an explicit shell_init_files entry wins over the
auto default, and that a missing bashrc is silently skipped.
Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the
two openai/xai sites in run_agent.py. This finishes the sweep: the same
substring-match false-positive class (e.g. https://api.openai.com.evil/v1,
https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1)
existed in eight more call sites, and the hostname helper was duplicated
in two modules.
- utils: add shared base_url_hostname() (single source of truth).
- hermes_cli/runtime_provider, run_agent: drop local duplicates, import
from utils. Reuse the cached AIAgent._base_url_hostname attribute
everywhere it's already populated.
- agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens
gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg
selection to hostname equality.
- run_agent: native-anthropic check in the Claude-style model branch
and in the AIAgent init provider-auto-detect branch.
- agent/model_metadata: Anthropic /v1/models context-length lookup.
- hermes_cli/providers.determine_api_mode: anthropic / openai URL
heuristics for custom/unknown providers (the /anthropic path-suffix
convention for third-party gateways is preserved).
- tools/delegate_tool: anthropic detection for delegated subagent
runtimes.
- hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint
native-OpenAI detection (paired with deduping the repeated check into
a single is_native_openai boolean per branch).
Tests:
- tests/test_base_url_hostname.py covers the helper directly
(path-containing-host, host-suffix, trailing dot, port, case).
- tests/hermes_cli/test_determine_api_mode_hostname.py adds the same
regression class for determine_api_mode, plus a test that the
/anthropic third-party gateway convention still wins.
Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP.
Users can declare shell scripts in config.yaml under a hooks: block that
fire on plugin-hook events (pre_tool_call, post_tool_call, pre_llm_call,
subagent_stop, etc). Scripts receive JSON on stdin, can return JSON on
stdout to block tool calls or inject context pre-LLM.
Key design:
- Registers closures on existing PluginManager._hooks dict — zero changes
to invoke_hook() call sites
- subprocess.run(shell=False) via shlex.split — no shell injection
- First-use consent per (event, command) pair, persisted to allowlist JSON
- Bypass via --accept-hooks, HERMES_ACCEPT_HOOKS=1, or hooks_auto_accept
- hermes hooks list/test/revoke/doctor CLI subcommands
- Adds subagent_stop hook event fired after delegate_task children exit
- Claude Code compatible response shapes accepted
Cherry-picked from PR #13143 by @pefontana.
Cherry-picked from PR #13159 by @cdanis.
Adds native media attachment delivery to Signal via signal-cli JSON-RPC
attachments param. Signal messages with media now follow the same
early-return pattern as Telegram/Discord/Matrix — attachments are sent
only with the last chunk to avoid duplicates.
Follow-up fixes on top of the original PR:
- Moved Signal into its own early-return block above the restriction
check (matches Telegram/Discord/Matrix pattern)
- Fixed media_files being sent on every chunk in the generic loop
- Restored restriction/warning guards to simple form (Signal exits early)
- Fixed non-hermetic test writing to /tmp instead of tmp_path
Adds a _resolve_path() helper that reads TERMINAL_CWD and uses it as
the base for relative path resolution. Applied to _check_sensitive_path,
read_file_tool, _update_read_timestamp, and _check_file_staleness.
Absolute paths and non-worktree sessions (no TERMINAL_CWD) are
unaffected — falls back to os.getcwd().
Fixes#12689.
Replaces the serial for-loop in tick() with ThreadPoolExecutor so all
jobs due in a single tick run concurrently. A slow job no longer blocks
others from executing, fixing silent job skipping (issue #9086).
Thread safety:
- Session/delivery env vars migrated from os.environ to ContextVars
(gateway/session_context.py) so parallel jobs can't clobber each
other's delivery targets. Each thread gets its own copied context.
- jobs.json read-modify-write cycles (advance_next_run, mark_job_run)
protected by threading.Lock to prevent concurrent save clobber.
- send_message_tool reads delivery vars via get_session_env() for
ContextVar-aware resolution with os.environ fallback.
Configuration:
- cron.max_parallel_jobs in config.yaml (null = unbounded, 1 = serial)
- HERMES_CRON_MAX_PARALLEL env var override
Based on PR #9169 by @VenomMoth1.
Fixes#9086
Cherry-picked from PR #2545 by @Mibayy.
The setup wizard could leave stt.model: "whisper-1" in config.yaml.
When using the local faster-whisper provider, this crashed with
"Invalid model size 'whisper-1'". Voice messages were silently ignored.
_normalize_local_model() now detects cloud-only names (whisper-1,
gpt-4o-transcribe, etc.) and maps them to the default local model
with a warning. Valid local sizes (tiny, base, small, medium, large-v3)
pass through unchanged.
- Renamed _normalize_local_command_model -> _normalize_local_model
(backward-compat wrapper preserved)
- 6 new tests including integration test
- Added lowercase AUTHOR_MAP alias for @Mibayy
Closes#2544
On macOS, Unix domain socket paths are capped at 104 bytes (sun_path).
SSH appends a 16-byte random suffix to the ControlPath when operating
in ControlMaster mode. With an IPv6 host embedded literally in the
filename and a deeply-nested macOS $TMPDIR like
/var/folders/XX/YYYYYYYYYYYY/T/, the full path reliably exceeds the
limit — every terminal/file-op tool call then fails immediately with
``unix_listener: path "…" too long for Unix domain socket``.
Swap the ``user@host:port.sock`` filename for a sha256-derived 16-char
hex digest. The digest is deterministic for a given (user, host, port)
triple, so ControlMaster reuse across reconnects is preserved, and the
full path fits comfortably under the limit even after SSH's random
suffix. Collision space is 2^64 — effectively unreachable for the
handful of concurrent connections any single Hermes process holds.
Regression tests cover: path length under realistic macOS $TMPDIR with
the IPv6 host from the issue report, determinism for reconnects, and
distinctness across different (user, host, port) triples.
Closes#11840
Follow-up to #12704. The SignalAdapter can resolve +E164 numbers to
UUIDs via listContacts, but _parse_target_ref() in the send_message
tool rejected '+' as non-digit and fell through to channel-name
resolution — which fails for contacts without a prior session entry.
Adds an E.164 branch in _parse_target_ref for phone-based platforms
(signal, sms, whatsapp) that preserves the leading '+' so downstream
adapters keep the format they expect. Non-phone platforms are
unaffected.
Reported by @qdrop17 on Discord after pulling #12704.
file_tools._get_file_ops() built a container_config dict for Docker/
Singularity/Modal/Daytona backends but omitted docker_mount_cwd_to_workspace
and docker_forward_env. Both are read by _create_environment() from
container_config, so file tools (read_file, write_file, patch, search)
silently ignored those config values when running in Docker.
Add the two missing keys to match the container_config already built by
terminal_tool.terminal_tool().
Fixes#2672.
The vision tool hardcoded temperature=0.1, ignoring the user's
config.yaml setting. This broke providers like Kimi/Moonshot that
require temperature=1 for vision models. Now reads temperature
from auxiliary.vision.temperature, falling back to 0.1.
bash parses `A && B &` with `&&` tighter than `&`, so it forks a subshell
for the compound and backgrounds the subshell. Inside the subshell, B
runs foreground, so the subshell waits for B. When B is a process that
doesn't naturally exit (`python3 -m http.server`, `yes > /dev/null`, a
long-running daemon), the subshell is stuck in `wait4` forever and leaks
as an orphan reparented to init.
Observed in production: agents running `cd X && python3 -m http.server
8000 &>/dev/null & sleep 1 && curl ...` as a "start a local server, then
verify it" one-liner. Outer bash exits cleanly; the subshell never does.
Across ~3 days of use, 8 unique stuck-terminal events and 7 leaked
bash+server pairs accumulated on the fleet, with some sessions appearing
hung from the user's perspective because the subshell's open stdout pipe
kept the terminal tool's drain thread blocked.
This is distinct from the `set +m` fix in 933fbd8f (which addressed
interactive-shell job-control waiting at exit). `set +m` doesn't help
here because `bash -c` is non-interactive and job control is already
off; the problem is the subshell's own internal wait for its foreground
B, not the outer shell's job-tracking.
The fix: walk the command shell-aware (respecting quotes, parens, brace
groups, `&>`/`>&` redirects), find `A && B &` / `A || B &` at depth 0
and rewrite the tail to `A && { B & }`. Brace groups don't fork a
subshell — they run in the current shell. `B &` inside the group is a
simple background (no subshell wait). The outer `&` is absorbed into
the group, so the compound no longer needs an explicit subshell.
`&&` error-propagation is preserved exactly: if A fails, `&&`
short-circuits and B never runs.
- Skips quoted strings, comment lines, and `(…)` subshells
- Handles `&>/dev/null`, `2>&1`, `>&2` without mistaking them for `&`
- Resets chain state at `;`, `|`, and newlines
- Tracks brace depth so already-rewritten output is idempotent
- Walks using the existing `_read_shell_token` tokenizer, matching the
pattern of `_rewrite_real_sudo_invocations`
Called once from `BaseEnvironment.execute` right after
`_prepare_command`, so it runs for every backend (local, ssh, docker,
modal, etc.) with no per-backend plumbing.
34 new tests covering rewrite cases, preservation cases, redirect
edge-cases, quoting/parens/backticks, idempotency, and empty/edge
inputs. End-to-end verified on a test VM: the exact vela-incident
command now returns in ~1.3s with no leaked bash, only the intentional
backgrounded server.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* [verified] fix(mcp-oauth): bridge httpx auth_flow bidirectional generator
HermesMCPOAuthProvider.async_auth_flow wrapped the SDK's auth_flow with
'async for item in super().async_auth_flow(request): yield item', which
discards httpx's .asend(response) values and resumes the inner generator
with None. This broke every OAuth MCP server on the first HTTP response
with 'NoneType' object has no attribute 'status_code' crashing at
mcp/client/auth/oauth2.py:505.
Replace with a manual bridge that forwards .asend() values into the
inner generator, preserving httpx's bidirectional auth_flow contract.
Add tests/tools/test_mcp_oauth_bidirectional.py with two regression
tests that drive the flow through real .asend() round-trips. These
catch the bug at the unit level; prior tests only exercised
_initialize() and disk-watching, never the full generator protocol.
Verified against BetterStack MCP:
Before: 'Connection failed (11564ms): NoneType...' after 3 retries
After: 'Connected (2416ms); Tools discovered: 83'
Regression from #11383.
* [verified] fix(mcp-oauth): seed token_expiry_time + pre-flight AS discovery on cold-load
PR #11383's consolidation fixed external-refresh reloading and 401 dedup
but left two latent bugs that surfaced on BetterStack and any other OAuth
MCP with a split-origin authorization server:
1. HermesTokenStorage persisted only a relative 'expires_in', which is
meaningless after a process restart. The MCP SDK's OAuthContext
does NOT seed token_expiry_time in _initialize, so is_token_valid()
returned True for any reloaded token regardless of age. Expired
tokens shipped to servers, and app-level auth failures (e.g.
BetterStack's 'No teams found. Please check your authentication.')
were invisible to the transport-layer 401 handler.
2. Even once preemptive refresh did fire, the SDK's _refresh_token
falls back to {server_url}/token when oauth_metadata isn't cached.
For providers whose AS is at a different origin (BetterStack:
mcp.betterstack.com for MCP, betterstack.com/oauth/token for the
token endpoint), that fallback 404s and drops into full browser
re-auth on every process restart.
Fix set:
- HermesTokenStorage.set_tokens persists an absolute wall-clock
expires_at alongside the SDK's OAuthToken JSON (time.time() + TTL
at write time).
- HermesTokenStorage.get_tokens reconstructs expires_in from
max(expires_at - now, 0), clamping expired tokens to zero TTL.
Legacy files without expires_at fall back to file-mtime as a
best-effort wall-clock proxy, self-healing on the next set_tokens.
- HermesMCPOAuthProvider._initialize calls super(), then
update_token_expiry on the reloaded tokens so token_expiry_time
reflects actual remaining TTL. If tokens are loaded but
oauth_metadata is missing, pre-flight PRM + ASM discovery runs
via httpx.AsyncClient using the MCP SDK's own URL builders and
response handlers (build_protected_resource_metadata_discovery_urls,
handle_auth_metadata_response, etc.) so the SDK sees the correct
token_endpoint before the first refresh attempt. Pre-flight is
skipped when there are no stored tokens to keep fresh-install
paths zero-cost.
Test coverage (tests/tools/test_mcp_oauth_cold_load_expiry.py):
- set_tokens persists absolute expires_at
- set_tokens skips expires_at when token has no expires_in
- get_tokens round-trips expires_at -> remaining expires_in
- expired tokens reload with expires_in=0
- legacy files without expires_at fall back to mtime proxy
- _initialize seeds token_expiry_time from stored tokens
- _initialize flags expired-on-disk tokens as is_token_valid=False
- _initialize pre-flights PRM + ASM discovery with mock transport
- _initialize skips pre-flight when no tokens are stored
Verified against BetterStack MCP:
hermes mcp test betterstack -> Connected (2508ms), 83 tools
mcp_betterstack_telemetry_list_teams_tool -> real team data, not
'No teams found. Please check your authentication.'
Reference: mcp-oauth-token-diagnosis skill, Fix A.
* chore: map hermes@noushq.ai to benbarclay in AUTHOR_MAP
Needed for CI attribution check on cherry-picked commits from PR #12025.
---------
Co-authored-by: Hermes Agent <hermes@noushq.ai>
Two hardening layers in the patch tool, triggered by a real silent failure
in the previous session:
(1) Post-write verification in patch_replace — after write_file succeeds,
re-read the file and confirm the bytes on disk match the intended write.
If not, return an error instead of the current success-with-diff. Catches
silent persistence failures from any cause (backend FS oddities, stdin
pipe truncation, concurrent task races, mount drift).
(2) Escape-drift guard in fuzzy_find_and_replace — when a non-exact
strategy matches and both old_string and new_string contain literal
\' or \" sequences but the matched file region does not, reject the
patch with a clear error pointing at the likely cause (tool-call
serialization adding a spurious backslash around apostrophes/quotes).
Exact matches bypass the guard, and legitimate edits that add or
preserve escape sequences in files that already have them still work.
Why: in a prior tool call, old_string was sent with \' where the file
has ' (tool-call transport drift). The fuzzy matcher's block_anchor
strategy matched anyway and produced a diff the tool reported as
successful — but the file was never modified on disk. The agent moved
on believing the edit landed when it hadn't.
Tests: added TestPatchReplacePostWriteVerification (3 cases) and
TestEscapeDriftGuard (6 cases). All pass, existing fuzzy match and
file_operations tests unaffected.
* feat: add Discord server introspection and management tool
Add a discord_server tool that gives the agent the ability to interact
with Discord servers when running on the Discord gateway. Uses Discord
REST API directly with the bot token — no dependency on the gateway
adapter's discord.py client.
The tool is only included in the hermes-discord toolset (zero cost for
users on other platforms) and gated on DISCORD_BOT_TOKEN via check_fn.
Actions (14):
- Introspection: list_guilds, server_info, list_channels, channel_info,
list_roles, member_info, search_members
- Messages: fetch_messages, list_pins, pin_message, unpin_message
- Management: create_thread, add_role, remove_role
This addresses a gap where users on Discord could not ask Hermes to
review server structure, channels, roles, or members — a task competing
agents (OpenClaw) handle out of the box.
Files changed:
- tools/discord_tool.py (new): Tool implementation + registration
- model_tools.py: Add to discovery list
- toolsets.py: Add to hermes-discord toolset only
- tests/tools/test_discord_tool.py (new): 43 tests covering all actions,
validation, error handling, registration, and toolset scoping
* feat(discord): intent-aware schema filtering + config allowlist + schema cleanup
- _detect_capabilities() hits GET /applications/@me once per process
to read GUILD_MEMBERS / MESSAGE_CONTENT privileged intent bits.
- Schema is rebuilt per-session in model_tools.get_tool_definitions:
hides search_members / member_info when GUILD_MEMBERS intent is off,
annotates fetch_messages description when MESSAGE_CONTENT is off.
- New config key discord.server_actions (comma-separated or YAML list)
lets users restrict which actions the agent can call, intersected
with intent availability. Unknown names are warned and dropped.
- Defense-in-depth: runtime handler re-checks the allowlist so a stale
cached schema cannot bypass a tightened config.
- Schema description rewritten as an action-first manifest (signature
per action) instead of per-parameter 'required for X, Y, Z' cross-refs.
~25% shorter; model can see each action's required params at a glance.
- Added bounds: limit gets minimum=1 maximum=100, auto_archive_duration
becomes an enum of the 4 valid Discord values.
- 403 enrichment: runtime 403 errors are mapped to actionable guidance
(which permission is missing and what to do about it) instead of the
raw Discord error body.
- 36 new tests: capability detection with caching and force refresh,
config allowlist parsing (string/list/invalid/unknown), intent+allowlist
intersection, dynamic schema build, runtime allowlist enforcement,
403 enrichment, and model_tools integration wiring.
The first draft of the fix called `chunk.decode("utf-8")` directly on
each 4096-byte `os.read()` result, which corrupts output whenever a
multi-byte UTF-8 character straddles a read boundary:
* `UnicodeDecodeError` fires on the valid-but-truncated byte sequence.
* The except handler clears ALL previously-decoded output and replaces
the whole buffer with `[binary output detected ...]`.
Empirically: 10000 '日' chars (30001 bytes) through the wrapper loses
all 10000 characters on the first draft; the baseline TextIOWrapper
drain (which uses `encoding='utf-8', errors='replace'` on Popen)
preserves them all. This regression affects any command emitting
non-ASCII output larger than one chunk — CJK/Arabic/emoji in
`npm install`, `pip install`, `docker logs`, `kubectl logs`, etc.
Fix: swap to `codecs.getincrementaldecoder('utf-8')(errors='replace')`,
which buffers partial multi-byte sequences across chunks and substitutes
U+FFFD for genuinely invalid bytes. Flush on drain exit via
`decoder.decode(b'', final=True)` to emit any trailing replacement
character for a dangling partial sequence.
Adds two regression tests:
* test_utf8_multibyte_across_read_boundary — 10000 U+65E5 chars,
verifies count round-trips and no fallback fires.
* test_invalid_utf8_uses_replacement_not_fallback — deliberate
\xff\xfe between valid ASCII, verifies surrounding text survives.
When a user's command backgrounds a child (`cmd &`, `setsid cmd & disown`,
etc.), the backgrounded grandchild inherits the write-end of our stdout
pipe via fork(). The old `for line in proc.stdout` drain never EOF'd
until the grandchild closed the pipe — so for a uvicorn server, the
terminal tool hung indefinitely (users reported the whole session
deadlocking when asking the agent to restart a backend).
Fix: switch _drain() to select()-based non-blocking reads and stop
draining shortly after bash exits even if the pipe hasn't EOF'd. Any
output the grandchild writes after that point goes to an orphaned pipe,
which is exactly what the user asked for when they said '&'.
Adds regression tests covering the issue's exact repro and 5 related
patterns (plain bg, setsid+disown, streaming output, high volume,
timeout, UTF-8).
Agents can now send arbitrary CDP commands to the browser. The tool is
gated on a reachable CDP endpoint at session start — it only appears in
the toolset when BROWSER_CDP_URL is set (from '/browser connect') or
'browser.cdp_url' is configured in config.yaml. Backends that don't
currently expose CDP to the Python side (Camofox, default local
agent-browser, cloud providers whose per-session cdp_url is not yet
surfaced) do not see the tool at all.
Tool schema description links to the CDP method reference at
https://chromedevtools.github.io/devtools-protocol/ so the agent can
web_extract specific method docs on demand.
Stateless per call. Browser-level methods (Target.*, Browser.*,
Storage.*) omit target_id. Page-level methods attach to the target
with flatten=true and dispatch the method on the returned sessionId.
Clean errors when the endpoint becomes unreachable mid-session or
the URL isn't a WebSocket.
Tests: 19 unit (mock CDP server + gate checks) + E2E against real
headless Chrome (Target.getTargets, Browser.getVersion,
Runtime.evaluate with target_id, Page.navigate + re-eval, bogus
method, bogus target_id, missing endpoint) + E2E of the check_fn
gate (tool hidden without CDP URL, visible with it, hidden again
after unset).
Add approvals.cron_mode config option that controls how cron jobs handle
dangerous commands. Previously, cron jobs silently auto-approved all
dangerous commands because there was no user present to approve them.
Now the behavior is configurable:
- deny (default): block dangerous commands and return a message telling
the agent to find an alternative approach. The agent loop continues —
it just can't use that specific command.
- approve: auto-approve all dangerous commands (previous behavior).
When a command is blocked, the agent receives the same response format as
a user denial in the CLI — exit_code=-1, status=blocked, with a message
explaining why and pointing to the config option. This keeps the agent
loop running and encourages it to adapt.
Implementation:
- config.py: add approvals.cron_mode to DEFAULT_CONFIG
- scheduler.py: set HERMES_CRON_SESSION=1 env var before agent runs
- approval.py: both check_command_approval() and check_all_command_guards()
now check for cron sessions and apply the configured mode
- 21 new tests covering config parsing, deny/approve behavior, and
interaction with other bypass mechanisms (yolo, containers)
Stacking both features on the same event produces duplicate, delayed
notifications — delivery is async and continues firing after the process
exits, so matches on end-of-run markers (SUMMARY, DONE, PASS) arrive
after the agent has already polled/waited and moved on.
Updates both the terminal tool JSON schema description and the
terminal_tool() function docstring to make the split explicit:
- watch_patterns: mid-process signals only (errors, readiness markers,
intermediate steps you want to react to before the process exits)
- notify_on_complete: end-of-run completion signal
No behavioural change.
Weaker models (Gemma-class) repeatedly rediscover and forget that
execute_code uses a different CWD and Python interpreter than terminal(),
causing them to flip-flop on whether user files exist and to hit import
errors on project dependencies like pandas.
Adds a new 'code_execution.mode' config key (default 'project') that
brings execute_code into line with terminal()'s filesystem/interpreter:
project (new default):
- cwd = session's TERMINAL_CWD (falls back to os.getcwd())
- python = active VIRTUAL_ENV/bin/python or CONDA_PREFIX/bin/python
with a Python 3.8+ version check; falls back cleanly to
sys.executable if no venv or the candidate fails
- result : 'import pandas' works, '.env' resolves, matches terminal()
strict (opt-in):
- cwd = staging tmpdir (today's behavior)
- python = sys.executable (today's behavior)
- result : maximum reproducibility and isolation; project deps
won't resolve
Security-critical invariants are identical across both modes and covered by
explicit regression tests:
- env scrubbing (strips *_API_KEY, *_TOKEN, *_SECRET, *_PASSWORD,
*_CREDENTIAL, *_PASSWD, *_AUTH substrings)
- SANDBOX_ALLOWED_TOOLS whitelist (no execute_code recursion, no
delegate_task, no MCP from inside scripts)
- resource caps (5-min timeout, 50KB stdout, 50 tool calls)
Deliberately avoids 'sandbox'/'isolated'/'cloud' language in tool
descriptions (regression from commit 39b83f34 where agents on local
backends falsely believed they were sandboxed and refused networking).
Override via env var: HERMES_EXECUTE_CODE_MODE=strict|project
Weaker models (Gemma-class) repeatedly rediscover and forget that execute_code's
working directory differs from terminal()/read_file()'s, leading to
os.path.exists('.env') returning False even though the file exists in the
session's CWD. They then bounce between 'the file exists' and 'the file is
missing' across tool calls.
Adds a 'Working directory' note to the execute_code schema description
pointing agents at absolute paths (os.path.expanduser) or terminal()/read_file()
for inspecting user files.
Carefully avoids the 'sandbox'/'isolated'/'cloud' language that commit
39b83f34 removed (it caused agents on local backends to refuse networking
tasks and save false sandbox beliefs to persistent memory). Purely factual
CWD guidance — no restriction implications.
Error messages that tell users to install optional extras now use
{sys.executable} -m pip install ... instead of a bare 'pip install
hermes-agent[extra]' string. Under the curl installer, bare 'pip'
resolves to system pip, which either fails with PEP 668
externally-managed-environment or installs into the wrong Python.
Affects: hermes dashboard, hermes web server startup, mcp_serve,
hermes doctor Bedrock check, CLI voice mode, voice_mode tool runtime
error, Discord voice-channel join failure message.
* fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace
interrupt() previously only flagged the agent's _execution_thread_id.
Tools running inside _execute_tool_calls_concurrent execute on
ThreadPoolExecutor worker threads whose tids are distinct from the
agent's, so is_interrupted() inside those tools returned False no matter
how many times the gateway called .interrupt() — hung ssh / curl / long
make-builds ran to their own timeout.
Changes:
- run_agent.py: track concurrent-tool worker tids in a per-agent set,
fan interrupt()/clear_interrupt() out to them, and handle the
register-after-interrupt race at _run_tool entry. getattr fallback
for the tracker so test stubs built via object.__new__ keep working.
- tools/environments/base.py: opt-in _wait_for_process trace (ENTER,
per-30s HEARTBEAT with interrupt+activity-cb state, INTERRUPT
DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1.
- tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target
tid, set snapshot) behind the same env flag.
- tests: new regression test runs a polling tool on a concurrent worker
and asserts is_interrupted() flips to True within ~1s of interrupt().
Second new test guards clear_interrupt() clearing tracked worker bits.
Validation: tests/run_agent/ all 762 pass; tests/tools/ interrupt+env
subset 216 pass.
* fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log
AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when
quiet_mode=True (the CLI default). This would silently swallow every
INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation
added in the parent commit — confirmed by running hermes chat -q with
the flag and finding zero trace lines in agent.log even though
_wait_for_process was clearly executing (subprocess pid existed).
Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets
its own logger level to INFO at import time, overriding the 'tools'
parent-level filter. Scoped to the opt-in case only, so production
(quiet_mode default) logs stay quiet as designed.
Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes
'_wait_for_process ENTER/EXIT' lines to agent.log as expected.
* fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses
Tool subprocesses spawned by the local environment backend use
os.setsid so they run in their own process group. Before this fix,
SIGTERM/SIGHUP to the hermes CLI killed the main thread via
KeyboardInterrupt but the worker thread running _wait_for_process
never got a chance to call _kill_process — Python exited, the child
was reparented to init (PPID=1), and the subprocess ran to its
natural end (confirmed live: sleep 300 survived 4+ min after SIGTERM
to the agent until manual cleanup).
Changes:
- cli.py _signal_handler (interactive) + _signal_handler_q (-q mode):
route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll
loop sees the per-thread interrupt flag and calls _kill_process
(os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default
1.5s) gives the worker time to complete its SIGTERM+SIGKILL
escalation before KeyboardInterrupt unwinds main.
- tools/environments/base.py _wait_for_process: wrap the poll loop in
try/except (KeyboardInterrupt, SystemExit) so the cleanup fires
even on paths the signal handlers don't cover (direct sys.exit,
unhandled KI from nested code, etc.). Emits EXCEPTION_EXIT trace
line when HERMES_DEBUG_INTERRUPT=1.
- New regression test: injects KeyboardInterrupt into a running
_wait_for_process via PyThreadState_SetAsyncExc, verifies the
subprocess process group is dead within 3s of the exception and
that KeyboardInterrupt re-raises cleanly afterward.
Validation:
| Before | After |
|---------------------------------------------------------|--------------------|
| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM | dies within 2 s |
| No INTERRUPT DETECTED in trace | INTERRUPT DETECTED fires + killing process group |
| tests/tools/test_local_interrupt_cleanup | 1/1 pass |
| tests/run_agent/test_concurrent_interrupt | 4/4 pass |
Extend forum support from PR #10145:
- REST path (_send_discord): forum thread creation now uploads media
files as multipart attachments on the starter message in a single
call. Previously media files were silently dropped on the forum
path.
- Websocket media paths (_send_file_attachment, send_voice, send_image,
send_animation — covers send_image_file, send_video, send_document
transitively): forum channels now go through a new _forum_post_file
helper that creates a thread with the file as starter content,
instead of failing via channel.send(file=...) which forums reject.
- _send_to_forum chunk follow-up failures are collected into
raw_response['warnings'] so partial-send outcomes surface.
- Process-local probe cache (_DISCORD_CHANNEL_TYPE_PROBE_CACHE) avoids
GET /channels/{id} on every uncached send after the first.
- Dedup of TestSendDiscordMedia that the PR merge-resolution left
behind.
- Docs: Forum Channels section under website/docs/user-guide/messaging/discord.md.
Tests: 117 passed (22 new for forum+media, probe cache, warnings).
ShellFileOperations captured the terminal env's cwd at __init__ time and
used that stale value for every subsequent _exec() call. When the user
ran `cd` via the terminal tool, `env.cwd` updated but `ops.cwd` did not.
Relative paths passed to patch_replace / read_file / write_file / search
then targeted the ORIGINAL directory instead of the current one.
Observed symptom in agent sessions:
terminal: cd .worktrees/my-branch
patch hermes_cli/main.py <old> <new>
→ returns {"success": true} with a plausible unified diff
→ but `git diff` in the worktree shows nothing
→ the patch landed in the main repo's checkout of main.py instead
The diff looked legitimate because patch_replace computes it from the
IN-MEMORY content vs new_content, not by re-reading the file. The
write itself DID succeed — it just wrote to the wrong directory's copy
of the same-named file.
Fix: _exec() now resolves cwd from live sources in this order:
1. Explicit `cwd` arg (if provided by the caller)
2. Live `self.env.cwd` (tracks `cd` commands run via terminal)
3. Init-time `self.cwd` (fallback when env has no cwd attribute)
Includes a 5-test regression suite covering:
- cd followed by relative read follows live cwd
- the exact reported bug: patch_replace with relative path after cd
- explicit cwd= arg still wins over env.cwd
- env without cwd attribute falls back to init-time cwd
- patch_replace success reflects real file state (safety rail)
Co-authored-by: teknium1 <teknium@nousresearch.com>
Follow-up polish on top of the cherry-picked #11023 commit.
- feishu_comment_rules.py: replace import-time "~/.hermes" expanduser fallback
with get_hermes_home() from hermes_constants (canonical, profile-safe).
- tools/feishu_doc_tool.py, tools/feishu_drive_tool.py: drop the
asyncio.get_event_loop().run_until_complete(asyncio.to_thread(...)) dance.
Tool handlers run synchronously in a worker thread with no running loop, so
the RuntimeError branch was always the one that executed. Calls client.request
directly now. Unused asyncio import removed.
- tests/gateway/test_feishu.py: add register_p2_customized_event to the mock
EventDispatcher builder so the existing adapter test matches the new handler
registration for drive.notice.comment_add_v1.
- scripts/release.py: map liujinkun@bytedance.com -> liujinkun2025 for
contributor attribution on release notes.
- Full comment handler: parse drive.notice.comment_add_v1 events, build
timeline, run agent, deliver reply with chunking support.
- 5 tools: feishu_doc_read, feishu_drive_list_comments,
feishu_drive_list_comment_replies, feishu_drive_reply_comment,
feishu_drive_add_comment.
- 3-tier access control rules (exact doc > wildcard "*" > top-level >
defaults) with per-field fallback. Config via
~/.hermes/feishu_comment_rules.json, mtime-cached hot-reload.
- Self-reply filter using generalized self_open_id (supports future
user-identity subscriptions). Receiver check: only process events
where the bot is the @mentioned target.
- Smart timeline selection, long text chunking, semantic text extraction,
session sharing per document, wiki link resolution.
Change-Id: I31e82fd6355173dbcc400b8934b6d9799e3137b9
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
Two accretion-over-time leaks that compound over long CLI / gateway
lifetimes. Both were flagged in the memory-leak audit.
## file_tools._read_tracker
_read_tracker[task_id] holds three sub-containers that grew unbounded:
read_history set of (path, offset, limit) tuples — 1 per unique read
dedup dict of (path, offset, limit) → mtime — same growth pattern
read_timestamps dict of resolved_path → mtime — 1 per unique path
A CLI session uses one stable task_id for its lifetime, so these were
uncapped. A 10k-read session accumulated ~1.5MB of tracker state that
the tool no longer needed (only the most recent reads are relevant for
dedup, consecutive-loop detection, and write/patch external-edit
warnings).
Fix: _cap_read_tracker_data() enforces hard caps on each container
after every add. Defaults: read_history=500, dedup=1000,
read_timestamps=1000. Eviction is insertion-order (Python 3.7+ dict
guarantee) for the dicts; arbitrary for the set (which only feeds
diagnostic summaries).
## process_registry._completion_consumed
Module-level set that recorded every session_id ever polled / waited /
logged. No pruning. Each entry is ~20 bytes, so the absolute leak is
small, but on a gateway processing thousands of background commands
per day the set grows until process exit.
Fix: _prune_if_needed() now discards _completion_consumed entries
alongside the session dict evictions it already performs (both the
TTL-based prune and the LRU-over-cap prune). Adds a final
belt-and-suspenders pass that drops any dangling entries whose
session_id no longer appears in _running or _finished.
Tests: tests/tools/test_accretion_caps.py — 9 cases
* Each container bound respected, oldest evicted
* No-op when under cap (no unnecessary work)
* Handles missing sub-containers without crashing
* Live read_file_tool path enforces caps end-to-end
* _completion_consumed pruned on TTL expiry
* _completion_consumed pruned on LRU eviction
* Dangling entries (no backing session) cleared
Broader suite: 3486 tests/tools + tests/cli pass. The single flake
(test_alias_command_passes_args) reproduces on unchanged main — known
cross-test pollution under suite-order load.
- gateway/platforms/weixin.py:
- Split aiohttp.ClientSession into _poll_session and _send_session
- Add _LIVE_ADAPTERS registry so send_weixin_direct() reuses the connected gateway adapter instead of creating a competing session
- Fixes silent message loss when gateway is running (iLink token contention)
- cron/scheduler.py:
- Support comma-separated deliver values (e.g. 'feishu,weixin') for multi-target delivery
- Delay pconfig/enabled check until standalone fallback so live adapters work even when platform is not in gateway config
- tools/send_message_tool.py:
- Synthesize PlatformConfig from WEIXIN_* env vars when gateway config lacks a weixin entry
- Fall back to WEIXIN_HOME_CHANNEL env var for home channel resolution
- tests/gateway/test_weixin.py:
- Update mocks to include _send_session
The send_message tool's direct-REST QQBot path used "QQBotAccessToken {token}"
which QQ's API rejects with 401. The correct format is "QQBot {token}" — the
gateway adapter at gateway/platforms/qqbot.py uses this format in all 5 header
sites (lines 341, 551, 579, 1068, 1467); this was the one outlier.
Credit to @Quon for surfacing this in #10257 (that PR had unrelated issues in
its media-upload logic and was closed; this salvages the genuine 1-line fix).
When a user edits a bundled skill, sync flags it as user_modified and
skips it forever. The problem: if the user later tries to undo the edit
by copying the current bundled version back into ~/.hermes/skills/, the
manifest still holds the old origin hash from the last successful
sync, so the fresh bundled hash still doesn't match and the skill stays
stuck as user_modified.
Adds an escape hatch for this case.
hermes skills reset <name>
Drops the skill's entry from ~/.hermes/skills/.bundled_manifest and
re-baselines against the user's current copy. Future 'hermes update'
runs accept upstream changes again. Non-destructive.
hermes skills reset <name> --restore
Also deletes the user's copy and re-copies the bundled version.
Use when you want the pristine upstream skill back.
Also available as /skills reset in chat.
- tools/skills_sync.py: new reset_bundled_skill(name, restore=False)
- hermes_cli/skills_hub.py: do_reset() + wired into skills_command and
handle_skills_slash; added to the slash /skills help panel
- hermes_cli/main.py: argparse entry for 'hermes skills reset'
- tests/tools/test_skills_sync.py: 5 new tests covering the stuck-flag
repro, --restore, unknown-skill error, upstream-removed-skill, and
no-op on already-clean state
- website/docs/user-guide/features/skills.md: new 'Bundled skill updates'
section explaining the origin-hash mechanic + reset usage
* feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro
Upstream asked for these two upgrades ASAP — the old entries show
stale models when newer, higher-quality versions are available on FAL.
Recraft V3 → Recraft V4 Pro
ID: fal-ai/recraft-v3 → fal-ai/recraft/v4/pro/text-to-image
Price: $0.04/image → $0.25/image (6x — V4 Pro is premium tier)
Schema: V4 dropped the required `style` enum entirely; defaults
handle taste now. Added `colors` and `background_color`
to supports for brand-palette control. `seed` is not
supported by V4 per the API docs.
Nano Banana → Nano Banana Pro
ID: fal-ai/nano-banana → fal-ai/nano-banana-pro
Price: $0.08/image → $0.15/image (1K); $0.30 at 4K
Schema: Aspect ratio family unchanged. Added `resolution`
(1K/2K/4K, default 1K for billing predictability),
`enable_web_search` (real-time info grounding, +$0.015),
and `limit_generations` (force exactly 1 image).
Architecture: Gemini 2.5 Flash → Gemini 3 Pro Image. Quality
and reasoning depth improved; slower (~6s → ~8s).
Migration: users who had the old IDs in `image_gen.model` will
fall through the existing 'unknown model → default' warning path
in `_resolve_fal_model()` and get the Klein 9B default on the next
run. Re-run `hermes tools` → Image Generation to pick the new
version. No silent cost-upgrade aliasing — the 2-6x price jump
on these tiers warrants explicit user re-selection.
Portal note: both new model IDs need to be allowlisted on the
Nous fal-queue-gateway alongside the previous 7 additions, or
users on Nous Subscription will see the 'managed gateway rejected
model' error we added previously (which is clear and
self-remediating, just noisy).
* docs: wrap '<1s' in backticks to unblock MDX compilation
Docusaurus's MDX parser treats unquoted '<' as the start of JSX, and
'<1s' fails because '1' isn't a valid tag-name start character. This
was broken on main since PR #11265 (never noticed because
docs-site-checks was failing on OTHER issues at the time and we
admin-merged through it).
Wrapping in backticks also gives the cell monospace styling which
reads more cleanly alongside the inline-code model ID in the same row.
The other '<1s' occurrence (line 52) is inside a fenced code block
and is already safe — code fences bypass MDX parsing.
* feat(mcp-oauth): scaffold MCPOAuthManager
Central manager for per-server MCP OAuth state. Provides
get_or_build_provider (cached), remove (evicts cache + deletes
disk), invalidate_if_disk_changed (mtime watch, core fix for
external-refresh workflow), and handle_401 (dedup'd recovery).
No behavior change yet — existing call sites still use
build_oauth_auth directly. Task 1 of 8 in the MCP OAuth
consolidation (fixes Cthulhu's BetterStack reliability issues).
* feat(mcp-oauth): add HermesMCPOAuthProvider with pre-flow disk watch
Subclasses the MCP SDK's OAuthClientProvider to inject a disk
mtime check before every async_auth_flow, via the central
manager. When a subclass instance is used, external token
refreshes (cron, another CLI instance) are picked up before
the next API call.
Still dead code: the manager's _build_provider still delegates
to build_oauth_auth and returns the plain OAuthClientProvider.
Task 4 wires this subclass in. Task 2 of 8.
* refactor(mcp-oauth): extract build_oauth_auth helpers
Decomposes build_oauth_auth into _configure_callback_port,
_build_client_metadata, _maybe_preregister_client, and
_parse_base_url. Public API preserved. These helpers let
MCPOAuthManager._build_provider reuse the same logic in Task 4
instead of duplicating the construction dance.
Also updates the SDK version hint in the warning from 1.10.0 to
1.26.0 (which is what we actually require for the OAuth types
used here). Task 3 of 8.
* feat(mcp-oauth): manager now builds HermesMCPOAuthProvider directly
_build_provider constructs the disk-watching subclass using the
helpers from Task 3, instead of delegating to the plain
build_oauth_auth factory. Any consumer using the manager now gets
pre-flow disk-freshness checks automatically.
build_oauth_auth is preserved as the public API for backwards
compatibility. The code path is now:
MCPOAuthManager.get_or_build_provider ->
_build_provider ->
_configure_callback_port
_build_client_metadata
_maybe_preregister_client
_parse_base_url
HermesMCPOAuthProvider(...)
Task 4 of 8.
* feat(mcp): wire OAuth manager + add _reconnect_event
MCPServerTask gains _reconnect_event alongside _shutdown_event.
When set, _run_http / _run_stdio exit their async-with blocks
cleanly (no exception), and the outer run() loop re-enters the
transport to rebuild the MCP session with fresh credentials.
This is the recovery path for OAuth failures that the SDK's
in-place httpx.Auth cannot handle (e.g. cron externally consumed
the refresh_token, or server-side session invalidation).
_run_http now asks MCPOAuthManager for the OAuth provider
instead of calling build_oauth_auth directly. Config-time,
runtime, and reconnect paths all share one provider instance
with pre-flow disk-watch active.
shutdown() defensively sets both events so there is no race
between reconnect and shutdown signalling.
Task 5 of 8.
* feat(mcp): detect auth failures in tool handlers, trigger reconnect
All 5 MCP tool handlers (tool call, list_resources, read_resource,
list_prompts, get_prompt) now detect auth failures and route
through MCPOAuthManager.handle_401:
1. If the manager says recovery is viable (disk has fresh tokens,
or SDK can refresh in-place), signal MCPServerTask._reconnect_event
to tear down and rebuild the MCP session with fresh credentials,
then retry the tool call once.
2. If no recovery path exists, return a structured needs_reauth
JSON error so the model stops hallucinating manual refresh
attempts (the 'let me curl the token endpoint' loop Cthulhu
pasted from Discord).
_is_auth_error catches OAuthFlowError, OAuthTokenError,
OAuthNonInteractiveError, and httpx.HTTPStatusError(401). Non-auth
exceptions still surface via the generic error path unchanged.
Task 6 of 8.
* feat(mcp-cli): route add/remove through manager, add 'hermes mcp login'
cmd_mcp_add and cmd_mcp_remove now go through MCPOAuthManager
instead of calling build_oauth_auth / remove_oauth_tokens
directly. This means CLI config-time state and runtime MCP
session state are backed by the same provider cache — removing
a server evicts the live provider, adding a server populates
the same cache the MCP session will read from.
New 'hermes mcp login <name>' command:
- Wipes both the on-disk tokens file and the in-memory
MCPOAuthManager cache
- Triggers a fresh OAuth browser flow via the existing probe
path
- Intended target for the needs_reauth error Task 6 returns
to the model
Task 7 of 8.
* test(mcp-oauth): end-to-end integration tests
Five new tests exercising the full consolidation with real file
I/O and real imports (no transport mocks):
1. external_refresh_picked_up_without_restart — Cthulhu's cron
workflow. External process writes fresh tokens to disk;
on the next auth flow the manager's mtime-watch flips
_initialized and the SDK re-reads from storage.
2. handle_401_deduplicates_concurrent_callers — 10 concurrent
handlers for the same failed token fire exactly ONE recovery
attempt (thundering-herd protection).
3. handle_401_returns_false_when_no_provider — defensive path
for unknown servers.
4. invalidate_if_disk_changed_handles_missing_file — pre-auth
state returns False cleanly.
5. provider_is_reused_across_reconnects — cache stickiness so
reconnects preserve the disk-watch baseline mtime.
Task 8 of 8 — consolidation complete.
* feat(image_gen): multi-model FAL support with picker in hermes tools
Adds 8 FAL text-to-image models selectable via `hermes tools` →
Image Generation → (FAL.ai | Nous Subscription) → model picker.
Models supported:
- fal-ai/flux-2/klein/9b (new default, <1s, $0.006/MP)
- fal-ai/flux-2-pro (previous default, kept backward-compat upscaling)
- fal-ai/z-image/turbo (Tongyi-MAI, bilingual EN/CN)
- fal-ai/nano-banana (Gemini 2.5 Flash Image)
- fal-ai/gpt-image-1.5 (with quality tier: low/medium/high)
- fal-ai/ideogram/v3 (best typography)
- fal-ai/recraft-v3 (vector, brand styles)
- fal-ai/qwen-image (LLM-based)
Architecture:
- FAL_MODELS catalog declares per-model size family, defaults, supports
whitelist, and upscale flag. Three size families handled uniformly:
image_size_preset (flux family), aspect_ratio (nano-banana), and
gpt_literal (gpt-image-1.5).
- _build_fal_payload() translates unified inputs (prompt + aspect_ratio)
into model-specific payloads, merges defaults, applies caller overrides,
wires GPT quality_setting, then filters to the supports whitelist — so
models never receive rejected keys.
- IMAGEGEN_BACKENDS registry in tools_config prepares for future imagegen
providers (Replicate, Stability, etc.); each provider entry tags itself
with imagegen_backend: 'fal' to select the right catalog.
- Upscaler (Clarity) defaults off for new models (preserves <1s value
prop), on for flux-2-pro (backward-compat). Per-model via FAL_MODELS.
Config:
image_gen.model = fal-ai/flux-2/klein/9b (new)
image_gen.quality_setting = medium (new, GPT only)
image_gen.use_gateway = bool (existing)
Agent-facing schema unchanged (prompt + aspect_ratio only) — model
choice is a user-level config decision, not an agent-level arg.
Picker uses curses_radiolist (arrow keys, auto numbered-fallback on
non-TTY). Column-aligned: Model / Speed / Strengths / Price.
Docs: image-generation.md rewritten with the model table and picker
walkthrough. tools-reference, tool-gateway, overview updated to drop
the stale "FLUX 2 Pro" wording.
Tests: 42 new in tests/tools/test_image_generation.py covering catalog
integrity, all 3 size families, supports filter, default merging, GPT
quality wiring, model resolution fallback. 8 new in
tests/hermes_cli/test_tools_config.py for picker wiring (registry,
config writes, GPT quality follow-up prompt, corrupt-config repair).
* feat(image_gen): translate managed-gateway 4xx to actionable error
When the Nous Subscription managed FAL proxy rejects a model with 4xx
(likely portal-side allowlist miss or billing gate), surface a clear
message explaining:
1. The rejected model ID + HTTP status
2. Two remediation paths: set FAL_KEY for direct access, or
pick a different model via `hermes tools`
5xx, connection errors, and direct-FAL errors pass through unchanged
(those have different root causes and reasonable native messages).
Motivation: new FAL models added to this release (flux-2-klein-9b,
z-image-turbo, nano-banana, gpt-image-1.5, ideogram-v3, recraft-v3,
qwen-image) are untested against the Nous Portal proxy. If the portal
allowlists model IDs, users on Nous Subscription will hit cryptic
4xx errors without guidance on how to work around it.
Tests: 8 new cases covering status extraction across httpx/fal error
shapes and 4xx-vs-5xx-vs-ConnectionError translation policy.
Docs: brief note in image-generation.md for Nous subscribers.
Operator action (Nous Portal side): verify that fal-queue-gateway
passes through these 7 new FAL model IDs. If the proxy has an
allowlist, add them; otherwise Nous Subscription users will see the
new translated error and fall back to direct FAL.
* feat(image_gen): pin GPT-Image quality to medium (no user choice)
Previously the tools picker asked a follow-up question for GPT-Image
quality tier (low / medium / high) and persisted the answer to
`image_gen.quality_setting`. This created two problems:
1. Nous Portal billing complexity — the 22x cost spread between tiers
($0.009 low / $0.20 high) forces the gateway to meter per-tier per
user, which the portal team can't easily support at launch.
2. User footgun — anyone picking `high` by mistake burns through
credit ~6x faster than `medium`.
This commit pins quality at medium by baking it into FAL_MODELS
defaults for gpt-image-1.5 and removes all user-facing override paths:
- Removed `_resolve_gpt_quality()` runtime lookup
- Removed `honors_quality_setting` flag on the model entry
- Removed `_configure_gpt_quality_setting()` picker helper
- Removed `_GPT_QUALITY_CHOICES` constant
- Removed the follow-up prompt call in `_configure_imagegen_model()`
- Even if a user manually edits `image_gen.quality_setting` in
config.yaml, no code path reads it — always sends medium.
Tests:
- Replaced TestGptQualitySetting (6 tests) with TestGptQualityPinnedToMedium
(5 tests) — proves medium is baked in, config is ignored, flag is
removed, helper is removed, non-gpt models never get quality.
- Replaced test_picker_with_gpt_image_also_prompts_quality with
test_picker_with_gpt_image_does_not_prompt_quality — proves only 1
picker call fires when gpt-image is selected (no quality follow-up).
Docs updated: image-generation.md replaces the quality-tier table
with a short note explaining the pinning decision.
* docs(image_gen): drop stale 'wires GPT quality tier' line from internals section
Caught in a cleanup sweep after pinning quality to medium. The
"How It Works Internally" walkthrough still described the removed
quality-wiring step.
Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR #8018:
tools/environments/daytona.py
- PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls
against the same sandbox don't collide on the remote temp path
- Move sync_back() inside the cleanup lock and after the _sandbox-None
guard, with its own try/except. Previously a no-op cleanup (sandbox
already cleared) still fired sync_back → 3-attempt retry storm against
a nil sandbox (~6s of sleep). Now short-circuits cleanly.
tools/environments/file_sync.py
- Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a
tar larger than the limit. Protects against runaway sandboxes
producing arbitrary-size archives.
- Add 'nothing previously pushed' guard at the top of sync_back(). If
_pushed_hashes and _synced_files are both empty, the FileSyncManager
was never initialized from the host side — there is nothing coherent
to sync back. Skips the retry/backoff machinery on uninitialized
managers and eliminates test-suite slowdown from pre-existing cleanup
tests that don't mock the sync layer.
tests/tools/test_file_sync_back.py
- Update _make_manager helper to seed a _pushed_hashes entry by default
so sync_back() exercises its real path. A seed_pushed_state=False
opt-out is available for noop-path tests.
- Add TestSyncBackSizeCap with positive and negative coverage of the
new cap.
tests/tools/test_sync_back_backends.py
- Update Daytona bulk download test to assert the PID-suffixed path
pattern instead of the fixed /tmp/.hermes_sync.tar.
Salvage of PR #8018 by @alt-glitch onto current main.
On sandbox teardown, FileSyncManager now downloads the remote .hermes/
directory, diffs against SHA-256 hashes of what was originally pushed,
and applies only changed files back to the host.
Core (tools/environments/file_sync.py):
- sync_back(): orchestrates download -> unpack -> diff -> apply with:
- Retry with exponential backoff (3 attempts, 2s/4s/8s)
- SIGINT trap + defer (prevents partial writes on Ctrl-C)
- fcntl.flock serialization (concurrent gateway sandboxes)
- Last-write-wins conflict resolution with warning
- New remote files pulled back via _infer_host_path prefix matching
Backends:
- SSH: _ssh_bulk_download — tar cf - piped over SSH
- Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read
- Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file
- All three call sync_back() at the top of cleanup()
Fixes applied during salvage (vs original PR #8018):
| # | Issue | Fix |
|---|-------|-----|
| C1 | import fcntl unconditional — crashes Windows | try/except with fallback; _sync_back_locked skips locking when fcntl=None |
| W1 | assert for runtime guard (stripped by -O) | Replaced with proper if/raise RuntimeError |
| W2 | O(n*m) from _get_files_fn() called per file | Cache mapping once at start of _sync_back_impl, pass to resolve/infer |
| W3 | Dead BulkDownloadFn imports in 3 backends | Removed unused imports |
| W4 | Modal hardcodes root/.hermes, no explanation | Added docstring comment explaining Modal always runs as root |
| S1 | SHA-256 computed for new files where pushed_hash=None | Skip hashing when pushed_hash is None (comparison always False) |
| S2 | Daytona /tmp/.hermes_sync.tar never cleaned up | Added rm -f after download (best-effort) |
Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT
main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup).
Based on #8018 by @alt-glitch.
Users with 'commit.gpgsign = true' in their global git config got a
pinentry popup (or a failed commit) every time the agent took a
background filesystem snapshot — every write_file, patch, or diff
mid-session. With GPG_TTY unset, pinentry-qt/gtk would spawn a GUI
window, constantly interrupting the session.
The shadow repo is internal Hermes infrastructure. It must not
inherit user-level git settings (signing, hooks, aliases, credential
helpers, etc.) under any circumstance.
Fix is layered:
1. _git_env() sets GIT_CONFIG_GLOBAL=os.devnull,
GIT_CONFIG_SYSTEM=os.devnull, and GIT_CONFIG_NOSYSTEM=1. Shadow
git commands no longer see ~/.gitconfig or /etc/gitconfig at all
(uses os.devnull for Windows compat).
2. _init_shadow_repo() explicitly writes commit.gpgsign=false and
tag.gpgSign=false into the shadow's own config, so the repo is
correct even if inspected or run against directly without the
env vars, and for older git versions (<2.32) that predate
GIT_CONFIG_GLOBAL.
3. _take() passes --no-gpg-sign inline on the commit call. This
covers existing shadow repos created before this fix — they will
never re-run _init_shadow_repo (it is gated on HEAD not existing),
so they would miss layer 2. Layer 1 still protects them, but the
inline flag guarantees correctness at the commit call itself.
Existing checkpoints, rollback, list, diff, and restore all continue
to work — history is untouched. Users who had the bug stop getting
pinentry popups; users who didn't see no observable change.
Tests: 5 new regression tests in TestGpgAndGlobalConfigIsolation,
including a full E2E repro with fake HOME, global gpgsign=true, and
a deliberately broken GPG binary — checkpoint succeeds regardless.
The blocking gateway approval wait at tools/approval.py called
`entry.event.wait(timeout=...)` which never touched the agent's
activity tracker. When a user was slow to respond to a /approve prompt
(or the gateway_timeout config was set higher than the default 300s),
the agent thread sat silent long enough for the gateway's inactivity
watchdog (agent.gateway_timeout, default 1800s) to kill it — even
though the agent was doing exactly the right thing and the user was
the one causing the delay.
The fix polls the event in 1s slices and calls touch_activity_if_due
between slices, mirroring the _wait_for_process() pattern in
tools/environments/base.py that covers the subprocess-waiting side of
the same problem. At the default 10s heartbeat cadence, a 300s
approval wait now pings activity ~30 times, well under the 1800s
idle threshold.
Observed in community user logs: 12 repeated 'Agent idle 1800s,
last_activity=executing tool: terminal' events across April 12-14.
Companion to PR #10501 which covered streaming / concurrent-tool /
Modal-backend gaps but did not touch approval.py.
Test: tests/tools/test_approval_heartbeat.py — verifies (1) heartbeats
fire during the wait, (2) user responses are still near-instant, and
(3) the approval path stays functional when the heartbeat helper
can't be imported.
Adds Google Gemini TTS as the seventh voice provider, with 30 prebuilt
voices (Zephyr, Puck, Kore, Enceladus, Gacrux, etc.) and natural-language
prompt control. Integrates through the existing provider chain:
- tools/tts_tool.py: new _generate_gemini_tts() calls the
generativelanguage REST endpoint with responseModalities=[AUDIO],
wraps the returned 24kHz mono 16-bit PCM (L16) in a WAV RIFF header,
then ffmpeg-converts to MP3 or Opus depending on output extension.
For .ogg output, libopus is forced explicitly so Telegram voice
bubbles get Opus (ffmpeg defaults to Vorbis for .ogg).
- hermes_cli/tools_config.py: exposes 'Google Gemini TTS' as a provider
option in the curses-based 'hermes tools' UI.
- hermes_cli/setup.py: adds gemini to the setup wizard picker, tool
status display, and API key prompt branch (accepts existing
GEMINI_API_KEY or GOOGLE_API_KEY, falls back to Edge if neither set).
- tests/tools/test_tts_gemini.py: 15 unit tests covering WAV header
wrap correctness, env var fallback (GEMINI/GOOGLE), voice/model
overrides, snake_case vs camelCase inlineData handling, HTTP error
surfacing, and empty-audio edge cases.
- docs: TTS features page updated to list seven providers with the new
gemini config block and ffmpeg notes.
Live-tested against api key against gemini-2.5-flash-preview-tts: .wav,
.mp3, and Telegram-compatible .ogg (Opus codec) all produce valid
playable audio.
Replace the HERMES_ENABLE_NOUS_MANAGED_TOOLS env-var feature flag with
subscription-based detection. The Tool Gateway is now available to any
paid Nous subscriber without needing a hidden env var.
Core changes:
- managed_nous_tools_enabled() checks get_nous_auth_status() +
check_nous_free_tier() instead of an env var
- New use_gateway config flag per tool section (web, tts, browser,
image_gen) records explicit user opt-in and overrides direct API
keys at runtime
- New prefers_gateway(section) shared helper in tool_backend_helpers.py
used by all 4 tool runtimes (web, tts, image gen, browser)
UX flow:
- hermes model: after Nous login/model selection, shows a curses
prompt listing all gateway-eligible tools with current status.
User chooses to enable all, enable only unconfigured tools, or skip.
Defaults to Enable for new users, Skip when direct keys exist.
- hermes tools: provider selection now manages use_gateway flag —
selecting Nous Subscription sets it, selecting any other provider
clears it
- hermes status: renamed section to Nous Tool Gateway, added
free-tier upgrade nudge for logged-in free users
- curses_radiolist: new description parameter for multi-line context
that survives the screen clear
Runtime behavior:
- Each tool runtime (web_tools, tts_tool, image_generation_tool,
browser_use) checks prefers_gateway() before falling back to
direct env-var credentials
- get_nous_subscription_features() respects use_gateway flags,
suppressing direct credential detection when the user opted in
Removed:
- HERMES_ENABLE_NOUS_MANAGED_TOOLS env var and all references
- apply_nous_provider_defaults() silent TTS auto-set
- get_nous_subscription_explainer_lines() static text
- Override env var warnings (use_gateway handles this properly now)
The completion-line printing block (idx = entry['task_index'] etc.)
was outside the 'for future in done:' loop but referenced 'entry'
which is only assigned inside that loop. When concurrent.futures.wait()
returns with an empty 'done' set (timeout expired, no futures finished),
the loop body never executes and 'entry' is unbound.
Moved the completion-line printing and spinner-update code inside
the for loop so each completed future gets its own status line,
and empty poll cycles simply loop back without accessing 'entry'.
- Extract duplicated activity-callback polling into shared
touch_activity_if_due() helper in tools/environments/base.py
- Use helper from both base.py _wait_for_process and
code_execution_tool.py local polling loop (DRY)
- Add test assertion that timeout output field contains the
timeout message and emoji (#10807)
- Add stream_consumer test for tool-boundary fallback scenario
where continuation is empty but final_text differs from
visible prefix (#10807)
When execute_code times out, the result JSON had status="timeout" and an
error field, but the output field was empty. Many models treat empty
output as "nothing happened" and produce an empty/minimal response. The
gateway stream consumer then considers the response "already sent" (from
pre-tool streaming) and silently drops it — leaving the user staring at
silence.
Three changes:
1. Include the timeout message in the output field (both local and remote
paths) so the model always has visible content to relay to the user.
2. Add periodic activity callbacks to the local execution polling loop so
the gateway's inactivity monitor knows execute_code is alive during
long runs.
3. Fix stream_consumer._send_fallback_final to not silently drop content
when the continuation appears empty but the final text differs from
what was previously streamed (e.g. after a tool boundary reset).
Wraps provider.create_session() in _get_session_info() with try/except
to catch cloud provider runtime failures (timeouts, auth errors, rate
limits, invalid responses). Falls back to _create_local_session() so
browser automation continues working when cloud APIs are down.
Marks fallback sessions with fallback_from_cloud, fallback_reason, and
fallback_provider metadata for observability. If both cloud and local
fail, raises RuntimeError with chained context from both errors.
Closes#10883
Co-authored-by: konsisumer <konsisumer@users.noreply.github.com>
* fix: stop /model from silently rerouting direct providers to OpenRouter (#10300)
detect_provider_for_model() silently remapped models to OpenRouter when
the direct provider's credentials weren't found via env vars. Three bugs:
1. Credential check only looked at env vars from PROVIDER_REGISTRY,
missing credential pool entries, auth store, and OAuth tokens
2. When env var check failed, silently returned ('openrouter', slug)
instead of the direct provider the model actually belongs to
3. Users with valid credentials via non-env-var mechanisms (pool,
OAuth, Claude Code tokens) got silently rerouted
Fix:
- Expand credential check to also query credential pool and auth store
- Always return the direct provider match regardless of credential
status -- let client init handle missing creds with a clear error
rather than silently routing through the wrong provider
Same philosophy as the provider-required fix: don't guess, don't
silently reroute, error clearly when something is missing.
Closes#10300
* fix: word-wrap spinner, interruptable agent join, and delegate_task interrupt
Three fixes:
1. Spinner widget clips long tool commands — prompt_toolkit Window had
height=1 and wrap_lines=False. Now uses wrap_lines=True with dynamic
height from text length / terminal width. Long commands wrap naturally.
2. agent_thread.join() blocked forever after interrupt — if the agent
thread took time to clean up, the process_loop thread froze. Now polls
with 0.2s timeout on the interrupt path, checking _should_exit so
double Ctrl+C breaks out immediately.
3. Root cause of 5-hour CLI hang: delegate_task() used as_completed()
with no interrupt check. When subagent children got stuck, the parent
blocked forever inside the ThreadPoolExecutor. Now polls with
wait(timeout=0.5) and checks parent_agent._interrupt_requested each
iteration. Stuck children are reported as interrupted, and the parent
returns immediately.
Fixes 12 CI test failures:
1. test_cli_new_session (4): _FakeAgent missing commit_memory_session
attribute added in the memory provider refactoring. Added MagicMock.
2. test_run_progress_topics (1): already_sent detection only checked
stream consumer flags, missing the response_previewed path from
interim_assistant_callback. Restructured guard to check both paths.
3. test_timezone (1): HERMES_TIMEZONE leaked into child processes via
_SAFE_ENV_PREFIXES matching HERMES_*. The code correctly converts
it to TZ but didn't remove the original. Added child_env.pop().
4. test_session_env (1): contextvars baseline captured from a different
context couldn't be restored after clear. Changed assertion to verify
the test's value was removed rather than comparing to a fragile baseline.
5. test_discord_slash_commands (5): already fixed on current main.
When an MCP server returns errors consistently (crashed, disconnected,
auth expired), the model sees each error and retries the tool call.
With no circuit breaker, this burned through all 90 iterations — each
one a full LLM API call plus failed MCP call — producing 15-45 minutes
of zero useful output while the gateway inactivity timeout never fired
(because the agent WAS active, just uselessly).
Fix: track consecutive error counts per MCP server. After 3 consecutive
failures (connection errors, MCP-level errors, or transport exceptions),
the handler short-circuits with a message telling the model to stop
retrying and use alternative approaches. The counter resets to 0 on
any successful call.
Closes#10447
Wrap the TelegramAdapter import in _send_to_platform() with a try/except
ImportError guard, matching the existing Feishu pattern in the same function.
When python-telegram-bot is not installed, the import no longer crashes the
cron scheduler. Instead, MAX_MESSAGE_LENGTH falls back to a hardcoded 4096.
The _send_telegram() function already had its own ImportError guard for the
telegram package; this fixes the remaining bare import of TelegramAdapter
in the platform-routing function.
Text-only Matrix sends should continue using the lightweight _send_matrix()
HTTP helper (~100ms). Only route through the heavy MatrixAdapter (full sync +
E2EE setup) when media files are present. Adds test verifying text-only
messages don't take the adapter path.
Matrix media delivery was silently dropped by send_message because Matrix
wasn't wired into the native adapter-backed media path. Only Telegram,
Discord, and Weixin had native media support.
Adds _send_matrix_via_adapter() which creates a MatrixAdapter instance,
connects, sends text + media via the adapter's native upload methods
(send_document, send_image_file, send_video, send_voice), then disconnects.
Also fixes a stale URL-encoding assertion in test_send_message_missing_platforms
that broke after PR #10151 added quote() to room IDs.
Cherry-picked from PR #10486 by helix4u.
bash -lic with a PTY enables job control (set -m), which waits for all
background jobs before the shell exits. A command like
`python3 -m http.server &>/dev/null &` hangs forever because the shell
never completes.
Prefix `set +m;` to disable job control while keeping -i for .bashrc
sourcing and PTY for interactive tools.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
_load_skill_payload() reconstructed skill_dir as SKILLS_DIR / relative_path,
which is wrong for external skills from skills.external_dirs — they live
outside SKILLS_DIR entirely. Scripts and linked files failed to load.
Fix: skill_view() now includes the absolute skill_dir in its result dict.
_load_skill_payload() uses that directly when available, falling back to
the SKILLS_DIR-relative reconstruction only for legacy responses.
Closes#10313
Add "HERMES_" to _SAFE_ENV_PREFIXES in code_execution_tool.py so HERMES_HOME and other Hermes env vars pass through to execute_code subprocesses. Fixes vision_analyze and other tools that rely on get_hermes_home() failing in Docker environments with non-default HERMES_HOME.
Authored by @shin4.
* fix: show correct env var name in provider API key error (#9506)
The error message for missing provider API keys dynamically built
the env var name as PROVIDER_API_KEY (e.g. ALIBABA_API_KEY), but
some providers use different names (alibaba uses DASHSCOPE_API_KEY).
Users following the error message set the wrong variable.
Fix: look up the actual env var from PROVIDER_REGISTRY before
building the error. Falls back to the dynamic name if the registry
lookup fails.
Closes#9506
* fix: five HERMES_HOME profile-isolation leaks (#5947)
Bug A: Thread session_title from session_db to memory provider init kwargs
so honcho can derive chat-scoped session keys instead of falling back to
cwd-based naming that merges all gateway users into one session.
Bug B: Replace 14 hardcoded ~/.hermes/skills/ paths across 10 skill files
with HERMES_HOME-aware alternatives (${HERMES_HOME:-$HOME/.hermes} in
shell, os.environ.get('HERMES_HOME', ...) in Python).
Bug C: install.sh now respects HERMES_HOME env var and adds --hermes-home
flag. Previously --dir only set INSTALL_DIR while HERMES_HOME was always
hardcoded to $HOME/.hermes.
Bug D: Remove hardcoded ~/.hermes/honcho.json fallback in resolve_config_path().
Non-default profiles no longer silently inherit the default profile's honcho
config. Falls through to ~/.honcho/config.json (global) instead.
Bug E: Guard _edit_skill, _patch_skill, _delete_skill, _write_file, and
_remove_file against writing to skills found in external_dirs. Skills
outside the local SKILLS_DIR are now read-only from the agent's perspective.
Closes#5947
_install_tirith() uses shutil.move() to place the binary from tmpdir
to ~/.hermes/bin/. When these are on different filesystems (common in
Docker, NFS), shutil.move() falls back to copy2 + unlink, but copy2's
metadata step can raise PermissionError. This exception propagated
past the fail_open guard, crashing the terminal tool entirely.
Additionally, a failed install could leave a non-executable tirith
binary at the destination, causing a retry loop on every subsequent
terminal command.
Fix:
- Catch OSError from shutil.move() and fall back to shutil.copy()
(skips metadata/xattr copying that causes PermissionError)
- If even copy fails, clean up the partial dest file to prevent
the non-executable retry loop
- Return (None, 'cross_device_copy_failed') so the failure routes
through the existing install-failure caching and fail_open logic
Closes#10127
When a user runs /browser connect to attach browser tools to their real
Chrome instance via CDP, the BROWSER_CDP_URL env var is set. However,
every browser tool function checks _is_camofox_mode() first, which
short-circuits to the Camofox backend before _get_session_info() ever
checks for the CDP override.
Fix: is_camofox_mode() now returns False when BROWSER_CDP_URL is set,
so the explicit CDP connection takes priority. This is the correct
behavior — /browser connect is an intentional user override.
Reported by SkyLinx on Discord.
Models (especially open-source like qwen3.5-plus) may send non-int values
for the limit parameter — None (JSON null), string, or even a type object.
This caused TypeError: '<=' not supported between instances of 'int' and
'type' when the value reached min()/comparison operations.
Changes:
- Add defensive int coercion at session_search() entry with fallback to 3
- Clamp limit to [1, 5] range (was only capped at 5, not floored)
- Add tests for None, type object, string, negative, and zero limit values
Reported by community user ludoSifu via Discord.
Python's json.dumps() defaults to ensure_ascii=True, escaping non-ASCII
characters to \uXXXX sequences. For CJK characters this inflates
token count 3-4x — a single Chinese character like '中' becomes
'\u4e2d' (6 chars vs 3 bytes, ~6 tokens vs ~1 token).
Since MCP tool results feed directly into the model's conversation
context, this silently multiplied API costs for Chinese, Japanese,
and Korean users.
Fix: add ensure_ascii=False to all 20 json.dumps calls in mcp_tool.py.
Raw UTF-8 is valid JSON per RFC 8259 and all downstream consumers
(LLM APIs, display) handle it correctly.
Closes#10234
Multiple gaps in activity tracking could cause the gateway's inactivity
timeout to fire while the agent is actively working:
1. Streaming wait loop had no periodic heartbeat — the outer thread only
touched activity when the stale-stream detector fired (180-300s), and
for local providers (Ollama) the stale timeout was infinity, meaning
zero heartbeats. Now touches activity every 30s.
2. Concurrent tool execution never set the activity callback on worker
threads (threading.local invisible across threads) and never set
_current_tool. Workers now set the callback, and the concurrent wait
uses a polling loop with 30s heartbeats.
3. Modal backend's execute() override had its own polling loop without
any activity callback. Now matches _wait_for_process cadence (10s).
- Populate watcher_* routing fields for watch-only processes (not just
notify_on_complete), so watch-pattern events carry direct metadata
instead of relying solely on session_key parsing fallback
- Extract _parse_session_key() helper to dedupe session key parsing
at two call sites in gateway/run.py
- Add negative test proving cross-thread leakage doesn't happen
- Add edge-case tests for _build_process_event_source returning None
(empty evt, invalid platform, short session_key)
- Add unit tests for _parse_session_key helper
Tool schema descriptions and tool return values contained hardcoded
~/.hermes paths that the model sees and uses. When HERMES_HOME is set
to a custom path (Docker containers, profiles), the agent would still
reference ~/.hermes — looking at the wrong directory.
Fixes 6 locations across 5 files:
- tools/tts_tool.py: output_path schema description
- tools/cronjob_tools.py: script path schema description
- tools/skill_manager_tool.py: skill_manage schema description
- tools/skills_tool.py: two tool return messages
- agent/skill_commands.py: skill config injection text
All now use display_hermes_home() which resolves to the actual
HERMES_HOME path (e.g. /opt/data for Docker, ~/.hermes/profiles/X
for profiles, ~/.hermes for default).
Reported by: Sandeep Narahari (PrithviDevs)
- Fix file handle closed before POST: nest session.post() inside
the 'with open()' block so aiohttp can read the file during upload
- Update warning text to include weixin (also supports media delivery)
- Add 8 unit tests covering: text+media, media-only, missing files,
upload failures, multiple files, and _send_to_platform routing
Previously send_message only supported media delivery for Telegram.
Discord users received a warning that media was omitted.
- Add media_files parameter to _send_discord()
- Upload media via Discord multipart/form-data API (files[0] field)
- Handle Discord in _send_to_platform() same way as Telegram block
- Remove Discord from generic chunk loop (now handled above)
- Update error/warning strings to mention telegram and discord
* fix(gateway): suppress duplicate replies on interrupt and streaming flood control
Three fixes for the duplicate reply bug affecting all gateway platforms:
1. base.py: Suppress stale response when the session was interrupted by a
new message that hasn't been consumed yet. Checks both interrupt_event
and _pending_messages to avoid false positives. (#8221, #2483)
2. run.py (return path): Remove response_previewed guard from already_sent
check. Stream consumer's already_sent alone is authoritative — if
content was delivered via streaming, the duplicate send must be
suppressed regardless of the agent's response_previewed flag. (#8375)
3. run.py (queued-message path): Same fix — already_sent without
response_previewed now correctly marks the first response as already
streamed, preventing re-send before processing the queued message.
The response_previewed field is still produced by the agent (run_agent.py)
but is no longer required as a gate for duplicate suppression. The stream
consumer's already_sent flag is the delivery-level truth about what the
user actually saw.
Concepts from PR #8380 (konsisumer). Closes#8375, #8221, #2483.
* fix(cron): include job_id in delivery and guide models on removal workflow
Users reported cron reminders keep firing after asking the agent to stop.
Root cause: the conversational agent didn't know the job_id (not in delivery)
and models don't reliably do the list→remove two-step without guidance.
1. Include job_id in the cron delivery wrapper so users and agents can
reference it when requesting removal.
2. Replace confusing footer ('The agent cannot see this message') with
actionable guidance ('To stop or manage this job, send me a new
message').
3. Add explicit list→remove guidance in the cronjob tool schema so models
know to list first and never guess job IDs.
Matrix room IDs contain ! and : which must be percent-encoded in URI
path segments per the Matrix C-S spec. Without encoding, some
homeservers reject the PUT request.
Also adds 'matrix:!roomid:server.org' and 'matrix:@user:server.org'
to the tool schema examples so models know the correct target format.
`_parse_target_ref` has explicit-reference branches for Telegram, Feishu,
and numeric IDs, but none for Matrix. As a result, callers of
`send_message(target="matrix:!roomid:server")` or
`send_message(target="matrix:@user:server")` fall through to
`(None, None, False)` and the tool errors out with a resolution failure —
even though a raw Matrix room ID or MXID is the most unambiguous possible
target.
Three-line fix: recognize `!…` as a room ID and `@…` as a user MXID when
platform is `matrix`, and return them as explicit targets. Alias-based
targets (`#…`) continue to go through the normal resolve path.
- find_docker() now checks HERMES_DOCKER_BINARY env var first, then
docker on PATH, then podman on PATH, then macOS known locations
- Entrypoint respects HERMES_HOME env var (was hardcoded to /opt/data)
- Entrypoint uses groupmod -o to tolerate non-unique GIDs (fixes macOS
GID 20 conflict with Debian's dialout group)
- Entrypoint makes chown best-effort so rootless Podman continues
instead of failing with 'Operation not permitted'
- 5 new tests covering env var override, podman fallback, precedence
Based on work by alanjds (PR #3996) and malaiwah (PR #8115).
Closes#4084.
The original tree-wide ast.walk() would match registry.register() calls
inside functions too. Restrict to top-level ast.Expr statements so helper
modules that call registry.register() inside a function are never picked
up as tool modules.
Refactor browser tool PATH construction to include Termux directories
(/data/data/com.termux/files/usr/bin, /data/data/com.termux/files/usr/sbin)
so agent-browser and npx are discoverable on Android/Termux.
Extracts _browser_candidate_path_dirs() and _merge_browser_path() helpers
to centralize PATH construction shared between _find_agent_browser() and
_run_browser_command(), replacing duplicated inline logic.
Also fixes os.pathsep usage (was hardcoded ':') for cross-platform correctness.
Cherry-picked from PR #9846.
Add dangerous command patterns that require approval when the agent
tries to run gateway lifecycle commands via the terminal tool:
- hermes gateway stop/restart — kills all running agents mid-work
- hermes update — pulls code and restarts the gateway
- systemctl restart/stop (with optional flags like --user)
These patterns fire the approval prompt so the user must explicitly
approve before the agent can kill its own gateway process. In YOLO
mode, the commands run without approval (by design — YOLO means the
user accepts all risks).
Also fixes the existing systemctl pattern to handle flags between
the command and action (e.g. 'systemctl --user restart' was previously
undetected because the regex expected the action immediately after
'systemctl').
Root cause: issue #6666 reported agents running 'hermes gateway
restart' via terminal, killing the gateway process mid-agent-loop.
The user sees the agent suddenly stop responding with no explanation.
Combined with the SIGTERM auto-recovery from PR #9875, the gateway
now both prevents accidental self-destruction AND recovers if it
happens anyway.
Test plan:
- Updated test_systemctl_restart_not_flagged → test_systemctl_restart_flagged
- All 119 approval tests pass
- E2E verified: hermes gateway restart, hermes update, systemctl
--user restart all detected; hermes gateway status, systemctl
status remain safe
Add ctx.register_skill() API so plugins can ship SKILL.md files under
a 'plugin:skill' namespace, preventing name collisions with built-in
Hermes skills. skill_view() detects the ':' separator and routes to
the plugin registry while bare names continue through the existing
flat-tree scan unchanged.
Key additions:
- agent/skill_utils: parse_qualified_name(), is_valid_namespace()
- hermes_cli/plugins: PluginContext.register_skill(), PluginManager
skill registry (find/list/remove)
- tools/skills_tool: qualified name dispatch in skill_view(),
_serve_plugin_skill() with full guards (disabled, platform,
injection scan), bundle context banner with sibling listing,
stale registry self-heal
- Hoisted _INJECTION_PATTERNS to module level (dedup)
- Updated skill_view schema description
Based on PR #9334 by N0nb0at. Lean P1 salvage — omits autogen shim
(P2) for a simpler first merge.
Closes#8422
- Fix _camofox_eval() endpoint: /tabs/{id}/eval → /tabs/{id}/evaluate
(correct Camofox REST API path)
- Add required userId field to JS eval request body (all other Camofox
endpoints already include it)
- Update npm package from @askjo/camoufox-browser ^1.0.0 to
@askjo/camofox-browser ^1.5.2 (upstream package was renamed)
- Update tools_config.py post-setup to reference new package directory
and npx command
- Bump Node engine requirement from >=18 to >=20 (required by
camoufox-js dependency in camofox-browser v1.5.2)
- Regenerate package-lock.json
Fixes issues reported in PRs #9472, #8267, #7208 (stale).
Match cron/scheduler.py pattern — only attempt msvcrt import when
fcntl is unavailable. Pre-declare msvcrt = None at module level so
_file_lock() references don't NameError on Linux.
Production fixes:
- Add clear_session_context() to hermes_logging.py (fixes 48 teardown errors)
- Add clear_session() to tools/approval.py (fixes 9 setup errors)
- Add SyncError M_UNKNOWN_TOKEN check to Matrix _sync_loop (bug fix)
- Fall back to inline api_key in named custom providers when key_env
is absent (runtime_provider.py)
Test fixes:
- test_memory_user_id: use builtin+external provider pair, fix honcho
peer_name override test to match production behavior
- test_display_config: remove TestHelpers for non-existent functions
- test_auxiliary_client: fix OAuth tokens to match _is_oauth_token
patterns, replace get_vision_auxiliary_client with resolve_vision_provider_client
- test_cli_interrupt_subagent: add missing _execution_thread_id attr
- test_compress_focus: add model/provider/api_key/base_url/api_mode
to mock compressor
- test_auth_provider_gate: add autouse fixture to clean Anthropic env
vars that leak from CI secrets
- test_opencode_go_in_model_list: accept both 'built-in' and 'hermes'
source (models.dev API unavailable in CI)
- test_email: verify email Platform enum membership instead of source
inspection (build_channel_directory now uses dynamic enum loop)
- test_feishu: add bot_added/bot_deleted handler mocks to _Builder
- test_ws_auth_retry: add AsyncMock for sync_store.get_next_batch,
add _pending_megolm and _joined_rooms to Matrix adapter mocks
- test_restart_drain: monkeypatch-delete INVOCATION_ID (systemd sets
this in CI, changing the restart call signature)
- test_session_hygiene: add user_id to SessionSource
- test_session_env: use relative baseline for contextvar clear check
(pytest-xdist workers share context)
Improvements from our earlier #8269 salvage work applied to #7616:
- Platform token lock: acquire_scoped_lock/release_scoped_lock prevents
two profiles from double-connecting the same QQ bot simultaneously
- Send retry with exponential backoff (3 attempts, 1s/2s/4s) with
permanent vs transient error classification (matches Telegram pattern)
- Proper long-message splitting via truncate_message() instead of
hard-truncating at MAX_MESSAGE_LENGTH (preserves code blocks, adds 1/N)
- REST-based one-shot send in send_message_tool — uses QQ Bot REST API
directly with httpx instead of creating a full WebSocket adapter per
message (fixes the connect→send race condition)
- Use shared strip_markdown() from helpers.py instead of 15 lines of
inline regex with import-inside-method (DRY, same as BlueBubbles/SMS)
- format_message() now wired into send() pipeline
- Rename platform from 'qq' to 'qqbot' across all integration points
(Platform enum, toolset, config keys, import paths, file rename qq.py → qqbot.py)
- Add PLATFORM_HINTS for QQBot in prompt_builder (QQ supports markdown)
- Set SUPPORTS_MESSAGE_EDITING = False to skip streaming on QQ
(prevents duplicate messages from non-editable partial + final sends)
- Add _send_qqbot() standalone send function for cron/send_message tool
- Add interactive _setup_qq() wizard in hermes_cli/setup.py
- Restore missing _setup_signal/email/sms/dingtalk/feishu/wecom/wecom_callback
functions that were lost during the original merge
Three improvements to file search based on user feedback:
1. Fuzzy @ completions (commands.py):
- Bare @query now does project-wide fuzzy file search instead of
prefix-only directory listing
- Uses rg --files with 5-second cache for responsive completions
- Scoring: exact name (100) > prefix (80) > substring (60) >
path contains (40) > subsequence with boundary bonus (35/25)
- Bare @ with no query shows recently modified files first
2. Mtime-sorted file search (file_operations.py):
- _search_files_rg now uses --sortr=modified (rg 13+) to surface
recently edited files first
- Falls back to unsorted on older rg versions
3. Improved file-not-found suggestions (file_operations.py):
- Replaced crude character-set overlap with ranked scoring:
same basename (90) > prefix (70) > substring (60) >
reverse substring (40) > same extension (30)
- search_files path-not-found now suggests similar directories
from the parent
Cherry-picked from PR #9177 by @haileymarshall.
Adds a fitness and nutrition skill for gym-goers and health-conscious users:
- Exercise search via wger API (690+ exercises, free, no auth)
- Nutrition lookup via USDA FoodData Central (380K+ foods, DEMO_KEY fallback)
- Offline body composition calculators (BMI, TDEE, 1RM, macros, body fat %)
- Pure stdlib Python, no pip dependencies
Changes from original PR:
- Moved from skills/ to optional-skills/health/ (correct location)
- Fixed BMR formula in FORMULAS.md (removed confusing -5+10, now just +5)
- Fixed author attribution to match PR submitter
- Marked USDA_API_KEY as optional (DEMO_KEY works without signup)
Also adds optional env var support to the skill readiness checker:
- New 'optional: true' field in required_environment_variables entries
- Optional vars are preserved in metadata but don't block skill readiness
- Optional vars skip the CLI capture prompt flow
- Skills with only optional missing vars show as 'available' not 'setup_needed'
Port two improvements inspired by Kilo-Org/kilocode analysis:
1. Error classifier: add context overflow patterns for vLLM, Ollama,
and llama.cpp/llama-server. These local inference servers return
different error formats than cloud providers (e.g., 'exceeds the
max_model_len', 'context length exceeded', 'slot context'). Without
these patterns, context overflow errors from local servers are
misclassified as format errors, causing infinite retries instead
of triggering compression.
2. MCP initial connection retry: previously, if the very first
connection attempt to an MCP server failed (e.g., transient DNS
blip at startup), the server was permanently marked as failed with
no retry. Post-connect reconnection had 5 retries with exponential
backoff, but initial connection had zero. Now initial connections
retry up to 3 times with backoff before giving up, matching the
resilience of post-connect reconnection.
(Inspired by Kilo Code's MCP server disappearing fix in v1.3.3)
Tests: 6 new error classifier tests, 4 new MCP retry tests, 1
updated existing test. All 276 affected tests pass.
On macOS, /etc is a symlink to /private/etc, so os.path.realpath()
resolves /etc/hosts to /private/etc/hosts. The sensitive path check
only matched /etc/ prefixes against the resolved path, allowing
writes to system files on macOS.
- Add /private/etc/ and /private/var/ to _SENSITIVE_PATH_PREFIXES
- Check both realpath-resolved and normpath-normalized paths
- Add regression tests for macOS symlink bypass
Closes#8734
Co-authored-by: ElhamDevelopmentStudio (PR #8829)
Three-tier match strategy for _truncate_around_matches():
1. Full-phrase search (exact query string positions)
2. Proximity co-occurrence (all terms within 200 chars)
3. Individual terms (fallback, preserves existing behavior)
Sliding window picks the start offset covering the most matches.
Moved inline import re to module level.
Co-authored-by: Al Sayed Hoota <78100282+AlsayedHoota@users.noreply.github.com>
The terminal and execute_code tool schemas unconditionally mentioned
'cloud sandboxes' in their descriptions sent to the model. This caused
agents running on local backends to believe they were in a sandboxed
environment, refusing networking tasks and other operations. Worse,
agents sometimes saved this false belief to persistent memory, making
it persist across sessions.
Reported by multiple users (XLion, 林泽).
Port from nearai/ironclaw#2304: Telegram's 4096 character limit is
measured in UTF-16 code units, not Unicode codepoints. Characters
outside the Basic Multilingual Plane (emoji like 😀, CJK Extension B,
musical symbols) are surrogate pairs: 1 Python char but 2 UTF-16 units.
Previously, truncate_message() used Python's len() which counts
codepoints. This could produce chunks exceeding Telegram's actual limit
when messages contain many astral-plane characters.
Changes:
- Add utf16_len() helper and _prefix_within_utf16_limit() for
UTF-16-aware string measurement and truncation
- Add _custom_unit_to_cp() binary-search helper that maps a custom-unit
budget to the largest safe codepoint slice position
- Update truncate_message() to accept optional len_fn parameter
- Telegram adapter now passes len_fn=utf16_len when splitting messages
- Fix fallback truncation in Telegram error handler to use
_prefix_within_utf16_limit instead of codepoint slicing
- Update send_message_tool.py to use utf16_len for Telegram platform
- Add comprehensive tests: utf16_len, _prefix_within_utf16_limit,
truncate_message with len_fn (emoji splitting, content preservation,
code block handling)
- Update mock lambdas in reply_mode tests to accept **kw for len_fn
Add a CI-built skills index served from the docs site. The index is
crawled daily by GitHub Actions, resolves all GitHub paths upfront, and
is cached locally by the client. When the index is available:
- Search uses the cached index (0 GitHub API calls, was 23+)
- Install uses resolved paths from index (6 API calls for file
downloads only, was 31-45 for discovery + downloads)
Total: 68 → 6 GitHub API calls for a typical search + install flow.
Unauthenticated users (60 req/hr) can now search and install without
hitting rate limits.
Components:
- scripts/build_skills_index.py: Crawl all sources (skills.sh, GitHub
taps, official, clawhub, lobehub), batch-resolve GitHub paths via
tree API, output JSON index
- tools/skills_hub.py: HermesIndexSource class — search/fetch/inspect
backed by the index, with lazy GitHubSource for file downloads
- parallel_search_sources() skips external API sources when index is
available (0 GitHub calls for search)
- .github/workflows/skills-index.yml: twice-daily CI build + deploy
- .github/workflows/deploy-site.yml: also builds index during docs deploy
Graceful degradation: when the index is unavailable (first run, network
down, stale), all methods return empty/None and downstream sources
handle the request via direct API as before.
Skills.sh installs hit the GitHub API 45 times per install because the
same repo tree was fetched 6 times redundantly. Combined with search
(23 API calls), this totals 68 — exceeding the unauthenticated rate
limit of 60 req/hr, causing 'Could not fetch' errors for users without
a GITHUB_TOKEN.
Changes:
- Add _get_repo_tree() cache to GitHubSource — repo info + recursive
tree fetched once per repo per source instance, eliminating 10
redundant API calls (6 tree + 4 candidate 404s)
- _download_directory_via_tree returns {} (not None) when cached tree
shows path doesn't exist, skipping unnecessary Contents API fallback
- _check_rate_limit_response() detects exhausted quota and sets
is_rate_limited flag
- do_install() shows actionable hint when rate limited: set
GITHUB_TOKEN or install gh CLI
Before: 45 API calls per install (68 total with search)
After: 31 API calls per install (54 total with search — under 60/hr)
Reported by community user from Vietnam (no GitHub auth configured).
Centralize container detection in hermes_constants.is_container() with
process-lifetime caching, matching existing is_wsl()/is_termux() patterns.
Dedup _is_inside_container() in config.py to delegate to the new function.
Add _run_systemctl() wrapper that converts FileNotFoundError to RuntimeError
for defense-in-depth — all 10 bare subprocess.run(_systemctl_cmd(...)) call
sites now route through it.
Make supports_systemd_services() return False in containers and when
systemctl binary is absent (shutil.which check).
Add Docker-specific guidance in gateway_command() for install/uninstall/start
subcommands — exit 0 with helpful instructions instead of crashing.
Make 'hermes status' show 'Manager: docker (foreground)' and 'hermes dump'
show 'running (docker, pid N)' inside containers.
Fix setup_gateway() to use supports_systemd instead of _is_linux for all
systemd-related branches, and show Docker restart policy instructions in
containers.
Replace inline /.dockerenv check in voice_mode.py with is_container().
Fixes#7420
Co-authored-by: teknium1 <teknium1@users.noreply.github.com>
* fix: list all available toolsets in delegate_task schema description
The delegate_task tool's toolsets parameter description only mentioned
'terminal', 'file', and 'web' as examples. Models (especially smaller
ones like Gemma) would substitute 'web' for 'browser' because they
didn't know 'browser' was a valid option.
Now dynamically builds the toolset list from the TOOLSETS dict at import
time, excluding blocked, composite, and platform-specific toolsets.
Auto-updates when new toolsets are added.
Reported by jeffutter on Discord.
* chore: exclude moa and rl from delegate_task toolset list
When the agent calls process(action='wait') or process(action='poll')
and gets the exited status, the completion_queue notification is
redundant — the agent already has the output from the tool return.
Previously, the drain loops in CLI and gateway would still inject
the [SYSTEM: Background process completed] message, causing the
agent to receive the same information twice.
Fix: track session IDs in _completion_consumed set when wait/poll/log
returns an exited process. Drain loops in cli.py and gateway watcher
skip completion events for consumed sessions. Watch pattern events
are never suppressed (they have independent semantics).
Adds 4 tests covering wait/poll/log marking and running-process
negative case.
Rewrite the cronjob tool's 'deliver' parameter description to strongly
guide models toward omitting the parameter (which auto-detects origin
including thread/topic). The previous description listed all platform
names equally, inviting models to construct explicit targets like
'telegram:<chat_id>' which silently drops the thread_id.
New description:
- Leads with 'Omit this parameter' as the recommended path
- Explicitly warns that platform:chat_id without :thread_id loses topics
- Removes the long flat list of platform names that invited construction
Also adds diagnostic logging at two key points:
- _origin_from_env(): logs when thread_id is captured during job creation
- _deliver_result(): warns when origin has thread_id but delivery target
lost it; logs at debug when delivering to a specific thread
Helps diagnose user-reported issue where cron responses from Telegram
topics are delivered to the main chat instead of the originating topic.
* perf(ssh,modal): bulk file sync via tar pipe and tar/base64 archive
SSH: symlink-staging + tar -ch piped over SSH in a single TCP stream.
Eliminates per-file scp round-trips. Handles timeout (kills both
processes), SSH Popen failure (kills tar), and tar create failure.
Modal: in-memory gzipped tar archive, base64-encoded, decoded+extracted
in one exec call. Checks exit code and raises on failure.
Both backends use shared helpers extracted into file_sync.py:
- quoted_mkdir_command() — mirrors existing quoted_rm_command()
- unique_parent_dirs() — deduplicates parent dirs from file pairs
Migrates _ensure_remote_dirs to use the new helpers.
28 new tests (21 SSH + 7 Modal), all passing.
Closes#7465Closes#7467
* fix(modal): pipe stdin to avoid ARG_MAX, clean up review findings
- Modal bulk upload: stream base64 payload through proc.stdin in 1MB
chunks instead of embedding in command string (Modal SDK enforces
64KB ARG_MAX_BYTES — typical payloads are ~4.3MB)
- Modal single-file upload: same stdin fix, add exit code checking
- Remove what-narrating comments in ssh.py and modal.py (keep WHY
comments: symlink staging rationale, SIGPIPE, deadlock avoidance)
- Remove unnecessary `sandbox = self._sandbox` alias in modal bulk
- Daytona: use shared helpers (unique_parent_dirs, quoted_mkdir_command)
instead of inlined duplicates
---------
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
The check_interval parameter on terminal_tool sent periodic output
updates to the gateway chat, but these were display-only — the agent
couldn't see or act on them. This added schema bloat and introduced
a bug where notify_on_complete=True was silently dropped when
check_interval was also set (the not-check_interval guard skipped
fast-watcher registration, and the check_interval watcher dict
was missing the notify_on_complete key).
Removing check_interval entirely:
- Eliminates the notify_on_complete interaction bug
- Reduces tool schema size (one fewer parameter for the model)
- Simplifies the watcher registration path
- notify_on_complete (agent wake-on-completion) still works
- watch_patterns (output alerting) still works
- process(action='poll') covers manual status checking
Closes#7947 (root cause eliminated rather than patched).
Deduplicate todo items by ID before writing to the store, keeping the
last occurrence. Prevents ghost entries when the model sends duplicate
IDs in a single write() call, which corrupts subsequent merge operations.
Co-authored-by: WAXLYY <WAXLYY@users.noreply.github.com>
Three root causes of the 'agent stops mid-task' gateway bug:
1. Compression threshold floor (64K tokens minimum)
- The 50% threshold on a 100K-context model fired at 50K tokens,
causing premature compression that made models lose track of
multi-step plans. Now threshold_tokens = max(50% * context, 64K).
- Models with <64K context are rejected at startup with a clear error.
2. Budget warning removal — grace call instead
- Removed the 70%/90% iteration budget warnings entirely. These
injected '[BUDGET WARNING: Provide your final response NOW]' into
tool results, causing models to abandon complex tasks prematurely.
- Now: no warnings during normal execution. When the budget is
actually exhausted (90/90), inject a user message asking the model
to summarise, allow one grace API call, and only then fall back
to _handle_max_iterations.
3. Activity touches during long terminal execution
- _wait_for_process polls every 0.2s but never reported activity.
The gateway's inactivity timeout (default 1800s) would fire during
long-running commands that appeared 'idle.'
- Now: thread-local activity callback fires every 10s during the
poll loop, keeping the gateway's activity tracker alive.
- Agent wires _touch_activity into the callback before each tool call.
Also: docs update noting 64K minimum context requirement.
Closes#7915 (root cause was agent-loop termination, not Weixin delivery limits).
Complete the contextvars migration by adding HERMES_SESSION_KEY to the
unified _VAR_MAP in session_context.py. Without this, concurrent gateway
handlers race on os.environ["HERMES_SESSION_KEY"].
- Add _SESSION_KEY ContextVar to _VAR_MAP, set_session_vars(), clear_session_vars()
- Wire session_key through _set_session_env() from SessionContext
- Replace os.getenv fallback in tools/approval.py with get_session_env()
(function-level import to avoid cross-layer coupling)
- Keep os.environ set as CLI/cron fallback
Cherry-picked from PR #7878 by 0xbyt4.
Add a second WeCom integration mode for regular enterprise self-built
applications. Unlike the existing bot/websocket adapter (wecom.py),
this handles WeCom's standard callback flow: WeCom POSTs encrypted XML
to an HTTP endpoint, the adapter decrypts, queues for the agent, and
immediately acknowledges. The agent's reply is delivered proactively
via the message/send API.
Key design choice: always acknowledge immediately and use proactive
send — agent sessions take 3-30 minutes, so the 5-second inline reply
window is never useful. The original PR's Future/pending-reply
machinery was removed in favour of this simpler architecture.
Features:
- AES-CBC encrypt/decrypt (BizMsgCrypt-compatible)
- Multi-app routing scoped by corp_id:user_id
- Legacy bare user_id fallback for backward compat
- Access-token management with auto-refresh
- WECOM_CALLBACK_* env var overrides
- Port-in-use pre-check before binding
- Health endpoint at /health
Salvaged from PR #7774 by @chqchshj. Simplified by removing the
inline reply Future system and fixing: secrets.choice for nonce
generation, immediate plain-text acknowledgment (not encrypted XML
containing 'success'), and initial token refresh error handling.
Adds _normalize_path() helper that calls expanduser().resolve() to
properly handle tilde paths (e.g. ~/.hermes, ~/.config). Previously
Path.resolve() alone treated ~ as a literal directory name, producing
invalid paths like /root/~/.hermes.
Also improves _run_git() error handling to distinguish missing working
directories from missing git executable, and adds pre-flight directory
validation.
Cherry-picked from PR #7898 by faishal882.
Fixes#7807
_write_to_sandbox interpolated storage_dir and remote_path directly into
a shell command passed to env.execute(). Paths containing shell
metacharacters (spaces, semicolons, $(), backticks) could trigger
arbitrary command execution inside the sandbox.
Fix: wrap both paths with shlex.quote(). Clean paths (alphanumeric +
slashes/hyphens/dots) are left unmodified by shlex.quote, so existing
behavior is unchanged. Paths with unsafe characters get single-quoted.
Tests added for spaces, $(command) substitution, and semicolon injection.
This commit addresses a security vulnerability where unsanitized user inputs for commit_hash and file_path were passed directly to git commands in CheckpointManager.restore() and diff(). It validates commit hashes to be strictly hexadecimal characters without leading dashes (preventing flag injection like '--patch') and enforces file paths to stay within the working directory via root resolution. Regression tests test_restore_rejects_argument_injection, test_restore_rejects_invalid_hex_chars, and test_restore_rejects_path_traversal were added.
The interrupt mechanism in tools/interrupt.py used a process-global
threading.Event. In the gateway, multiple agents run concurrently in
the same process via run_in_executor. When any agent was interrupted
(user sends a follow-up message), the global flag killed ALL agents'
running tools — terminal commands, browser ops, web requests — across
all sessions.
Changes:
- tools/interrupt.py: Replace single threading.Event with a set of
interrupted thread IDs. set_interrupt() targets a specific thread;
is_interrupted() checks the current thread. Includes a backward-
compatible _ThreadAwareEventProxy for legacy _interrupt_event usage.
- run_agent.py: Store execution thread ID at start of run_conversation().
interrupt() and clear_interrupt() pass it to set_interrupt() so only
this agent's thread is affected.
- tools/code_execution_tool.py: Use is_interrupted() instead of
directly checking _interrupt_event.is_set().
- tools/process_registry.py: Same — use is_interrupted().
- tests: Update interrupt tests for per-thread semantics. Add new
TestPerThreadInterruptIsolation with two tests verifying cross-thread
isolation.
When a Python process exits uncleanly (SIGKILL, crash, gateway restart
via hermes update), in-memory _active_sessions tracking is lost but the
agent-browser node daemons and their Chromium child processes keep
running indefinitely. On a long-running system this causes unbounded
memory growth — 24 orphaned sessions consumed 7.6 GB on a production
machine over 9 days.
Add _reap_orphaned_browser_sessions() which scans the tmp directory for
agent-browser-{h_*,cdp_*} socket dirs on cleanup thread startup. For
each dir not tracked by the current process, reads the daemon PID file
and sends SIGTERM if the daemon is still alive. Handles edge cases:
dead PIDs, corrupt PID files, permission errors, foreign processes.
The reaper runs once on thread startup (not every 30s) to avoid races
with sessions being actively created by concurrent agents.
Background process watchers (notify_on_complete, check_interval) created
synthetic SessionSource objects without user_id/user_name. While the
internal=True bypass (1d8d4f28) prevented false pairing for agent-
generated notifications, the missing identity caused:
- Garbage entries in pairing rate limiters (discord:None, telegram:None)
- 'User None' in approval messages and logs
- No user identity available for future code paths that need it
Additionally, platform messages arriving without from_user (Telegram
service messages, channel forwards, anonymous admin actions) could still
trigger false pairing because they are not internal events.
Fix:
1. Propagate user_id/user_name through the full watcher chain:
session_context.py → gateway/run.py → terminal_tool.py →
process_registry.py (including checkpoint persistence/recovery)
2. Add None user_id guard in _handle_message() — silently drop
non-internal messages with no user identity instead of triggering
the pairing flow.
Salvaged from PRs #7664 (kagura-agent, ContextVar approach),
#6540 (MestreY0d4-Uninter, tests), and #7709 (guang384, None guard).
Closes#6341, #6485, #7643
Relates to #6516, #7392
Independent halving of width and height caused aspect ratio distortion
for extreme dimensions (e.g. 8000x200 panoramas). When one axis hit the
64px floor, the other kept shrinking — collapsing the ratio toward 1:1.
Use proportional scaling instead: when either dimension hits the floor,
derive the effective scale factor and apply it to both axes.
Add tests for extreme panorama (8000x200) and tall narrow (200x6000)
images to verify aspect ratio preservation.
Cherry-picked from PR #7749 by kshitijk4poor with modifications:
- Raise hard image limit from 5 MB to 20 MB (matches most restrictive provider)
- Send images at full resolution first; only auto-resize to 5 MB on API failure
- Add _is_image_size_error() helper to detect size-related API rejections
- Auto-resize uses Pillow (soft dep) with progressive downscale + JPEG quality reduction
- Fix get_model_capabilities() to check modalities.input for vision support
- Increase default vision timeout from 30s to 120s (matches hardcoded fallback intent)
- Applied retry-with-resize to both vision_analyze_tool and browser_vision
Closes#7740
* feat: add watch_patterns to background processes for output monitoring
Adds a new 'watch_patterns' parameter to terminal(background=true) that
lets the agent specify strings to watch for in process output. When a
matching line appears, a notification is queued and injected as a
synthetic message — triggering a new agent turn, similar to
notify_on_complete but mid-process.
Implementation:
- ProcessSession gets watch_patterns field + rate-limit state
- _check_watch_patterns() in ProcessRegistry scans new output chunks
from all three reader threads (local, PTY, env-poller)
- Rate limited: max 8 notifications per 10s window
- Sustained overload (45s) permanently disables watching for that process
- watch_queue alongside completion_queue, same consumption pattern
- CLI drains watch_queue in both idle loop and post-turn drain
- Gateway drains after agent runs via _inject_watch_notification()
- Checkpoint persistence + crash recovery includes watch_patterns
- Blocked in execute_code sandbox (like other bg params)
- 20 new tests covering matching, rate limiting, overload kill,
checkpoint persistence, schema, and handler passthrough
Usage:
terminal(
command='npm run dev',
background=true,
watch_patterns=['ERROR', 'WARN', 'listening on port']
)
* refactor: merge watch_queue into completion_queue
Unified queue with 'type' field distinguishing 'completion',
'watch_match', and 'watch_disabled' events. Extracted
_format_process_notification() in CLI and gateway to handle
all event types in a single drain loop. Removes duplication
across both CLI drain sites and the gateway.
Three fixes for vision_analyze returning cryptic 400 "Invalid request data":
1. Pre-flight base64 size check — base64 inflates data ~33%, so a 3.8 MB
file exceeds the 5 MB API limit. Reject early with a clear message
instead of letting the provider return a generic 400.
2. Handle file:// URIs — strip the scheme and resolve as a local path.
Previously file:///path/to/image.png fell through to the "invalid
image source" error since it matched neither is_file() nor http(s).
3. Separate invalid_request errors from "does not support vision" errors
so the user gets actionable guidance (resize/compress/retry) instead
of a misleading "model does not support vision" message.
Closes#6677
vision_tools.py: _download_image() loads the full HTTP response body into
memory via response.content (line 190) with no Content-Length check and no
max file size limit. An attacker-hosted multi-gigabyte file causes OOM.
Add a 50 MB hard cap: check Content-Length header before download, and
verify actual body size before writing to disk.
hermes_parser.py: tc_data["name"] at line 57 raises KeyError when the LLM
outputs a tool call JSON without a "name" field. The outer except catches
it silently, causing the entire tool call to be lost with zero diagnostics.
Add "name" field validation before constructing the ChatCompletionMessage.
mistral_parser.py: tc["name"] at line 101 has the same KeyError issue in
the pre-v11 format path. The fallback decoder (line 112) already checks
"name" correctly, but the primary path does not. Add validation to match.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
process_registry.py: _reader_loop() has process.wait() after the try-except
block (line 380). If the reader thread crashes with an unexpected exception
(e.g. MemoryError, KeyboardInterrupt), control exits the except handler but
skips wait() — leaving the child as a zombie process. Move wait() and the
cleanup into a finally block so the child is always reaped.
cron/scheduler.py: _run_job_script() only redacts secrets in stdout on the
SUCCESS path (line 417-421). When a cron script fails (non-zero exit), both
stdout and stderr are returned WITHOUT redaction (lines 407-413). A script
that accidentally prints an API key to stderr during a failure would leak it
into the LLM context. Move redaction before the success/failure branch so
both paths benefit.
skill_commands.py: _build_skill_message() enumerates supporting files using
rglob("*") but only checks is_file() (line 171) without filtering symlinks.
PR #6693 added symlink protection to scan_skill_commands() but missed this
function. A malicious skill can create symlinks in references/ pointing to
arbitrary files, exposing their paths (and potentially content via skill_view)
to the LLM. Add is_symlink() check to match the guard in scan_skill_commands.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
_discover_bundled_skills() used the directory name to identify skills,
but skills_tool.py and skills_hub.py use the `name:` field from SKILL.md
frontmatter. This mismatch caused 9 builtin skills whose directory name
differs from their SKILL.md name to be written to .bundled_manifest
under the wrong key, so `hermes skills list` showed them as "local"
instead of "builtin".
Read the frontmatter name field (with directory-name fallback) so the
manifest keys match what the rest of the codebase expects.
Closes#6835
- Remove unreachable `if not content_sample` branch inside the truthy
`if content_sample` block in `_is_likely_binary()` (dead code that
could never execute).
- Replace `linter_cmd.format(file=...)` with `linter_cmd.replace("{file}", ...)`
in `_check_lint()` so file paths containing curly braces (e.g.
`src/{test}.py`) no longer raise KeyError/ValueError.
- Add 16 unit tests covering both fixes and edge cases.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add delegation.reasoning_effort config key so subagents can run at a
different thinking level than the parent agent. When set, overrides
the parent's reasoning_config; when empty, inherits as before.
Valid values: xhigh, high, medium, low, minimal, none (disables thinking).
Config path: delegation.reasoning_effort in config.yaml
Files changed:
- tools/delegate_tool.py: resolve override in _build_child_agent
- hermes_cli/config.py: add reasoning_effort to DEFAULT_CONFIG
- tests/tools/test_delegate.py: 4 new tests covering all cases
warnings.warn() is suppressed/invisible when running as a gateway
or agent. Switch to logger.warning() so the disk cap message
actually appears in logs.
Fixes#7362 (item 3).
FileSyncManager now accepts an optional bulk_upload_fn callback.
When provided, all changed files are uploaded in one call instead
of iterating one-by-one with individual HTTP POSTs.
DaytonaEnvironment wires this to sandbox.fs.upload_files() which
batches everything into a single multipart POST — ~580 files goes
from ~5 min to <2s on init.
Parent directories are pre-created in one mkdir -p call.
Fixes#7362 (item 1).
When two gateway messages arrived concurrently, _set_session_env wrote
HERMES_SESSION_PLATFORM/CHAT_ID/CHAT_NAME/THREAD_ID into the process-global
os.environ. Because asyncio tasks share the same process, Message B would
overwrite Message A's values mid-flight, causing background-task notifications
and tool calls to route to the wrong thread/chat.
Replace os.environ with Python's contextvars.ContextVar. Each asyncio task
(and any run_in_executor thread it spawns) gets its own copy, so concurrent
messages never interfere.
Changes:
- New gateway/session_context.py with ContextVar definitions, set/clear/get
helpers, and os.environ fallback for CLI/cron/test backward compatibility
- gateway/run.py: _set_session_env returns reset tokens, _clear_session_env
accepts them for proper cleanup in finally blocks
- All tool consumers updated: cronjob_tools, send_message_tool, skills_tool,
terminal_tool (both notify_on_complete AND check_interval blocks), tts_tool,
agent/skill_utils, agent/prompt_builder
- Tests updated for new contextvar-based API
Fixes#7358
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Call child.close() in the _run_single_child finally block after
unregistering the child from the parent's active children list.
Previously child AIAgent instances were only removed from the tracking
list but never had their resources released — the OpenAI/httpx client
and any tool subprocesses relied entirely on garbage collection.
Ref: #7131
- Bug 1: replace read_file(limit=10000) with read_file_raw in _apply_update,
preventing silent truncation of files >2000 lines and corruption of lines
>2000 chars; add read_file_raw to FileOperations abstract interface and
ShellFileOperations
- Bug 2: split apply_v4a_operations into validate-then-apply phases; if any
hunk fails validation, zero writes occur (was: continue after failure,
leaving filesystem partially modified)
- Bug 3: parse_v4a_patch now returns an error for begin-marker-with-no-ops,
empty file paths, and moves missing a destination (was: always returned
error=None)
- Bug 4: raise strategy 7 (block anchor) single-candidate similarity threshold
from 0.10 to 0.50, eliminating false-positive matches in repetitive code
- Bug 5: add _strategy_unicode_normalized (new strategy 7) with position
mapping via _build_orig_to_norm_map; smart quotes and em-dashes in
LLM-generated patches now match via strategies 1-6 before falling through
to fuzzy strategies
- Bug 6: extend fuzzy_find_and_replace to return 4-tuple (content, count,
error, strategy); update all 5 call sites across patch_parser.py,
file_operations.py, and skill_manager_tool.py
- Bug 7: guard in _apply_update returns error when addition-only context hint
is ambiguous (>1 occurrences); validation phase errors on both 0 and >1
- Bug 8: _apply_delete returns error (not silent success) on missing file
- Bug 9: _validate_operations checks source existence and destination absence
for MOVE operations before any write occurs
Three locations perform `int()` conversion on environment variables or
HTTP headers without error handling, causing unhandled `ValueError` crashes
when the values are non-numeric:
1. `send_message_tool.py` — `EMAIL_SMTP_PORT` env var parsed outside the
try/except block; a non-numeric value crashes `_send_email()` instead
of returning a user-friendly error.
2. `process_registry.py` — `TERMINAL_TIMEOUT` env var parsed without
protection; a non-numeric value crashes the `wait()` method.
3. `skills_hub.py` — HTTP `Retry-After` header can contain date strings
per RFC 7231; `int()` conversion crashes on non-numeric values.
All three now fall back to their default values on `ValueError`/`TypeError`.
Two issues with sandbox container spawning:
1. PID 1 was `sleep 2h` which doesn't call wait() — every background
process that exited became a zombie (<defunct>), and the process
tool reported them as "running" because zombie PIDs still exist in
the process table. Fix: add --init to docker run, which uses
tini (Docker) or catatonit (Podman) as PID 1 to reap children
automatically. Both runtimes support --init natively.
2. The fixed 2-hour lifetime was arbitrary and sometimes too short
for long agent sessions. Fix: replace 'sleep 2h' with
'sleep infinity'. The idle reaper (_cleanup_inactive_envs, gated
by terminal.lifetime_seconds, default 300s) already handles
cleanup based on last activity timestamp — there's no need for
the container itself to have a fixed death timer.
Fixes#6908.
`delegate_task` silently truncated batch tasks to 3 — the model sends
5 tasks, gets results for 3, never told 2 were dropped. Now returns a
clear tool_error explaining the limit and how to fix it.
The limit is configurable via:
- delegation.max_concurrent_children in config.yaml (priority 1)
- DELEGATION_MAX_CONCURRENT_CHILDREN env var (priority 2)
- default: 3
Uses the same _load_config() path as the rest of delegate_task for
consistent config priority. Clamps to min 1, warns on non-integer
config values.
Also removes the hardcoded maxItems: 3 from the JSON schema — the
schema was blocking the model from even attempting >3 tasks before
the runtime check could fire. The runtime check gives a much more
actionable error message.
Backwards compatible: default remains 3, existing configs unchanged.
Isolate system tool configs (git, ssh, gh, npm) per profile by injecting
a per-profile HOME into subprocess environments only. The Python
process's own os.environ['HOME'] and Path.home() are never modified,
preserving all existing profile infrastructure.
Activation is directory-based: when {HERMES_HOME}/home/ exists on disk,
subprocesses see it as HOME. The directory is created automatically for:
- Docker: entrypoint.sh bootstraps it inside the persistent volume
- Named profiles: added to _PROFILE_DIRS in profiles.py
Injection points (all three subprocess env builders):
- tools/environments/local.py _make_run_env() — foreground terminal
- tools/environments/local.py _sanitize_subprocess_env() — background procs
- tools/code_execution_tool.py child_env — execute_code sandbox
Single source of truth: hermes_constants.get_subprocess_home()
Closes#4426
hermes skills browse ran all 7 source adapters serially with no overall
timeout and no progress indicator. On a cold cache, GitHubSource alone
could make 100+ sequential HTTP calls (directory listing + inspect per
skill per tap), taking 5+ minutes with no output — appearing to hang.
Changes:
- Add parallel_search_sources() in tools/skills_hub.py that runs all
source adapters concurrently via ThreadPoolExecutor with a 30s
overall timeout. Sources that finish in time contribute results;
slow ones are skipped gracefully with a visible notice.
- Update unified_search() to use parallel_search_sources() internally.
- Update do_browse() and do_search() in hermes_cli/skills_hub.py to
show a Rich spinner while fetching, so the user sees activity.
- Bump per-source limits (clawhub 50→500, lobehub 50→500, etc.) now
that fetching is parallel — yields far more results per browse.
- Report timed-out sources and suggest re-running for cached results.
- Replace 'inspect/install' footer with 'search deeper' tip.
Worst-case latency drops from 5+ minutes (serial) to ~30s (parallel
with timeout cap). Result count should jump from ~242 to 1000+.
When delegate_task runs, the parent agent's activity tracker freezes
because child.run_conversation() blocks and the child's own
_touch_activity() never propagates back to the parent. The gateway
inactivity timeout then fires a spurious 'No activity' warning and
eventually kills the agent, even though the subagent is actively working.
Fix: add a heartbeat thread in _run_single_child that calls
parent._touch_activity() every 30 seconds with detail from the child's
activity summary (current tool, iteration count). The thread is a daemon
that starts before child.run_conversation() and is cleaned up in the
finally block.
This also improves the gateway 'Still working...' status messages —
instead of just 'running: delegate_task', users now see what the
subagent is actually doing (e.g., 'delegate_task: subagent running
terminal (iteration 5/50)').
- Remove sys.path.insert hack (leftover from standalone dev)
- Add token lock (acquire_scoped_lock/release_scoped_lock) in
connect()/disconnect() to prevent duplicate pollers across profiles
- Fix get_connected_platforms: WEIXIN check must precede generic
token/api_key check (requires both token AND account_id)
- Add WEIXIN_HOME_CHANNEL_NAME to _EXTRA_ENV_KEYS
- Add gateway setup wizard with QR login flow
- Add platform status check for partially configured state
- Add weixin.md docs page with full adapter documentation
- Update environment-variables.md reference with all 11 env vars
- Update sidebars.ts to include weixin docs page
- Wire all gateway integration points onto current main
Salvaged from PR #6747 by Zihan Huang.
Four gaps in DANGEROUS_PATTERNS found by running 10 targeted tests that
each mapped to a specific pattern in approval.py and checked whether the
documented defense actually held.
1. **Heredoc script injection** — `python3 << 'EOF'` bypasses the
existing `-e`/`-c` flag pattern. Adds pattern for interpreter + `<<`
covering python{2,3}, perl, ruby, node.
2. **PID expansion self-termination** — `kill -9 $(pgrep hermes)` is
opaque to the existing `pkill|killall` + name pattern because command
substitution is not expanded at detection time. Adds structural
patterns matching `kill` + `$(pgrep` and backtick variants.
3. **Git destructive operations** — `git reset --hard`, `push --force`,
`push -f`, `clean -f*`, and `branch -D` were entirely absent.
Note: `branch -d` also triggers because IGNORECASE is global —
acceptable since -d is still a delete, just a safe one, and the
prompt is only a confirmation, not a hard block.
4. **chmod +x then execute** — two-step social engineering where a
script containing dangerous commands is first written to disk (not
checked by write_file), then made executable and run as `./script`.
Pattern catches `chmod +x ... [;&|]+ ./` combos. Does not solve the
deeper architectural issue (write_file not checking content) — that
is called out in the PR description as a known limitation.
Tests: 23 new cases across 4 test classes, all in test_approval.py:
- TestHeredocScriptExecution (7 cases, incl. regressions for -c)
- TestPgrepKillExpansion (5 cases, incl. safe kill PID negative)
- TestGitDestructiveOps (8 cases, incl. safe git status/push negatives)
- TestChmodExecuteCombo (3 cases, incl. safe chmod-only negative)
Full suite: 146 passed, 0 failed.
Follow-up to Dusk1e's PR #7120 (Slack send_image redirect guard):
- Rename _safe_url_for_log -> safe_url_for_log (drop underscore) since
it is now imported cross-module by the Slack adapter
- Add _ssrf_redirect_guard httpx event hook to cache_image_from_url()
and cache_audio_from_url() in base.py — same pattern as vision_tools
and the Slack adapter fix
- Update url_safety.py docstring to reflect broader coverage
- Add regression tests for image/audio redirect blocking + safe passthrough
When kill_process() sends SIGTERM, both it and the reader thread race
to call _move_to_finished() — kill_process sets exit_code=-15 and
enqueues a notification, then the reader thread's process.wait()
returns with exit_code=143 (128+SIGTERM) and enqueues a second one.
Fix: make _move_to_finished() idempotent by tracking whether the
session was actually removed from _running. The second call sees it
was already moved and skips the completion_queue.put().
Adds regression test: test_move_to_finished_idempotent_no_duplicate
Automated dead code audit using vulture + coverage.py + ast-grep intersection,
confirmed by Opus deep verification pass. Every symbol verified to have zero
production callers (test imports excluded from reachability analysis).
Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines
of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py,
hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py).
Co-authored-by: alt-glitch <balyan.sid@gmail.com>
When an MCP server returns both content (model-oriented text) and
structuredContent (machine-oriented JSON), the client now combines
them instead of discarding content. The text content becomes the
primary result (what the agent reads), and structuredContent is
included as supplementary metadata.
Previously, structuredContent took full precedence — causing data
loss for servers like Desktop Commander that put the actual file
text in content and metadata in structuredContent.
MCP spec guidance: for conversational/agent UX, prefer content.
Legacy flat stt.model config key (from cli-config.yaml.example and older
versions) was passed as a model override to transcribe_audio() by the
gateway, bypassing provider-specific model resolution. When the provider
was 'local' (faster-whisper), this caused:
ValueError: Invalid model size 'whisper-1'
Changes:
- gateway/run.py, discord.py: stop passing model override — let
transcribe_audio() handle provider-specific model resolution internally
- get_stt_model_from_config(): now provider-aware, reads from the correct
nested section (stt.local.model, stt.openai.model, etc.); ignores
legacy flat key for local provider to prevent model name mismatch
- cli-config.yaml.example: updated STT section to show nested provider
config structure instead of legacy flat key
- config migration v13→v14: moves legacy stt.model to the correct
provider section and removes the flat key
Reported by community user on Discord.
Add Discord thread support to cron delivery and send_message_tool.
- _parse_target_ref: handle discord platform with chat_id:thread_id format
- _send_discord: add thread_id param, route to /channels/{thread_id}/messages
- _send_to_platform: pass thread_id through for Discord
- Discord adapter send(): read thread_id from metadata for gateway path
- Update tool schema description to document Discord thread targets
Cherry-picked from PR #7046 by pandacooming (maxyangcn).
Follow-up fixes:
- Restore proxy support (resolve_proxy_url/proxy_kwargs_for_aiohttp) that was
accidentally deleted — would have caused NameError at runtime
- Remove duplicate _DISCORD_TARGET_RE regex; reuse existing _TELEGRAM_TOPIC_TARGET_RE
via _NUMERIC_TOPIC_RE alias (identical pattern)
- Fix misleading test comments about Discord negative snowflake IDs
(Discord uses positive snowflakes; negative IDs are a Telegram convention)
- Rewrite misleading scheduler test that claimed to exercise home channel
fallback but actually tested the explicit platform:chat_id parsing path
The same .* pattern vulnerable to newline bypass that was fixed in
prompt_builder.py (PR #6925) also existed in skills_guard.py. Changed
to [\s\S]*? to match across newlines.
prompt_builder.py: The `hidden_div` detection pattern uses `.*` which does not
match newlines in Python regex (re.DOTALL is not passed). An attacker can bypass
detection by splitting the style attribute across lines:
`<div style="color:red;\ndisplay: none">injected content</div>`
Replace `.*` with `[\s\S]*?` to match across line boundaries.
credential_files.py: `_load_config_files()` catches all exceptions at DEBUG level
(line 171), making YAML parse failures invisible in production logs. Users whose
credential files silently fail to mount into sandboxes have no diagnostic clue.
Promote to WARNING to match the severity pattern used by the path validation
warnings at lines 150 and 158 in the same function.
webhook.py: `_reload_dynamic_routes()` logs JSON parse failures at WARNING (line
265) but the impact — stale/corrupted dynamic routes persisting silently — warrants
ERROR level to ensure operator visibility in alerting pipelines.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Three silent `except Exception` blocks in approval.py (lines 345, 387, 469) return
fallback values with zero logging — making it impossible to debug callback failures,
allowlist load errors, or config read issues. Add logger.warning/error calls that
match the pattern already used by save_permanent_allowlist() and _smart_approve()
in the same file.
In mcp_oauth.py, narrow the overly-broad `except Exception` in get_tokens() and
get_client_info() to the specific exceptions Pydantic's model_validate() can raise
(ValueError, TypeError, KeyError), and include the exception message in the warning.
Also wrap the _wait_for_callback() polling loop in try/finally so the HTTPServer is
always closed — previously an asyncio.CancelledError or any exception in the loop
would leak the server socket.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
error_classifier.py: Message-only auth errors ("invalid api key", "unauthorized",
etc.) were classified as retryable=True (line 707), inconsistent with the HTTP 401
path (line 432) which correctly uses retryable=False + should_fallback=True. The
mismatch causes 3 wasted retries with the same broken credential before fallback,
while 401 errors immediately attempt fallback. Align the message-based path to
match: retryable=False, should_fallback=True.
web_tools.py: The _PREFIX_RE secret-detection check in web_extract_tool() runs
against the raw URL string (line 1196). URL-encoded secrets like %73k-1234... (
sk-1234...) bypass the filter because the regex expects literal ASCII. Add
urllib.parse.unquote() before the check so percent-encoded variants are also caught.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Replace per-backend ad-hoc file sync with a shared FileSyncManager
that handles mtime-based change detection, remote deletion of
locally-removed files, and transactional state updates.
- New FileSyncManager class (tools/environments/file_sync.py)
with callbacks for upload/delete, rate limiting, and rollback
- Shared iter_sync_files() eliminates 3 duplicate implementations
- SSH: replace unconditional rsync with scp + mtime skip
- Modal/Daytona: replace inline _synced_files dict with manager
- All 3 backends now sync credentials + skills + cache uniformly
- Remote deletion: files removed locally are cleaned from remote
- HERMES_FORCE_FILE_SYNC=1 env var for debugging
- Base class _before_execute() simplified to empty hook
- 12 unit tests covering mtime skip, deletion, rollback, rate limiting
Change behavior from silent clamping to returning an error when the
model requests a foreground timeout exceeding FOREGROUND_MAX_TIMEOUT.
This forces the model to use background=true for long-running commands
rather than silently changing its intent.
- Config default timeouts above the cap are NOT rejected (user's choice)
- Only explicit model-requested timeouts trigger rejection
- Added boundary test for timeout exactly at the limit
When the model calls terminal() in foreground mode without background=true
(e.g. to start a server), the tool call blocks until the command exits or
the timeout expires. Without an upper bound the model can request arbitrarily
high timeouts (the schema had minimum=1 but no maximum), blocking the entire
agent session for hours until the gateway idle watchdog kills it.
Changes:
- Add FOREGROUND_MAX_TIMEOUT (600s, configurable via
TERMINAL_MAX_FOREGROUND_TIMEOUT env var) that caps foreground timeout
- Clamp effective_timeout to the cap when background=false and timeout
exceeds the limit
- Include a timeout_note in the tool result when clamped, nudging the
model to use background=true for long-running processes
- Update schema description to show the max timeout value
- Remove dead clamping code in the background branch that could never
fire (max_timeout was set to effective_timeout, so timeout > max_timeout
was always false)
- Add 7 tests covering clamping, no-clamping, config-default-exceeds-cap
edge case, background bypass, default timeout, constant value, and
schema content
Self-review fixes:
- Fixed bug where timeout_note said 'Requested timeout Nones' when
clamping fired from config default exceeding cap (timeout param is
None). Now uses unclamped_timeout instead of the raw timeout param.
- Removed unused pytest import from test file
- Extracted test config dict into _make_env_config() helper
- Fixed tautological test_default_value assertion
- Added missing test for config default > cap with no model timeout
Replace 6 identical copies of the Termux detection function across
cli.py, browser_tool.py, voice_mode.py, status.py, doctor.py, and
gateway.py with a single shared implementation in hermes_constants.py.
Each call site imports with its original local name to preserve all
existing callers (internal references and test monkeypatches).
When managed_persistence is enabled, cleanup_all now only clears local
tracking state without sending DELETE requests to the Camofox server.
This prevents persistent browser profiles (cookies, logins, localStorage)
from being destroyed during process-wide cleanup.
Ephemeral sessions still get full server-side deletion as before.
Follow-up improvements on top of the shared resolver from PR #6562:
- Add platform_env_var parameter to resolve_proxy_url() so DISCORD_PROXY
takes priority over generic HTTPS_PROXY/ALL_PROXY env vars
- Add SOCKS proxy support via aiohttp_socks.ProxyConnector with rdns=True
(critical for GFW/Shadowrocket/Clash users — issue #6649)
- proxy_kwargs_for_bot() returns connector= for SOCKS, proxy= for HTTP
- proxy_kwargs_for_aiohttp() returns split (session_kw, request_kw) for
standalone aiohttp sessions
- Add proxy support to send_message_tool.py (Discord REST, Slack, SMS)
for cron job delivery behind proxies (from PR #2208)
- Add proxy support to Discord image/document downloads
- Fix duplicate import sys in base.py
Fixes blockquote > escaping, edit_message raw markdown, ***bold italic***
handling, HTML entity double-escaping (&amp;), Wikipedia URL parens
truncation, and step numbering format. Also adds format_message to the
tool-layer _send_to_platform for consistent formatting across all
delivery paths.
Changes:
- Protect Slack entities (<@user>, <https://...|label>, <!here>) from
escaping passes
- Protect blockquote > markers before HTML entity escaping
- Unescape-before-escape for idempotent HTML entity handling
- ***bold italic*** → *_text_* conversion (before **bold** pass)
- URL regex upgraded to handle balanced parentheses
- mrkdwn:True flag on chat_postMessage payloads
- format_message applied in edit_message and send_message_tool
- 52 new tests (format, edit, streaming, splitting, tool chunking)
- Use reversed(dict) idiom for placeholder restoration
Based on PR #3715 by dashed, cherry-picked onto current main.
`_cleanup_task_resources` was unconditionally calling `cleanup_vm()` at
the end of every `run_conversation` (i.e. every user turn), tearing down
the docker/daytona/modal sandbox container regardless of its
`persistent_filesystem` setting. This contradicted the documented intent
of `terminal.lifetime_seconds` (idle reaper) and `container_persistent`,
and caused per-turn loss of `/workspace`, `~/.config`, agent CLI auth
state, and any other content living inside the sandbox.
The unconditional teardown was introduced in fbd3a2fd ("prevent leakage
of morph instances between tasks", 2025-11-04) to plug a Morph backend
leak, two days after `lifetime_seconds` shipped in faecbddd. It was
later refactored into `_cleanup_task_resources` in 70dd3a16 without
changing semantics. Code and docs have disagreed since.
Fix: introduce `terminal_tool.is_persistent_env(task_id)` and skip the
per-turn `cleanup_vm` when the active env is persistent. The idle reaper
(`_cleanup_inactive_envs`) still tears persistent envs down once
`terminal.lifetime_seconds` is exceeded. Non-persistent backends (Morph)
are unchanged — still torn down per turn, preserving the original
leak-prevention intent.
Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:
- model_metadata: sync HF context length key casing
(minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)
- cli.py: route quick command error output through self.console
instead of creating a new ChatConsole() instance
- docker.py: explicit docker_forward_env entries now bypass the
Hermes secret blocklist (intentional opt-in wins over generic filter)
- auxiliary_client: revert _read_main_provider() to simple
provider.strip().lower() — the _normalize_aux_provider() call
introduced in 5c03f2e7 stripped the custom: prefix, breaking
named custom provider resolution
- auxiliary_client: flip vision auto-detection order to
active provider → OpenRouter → Nous → stop (was OR → Nous → active)
- test: update vision priority test to match new order
Based on PR #6219-#6222 by xinbenlv.
mistralai v2.x is a namespace package — `Mistral` class lives at
`mistralai.client`, not at the top-level `mistralai` module. The
previous `from mistralai import Mistral` raises ImportError at runtime.
Update both production code and test fixture to use the correct path.
* fix(tools): skip camofox auto-cleanup when managed persistence is enabled
When managed_persistence is enabled, cleanup_browser() was calling
camofox_close() which destroys the server-side browser context via
DELETE /sessions/{userId}, killing login sessions across cron runs.
Add camofox_soft_cleanup() — a public wrapper that drops only the
in-memory session entry when managed persistence is on, returning True.
When persistence is off it returns False so the caller falls back to
the full camofox_close(). The inactivity reaper still handles idle
resource cleanup.
Also surface a logger.warning() when _managed_persistence_enabled()
fails to load config, replacing a silent except-and-return-False.
Salvaged from #6182 by el-analista (Eduardo Perea Fernandez).
Added public API wrapper to avoid cross-module private imports,
and test coverage for both persistence paths.
Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>
* fix(doctor): only check the active memory provider, not all providers unconditionally
hermes doctor had hardcoded Honcho Memory and Mem0 Memory sections that
always ran regardless of the user's memory.provider config setting. After
the swappable memory provider update (#4623), users with leftover Honcho
config but no active provider saw false 'broken' errors.
Replaced both sections with a single Memory Provider section that reads
memory.provider from config.yaml and only checks the configured provider.
Users with no external provider see a green 'Built-in memory active' check.
Reported by community user michaelruiz001, confirmed by Eri (Honcho).
---------
Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>
Previously crash recovery recreated detached sessions as if they were
fully managed, so polls and kills could lie about liveness and the
checkpoint could forget recovered jobs after the next restart.
This commit refreshes recovered host-backed sessions from real PID
state, keeps checkpoint data durable, and preserves notify watcher
metadata while treating sandbox-only PIDs as non-recoverable.
- Persist `pid_scope` in `tools/process_registry.py` and skip
recovering sandbox-backed entries without a host-visible PID handle
- Refresh detached sessions on access so `get`/`poll`/`wait` and active
session queries observe exited processes instead of hanging forever
- Allow recovered host PIDs to be terminated honestly and requeue
`notify_on_complete` watchers during checkpoint recovery
- Add regression tests for durable checkpoints, detached exit/kill
behavior, sandbox skip logic, and recovered notify watchers
- DEFAULT_RESULT_SIZE_CHARS: 50K -> 100K (match current _LARGE_RESULT_CHARS)
- DEFAULT_PREVIEW_SIZE_CHARS: 2K -> 1.5K (match current _LARGE_RESULT_PREVIEW_CHARS)
- Per-tool overrides all set to 100K (terminal, execute_code, search_files)
- Remove pre-read byte guard (no behavioral regression vs current main)
- Revert limit signature change to int=500 (match current default)
- Restore original read_file schema description
- Update test assertions to match 100K thresholds
Add BudgetConfig dataclass to centralize and make overridable the
hardcoded constants (50K per-result, 200K per-turn, 2K preview) that
control when tool outputs get persisted to sandbox. Configurable at
the RL environment level via HermesAgentEnvConfig fields, threaded
through HermesAgentLoop to the storage layer.
Resolution: pinned (read_file=inf) > env config overrides > registry
per-tool > default. CLI override: --env.turn_budget_chars 80000
_deliver_result() now returns Optional[str] — None on success, error
message on failure. All failure paths (unknown platform, platform
disabled, config load error, send failure, unresolvable target)
return descriptive error strings.
mark_job_run() gains delivery_error param, tracked as
last_delivery_error on the job — separate from agent execution errors.
A job where the agent succeeded but delivery failed shows
last_status='ok' + last_delivery_error='...'.
The cronjob list tool now surfaces last_delivery_error so agents and
users can see when cron outputs aren't arriving.
Inspired by PR #5863 (oxngon) — reimplemented with proper wiring.
Tests: 3 new mark_job_run tests + 6 new _deliver_result return tests.
- The MCP SDK Pydantic model uses camelCase (structuredContent), not
snake_case (structured_content). The original getattr was a silent no-op.
- When structuredContent is present, return it AS the result instead of
alongside text — the structured payload is the machine-readable data.
- Move test file to tests/tools/ and fix fake class to use camelCase.
- Patch _run_on_mcp_loop in tests so the handler actually executes.
MCP CallToolResult may include structured_content (a JSON object) alongside
content blocks. The tool handler previously only forwarded concatenated text
from content blocks, silently dropping the structured payload.
This breaks MCP tools that return a minimal human text in content while
putting the actual machine-usable payload in structured_content.
Now, when structured_content is present, it is included in the returned
JSON under the 'structuredContent' key.
FixesNousResearch/hermes-agent#5874
- Add 7 unit tests for _profile_arg: default home, named profile,
hash path, nested path, invalid name, systemd integration, launchd integration
- Add stt.local.language to config.yaml (empty = auto-detect)
- Both STT code paths now read config.yaml first, env var fallback,
then default (auto-detect for faster-whisper, 'en' for CLI command)
- HERMES_LOCAL_STT_LANGUAGE env var still works as backward-compat fallback
- gateway/run.py: Strip "(The user sent a message with no text content)"
placeholder when voice transcription succeeds — it was being appended
alongside the transcript, creating duplicate user turns.
- tools/transcription_tools.py: Wire HERMES_LOCAL_STT_LANGUAGE env var
into the faster-whisper backend. It was only used by the CLI fallback
path (_transcribe_local_command), not the primary faster-whisper path.
Replace 10 callsites across 6 files that manually opened config.yaml,
called yaml.safe_load(), and handled missing-file/parse-error fallbacks
with the new read_raw_config() helper from hermes_cli/config.py.
Each migrated site previously had 5-8 lines of boilerplate:
config_path = get_hermes_home() / 'config.yaml'
if config_path.exists():
import yaml
with open(config_path) as f:
cfg = yaml.safe_load(f) or {}
Now reduced to:
from hermes_cli.config import read_raw_config
cfg = read_raw_config()
Migrated files:
- tools/browser_tool.py (4 sites): command_timeout, cloud_provider,
allow_private_urls, record_sessions
- tools/env_passthrough.py: terminal.env_passthrough
- tools/credential_files.py: terminal.credential_files
- tools/transcription_tools.py: stt.model
- hermes_cli/commands.py: config-gated command resolution
- hermes_cli/auth.py (2 sites): model config read + provider reset
Skipped (intentionally):
- gateway/run.py: 10+ sites with local aliases, critical path
- hermes_cli/profiles.py: profile-specific config path
- hermes_cli/doctor.py: reads raw then writes fixes back
- agent/model_metadata.py: different file (context_length_cache.yaml)
- tools/website_policy.py: custom config_path param + error types
* refactor: re-architect tests to mirror the codebase
* Update tests.yml
* fix: add missing tool_error imports after registry refactor
* fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist
patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers,
causing test_code_execution tests to hit the Modal remote path.
* fix(tests): fix update_check and telegram xdist failures
- test_update_check: replace patch("hermes_cli.banner.os.getenv") with
monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os
directly, it uses get_hermes_home() from hermes_constants.
- test_telegram_conflict/approval_buttons: provide real exception classes
for telegram.error mock (NetworkError, TimedOut, BadRequest) so the
except clause in connect() doesn't fail with "catching classes that do
not inherit from BaseException" when xdist pollutes sys.modules.
* fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock
16 callsites across 14 files were re-deriving the hermes home path
via os.environ.get('HERMES_HOME', ...) instead of using the canonical
get_hermes_home() from hermes_constants. This breaks profiles — each
profile has its own HERMES_HOME, and the inline fallback defaults to
~/.hermes regardless.
Fixed by importing and calling get_hermes_home() at each site. For
files already inside the hermes process (agent/, hermes_cli/, tools/,
gateway/, plugins/), this is always safe. Files that run outside the
process context (mcp_serve.py, mcp_oauth.py) already had correct
try/except ImportError fallbacks and were left alone.
Skipped: hermes_constants.py (IS the implementation), env_loader.py
(bootstrap), profiles.py (intentionally manipulates the env var),
standalone scripts (optional-skills/, skills/), and tests.
Comprehensive cleanup across 80 files based on automated (ruff, pyflakes, vulture)
and manual analysis of the entire codebase.
Changes by category:
Unused imports removed (~95 across 55 files):
- Removed genuinely unused imports from all major subsystems
- agent/, hermes_cli/, tools/, gateway/, plugins/, cron/
- Includes imports in try/except blocks that were truly unused
(vs availability checks which were left alone)
Unused variables removed (~25):
- Removed dead variables: connected, inner, channels, last_exc,
source, new_server_names, verify, pconfig, default_terminal,
result, pending_handled, temperature, loop
- Dropped unused argparse subparser assignments in hermes_cli/main.py
(12 instances of add_parser() where result was never used)
Dead code removed:
- run_agent.py: Removed dead ternary (None if False else None) and
surrounding unreachable branch in identity fallback
- run_agent.py: Removed write-only attribute _last_reported_tool
- hermes_cli/providers.py: Removed dead @property decorator on
module-level function (decorator has no effect outside a class)
- gateway/run.py: Removed unused MCP config load before reconnect
- gateway/platforms/slack.py: Removed dead SessionSource construction
Undefined name bugs fixed (would cause NameError at runtime):
- batch_runner.py: Added missing logger = logging.getLogger(__name__)
- tools/environments/daytona.py: Added missing Dict and Path imports
Unnecessary global statements removed (14):
- tools/terminal_tool.py: 5 functions declared global for dicts
they only mutated via .pop()/[key]=value (no rebinding)
- tools/browser_tool.py: cleanup thread loop only reads flag
- tools/rl_training_tool.py: 4 functions only do dict mutations
- tools/mcp_oauth.py: only reads the global
- hermes_time.py: only reads cached values
Inefficient patterns fixed:
- startswith/endswith tuple form: 15 instances of
x.startswith('a') or x.startswith('b') consolidated to
x.startswith(('a', 'b'))
- len(x)==0 / len(x)>0: 13 instances replaced with pythonic
truthiness checks (not x / bool(x))
- in dict.keys(): 5 instances simplified to in dict
- Redefined unused name: removed duplicate _strip_mdv2 import in
send_message_tool.py
Other fixes:
- hermes_cli/doctor.py: Replaced undefined logger.debug() with pass
- hermes_cli/config.py: Consolidated chained .endswith() calls
Test results: 3934 passed, 17 failed (all pre-existing on main),
19 skipped. Zero regressions.
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
* feat: notify_on_complete for background processes
When terminal(background=true, notify_on_complete=true), the system
auto-triggers a new agent turn when the process exits — no polling needed.
Changes:
- ProcessSession: add notify_on_complete field
- ProcessRegistry: add completion_queue, populate on _move_to_finished()
- Terminal tool: add notify_on_complete parameter to schema + handler
- CLI: drain completion_queue after agent turn AND during idle loop
- Gateway: enhanced _run_process_watcher injects synthetic MessageEvent
on completion, triggering a full agent turn
- Checkpoint persistence includes notify_on_complete for crash recovery
- code_execution_tool: block notify_on_complete in sandbox scripts
- 15 new tests covering queue mechanics, checkpoint round-trip, schema
* docs: update terminal tool descriptions for notify_on_complete
- background: remove 'ONLY for servers' language, describe both patterns
(long-lived processes AND long-running tasks with notify_on_complete)
- notify_on_complete: more prescriptive about when to use it
- TERMINAL_TOOL_DESCRIPTION: remove 'Do NOT use background for builds'
guidance that contradicted the new feature
Selectively cherry-picked from PR #5501 by MestreY0d4-Uninter.
- Add _resolve_workspace_hint() to detect parent's working directory
- Inject WORKSPACE PATH into child system prompts
- Add rule: never assume /workspace/ container paths
- Excludes the cli.py queue-busy-input changes from the original PR
Cherry-picked from PR #5580 by MestreY0d4-Uninter.
- Share parent's credential pool with child agents for key rotation
- Leasing layer spreads parallel children across keys (least-loaded)
- Thread-safe acquire_lease/release_lease in CredentialPool
- Reverted sneaked-in tool-name restoration change (kept original
getattr + isinstance guard pattern)
Two fixes:
1. Replace all stale 'hermes login' references with 'hermes auth' across
auth.py, auxiliary_client.py, delegate_tool.py, config.py, run_agent.py,
and documentation. The 'hermes login' command was deprecated; 'hermes auth'
now handles OAuth credential management.
2. Fix credential removal not persisting for singleton-sourced credentials
(device_code for openai-codex/nous, hermes_pkce for anthropic).
auth_remove_command already cleared env vars for env-sourced credentials,
but singleton credentials stored in the auth store were re-seeded by
_seed_from_singletons() on the next load_pool() call. Now clears the
underlying auth store entry when removing singleton-sourced credentials.