An agent session killed the systemd-managed gateway (PID 1605) and restarted
it with '&disown', taking it outside systemd's Restart= management. When the
orphaned process later received SIGTERM, nothing restarted it.
Add dangerous command patterns to detect:
- 'gateway run' with & (background), disown, nohup, or setsid
- These should use 'systemctl --user restart hermes-gateway' instead
Also applied directly to main repo and fixed the systemd service:
- Changed Restart=on-failure to Restart=always (clean SIGTERM = exit 0 = not
a 'failure', so on-failure never triggered)
- RestartSec=10 for reasonable restart delay
Path('~/.hermes/image.png').is_file() returns False because Path
doesn't expand tilde. This caused the tool to fall through to URL
validation, which also failed, producing a confusing error:
'Invalid image source. Provide an HTTP/HTTPS URL or a valid local
file path.'
Fix: use os.path.expanduser() before constructing the Path object.
Added two tests for tilde expansion (success and nonexistent file).
* fix(mcp-oauth): port mismatch, path traversal, and shared state in OAuth flow
Three bugs in the new MCP OAuth 2.1 PKCE implementation:
1. CRITICAL: OAuth redirect port mismatch — build_oauth_auth() calls
_find_free_port() to register the redirect_uri, but _wait_for_callback()
calls _find_free_port() again getting a DIFFERENT port. Browser redirects
to port A, server listens on port B — callback never arrives, 120s timeout.
Fix: share the port via module-level _oauth_port variable.
2. MEDIUM: Path traversal via unsanitized server_name — HermesTokenStorage
uses server_name directly in filenames. A name like "../../.ssh/config"
writes token files outside ~/.hermes/mcp-tokens/.
Fix: sanitize server_name with the same regex pattern used elsewhere.
3. MEDIUM: Class-level auth_code/state on _CallbackHandler causes data
races if concurrent OAuth flows run. Second callback overwrites first.
Fix: factory function _make_callback_handler() returns a handler class
with a closure-scoped result dict, isolating each flow.
* test: add tests for MCP OAuth path traversal, handler isolation, and port sharing
7 new tests covering:
- Path traversal blocked (../../.ssh/config stays in mcp-tokens/)
- Dots/slashes sanitized and resolved within base dir
- Normal server names preserved
- Special characters sanitized (@, :, /)
- Concurrent handler result dicts are independent
- Handler writes to its own result dict, not class-level
- build_oauth_auth stores port in module-level _oauth_port
---------
Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>
When session_search is called without a query (or with an empty query),
it now returns metadata for the most recent sessions instead of erroring.
This lets the agent quickly see what was worked on recently without
needing specific keywords.
Returns for each session: session_id, title, source, started_at,
last_active, message_count, preview (first user message).
Zero LLM cost — pure DB query. Current session lineage and child
delegation sessions are excluded.
The agent can then keyword-search specific sessions if it needs
deeper context from any of them.
Models occasionally copy ANSI escape sequences from terminal output
or display formatting into file content, breaking shebangs and
injecting binary characters into scripts.
Strip ANSI codes (CSI, OSC, simple escapes) from:
- write_file content
- patch old_string, new_string, and V4A patch content
The check is fast (skips entirely if no ESC byte present).
Reported by Andi Jaeger.
Reads auxiliary.vision.timeout from config.yaml (default: 30s) and
passes it to async_call_llm. Useful for slow local vision models
that need more than 30 seconds.
Setting is in config.yaml (not .env) since it's not a secret:
auxiliary:
vision:
timeout: 120
Based on PR #2306.
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Add hermes mcp add/remove/list/test/configure CLI for managing MCP
server connections interactively. Discovery-first 'add' flow connects,
discovers tools, and lets users select which to enable via curses checklist.
Add OAuth 2.1 PKCE authentication for MCP HTTP servers (RFC 7636).
Supports browser-based and manual (headless) authorization, token
caching with 0600 permissions, automatic refresh. Zero external deps.
Add ${ENV_VAR} interpolation in MCP server config values, resolved
from os.environ + ~/.hermes/.env at load time.
Core OAuth module from PR #2021 by @imnotdev25. CLI and mcp_tool
wiring rewritten against current main. Closes#497, #690.
When sounddevice is installed but libportaudio2 is not present on the
system, the OSError was caught together with ImportError and showed a
generic 'pip install sounddevice' message that sent users down the wrong
path.
Split the except clause to give a clear, actionable message for the
OSError case, including the correct apt/brew commands to install the
system library.
* fix: respect DashScope v1 runtime mode for alibaba
Remove the hardcoded Alibaba branch from resolve_runtime_provider()
that forced api_mode='anthropic_messages' regardless of the base URL.
Alibaba now goes through the generic API-key provider path, which
auto-detects the protocol from the URL:
- /apps/anthropic → anthropic_messages (via endswith check)
- /v1 → chat_completions (default)
This fixes Alibaba setup with OpenAI-compatible DashScope endpoints
(e.g. coding-intl.dashscope.aliyuncs.com/v1) that were broken because
runtime always forced Anthropic mode even when setup saved a /v1 URL.
Based on PR #2024 by @kshitijk4poor.
* docs(skill): add split, merge, search examples to ocr-and-documents skill
Adds pymupdf examples for PDF splitting, merging, and text search
to the existing ocr-and-documents skill. No new dependencies — pymupdf
already covers all three operations natively.
* fix: replace all production print() calls with logger in rl_training_tool
Replace all bare print() calls in production code paths with proper logger calls.
- Add `import logging` and module-level `logger = logging.getLogger(__name__)`
- Replace print() in _start_training_run() with logger.info()
- Replace print() in _stop_training_run() with logger.info()
- Replace print(Warning/Note) calls with logger.warning() and logger.info()
Using the logging framework allows log level filtering, proper formatting,
and log routing instead of always printing to stdout.
---------
Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>
Co-authored-by: memosr.eth <96793918+memosr@users.noreply.github.com>
Parse thread_id from explicit deliver target (e.g. telegram:-1003724596514:17)
and forward it to _send_to_platform and mirror_to_session.
Previously _resolve_delivery_target() always set thread_id=None when
parsing the platform:chat_id format, breaking cron job delivery to
specific Telegram topics.
Added tests:
- test_explicit_telegram_topic_target_with_thread_id
- test_explicit_telegram_chat_id_without_thread_id
Also updated CRONJOB_SCHEMA deliver description to document the
platform:chat_id:thread_id format.
Co-authored-by: Alex Ferrari <alex@thealexferrari.com>
Cherry-picked from PR #2122 by @AtlasMeridia.
1. do_inspect bytes crash: bundle.files returns bytes for official
skills, .split() expected str. Added decode guard.
2. GitHub redirects: three httpx.get calls missing follow_redirects=True,
causing silent 301 failures on renamed orgs.
3. Skill discovery fallback: scan repo root directories when standard
paths (skills/, .agents/skills/, .claude/skills/) miss.
4. tap list KeyError: t['repo'] crashes for local taps. Use safe .get().
Changes the policy for agent-created skills with critical security
findings from 'block' (silently rejected) to 'ask' (allowed with
warning logged). The agent created the skill, so blocking it entirely
is too aggressive — let it through but log the findings.
- Policy: agent-created dangerous changed from block to ask
- should_allow_install returns None for 'ask' (vs True/False)
- format_scan_report shows 'NEEDS CONFIRMATION' for ask
- skill_manager_tool.py caller handles None (allows with warning)
- force=True still overrides as before
Based on PR #2271 by redhelix (closed — 3200 lines of unrelated
Mission Control code excluded).
Two fixes:
1. Use a single open(os.devnull) handle for both stdout and stderr
suppression, preventing a file handle leak if the second open() fails.
2. Set server_sock = None after closing it in the try block to prevent
the finally block from closing it again (causing an OSError).
Closes#2136
Co-authored-by: dieutx <dangtc94@gmail.com>
Cherry-picked from PR #2201 by @Gutslabs.
session_search resolved hits to parent/root sessions but only excluded
the exact current_session_id. If the active session was a child
continuation (compression/delegation), its parent could still appear
as a 'past' conversation result.
Fix: resolve current_session_id to its lineage root before filtering,
so the entire active lineage (parent and children) is excluded.
Cherry-picked from PR #2169 by @0xbyt4.
1. _strip_provider_prefix: skip Ollama model:tag names (qwen:0.5b)
2. Fuzzy match: remove reverse direction that made claude-sonnet-4
resolve to 1M instead of 200K
3. _has_content_after_think_block: reuse _strip_think_blocks() to
handle all tag variants (thinking, reasoning, REASONING_SCRATCHPAD)
4. models.dev lookup: elif→if so nous provider also queries models.dev
5. Disk cache fallback: use 5-min TTL instead of full hour so network
is retried soon
6. Delegate build: wrap child construction in try/finally so
_last_resolved_tool_names is always restored on exception
Matrix, Mattermost, Home Assistant, and DingTalk were missing from the
platform_map in both cron/scheduler.py and tools/send_message_tool.py,
causing delivery to those platforms to silently fail.
Also updates the cronjob tool schema description to list all available
delivery targets so the model knows its options.
Cron jobs run unattended with no user present. Previously the agent had
send_message and clarify tools available, which makes no sense — the
final response is auto-delivered, and there's nobody to ask questions to.
Changes:
- Disable messaging and clarify toolsets for cron agent sessions
- Update cron platform hint to emphasize autonomous execution: no user
present, cannot ask questions, must execute fully and make decisions
- Update cronjob tool schema description to match (remove stale
send_message guidance)
Authored by Hanai. Allows overriding the OpenAI TTS endpoint via
tts.openai.base_url in config.yaml for self-hosted or OpenAI-compatible
TTS services. Falls back to api.openai.com when not set.
find_one is being deprecated. Primary lookup now uses get() with a
deterministic sandbox name (hermes-{task_id}). A legacy fallback via
list(labels=...) ensures sandboxes created before this migration are
still resumable.
The merge at e7844e9c re-introduced a line in _build_child_agent() that
references _saved_tool_names — a variable only defined in _run_single_child().
This caused NameError on every delegate_task call, completely breaking
subagent delegation.
Moves the child._delegate_saved_tool_names assignment to _run_single_child()
where _saved_tool_names is actually defined, keeping the save/restore in the
same scope as the try/finally block.
Adds two regression tests from PR #2038 (YanSte).
Also fixes the same issue reported in PR #2048 (Gutslabs).
Co-authored-by: Yannick Stephan <yannick.stephan@gmail.com>
Co-authored-by: Guts <gutslabs@users.noreply.github.com>
Allow users to configure a custom base_url for the OpenAI TTS provider
in ~/.hermes/config.yaml under tts.openai.base_url. Defaults to the
official OpenAI endpoint. Enables use of self-hosted or OpenAI-compatible
TTS services (e.g. http://localhost:8000/v1).
Also adds a TTS configuration example block to cli-config.yaml.example.
* fix: banner skill count now respects disabled skills and platform filtering
The banner's get_available_skills() was doing a raw rglob scan of
~/.hermes/skills/ without checking:
- Whether skills are disabled (skills.disabled config)
- Whether skills match the current platform (platforms: frontmatter)
This caused the banner to show inflated skill counts (e.g. '100 skills'
when many are disabled) and list macOS-only skills on Linux.
Fix: delegate to _find_all_skills() from tools/skills_tool which already
handles both platform gating and disabled-skill filtering.
* fix: system prompt and slash commands now respect disabled skills
Two more places where disabled skills were still surfaced:
1. build_skills_system_prompt() in prompt_builder.py — disabled skills
appeared in the <available_skills> system prompt section, causing
the agent to suggest/load them despite being disabled.
2. scan_skill_commands() in skill_commands.py — disabled skills still
registered as /skill-name slash commands in CLI help and could be
invoked.
Both now load _get_disabled_skill_names() and filter accordingly.
* fix: skill_view blocks disabled skills
skill_view() checked platform compatibility but not disabled state,
so the agent could still load and read disabled skills directly.
Now returns a clear error when a disabled skill is requested, telling
the user to enable it via hermes skills or inspect the files manually.
---------
Co-authored-by: Test <test@test.com>
Each configured MCP server now registers as its own toolset in TOOLSETS
(e.g. TOOLSETS['github'] = {tools: ['mcp_github_list_files', ...]}),
making raw server names resolvable in platform_toolsets overrides.
Previously MCP tools were only injected into hermes-* umbrella toolsets,
so gateway sessions using raw toolset names like ['terminal', 'github']
in platform_toolsets couldn't resolve MCP tools.
Skips server names that collide with built-in toolsets. Also handles
idempotent reloads (syncs toolsets even when no new servers connect).
Inspired by PR #1876 by @kshitijk4poor.
Adds 2 tests (standalone toolset creation + built-in collision guard).
MiniMax: Add M2.7 and M2.7-highspeed as new defaults across provider
model lists, auxiliary client, metadata, setup wizard, RL training tool,
fallback tests, and docs. Retain M2.5/M2.1 as alternatives.
OpenRouter: Add grok-4.20-beta, nemotron-3-super-120b-a12b:free,
trinity-large-preview:free, glm-5-turbo, and hunter-alpha to the
model catalog.
MiniMax changes based on PR #1882 by @octo-patch (applied manually
due to stale conflicts in refactored pricing module).
Fixes#1802
The v0.3.0 refactor split child agent construction (_build_child_agent)
and execution (_run_single_child) into separate functions. This created
a scope bug where _saved_tool_names was defined in _build_child_agent
but referenced in _run_single_child's finally block, causing a NameError
on every delegate_task call.
Solution: Move the save/restore logic entirely into _run_single_child,
keeping the save and restore in the same scope as the try/finally block.
This is cleaner than passing the variable through and removes the dead
save from _build_child_agent.
Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.
This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Agent-created skills were using the same policy as community hub
installs, blocking any skill with medium/high severity findings
(e.g. docker pull, pip install, git clone). This meant the agent
couldn't create skills that reference Docker or other common tools.
Changed agent-created policy from (allow, block, block) to
(allow, allow, block) — matching the trusted policy. Caution-level
findings (medium/high severity) are now allowed through, while
dangerous findings (critical severity like exfiltration, prompt
injection, reverse shells) remain blocked.
Added 4 tests covering the agent-created policy: safe allowed,
caution allowed, dangerous blocked, force override.
* fix: NameError in OpenCode provider setup (prompt_text -> prompt)
The OpenCode Zen and OpenCode Go setup sections used prompt_text()
which is undefined. All other providers correctly use the local
prompt() function defined in setup.py. Fixes crash during
'hermes setup' when selecting either OpenCode provider.
* fix: Telegram streaming — config bridge, not-modified, flood control
Three fixes for gateway streaming:
1. Bridge streaming config from config.yaml into gateway runtime.
load_gateway_config() now reads the 'streaming' key from config.yaml
(same pattern as session_reset, stt, etc.), matching the docs.
Previously only gateway.json was read.
2. Handle 'Message is not modified' in Telegram edit_message().
This Telegram API error fires when editing with identical content —
a no-op, not a real failure. Previously it returned success=False
which made the stream consumer disable streaming entirely.
3. Handle RetryAfter / flood control in Telegram edit_message().
Fast providers can hit Telegram rate limits during streaming.
Now waits the requested retry_after duration and retries once,
instead of treating it as a fatal edit failure.
Also fixed double-edit on stream finish: the consumer now tracks
last-sent text and skips redundant edits, preventing the not-modified
error at the source.
* refactor: make config.yaml the primary gateway config source
Eliminates the per-key bridge pattern in load_gateway_config().
Previously gateway.json was the primary source and each config.yaml
key needed an individual bridge — easy to forget (streaming was
missing, causing garl4546's bug).
Now config.yaml is read first and its keys are mapped directly into
the GatewayConfig.from_dict() schema. gateway.json is kept as a
legacy fallback layer (loaded first, then overwritten by config.yaml
keys). If gateway.json exists, a log message suggests migrating.
Also:
- Removed dead save_gateway_config() (never called anywhere)
- Updated CLI help text and send_message error to reference
config.yaml instead of gateway.json
---------
Co-authored-by: Test <test@test.com>
Save and restore the process-global _last_resolved_tool_names in
_run_single_child() so the parent's execute_code sandbox generates
correct tool imports after delegation completes.
The global was already mostly mitigated (run_agent.py passes
enabled_tools via self.valid_tool_names), but the global itself
remained corrupted — a footgun for any code that reads it directly.
Co-authored-by: shane9coy <shane9coy@users.noreply.github.com>
* fix(session): skip corrupt lines in load_transcript instead of crashing
Wrap json.loads() in load_transcript() with try/except JSONDecodeError
so that partial JSONL lines (from mid-write crashes like OOM/SIGKILL)
are skipped with a warning instead of crashing the entire transcript
load. The rest of the history loads fine.
Adds a logger.warning with the session ID and truncated corrupt line
content for debugging visibility.
Salvaged from PR #1193 by alireza78a.
Closes#1193
* fix(stt): respect explicit provider config instead of env-var fallback
Rework _get_provider() to separate explicit config from auto-detect.
When stt.provider is explicitly set in config.yaml, that choice is
authoritative — no silent cross-provider fallback based on which env
vars happen to be set. When no provider is configured, auto-detect
still tries: local > groq > openai.
This fixes the reported scenario where provider: local + a placeholder
OPENAI_API_KEY caused the system to silently select OpenAI and fail
with a 401.
Closes#1774
1. browser_tool.py: Replace **args spread on browser_click, browser_type,
and browser_scroll handlers with explicit parameter extraction. The
**args pattern passed all dict keys as keyword arguments, causing
TypeError if the LLM sent unexpected parameters. Now extracts only
the expected params (ref, text, direction) with safe defaults.
2. fuzzy_match.py: Update module docstring to match actual strategy
order in code. Block anchor was listed as #3 but is actually #7.
Multi-occurrence is not a separate strategy but a flag. Updated
count from 9 to 8.
Salvage of PR #1707 by @kshitijk4poor (cherry-picked with authorship preserved).
Adds Tavily as a third web backend alongside Firecrawl and Parallel, using the Tavily REST API via httpx.
- Backend selection via hermes tools → saved as web.backend in config.yaml
- All three tools supported: search, extract, crawl
- TAVILY_API_KEY in config registry, doctor, status, setup wizard
- 15 new Tavily tests + 9 backend selection tests + 5 config tests
- Backward compatible
Closes#1707
Two concurrent gateway sessions calling memory add/replace/remove
simultaneously could both read the old state, apply their changes
independently, and write — the last writer silently drops the first
writer's entry.
Fix: wrap each mutation in a file lock (fcntl.flock on a .lock file).
Under the lock, re-read entries from disk to get the latest state,
apply the mutation, then write. This ensures concurrent writers
serialize properly.
The lock uses a separate .lock file since the memory file itself is
atomically replaced via os.replace() (can't flock a replaced file).
Readers remain lock-free since atomic rename ensures they always see
a complete file.
Two concurrent threads (e.g. parallel subagents) could both pass the
'task_id in _active_sessions' check, both create cloud sessions via
network calls, and then one would overwrite the other — leaking the
first cloud session.
Add double-check after the lock is re-acquired: if another thread
already created a session while we were doing the network call, use
the existing one instead of orphaning it.
* feat(web): add Parallel as alternative web search/extract backend
Adds Parallel (parallel.ai) as a drop-in alternative to Firecrawl for
web_search and web_extract tools using the official parallel-web SDK.
- Backend selection via WEB_SEARCH_BACKEND env var (auto/parallel/firecrawl)
- Auto mode prefers Firecrawl when both keys present; Parallel when sole backend
- web_crawl remains Firecrawl-only with clear error when unavailable
- Lazy SDK imports, interrupt support, singleton clients
- 16 new unit tests for backend selection and client config
Co-authored-by: s-jag <s-jag@users.noreply.github.com>
* fix: add PARALLEL_API_KEY to config registry and fix web_crawl policy tests
Follow-up for Parallel backend integration:
- Add PARALLEL_API_KEY to OPTIONAL_ENV_VARS (hermes doctor, env blocklist)
- Add to set_config_value api_keys list (hermes config set)
- Add to doctor keys display
- Fix 2 web_crawl policy tests that didn't set FIRECRAWL_API_KEY
(needed now that web_crawl has a Firecrawl availability guard)
* refactor: explicit backend selection via hermes tools, not auto-detect
Replace the auto-detect backend selection with explicit user choice:
- hermes tools saves WEB_SEARCH_BACKEND to .env when user picks a provider
- _get_backend() reads the explicit choice first
- Fallback only for manual/legacy config (uses whichever key is present)
- _is_provider_active() shows [active] for the selected web backend
- Updated tests, docs, and .env.example to remove 'auto' mode language
* refactor: use config.yaml for web backend, not env var
Match the TTS/browser pattern — web.backend is stored in config.yaml
(set by hermes tools), not as a WEB_SEARCH_BACKEND env var.
- _load_web_config() reads web: section from config.yaml
- _get_backend() reads web.backend from config, falls back to key detection
- _configure_provider() saves to config dict (saved to config.yaml)
- _is_provider_active() reads from config dict
- Removed WEB_SEARCH_BACKEND from .env.example, set_config_value, docs
- Updated all tests to mock _load_web_config instead of env vars
---------
Co-authored-by: s-jag <s-jag@users.noreply.github.com>
When container_persistent=false, the inner mini-swe-agent cleanup only
runs 'docker stop' in the background, leaving containers in Exited state.
Now cleanup() also runs 'docker rm -f' to fully remove the container.
Also fixes pre-existing test failures in model_metadata (gpt-4.1 1M context),
setup tests (TTS provider step), and adds MockInnerDocker.cleanup().
Original fix by crazywriter1. Cherry-picked and adapted for current main.
Fixes#1679
* feat: interactive MCP tool configuration in hermes tools
Add the ability to selectively enable/disable individual MCP server
tools through the interactive 'hermes tools' TUI.
Changes:
- tools/mcp_tool.py: Add probe_mcp_server_tools() — lightweight function
that temporarily connects to configured MCP servers, discovers their
tools (names + descriptions), and disconnects. No registry side effects.
- hermes_cli/tools_config.py: Add 'Configure MCP tools' option to the
interactive menu. When selected:
1. Probes all enabled MCP servers for their available tools
2. Shows a per-server curses checklist with tool descriptions
3. Pre-selects tools based on existing include/exclude config
4. Writes changes back as tools.exclude entries in config.yaml
5. Reports which servers failed to connect
The existing CLI commands (hermes tools enable/disable server:tool)
continue to work unchanged. This adds the interactive TUI counterpart
so users can browse and toggle MCP tools visually.
Tests: 22 new tests covering probe function edge cases and interactive
flow (pre-selection, exclude/include modes, description truncation,
multi-server handling, error paths).
* feat(telegram): auto-detect HTML tags and use parse_mode=HTML in send_message
When _send_telegram detects HTML tags in the message body, it now sends
with parse_mode='HTML' instead of converting to MarkdownV2. This allows
cron jobs and agents to send rich HTML-formatted Telegram messages with
bold, italic, code blocks, etc. that render correctly.
Detection uses the same regex from PR #1568 by @ashaney:
re.search(r'<[a-zA-Z/][^>]*>', message)
Plain-text and markdown messages continue through the existing
MarkdownV2 pipeline. The HTML fallback path also catches HTML parse
errors and falls back to plain text, matching the existing MarkdownV2
error handling.
Inspired by: github.com/ashaney — PR #1568
Salvaged from PR #1573 by @eren-karakus0. Cherry-picked with authorship preserved.
Fixes#1143 — background process notifications resume after gateway restart.
Co-authored-by: Muhammet Eren Karakuş <erenkar950@gmail.com>
Add the ability to selectively enable/disable individual MCP server
tools through the interactive 'hermes tools' TUI.
Changes:
- tools/mcp_tool.py: Add probe_mcp_server_tools() — lightweight function
that temporarily connects to configured MCP servers, discovers their
tools (names + descriptions), and disconnects. No registry side effects.
- hermes_cli/tools_config.py: Add 'Configure MCP tools' option to the
interactive menu. When selected:
1. Probes all enabled MCP servers for their available tools
2. Shows a per-server curses checklist with tool descriptions
3. Pre-selects tools based on existing include/exclude config
4. Writes changes back as tools.exclude entries in config.yaml
5. Reports which servers failed to connect
The existing CLI commands (hermes tools enable/disable server:tool)
continue to work unchanged. This adds the interactive TUI counterpart
so users can browse and toggle MCP tools visually.
Tests: 22 new tests covering probe function edge cases and interactive
flow (pre-selection, exclude/include modes, description truncation,
multi-server handling, error paths).
- Default enabled: false (zero overhead when not configured)
- Fast path: cached disabled state skips all work immediately
- TTL cache (30s) for parsed policy — avoids re-reading config.yaml
on every URL check
- Missing shared files warn + skip instead of crashing all web tools
- Lazy yaml import — missing PyYAML doesn't break browser toolset
- Guarded browser_tool import — fail-open lambda fallback
- check_website_access never raises for default path (fail-open with
warning log); only raises with explicit config_path (test mode)
- Simplified enforcement code in web_tools/browser_tool — no more
try/except wrappers since errors are handled internally
Extract the repeated line-position calculation pattern into a
_calculate_line_positions() helper. The same 4-line pattern was
duplicated across _strategy_trimmed_boundary, _strategy_block_anchor,
_strategy_context_aware, and _find_normalized_matches. Also
standardizes the end_pos clamping (some sites used min(), some used
an if-guard).
Based on PR #1604 by aydnOktay.
Co-authored-by: aydnOktay <aydnOktay@users.noreply.github.com>
Add inference.sh CLI (infsh) as a tool integration, giving agents
access to 150+ AI apps through a single CLI — image gen (FLUX, Reve,
Seedream), video (Veo, Wan, Seedance), LLMs, search (Tavily, Exa),
3D, avatar/lipsync, and more. One API key manages all services.
Tools:
- infsh: run any infsh CLI command (app list, app run, etc.)
- infsh_install: install the CLI if not present
Registered as an 'inference' toolset (opt-in, not in core tools).
Includes comprehensive skill docs with examples for all app categories.
Changes from original PR:
- NOT added to _HERMES_CORE_TOOLS (available via --toolsets inference)
- Added 12 tests covering tool registration, command execution,
error handling, timeout, JSON parsing, and install flow
Inspired by PR #1021 by @okaris.
Co-authored-by: okaris <okaris@users.noreply.github.com>
Cast path to str() before os.path.expanduser() to handle pathlib.Path
inputs safely.
Based on PR #1051 by JackTheGit.
Co-authored-by: JackTheGit <JackTheGit@users.noreply.github.com>
* fix: thread safety for concurrent subagent delegation
Four thread-safety fixes that prevent crashes and data races when
running multiple subagents concurrently via delegate_task:
1. Remove redirect_stdout/stderr from delegate_tool — mutating global
sys.stdout races with the spinner thread when multiple children start
concurrently, causing segfaults. Children already run with
quiet_mode=True so the redirect was redundant.
2. Split _run_single_child into _build_child_agent (main thread) +
_run_single_child (worker thread). AIAgent construction creates
httpx/SSL clients which are not thread-safe to initialize
concurrently.
3. Add threading.Lock to SessionDB — subagents share the parent's
SessionDB and call create_session/append_message from worker threads
with no synchronization.
4. Add _active_children_lock to AIAgent — interrupt() iterates
_active_children while worker threads append/remove children.
5. Add _client_cache_lock to auxiliary_client — multiple subagent
threads may resolve clients concurrently via call_llm().
Based on PR #1471 by peteromallet.
* feat: Honcho base_url override via config.yaml + quick command alias type
Two features salvaged from PR #1576:
1. Honcho base_url override: allows pointing Hermes at a remote
self-hosted Honcho deployment via config.yaml:
honcho:
base_url: "http://192.168.x.x:8000"
When set, this overrides the Honcho SDK's environment mapping
(production/local), enabling LAN/VPN Honcho deployments without
requiring the server to live on localhost. Uses config.yaml instead
of env var (HONCHO_URL) per project convention.
2. Quick command alias type: adds a new 'alias' quick command type
that rewrites to another slash command before normal dispatch:
quick_commands:
sc:
type: alias
target: /context
Supports both CLI and gateway. Arguments are forwarded to the
target command.
Based on PR #1576 by redhelix.
---------
Co-authored-by: peteromallet <peteromallet@users.noreply.github.com>
Co-authored-by: redhelix <redhelix@users.noreply.github.com>
Docker terminal sessions are secret-dark by default. This adds
terminal.docker_forward_env as an explicit allowlist for env vars
that may be forwarded into Docker containers.
Values resolve from the current shell first, then fall back to
~/.hermes/.env. Only variables the user explicitly lists are
forwarded — nothing is auto-exposed.
Cherry-picked from PR #1449 by @teknium1, conflict-resolved onto
current main.
Fixes#1436
Supersedes #1439
Remove the optional skill (redundant now that NeuTTS is a built-in TTS
provider). Replace neutts_cli dependency with a standalone synthesis
helper (tools/neutts_synth.py) that calls the neutts Python API directly
in a subprocess.
Add TTS provider selection to hermes setup:
- 'hermes setup' now prompts for TTS provider after model selection
- 'hermes setup tts' available as standalone section
- Selecting NeuTTS checks for deps and offers to install:
espeak-ng (system) + neutts[all] (pip)
- ElevenLabs/OpenAI selections prompt for API keys
- Tool status display shows NeuTTS install state
Changes:
- Remove optional-skills/mlops/models/neutts/ (skill + CLI scaffold)
- Add tools/neutts_synth.py (standalone synthesis subprocess helper)
- Move jo.wav/jo.txt to tools/neutts_samples/ (bundled default voice)
- Refactor _generate_neutts() — uses neutts API via subprocess, no
neutts_cli dependency, config-driven ref_audio/ref_text/model/device
- Add TTS setup to hermes_cli/setup.py (SETUP_SECTIONS, tool status)
- Update config.py defaults (ref_audio, ref_text, model, device)
search_files(target='files') now uses rg --files -g instead of find.
Ripgrep respects .gitignore, excludes hidden dirs by default, and has
parallel directory traversal — ~200x faster on wide trees (0.14s vs 34s
benchmarked on 164-repo tree).
Falls back to find when rg is unavailable, preserving hidden-dir
exclusion and BSD find compatibility.
Salvaged from PR #1464 by @light-merlin-dark (Merlin) — adapted to
preserve hidden-dir exclusion added since the original PR.
* fix(security): harden terminal safety and sandbox file writes
Two security improvements:
1. Dangerous command detection: expand shell -c pattern to catch
combined flags (bash -lc, bash -ic, ksh -c) that were previously
undetected. Pattern changed from matching only 'bash -c' to
matching any shell invocation with -c anywhere in the flags.
2. File write sandboxing: add HERMES_WRITE_SAFE_ROOT env var that
constrains all write_file/patch operations to a configured directory
tree. Opt-in — when unset, behavior is unchanged. Useful for
gateway/messaging deployments that should only touch a workspace.
Based on PR #1085 by ismoilh.
* fix: correct "POSIDEON" typo to "POSEIDON" in banner ASCII art
The poseidon skin's banner_logo had the E and I letters swapped,
spelling "POSIDEON-AGENT" instead of "POSEIDON-AGENT".
---------
Co-authored-by: ismoilh <ismoilh@users.noreply.github.com>
Co-authored-by: unmodeled-tyler <unmodeled.tyler@proton.me>
* fix: prevent infinite 400 failure loop on context overflow (#1630)
When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message. This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error. Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.
Three-layer fix:
1. run_agent.py — Fallback heuristic: when a 400 error has a very short
generic message AND the session is large (>40% of context or >80
messages), treat it as a probable context overflow and trigger
compression instead of aborting.
2. run_agent.py + gateway/run.py — Don't persist failed messages:
when the agent returns failed=True before generating any response,
skip writing the user's message to the transcript/DB. This prevents
the session from growing on each failure.
3. gateway/run.py — Smarter error messages: detect context-overflow
failures and suggest /compact or /reset specifically, instead of a
generic 'try again' that will fail identically.
* fix(skills): detect prompt injection patterns and block cache file reads
Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):
1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
(index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
was the original injection vector — untrusted skill descriptions
in the catalog contained adversarial text that the model executed.
2. skill_view: warns when skills are loaded from outside the trusted
~/.hermes/skills/ directory, and detects common injection patterns
in skill content ("ignore previous instructions", "<system>", etc.).
Cherry-picked from PR #1562 by ygd58.
* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)
Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.
- Apply truncate_message() chunking in _send_to_platform() before
dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement
Cherry-picked from PR #1557 by llbn.
* fix(approval): show full command in dangerous command approval (#1553)
Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:
- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests
Cherry-picked from PR #1566 by crazywriter1.
* fix(cli): flush stdout during agent loop to prevent macOS display freeze (#1624)
The interrupt polling loop in chat() waited on the queue without
invalidating the prompt_toolkit renderer. On macOS, the StdoutProxy
buffer only flushed on input events, causing the CLI to appear frozen
during tool execution until the user typed a key.
Fix: call _invalidate() on each queue timeout (every ~100ms, throttled
to 150ms) to force the renderer to flush buffered agent output.
* fix(claw): warn when API keys are skipped during OpenClaw migration (#1580)
When --migrate-secrets is not passed (the default), API keys like
OPENROUTER_API_KEY are silently skipped with no warning. Users don't
realize their keys weren't migrated until the agent fails to connect.
Add a post-migration warning with actionable instructions: either
re-run with --migrate-secrets or add the key manually via
hermes config set.
Cherry-picked from PR #1593 by ygd58.
* fix(security): block sandbox backend creds from subprocess env (#1264)
Add Modal and Daytona sandbox credentials to the subprocess env
blocklist so they're not leaked to agent terminal sessions via
printenv/env.
Cherry-picked from PR #1571 by ygd58.
---------
Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
* feat(skills): add bundled neutts optional skill
Add NeuTTS optional skill with CLI scaffold, bootstrap helper, and
sample voice profile. Also fixes skills_hub.py to handle binary
assets (WAV files) during skill installation.
Changes:
- optional-skills/mlops/models/neutts/ — skill + CLI scaffold
- tools/skills_hub.py — binary asset support (read_bytes, write_bytes)
- tests/tools/test_skills_hub.py — regression tests for binary assets
* feat(tts): add NeuTTS as local TTS provider backend
Add NeuTTS as a fourth TTS provider option alongside Edge, ElevenLabs,
and OpenAI. NeuTTS runs fully on-device via neutts_cli — no API key
needed.
Provider behavior:
- Explicit: set tts.provider to 'neutts' in config.yaml
- Fallback: when Edge TTS is unavailable and neutts_cli is installed,
automatically falls back to NeuTTS instead of failing
- check_tts_requirements() now includes NeuTTS in availability checks
NeuTTS outputs WAV natively. For Telegram voice bubbles, ffmpeg
converts to Opus (same pattern as Edge TTS).
Changes:
- tools/tts_tool.py — _generate_neutts(), _check_neutts_available(),
provider dispatch, fallback logic, Opus conversion
- hermes_cli/config.py — tts.neutts config defaults
---------
Co-authored-by: unmodeled-tyler <unmodeled.tyler@proton.me>
The primary injection vector in #1558 was search_files discovering
catalog cache files in .hub/index-cache/ via find or grep, which
don't skip hidden directories like ripgrep does by default.
Three-layer fix:
1. _search_files (find): add -not -path '*/.*' to exclude hidden
directories, matching ripgrep's default behavior.
2. _search_with_grep: add --exclude-dir='.*' to skip hidden
directories in the grep fallback path.
3. _write_index_cache: write a .ignore file to .hub/ so ripgrep
also skips it even when invoked with --hidden (belt-and-suspenders).
This makes all three search backends (rg, grep, find) consistently
exclude hidden directories, preventing the agent from discovering
and reading unvetted community content in hub cache files.
* fix: prevent infinite 400 failure loop on context overflow (#1630)
When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message. This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error. Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.
Three-layer fix:
1. run_agent.py — Fallback heuristic: when a 400 error has a very short
generic message AND the session is large (>40% of context or >80
messages), treat it as a probable context overflow and trigger
compression instead of aborting.
2. run_agent.py + gateway/run.py — Don't persist failed messages:
when the agent returns failed=True before generating any response,
skip writing the user's message to the transcript/DB. This prevents
the session from growing on each failure.
3. gateway/run.py — Smarter error messages: detect context-overflow
failures and suggest /compact or /reset specifically, instead of a
generic 'try again' that will fail identically.
* fix(skills): detect prompt injection patterns and block cache file reads
Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):
1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
(index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
was the original injection vector — untrusted skill descriptions
in the catalog contained adversarial text that the model executed.
2. skill_view: warns when skills are loaded from outside the trusted
~/.hermes/skills/ directory, and detects common injection patterns
in skill content ("ignore previous instructions", "<system>", etc.).
Cherry-picked from PR #1562 by ygd58.
* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)
Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.
- Apply truncate_message() chunking in _send_to_platform() before
dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement
Cherry-picked from PR #1557 by llbn.
* fix(approval): show full command in dangerous command approval (#1553)
Previously the command was truncated to 80 chars in CLI (with a
[v]iew full option), 500 chars in Discord embeds, and missing entirely
in Telegram/Slack approval messages. Now the full command is always
displayed everywhere:
- CLI: removed 80-char truncation and [v]iew full menu option
- Gateway (TG/Slack): approval_required message includes full command
in a code block
- Discord: embed shows full command up to 4096-char limit
- Windows: skip SIGALRM-based test timeout (Unix-only)
- Updated tests: replaced view-flow tests with direct approval tests
Cherry-picked from PR #1566 by crazywriter1.
---------
Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
Co-authored-by: crazywriter1 <53251494+crazywriter1@users.noreply.github.com>
* fix: prevent infinite 400 failure loop on context overflow (#1630)
When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message. This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error. Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.
Three-layer fix:
1. run_agent.py — Fallback heuristic: when a 400 error has a very short
generic message AND the session is large (>40% of context or >80
messages), treat it as a probable context overflow and trigger
compression instead of aborting.
2. run_agent.py + gateway/run.py — Don't persist failed messages:
when the agent returns failed=True before generating any response,
skip writing the user's message to the transcript/DB. This prevents
the session from growing on each failure.
3. gateway/run.py — Smarter error messages: detect context-overflow
failures and suggest /compact or /reset specifically, instead of a
generic 'try again' that will fail identically.
* fix(skills): detect prompt injection patterns and block cache file reads
Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):
1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
(index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
was the original injection vector — untrusted skill descriptions
in the catalog contained adversarial text that the model executed.
2. skill_view: warns when skills are loaded from outside the trusted
~/.hermes/skills/ directory, and detects common injection patterns
in skill content ("ignore previous instructions", "<system>", etc.).
Cherry-picked from PR #1562 by ygd58.
* fix(tools): chunk long messages in send_message_tool before dispatch (#1552)
Long messages sent via send_message tool or cron delivery silently
failed when exceeding platform limits. Gateway adapters handle this
via truncate_message(), but the standalone senders in send_message_tool
bypassed that entirely.
- Apply truncate_message() chunking in _send_to_platform() before
dispatching to individual platform senders
- Remove naive message[i:i+2000] character split in _send_discord()
in favor of centralized smart splitting
- Attach media files to last chunk only for Telegram
- Add regression tests for chunking and media placement
Cherry-picked from PR #1557 by llbn.
---------
Co-authored-by: buray <ygd58@users.noreply.github.com>
Co-authored-by: lbn <llbn@users.noreply.github.com>
* fix: prevent infinite 400 failure loop on context overflow (#1630)
When a gateway session exceeds the model's context window, Anthropic may
return a generic 400 invalid_request_error with just 'Error' as the
message. This bypassed the phrase-based context-length detection,
causing the agent to treat it as a non-retryable client error. Worse,
the failed user message was still persisted to the transcript, making
the session even larger on each attempt — creating an infinite loop.
Three-layer fix:
1. run_agent.py — Fallback heuristic: when a 400 error has a very short
generic message AND the session is large (>40% of context or >80
messages), treat it as a probable context overflow and trigger
compression instead of aborting.
2. run_agent.py + gateway/run.py — Don't persist failed messages:
when the agent returns failed=True before generating any response,
skip writing the user's message to the transcript/DB. This prevents
the session from growing on each failure.
3. gateway/run.py — Smarter error messages: detect context-overflow
failures and suggest /compact or /reset specifically, instead of a
generic 'try again' that will fail identically.
* fix(skills): detect prompt injection patterns and block cache file reads
Adds two security layers to prevent prompt injection via skills hub
cache files (#1558):
1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory
(index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json
was the original injection vector — untrusted skill descriptions
in the catalog contained adversarial text that the model executed.
2. skill_view: warns when skills are loaded from outside the trusted
~/.hermes/skills/ directory, and detects common injection patterns
in skill content ("ignore previous instructions", "<system>", etc.).
Cherry-picked from PR #1562 by ygd58.
---------
Co-authored-by: buray <ygd58@users.noreply.github.com>
The _send_telegram() function was sending raw markdown text without
parse_mode, causing bold, links, and headers to render as plain text.
This fix reuses the gateway adapter's format_message() to convert
markdown to Telegram's MarkdownV2 format, with a fallback to plain
text if parsing fails.
* fix(tools): remove unnecessary crontab requirement from cronjob tool
The hermes cron system is internal — it uses a JSON-based scheduler
ticked by the gateway (cron/scheduler.py), not system crontab.
The check for shutil.which('crontab') was preventing the cronjob tool
from being available in environments without crontab installed (e.g.
minimal Ubuntu containers).
Changes:
- Remove shutil.which('crontab') check from check_cronjob_requirements()
- Remove unused shutil import
- Update docstring to clarify internal scheduler is used
- Update tests to reflect new behavior and add coverage for all
session modes (interactive, gateway, exec_ask)
Fixes#1589
* test: add HERMES_EXEC_ASK coverage for cronjob requirements
Adds missing test for the exec_ask session mode, complementing
the cherry-picked fix from PR #1633.
---------
Co-authored-by: Bartok9 <bartokmagic@proton.me>
Introduce a cloud browser provider abstraction so users can switch
between Local Browser, Browserbase, and Browser Use (or future providers)
via hermes tools / hermes setup.
Cloud browser providers are behind an ABC (tools/browser_providers/base.py)
so adding a new provider is a single-file addition with no changes to
browser_tool.py internals.
Changes:
- tools/browser_providers/ package with ABC, Browserbase extraction,
and Browser Use provider
- browser_tool.py refactored to use _PROVIDER_REGISTRY + _get_cloud_provider()
(cached) instead of hardcoded _is_local_mode() / _create_browserbase_session()
- tools_config.py: generic _is_provider_active() / _detect_active_provider_index()
replace TTS-only logic; Browser Use added as third browser option
- config.py: BROWSER_USE_API_KEY added to OPTIONAL_ENV_VARS + show_config + allowlist
- subprocess pipe hang fix: agent-browser daemon inherits pipe fds,
communicate() blocks. Replaced with Popen + temp files.
Original PR: #1208
Co-authored-by: ShawnPana <shawnpana@users.noreply.github.com>
* fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting
Anthropic routes OAuth/subscription requests based on Claude Code's
identity markers. Without them, requests get intermittent 500 errors
(~25% failure rate observed). This matches what pi-ai (clawdbot) and
OpenCode both implement for OAuth compatibility.
Changes (OAuth tokens only — API key users unaffected):
1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli'
2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI'
3. System prompt sanitization: replace Hermes/Nous references
4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools)
5. Tool name stripping: remove 'mcp_' prefix from response tool calls
Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate)
After: 16/16 OK, 0 failures, 0 retries (0% error rate)
* fix: three gateway issues from user error logs
1. send_animation missing metadata kwarg (base.py)
- Base class send_animation lacked the metadata parameter that the
call site in base.py line 917 passes. Telegram's override accepted
it, but any platform without an override (Discord, Slack, etc.)
hit TypeError. Added metadata to base class signature.
2. MarkdownV2 split-inside-inline-code (base.py truncate_message)
- truncate_message could split at a space inside an inline code span
(e.g. `function(arg1, arg2)`), leaving an unpaired backtick and
unescaped parentheses in the chunk. Telegram rejects with
'character ( is reserved'. Added inline code awareness to the
split-point finder — detects odd backtick counts and moves the
split before the code span.
3. tirith auto-install without cosign (tirith_security.py)
- Previously required cosign on PATH for auto-install, blocking
install entirely with a warning if missing. Now proceeds with
SHA-256 checksum verification only when cosign is unavailable.
Cosign is still used for full supply chain verification when
present. If cosign IS present but verification explicitly fails,
install is still aborted (tampered release).
* fix(tools): improve error logging in code_execution_tool
* fix: harden execute_code cleanup and reduce logging noise
Follow-up to cherry-picked PR #1588 (aydnOktay):
- Initialize server_sock = None before try block to prevent NameError
if exception occurs before socket creation (line 413 is inside the try)
- Guard server_sock.close() with None check
- Narrow cleanup exception handlers to OSError (the actual error type)
- Remove exc_info=True from cleanup debug logs — benign teardown
failures don't need stack traces, the message is sufficient
- Remove redundant try/except around shutil.rmtree(ignore_errors=True)
- Silence sock_path unlink with pass — expected when already cleaned up
---------
Co-authored-by: aydnOktay <xaydinoktay@gmail.com>
When neither apptainer nor singularity is installed, the Singularity
backend silently defaults to "singularity" and fails with a cryptic
FileNotFoundError inside _start_instance(). Add a preflight check
that resolves the executable and verifies it responds, raising a
clear RuntimeError with install instructions on failure.
Closes#1511
* feat: improve memory prioritization — user preferences over procedural knowledge
Inspired by OpenAI Codex's memory prompt improvements (openai/codex#14493)
which focus memory writes on user preferences and recurring patterns
rather than procedural task details.
Key insight: 'Optimize for reducing future user steering — the most
valuable memory prevents the user from having to repeat themselves.'
Changes:
- MEMORY_GUIDANCE (prompt_builder.py): added prioritization hierarchy
and the core principle about reducing user steering
- MEMORY_SCHEMA (memory_tool.py): reordered WHEN TO SAVE list to put
corrections first, added explicit PRIORITY guidance
- Memory nudge (run_agent.py): now asks specifically about preferences,
corrections, and workflow patterns instead of generic 'anything'
- Memory flush (run_agent.py): now instructs to prioritize user
preferences and corrections over task-specific details
* feat: more aggressive skill creation and update prompting
Press harder on skill updates — the agent should proactively patch
skills when it encounters issues during use, not wait to be asked.
Changes:
- SKILLS_GUIDANCE: 'consider saving' → 'save'; added explicit instruction
to patch skills immediately when found outdated/wrong
- Skills header: added instruction to update loaded skills before finishing
if they had missing steps or wrong commands
- Skill nudge: more assertive ('save the approach' not 'consider saving'),
now also prompts for updating existing skills used in the task
- Skill nudge interval: lowered default from 15 to 10 iterations
- skill_manage schema: added 'patch it immediately' to update triggers
Add /browser slash command for connecting browser tools to the user's
live Chrome instance via Chrome DevTools Protocol:
/browser connect — connect to Chrome on localhost:9222
/browser connect ws://host:port — custom CDP endpoint
/browser disconnect — revert to default (headless/Browserbase)
/browser status — show current browser mode + connectivity
When connected:
- All browser tools (navigate, snapshot, click, etc.) control the
user's real Chrome — logged-in sessions, cookies, open tabs
- Platform-specific Chrome launch instructions are shown
- Port connectivity is tested immediately
- A context message is injected so the model knows it's controlling
a live browser and should be mindful of user's open tabs
Implementation:
- BROWSER_CDP_URL env var drives the backend selection in browser_tool.py
- New _create_cdp_session() creates sessions using the CDP override
- _get_cdp_override() checked before local/Browserbase selection
- Existing agent-browser --cdp flag handles the actual CDP connection
Inspired by OpenClaw's browser profile system.
* feat: smart approvals — LLM-based risk assessment for dangerous commands
Adds a 'smart' approval mode that uses the auxiliary LLM to assess
whether a flagged command is genuinely dangerous or a false positive,
auto-approving low-risk commands without prompting the user.
Inspired by OpenAI Codex's Smart Approvals guardian subagent
(openai/codex#13860).
Config (config.yaml):
approvals:
mode: manual # manual (default), smart, off
Modes:
- manual — current behavior, always prompt the user
- smart — aux LLM evaluates risk: APPROVE (auto-allow), DENY (block),
or ESCALATE (fall through to manual prompt)
- off — skip all approval prompts (equivalent to --yolo)
When smart mode auto-approves, the pattern gets session-level approval
so subsequent uses of the same pattern don't trigger another LLM call.
When it denies, the command is blocked without user prompt. When
uncertain, it escalates to the normal manual approval flow.
The LLM prompt is carefully scoped: it sees only the command text and
the flagged reason, assesses actual risk vs false positive, and returns
a single-word verdict.
* feat: make smart approval model configurable via config.yaml
Adds auxiliary.approval section to config.yaml with the same
provider/model/base_url/api_key pattern as other aux tasks (vision,
web_extract, compression, etc.).
Config:
auxiliary:
approval:
provider: auto
model: '' # fast/cheap model recommended
base_url: ''
api_key: ''
Bridged to env vars in both CLI and gateway paths so the aux client
picks them up automatically.
* feat: add /stop command to kill all background processes
Adds a /stop slash command that kills all running background processes
at once. Currently users have to process(list) then process(kill) for
each one individually.
Inspired by OpenAI Codex's separation of interrupt (Ctrl+C stops current
turn) from /stop (cleans up background processes). See openai/codex#14602.
Ctrl+C continues to only interrupt the active agent turn — background
dev servers, watchers, etc. are preserved. /stop is the explicit way
to clean them all up.
Salvaged from PR #1528 by an420eth. Closes#517.
Improves _strategy_block_anchor in fuzzy_match.py:
- Add unicode normalization (smart quotes, em/en-dashes, ellipsis,
non-breaking spaces → ASCII) so LLM-produced unicode artifacts
don't break anchor line matching
- Lower thresholds: 0.10 for unique matches (was 0.70), 0.30 for
multiple candidates — if first/last lines match exactly, the
block is almost certainly correct
- Use original (non-normalized) content for offset calculation to
preserve correct character positions
Tested: 3 new scenarios fixed (em-dash anchors, non-breaking space
anchors, very-low-similarity unique matches), zero regressions on
all 9 existing fuzzy match tests.
Co-authored-by: an420eth <an420eth@users.noreply.github.com>
Keep Docker sandboxes isolated by default. Add an explicit terminal.docker_mount_cwd_to_workspace opt-in, thread it through terminal/file environment creation, and document the security tradeoff and config.yaml workflow clearly.
Fixes#1445 — When using Docker backend, the user's current working
directory is now automatically bind-mounted to /workspace inside the
container. This allows users to run `cd my-project && hermes` and have
their project files accessible to the agent without manual volume config.
Changes:
- Add host_cwd and auto_mount_cwd parameters to DockerEnvironment
- Capture original host CWD in _get_env_config() before container fallback
- Pass host_cwd through _create_environment() to Docker backend
- Add TERMINAL_DOCKER_NO_AUTO_MOUNT env var to disable if needed
- Skip auto-mount when /workspace is already explicitly mounted
- Add tests for auto-mount behavior
- Add documentation for the new feature
The auto-mount is skipped when:
1. TERMINAL_DOCKER_NO_AUTO_MOUNT=true is set
2. User configured docker_volumes with :/workspace
3. persistent_filesystem=true (persistent sandbox mode)
This makes the Docker backend behave more intuitively — the agent
operates on the user's actual project directory by default.
Checkpoint & rollback upgrades:
1. Enabled by default — checkpoints are now on for all new sessions.
Zero cost when no file-mutating tools fire. Disable with
checkpoints.enabled: false in config.yaml.
2. Diff preview — /rollback diff <N> shows a git diff between the
checkpoint and current working tree before committing to a restore.
3. File-level restore — /rollback <N> <file> restores a single file
from a checkpoint instead of the entire directory.
4. Conversation undo on rollback — when restoring files, the last
chat turn is automatically undone so the agent's context matches
the restored filesystem state.
5. Terminal command checkpoints — destructive terminal commands (rm,
mv, sed -i, truncate, git reset/clean, output redirects) now
trigger automatic checkpoints before execution. Previously only
write_file and patch were covered.
6. Change summary in listing — /rollback now shows file count and
+insertions/-deletions for each checkpoint.
7. Fixed dead code — removed duplicate _run_git call in
list_checkpoints with nonsensical --all if False condition.
8. Updated help text — /rollback with no args now shows available
subcommands (diff, file-level restore).
Salvaged from PR #1470 by adavyas.
Core fix: Honcho tool calls in a multi-session gateway could route to
the wrong session because honcho_tools.py relied on process-global
state. Now threads session context through the call chain:
AIAgent._invoke_tool() → handle_function_call() → registry.dispatch()
→ handler **kw → _resolve_session_context()
Changes:
- Add _resolve_session_context() to prefer per-call context over globals
- Plumb honcho_manager + honcho_session_key through handle_function_call
- Add sync_honcho=False to run_conversation() for synthetic flush turns
- Pass honcho_session_key through gateway memory flush lifecycle
- Harden gateway PID detection when /proc cmdline is unreadable
- Make interrupt test scripts import-safe for pytest-xdist
- Wrap BibTeX examples in Jekyll raw blocks for docs build
- Fix thread-order-dependent assertion in client lifecycle test
- Expand Honcho docs: session isolation, lifecycle, routing internals
Dropped from original PR:
- Indentation change in _create_request_openai_client that would move
client creation inside the lock (causes unnecessary contention)
Co-authored-by: adavyas <adavyas@users.noreply.github.com>
Salvaged from PR #1146 by spanishflu-est1918.
Background process progress/completion messages were sent with only
chat_id, landing in the general topic instead of the originating forum
topic. Thread the thread_id from HERMES_SESSION_THREAD_ID through the
watcher payload and pass it as metadata to adapter.send() so Telegram
routes notifications to the correct topic.
The env var export (HERMES_SESSION_THREAD_ID in _set_session_env /
_clear_session_env) already existed on main — this commit adds the
missing watcher plumbing.
Co-authored-by: spanishflu-est1918 <spanishflu-est1918@users.noreply.github.com>
Restore local STT command fallback for voice transcription, detect whisper and ffmpeg in common local install paths, and avoid bogus no-provider messaging when only a backend-specific key is missing.
- Add 'emoji' field to ToolEntry and 'get_emoji()' to ToolRegistry
- Add emoji= to all 50+ registry.register() calls across tool files
- Add get_tool_emoji() helper in agent/display.py with 3-tier resolution:
skin override → registry default → hardcoded fallback
- Replace hardcoded emoji maps in run_agent.py, delegate_tool.py, and
gateway/run.py with centralized get_tool_emoji() calls
- Add 'tool_emojis' field to SkinConfig so skins can override per-tool
emojis (e.g. ares skin could use swords instead of wrenches)
- Add 11 tests (5 registry emoji, 6 display/skin integration)
- Update AGENTS.md skin docs table
Based on the approach from PR #1061 by ForgingAlex (emoji centralization
in registry). This salvage fixes several issues from the original:
- Does NOT split the cronjob tool (which would crash on missing schemas)
- Does NOT change image_generate toolset/requires_env/is_async
- Does NOT delete existing tests
- Completes the centralization (gateway/run.py was missed)
- Hooks into the skin system for full customizability
SSH persistent shell now defaults to true — non-local backends benefit
most from state persistence across execute() calls. Local backend
remains opt-in via TERMINAL_LOCAL_PERSISTENT env var.
New config.yaml option: terminal.persistent_shell (default: true)
Controls the default for non-local backends. Users can disable with:
hermes config set terminal.persistent_shell false
Precedence: per-backend env var > TERMINAL_PERSISTENT_SHELL > default.
Wired through cli.py, gateway/run.py, and hermes_cli/config.py so the
config.yaml value reaches terminal_tool via env var bridge.
When a cronjob is created from within a Telegram or Slack thread,
deliver=origin was posting to the parent channel instead of the thread.
Root cause: the gateway never set HERMES_SESSION_THREAD_ID in the
session environment, so cronjob_tools.py could not capture thread_id
into the job's origin metadata — even though the scheduler already
reads origin.get('thread_id').
Fix:
- gateway/run.py: set HERMES_SESSION_THREAD_ID when thread_id is
present on the session context, and clear it in _clear_session_env
- tools/cronjob_tools.py: read HERMES_SESSION_THREAD_ID into origin
Closes#1219
Extend subprocess env sanitization beyond provider credentials by blocking Hermes-managed tool, messaging, and related gateway runtime vars. Reuse a shared sanitizer in LocalEnvironment and ProcessRegistry so background and PTY processes honor the same blocklist and _HERMES_FORCE_ escape hatch. Add regression coverage for local env execution and process_registry spawning.
Expanded the list of blocked environment variables to include Google, Groq, Mistral, and other major LLM providers. This ensures complete isolation and prevents conflicts with external CLI tools.
Salvaged from PR #1292 onto current main. Preserve per-job model,
provider, and base_url overrides in cron execution, persist them in
job records, expose them through the cronjob tool create/update paths,
and add regression coverage. Deliberately does not persist per-job
api_key values.
- add stt.enabled to the default user config
- make transcription_tools respect the disabled flag globally
- surface disabled state cleanly in voice mode diagnostics
- add regression coverage for disabled STT provider selection
pattern_key was derived by splitting the regex on \b and taking [1],
so patterns starting with the same word (e.g. find -exec rm and
find -delete) produced the same key "find". Approving one silently
approved the other. Using the unique description string as the key
eliminates all collisions.
The fork bomb regex used `()` (empty capture group) and unescaped `{}`
instead of literal `\(\)` and `\{\}`. This meant the classic fork bomb
`:(){ :|:& };:` was never detected. Also added `\s*` between `:` and
`&` and between `;` and trailing `:` to catch whitespace variants.
The execute_code sandbox spawns a child process with cwd set to a
temporary directory, but never adds the hermes-agent project root to
PYTHONPATH. This makes project-root modules like minisweagent_path
unreachable from sandboxed scripts, causing ImportError when the
agent runs self-diagnostic or analysis code via execute_code.
Fix by prepending the hermes-agent root directory to PYTHONPATH in
the child process environment.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.
Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.
Allow cron runs to keep using send_message for additional destinations, but
skip same-target sends when the scheduler will already auto-deliver the final
response there. Add prompt/tool guidance, docs, and regression coverage for
origin/home-channel resolution and thread-aware comparisons.
- Merge _init_persistent_shell + _start_persistent_shell into single method
- Move execute() dispatcher and cleanup() into PersistentShellMixin
so LocalEnvironment and SSHEnvironment inherit them
- Remove broad except Exception wrappers from _execute_oneshot in both backends
- Replace try/except with os.path.exists checks in local _read_temp_files
and _cleanup_temp_files
- Remove redundant bash -c from SSH oneshot (SSH already runs in a shell)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When vision_analyze_tool fails, the except block was returning a
generic 'could not be analyzed' message that gave the agent no
actionable information about the failure cause.
Replace the generic message with the actual exception string so the
agent can distinguish between backend errors, missing dependencies,
network failures, and unsupported image paths.
Also add an 'error' field to the failure response for structured
error handling by callers.
Fixes#1034
Remove diary-style memory framing from the system prompt and memory tool
schema, explicitly steer task/session logs to session_search, and clarify
that session_search is for cross-session recall after checking the current
conversation first. Add regression tests for the updated guidance text.
Resolve the cherry-pick against current browser_tool structure without carrying unrelated formatting churn, while preserving the intended cleanup, PATH, and screenshot recovery changes from PR #1001.
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
Salvaged from PR #1007 by stablegenius49.
- let INSTALL_POLICY decide dangerous verdict handling for builtin skills
- allow --force to override blocked dangerous decisions for trusted and community sources
- accept --yes / -y as aliases for --force in /skills install
- update regression tests to match the intended policy precedence
Round out the skills hub integration with:
- richer skills.sh metadata and security surfacing during inspect/install
- generic check/update flows for hub-installed skills
- support for well-known Agent Skills endpoints via /.well-known/skills/index.json
Also persist upstream bundle metadata in the lock file and add
regression coverage plus live-compatible path handling for both
skills.sh aliases and well-known endpoints.
* improve: add exc_info to MoA error logging
* refactor: tighten MoA traceback logging scope
Follow up on salvaged PR #998 by limiting exc_info logging to terminal
failure paths, avoiding duplicate aggregator errors, and refreshing the
MoA default OpenRouter model lineup to current frontier options.
---------
Co-authored-by: aydnOktay <xaydinoktay@gmail.com>