hermes-agent-features

Author	SHA1	Message	Date
Yukipukii1	44a16c5d9d	guard terminal_tool import-time env parsing	2026-04-22 14:45:50 -07:00
kshitijk4poor	d6ed35d047	feat(security): add global toggle to allow private/internal URL resolution Adds security.allow_private_urls / HERMES_ALLOW_PRIVATE_URLS toggle so users on OpenWrt routers, TUN-mode proxies (Clash/Mihomo/Sing-box), corporate split-tunnel VPNs, and Tailscale networks — where DNS resolves public domains to 198.18.0.0/15 or 100.64.0.0/10 — can use web_extract, browser, vision URL fetching, and gateway media downloads. Single toggle in tools/url_safety.py; all 23 is_safe_url() call sites inherit automatically. Cached for process lifetime. Cloud metadata endpoints stay ALWAYS blocked regardless of the toggle: 169.254.169.254 (AWS/GCP/Azure/DO/Oracle), 169.254.170.2 (AWS ECS task IAM creds), 169.254.169.253 (Azure IMDS wire server), 100.100.100.200 (Alibaba), fd00:ec2::254 (AWS IPv6), the entire 169.254.0.0/16 link-local range, and the metadata.google.internal / metadata.goog hostnames (checked pre-DNS so they can't be bypassed on networks where those names resolve to local IPs). Supersedes #3779 (narrower HERMES_ALLOW_RFC2544 for the same class of users). Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-04-22 14:38:59 -07:00
Yukipukii1	40619b393f	tools: normalize file tool pagination bounds	2026-04-22 06:11:41 -07:00
Teknium	8f167e8791	fix(tts): use per-provider input-character caps instead of global 4000 (#13743 ) A single global MAX_TEXT_LENGTH = 4000 truncated every TTS provider at 4000 chars, causing long inputs to be silently chopped even though the underlying APIs allow much more: - OpenAI: 4096 - xAI: 15000 - MiniMax: 10000 - ElevenLabs: 5000 / 10000 / 30000 / 40000 (model-aware) - Gemini: ~5000 - Edge: ~5000 The schema description also told the model 'Keep under 4000 characters', which encouraged the agent to self-chunk long briefs into multiple TTS calls (producing 3 separate audio files instead of one). New behavior: - PROVIDER_MAX_TEXT_LENGTH table + ELEVENLABS_MODEL_MAX_TEXT_LENGTH encode the documented per-provider limits. - _resolve_max_text_length(provider, cfg) resolves: 1. tts.<provider>.max_text_length user override 2. ElevenLabs model_id lookup 3. provider default 4. 4000 fallback - text_to_speech_tool() and stream_tts_to_speaker() both call the resolver; old MAX_TEXT_LENGTH alias kept for back-compat. - Schema description no longer hardcodes 4000. Tests: 27 new unit + E2E tests; all 53 existing TTS tests and 253 voice-command/voice-cli tests still pass.	2026-04-21 17:49:39 -07:00
Teknium	9c9d9b7ddf	feat(delegate): cross-agent file state coordination for concurrent subagents (#13718 ) * feat(models): hide OpenRouter models that don't advertise tool support Port from Kilo-Org/kilocode#9068. hermes-agent is tool-calling-first — every provider path assumes the model can invoke tools. Models whose OpenRouter supported_parameters doesn't include 'tools' (e.g. image-only or completion-only models) cannot be driven by the agent loop and fail at the first tool call. Filter them out of fetch_openrouter_models() so they never appear in the model picker (`hermes model`, setup wizard, /model slash command). Permissive when the field is missing — OpenRouter-compatible gateways (Nous Portal, private mirrors, older snapshots) don't always populate supported_parameters. Treat missing as 'unknown → allow' rather than silently emptying the picker on those gateways. Only hide models whose supported_parameters is an explicit list that omits tools. Tests cover: tools present → kept, tools absent → dropped, field missing → kept, malformed non-list → kept, non-dict item → kept, empty list → dropped. * feat(delegate): cross-agent file state coordination for concurrent subagents Prevents mangled edits when concurrent subagents touch the same file (same process, same filesystem — the mangle scenario from #11215). Three layers, all opt-out via HERMES_DISABLE_FILE_STATE_GUARD=1: 1. FileStateRegistry (tools/file_state.py) — process-wide singleton tracking per-agent read stamps and the last writer globally. check_stale() names the sibling subagent in the warning when a non-owning agent wrote after this agent's last read. 2. Per-path threading.Lock wrapped around the read-modify-write region in write_file_tool and patch_tool. Concurrent siblings on the same path serialize; different paths stay fully parallel. V4A multi-file patches lock in sorted path order (deadlock-free). 3. Delegate-completion reminder in tools/delegate_tool.py: after a subagent returns, writes_since(parent, child_start, parent_reads) appends '[NOTE: subagent modified files the parent previously read — re-read before editing: ...]' to entry.summary when the child touched anything the parent had already seen. Complements (does not replace) the existing path-overlap check in run_agent._should_parallelize_tool_batch — batch check prevents same-file parallel dispatch within one agent's turn (cheap prevention, zero API cost), registry catches cross-subagent and cross-turn staleness at write time (detection). Behavior is warning-only, not hard-failing — matches existing project style. Errors surface naturally: sibling writes often invalidate the old_string in patch operations, which already errors cleanly. Tests: tests/tools/test_file_state_registry.py — 16 tests covering registry state transitions, per-path locking, per-path-not-global locking, writes_since filtering, kill switch, and end-to-end integration through the real read_file/write_file/patch handlers.	2026-04-21 16:41:26 -07:00
pefontana	48ecb98f8a	feat(delegate): orchestrator role and configurable spawn depth (default flat) Adds role='leaf'\|'orchestrator' to delegate_task. With max_spawn_depth>=2, an orchestrator child retains the 'delegation' toolset and can spawn its own workers; leaf children cannot delegate further (identical to today). Default posture is flat — max_spawn_depth=1 means a depth-0 parent's children land at the depth-1 floor and orchestrator role silently degrades to leaf. Users opt into nested delegation by raising max_spawn_depth to 2 or 3 in config.yaml. Also threads acp_command/acp_args through the main agent loop's delegate dispatch (previously silently dropped in the schema) via a new _dispatch_delegate_task helper, and adds a DelegateEvent enum with legacy-string back-compat for gateway/ACP/CLI progress consumers. Config (hermes_cli/config.py defaults): delegation.max_concurrent_children: 3 # floor-only, no upper cap delegation.max_spawn_depth: 1 # 1=flat (default), 2-3 unlock nested delegation.orchestrator_enabled: true # global kill switch Salvaged from @pefontana's PR #11215. Overrides vs. the original PR: concurrency stays at 3 (PR bumped to 5 + cap 8 — we keep the floor only, no hard ceiling); max_spawn_depth defaults to 1 (PR defaulted to 2 which silently enabled one level of orchestration for every user). Co-authored-by: pefontana <fontana.pedro93@gmail.com>	2026-04-21 14:23:45 -07:00
Teknium	5ffae9228b	feat(image-gen): add GPT Image 2 to FAL catalog (#13677 ) Adds OpenAI's new GPT Image 2 model via FAL.ai, selectable through `hermes tools` → Image Generation. SOTA text rendering (including CJK) and world-aware photorealism. - FAL_MODELS entry with image_size_preset style - 4:3 presets on all aspect ratios — 16:9 (1024x576) falls below GPT-Image-2's 655,360 min-pixel floor and would be rejected - quality pinned to medium (same rule as gpt-image-1.5) for predictable Nous Portal billing - BYOK (openai_api_key) deliberately omitted from supports so all users stay on shared FAL billing - 6 new tests covering preset mapping, quality pinning, and supports-whitelist integrity - Docs table + aspect-ratio map updated Live-tested end-to-end: 39.9s cold request, clean 1024x768 PNG	2026-04-21 13:35:31 -07:00
Teknium	ba4357d13b	fix(env_passthrough): reject Hermes provider credentials from skill passthrough (#13523 ) A skill declaring `required_environment_variables: [ANTHROPIC_TOKEN]` in its SKILL.md frontmatter silently bypassed the `execute_code` sandbox's credential-scrubbing guarantee. `register_env_passthrough` had no blocklist, so any name a skill chose flipped `is_env_passthrough(name) => True`, which shortcircuits the sandbox's secret filter. Fix: reject registration when the name appears in `_HERMES_PROVIDER_ENV_BLOCKLIST` (the canonical list of Hermes-managed credentials — provider keys, gateway tokens, etc.). Log a warning naming GHSA-rhgp-j443-p4rf so operators see the rejection in logs. Non-Hermes third-party API keys (TENOR_API_KEY for gif-search, NOTION_TOKEN for notion skills, etc.) remain legitimately registerable — they were never in the sandbox scrub list in the first place. Tests: 16 -> 17 passing. Two old tests that documented the bypass (`test_passthrough_allows_blocklisted_var`, `test_make_run_env_passthrough`) are rewritten to assert the new fail-closed behavior. New `test_non_hermes_api_key_still_registerable` locks in that legitimate third-party keys are unaffected. Reported in GHSA-rhgp-j443-p4rf by @q1uf3ng. Hardening; not CVE-worthy on its own per the decision matrix (attacker must already have operator consent to install a malicious skill).	2026-04-21 06:14:25 -07:00
Ben	724377c429	test(mcp): add failing tests for circuit-breaker recovery The MCP circuit breaker in tools/mcp_tool.py has no half-open state and no reset-on-reconnect behavior, so once it trips after 3 consecutive failures it stays tripped for the process lifetime. These tests lock in the intended recovery behavior: 1. test_circuit_breaker_half_opens_after_cooldown — after the cooldown elapses, the next call must actually probe the session; success closes the breaker. 2. test_circuit_breaker_reopens_on_probe_failure — a failed probe re-arms the cooldown instead of letting every subsequent call through. 3. test_circuit_breaker_cleared_on_reconnect — a successful OAuth recovery resets the breaker even if the post-reconnect retry fails (a successful reconnect is sufficient evidence the server is viable again). All three currently fail, as expected.	2026-04-21 05:19:03 -07:00
JackTheGit	77061ac995	Normalize FAL_KEY env handling (ignore whitespace-only values) Treat whitespace-only FAL_KEY the same as unset so users who export FAL_KEY=" " (or CI that leaves a blank token) get the expected 'not set' error path instead of a confusing downstream fal_client failure. Applied to the two direct FAL_KEY checks in image_generation_tool.py: image_generate_tool's upfront credential check and check_fal_api_key(). Both keep the existing managed-gateway fallback intact. Adapted the original whitespace/valid tests to pin the managed gateway to None so the whitespace assertion exercises the direct-key path rather than silently relying on gateway absence.	2026-04-21 02:04:21 -07:00
Teknium	5e6427a42c	fix(patch): gate 'did you mean?' to no-match + extend to v4a/skill_manage Follow-ups on top of @teyrebaz33's cherry-picked commit: 1. New shared helper format_no_match_hint() in fuzzy_match.py with a startswith('Could not find') gate so the snippet only appends to genuine no-match errors — not to 'Found N matches' (ambiguous), 'Escape-drift detected', or 'identical strings' errors, which would all mislead the model. 2. file_tools.patch_tool suppresses the legacy generic '[Hint: old_string not found...]' string when the rich 'Did you mean?' snippet is already attached — no more double-hint. 3. Wire the same helper into patch_parser.py (V4A patch mode, both _validate_operations and _apply_update) and skill_manager_tool.py so all three fuzzy callers surface the hint consistently. Tests: 7 new gating tests in TestFormatNoMatchHint cover every error class (ambiguous, drift, identical, non-zero match count, None error, no similar content, happy path). 34/34 test_fuzzy_match, 96/96 test_file_tools + test_patch_parser + test_skill_manager_tool pass. E2E verified across all four scenarios: no-match-with-similar, no-match-no-similar, ambiguous, success. V4A mode confirmed end-to-end with a non-matching hunk.	2026-04-21 02:03:46 -07:00
teyrebaz33	15abf4ed8f	feat(patch): add 'did you mean?' feedback when patch fails to match When patch_replace() cannot find old_string in a file, the error message now includes the closest matching lines from the file with line numbers and context. This helps the LLM self-correct without a separate read_file call. Implements Phase 1 of #536: enhanced patch error feedback with no architectural changes. - tools/fuzzy_match.py: new find_closest_lines() using SequenceMatcher - tools/file_operations.py: attach closest-lines hint to patch errors - tests/tools/test_fuzzy_match.py: 5 new tests for find_closest_lines	2026-04-21 02:03:46 -07:00
Teknium	2d7ff9c5bd	feat(tts): complete KittenTTS integration (tools/setup/docs/tests) Builds on @AxDSan's PR #2109 to finish the KittenTTS wiring so the provider behaves like every other TTS backend end to end. - tools/tts_tool.py: `_check_kittentts_available()` helper and wire into `check_tts_requirements()`; extend Opus-conversion list to include kittentts (WAV → Opus for Telegram voice bubbles); point the missing-package error at `hermes setup tts`. - hermes_cli/tools_config.py: add KittenTTS entry to the "Text-to-Speech" toolset picker, with a `kittentts` post_setup hook that auto-installs the wheel + soundfile via pip. - hermes_cli/setup.py: `_install_kittentts_deps()`, new choice + install flow in `_setup_tts_provider()`, provider_labels entry, and status row in the `hermes setup` summary. - website/docs/user-guide/features/tts.md: add KittenTTS to the provider table, config example, ffmpeg note, and the zero-config voice-bubble tip. - tests/tools/test_tts_kittentts.py: 10 unit tests covering generation, model caching, config passthrough, ffmpeg conversion, availability detection, and the missing-package dispatcher branch. E2E verified against the real `kittentts` wheel: - WAV direct output (pcm_s16le, 24kHz mono) - MP3 conversion via ffmpeg (from WAV) - Telegram flow (provider in Opus-conversion list) produces `codec_name=opus`, 48kHz mono, `voice_compatible=True`, and the `[[audio_as_voice]]` marker - check_tts_requirements() returns True when kittentts is installed	2026-04-21 01:28:32 -07:00
Teknium	328223576b	feat(skills+terminal): make bundled skill scripts runnable out of the box (#13384 ) * feat(skills): inject absolute skill dir and expand ${HERMES_SKILL_DIR} templates When a skill loads, the activation message now exposes the absolute skill directory and substitutes ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in the SKILL.md body, so skills with bundled scripts can instruct the agent to run them by absolute path without an extra skill_view round-trip. Also adds opt-in inline-shell expansion: !`cmd` snippets in SKILL.md are pre-executed (with the skill directory as CWD) and their stdout is inlined into the message before the agent reads it. Off by default — enable via skills.inline_shell in config.yaml — because any snippet runs on the host without approval. Changes: - agent/skill_commands.py: template substitution, inline-shell expansion, absolute skill-dir header, supporting-files list now shows both relative and absolute forms. - hermes_cli/config.py: new skills.template_vars, skills.inline_shell, skills.inline_shell_timeout knobs. - tests/agent/test_skill_commands.py: coverage for header, both template tokens (present and missing session id), template_vars disable, inline-shell default-off, enabled, CWD, and timeout. - website/docs/developer-guide/creating-skills.md: documents the template tokens, the absolute-path header, and the opt-in inline shell with its security caveat. Validation: tests/agent/ 1591 passed (includes 9 new tests). E2E: loaded a real skill in an isolated HERMES_HOME; confirmed ${HERMES_SKILL_DIR} resolves to the absolute path, ${HERMES_SESSION_ID} resolves to the passed task_id, !`date` runs when opt-in is set, and stays literal when it isn't. * feat(terminal): source ~/.bashrc (and user-listed init files) into session snapshot bash login shells don't source ~/.bashrc, so tools that install themselves there — nvm, asdf, pyenv, cargo, custom PATH exports — stay invisible to the environment snapshot Hermes builds once per session. Under systemd or any context with a minimal parent env, that surfaces as 'node: command not found' in the terminal tool even though the binary is reachable from every interactive shell on the machine. Changes: - tools/environments/local.py: before the login-shell snapshot bootstrap runs, prepend guarded 'source <file>' lines for each resolved init file. Missing files are skipped, each source is wrapped with a '[ -r ... ] && . ... \|\| true' guard so a broken rc can't abort the bootstrap. - hermes_cli/config.py: new terminal.shell_init_files (explicit list, supports ~ and ${VAR}) and terminal.auto_source_bashrc (default on) knobs. When shell_init_files is set it takes precedence; when it's empty and auto_source_bashrc is on, ~/.bashrc gets auto-sourced. - tests/tools/test_local_shell_init.py: 10 tests covering the resolver (auto-bashrc, missing file, explicit override, ~/${VAR} expansion, opt-out) and the prelude builder (quoting, guarded sourcing), plus a real-LocalEnvironment snapshot test that confirms exports in the init file land in subsequent commands' environment. - website/docs/reference/faq.md: documents the fix in Troubleshooting, including the zsh-user pattern of sourcing ~/.zshrc or nvm.sh directly via shell_init_files. Validation: 10/10 new tests pass; tests/tools/test_local_*.py 40/40 pass; tests/agent/ 1591/1591 pass; tests/hermes_cli/test_config.py 50/50 pass. E2E in an isolated HERMES_HOME: confirmed that a fake ~/.bashrc setting a marker var and PATH addition shows up in a real LocalEnvironment().execute() call, that auto_source_bashrc=false suppresses it, that an explicit shell_init_files entry wins over the auto default, and that a missing bashrc is silently skipped.	2026-04-21 00:39:19 -07:00
helix4u	b48ea41d27	feat(voice): add cli beep toggle	2026-04-21 00:29:29 -07:00
Teknium	62cbeb6367	test: stop testing mutable data — convert change-detectors to invariants (#13363 ) Catalog snapshots, config version literals, and enumeration counts are data that changes as designed. Tests that assert on those values add no behavioral coverage — they just break CI on every routine update and cost engineering time to 'fix.' Replace with invariants where one exists, delete where none does. Deleted (pure snapshots): - TestMinimaxModelCatalog (3 tests): 'MiniMax-M2.7 in models' et al - TestGeminiModelCatalog: 'gemini-2.5-pro in models', 'gemini-3.x in models' - test_browser_camofox_state::test_config_version_matches_current_schema (docstring literally said it would break on unrelated bumps) Relaxed (keep plumbing check, drop snapshot): - Xiaomi / Arcee / Kimi moonshot / Kimi coding / HuggingFace static lists: now assert 'provider exists and has >= 1 entry' instead of specific names - HuggingFace main/models.py consistency test: drop 'len >= 6' floor Dynamicized (follow source, not a literal): - 3x test_config.py migration tests: raw['_config_version'] == DEFAULT_CONFIG['_config_version'] instead of hardcoded 21 Fixed stale tests against intentional behavior changes: - test_insights::test_gateway_format_hides_cost: name matches new behavior (no dollar figures); remove contradicting '$' in text assertion - test_config::prefers_api_then_url_then_base_url: flipped per PR #9332; rename + update to base_url > url > api - test_anthropic_adapter: relax assert_called_once() (xdist-flaky) to assert called — contract is 'credential flowed through' - test_interrupt_propagation: add provider/model/_base_url to bare-agent fixture so the stale-timeout code path resolves Fixed stale integration tests against opt-in plugin gate: - transform_tool_result + transform_terminal_output: write plugins.enabled allow-list to config.yaml and reset the plugin manager singleton Source fix (real consistency invariant): - agent/model_metadata.py: add moonshotai/Kimi-K2.6 context length (262144, same as K2.5). test_model_metadata_has_context_lengths was correctly catching the gap. Policy: - AGENTS.md Testing section: new subsection 'Don't write change-detector tests' with do/don't examples. Reviewers should reject catalog-snapshot assertions in new tests. Covers every test that failed on the last completed main CI run (24703345583) except test_modal_sandbox_fixes::test_terminal_tool_present + test_terminal_and_file_toolsets_resolve_all_tools, which now pass both alone and with the full tests/tools/ directory (xdist ordering flake that resolved itself).	2026-04-20 23:20:33 -07:00
Junass1	735996d2ad	fix(tools/delegate): propagate resolved ACP runtime settings to child agents	2026-04-20 20:47:01 -07:00
cdanis	4a424f1fbb	feat(send_message): add media delivery support for Signal Cherry-picked from PR #13159 by @cdanis. Adds native media attachment delivery to Signal via signal-cli JSON-RPC attachments param. Signal messages with media now follow the same early-return pattern as Telegram/Discord/Matrix — attachments are sent only with the last chunk to avoid duplicates. Follow-up fixes on top of the original PR: - Moved Signal into its own early-return block above the restriction check (matches Telegram/Discord/Matrix pattern) - Fixed media_files being sent on every chunk in the generic loop - Restored restriction/warning guards to simple form (Signal exits early) - Fixed non-hermetic test writing to /tmp instead of tmp_path	2026-04-20 13:24:15 -07:00
Teknium	5a2118a70b	test: add _resolve_path tests + AUTHOR_MAP entry for aniruddhaadak80	2026-04-20 12:29:31 -07:00
Mibayy	3273f301b7	fix(stt): map cloud-only model names to valid local size for faster-whisper (#2544 ) Cherry-picked from PR #2545 by @Mibayy. The setup wizard could leave stt.model: "whisper-1" in config.yaml. When using the local faster-whisper provider, this crashed with "Invalid model size 'whisper-1'". Voice messages were silently ignored. _normalize_local_model() now detects cloud-only names (whisper-1, gpt-4o-transcribe, etc.) and maps them to the default local model with a warning. Valid local sizes (tiny, base, small, medium, large-v3) pass through unchanged. - Renamed _normalize_local_command_model -> _normalize_local_model (backward-compat wrapper preserved) - 6 new tests including integration test - Added lowercase AUTHOR_MAP alias for @Mibayy Closes #2544	2026-04-20 05:18:48 -07:00
Teknium	04068c5891	feat(plugins): add transform_tool_result hook for generic tool-result rewriting (#12972 ) Closes #8933 more fully, extending the per-tool transform_terminal_output hook from #12929 to a generic seam that fires after every tool dispatch. Plugins can rewrite any tool's result string (normalize formats, redact fields, summarize verbose output) without wrapping individual tools. Changes - hermes_cli/plugins.py: add "transform_tool_result" to VALID_HOOKS - model_tools.py: invoke the hook in handle_function_call after post_tool_call (which remains observational); first valid str return replaces the result; fail-open - tests/test_transform_tool_result_hook.py: 9 new tests covering no-op, None return, non-string return, first-match wins, kwargs, hook exception fallback, post_tool_call observation invariant, ordering vs post_tool_call, and an end-to-end real-plugin integration - tests/hermes_cli/test_plugins.py: assert new hook in VALID_HOOKS - tests/test_model_tools.py: extend the hook-call-sequence assertion to include the new hook Design - transform_tool_result runs AFTER post_tool_call so observers always see the original (untransformed) result. This keeps post_tool_call's observational contract. - transform_terminal_output (from #12929) still runs earlier, inside terminal_tool, so plugins can canonicalize BEFORE the 50k truncation drops middle content. Both hooks coexist; they target different layers.	2026-04-20 03:48:08 -07:00
Alexazhu	64a1368210	fix(tools): keep SSH ControlMaster socket path under macOS 104-byte limit On macOS, Unix domain socket paths are capped at 104 bytes (sun_path). SSH appends a 16-byte random suffix to the ControlPath when operating in ControlMaster mode. With an IPv6 host embedded literally in the filename and a deeply-nested macOS $TMPDIR like /var/folders/XX/YYYYYYYYYYYY/T/, the full path reliably exceeds the limit — every terminal/file-op tool call then fails immediately with ``unix_listener: path "…" too long for Unix domain socket``. Swap the ``user@host:port.sock`` filename for a sha256-derived 16-char hex digest. The digest is deterministic for a given (user, host, port) triple, so ControlMaster reuse across reconnects is preserved, and the full path fits comfortably under the limit even after SSH's random suffix. Collision space is 2^64 — effectively unreachable for the handful of concurrent connections any single Hermes process holds. Regression tests cover: path length under realistic macOS $TMPDIR with the IPv6 host from the issue report, determinism for reconnects, and distinctness across different (user, host, port) triples. Closes #11840	2026-04-20 03:07:32 -07:00
sjz-ks	2081b71c42	feat(tools): add terminal output transform hook	2026-04-20 03:04:06 -07:00
Teknium	be472138f3	fix(send_message): accept E.164 phone numbers for signal/sms/whatsapp (#12936 ) Follow-up to #12704. The SignalAdapter can resolve +E164 numbers to UUIDs via listContacts, but _parse_target_ref() in the send_message tool rejected '+' as non-digit and fell through to channel-name resolution — which fails for contacts without a prior session entry. Adds an E.164 branch in _parse_target_ref for phone-based platforms (signal, sms, whatsapp) that preserves the leading '+' so downstream adapters keep the format they expect. Non-phone platforms are unaffected. Reported by @qdrop17 on Discord after pulling #12704.	2026-04-20 03:02:44 -07:00
teyrebaz33	2d59afd3da	fix(docker): pass docker_mount_cwd_to_workspace and docker_forward_env to container_config in file_tools file_tools._get_file_ops() built a container_config dict for Docker/ Singularity/Modal/Daytona backends but omitted docker_mount_cwd_to_workspace and docker_forward_env. Both are read by _create_environment() from container_config, so file tools (read_file, write_file, patch, search) silently ignored those config values when running in Docker. Add the two missing keys to match the container_config already built by terminal_tool.terminal_tool(). Fixes #2672.	2026-04-20 00:58:16 -07:00
helix4u	6ab78401c9	fix(aux): add session_search extra_body and concurrency controls Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy OpenAI-compatible providers can receive provider-specific request fields (e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds session_search summary fan-out with auxiliary.session_search.max_concurrency (default 3, clamped 1-5) to avoid 429 bursts on small providers. - agent/auxiliary_client.py: extract _get_auxiliary_task_config helper, add _get_task_extra_body, merge config+explicit extra_body with explicit winning - hermes_cli/config.py: extra_body defaults on all aux tasks + session_search.max_concurrency; _config_version 19 -> 20 - tools/session_search_tool.py: semaphore around _summarize_all gather - tests: coverage in test_auxiliary_client, test_session_search, test_aux_config - docs: user-guide/configuration.md + fallback-providers.md Co-authored-by: Teknium <teknium@nousresearch.com>	2026-04-20 00:47:39 -07:00
kshitijk4poor	fd5df5fe8e	fix(camofox): honor auxiliary vision temperature\n\n- forward auxiliary.vision.temperature in camofox screenshot analysis\n- add regression tests for configured and default behavior	2026-04-20 00:32:09 -07:00
kshitijk4poor	9d88bdaf11	fix(browser): honor auxiliary.vision.temperature for screenshot analysis\n\n- mirror the vision tool's config bridge in browser_vision - add regression tests for configured and default temperature forwarding	2026-04-20 00:32:09 -07:00
kshitijk4poor	098d554aac	test: cover vision config temperature wiring\n\n- add regression tests for auxiliary.vision.temperature and timeout\n- add bugkill3r to AUTHOR_MAP for the salvaged commit	2026-04-20 00:32:09 -07:00
Teknium	323e827f4a	test: remove 8 flaky tests that fail under parallel xdist scheduling (#12784 ) These tests all pass in isolation but fail in CI due to test-ordering pollution on shared xdist workers. Each has a different root cause: - tests/tools/test_send_message_tool.py (4 tests): racing session ContextVar pollution — get_session_env returns '' instead of 'cli' default when an earlier test on the same worker leaves HERMES_SESSION_PLATFORM set. - tests/tools/test_skills_tool.py (2 tests): KeyError: 'gateway_setup_hint' from shared skill state mutation. - tests/tools/test_tts_mistral.py::test_telegram_produces_ogg_and_voice_compatible: pre-existing intermittent failure. - tests/hermes_cli/test_update_check.py::test_get_update_result_timeout: racing a background git-fetch thread that writes a real commits-behind value into module-level _update_result before assertion. All 8 have been failing on main for multiple runs with no clear path to a safe fix that doesn't require restructuring the tests' isolation story. Removing is cheaper than chasing — the code paths they cover are exercised elsewhere (send_message has 73+ other tests, skills_tool has extensive coverage, TTS has other backend tests, update check has other tests for check_for_updates proper). Validation: all 4 files now pass cleanly: 169/169 under CI-parity env.	2026-04-19 19:38:02 -07:00
Teknium	c9b833feb3	fix(ci): unblock test suite + cut ~2s of dead Z.AI probes from every AIAgent CI on main had 7 failing tests. Five were stale test fixtures; one (agent cache spillover timeout) was covering up a real perf regression in AIAgent construction. The perf bug: every AIAgent.__init__ calls _check_compression_model_feasibility → resolve_provider_client('auto') → _resolve_api_key_provider which iterates PROVIDER_REGISTRY. When it hits 'zai', it unconditionally calls resolve_api_key_provider_credentials → _resolve_zai_base_url → probes 8 Z.AI endpoints with an empty Bearer token (all 401s), ~2s of pure latency per agent, even when the user has never touched Z.AI. Landed in `9e844160` (PR for credential-pool Z.AI auto-detect) — the short-circuit when api_key is empty was missing. _resolve_kimi_base_url had the same shape; fixed too. Test fixes: - tests/gateway/test_voice_command.py: _make_adapter helpers were missing self._voice_locks (added in PR #12644, 7 call sites — all updated). - tests/test_toolsets.py: test_hermes_platforms_share_core_tools asserted equality, but hermes-discord has discord_server (DISCORD_BOT_TOKEN-gated, discord-only by design). Switched to subset check. - tests/run_agent/test_streaming.py: test_tool_name_not_duplicated_when_resent_per_chunk missing api_key/base_url — classic pitfall (PR #11619 fixed 16 of these; this one slipped through on a later commit). - tests/tools/test_discord_tool.py: TestConfigAllowlist caplog assertions fail in parallel runs because AIAgent(quiet_mode=True) globally sets logging.getLogger('tools').setLevel(ERROR) and xdist workers are persistent. Autouse fixture resets the 'tools' and 'tools.discord_tool' levels per test. Validation: tests/cron + voice + agent_cache + streaming + toolsets + command_guards + discord_tool: 550/550 pass tests/hermes_cli + tests/gateway: 5713/5713 pass AIAgent construction without Z.AI creds: 2.2s → 0.24s (9x)	2026-04-19 19:18:19 -07:00
handsdiff	abfc1847b7	fix(terminal): rewrite `A && B &` to `A && { B & }` to prevent subshell leak bash parses `A && B &` with `&&` tighter than `&`, so it forks a subshell for the compound and backgrounds the subshell. Inside the subshell, B runs foreground, so the subshell waits for B. When B is a process that doesn't naturally exit (`python3 -m http.server`, `yes > /dev/null`, a long-running daemon), the subshell is stuck in `wait4` forever and leaks as an orphan reparented to init. Observed in production: agents running `cd X && python3 -m http.server 8000 &>/dev/null & sleep 1 && curl ...` as a "start a local server, then verify it" one-liner. Outer bash exits cleanly; the subshell never does. Across ~3 days of use, 8 unique stuck-terminal events and 7 leaked bash+server pairs accumulated on the fleet, with some sessions appearing hung from the user's perspective because the subshell's open stdout pipe kept the terminal tool's drain thread blocked. This is distinct from the `set +m` fix in `933fbd8f` (which addressed interactive-shell job-control waiting at exit). `set +m` doesn't help here because `bash -c` is non-interactive and job control is already off; the problem is the subshell's own internal wait for its foreground B, not the outer shell's job-tracking. The fix: walk the command shell-aware (respecting quotes, parens, brace groups, `&>`/`>&` redirects), find `A && B &` / `A \|\| B &` at depth 0 and rewrite the tail to `A && { B & }`. Brace groups don't fork a subshell — they run in the current shell. `B &` inside the group is a simple background (no subshell wait). The outer `&` is absorbed into the group, so the compound no longer needs an explicit subshell. `&&` error-propagation is preserved exactly: if A fails, `&&` short-circuits and B never runs. - Skips quoted strings, comment lines, and `(…)` subshells - Handles `&>/dev/null`, `2>&1`, `>&2` without mistaking them for `&` - Resets chain state at `;`, `\|`, and newlines - Tracks brace depth so already-rewritten output is idempotent - Walks using the existing `_read_shell_token` tokenizer, matching the pattern of `_rewrite_real_sudo_invocations` Called once from `BaseEnvironment.execute` right after `_prepare_command`, so it runs for every backend (local, ssh, docker, modal, etc.) with no per-backend plumbing. 34 new tests covering rewrite cases, preservation cases, redirect edge-cases, quoting/parens/backticks, idempotency, and empty/edge inputs. End-to-end verified on a test VM: the exact vela-incident command now returns in ~1.3s with no leaked bash, only the intentional backgrounded server. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-19 16:53:11 -07:00
etherman-os	d50a9b20d2	terminal: steer long-lived server commands to background mode	2026-04-19 16:47:20 -07:00
Teknium	a3a4932405	fix(mcp-oauth): bidirectional auth_flow bridge + absolute expires_at (salvage #12025 ) (#12717 ) * [verified] fix(mcp-oauth): bridge httpx auth_flow bidirectional generator HermesMCPOAuthProvider.async_auth_flow wrapped the SDK's auth_flow with 'async for item in super().async_auth_flow(request): yield item', which discards httpx's .asend(response) values and resumes the inner generator with None. This broke every OAuth MCP server on the first HTTP response with 'NoneType' object has no attribute 'status_code' crashing at mcp/client/auth/oauth2.py:505. Replace with a manual bridge that forwards .asend() values into the inner generator, preserving httpx's bidirectional auth_flow contract. Add tests/tools/test_mcp_oauth_bidirectional.py with two regression tests that drive the flow through real .asend() round-trips. These catch the bug at the unit level; prior tests only exercised _initialize() and disk-watching, never the full generator protocol. Verified against BetterStack MCP: Before: 'Connection failed (11564ms): NoneType...' after 3 retries After: 'Connected (2416ms); Tools discovered: 83' Regression from #11383. * [verified] fix(mcp-oauth): seed token_expiry_time + pre-flight AS discovery on cold-load PR #11383's consolidation fixed external-refresh reloading and 401 dedup but left two latent bugs that surfaced on BetterStack and any other OAuth MCP with a split-origin authorization server: 1. HermesTokenStorage persisted only a relative 'expires_in', which is meaningless after a process restart. The MCP SDK's OAuthContext does NOT seed token_expiry_time in _initialize, so is_token_valid() returned True for any reloaded token regardless of age. Expired tokens shipped to servers, and app-level auth failures (e.g. BetterStack's 'No teams found. Please check your authentication.') were invisible to the transport-layer 401 handler. 2. Even once preemptive refresh did fire, the SDK's _refresh_token falls back to {server_url}/token when oauth_metadata isn't cached. For providers whose AS is at a different origin (BetterStack: mcp.betterstack.com for MCP, betterstack.com/oauth/token for the token endpoint), that fallback 404s and drops into full browser re-auth on every process restart. Fix set: - HermesTokenStorage.set_tokens persists an absolute wall-clock expires_at alongside the SDK's OAuthToken JSON (time.time() + TTL at write time). - HermesTokenStorage.get_tokens reconstructs expires_in from max(expires_at - now, 0), clamping expired tokens to zero TTL. Legacy files without expires_at fall back to file-mtime as a best-effort wall-clock proxy, self-healing on the next set_tokens. - HermesMCPOAuthProvider._initialize calls super(), then update_token_expiry on the reloaded tokens so token_expiry_time reflects actual remaining TTL. If tokens are loaded but oauth_metadata is missing, pre-flight PRM + ASM discovery runs via httpx.AsyncClient using the MCP SDK's own URL builders and response handlers (build_protected_resource_metadata_discovery_urls, handle_auth_metadata_response, etc.) so the SDK sees the correct token_endpoint before the first refresh attempt. Pre-flight is skipped when there are no stored tokens to keep fresh-install paths zero-cost. Test coverage (tests/tools/test_mcp_oauth_cold_load_expiry.py): - set_tokens persists absolute expires_at - set_tokens skips expires_at when token has no expires_in - get_tokens round-trips expires_at -> remaining expires_in - expired tokens reload with expires_in=0 - legacy files without expires_at fall back to mtime proxy - _initialize seeds token_expiry_time from stored tokens - _initialize flags expired-on-disk tokens as is_token_valid=False - _initialize pre-flights PRM + ASM discovery with mock transport - _initialize skips pre-flight when no tokens are stored Verified against BetterStack MCP: hermes mcp test betterstack -> Connected (2508ms), 83 tools mcp_betterstack_telemetry_list_teams_tool -> real team data, not 'No teams found. Please check your authentication.' Reference: mcp-oauth-token-diagnosis skill, Fix A. * chore: map hermes@noushq.ai to benbarclay in AUTHOR_MAP Needed for CI attribution check on cherry-picked commits from PR #12025. --------- Co-authored-by: Hermes Agent <hermes@noushq.ai>	2026-04-19 16:31:07 -07:00
Teknium	aa5bd09232	fix(tests): unstick CI — sweep stale tests from recent merges (#12670 ) One source fix (web_server category merge) + five test updates that didn't travel with their feature PRs. All 13 failures on the 04-19 CI run on main are now accounted for (5 already self-healed on main; 8 fixed here). Changes - web_server.py: add code_execution → agent to _CATEGORY_MERGE (new singleton section from #11971 broke no-single-field-category invariant). - test_browser_camofox_state: bump hardcoded _config_version 18 → 19 (also from #11971). - test_registry: add browser_cdp_tool (#12369) and discord_tool (#4753) to the expected built-in tool set. - test_run_agent::test_tool_call_accumulation: rewrite fragment chunks — #0f778f77 switched streaming name-accumulation from += to = to fix MiniMax/NIM duplication; the test still encoded the old fragment-per-chunk premise. - test_concurrent_interrupt::_Stub: no-op _apply_pending_steer_to_tool_results — #12116 added this call after concurrent tool batches; the hand-rolled stub was missing it. - test_codex_cli_model_picker: drop the two obsolete tests that asserted auto-import from ~/.codex/auth.json into the Hermes auth store. #12360 explicitly removed that behavior (refresh-token reuse races with Codex CLI / VS Code); adoption is now explicit via `hermes auth openai-codex`. Remaining 3 tests in the file (normal path, Claude Code fallback, negative case) still cover the picker. Validation - scripts/run_tests.sh across all 6 affected files + surrounding tests (54 tests total) all green locally.	2026-04-19 12:39:58 -07:00
Teknium	d2c2e34469	fix(patch): catch silent persistence failures and escape-drift in tool-call transport (#12669 ) Two hardening layers in the patch tool, triggered by a real silent failure in the previous session: (1) Post-write verification in patch_replace — after write_file succeeds, re-read the file and confirm the bytes on disk match the intended write. If not, return an error instead of the current success-with-diff. Catches silent persistence failures from any cause (backend FS oddities, stdin pipe truncation, concurrent task races, mount drift). (2) Escape-drift guard in fuzzy_find_and_replace — when a non-exact strategy matches and both old_string and new_string contain literal \' or \" sequences but the matched file region does not, reject the patch with a clear error pointing at the likely cause (tool-call serialization adding a spurious backslash around apostrophes/quotes). Exact matches bypass the guard, and legitimate edits that add or preserve escape sequences in files that already have them still work. Why: in a prior tool call, old_string was sent with \' where the file has ' (tool-call transport drift). The fuzzy matcher's block_anchor strategy matched anyway and produced a diff the tool reported as successful — but the file was never modified on disk. The agent moved on believing the edit landed when it hadn't. Tests: added TestPatchReplacePostWriteVerification (3 cases) and TestEscapeDriftGuard (6 cases). All pass, existing fuzzy match and file_operations tests unaffected.	2026-04-19 12:27:34 -07:00
Teknium	ef73367fc5	feat: add Discord server introspection and management tool (#4753 ) * feat: add Discord server introspection and management tool Add a discord_server tool that gives the agent the ability to interact with Discord servers when running on the Discord gateway. Uses Discord REST API directly with the bot token — no dependency on the gateway adapter's discord.py client. The tool is only included in the hermes-discord toolset (zero cost for users on other platforms) and gated on DISCORD_BOT_TOKEN via check_fn. Actions (14): - Introspection: list_guilds, server_info, list_channels, channel_info, list_roles, member_info, search_members - Messages: fetch_messages, list_pins, pin_message, unpin_message - Management: create_thread, add_role, remove_role This addresses a gap where users on Discord could not ask Hermes to review server structure, channels, roles, or members — a task competing agents (OpenClaw) handle out of the box. Files changed: - tools/discord_tool.py (new): Tool implementation + registration - model_tools.py: Add to discovery list - toolsets.py: Add to hermes-discord toolset only - tests/tools/test_discord_tool.py (new): 43 tests covering all actions, validation, error handling, registration, and toolset scoping * feat(discord): intent-aware schema filtering + config allowlist + schema cleanup - _detect_capabilities() hits GET /applications/@me once per process to read GUILD_MEMBERS / MESSAGE_CONTENT privileged intent bits. - Schema is rebuilt per-session in model_tools.get_tool_definitions: hides search_members / member_info when GUILD_MEMBERS intent is off, annotates fetch_messages description when MESSAGE_CONTENT is off. - New config key discord.server_actions (comma-separated or YAML list) lets users restrict which actions the agent can call, intersected with intent availability. Unknown names are warned and dropped. - Defense-in-depth: runtime handler re-checks the allowlist so a stale cached schema cannot bypass a tightened config. - Schema description rewritten as an action-first manifest (signature per action) instead of per-parameter 'required for X, Y, Z' cross-refs. ~25% shorter; model can see each action's required params at a glance. - Added bounds: limit gets minimum=1 maximum=100, auto_archive_duration becomes an enum of the 4 valid Discord values. - 403 enrichment: runtime 403 errors are mapped to actionable guidance (which permission is missing and what to do about it) instead of the raw Discord error body. - 36 new tests: capability detection with caching and force refresh, config allowlist parsing (string/list/invalid/unknown), intent+allowlist intersection, dynamic schema build, runtime allowlist enforcement, 403 enrichment, and model_tools integration wiring.	2026-04-19 11:52:19 -07:00
Teknium	f336ae3d7d	fix(environments): use incremental UTF-8 decoder in select-based drain The first draft of the fix called `chunk.decode("utf-8")` directly on each 4096-byte `os.read()` result, which corrupts output whenever a multi-byte UTF-8 character straddles a read boundary: * `UnicodeDecodeError` fires on the valid-but-truncated byte sequence. * The except handler clears ALL previously-decoded output and replaces the whole buffer with `[binary output detected ...]`. Empirically: 10000 '日' chars (30001 bytes) through the wrapper loses all 10000 characters on the first draft; the baseline TextIOWrapper drain (which uses `encoding='utf-8', errors='replace'` on Popen) preserves them all. This regression affects any command emitting non-ASCII output larger than one chunk — CJK/Arabic/emoji in `npm install`, `pip install`, `docker logs`, `kubectl logs`, etc. Fix: swap to `codecs.getincrementaldecoder('utf-8')(errors='replace')`, which buffers partial multi-byte sequences across chunks and substitutes U+FFFD for genuinely invalid bytes. Flush on drain exit via `decoder.decode(b'', final=True)` to emit any trailing replacement character for a dangling partial sequence. Adds two regression tests: * test_utf8_multibyte_across_read_boundary — 10000 U+65E5 chars, verifies count round-trips and no fallback fires. * test_invalid_utf8_uses_replacement_not_fallback — deliberate \xff\xfe between valid ASCII, verifies surrounding text survives.	2026-04-19 11:27:50 -07:00
Teknium	0a02fbd842	fix(environments): prevent terminal hang when commands background children (#8340 ) When a user's command backgrounds a child (`cmd &`, `setsid cmd & disown`, etc.), the backgrounded grandchild inherits the write-end of our stdout pipe via fork(). The old `for line in proc.stdout` drain never EOF'd until the grandchild closed the pipe — so for a uvicorn server, the terminal tool hung indefinitely (users reported the whole session deadlocking when asking the agent to restart a backend). Fix: switch _drain() to select()-based non-blocking reads and stop draining shortly after bash exits even if the pipe hasn't EOF'd. Any output the grandchild writes after that point goes to an orphaned pipe, which is exactly what the user asked for when they said '&'. Adds regression tests covering the issue's exact repro and 5 related patterns (plain bg, setsid+disown, streaming output, high volume, timeout, UTF-8).	2026-04-19 11:27:50 -07:00
Teknium	ea0bd81b84	feat(skills): consolidate find-nearby into maps as a single location skill find-nearby and the (new) maps optional skill both used OpenStreetMap's Overpass + Nominatim to answer the same question — 'what's near this location?' — so shipping both would be duplicate code for overlapping capability. Consolidate into one active-by-default skill at skills/productivity/maps/ that is a strict superset of find-nearby. Moves + deletions: - optional-skills/productivity/maps/ → skills/productivity/maps/ (active, no install step needed) - skills/leisure/find-nearby/ → DELETED (fully superseded) Upgrades to maps_client.py so it covers everything find-nearby did: - Overpass server failover — tries overpass-api.de then overpass.kumi.systems so a single-mirror outage doesn't break the skill (new overpass_query helper, used by both nearby and bbox) - nearby now accepts --near "<address>" as a shortcut that auto-geocodes, so one command replaces the old 'search → copy coords → nearby' chain - nearby now accepts --category (repeatable) for multi-type queries in one call (e.g. --category restaurant --category bar), results merged and deduped by (osm_type, osm_id), sorted by distance, capped at --limit - Each nearby result now includes maps_url (clickable Google Maps search link) and directions_url (Google Maps directions from the search point — only when a ref point is known) - Promoted commonly-useful OSM tags to top-level fields on each result: cuisine, hours (opening_hours), phone, website — instead of forcing callers to dig into the raw tags dict SKILL.md: - Version bumped 1.1.0 → 1.2.0, description rewritten to lead with capability surface - New 'Working With Telegram Location Pins' section replacing find-nearby's equivalent workflow - metadata.hermes.supersedes: [find-nearby] so tooling can flag any lingering references to the old skill External references updated: - optional-skills/productivity/telephony/SKILL.md — related_skills find-nearby → maps - website/docs/reference/skills-catalog.md — removed the (now-empty) 'leisure' section, added 'maps' row under productivity - website/docs/user-guide/features/cron.md — find-nearby example usages swapped to maps - tests/tools/test_cronjob_tools.py, tests/hermes_cli/test_cron.py, tests/cron/test_scheduler.py — fixture string values swapped - cli.py:5290 — /cron help-hint example swapped Not touched: - RELEASE_v0.2.0.md — historical record, left intact E2E-verified live (Nominatim + Overpass, one query each): - nearby --near "Times Square" --category restaurant --category bar → 3 results, sorted by distance, all with maps_url, directions_url, cuisine, phone, website where OSM had the tags All 111 targeted tests pass across tests/cron/, tests/tools/, tests/hermes_cli/.	2026-04-19 05:19:22 -07:00
Teknium	ce410521b3	feat(browser): add browser_cdp raw DevTools Protocol passthrough (#12369 ) Agents can now send arbitrary CDP commands to the browser. The tool is gated on a reachable CDP endpoint at session start — it only appears in the toolset when BROWSER_CDP_URL is set (from '/browser connect') or 'browser.cdp_url' is configured in config.yaml. Backends that don't currently expose CDP to the Python side (Camofox, default local agent-browser, cloud providers whose per-session cdp_url is not yet surfaced) do not see the tool at all. Tool schema description links to the CDP method reference at https://chromedevtools.github.io/devtools-protocol/ so the agent can web_extract specific method docs on demand. Stateless per call. Browser-level methods (Target., Browser., Storage.*) omit target_id. Page-level methods attach to the target with flatten=true and dispatch the method on the returned sessionId. Clean errors when the endpoint becomes unreachable mid-session or the URL isn't a WebSocket. Tests: 19 unit (mock CDP server + gate checks) + E2E against real headless Chrome (Target.getTargets, Browser.getVersion, Runtime.evaluate with target_id, Page.navigate + re-eval, bogus method, bogus target_id, missing endpoint) + E2E of the check_fn gate (tool hidden without CDP URL, visible with it, hidden again after unset).	2026-04-19 00:03:10 -07:00
Teknium	762f7e9796	feat: configurable approval mode for cron jobs (approvals.cron_mode) Add approvals.cron_mode config option that controls how cron jobs handle dangerous commands. Previously, cron jobs silently auto-approved all dangerous commands because there was no user present to approve them. Now the behavior is configurable: - deny (default): block dangerous commands and return a message telling the agent to find an alternative approach. The agent loop continues — it just can't use that specific command. - approve: auto-approve all dangerous commands (previous behavior). When a command is blocked, the agent receives the same response format as a user denial in the CLI — exit_code=-1, status=blocked, with a message explaining why and pointing to the config option. This keeps the agent loop running and encourages it to adapt. Implementation: - config.py: add approvals.cron_mode to DEFAULT_CONFIG - scheduler.py: set HERMES_CRON_SESSION=1 env var before agent runs - approval.py: both check_command_approval() and check_all_command_guards() now check for cron sessions and apply the configured mode - 21 new tests covering config parsing, deny/approve behavior, and interaction with other bypass mechanisms (yolo, containers)	2026-04-18 19:24:35 -07:00
Teknium	285bb2b915	feat(execute_code): add project/strict execution modes, default to project (#11971 ) Weaker models (Gemma-class) repeatedly rediscover and forget that execute_code uses a different CWD and Python interpreter than terminal(), causing them to flip-flop on whether user files exist and to hit import errors on project dependencies like pandas. Adds a new 'code_execution.mode' config key (default 'project') that brings execute_code into line with terminal()'s filesystem/interpreter: project (new default): - cwd = session's TERMINAL_CWD (falls back to os.getcwd()) - python = active VIRTUAL_ENV/bin/python or CONDA_PREFIX/bin/python with a Python 3.8+ version check; falls back cleanly to sys.executable if no venv or the candidate fails - result : 'import pandas' works, '.env' resolves, matches terminal() strict (opt-in): - cwd = staging tmpdir (today's behavior) - python = sys.executable (today's behavior) - result : maximum reproducibility and isolation; project deps won't resolve Security-critical invariants are identical across both modes and covered by explicit regression tests: - env scrubbing (strips _API_KEY, _TOKEN, _SECRET, _PASSWORD, _CREDENTIAL, _PASSWD, *_AUTH substrings) - SANDBOX_ALLOWED_TOOLS whitelist (no execute_code recursion, no delegate_task, no MCP from inside scripts) - resource caps (5-min timeout, 50KB stdout, 50 tool calls) Deliberately avoids 'sandbox'/'isolated'/'cloud' language in tool descriptions (regression from commit `39b83f34` where agents on local backends falsely believed they were sandboxed and refused networking). Override via env var: HERMES_EXECUTE_CODE_MODE=strict\|project	2026-04-18 01:46:25 -07:00
Teknium	598cba62ad	test: update stale tests to match current code (#11963 ) Seven test files were asserting against older function signatures and behaviors. CI has been red on main because of accumulated test debt from other PRs; this catches the tests up. - tests/agent/test_subagent_progress.py: _build_child_progress_callback now takes (task_index, goal, parent_agent, task_count=1); update all call sites and rewrite tests that assumed the old 'batch-only' relay semantics (now relays per-tool AND flushes a summary at BATCH_SIZE). Renamed test_thinking_not_relayed_to_gateway → test_thinking_relayed_to_gateway since thinking IS now relayed as subagent.thinking. - tests/tools/test_delegate.py: _build_child_agent now requires task_count; add task_count=1 to all 8 call sites. - tests/cli/test_reasoning_command.py: AIAgent gained _stream_callback; stub it on the two test agent helpers that use spec=AIAgent / __new__. - tests/hermes_cli/test_cmd_update.py: cmd_update now runs npm install in repo root + ui-tui/ + web/ and 'npm run build' in web/; assert all four subprocess calls in the expected order. - tests/hermes_cli/test_model_validation.py: dissimilar unknown models now return accepted=False (previously True with warning); update both affected tests. - tests/tools/test_registry.py: include feishu_doc_tool and feishu_drive_tool in the expected builtin tool set. - tests/gateway/test_voice_command.py: missing-voice-deps message now suggests 'pip install PyNaCl' not 'hermes-agent[messaging]'. 411/411 pass locally across these 7 files.	2026-04-17 21:35:30 -07:00
Teknium	20f2258f34	fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907 ) * fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace interrupt() previously only flagged the agent's _execution_thread_id. Tools running inside _execute_tool_calls_concurrent execute on ThreadPoolExecutor worker threads whose tids are distinct from the agent's, so is_interrupted() inside those tools returned False no matter how many times the gateway called .interrupt() — hung ssh / curl / long make-builds ran to their own timeout. Changes: - run_agent.py: track concurrent-tool worker tids in a per-agent set, fan interrupt()/clear_interrupt() out to them, and handle the register-after-interrupt race at _run_tool entry. getattr fallback for the tracker so test stubs built via object.__new__ keep working. - tools/environments/base.py: opt-in _wait_for_process trace (ENTER, per-30s HEARTBEAT with interrupt+activity-cb state, INTERRUPT DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1. - tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target tid, set snapshot) behind the same env flag. - tests: new regression test runs a polling tool on a concurrent worker and asserts is_interrupted() flips to True within ~1s of interrupt(). Second new test guards clear_interrupt() clearing tracked worker bits. Validation: tests/run_agent/ all 762 pass; tests/tools/ interrupt+env subset 216 pass. * fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when quiet_mode=True (the CLI default). This would silently swallow every INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation added in the parent commit — confirmed by running hermes chat -q with the flag and finding zero trace lines in agent.log even though _wait_for_process was clearly executing (subprocess pid existed). Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets its own logger level to INFO at import time, overriding the 'tools' parent-level filter. Scoped to the opt-in case only, so production (quiet_mode default) logs stay quiet as designed. Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes '_wait_for_process ENTER/EXIT' lines to agent.log as expected. * fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses Tool subprocesses spawned by the local environment backend use os.setsid so they run in their own process group. Before this fix, SIGTERM/SIGHUP to the hermes CLI killed the main thread via KeyboardInterrupt but the worker thread running _wait_for_process never got a chance to call _kill_process — Python exited, the child was reparented to init (PPID=1), and the subprocess ran to its natural end (confirmed live: sleep 300 survived 4+ min after SIGTERM to the agent until manual cleanup). Changes: - cli.py _signal_handler (interactive) + _signal_handler_q (-q mode): route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll loop sees the per-thread interrupt flag and calls _kill_process (os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default 1.5s) gives the worker time to complete its SIGTERM+SIGKILL escalation before KeyboardInterrupt unwinds main. - tools/environments/base.py _wait_for_process: wrap the poll loop in try/except (KeyboardInterrupt, SystemExit) so the cleanup fires even on paths the signal handlers don't cover (direct sys.exit, unhandled KI from nested code, etc.). Emits EXCEPTION_EXIT trace line when HERMES_DEBUG_INTERRUPT=1. - New regression test: injects KeyboardInterrupt into a running _wait_for_process via PyThreadState_SetAsyncExc, verifies the subprocess process group is dead within 3s of the exception and that KeyboardInterrupt re-raises cleanly afterward. Validation: \| Before \| After \| \|---------------------------------------------------------\|--------------------\| \| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM \| dies within 2 s \| \| No INTERRUPT DETECTED in trace \| INTERRUPT DETECTED fires + killing process group \| \| tests/tools/test_local_interrupt_cleanup \| 1/1 pass \| \| tests/run_agent/test_concurrent_interrupt \| 4/4 pass \|	2026-04-17 20:39:25 -07:00
Teknium	607be54a24	fix(discord): forum channel media + polish Extend forum support from PR #10145: - REST path (_send_discord): forum thread creation now uploads media files as multipart attachments on the starter message in a single call. Previously media files were silently dropped on the forum path. - Websocket media paths (_send_file_attachment, send_voice, send_image, send_animation — covers send_image_file, send_video, send_document transitively): forum channels now go through a new _forum_post_file helper that creates a thread with the file as starter content, instead of failing via channel.send(file=...) which forums reject. - _send_to_forum chunk follow-up failures are collected into raw_response['warnings'] so partial-send outcomes surface. - Process-local probe cache (_DISCORD_CHANNEL_TYPE_PROBE_CACHE) avoids GET /channels/{id} on every uncached send after the first. - Dedup of TestSendDiscordMedia that the PR merge-resolution left behind. - Docs: Forum Channels section under website/docs/user-guide/messaging/discord.md. Tests: 117 passed (22 new for forum+media, probe cache, warnings).	2026-04-17 20:25:48 -07:00
ChimingLiu	e5333e793c	feat(discord): support forum channels	2026-04-17 20:25:48 -07:00
helix4u	148459716c	fix(kimi): cover remaining fixed-temperature bypasses	2026-04-17 20:25:42 -07:00
Teknium	d7ef562a05	fix(file-ops): follow terminal env's live cwd in _exec instead of init-time cached cwd (#11912 ) ShellFileOperations captured the terminal env's cwd at __init__ time and used that stale value for every subsequent _exec() call. When the user ran `cd` via the terminal tool, `env.cwd` updated but `ops.cwd` did not. Relative paths passed to patch_replace / read_file / write_file / search then targeted the ORIGINAL directory instead of the current one. Observed symptom in agent sessions: terminal: cd .worktrees/my-branch patch hermes_cli/main.py <old> <new> → returns {"success": true} with a plausible unified diff → but `git diff` in the worktree shows nothing → the patch landed in the main repo's checkout of main.py instead The diff looked legitimate because patch_replace computes it from the IN-MEMORY content vs new_content, not by re-reading the file. The write itself DID succeed — it just wrote to the wrong directory's copy of the same-named file. Fix: _exec() now resolves cwd from live sources in this order: 1. Explicit `cwd` arg (if provided by the caller) 2. Live `self.env.cwd` (tracks `cd` commands run via terminal) 3. Init-time `self.cwd` (fallback when env has no cwd attribute) Includes a 5-test regression suite covering: - cd followed by relative read follows live cwd - the exact reported bug: patch_replace with relative path after cd - explicit cwd= arg still wins over env.cwd - env without cwd attribute falls back to init-time cwd - patch_replace success reflects real file state (safety rail) Co-authored-by: teknium1 <teknium@nousresearch.com>	2026-04-17 19:26:40 -07:00
liujinkun	85cdb04bd4	feat: add Feishu document comment intelligent reply with 3-tier access control - Full comment handler: parse drive.notice.comment_add_v1 events, build timeline, run agent, deliver reply with chunking support. - 5 tools: feishu_doc_read, feishu_drive_list_comments, feishu_drive_list_comment_replies, feishu_drive_reply_comment, feishu_drive_add_comment. - 3-tier access control rules (exact doc > wildcard "*" > top-level > defaults) with per-field fallback. Config via ~/.hermes/feishu_comment_rules.json, mtime-cached hot-reload. - Self-reply filter using generalized self_open_id (supports future user-identity subscriptions). Receiver check: only process events where the bot is the @mentioned target. - Smart timeline selection, long text chunking, semantic text extraction, session sharing per document, wiki link resolution. Change-Id: I31e82fd6355173dbcc400b8934b6d9799e3137b9	2026-04-17 19:04:11 -07:00
Teknium	304fb921bf	fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843 ) Both fixes close process leaks observed in production (18+ orphaned agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters accumulated over ~3 days, ~2.7 GB RSS). ## agent-browser daemon leak Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran from _start_browser_cleanup_thread, which is only invoked on the first browser tool call in a process. Hermes sessions that never used the browser never swept orphans, and the cross-process orphan detection relied on in-process _active_sessions, which doesn't see other hermes PIDs' sessions (race risk). - Write <session>.owner_pid alongside the socket dir recording the hermes PID that owns the daemon (extracted into _write_owner_pid for direct testability). - Reaper prefers owner_pid liveness over in-process _active_sessions. Cross-process safe: concurrent hermes instances won't reap each other's daemons. Legacy tracked_names fallback kept for daemons that predate owner_pid. - atexit handler (_emergency_cleanup_all_sessions) now always runs the reaper, not just when this process had active sessions — every clean hermes exit sweeps accumulated orphans. ## paste.rs auto-delete leak _schedule_auto_delete spawned a detached Python subprocess per call that slept 6 hours then issued DELETE requests. No dedup, no tracking — every 'hermes debug share' invocation added ~20 MB of resident Python interpreters that stuck around until the sleep finished. - Replaced the spawn with ~/.hermes/pastes/pending.json: records {url, expire_at} entries. - _sweep_expired_pastes() synchronously DELETEs past-due entries on every 'hermes debug' invocation (run_debug() dispatcher). - Network failures stay in pending.json for up to 24h, then give up (paste.rs's own retention handles the 'user never runs hermes again' edge case). - Zero subprocesses; regression test asserts subprocess/Popen/time.sleep never appear in the function source (skipping docstrings via AST). ## Validation \| \| Before \| After \| \|------------------------------\|---------------\|--------------\| \| Orphan agent-browser daemons \| 18 accumulated\| 2 (live) \| \| paste.rs sleep interpreters \| 15 accumulated\| 0 \| \| RSS reclaimed \| - \| ~2.7 GB \| \| Targeted tests \| - \| 2253 pass \| E2E verified: alive-owner daemons NOT reaped; dead-owner daemons SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired entries without spawning subprocesses.	2026-04-17 18:46:30 -07:00
helix4u	64b354719f	Support browser CDP URL from config	2026-04-17 16:05:04 -07:00
brooklyn!	e9b8ece103	Merge pull request #4692 from NousResearch/feat/ink-refactor Feat/ink refactor	2026-04-17 18:02:37 -05:00
Teknium	3f43aec15d	fix(tools): bound _read_tracker sub-containers + prune _completion_consumed (#11839 ) Two accretion-over-time leaks that compound over long CLI / gateway lifetimes. Both were flagged in the memory-leak audit. ## file_tools._read_tracker _read_tracker[task_id] holds three sub-containers that grew unbounded: read_history set of (path, offset, limit) tuples — 1 per unique read dedup dict of (path, offset, limit) → mtime — same growth pattern read_timestamps dict of resolved_path → mtime — 1 per unique path A CLI session uses one stable task_id for its lifetime, so these were uncapped. A 10k-read session accumulated ~1.5MB of tracker state that the tool no longer needed (only the most recent reads are relevant for dedup, consecutive-loop detection, and write/patch external-edit warnings). Fix: _cap_read_tracker_data() enforces hard caps on each container after every add. Defaults: read_history=500, dedup=1000, read_timestamps=1000. Eviction is insertion-order (Python 3.7+ dict guarantee) for the dicts; arbitrary for the set (which only feeds diagnostic summaries). ## process_registry._completion_consumed Module-level set that recorded every session_id ever polled / waited / logged. No pruning. Each entry is ~20 bytes, so the absolute leak is small, but on a gateway processing thousands of background commands per day the set grows until process exit. Fix: _prune_if_needed() now discards _completion_consumed entries alongside the session dict evictions it already performs (both the TTL-based prune and the LRU-over-cap prune). Adds a final belt-and-suspenders pass that drops any dangling entries whose session_id no longer appears in _running or _finished. Tests: tests/tools/test_accretion_caps.py — 9 cases * Each container bound respected, oldest evicted * No-op when under cap (no unnecessary work) * Handles missing sub-containers without crashing * Live read_file_tool path enforces caps end-to-end * _completion_consumed pruned on TTL expiry * _completion_consumed pruned on LRU eviction * Dangling entries (no backing session) cleared Broader suite: 3486 tests/tools + tests/cli pass. The single flake (test_alias_command_passes_args) reproduces on unchanged main — known cross-test pollution under suite-order load.	2026-04-17 15:53:57 -07:00
Brooklyn Nicholson	aa583cb14e	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-17 17:51:40 -05:00
Teknium	f362083c64	fix(providers): complete NVIDIA NIM parity with other providers Follow-up on the native NVIDIA NIM provider salvage. The original PR wired PROVIDER_REGISTRY + HERMES_OVERLAYS correctly but missed several touchpoints required for full parity with other OpenAI-compatible providers (xai, huggingface, deepseek, zai). Gaps closed: - hermes_cli/main.py: - Add 'nvidia' to the _model_flow_api_key_provider dispatch tuple so selecting 'NVIDIA NIM' in `hermes model` actually runs the api-key provider flow (previously fell through silently). - Add 'nvidia' to `hermes chat --provider` argparse choices so the documented test command (`hermes chat --provider nvidia --model ...`) parses successfully. - hermes_cli/config.py: Register NVIDIA_API_KEY and NVIDIA_BASE_URL in OPTIONAL_ENV_VARS so setup wizard can prompt for them and they're auto-added to the subprocess env blocklist. - hermes_cli/doctor.py: Add NVIDIA NIM row to `_apikey_providers` so `hermes doctor` probes https://integrate.api.nvidia.com/v1/models. - hermes_cli/dump.py: Add NVIDIA_API_KEY → 'nvidia' mapping for `hermes dump` credential masking. - tests/tools/test_local_env_blocklist.py: Extend registry_vars fixture with NVIDIA_API_KEY to verify it's blocked from leaking into subprocesses. - agent/model_metadata.py: Add 'nemotron' → 131072 context-length entry so all Nemotron variants get 128K context via substring match (rather than falling back to MINIMUM_CONTEXT_LENGTH). - hermes_cli/models.py: Fix hallucinated model ID 'nvidia/nemotron-3-nano-8b-a4b' → 'nvidia/nemotron-3-nano-30b-a3b' (verified against live integrate.api.nvidia.com/v1/models catalog). Expand curated list from 5 to 9 agentic models mapping to OpenRouter defaults per provider-guide convention: add qwen3.5-397b-a17b, deepseek-v3.2, llama-3.3-nemotron-super-49b-v1.5, gpt-oss-120b. - cli-config.yaml.example: Document 'nvidia' provider option. - scripts/release.py: Map asurla@nvidia.com → anniesurla in AUTHOR_MAP for CI attribution. E2E verified: `hermes chat --provider nvidia ...` now reaches NVIDIA's endpoint (returns 401 with bogus key instead of argparse error); `hermes doctor` detects NVIDIA NIM when NVIDIA_API_KEY is set.	2026-04-17 13:47:46 -07:00
Brooklyn Nicholson	1f37ef2fd1	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-17 08:59:33 -05:00
yeyitech	a97b08e30c	fix: allow trusted QQ CDN benchmark IP resolution	2026-04-17 04:22:40 -07:00
Teknium	2367c6ffd5	test: remove 169 change-detector tests across 21 files (#11472 ) First pass of test-suite reduction to address flaky CI and bloat. Removed tests that fall into these change-detector patterns: 1. Source-grep tests (tests/gateway/test_feishu.py, test_email.py): tests that call inspect.getsource() on production modules and grep for string literals. Break on any refactor/rename even when behavior is correct. 2. Platform enum tautologies (every gateway/test_X.py): assertions like `Platform.X.value == 'x'` duplicated across ~9 adapter test files. 3. Toolset/PLATFORM_HINTS/setup-wizard registry-presence checks: tests that only verify a key exists in a dict. Data-layout tests, not behavior. 4. Argparse wiring tests (test_argparse_flag_propagation, test_subparser_routing _fallback): tests that do parser.parse_args([...]) then assert args.field. Tests Python's argparse, not our code. 5. Pure dispatch tests (test_plugins_cmd.TestPluginsCommandDispatch): patch cmd_X, call plugins_command with matching action, assert mock called. Tests the if/elif chain, not behavior. 6. Kwarg-to-mock verification (test_auxiliary_client ~45 tests, test_web_tools_config, test_gemini_cloudcode, test_retaindb_plugin): tests that mock the external API client, call our function, and assert exact kwargs. Break on refactor even when behavior is preserved. 7. Schedule-internal "function-was-called" tests (acp/test_server scheduling tests): tests that patch own helper method, then assert it was called. Kept behavioral tests throughout: error paths (pytest.raises), security tests (path traversal, SSRF, redaction), message alternation invariants, provider API format conversion, streaming logic, memory contract, real config load/merge tests. Net reduction: 169 tests removed. 38 empty classes cleaned up. Collected before: 12,522 tests Collected after: 12,353 tests	2026-04-17 01:05:09 -07:00
Teknium	e5cde568b7	feat(skills): add 'hermes skills reset' to un-stick bundled skills (#11468 ) When a user edits a bundled skill, sync flags it as user_modified and skips it forever. The problem: if the user later tries to undo the edit by copying the current bundled version back into ~/.hermes/skills/, the manifest still holds the old origin hash from the last successful sync, so the fresh bundled hash still doesn't match and the skill stays stuck as user_modified. Adds an escape hatch for this case. hermes skills reset <name> Drops the skill's entry from ~/.hermes/skills/.bundled_manifest and re-baselines against the user's current copy. Future 'hermes update' runs accept upstream changes again. Non-destructive. hermes skills reset <name> --restore Also deletes the user's copy and re-copies the bundled version. Use when you want the pristine upstream skill back. Also available as /skills reset in chat. - tools/skills_sync.py: new reset_bundled_skill(name, restore=False) - hermes_cli/skills_hub.py: do_reset() + wired into skills_command and handle_skills_slash; added to the slash /skills help panel - hermes_cli/main.py: argparse entry for 'hermes skills reset' - tests/tools/test_skills_sync.py: 5 new tests covering the stuck-flag repro, --restore, unknown-skill error, upstream-removed-skill, and no-op on already-clean state - website/docs/user-guide/features/skills.md: new 'Bundled skill updates' section explaining the origin-hash mechanic + reset usage	2026-04-17 00:41:31 -07:00
Teknium	220fa7db90	feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro (#11406 ) * feat(image_gen): upgrade Recraft V3 → V4 Pro, Nano Banana → Pro Upstream asked for these two upgrades ASAP — the old entries show stale models when newer, higher-quality versions are available on FAL. Recraft V3 → Recraft V4 Pro ID: fal-ai/recraft-v3 → fal-ai/recraft/v4/pro/text-to-image Price: $0.04/image → $0.25/image (6x — V4 Pro is premium tier) Schema: V4 dropped the required `style` enum entirely; defaults handle taste now. Added `colors` and `background_color` to supports for brand-palette control. `seed` is not supported by V4 per the API docs. Nano Banana → Nano Banana Pro ID: fal-ai/nano-banana → fal-ai/nano-banana-pro Price: $0.08/image → $0.15/image (1K); $0.30 at 4K Schema: Aspect ratio family unchanged. Added `resolution` (1K/2K/4K, default 1K for billing predictability), `enable_web_search` (real-time info grounding, +$0.015), and `limit_generations` (force exactly 1 image). Architecture: Gemini 2.5 Flash → Gemini 3 Pro Image. Quality and reasoning depth improved; slower (~6s → ~8s). Migration: users who had the old IDs in `image_gen.model` will fall through the existing 'unknown model → default' warning path in `_resolve_fal_model()` and get the Klein 9B default on the next run. Re-run `hermes tools` → Image Generation to pick the new version. No silent cost-upgrade aliasing — the 2-6x price jump on these tiers warrants explicit user re-selection. Portal note: both new model IDs need to be allowlisted on the Nous fal-queue-gateway alongside the previous 7 additions, or users on Nous Subscription will see the 'managed gateway rejected model' error we added previously (which is clear and self-remediating, just noisy). * docs: wrap '<1s' in backticks to unblock MDX compilation Docusaurus's MDX parser treats unquoted '<' as the start of JSX, and '<1s' fails because '1' isn't a valid tag-name start character. This was broken on main since PR #11265 (never noticed because docs-site-checks was failing on OTHER issues at the time and we admin-merged through it). Wrapping in backticks also gives the cell monospace styling which reads more cleanly alongside the inline-code model ID in the same row. The other '<1s' occurrence (line 52) is inside a fenced code block and is already safe — code fences bypass MDX parsing.	2026-04-16 22:05:41 -07:00
Teknium	70768665a4	fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383 ) * feat(mcp-oauth): scaffold MCPOAuthManager Central manager for per-server MCP OAuth state. Provides get_or_build_provider (cached), remove (evicts cache + deletes disk), invalidate_if_disk_changed (mtime watch, core fix for external-refresh workflow), and handle_401 (dedup'd recovery). No behavior change yet — existing call sites still use build_oauth_auth directly. Task 1 of 8 in the MCP OAuth consolidation (fixes Cthulhu's BetterStack reliability issues). * feat(mcp-oauth): add HermesMCPOAuthProvider with pre-flow disk watch Subclasses the MCP SDK's OAuthClientProvider to inject a disk mtime check before every async_auth_flow, via the central manager. When a subclass instance is used, external token refreshes (cron, another CLI instance) are picked up before the next API call. Still dead code: the manager's _build_provider still delegates to build_oauth_auth and returns the plain OAuthClientProvider. Task 4 wires this subclass in. Task 2 of 8. * refactor(mcp-oauth): extract build_oauth_auth helpers Decomposes build_oauth_auth into _configure_callback_port, _build_client_metadata, _maybe_preregister_client, and _parse_base_url. Public API preserved. These helpers let MCPOAuthManager._build_provider reuse the same logic in Task 4 instead of duplicating the construction dance. Also updates the SDK version hint in the warning from 1.10.0 to 1.26.0 (which is what we actually require for the OAuth types used here). Task 3 of 8. * feat(mcp-oauth): manager now builds HermesMCPOAuthProvider directly _build_provider constructs the disk-watching subclass using the helpers from Task 3, instead of delegating to the plain build_oauth_auth factory. Any consumer using the manager now gets pre-flow disk-freshness checks automatically. build_oauth_auth is preserved as the public API for backwards compatibility. The code path is now: MCPOAuthManager.get_or_build_provider -> _build_provider -> _configure_callback_port _build_client_metadata _maybe_preregister_client _parse_base_url HermesMCPOAuthProvider(...) Task 4 of 8. * feat(mcp): wire OAuth manager + add _reconnect_event MCPServerTask gains _reconnect_event alongside _shutdown_event. When set, _run_http / _run_stdio exit their async-with blocks cleanly (no exception), and the outer run() loop re-enters the transport to rebuild the MCP session with fresh credentials. This is the recovery path for OAuth failures that the SDK's in-place httpx.Auth cannot handle (e.g. cron externally consumed the refresh_token, or server-side session invalidation). _run_http now asks MCPOAuthManager for the OAuth provider instead of calling build_oauth_auth directly. Config-time, runtime, and reconnect paths all share one provider instance with pre-flow disk-watch active. shutdown() defensively sets both events so there is no race between reconnect and shutdown signalling. Task 5 of 8. * feat(mcp): detect auth failures in tool handlers, trigger reconnect All 5 MCP tool handlers (tool call, list_resources, read_resource, list_prompts, get_prompt) now detect auth failures and route through MCPOAuthManager.handle_401: 1. If the manager says recovery is viable (disk has fresh tokens, or SDK can refresh in-place), signal MCPServerTask._reconnect_event to tear down and rebuild the MCP session with fresh credentials, then retry the tool call once. 2. If no recovery path exists, return a structured needs_reauth JSON error so the model stops hallucinating manual refresh attempts (the 'let me curl the token endpoint' loop Cthulhu pasted from Discord). _is_auth_error catches OAuthFlowError, OAuthTokenError, OAuthNonInteractiveError, and httpx.HTTPStatusError(401). Non-auth exceptions still surface via the generic error path unchanged. Task 6 of 8. * feat(mcp-cli): route add/remove through manager, add 'hermes mcp login' cmd_mcp_add and cmd_mcp_remove now go through MCPOAuthManager instead of calling build_oauth_auth / remove_oauth_tokens directly. This means CLI config-time state and runtime MCP session state are backed by the same provider cache — removing a server evicts the live provider, adding a server populates the same cache the MCP session will read from. New 'hermes mcp login <name>' command: - Wipes both the on-disk tokens file and the in-memory MCPOAuthManager cache - Triggers a fresh OAuth browser flow via the existing probe path - Intended target for the needs_reauth error Task 6 returns to the model Task 7 of 8. * test(mcp-oauth): end-to-end integration tests Five new tests exercising the full consolidation with real file I/O and real imports (no transport mocks): 1. external_refresh_picked_up_without_restart — Cthulhu's cron workflow. External process writes fresh tokens to disk; on the next auth flow the manager's mtime-watch flips _initialized and the SDK re-reads from storage. 2. handle_401_deduplicates_concurrent_callers — 10 concurrent handlers for the same failed token fire exactly ONE recovery attempt (thundering-herd protection). 3. handle_401_returns_false_when_no_provider — defensive path for unknown servers. 4. invalidate_if_disk_changed_handles_missing_file — pre-auth state returns False cleanly. 5. provider_is_reused_across_reconnects — cache stickiness so reconnects preserve the disk-watch baseline mtime. Task 8 of 8 — consolidation complete.	2026-04-16 21:57:10 -07:00
Brooklyn Nicholson	41d3d7afb7	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-16 22:35:27 -05:00
Teknium	01906e99dd	feat(image_gen): multi-model FAL support with picker in hermes tools (#11265 ) * feat(image_gen): multi-model FAL support with picker in hermes tools Adds 8 FAL text-to-image models selectable via `hermes tools` → Image Generation → (FAL.ai \| Nous Subscription) → model picker. Models supported: - fal-ai/flux-2/klein/9b (new default, <1s, $0.006/MP) - fal-ai/flux-2-pro (previous default, kept backward-compat upscaling) - fal-ai/z-image/turbo (Tongyi-MAI, bilingual EN/CN) - fal-ai/nano-banana (Gemini 2.5 Flash Image) - fal-ai/gpt-image-1.5 (with quality tier: low/medium/high) - fal-ai/ideogram/v3 (best typography) - fal-ai/recraft-v3 (vector, brand styles) - fal-ai/qwen-image (LLM-based) Architecture: - FAL_MODELS catalog declares per-model size family, defaults, supports whitelist, and upscale flag. Three size families handled uniformly: image_size_preset (flux family), aspect_ratio (nano-banana), and gpt_literal (gpt-image-1.5). - _build_fal_payload() translates unified inputs (prompt + aspect_ratio) into model-specific payloads, merges defaults, applies caller overrides, wires GPT quality_setting, then filters to the supports whitelist — so models never receive rejected keys. - IMAGEGEN_BACKENDS registry in tools_config prepares for future imagegen providers (Replicate, Stability, etc.); each provider entry tags itself with imagegen_backend: 'fal' to select the right catalog. - Upscaler (Clarity) defaults off for new models (preserves <1s value prop), on for flux-2-pro (backward-compat). Per-model via FAL_MODELS. Config: image_gen.model = fal-ai/flux-2/klein/9b (new) image_gen.quality_setting = medium (new, GPT only) image_gen.use_gateway = bool (existing) Agent-facing schema unchanged (prompt + aspect_ratio only) — model choice is a user-level config decision, not an agent-level arg. Picker uses curses_radiolist (arrow keys, auto numbered-fallback on non-TTY). Column-aligned: Model / Speed / Strengths / Price. Docs: image-generation.md rewritten with the model table and picker walkthrough. tools-reference, tool-gateway, overview updated to drop the stale "FLUX 2 Pro" wording. Tests: 42 new in tests/tools/test_image_generation.py covering catalog integrity, all 3 size families, supports filter, default merging, GPT quality wiring, model resolution fallback. 8 new in tests/hermes_cli/test_tools_config.py for picker wiring (registry, config writes, GPT quality follow-up prompt, corrupt-config repair). * feat(image_gen): translate managed-gateway 4xx to actionable error When the Nous Subscription managed FAL proxy rejects a model with 4xx (likely portal-side allowlist miss or billing gate), surface a clear message explaining: 1. The rejected model ID + HTTP status 2. Two remediation paths: set FAL_KEY for direct access, or pick a different model via `hermes tools` 5xx, connection errors, and direct-FAL errors pass through unchanged (those have different root causes and reasonable native messages). Motivation: new FAL models added to this release (flux-2-klein-9b, z-image-turbo, nano-banana, gpt-image-1.5, ideogram-v3, recraft-v3, qwen-image) are untested against the Nous Portal proxy. If the portal allowlists model IDs, users on Nous Subscription will hit cryptic 4xx errors without guidance on how to work around it. Tests: 8 new cases covering status extraction across httpx/fal error shapes and 4xx-vs-5xx-vs-ConnectionError translation policy. Docs: brief note in image-generation.md for Nous subscribers. Operator action (Nous Portal side): verify that fal-queue-gateway passes through these 7 new FAL model IDs. If the proxy has an allowlist, add them; otherwise Nous Subscription users will see the new translated error and fall back to direct FAL. * feat(image_gen): pin GPT-Image quality to medium (no user choice) Previously the tools picker asked a follow-up question for GPT-Image quality tier (low / medium / high) and persisted the answer to `image_gen.quality_setting`. This created two problems: 1. Nous Portal billing complexity — the 22x cost spread between tiers ($0.009 low / $0.20 high) forces the gateway to meter per-tier per user, which the portal team can't easily support at launch. 2. User footgun — anyone picking `high` by mistake burns through credit ~6x faster than `medium`. This commit pins quality at medium by baking it into FAL_MODELS defaults for gpt-image-1.5 and removes all user-facing override paths: - Removed `_resolve_gpt_quality()` runtime lookup - Removed `honors_quality_setting` flag on the model entry - Removed `_configure_gpt_quality_setting()` picker helper - Removed `_GPT_QUALITY_CHOICES` constant - Removed the follow-up prompt call in `_configure_imagegen_model()` - Even if a user manually edits `image_gen.quality_setting` in config.yaml, no code path reads it — always sends medium. Tests: - Replaced TestGptQualitySetting (6 tests) with TestGptQualityPinnedToMedium (5 tests) — proves medium is baked in, config is ignored, flag is removed, helper is removed, non-gpt models never get quality. - Replaced test_picker_with_gpt_image_also_prompts_quality with test_picker_with_gpt_image_does_not_prompt_quality — proves only 1 picker call fires when gpt-image is selected (no quality follow-up). Docs updated: image-generation.md replaces the quality-tier table with a short note explaining the pinning decision. * docs(image_gen): drop stale 'wires GPT quality tier' line from internals section Caught in a cleanup sweep after pinning quality to medium. The "How It Works Internally" walkthrough still described the removed quality-wiring step.	2026-04-16 20:19:53 -07:00
Teknium	7fd508979e	fix: harden sync_back — PID-suffix temp path, size cap, lifecycle guards Follow-ups on top of kshitijk4poor's cherry-picked salvage of PR #8018: tools/environments/daytona.py - PID-suffix /tmp/.hermes_sync.<pid>.tar so concurrent sync_back calls against the same sandbox don't collide on the remote temp path - Move sync_back() inside the cleanup lock and after the _sandbox-None guard, with its own try/except. Previously a no-op cleanup (sandbox already cleared) still fired sync_back → 3-attempt retry storm against a nil sandbox (~6s of sleep). Now short-circuits cleanly. tools/environments/file_sync.py - Add _SYNC_BACK_MAX_BYTES (2 GiB) defensive cap: refuse to extract a tar larger than the limit. Protects against runaway sandboxes producing arbitrary-size archives. - Add 'nothing previously pushed' guard at the top of sync_back(). If _pushed_hashes and _synced_files are both empty, the FileSyncManager was never initialized from the host side — there is nothing coherent to sync back. Skips the retry/backoff machinery on uninitialized managers and eliminates test-suite slowdown from pre-existing cleanup tests that don't mock the sync layer. tests/tools/test_file_sync_back.py - Update _make_manager helper to seed a _pushed_hashes entry by default so sync_back() exercises its real path. A seed_pushed_state=False opt-out is available for noop-path tests. - Add TestSyncBackSizeCap with positive and negative coverage of the new cap. tests/tools/test_sync_back_backends.py - Update Daytona bulk download test to assert the PID-suffixed path pattern instead of the fixed /tmp/.hermes_sync.tar.	2026-04-16 19:39:21 -07:00
kshitijk4poor	d64446e315	feat(file-sync): sync remote changes back to host on teardown Salvage of PR #8018 by @alt-glitch onto current main. On sandbox teardown, FileSyncManager now downloads the remote .hermes/ directory, diffs against SHA-256 hashes of what was originally pushed, and applies only changed files back to the host. Core (tools/environments/file_sync.py): - sync_back(): orchestrates download -> unpack -> diff -> apply with: - Retry with exponential backoff (3 attempts, 2s/4s/8s) - SIGINT trap + defer (prevents partial writes on Ctrl-C) - fcntl.flock serialization (concurrent gateway sandboxes) - Last-write-wins conflict resolution with warning - New remote files pulled back via _infer_host_path prefix matching Backends: - SSH: _ssh_bulk_download — tar cf - piped over SSH - Modal: _modal_bulk_download — exec tar cf - -> proc.stdout.read - Daytona: _daytona_bulk_download — exec tar cf -> SDK download_file - All three call sync_back() at the top of cleanup() Fixes applied during salvage (vs original PR #8018): \| # \| Issue \| Fix \| \|---\|-------\|-----\| \| C1 \| import fcntl unconditional — crashes Windows \| try/except with fallback; _sync_back_locked skips locking when fcntl=None \| \| W1 \| assert for runtime guard (stripped by -O) \| Replaced with proper if/raise RuntimeError \| \| W2 \| O(n*m) from _get_files_fn() called per file \| Cache mapping once at start of _sync_back_impl, pass to resolve/infer \| \| W3 \| Dead BulkDownloadFn imports in 3 backends \| Removed unused imports \| \| W4 \| Modal hardcodes root/.hermes, no explanation \| Added docstring comment explaining Modal always runs as root \| \| S1 \| SHA-256 computed for new files where pushed_hash=None \| Skip hashing when pushed_hash is None (comparison always False) \| \| S2 \| Daytona /tmp/.hermes_sync.tar never cleaned up \| Added rm -f after download (best-effort) \| Tests: 49 passing (17 new: _infer_host_path edge cases, SIGINT main/worker thread, Windows fcntl=None fallback, Daytona tar cleanup). Based on #8018 by @alt-glitch.	2026-04-16 19:39:21 -07:00
Brooklyn Nicholson	3746c60439	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-16 18:25:49 -05:00
Teknium	edefec4e68	fix(checkpoints): isolate shadow git repo from user's global config (#11261 ) Users with 'commit.gpgsign = true' in their global git config got a pinentry popup (or a failed commit) every time the agent took a background filesystem snapshot — every write_file, patch, or diff mid-session. With GPG_TTY unset, pinentry-qt/gtk would spawn a GUI window, constantly interrupting the session. The shadow repo is internal Hermes infrastructure. It must not inherit user-level git settings (signing, hooks, aliases, credential helpers, etc.) under any circumstance. Fix is layered: 1. _git_env() sets GIT_CONFIG_GLOBAL=os.devnull, GIT_CONFIG_SYSTEM=os.devnull, and GIT_CONFIG_NOSYSTEM=1. Shadow git commands no longer see ~/.gitconfig or /etc/gitconfig at all (uses os.devnull for Windows compat). 2. _init_shadow_repo() explicitly writes commit.gpgsign=false and tag.gpgSign=false into the shadow's own config, so the repo is correct even if inspected or run against directly without the env vars, and for older git versions (<2.32) that predate GIT_CONFIG_GLOBAL. 3. _take() passes --no-gpg-sign inline on the commit call. This covers existing shadow repos created before this fix — they will never re-run _init_shadow_repo (it is gated on HEAD not existing), so they would miss layer 2. Layer 1 still protects them, but the inline flag guarantees correctness at the commit call itself. Existing checkpoints, rollback, list, diff, and restore all continue to work — history is untouched. Users who had the bug stop getting pinentry popups; users who didn't see no observable change. Tests: 5 new regression tests in TestGpgAndGlobalConfigIsolation, including a full E2E repro with fake HOME, global gpgsign=true, and a deliberately broken GPG binary — checkpoint succeeds regardless.	2026-04-16 16:06:49 -07:00
Teknium	387aa9afc9	fix(approval): heartbeat activity during gateway approval wait (#11245 ) The blocking gateway approval wait at tools/approval.py called `entry.event.wait(timeout=...)` which never touched the agent's activity tracker. When a user was slow to respond to a /approve prompt (or the gateway_timeout config was set higher than the default 300s), the agent thread sat silent long enough for the gateway's inactivity watchdog (agent.gateway_timeout, default 1800s) to kill it — even though the agent was doing exactly the right thing and the user was the one causing the delay. The fix polls the event in 1s slices and calls touch_activity_if_due between slices, mirroring the _wait_for_process() pattern in tools/environments/base.py that covers the subprocess-waiting side of the same problem. At the default 10s heartbeat cadence, a 300s approval wait now pings activity ~30 times, well under the 1800s idle threshold. Observed in community user logs: 12 repeated 'Agent idle 1800s, last_activity=executing tool: terminal' events across April 12-14. Companion to PR #10501 which covered streaming / concurrent-tool / Modal-backend gaps but did not touch approval.py. Test: tests/tools/test_approval_heartbeat.py — verifies (1) heartbeats fire during the wait, (2) user responses are still near-instant, and (3) the approval path stays functional when the heartbeat helper can't be imported.	2026-04-16 14:48:50 -07:00
Teknium	fce6c3cdf6	feat(tts): add Google Gemini TTS provider (#11229 ) Adds Google Gemini TTS as the seventh voice provider, with 30 prebuilt voices (Zephyr, Puck, Kore, Enceladus, Gacrux, etc.) and natural-language prompt control. Integrates through the existing provider chain: - tools/tts_tool.py: new _generate_gemini_tts() calls the generativelanguage REST endpoint with responseModalities=[AUDIO], wraps the returned 24kHz mono 16-bit PCM (L16) in a WAV RIFF header, then ffmpeg-converts to MP3 or Opus depending on output extension. For .ogg output, libopus is forced explicitly so Telegram voice bubbles get Opus (ffmpeg defaults to Vorbis for .ogg). - hermes_cli/tools_config.py: exposes 'Google Gemini TTS' as a provider option in the curses-based 'hermes tools' UI. - hermes_cli/setup.py: adds gemini to the setup wizard picker, tool status display, and API key prompt branch (accepts existing GEMINI_API_KEY or GOOGLE_API_KEY, falls back to Edge if neither set). - tests/tools/test_tts_gemini.py: 15 unit tests covering WAV header wrap correctness, env var fallback (GEMINI/GOOGLE), voice/model overrides, snake_case vs camelCase inlineData handling, HTTP error surfacing, and empty-audio edge cases. - docs: TTS features page updated to list seven providers with the new gemini config block and ffmpeg notes. Live-tested against api key against gemini-2.5-flash-preview-tts: .wav, .mp3, and Telegram-compatible .ogg (Opus codec) all produce valid playable audio.	2026-04-16 14:23:16 -07:00
Brooklyn Nicholson	cb2a737bc8	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-16 14:48:33 -05:00
emozilla	f188ac74f0	feat: ungate Tool Gateway — subscription-based access with per-tool opt-in Replace the HERMES_ENABLE_NOUS_MANAGED_TOOLS env-var feature flag with subscription-based detection. The Tool Gateway is now available to any paid Nous subscriber without needing a hidden env var. Core changes: - managed_nous_tools_enabled() checks get_nous_auth_status() + check_nous_free_tier() instead of an env var - New use_gateway config flag per tool section (web, tts, browser, image_gen) records explicit user opt-in and overrides direct API keys at runtime - New prefers_gateway(section) shared helper in tool_backend_helpers.py used by all 4 tool runtimes (web, tts, image gen, browser) UX flow: - hermes model: after Nous login/model selection, shows a curses prompt listing all gateway-eligible tools with current status. User chooses to enable all, enable only unconfigured tools, or skip. Defaults to Enable for new users, Skip when direct keys exist. - hermes tools: provider selection now manages use_gateway flag — selecting Nous Subscription sets it, selecting any other provider clears it - hermes status: renamed section to Nous Tool Gateway, added free-tier upgrade nudge for logged-in free users - curses_radiolist: new description parameter for multi-line context that survives the screen clear Runtime behavior: - Each tool runtime (web_tools, tts_tool, image_generation_tool, browser_use) checks prefers_gateway() before falling back to direct env-var credentials - get_nous_subscription_features() respects use_gateway flags, suppressing direct credential detection when the user opted in Removed: - HERMES_ENABLE_NOUS_MANAGED_TOOLS env var and all references - apply_nous_provider_defaults() silent TTS auto-set - get_nous_subscription_explainer_lines() static text - Override env var warnings (use_gateway handles this properly now)	2026-04-16 12:36:49 -07:00
Brooklyn Nicholson	9c71f3a6ea	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-16 10:47:41 -05:00
kshitijk4poor	a6142a8e08	fix: follow-up for salvaged PR #10854 - Extract duplicated activity-callback polling into shared touch_activity_if_due() helper in tools/environments/base.py - Use helper from both base.py _wait_for_process and code_execution_tool.py local polling loop (DRY) - Add test assertion that timeout output field contains the timeout message and emoji (#10807) - Add stream_consumer test for tool-boundary fallback scenario where continuation is empty but final_text differs from visible prefix (#10807)	2026-04-16 06:42:45 -07:00
Brooklyn Nicholson	f81dba0da2	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-16 08:23:20 -05:00
Teknium	f726b9b843	fix(browser): runtime fallback to local Chromium when cloud provider fails Wraps provider.create_session() in _get_session_info() with try/except to catch cloud provider runtime failures (timeouts, auth errors, rate limits, invalid responses). Falls back to _create_local_session() so browser automation continues working when cloud APIs are down. Marks fallback sessions with fallback_from_cloud, fallback_reason, and fallback_provider metadata for observability. If both cloud and local fail, raises RuntimeError with chained context from both errors. Closes #10883 Co-authored-by: konsisumer <konsisumer@users.noreply.github.com>	2026-04-16 04:19:34 -07:00
Teknium	23a42635f0	docs: remove nonexistent CAMOFOX_PROFILE_DIR env var references (#10976 ) Camofox automatically maps each userId to a persistent Firefox profile on the server side — no CAMOFOX_PROFILE_DIR env var exists. Our docs incorrectly told users to configure this on the server. Removed the fabricated env var from: - browser docs (:::note block) - config.py DEFAULT_CONFIG comment - test docstring	2026-04-16 04:07:11 -07:00
Markus Corazzione	c928ebb1b1	retry transient telegram send failures	2026-04-16 03:47:00 -07:00
Kovyrin Family Claw	00ff9a26cd	Fix Telegram link preview suppression for bot sends	2026-04-15 17:54:43 -07:00
Teknium	c850a40e4e	fix: gate Matrix adapter path on media_files presence Text-only Matrix sends should continue using the lightweight _send_matrix() HTTP helper (~100ms). Only route through the heavy MatrixAdapter (full sync + E2EE setup) when media files are present. Adds test verifying text-only messages don't take the adapter path.	2026-04-15 17:37:43 -07:00
Teknium	276ed5c399	fix(send_message): deliver Matrix media via adapter Matrix media delivery was silently dropped by send_message because Matrix wasn't wired into the native adapter-backed media path. Only Telegram, Discord, and Weixin had native media support. Adds _send_matrix_via_adapter() which creates a MatrixAdapter instance, connects, sends text + media via the adapter's native upload methods (send_document, send_image_file, send_video, send_voice), then disconnects. Also fixes a stale URL-encoding assertion in test_send_message_missing_platforms that broke after PR #10151 added quote() to room IDs. Cherry-picked from PR #10486 by helix4u.	2026-04-15 17:37:43 -07:00
Brooklyn Nicholson	097702c8a7	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-15 19:11:07 -05:00
Brenner Spear	2fbdc2c8fa	feat(discord): add channel_prompts config Add native Discord channel_prompts support with parent forum fallback, ephemeral runtime injection, config migration updates, docs, and tests.	2026-04-15 16:31:28 -07:00
Brooklyn Nicholson	72aebfbb24	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-15 17:43:41 -05:00
Ruzzgar	de3f8bc6ce	fix terminal workdir validation for Windows paths	2026-04-15 15:06:51 -07:00
Brooklyn Nicholson	baa0de7649	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-15 16:35:01 -05:00
Teknium	305a702e09	fix: /browser connect CDP override now takes priority over Camofox (#10523 ) When a user runs /browser connect to attach browser tools to their real Chrome instance via CDP, the BROWSER_CDP_URL env var is set. However, every browser tool function checks _is_camofox_mode() first, which short-circuits to the Camofox backend before _get_session_info() ever checks for the CDP override. Fix: is_camofox_mode() now returns False when BROWSER_CDP_URL is set, so the explicit CDP connection takes priority. This is the correct behavior — /browser connect is an intentional user override. Reported by SkyLinx on Discord.	2026-04-15 14:11:18 -07:00
Teknium	824c33729d	fix(session_search): coerce limit to int to prevent TypeError with non-int values (#10522 ) Models (especially open-source like qwen3.5-plus) may send non-int values for the limit parameter — None (JSON null), string, or even a type object. This caused TypeError: '<=' not supported between instances of 'int' and 'type' when the value reached min()/comparison operations. Changes: - Add defensive int coercion at session_search() entry with fallback to 3 - Clamp limit to [1, 5] range (was only capped at 5, not floored) - Add tests for None, type object, string, negative, and zero limit values Reported by community user ludoSifu via Discord.	2026-04-15 14:11:05 -07:00
Teknium	a418ddbd8b	fix: add activity heartbeats to prevent false gateway inactivity timeouts (#10501 ) Multiple gaps in activity tracking could cause the gateway's inactivity timeout to fire while the agent is actively working: 1. Streaming wait loop had no periodic heartbeat — the outer thread only touched activity when the stale-stream detector fired (180-300s), and for local providers (Ollama) the stale timeout was infinity, meaning zero heartbeats. Now touches activity every 30s. 2. Concurrent tool execution never set the activity callback on worker threads (threading.local invisible across threads) and never set _current_tool. Workers now set the callback, and the concurrent wait uses a polling loop with 30s heartbeats. 3. Modal backend's execute() override had its own polling loop without any activity callback. Now matches _wait_for_process cadence (10s).	2026-04-15 13:29:05 -07:00
Brooklyn Nicholson	53a024a941	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-15 14:37:54 -05:00
etcircle	dee592a0b1	fix(gateway): route synthetic background events by session	2026-04-15 11:16:01 -07:00
Brooklyn Nicholson	371166fe26	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-15 10:21:00 -05:00
Teknium	47e6ea84bb	fix: file handle bug, warning text, and tests for Discord media send - Fix file handle closed before POST: nest session.post() inside the 'with open()' block so aiohttp can read the file during upload - Update warning text to include weixin (also supports media delivery) - Add 8 unit tests covering: text+media, media-only, missing files, upload failures, multiple files, and _send_to_platform routing	2026-04-15 04:16:06 -07:00
Teknium	e69526be79	fix(send_message): URL-encode Matrix room IDs and add Matrix to schema examples (#10151 ) Matrix room IDs contain ! and : which must be percent-encoded in URI path segments per the Matrix C-S spec. Without encoding, some homeservers reject the PUT request. Also adds 'matrix:!roomid:server.org' and 'matrix:@user:server.org' to the tool schema examples so models know the correct target format.	2026-04-15 00:10:59 -07:00
Teknium	180b14442f	test: add _parse_target_ref Matrix coverage for salvaged PR #6144	2026-04-15 00:08:14 -07:00
Brooklyn Nicholson	561cea0d4a	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-15 00:02:31 -05:00
Teknium	8548893d14	feat: entry-level Podman support — find_docker() + rootless entrypoint (#10066 ) - find_docker() now checks HERMES_DOCKER_BINARY env var first, then docker on PATH, then podman on PATH, then macOS known locations - Entrypoint respects HERMES_HOME env var (was hardcoded to /opt/data) - Entrypoint uses groupmod -o to tolerate non-unique GIDs (fixes macOS GID 20 conflict with Debian's dialout group) - Entrypoint makes chown best-effort so rootless Podman continues instead of failing with 'Operation not permitted' - 5 new tests covering env var override, podman fallback, precedence Based on work by alanjds (PR #3996) and malaiwah (PR #8115). Closes #4084.	2026-04-14 21:20:37 -07:00
Greer Guthrie	4b2a1a4337	fix(tools): auto-discover built-in tool modules	2026-04-14 21:12:29 -07:00
Brooklyn Nicholson	77cd5bf565	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-14 19:33:03 -05:00
Greer Guthrie	c10fea8d26	fix(mcp): make server aliases explicit	2026-04-14 17:19:20 -07:00
Greer Guthrie	cda64a5961	fix(mcp): resolve toolsets from live registry	2026-04-14 17:19:20 -07:00
adybag14-cyber	56c34ac4f7	fix(browser): add termux PATH fallbacks Refactor browser tool PATH construction to include Termux directories (/data/data/com.termux/files/usr/bin, /data/data/com.termux/files/usr/sbin) so agent-browser and npx are discoverable on Android/Termux. Extracts _browser_candidate_path_dirs() and _merge_browser_path() helpers to centralize PATH construction shared between _find_agent_browser() and _run_browser_command(), replacing duplicated inline logic. Also fixes os.pathsep usage (was hardcoded ':') for cross-platform correctness. Cherry-picked from PR #9846.	2026-04-14 16:55:55 -07:00
Brooklyn Nicholson	bf54f1fb2f	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-14 18:26:05 -05:00
Teknium	1525624904	fix: block agent from self-destructing gateway via terminal (#6666 ) Add dangerous command patterns that require approval when the agent tries to run gateway lifecycle commands via the terminal tool: - hermes gateway stop/restart — kills all running agents mid-work - hermes update — pulls code and restarts the gateway - systemctl restart/stop (with optional flags like --user) These patterns fire the approval prompt so the user must explicitly approve before the agent can kill its own gateway process. In YOLO mode, the commands run without approval (by design — YOLO means the user accepts all risks). Also fixes the existing systemctl pattern to handle flags between the command and action (e.g. 'systemctl --user restart' was previously undetected because the regex expected the action immediately after 'systemctl'). Root cause: issue #6666 reported agents running 'hermes gateway restart' via terminal, killing the gateway process mid-agent-loop. The user sees the agent suddenly stop responding with no explanation. Combined with the SIGTERM auto-recovery from PR #9875, the gateway now both prevents accidental self-destruction AND recovers if it happens anyway. Test plan: - Updated test_systemctl_restart_not_flagged → test_systemctl_restart_flagged - All 119 approval tests pass - E2E verified: hermes gateway restart, hermes update, systemctl --user restart all detected; hermes gateway status, systemctl status remain safe	2026-04-14 15:43:31 -07:00
Teknium	eed891f1bb	security: supply chain hardening — CI pinning, dep pinning, and code fixes (#9801 ) CI/CD Hardening: - Pin all 12 GitHub Actions to full commit SHAs (was mutable @vN tags) - Add explicit permissions: {contents: read} to 4 workflows - Pin CI pip installs to exact versions (pyyaml==6.0.2, httpx==0.28.1) - Extend supply-chain-audit.yml to scan workflow, Dockerfile, dependency manifest, and Actions version changes Dependency Pinning: - Pin git-based Python deps to commit SHAs (atroposlib, tinker, yc-bench) - Pin WhatsApp Baileys from mutable branch to commit SHA Tool Registry: - Reject tool name shadowing from different tool families (plugins/MCP cannot overwrite built-in tools). MCP-to-MCP overwrites still allowed. MCP Security: - Add tool description content scanning for prompt injection patterns - Log detailed change diff on dynamic tool refresh at WARNING level Skill Manager: - Fix dangerous verdict bug: agent-created skills with dangerous findings were silently allowed (ask->None->allow). Now blocked.	2026-04-14 14:23:37 -07:00
Dusk1e	420d27098f	fix(tools): keep memory tool available when fcntl is unavailable	2026-04-14 10:18:05 -07:00
Brooklyn Nicholson	9a3a2925ed	feat: scroll aware sticky prompt	2026-04-14 11:49:32 -05:00
Teknium	7ad47ace51	fix: resolve remaining 4 CI test failures (#9543 ) - test_auth_commands: suppress _seed_from_singletons auto-seeding that adds extra credentials from CI env (same pattern as nearby tests) - test_interrupt: clear stale _interrupted_threads set to prevent thread ident reuse from prior tests in same xdist worker - test_code_execution: add watch_patterns to _BLOCKED_TERMINAL_PARAMS to match production _TERMINAL_BLOCKED_PARAMS	2026-04-14 02:18:38 -07:00
Greer Guthrie	c7e2fe655a	fix: make tool registry reads thread-safe	2026-04-13 23:52:32 -07:00
helix4u	e08590888a	fix: honor interrupts during MCP tool waits	2026-04-13 22:14:55 -07:00
Brooklyn Nicholson	1b573b7b21	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-13 21:17:41 -05:00
Teknium	f324222b79	fix: add vLLM/local server error patterns + MCP initial connection retry (#9281 ) Port two improvements inspired by Kilo-Org/kilocode analysis: 1. Error classifier: add context overflow patterns for vLLM, Ollama, and llama.cpp/llama-server. These local inference servers return different error formats than cloud providers (e.g., 'exceeds the max_model_len', 'context length exceeded', 'slot context'). Without these patterns, context overflow errors from local servers are misclassified as format errors, causing infinite retries instead of triggering compression. 2. MCP initial connection retry: previously, if the very first connection attempt to an MCP server failed (e.g., transient DNS blip at startup), the server was permanently marked as failed with no retry. Post-connect reconnection had 5 retries with exponential backoff, but initial connection had zero. Now initial connections retry up to 3 times with backoff before giving up, matching the resilience of post-connect reconnection. (Inspired by Kilo Code's MCP server disappearing fix in v1.3.3) Tests: 6 new error classifier tests, 4 new MCP retry tests, 1 updated existing test. All 276 affected tests pass.	2026-04-13 18:46:14 -07:00
Brooklyn Nicholson	7e4dd6ea02	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-13 18:32:13 -05:00
Teknium	8d023e43ed	refactor: remove dead code — 1,784 lines across 77 files (#9180 ) Deep scan with vulture, pyflakes, and manual cross-referencing identified: - 41 dead functions/methods (zero callers in production) - 7 production-dead functions (only test callers, tests deleted) - 5 dead constants/variables - ~35 unused imports across agent/, hermes_cli/, tools/, gateway/ Categories of dead code removed: - Refactoring leftovers: _set_default_model, _setup_copilot_reasoning_selection, rebuild_lookups, clear_session_context, get_logs_dir, clear_session - Unused API surface: search_models_dev, get_pricing, skills_categories, get_read_files_summary, clear_read_tracker, menu_labels, get_spinner_list - Dead compatibility wrappers: schedule_cronjob, list_cronjobs, remove_cronjob - Stale debug helpers: get_debug_session_info copies in 4 tool files (centralized version in debug_helpers.py already exists) - Dead gateway methods: send_emote, send_notice (matrix), send_reaction (bluebubbles), _normalize_inbound_text (feishu), fetch_room_history (matrix), _start_typing_indicator (signal), parse_feishu_post_content - Dead constants: NOUS_API_BASE_URL, SKILLS_TOOL_DESCRIPTION, FILE_TOOLS, VALID_ASPECT_RATIOS, MEMORY_DIR - Unused UI code: _interactive_provider_selection, _interactive_model_selection (superseded by prompt_toolkit picker) Test suite verified: 609 tests covering affected files all pass. Tests for removed functions deleted. Tests using removed utilities (clear_read_tracker, MEMORY_DIR) updated to use internal APIs directly.	2026-04-13 16:32:04 -07:00
Teknium	0dd26c9495	fix(tests): fix 78 CI test failures and remove dead test (#9036 ) Production fixes: - voice_mode.py: add is_recording property to AudioRecorder (parity with TermuxAudioRecorder) - cronjob_tools.py: add sms example to deliver description Test fixes: - test_real_interrupt_subagent: add missing _execution_thread_id (fixes 19 cascading failures from leaked _build_system_prompt patch) - test_anthropic_error_handling: add _FakeMessages, override _interruptible_streaming_api_call (6 fixes) - test_ctx_halving_fix: add missing request_overrides attribute (4 fixes) - test_context_token_tracking: set _disable_streaming=True for non-streaming test path (4 fixes) - test_dict_tool_call_args: set _disable_streaming=True (1 fix) - test_provider_parity: add model='gpt-4o' for AIGateway tests to meet 64K minimum context (4 fixes) - test_session_race_guard: add user_id to SessionSource (5 fixes) - test_restart_drain/helpers: add user_id to SessionSource (2 fixes) - test_telegram_photo_interrupts: add user_id to SessionSource - test_interrupt: target thread_id for per-thread interrupt system (2 fixes) - test_zombie_process_cleanup: rewrite with object.__new__ for refactored GatewayRunner.stop() (1 fix) - test_browser_camofox_state: update config version 15->17 (1 fix) - test_trajectory_compressor_async: widen lookback window 10->20 for line-shifted AsyncOpenAI (1 fix) - test_voice_mode: fixed by production is_recording addition (5 fixes) - test_voice_cli_integration: add _attached_images to CLI stub (2 fixes) - test_hermes_logging: explicit propagation/level reset for cross-test pollution defense (1 fix) - test_run_agent: add base_url for OpenRouter detection tests (2 fixes) Deleted: - test_inline_think_blocks_reasoning_only_accepted: tested unimplemented inline <think> handling	2026-04-13 10:50:24 -07:00
konsisumer	311dac1971	fix(file_tools): block /private/etc writes on macOS symlink bypass On macOS, /etc is a symlink to /private/etc, so os.path.realpath() resolves /etc/hosts to /private/etc/hosts. The sensitive path check only matched /etc/ prefixes against the resolved path, allowing writes to system files on macOS. - Add /private/etc/ and /private/var/ to _SENSITIVE_PATH_PREFIXES - Check both realpath-resolved and normpath-normalized paths - Add regression tests for macOS symlink bypass Closes #8734 Co-authored-by: ElhamDevelopmentStudio (PR #8829)	2026-04-13 05:15:05 -07:00
Teknium	e3ffe5b75f	fix: remove legacy compression.summary_* config and env var fallbacks (#8992 ) Remove the backward-compat code paths that read compression provider/model settings from legacy config keys and env vars, which caused silent failures when auto-detection resolved to incompatible backends. What changed: - Remove compression.summary_model, summary_provider, summary_base_url from DEFAULT_CONFIG and cli.py defaults - Remove backward-compat block in _resolve_task_provider_model() that read from the legacy compression section - Remove _get_auxiliary_provider() and _get_auxiliary_env_override() helper functions (AUXILIARY_/CONTEXT_ env var readers) - Remove env var fallback chain for per-task overrides - Update hermes config show to read from auxiliary.compression - Add config migration (v16→17) that moves non-empty legacy values to auxiliary.compression and strips the old keys - Update example config and openclaw migration script - Remove/update tests for deleted code paths Compression model/provider is now configured exclusively via: auxiliary.compression.provider / auxiliary.compression.model Closes #8923	2026-04-13 04:59:26 -07:00
Teknium	acdff020b7	test: add multi-word query tests for truncation match strategy Tests phrase matching, proximity co-occurrence, and sliding window coverage maximisation — the three new tiers from the truncation fix.	2026-04-13 04:54:42 -07:00
Teknium	8dfee98d06	fix: clean up description escaping, add string-data tests Follow-up for cherry-picked PR #8918.	2026-04-13 04:45:07 -07:00
Dusk1e	c052cf0eea	fix(security): validate domain/service params in ha_call_service to prevent path traversal	2026-04-12 22:26:15 -07:00
Teknium	0d0d27d45e	test(tts): add speed config tests for Edge, OpenAI, and MiniMax 12 tests covering: - Provider-specific speed overrides global speed - Global speed used as fallback - Default (no speed) preserves existing behavior - Edge SSML rate string conversion (positive/negative) - OpenAI speed clamping to 0.25-4.0 range	2026-04-12 16:46:18 -07:00
Brooklyn Nicholson	2aea75e91e	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-12 13:18:55 -05:00
Teknium	f53a5a7fe1	fix: suppress duplicate completion notifications when agent already consumed output via wait/poll/log (#8228 ) When the agent calls process(action='wait') or process(action='poll') and gets the exited status, the completion_queue notification is redundant — the agent already has the output from the tool return. Previously, the drain loops in CLI and gateway would still inject the [SYSTEM: Background process completed] message, causing the agent to receive the same information twice. Fix: track session IDs in _completion_consumed set when wait/poll/log returns an exited process. Drain loops in cli.py and gateway watcher skip completion events for consumed sessions. Watch pattern events are never suppressed (they have independent semantics). Adds 4 tests covering wait/poll/log marking and running-process negative case.	2026-04-12 00:36:22 -07:00
Siddharth Balyan	27eeea0555	perf(ssh,modal): bulk file sync via tar pipe and tar/base64 archive (#8014 ) * perf(ssh,modal): bulk file sync via tar pipe and tar/base64 archive SSH: symlink-staging + tar -ch piped over SSH in a single TCP stream. Eliminates per-file scp round-trips. Handles timeout (kills both processes), SSH Popen failure (kills tar), and tar create failure. Modal: in-memory gzipped tar archive, base64-encoded, decoded+extracted in one exec call. Checks exit code and raises on failure. Both backends use shared helpers extracted into file_sync.py: - quoted_mkdir_command() — mirrors existing quoted_rm_command() - unique_parent_dirs() — deduplicates parent dirs from file pairs Migrates _ensure_remote_dirs to use the new helpers. 28 new tests (21 SSH + 7 Modal), all passing. Closes #7465 Closes #7467 * fix(modal): pipe stdin to avoid ARG_MAX, clean up review findings - Modal bulk upload: stream base64 payload through proc.stdin in 1MB chunks instead of embedding in command string (Modal SDK enforces 64KB ARG_MAX_BYTES — typical payloads are ~4.3MB) - Modal single-file upload: same stdin fix, add exit code checking - Remove what-narrating comments in ssh.py and modal.py (keep WHY comments: symlink staging rationale, SIGPIPE, deadlock avoidance) - Remove unnecessary `sandbox = self._sandbox` alias in modal bulk - Daytona: use shared helpers (unique_parent_dirs, quoted_mkdir_command) instead of inlined duplicates --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-04-12 06:18:05 +05:30
Teknium	14ccd32cee	refactor(terminal): remove check_interval parameter (#8001 ) The check_interval parameter on terminal_tool sent periodic output updates to the gateway chat, but these were display-only — the agent couldn't see or act on them. This added schema bloat and introduced a bug where notify_on_complete=True was silently dropped when check_interval was also set (the not-check_interval guard skipped fast-watcher registration, and the check_interval watcher dict was missing the notify_on_complete key). Removing check_interval entirely: - Eliminates the notify_on_complete interaction bug - Reduces tool schema size (one fewer parameter for the model) - Simplifies the watcher registration path - notify_on_complete (agent wake-on-completion) still works - watch_patterns (output alerting) still works - process(action='poll') covers manual status checking Closes #7947 (root cause eliminated rather than patched).	2026-04-11 17:16:11 -07:00
WAXLYY	6d272ba477	fix(tools): enforce ID uniqueness in TODO store during replace operations Deduplicate todo items by ID before writing to the store, keeping the last occurrence. Prevents ghost entries when the model sends duplicate IDs in a single write() call, which corrupts subsequent merge operations. Co-authored-by: WAXLYY <WAXLYY@users.noreply.github.com>	2026-04-11 16:22:50 -07:00
asheriif	97b0cd51ee	feat(gateway): surface natural mid-turn assistant messages in chat platforms Add display.interim_assistant_messages config (enabled by default) that forwards completed assistant commentary between tool calls to the user as separate chat messages. Models already emit useful status text like 'I'll inspect the repo first.' — this surfaces it on Telegram, Discord, and other messaging platforms instead of swallowing it. Independent from tool_progress and gateway streaming. Disabled for webhooks. Uses GatewayStreamConsumer when available, falls back to direct adapter send. Tracks response_previewed to prevent double-delivery when interim message matches the final response. Also fixes: cursor not stripped from fallback prefix in stream consumer (affected continuation calculation on no-edit platforms like Signal). Cherry-picked from PR #7885 by asheriif, default changed to enabled. Fixes #5016	2026-04-11 16:21:39 -07:00
Brooklyn Nicholson	ec553fdb49	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-11 17:15:41 -05:00
faishal	90352b2adf	fix: normalize checkpoint manager home-relative paths Adds _normalize_path() helper that calls expanduser().resolve() to properly handle tilde paths (e.g. ~/.hermes, ~/.config). Previously Path.resolve() alone treated ~ as a literal directory name, producing invalid paths like /root/~/.hermes. Also improves _run_git() error handling to distinguish missing working directories from missing git executable, and adds pre-flight directory validation. Cherry-picked from PR #7898 by faishal882. Fixes #7807	2026-04-11 14:50:44 -07:00
Teknium	f2893fe51a	fix(tools): neutralize shell injection in _write_to_sandbox via path quoting (#7940 ) _write_to_sandbox interpolated storage_dir and remote_path directly into a shell command passed to env.execute(). Paths containing shell metacharacters (spaces, semicolons, $(), backticks) could trigger arbitrary command execution inside the sandbox. Fix: wrap both paths with shlex.quote(). Clean paths (alphanumeric + slashes/hyphens/dots) are left unmodified by shlex.quote, so existing behavior is unchanged. Paths with unsafe characters get single-quoted. Tests added for spaces, $(command) substitution, and semicolon injection.	2026-04-11 14:26:11 -07:00
Dusk1e	255f59de18	fix(tools): prevent command argument injection and path traversal in checkpoint manager This commit addresses a security vulnerability where unsanitized user inputs for commit_hash and file_path were passed directly to git commands in CheckpointManager.restore() and diff(). It validates commit hashes to be strictly hexadecimal characters without leading dashes (preventing flag injection like '--patch') and enforces file paths to stay within the working directory via root resolution. Regression tests test_restore_rejects_argument_injection, test_restore_rejects_invalid_hex_chars, and test_restore_rejects_path_traversal were added.	2026-04-11 14:25:57 -07:00
Teknium	dfc820345d	fix: scope tool interrupt signal per-thread to prevent cross-session leaks (#7930 ) The interrupt mechanism in tools/interrupt.py used a process-global threading.Event. In the gateway, multiple agents run concurrently in the same process via run_in_executor. When any agent was interrupted (user sends a follow-up message), the global flag killed ALL agents' running tools — terminal commands, browser ops, web requests — across all sessions. Changes: - tools/interrupt.py: Replace single threading.Event with a set of interrupted thread IDs. set_interrupt() targets a specific thread; is_interrupted() checks the current thread. Includes a backward- compatible _ThreadAwareEventProxy for legacy _interrupt_event usage. - run_agent.py: Store execution thread ID at start of run_conversation(). interrupt() and clear_interrupt() pass it to set_interrupt() so only this agent's thread is affected. - tools/code_execution_tool.py: Use is_interrupted() instead of directly checking _interrupt_event.is_set(). - tools/process_registry.py: Same — use is_interrupted(). - tests: Update interrupt tests for per-thread semantics. Add new TestPerThreadInterruptIsolation with two tests verifying cross-thread isolation.	2026-04-11 14:02:58 -07:00
Teknium	75380de430	fix: reap orphaned browser sessions on startup (#7931 ) When a Python process exits uncleanly (SIGKILL, crash, gateway restart via hermes update), in-memory _active_sessions tracking is lost but the agent-browser node daemons and their Chromium child processes keep running indefinitely. On a long-running system this causes unbounded memory growth — 24 orphaned sessions consumed 7.6 GB on a production machine over 9 days. Add _reap_orphaned_browser_sessions() which scans the tmp directory for agent-browser-{h_,cdp_} socket dirs on cleanup thread startup. For each dir not tracked by the current process, reads the daemon PID file and sends SIGTERM if the daemon is still alive. Handles edge cases: dead PIDs, corrupt PID files, permission errors, foreign processes. The reaper runs once on thread startup (not every 30s) to avoid races with sessions being actively created by concurrent agents.	2026-04-11 14:02:46 -07:00
Teknium	04c1c5d53f	refactor: extract shared helpers to deduplicate repeated code patterns (#7917 ) * refactor: add shared helper modules for code deduplication New modules: - gateway/platforms/helpers.py: MessageDeduplicator, TextBatchAggregator, strip_markdown, ThreadParticipationTracker, redact_phone - hermes_cli/cli_output.py: print_info/success/warning/error, prompt helpers - tools/path_security.py: validate_within_dir, has_traversal_component - utils.py additions: safe_json_loads, read_json_file, read_jsonl, append_jsonl, env_str/lower/int/bool helpers - hermes_constants.py additions: get_config_path, get_skills_dir, get_logs_dir, get_env_path * refactor: migrate gateway adapters to shared helpers - MessageDeduplicator: discord, slack, dingtalk, wecom, weixin, mattermost - strip_markdown: bluebubbles, feishu, sms - redact_phone: sms, signal - ThreadParticipationTracker: discord, matrix - _acquire/_release_platform_lock: telegram, discord, slack, whatsapp, signal, weixin Net -316 lines across 19 files. * refactor: migrate CLI modules to shared helpers - tools_config.py: use cli_output print/prompt + curses_radiolist (-117 lines) - setup.py: use cli_output print helpers + curses_radiolist (-101 lines) - mcp_config.py: use cli_output prompt (-15 lines) - memory_setup.py: use curses_radiolist (-86 lines) Net -263 lines across 5 files. * refactor: migrate to shared utility helpers - safe_json_loads: agent/display.py (4 sites) - get_config_path: skill_utils.py, hermes_logging.py, hermes_time.py - get_skills_dir: skill_utils.py, prompt_builder.py - Token estimation dedup: skills_tool.py imports from model_metadata - Path security: skills_tool, cronjob_tools, skill_manager_tool, credential_files - Non-atomic YAML writes: doctor.py, config.py now use atomic_yaml_write - Platform dict: new platforms.py, skills_config + tools_config derive from it - Anthropic key: new get_anthropic_key() in auth.py, used by doctor/status/config/main * test: update tests for shared helper migrations - test_dingtalk: use _dedup.is_duplicate() instead of _is_duplicate() - test_mattermost: use _dedup instead of _seen_posts/_prune_seen - test_signal: import redact_phone from helpers instead of signal - test_discord_connect: _platform_lock_identity instead of _token_lock_identity - test_telegram_conflict: updated lock error message format - test_skill_manager_tool: 'escapes' instead of 'boundary' in error msgs	2026-04-11 13:59:52 -07:00
Teknium	cac6178104	fix(gateway): propagate user identity through process watcher pipeline Background process watchers (notify_on_complete, check_interval) created synthetic SessionSource objects without user_id/user_name. While the internal=True bypass (`1d8d4f28`) prevented false pairing for agent- generated notifications, the missing identity caused: - Garbage entries in pairing rate limiters (discord:None, telegram:None) - 'User None' in approval messages and logs - No user identity available for future code paths that need it Additionally, platform messages arriving without from_user (Telegram service messages, channel forwards, anonymous admin actions) could still trigger false pairing because they are not internal events. Fix: 1. Propagate user_id/user_name through the full watcher chain: session_context.py → gateway/run.py → terminal_tool.py → process_registry.py (including checkpoint persistence/recovery) 2. Add None user_id guard in _handle_message() — silently drop non-internal messages with no user identity instead of triggering the pairing flow. Salvaged from PRs #7664 (kagura-agent, ContextVar approach), #6540 (MestreY0d4-Uninter, tests), and #7709 (guang384, None guard). Closes #6341, #6485, #7643 Relates to #6516, #7392	2026-04-11 13:46:16 -07:00
Brooklyn Nicholson	9ccb490cf3	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-11 15:30:23 -05:00
0xbyt4	3ec8809b78	fix(vision): preserve aspect ratio during auto-resize Independent halving of width and height caused aspect ratio distortion for extreme dimensions (e.g. 8000x200 panoramas). When one axis hit the 64px floor, the other kept shrinking — collapsing the ratio toward 1:1. Use proportional scaling instead: when either dimension hits the floor, derive the effective scale factor and apply it to both axes. Add tests for extreme panorama (8000x200) and tall narrow (200x6000) images to verify aspect ratio preservation.	2026-04-11 11:53:04 -07:00
Brooklyn Nicholson	bf6af95ff5	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor	2026-04-11 13:14:36 -05:00
Brooklyn Nicholson	3fd5cf6e3c	feat: fix img pasting in new ink plus newline after tools	2026-04-11 13:14:32 -05:00
kshitijk4poor	50bb4fe010	fix(vision): auto-resize oversized images, increase default timeout, fix vision capability detection Cherry-picked from PR #7749 by kshitijk4poor with modifications: - Raise hard image limit from 5 MB to 20 MB (matches most restrictive provider) - Send images at full resolution first; only auto-resize to 5 MB on API failure - Add _is_image_size_error() helper to detect size-related API rejections - Auto-resize uses Pillow (soft dep) with progressive downscale + JPEG quality reduction - Fix get_model_capabilities() to check modalities.input for vision support - Increase default vision timeout from 30s to 120s (matches hardcoded fallback intent) - Applied retry-with-resize to both vision_analyze_tool and browser_vision Closes #7740	2026-04-11 11:12:50 -07:00
Brooklyn Nicholson	b04248f4d5	Merge branch 'main' of github.com:NousResearch/hermes-agent into feat/ink-refactor # Conflicts: # gateway/platforms/base.py # gateway/run.py # tests/gateway/test_command_bypass_active_session.py	2026-04-11 11:39:47 -05:00
Teknium	f459214010	feat: background process monitoring — watch_patterns for real-time output alerts * feat: add watch_patterns to background processes for output monitoring Adds a new 'watch_patterns' parameter to terminal(background=true) that lets the agent specify strings to watch for in process output. When a matching line appears, a notification is queued and injected as a synthetic message — triggering a new agent turn, similar to notify_on_complete but mid-process. Implementation: - ProcessSession gets watch_patterns field + rate-limit state - _check_watch_patterns() in ProcessRegistry scans new output chunks from all three reader threads (local, PTY, env-poller) - Rate limited: max 8 notifications per 10s window - Sustained overload (45s) permanently disables watching for that process - watch_queue alongside completion_queue, same consumption pattern - CLI drains watch_queue in both idle loop and post-turn drain - Gateway drains after agent runs via _inject_watch_notification() - Checkpoint persistence + crash recovery includes watch_patterns - Blocked in execute_code sandbox (like other bg params) - 20 new tests covering matching, rate limiting, overload kill, checkpoint persistence, schema, and handler passthrough Usage: terminal( command='npm run dev', background=true, watch_patterns=['ERROR', 'WARN', 'listening on port'] ) * refactor: merge watch_queue into completion_queue Unified queue with 'type' field distinguishing 'completion', 'watch_match', and 'watch_disabled' events. Extracted _format_process_notification() in CLI and gateway to handle all event types in a single drain loop. Removes duplication across both CLI drain sites and the gateway.	2026-04-11 03:13:23 -07:00
luyao618	50ad66aee6	test(tools): add unit tests for budget_config module Cover default constants, BudgetConfig defaults, frozen immutability, custom construction, and the resolve_threshold() priority chain (pinned > tool_overrides > registry > default). 20 tests total. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 02:58:48 -07:00
luyao618	80d82c2f5c	test(tools): add unit tests for tool_backend_helpers module Cover all public functions with 50 test cases: - managed_nous_tools_enabled() feature flag toggling - normalize_browser_cloud_provider() coercion and defaults - coerce_modal_mode() / normalize_modal_mode() validation - has_direct_modal_credentials() env vars and config file detection - resolve_modal_backend_state() full backend selection matrix - resolve_openai_audio_api_key() priority chain and edge cases Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 02:58:48 -07:00
Teknium	7241e6134b	fix: remove stale test (missing pop_pending), add headers to FakeResponse Follow-up fixes for cherry-pick conflicts: - Removed test_context_keeps_pending_approval test that referenced pop_pending() which doesn't exist on current main - Added headers attribute to FakeResponse in vision test (needed after #6949 added Content-Length check)	2026-04-11 02:03:20 -07:00
Kenny Xie	ae9a713a0a	test(approval): clear leaked bypass state	2026-04-11 02:03:20 -07:00
Kenny Xie	086d92a0e0	test(tools): isolate approval and audio gateway env	2026-04-11 02:03:20 -07:00
Tranquil-Flow	4e56eacdce	fix(vision): reject oversized images before API call, handle file:// URIs, improve 400 errors Three fixes for vision_analyze returning cryptic 400 "Invalid request data": 1. Pre-flight base64 size check — base64 inflates data ~33%, so a 3.8 MB file exceeds the 5 MB API limit. Reject early with a clear message instead of letting the provider return a generic 400. 2. Handle file:// URIs — strip the scheme and resolve as a local path. Previously file:///path/to/image.png fell through to the "invalid image source" error since it matched neither is_file() nor http(s). 3. Separate invalid_request errors from "does not support vision" errors so the user gets actionable guidance (resize/compress/retry) instead of a misleading "model does not support vision" message. Closes #6677	2026-04-11 02:03:20 -07:00
jjovalle99	640441b865	feat(tools): add Voxtral TTS provider (Mistral AI)	2026-04-11 01:56:55 -07:00
konsisumer	b87e0f59cc	fix(skills): read name from SKILL.md frontmatter in skills_sync _discover_bundled_skills() used the directory name to identify skills, but skills_tool.py and skills_hub.py use the `name:` field from SKILL.md frontmatter. This mismatch caused 9 builtin skills whose directory name differs from their SKILL.md name to be written to .bundled_manifest under the wrong key, so `hermes skills list` showed them as "local" instead of "builtin". Read the frontmatter name field (with directory-name fallback) so the manifest keys match what the rest of the codebase expects. Closes #6835	2026-04-11 01:21:20 -07:00
luyao618	fc06a0147e	fix(tools): remove dead code in _is_likely_binary and harden _check_lint against brace paths - Remove unreachable `if not content_sample` branch inside the truthy `if content_sample` block in `_is_likely_binary()` (dead code that could never execute). - Replace `linter_cmd.format(file=...)` with `linter_cmd.replace("{file}", ...)` in `_check_lint()` so file paths containing curly braces (e.g. `src/{test}.py`) no longer raise KeyError/ValueError. - Add 16 unit tests covering both fixes and edge cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-10 21:16:53 -07:00
hermes-agent-dhabibi	718e8ad6fa	feat(delegation): add configurable reasoning_effort for subagents Add delegation.reasoning_effort config key so subagents can run at a different thinking level than the parent agent. When set, overrides the parent's reasoning_config; when empty, inherits as before. Valid values: xhigh, high, medium, low, minimal, none (disables thinking). Config path: delegation.reasoning_effort in config.yaml Files changed: - tools/delegate_tool.py: resolve override in _build_child_agent - hermes_cli/config.py: add reasoning_effort to DEFAULT_CONFIG - tests/tools/test_delegate.py: 4 new tests covering all cases	2026-04-10 21:16:53 -07:00
Teknium	a8fd7257b1	feat(gateway): WSL-aware gateway with smart systemd detection (#7510 ) - Add shared is_wsl() to hermes_constants (like is_termux) - Update supports_systemd_services() to verify systemd is actually running on WSL before returning True - Add WSL-specific guidance in gateway install/start/setup/status for both cases: WSL+systemd and WSL without systemd - Improve help strings: 'run' now says recommended for WSL/Docker, 'start'/'install' now mention systemd/launchd explicitly - Add WSL gateway FAQ section with tmux/nohup/Task Scheduler tips - Update CLI commands docs with WSL tip - Deduplicate _is_wsl() from clipboard.py to shared hermes_constants - Fix clipboard tests to reset hermes_constants cache - 20 new WSL-specific tests covering detection, systemd check, supports_systemd_services integration, and command output Motivated by user feedback: took 1 hour to figure out run vs start on WSL, Telegram bot kept disconnecting due to flaky WSL systemd.	2026-04-10 21:15:47 -07:00
Hermes Agent	97bb64dbbf	test(file_sync): add tests for bulk_upload_fn callback Cover the three key behaviors: - bulk_upload_fn is called instead of per-file upload_fn - Fallback to upload_fn when bulk_upload_fn is None - Rollback on bulk upload failure retries all files	2026-04-10 21:14:32 -07:00
pefontana	8414f41856	test: add zombie process cleanup tests Add 9 tests covering the full zombie process prevention chain: - TestZombieReproduction: demonstrates that processes survive when references are dropped without explicit cleanup (the original bug) - TestAgentCloseMethod: verifies close() calls all cleanup functions, is idempotent, propagates to children, and continues cleanup even when individual steps fail - TestGatewayCleanupWiring: verifies stop() calls close() and that _evict_cached_agent() does NOT call close() (since it's also used for non-destructive cache refreshes) - TestDelegationCleanup: calls the real _run_single_child function and verifies close() is called on the child agent Ref: #7131	2026-04-10 16:51:44 -07:00
KUSH42	0e939af7c2	fix(patch): harden V4A patch parser and fuzzy match — 9 correctness bugs - Bug 1: replace read_file(limit=10000) with read_file_raw in _apply_update, preventing silent truncation of files >2000 lines and corruption of lines >2000 chars; add read_file_raw to FileOperations abstract interface and ShellFileOperations - Bug 2: split apply_v4a_operations into validate-then-apply phases; if any hunk fails validation, zero writes occur (was: continue after failure, leaving filesystem partially modified) - Bug 3: parse_v4a_patch now returns an error for begin-marker-with-no-ops, empty file paths, and moves missing a destination (was: always returned error=None) - Bug 4: raise strategy 7 (block anchor) single-candidate similarity threshold from 0.10 to 0.50, eliminating false-positive matches in repetitive code - Bug 5: add _strategy_unicode_normalized (new strategy 7) with position mapping via _build_orig_to_norm_map; smart quotes and em-dashes in LLM-generated patches now match via strategies 1-6 before falling through to fuzzy strategies - Bug 6: extend fuzzy_find_and_replace to return 4-tuple (content, count, error, strategy); update all 5 call sites across patch_parser.py, file_operations.py, and skill_manager_tool.py - Bug 7: guard in _apply_update returns error when addition-only context hint is ambiguous (>1 occurrences); validation phase errors on both 0 and >1 - Bug 8: _apply_delete returns error (not silent success) on missing file - Bug 9: _validate_operations checks source existence and destination absence for MOVE operations before any write occurs	2026-04-10 16:47:44 -07:00
Awsh1	6f63ba9c8f	fix(mcp): fall back when SIGKILL is unavailable	2026-04-10 16:47:44 -07:00
Teknium	363d5d57be	test: update schema assertion after maxItems removal	2026-04-10 13:38:14 -07:00
angelos	7ccdb74364	fix(delegate): make max_concurrent_children configurable + error on excess `delegate_task` silently truncated batch tasks to 3 — the model sends 5 tasks, gets results for 3, never told 2 were dropped. Now returns a clear tool_error explaining the limit and how to fix it. The limit is configurable via: - delegation.max_concurrent_children in config.yaml (priority 1) - DELEGATION_MAX_CONCURRENT_CHILDREN env var (priority 2) - default: 3 Uses the same _load_config() path as the rest of delegate_task for consistent config priority. Clamps to min 1, warns on non-integer config values. Also removes the hardcoded maxItems: 3 from the JSON schema — the schema was blocking the model from even attempting >3 tasks before the runtime check could fire. The runtime check gives a much more actionable error message. Backwards compatible: default remains 3, existing configs unchanged.	2026-04-10 13:38:14 -07:00
kshitijk4poor	37a1c75716	fix(browser): hardening — dead code, caching, scroll perf, security, thread safety Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools and unrelated scope additions from the contributor's commit). - Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema - Fix _camofox_eval wrong call signatures (_ensure_tab, _post args) - Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs - Replace 5x subprocess scroll loop with single pixel-arg call - URL-decode before secret exfiltration check (bypass prevention) - Protect _recording_sessions with _cleanup_lock (thread safety) - Return failure on empty stdout instead of silent success - Structure-aware _truncate_snapshot (cut at line boundaries) Follow-up improvements over contributor's original: - Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation) - Fix list+tuple concat in _run_browser_command PATH construction - Update test_browser_homebrew_paths.py for tuple returns and cache fixtures Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Closes #7168, closes #7171, closes #7172, closes #7173	2026-04-10 13:05:44 -07:00
Teknium	a093eb47f7	fix: propagate child activity to parent during delegate_task (#7295 ) When delegate_task runs, the parent agent's activity tracker freezes because child.run_conversation() blocks and the child's own _touch_activity() never propagates back to the parent. The gateway inactivity timeout then fires a spurious 'No activity' warning and eventually kills the agent, even though the subagent is actively working. Fix: add a heartbeat thread in _run_single_child that calls parent._touch_activity() every 30 seconds with detail from the child's activity summary (current tool, iteration count). The thread is a daemon that starts before child.run_conversation() and is cleaned up in the finally block. This also improves the gateway 'Still working...' status messages — instead of just 'running: delegate_task', users now see what the subagent is actually doing (e.g., 'delegate_task: subagent running terminal (iteration 5/50)').	2026-04-10 12:51:30 -07:00
win4r	aedf6c7964	security(approval): close 4 pattern gaps found by source-grounded audit Four gaps in DANGEROUS_PATTERNS found by running 10 targeted tests that each mapped to a specific pattern in approval.py and checked whether the documented defense actually held. 1. Heredoc script injection — `python3 << 'EOF'` bypasses the existing `-e`/`-c` flag pattern. Adds pattern for interpreter + `<<` covering python{2,3}, perl, ruby, node. 2. PID expansion self-termination — `kill -9 $(pgrep hermes)` is opaque to the existing `pkill\|killall` + name pattern because command substitution is not expanded at detection time. Adds structural patterns matching `kill` + `$(pgrep` and backtick variants. 3. Git destructive operations — `git reset --hard`, `push --force`, `push -f`, `clean -f`, and `branch -D` were entirely absent. Note: `branch -d` also triggers because IGNORECASE is global — acceptable since -d is still a delete, just a safe one, and the prompt is only a confirmation, not a hard block. 4. chmod +x then execute* — two-step social engineering where a script containing dangerous commands is first written to disk (not checked by write_file), then made executable and run as `./script`. Pattern catches `chmod +x ... [;&\|]+ ./` combos. Does not solve the deeper architectural issue (write_file not checking content) — that is called out in the PR description as a known limitation. Tests: 23 new cases across 4 test classes, all in test_approval.py: - TestHeredocScriptExecution (7 cases, incl. regressions for -c) - TestPgrepKillExpansion (5 cases, incl. safe kill PID negative) - TestGitDestructiveOps (8 cases, incl. safe git status/push negatives) - TestChmodExecuteCombo (3 cases, incl. safe chmod-only negative) Full suite: 146 passed, 0 failed.	2026-04-10 05:19:21 -07:00
Dusk1e	e683c9db90	fix(security): enforce path boundary checks in skill manager operations	2026-04-10 05:19:21 -07:00
Teknium	c8e4dcf412	fix: prevent duplicate completion notifications on process kill (#7124 ) When kill_process() sends SIGTERM, both it and the reader thread race to call _move_to_finished() — kill_process sets exit_code=-15 and enqueues a notification, then the reader thread's process.wait() returns with exit_code=143 (128+SIGTERM) and enqueues a second one. Fix: make _move_to_finished() idempotent by tracking whether the session was actually removed from _running. The second call sees it was already moved and skips the completion_queue.put(). Adds regression test: test_move_to_finished_idempotent_no_duplicate	2026-04-10 03:52:16 -07:00
Teknium	957485876b	fix: update 6 test files broken by dead code removal - test_percentage_clamp.py: remove TestContextCompressorUsagePercent class and test_context_compressor_clamped (tested removed get_status() method) - test_credential_pool.py: remove test_mark_used_increments_request_count (tested removed mark_used()), replace active_lease_count() calls with direct _active_leases dict access, remove mark_used from thread test - test_session.py: replace SessionSource.local_cli() factory calls with direct SessionSource construction (local_cli classmethod removed) - test_error_classifier.py: remove test_is_transient_property (tested removed is_transient property on ClassifiedError) - test_delivery.py: remove TestDeliveryRouter class (tested removed resolve_targets method), clean up unused imports - test_skills_hub.py: remove test_is_hub_installed (tested removed is_hub_installed method on HubLockFile)	2026-04-10 03:44:43 -07:00
alt-glitch	cff9b7ffab	fix: restore 6 tests that tested live code but used deleted helpers	2026-04-10 03:44:43 -07:00
alt-glitch	96c060018a	fix: remove 115 verified dead code symbols across 46 production files Automated dead code audit using vulture + coverage.py + ast-grep intersection, confirmed by Opus deep verification pass. Every symbol verified to have zero production callers (test imports excluded from reachability analysis). Removes ~1,534 lines of dead production code across 46 files and ~1,382 lines of stale test code. 3 entire files deleted (agent/builtin_memory_provider.py, hermes_cli/checklist.py, tests/hermes_cli/test_setup_model_selection.py). Co-authored-by: alt-glitch <balyan.sid@gmail.com>	2026-04-10 03:44:43 -07:00
Teknium	04baab5422	fix(mcp): combine content and structuredContent when both present (#7118 ) When an MCP server returns both content (model-oriented text) and structuredContent (machine-oriented JSON), the client now combines them instead of discarding content. The text content becomes the primary result (what the agent reads), and structuredContent is included as supplementary metadata. Previously, structuredContent took full precedence — causing data loss for servers like Desktop Commander that put the actual file text in content and metadata in structuredContent. MCP spec guidance: for conversational/agent UX, prefer content.	2026-04-10 03:44:35 -07:00
tars	9a0dfb5a6d	fix(gateway): scope /yolo to the active session	2026-04-10 03:38:44 -07:00
Teknium	0f597dd127	fix: STT provider-model mismatch — whisper-1 fed to faster-whisper (#7113 ) Legacy flat stt.model config key (from cli-config.yaml.example and older versions) was passed as a model override to transcribe_audio() by the gateway, bypassing provider-specific model resolution. When the provider was 'local' (faster-whisper), this caused: ValueError: Invalid model size 'whisper-1' Changes: - gateway/run.py, discord.py: stop passing model override — let transcribe_audio() handle provider-specific model resolution internally - get_stt_model_from_config(): now provider-aware, reads from the correct nested section (stt.local.model, stt.openai.model, etc.); ignores legacy flat key for local provider to prevent model name mismatch - cli-config.yaml.example: updated STT section to show nested provider config structure instead of legacy flat key - config migration v13→v14: moves legacy stt.model to the correct provider section and removes the flat key Reported by community user on Discord.	2026-04-10 03:27:30 -07:00
maxyangcn	19292eb8bf	feat(cron): support Discord thread_id in deliver targets Add Discord thread support to cron delivery and send_message_tool. - _parse_target_ref: handle discord platform with chat_id:thread_id format - _send_discord: add thread_id param, route to /channels/{thread_id}/messages - _send_to_platform: pass thread_id through for Discord - Discord adapter send(): read thread_id from metadata for gateway path - Update tool schema description to document Discord thread targets Cherry-picked from PR #7046 by pandacooming (maxyangcn). Follow-up fixes: - Restore proxy support (resolve_proxy_url/proxy_kwargs_for_aiohttp) that was accidentally deleted — would have caused NameError at runtime - Remove duplicate _DISCORD_TARGET_RE regex; reuse existing _TELEGRAM_TOPIC_TARGET_RE via _NUMERIC_TOPIC_RE alias (identical pattern) - Fix misleading test comments about Discord negative snowflake IDs (Discord uses positive snowflakes; negative IDs are a Telegram convention) - Rewrite misleading scheduler test that claimed to exercise home channel fallback but actually tested the explicit platform:chat_id parsing path	2026-04-10 03:20:05 -07:00
alt-glitch	aad40f6d0c	fix(tests): update mocks for file sync changes - Modal snapshot tests: accept **kw in iter_skills_files/iter_cache_files mock lambdas to match new container_base kwarg - SSH preflight test: mock _detect_remote_home, _ensure_remote_dirs, init_session, and FileSyncManager added in file sync PR	2026-04-10 03:01:46 -07:00
alt-glitch	41c233cb99	test: add reproducible perf benchmark for file sync overhead Direct env.execute() timing — no LLM in the loop. Measures per-command wall-clock including sync check. Results on SSH: - echo median: 617ms (pure SSH round-trip + spawn overhead) - sync-triggered after 6s wait: 621ms (mtime skip adds ~0ms) - within-interval (no sync): 618ms Confirms mtime skip makes sync overhead unmeasurable.	2026-04-10 03:01:46 -07:00
alt-glitch	1f1f297528	feat(environments): unified file sync with change tracking and deletion Replace per-backend ad-hoc file sync with a shared FileSyncManager that handles mtime-based change detection, remote deletion of locally-removed files, and transactional state updates. - New FileSyncManager class (tools/environments/file_sync.py) with callbacks for upload/delete, rate limiting, and rollback - Shared iter_sync_files() eliminates 3 duplicate implementations - SSH: replace unconditional rsync with scp + mtime skip - Modal/Daytona: replace inline _synced_files dict with manager - All 3 backends now sync credentials + skills + cache uniformly - Remote deletion: files removed locally are cleaned from remote - HERMES_FORCE_FILE_SYNC=1 env var for debugging - Base class _before_execute() simplified to empty hook - 12 unit tests covering mtime skip, deletion, rollback, rate limiting	2026-04-10 03:01:46 -07:00
Teknium	a420235b66	fix: reject foreground timeout above cap instead of clamping Change behavior from silent clamping to returning an error when the model requests a foreground timeout exceeding FOREGROUND_MAX_TIMEOUT. This forces the model to use background=true for long-running commands rather than silently changing its intent. - Config default timeouts above the cap are NOT rejected (user's choice) - Only explicit model-requested timeouts trigger rejection - Added boundary test for timeout exactly at the limit	2026-04-10 02:58:54 -07:00
kshitijk4poor	6c3565df57	fix(terminal): cap foreground timeout to prevent session deadlocks When the model calls terminal() in foreground mode without background=true (e.g. to start a server), the tool call blocks until the command exits or the timeout expires. Without an upper bound the model can request arbitrarily high timeouts (the schema had minimum=1 but no maximum), blocking the entire agent session for hours until the gateway idle watchdog kills it. Changes: - Add FOREGROUND_MAX_TIMEOUT (600s, configurable via TERMINAL_MAX_FOREGROUND_TIMEOUT env var) that caps foreground timeout - Clamp effective_timeout to the cap when background=false and timeout exceeds the limit - Include a timeout_note in the tool result when clamped, nudging the model to use background=true for long-running processes - Update schema description to show the max timeout value - Remove dead clamping code in the background branch that could never fire (max_timeout was set to effective_timeout, so timeout > max_timeout was always false) - Add 7 tests covering clamping, no-clamping, config-default-exceeds-cap edge case, background bypass, default timeout, constant value, and schema content Self-review fixes: - Fixed bug where timeout_note said 'Requested timeout Nones' when clamping fired from config default exceeding cap (timeout param is None). Now uses unclamped_timeout instead of the raw timeout param. - Removed unused pytest import from test file - Extracted test config dict into _make_env_config() helper - Fixed tautological test_default_value assertion - Added missing test for config default > cap with no model timeout	2026-04-10 02:58:54 -07:00
Young	940237c6fd	fix(cli): prevent stale image attachment on text paste and voice input Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-10 02:58:18 -07:00
Austin Pickett	f805323517	chore: merge main	2026-04-09 20:00:34 -04:00
adybag14-cyber	c3141429b7	fix(termux): tighten voice setup and mobile chat UX	2026-04-09 16:24:53 -07:00
adybag14-cyber	769ec1ee1a	fix(termux): deepen browser, voice, and tui support	2026-04-09 16:24:53 -07:00
adybag14-cyber	3237733ca5	fix(termux): harden execute_code and mobile browser/audio UX	2026-04-09 16:24:53 -07:00
adybag14-cyber	54d5138a54	fix(termux): harden env-backed background jobs	2026-04-09 16:24:53 -07:00
adybag14-cyber	122925a6f2	fix(termux): honor temp dirs for local temp artifacts	2026-04-09 16:24:53 -07:00
adybag14-cyber	e79cc88985	feat: add tested Termux install path and EOF-aware gh auth	2026-04-09 16:24:53 -07:00
dashed	7f7b02b764	fix(slack): comprehensive mrkdwn formatting — 6 bug fixes + 52 tests Fixes blockquote > escaping, edit_message raw markdown, *bold italic* handling, HTML entity double-escaping (&amp;), Wikipedia URL parens truncation, and step numbering format. Also adds format_message to the tool-layer _send_to_platform for consistent formatting across all delivery paths. Changes: - Protect Slack entities (<@user>, <https://...\|label>, <!here>) from escaping passes - Protect blockquote > markers before HTML entity escaping - Unescape-before-escape for idempotent HTML entity handling - *bold italic* → _text_ conversion (before bold pass) - URL regex upgraded to handle balanced parentheses - mrkdwn:True flag on chat_postMessage payloads - format_message applied in edit_message and send_message_tool - 52 new tests (format, edit, streaming, splitting, tool chunking) - Use reversed(dict) idiom for placeholder restoration Based on PR #3715 by dashed, cherry-picked onto current main.	2026-04-09 14:07:32 -07:00
Dylan Socolobsky	c6dba918b3	fix(tests): fix several failing/flaky tests on main (#6777 ) * fix(tests): mock is_safe_url in tests that use example.com Tests using example.com URLs were failing because is_safe_url does a real DNS lookup which fails in environments where example.com doesn't resolve, causing the request to be blocked before reaching the already-mocked HTTP client. This should fix around 17 failing tests. These tests test logic, caching, etc. so mocking this method should not modify them in any way. TestMattermostSendUrlAsFile was already doing this so we follow the same pattern. * fix(test): use case-insensitive lookup for model context length check DEFAULT_CONTEXT_LENGTHS uses inconsistent casing (MiniMax keys are lowercase, Qwen keys are mixed-case) so the test was broken in some cases since it couldn't find the model. * fix(test): patch is_linux in systemd gateway restart test The test only patched is_macos to False but didn't patch is_linux to True. On macOS hosts, is_linux() returns False and the systemd restart code path is skipped entirely, making the assertion fail. * fix(test): use non-blocklisted env var in docker forward_env tests GITHUB_TOKEN is in api_key_env_vars and thus in _HERMES_PROVIDER_ENV_BLOCKLIST so the env var is silently dropped, we replace it with a non-blocked one like DATABASE_URL so the tests actually work. * fix(test): fully isolate _has_any_provider_configured from host env _has_any_provider_configured() checks all env vars from PROVIDER_REGISTRY (not just the 5 the tests were clearing) and also calls get_auth_status() which detects gh auth token for Copilot. On machines with any of these set, the function returns True before reaching the code path under test. Clear all registry vars and mock get_auth_status so host credentials don't interfere. * fix(test): correct path to hermes_base_env.py in tool parser tests Path(__file__).parent.parent resolved to tests/, not the project root. The file lives at environments/hermes_base_env.py so we need one more parent level. * fix(test): accept optional HTML fields in Matrix send payload _send_matrix sometimes adds format and formatted_body when the markdown library is installed. The test was doing an exact dict equality check which broke. Check required fields instead. * fix(test): add config.yaml to codex vision requirements test The test only wrote auth.json but not config.yaml, so _read_main_provider() returned empty and vision auto-detect never tried the codex provider. Add a config.yaml pointing at openai-codex so the fallback path actually resolves the client. * fix(test): clear OPENROUTER_API_KEY in _isolate_hermes_home run_agent.py calls load_hermes_dotenv() at import time, which injects API keys from ~/.hermes/.env into os.environ before any test fixture runs. This caused test_agent_loop_tool_calling to make real API calls instead of skipping, which ends up making some tests fail. * fix(test): add get_rate_limit_state to agent mock in usage report tests _show_usage now calls agent.get_rate_limit_state() for rate limit display. The SimpleNamespace mock was missing this method. * fix(test): update expected Camofox config version from 12 to 13 * fix(test): mock _get_enabled_platforms in nous managed defaults test Importing gateway.run leaks DISCORD_BOT_TOKEN into os.environ, which makes _get_enabled_platforms() return ["cli", "discord"] instead of just ["cli"]. tools_command loops per platform, so apply_nous_managed_defaults runs twice: the first call sets config values, the second sees them as already configured and returns an empty set, causing the assertion to fail.	2026-04-09 13:17:06 -07:00
Brooklyn Nicholson	00e1d42b9e	feat: image pasting	2026-04-09 13:45:23 -05:00
Lumen Radley	e22416dd9b	fix: handle empty sudo password and false prompts	2026-04-09 02:50:07 -07:00
helix4u	e94008c404	fix(terminal): guard invalid command values	2026-04-08 21:37:51 -07:00
Teknium	e19252afc4	fix: update tests for unified spawn-per-call execution model - Docker env tests: verify _build_init_env_args() instead of per-execute Popen flags (env forwarding is now init-time only) - Docker: preserve explicit forward_env bypass of blocklist from main - Daytona tests: adapt to SDK-native timeout, _ThreadedProcessHandle, base.py interrupt handling, HERMES_STDIN_ heredoc prefix - Modal tests: fix _load_module to include _ThreadedProcessHandle stub, check ensurepip in _resolve_modal_image instead of __init__ - SSH tests: mock time.sleep on base module instead of removed ssh import - Add missing BaseEnvironment attributes to __new__()-based test fixtures	2026-04-08 17:23:15 -07:00
alt-glitch	d684d7ee7e	feat(environments): unified spawn-per-call execution layer Replace dual execution model (PersistentShellMixin + per-backend oneshot) with spawn-per-call + session snapshot for all backends except ManagedModal. Core changes: - Every command spawns a fresh bash process; session snapshot (env vars, functions, aliases) captured at init and re-sourced before each command - CWD persists via file-based read (local) or in-band stdout markers (remote) - ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends - cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop) - Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store, _save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS - Rate-limited file sync unified in base _before_execute() with _sync_files() hook - execute_oneshot() removed; all 11 call sites in code_execution_tool.py migrated to execute() - Daytona timeout wrapper replaced with SDK-native timeout parameter - persistent_shell.py deleted (291 lines) Backend-specific: - Local: process-group kill via os.killpg, file-based CWD read - Docker: -e env flags only on init_session, not per-command - SSH: shlex.quote transport, ControlMaster connection reuse - Singularity: apptainer exec with instance://, no forced --pwd - Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate - Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop - ManagedModal: unchanged (gateway owns execution); docstring added explaining why	2026-04-08 17:23:15 -07:00
jjovalle99	d46db0a1b4	fix(tools): use correct import path for mistralai SDK mistralai v2.x is a namespace package — `Mistral` class lives at `mistralai.client`, not at the top-level `mistralai` module. The previous `from mistralai import Mistral` raises ImportError at runtime. Update both production code and test fixture to use the correct path.	2026-04-08 13:47:08 -07:00
jjovalle99	5f4b93c20f	feat(tools): add Voxtral Transcribe STT provider (Mistral AI)	2026-04-08 13:47:08 -07:00
Teknium	4f467700d4	fix(doctor): only check the active memory provider, not all providers unconditionally (#6285 ) * fix(tools): skip camofox auto-cleanup when managed persistence is enabled When managed_persistence is enabled, cleanup_browser() was calling camofox_close() which destroys the server-side browser context via DELETE /sessions/{userId}, killing login sessions across cron runs. Add camofox_soft_cleanup() — a public wrapper that drops only the in-memory session entry when managed persistence is on, returning True. When persistence is off it returns False so the caller falls back to the full camofox_close(). The inactivity reaper still handles idle resource cleanup. Also surface a logger.warning() when _managed_persistence_enabled() fails to load config, replacing a silent except-and-return-False. Salvaged from #6182 by el-analista (Eduardo Perea Fernandez). Added public API wrapper to avoid cross-module private imports, and test coverage for both persistence paths. Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com> * fix(doctor): only check the active memory provider, not all providers unconditionally hermes doctor had hardcoded Honcho Memory and Mem0 Memory sections that always ran regardless of the user's memory.provider config setting. After the swappable memory provider update (#4623), users with leftover Honcho config but no active provider saw false 'broken' errors. Replaced both sections with a single Memory Provider section that reads memory.provider from config.yaml and only checks the configured provider. Users with no external provider see a green 'Built-in memory active' check. Reported by community user michaelruiz001, confirmed by Eri (Honcho). --------- Co-authored-by: Eduardo Perea Fernandez <el-analista@users.noreply.github.com>	2026-04-08 13:44:58 -07:00
mrshu	19b0ddce40	fix(process): correct detached crash recovery state Previously crash recovery recreated detached sessions as if they were fully managed, so polls and kills could lie about liveness and the checkpoint could forget recovered jobs after the next restart. This commit refreshes recovered host-backed sessions from real PID state, keeps checkpoint data durable, and preserves notify watcher metadata while treating sandbox-only PIDs as non-recoverable. - Persist `pid_scope` in `tools/process_registry.py` and skip recovering sandbox-backed entries without a host-visible PID handle - Refresh detached sessions on access so `get`/`poll`/`wait` and active session queries observe exited processes instead of hanging forever - Allow recovered host PIDs to be terminated honestly and requeue `notify_on_complete` watchers during checkpoint recovery - Add regression tests for durable checkpoints, detached exit/kill behavior, sandbox skip logic, and recovered notify watchers	2026-04-08 03:35:43 -07:00
Vasanthdev2004	085c1c6875	fix(browser): preserve agent-browser paths with spaces	2026-04-08 02:35:48 -07:00
Teknium	3696c74bfb	fix: preserve existing thresholds, remove pre-read byte guard - DEFAULT_RESULT_SIZE_CHARS: 50K -> 100K (match current _LARGE_RESULT_CHARS) - DEFAULT_PREVIEW_SIZE_CHARS: 2K -> 1.5K (match current _LARGE_RESULT_PREVIEW_CHARS) - Per-tool overrides all set to 100K (terminal, execute_code, search_files) - Remove pre-read byte guard (no behavioral regression vs current main) - Revert limit signature change to int=500 (match current default) - Restore original read_file schema description - Update test assertions to match 100K thresholds	2026-04-08 02:24:32 -07:00
alt-glitch	bbcff8dcd0	fix(tools): address PR review — remove _extract_raw_output, BudgetConfig everywhere, read_file hardening - Remove _extract_raw_output: persist content verbatim (fixes size mismatch bug) - Drop import aliases: import from budget_config directly, one canonical name - BudgetConfig param on maybe_persist_tool_result and enforce_turn_budget - read_file: limit=None signature, pre-read guard fires only when limit omitted (256KB) - Unify binary extensions: file_operations.py imports from binary_extensions.py - Exclude .pdf and .svg from binary set (text-based, agents may inspect) - Remove redundant outer try/except in eval path (internal fallback handles it) - Fix broken tests: update assertion strings for new persistence format - Module-level constants: _PRE_READ_MAX_BYTES, _DEFAULT_READ_LIMIT - Remove redundant pathlib import (Path already at module level) - Update spec.md with IMPLEMENTED annotations and design decisions	2026-04-08 02:24:32 -07:00
alt-glitch	65e24c942e	wip: tool result fixes -- persistence	2026-04-08 02:24:32 -07:00
Teknium	b9a5e6e247	fix: use camelCase structuredContent attr, prefer structured over text - The MCP SDK Pydantic model uses camelCase (structuredContent), not snake_case (structured_content). The original getattr was a silent no-op. - When structuredContent is present, return it AS the result instead of alongside text — the structured payload is the machine-readable data. - Move test file to tests/tools/ and fix fake class to use camelCase. - Patch _run_on_mcp_loop in tests so the handler actually executes.	2026-04-07 18:00:01 -07:00
Siddharth Balyan	f3006ebef9	refactor(tests): re-architect tests + fix CI failures (#5946 ) * refactor: re-architect tests to mirror the codebase * Update tests.yml * fix: add missing tool_error imports after registry refactor * fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers, causing test_code_execution tests to hit the Modal remote path. * fix(tests): fix update_check and telegram xdist failures - test_update_check: replace patch("hermes_cli.banner.os.getenv") with monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os directly, it uses get_hermes_home() from hermes_constants. - test_telegram_conflict/approval_buttons: provide real exception classes for telegram.error mock (NetworkError, TimedOut, BadRequest) so the except clause in connect() doesn't fail with "catching classes that do not inherit from BaseException" when xdist pollutes sys.modules. * fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock	2026-04-07 17:19:07 -07:00
kshitijk4poor	f4528c885b	feat(clipboard): add native Windows image paste support Add win32 platform branch to clipboard.py so Ctrl+V image paste works on native Windows (PowerShell / Windows Terminal), not just WSL2. Uses the same .NET System.Windows.Forms.Clipboard approach as the WSL path but calls PowerShell directly instead of powershell.exe (the WSL cross-call path). Tries 'powershell' first (Windows PowerShell 5.1, always available), then 'pwsh' (PowerShell 7+). PowerShell executable is discovered once and cached for the process lifetime. Includes 14 new tests covering: - Platform dispatch (save_clipboard_image + has_clipboard_image) - Image detection via PowerShell .NET check - Base64 PNG extraction and decode - Edge cases: no PowerShell, empty output, invalid base64, timeout	2026-04-07 12:49:39 -07:00
Teknium	caded0a5e7	fix: repair 57 failing CI tests across 14 files (#5823 ) * fix: repair 57 failing CI tests across 14 files Categories of fixes: Test isolation under xdist (-n auto): - test_hermes_logging: Strip ALL RotatingFileHandlers before each test to prevent handlers leaked from other xdist workers from polluting counts - test_code_execution: Force TERMINAL_ENV=local in setUp — prevents Modal AuthError when another test leaks TERMINAL_ENV=modal - test_timezone: Same TERMINAL_ENV fix for execute_code timezone tests - test_codex_execution_paths: Mock _resolve_turn_agent_config to ensure model resolution works regardless of xdist worker state Matrix adapter tests (nio not installed in CI): - Add _make_fake_nio() helper with real response classes for isinstance() checks in production code - Replace MagicMock(spec=nio.XxxResponse) with fake_nio instances - Wrap production method calls with patch.dict('sys.modules', {'nio': ...}) so import nio succeeds in method bodies - Use try/except instead of pytest.importorskip for nio.crypto imports (importorskip can be fooled by MagicMock in sys.modules) - test_matrix_voice: Skip entire file if nio is a mock, not just missing Stale test expectations: - test_cli_provider_resolution: _prompt_provider_choice now takes kwargs (default param added); mock getpass.getpass alongside input - test_anthropic_oauth_flow: Mock getpass.getpass (code switched from input) - test_gemini_provider: Mock models.dev + OpenRouter API lookups to test hardcoded defaults without external API variance - test_code_execution: Add notify_on_complete to blocked terminal params - test_setup_openclaw_migration: Mock prompt_choice to select 'Full setup' (new quick-setup path leads to _require_tty → sys.exit in CI) - test_skill_manager_tool: Patch get_all_skills_dirs alongside SKILLS_DIR so _find_skill searches tmp_path, not real ~/.hermes/skills/ Missing attributes in object.__new__ test runners: - test_platform_reconnect: Add session_store to _make_runner() - test_session_race_guard: Add hooks, _running_agents_ts, session_store, delivery_router to _make_runner() Production bug fix (gateway/run.py):** - Fix sentinel eviction race: _AGENT_PENDING_SENTINEL was immediately evicted by the stale-detection logic because sentinels have no get_activity_summary() method, causing _stale_idle=inf >= timeout. Guard _should_evict with 'is not _AGENT_PENDING_SENTINEL'. * fix: address remaining CI failures - test_setup_openclaw_migration: Also mock _offer_launch_chat (called at end of both quick and full setup paths) - test_code_execution: Move TERMINAL_ENV=local to module level to protect ALL test classes (TestEnvVarFiltering, TestExecuteCodeEdgeCases, TestInterruptHandling, TestHeadTailTruncation) from xdist env leaks - test_matrix: Use try/except for nio.crypto imports (importorskip can be fooled by MagicMock in sys.modules under xdist)	2026-04-07 09:58:45 -07:00
Ben Barclay	b2f477a30b	feat: switch managed browser provider from Browserbase to Browser Use (#5750 ) * feat: switch managed browser provider from Browserbase to Browser Use The Nous subscription tool gateway now routes browser automation through Browser Use instead of Browserbase. This commit: - Adds managed Nous gateway support to BrowserUseProvider (idempotency keys, X-BB-API-Key auth header, external_call_id persistence) - Removes managed gateway support from BrowserbaseProvider (now direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID) - Updates browser_tool.py fallback: prefers Browser Use over Browserbase - Updates nous_subscription.py: gateway vendor 'browser-use', auto-config sets cloud_provider='browser-use' for new subscribers - Updates tools_config.py: Nous Subscription entry now uses Browser Use - Updates setup.py, cli.py, status.py, prompt_builder.py display strings - Updates all affected tests to match new behavior Browserbase remains fully functional for users with direct API credentials. The change only affects the managed/subscription path. * chore: remove redundant Browser Use hint from system prompt * fix: upgrade Browser Use provider to v3 API - Base URL: api/v2 -> api/v3 (v2 is legacy) - Unified all endpoints to use native Browser Use paths: - POST /browsers (create session, returns cdpUrl) - PATCH /browsers/{id} with {action: stop} (close session) - Removed managed-mode branching that used Browserbase-style /v1/sessions paths — v3 gateway now supports /browsers directly - Removed unused managed_mode variable in close_session * fix(browser-use): use X-Browser-Use-API-Key header for managed mode The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key (which is a Browserbase-specific header). Using the wrong header caused a 401 AUTH_ERROR on every managed-mode browser session create. Simplified _headers() to always use X-Browser-Use-API-Key regardless of direct vs managed mode. * fix(nous_subscription): browserbase explicit provider is direct-only Since managed Nous gateway now routes through Browser Use, the browserbase explicit provider path should not check managed_browser_available (which resolves against the browser-use gateway). Simplified to direct-only with managed=False. * fix(browser-use): port missing improvements from PR #5605 - CDP URL normalization: resolve HTTP discovery URLs to websocket after cloud provider create_session() (prevents agent-browser failures) - Managed session payload: send timeout=5 and proxyCountryCode=us for gateway-backed sessions (prevents billing overruns) - Update prompt builder, browser_close schema, and module docstring to replace remaining Browserbase references with Browser Use - Dynamic /browser status detection via _get_cloud_provider() instead of hardcoded env var checks (future-proof for new providers) - Rename post_setup key from 'browserbase' to 'agent_browser' - Update setup hint to mention Browser Use alongside Browserbase - Add tests: CDP normalization, browserbase direct-only guard, managed browser-use gateway, direct browserbase fallback --------- Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>	2026-04-07 08:40:22 -04:00
Teknium	8b861b77c1	refactor: remove browser_close tool — auto-cleanup handles it (#5792 ) * refactor: remove browser_close tool — auto-cleanup handles it The browser_close tool was called in only 9% of browser sessions (13/144 navigations across 66 sessions), always redundantly — cleanup_browser() already runs via _cleanup_task_resources() at conversation end, and the background inactivity reaper catches anything else. Removing it saves one tool schema slot in every browser-enabled API call. Also fixes a latent bug: cleanup_browser() now handles Camofox sessions too (previously only Browserbase). Camofox sessions were never auto-cleaned per-task because they live in a separate dict from _active_sessions. Files changed (13): - tools/browser_tool.py: remove function, schema, registry entry; add camofox cleanup to cleanup_browser() - toolsets.py, model_tools.py, prompt_builder.py, display.py, acp_adapter/tools.py: remove browser_close from all tool lists - tests/: remove browser_close test, update toolset assertion - docs/skills: remove all browser_close references * fix: repeat browser_scroll 5x per call for meaningful page movement Most backends scroll ~100px per call — barely visible on a typical viewport. Repeating 5x gives ~500px (~half a viewport), making each scroll tool call actually useful. Backend-agnostic approach: works across all 7+ browser backends without needing to configure each one's scroll amount individually. Breaks early on error for the agent-browser path. * feat: auto-return compact snapshot from browser_navigate Every browser session starts with navigate → snapshot. Now navigate returns the compact accessibility tree snapshot inline, saving one tool call per browser task. The snapshot captures the full page DOM (not viewport-limited), so scroll position doesn't affect it. browser_snapshot remains available for refreshing after interactions or getting full=true content. Both Browserbase and Camofox paths auto-snapshot. If the snapshot fails for any reason, navigation still succeeds — the snapshot is a bonus, not a requirement. Schema descriptions updated to guide models: navigate mentions it returns a snapshot, snapshot mentions it's for refresh/full content. * refactor: slim cronjob tool schema — consolidate model/provider, drop unused params Session data (151 calls across 67 sessions) showed several schema properties were never used by models. Consolidated and cleaned up: Removed from schema (still work via backend/CLI): - skill (singular): use skills array instead - reason: pause-only, unnecessary - include_disabled: now defaults to true - base_url: extreme edge case, zero usage - provider (standalone): merged into model object Consolidated: - model + provider → single 'model' object with {model, provider} fields. If provider is omitted, the current main provider is pinned at creation time so the job stays stable even if the user changes their default. Kept: - script: useful data collection feature - skills array: standard interface for skill loading Schema shrinks from 14 to 10 properties. All backend functionality preserved — the Python function signature and handler lambda still accept every parameter. * fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli, hermes-messaging, safe), which meant it appeared in every session for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS gate only works after running 'hermes tools' explicitly. Now MoA only appears when a user explicitly enables it via 'hermes tools'. The moa toolset definition and check_fn remain unchanged — it just needs to be opted into.	2026-04-07 03:28:44 -07:00
Teknium	e120d2afac	feat: notify_on_complete for background processes (#5779 ) * feat: notify_on_complete for background processes When terminal(background=true, notify_on_complete=true), the system auto-triggers a new agent turn when the process exits — no polling needed. Changes: - ProcessSession: add notify_on_complete field - ProcessRegistry: add completion_queue, populate on _move_to_finished() - Terminal tool: add notify_on_complete parameter to schema + handler - CLI: drain completion_queue after agent turn AND during idle loop - Gateway: enhanced _run_process_watcher injects synthetic MessageEvent on completion, triggering a full agent turn - Checkpoint persistence includes notify_on_complete for crash recovery - code_execution_tool: block notify_on_complete in sandbox scripts - 15 new tests covering queue mechanics, checkpoint round-trip, schema * docs: update terminal tool descriptions for notify_on_complete - background: remove 'ONLY for servers' language, describe both patterns (long-lived processes AND long-running tasks with notify_on_complete) - notify_on_complete: more prescriptive about when to use it - TERMINAL_TOOL_DESCRIPTION: remove 'Do NOT use background for builds' guidance that contradicted the new feature	2026-04-07 02:40:16 -07:00
Mateus Scheuer Macedo	f2c11ff30c	fix(delegate): share credential pools with subagents + per-task leasing Cherry-picked from PR #5580 by MestreY0d4-Uninter. - Share parent's credential pool with child agents for key rotation - Leasing layer spreads parallel children across keys (least-loaded) - Thread-safe acquire_lease/release_lease in CredentialPool - Reverted sneaked-in tool-name restoration change (kept original getattr + isinstance guard pattern)	2026-04-06 23:01:11 -07:00
WAXLYY	c1818b7e9e	fix(tools): redact query secrets in send_message errors	2026-04-06 16:49:52 -07:00
Siddharth Balyan	7b129636f0	feat(tools): add Firecrawl cloud browser provider (#5628 ) * feat(tools): add Firecrawl cloud browser provider Adds Firecrawl (https://firecrawl.dev) as a cloud browser provider alongside Browserbase and Browser Use. All browser tools route through Firecrawl's cloud browser via CDP when selected. - tools/browser_providers/firecrawl.py — FirecrawlProvider - tools/browser_tool.py — register in _PROVIDER_REGISTRY - hermes_cli/tools_config.py — add to onboarding provider picker - hermes_cli/setup.py — add to setup summary - hermes_cli/config.py — add FIRECRAWL_BROWSER_TTL config - website/docs/ — browser docs and env var reference Based on #4490 by @developersdigest. Co-Authored-By: Developers Digest <124798203+developersdigest@users.noreply.github.com> * refactor: simplify FirecrawlProvider.emergency_cleanup Use self._headers() and self._api_url() instead of duplicating env-var reads and header construction. * fix: recognize Firecrawl in subscription browser detection _resolve_browser_feature_state() now handles "firecrawl" as a direct browser provider (same pattern as "browser-use"), so hermes setup summary correctly shows "Browser Automation (Firecrawl)" instead of misreporting as "Local browser". Also fixes test_config_version_unchanged assertion (11 → 12). --------- Co-authored-by: Developers Digest <124798203+developersdigest@users.noreply.github.com>	2026-04-07 02:35:26 +05:30
Teknium	38d8446011	feat: implement MCP OAuth 2.1 PKCE client support (#5420 ) Implement tools/mcp_oauth.py — the OAuth adapter that mcp_tool.py's existing auth: oauth hook has been waiting for. Components: - HermesTokenStorage: persists tokens + client registration to HERMES_HOME/mcp-tokens/<server>.json with 0o600 permissions - Callback handler factory: per-flow isolated HTTP handlers (safe for concurrent OAuth flows across multiple MCP servers) - OAuthClientProvider integration: wraps the MCP SDK's httpx.Auth subclass which handles discovery, DCR, PKCE, token exchange, refresh, and step-up auth (403 insufficient_scope) automatically - Non-interactive detection: warns when gateway/cron environments try to OAuth without cached tokens - Pre-registered client support: injects client_id/secret from config for servers that don't support Dynamic Client Registration (e.g. Slack) - Path traversal protection on server names - remove_oauth_tokens() for cleanup Config format: mcp_servers: sentry: url: 'https://mcp.sentry.dev/mcp' auth: oauth oauth: # all optional client_id: '...' # skip DCR client_secret: '...' # confidential client scope: 'read write' # server-provided by default Also passes oauth config dict through from mcp_tool.py (was passing only server_name and url before). E2E verified: full OAuth flow (401 → discovery → DCR → authorize → token exchange → authenticated request → tokens persisted) against local test servers. 23 unit tests + 186 MCP suite tests pass.	2026-04-05 22:08:00 -07:00
Teknium	4494fba140	feat: OSV malware check for MCP extension packages (#5305 ) Before launching an MCP server via npx/uvx, queries the OSV (Open Source Vulnerabilities) API to check if the package has known malware advisories (MAL-* IDs). Regular CVEs are ignored — only confirmed malware is blocked. - Free, public API (Google-maintained), ~300ms per query - Runs once per MCP server launch, inside _run_stdio() before subprocess spawn - Parallel with other MCP servers (asyncio.gather already in place) - Fail-open: network errors, timeouts, unrecognized commands → allow - Parses npm (scoped @scope/pkg@version) and PyPI (name[extras]==version) Inspired by Block/goose extension malware check.	2026-04-05 12:46:07 -07:00
Git-on-my-level	fcdd5447e2	fix: keep ACP stdout protocol-clean Route AIAgent print output to stderr via _print_fn for ACP stdio sessions. Gate quiet-mode spinner startup on _should_start_quiet_spinner() so JSON-RPC on stdout isn't corrupted. Child agents inherit the redirect. Co-authored-by: Git-on-my-level <Git-on-my-level@users.noreply.github.com>	2026-04-05 12:05:13 -07:00
Damian P	afccbf253c	fix: resolve listed messaging targets consistently	2026-04-05 11:59:28 -07:00
Teknium	aa475aef31	feat: add exit code context for common CLI tools in terminal results (#5144 ) When commands like grep, diff, test, or find return non-zero exit codes that aren't actual errors (grep 1 = no matches, diff 1 = files differ), the model wastes turns investigating non-problems. This adds an exit_code_meaning field to the terminal JSON result that explains informational exit codes, so the agent can move on instead of debugging. Covers grep/rg/ag/ack (no matches), diff (files differ), find (partial access), test/[ (condition false), curl (timeouts, DNS, HTTP errors), and git (context-dependent). Correctly extracts the last command from pipelines and chains, strips full paths and env var assignments. The exit_code field itself is unchanged — this is purely additive context.	2026-04-04 16:57:24 -07:00
LucidPaths	6367e1c4c0	fix: remove stale test skips, fix regex backtracking, file search bug, and test flakiness Bug fixes: - agent/redact.py: catastrophic regex backtracking in _ENV_ASSIGN_RE — removed re.IGNORECASE and changed [A-Z_]* to [A-Z0-9_]* to restrict matching to actual env var name chars. Without this, the pattern backtracks exponentially on large strings (e.g. 100K tool output), causing test_file_read_guards to time out. - tools/file_operations.py: over-escaped newline in find -printf format string produced literal backslash-n instead of a real newline, breaking file search result parsing (total_count always 1, paths concatenated). Test fixes: - Remove stale pytestmark.skip from 4 test modules that were blanket-skipped as 'Hangs in non-interactive environments' but actually run fine: - test_413_compression.py (12 tests, 25s) - test_file_tools_live.py (71 tests, 24s) - test_code_execution.py (61 tests, 99s) - test_agent_loop_tool_calling.py (has proper OPENROUTER_API_KEY skip already) - test_413_compression.py: fix threshold values in 2 preflight compression tests where context_length was too small for the compressed output to fit in one pass. - test_mcp_probe.py: add missing _MCP_AVAILABLE mock so tests work without MCP SDK. - test_mcp_tool_issue_948.py: inject MCP symbols (StdioServerParameters etc.) when SDK is not installed so patch() targets exist. - test_approve_deny_commands.py: replace time.sleep(0.3) with deterministic polling of _gateway_queues — fixes race condition where resolve fires before threads register their approval entries, causing the test to hang indefinitely. Net effect: +256 tests recovered from skip, 8 real failures fixed.	2026-04-04 10:18:57 -07:00
Teknium	43d3efd5c8	feat: add docker_env config for explicit container environment variables (#4738 ) Add docker_env option to terminal config — a dict of key-value pairs that get set inside Docker containers via -e flags at both container creation (docker run) and per-command execution (docker exec) time. This complements docker_forward_env (which reads values dynamically from the host process environment). docker_env is useful when Hermes runs as a systemd service without access to the user's shell environment — e.g. setting SSH_AUTH_SOCK or GNUPGHOME to known stable paths for SSH/GPG agent socket forwarding. Precedence: docker_env provides baseline values; docker_forward_env overrides for the same key. Config example: terminal: docker_env: SSH_AUTH_SOCK: /run/user/1000/ssh-agent.sock GNUPGHOME: /root/.gnupg docker_volumes: - /run/user/1000/ssh-agent.sock:/run/user/1000/ssh-agent.sock - /run/user/1000/gnupg/S.gpg-agent:/root/.gnupg/S.gpg-agent	2026-04-03 23:30:12 -07:00
Teknium	ad4feeaf0d	feat: wire skills.external_dirs into all remaining discovery paths The config key skills.external_dirs and core resolution (get_all_skills_dirs, get_external_skills_dirs in agent/skill_utils.py) already existed but several code paths still only scanned SKILLS_DIR. Now external dirs are respected everywhere: - skills_categories(): scan all dirs for category discovery - _get_category_from_path(): resolve categories against any skills root - skill_manager_tool._find_skill(): search all dirs for edit/patch/delete - credential_files.get_skills_directory_mount(): mount all dirs into Docker/Singularity containers (external dirs at external_skills/<idx>) - credential_files.iter_skills_files(): list files from all dirs for Modal/Daytona upload - tools/environments/ssh.py: rsync all skill dirs to remote hosts - gateway _check_unavailable_skill(): check disabled skills across all dirs Usage in config.yaml: skills: external_dirs: - ~/repos/agent-skills/hermes - /shared/team-skills	2026-04-03 21:14:42 -07:00
Tranquil-Flow	3bfb39a25f	fix(gateway): isolate approval session key per turn	2026-04-03 17:50:01 -07:00
Teknium	b1756084a3	feat: add .zip document support and auto-mount cache dirs into remote backends (#4846 ) - Add .zip to SUPPORTED_DOCUMENT_TYPES so gateway platforms (Telegram, Slack, Discord) cache uploaded zip files instead of rejecting them. - Add get_cache_directory_mounts() and iter_cache_files() to credential_files.py for host-side cache directory passthrough (documents, images, audio, screenshots). - Docker: bind-mount cache dirs read-only alongside credentials/skills. Changes are live (bind mount semantics). - Modal: mount cache files at sandbox creation + resync before each command via _sync_files() with mtime+size change detection. - Handles backward-compat with legacy dir names (document_cache, image_cache, audio_cache, browser_screenshots) via get_hermes_dir(). - Container paths always use the new cache/<subdir> layout regardless of host layout. This replaces the need for a dedicated extract_archive tool (PR #4819) — the agent can now use standard terminal commands (unzip, tar) on uploaded files inside remote containers. Closes: related to PR #4819 by kshitijk4poor	2026-04-03 13:16:26 -07:00
Teknium	8a384628a5	fix(memory): profile-scoped memory isolation and clone support (#4845 ) Three fixes for memory+profile isolation bugs: 1. memory_tool.py: Replace module-level MEMORY_DIR constant with get_memory_dir() function that calls get_hermes_home() dynamically. The old constant was cached at import time and could go stale if HERMES_HOME changed after import. Internal MemoryStore methods now call get_memory_dir() directly. MEMORY_DIR kept as backward-compat alias. 2. profiles.py: profile create --clone now copies MEMORY.md and USER.md from the source profile. These curated memory files are part of the agent's identity (same as SOUL.md) and should carry over on clone. 3. holographic plugin: initialize() now expands $HERMES_HOME and ${HERMES_HOME} in the db_path config value, so users can write 'db_path: $HERMES_HOME/memory_store.db' and it resolves to the active profile directory, not the default home. Tests updated to mock get_memory_dir() alongside the legacy MEMORY_DIR.	2026-04-03 13:10:11 -07:00
Teknium	cc54818d26	fix(mcp): stability fix pack — reload timeout, shutdown cleanup, event loop handler, OAuth non-blocking (#4757 ) Four fixes for MCP server stability issues reported by community member (terminal lockup, zombie processes, escape sequence pollution, startup hang): 1. MCP reload timeout guard (cli.py): _check_config_mcp_changes now runs _reload_mcp in a separate daemon thread with a 30s hard timeout. Previously, a hung MCP server could block the process_loop thread indefinitely, freezing the entire TUI (user can type but nothing happens, only Ctrl+D/Ctrl+\ work). 2. MCP stdio subprocess PID tracking (mcp_tool.py): Tracks child PIDs spawned by stdio_client via before/after snapshots of /proc children. On shutdown, _stop_mcp_loop force-kills any tracked PIDs that survived the SDK's graceful SIGTERM→SIGKILL cleanup. Prevents zombie MCP server processes from accumulating across sessions. 3. MCP event loop exception handler (mcp_tool.py): Installs _mcp_loop_exception_handler on the MCP background event loop — same pattern as the existing _suppress_closed_loop_errors on prompt_toolkit's loop. Suppresses benign 'Event loop is closed' RuntimeError from httpx transport __del__ during MCP shutdown. Salvaged from PR #2538 (acsezen). 4. MCP OAuth non-blocking (mcp_oauth.py): Replaces blocking input() call in _wait_for_callback with OAuthNonInteractiveError raise. Adds _is_interactive() TTY detection. In non-interactive environments, build_oauth_auth() still returns a provider (cached tokens + refresh work), but the callback handler raises immediately instead of blocking the MCP event loop for 120s. Re-raises OAuth setup failures in _run_http so failed servers are reported cleanly without blocking others. Salvaged from PRs #4521 (voidborne-d) and #4465 (heathley). Closes #2537, closes #4462 Related: #4128, #3436	2026-04-03 02:29:20 -07:00
Teknium	21c2d32471	fix(gateway): normalize step_callback prev_tools for backward compat The PR changed prev_tools from list[str] to list[dict] with name/result keys. The gateway's _step_callback_sync passed this directly to hooks as 'tool_names', breaking user-authored hooks that call ', '.join(tool_names). Now: - 'tool_names' always contains strings (backward-compatible) - 'tools' carries the enriched dicts for hooks that want results Also adds summary logging to register_mcp_servers() and comprehensive tests for all three PR changes: - sanitize_mcp_name_component edge cases - register_mcp_servers public API - _register_session_mcp_servers ACP integration - step_callback result forwarding - gateway normalization backward compat	2026-04-02 20:54:27 -07:00
Teknium	924bc67eee	feat(memory): pluggable memory provider interface with profile isolation, review fixes, and honcho CLI restoration (#4623 ) * feat(memory): add pluggable memory provider interface with profile isolation Introduces a pluggable MemoryProvider ABC so external memory backends can integrate with Hermes without modifying core files. Each backend becomes a plugin implementing a standard interface, orchestrated by MemoryManager. Key architecture: - agent/memory_provider.py — ABC with core + optional lifecycle hooks - agent/memory_manager.py — single integration point in the agent loop - agent/builtin_memory_provider.py — wraps existing MEMORY.md/USER.md Profile isolation fixes applied to all 6 shipped plugins: - Cognitive Memory: use get_hermes_home() instead of raw env var - Hindsight Memory: check $HERMES_HOME/hindsight/config.json first, fall back to legacy ~/.hindsight/ for backward compat - Hermes Memory Store: replace hardcoded ~/.hermes paths with get_hermes_home() for config loading and DB path defaults - Mem0 Memory: use get_hermes_home() instead of raw env var - RetainDB Memory: auto-derive profile-scoped project name from hermes_home path (hermes-<profile>), explicit env var overrides - OpenViking Memory: read-only, no local state, isolation via .env MemoryManager.initialize_all() now injects hermes_home into kwargs so every provider can resolve profile-scoped storage without importing get_hermes_home() themselves. Plugin system: adds register_memory_provider() to PluginContext and get_plugin_memory_providers() accessor. Based on PR #3825. 46 tests (37 unit + 5 E2E + 4 plugin registration). * refactor(memory): drop cognitive plugin, rewrite OpenViking as full provider Remove cognitive-memory plugin (#727) — core mechanics are broken: decay runs 24x too fast (hourly not daily), prefetch uses row ID as timestamp, search limited by importance not similarity. Rewrite openviking-memory plugin from a read-only search wrapper into a full bidirectional memory provider using the complete OpenViking session lifecycle API: - sync_turn: records user/assistant messages to OpenViking session (threaded, non-blocking) - on_session_end: commits session to trigger automatic memory extraction into 6 categories (profile, preferences, entities, events, cases, patterns) - prefetch: background semantic search via find() endpoint - on_memory_write: mirrors built-in memory writes to the session - is_available: checks env var only, no network calls (ABC compliance) Tools expanded from 3 to 5: - viking_search: semantic search with mode/scope/limit - viking_read: tiered content (abstract ~100tok / overview ~2k / full) - viking_browse: filesystem-style navigation (list/tree/stat) - viking_remember: explicit memory storage via session - viking_add_resource: ingest URLs/docs into knowledge base Uses direct HTTP via httpx (no openviking SDK dependency needed). Response truncation on viking_read to prevent context flooding. * fix(memory): harden Mem0 plugin — thread safety, non-blocking sync, circuit breaker - Remove redundant mem0_context tool (identical to mem0_search with rerank=true, top_k=5 — wastes a tool slot and confuses the model) - Thread sync_turn so it's non-blocking — Mem0's server-side LLM extraction can take 5-10s, was stalling the agent after every turn - Add threading.Lock around _get_client() for thread-safe lazy init (prefetch and sync threads could race on first client creation) - Add circuit breaker: after 5 consecutive API failures, pause calls for 120s instead of hammering a down server every turn. Auto-resets after cooldown. Logs a warning when tripped. - Track success/failure in prefetch, sync_turn, and all tool calls - Wait for previous sync to finish before starting a new one (prevents unbounded thread accumulation on rapid turns) - Clean up shutdown to join both prefetch and sync threads * fix(memory): enforce single external memory provider limit MemoryManager now rejects a second non-builtin provider with a warning. Built-in memory (MEMORY.md/USER.md) is always accepted. Only ONE external plugin provider is allowed at a time. This prevents tool schema bloat (some providers add 3-5 tools each) and conflicting memory backends. The warning message directs users to configure memory.provider in config.yaml to select which provider to activate. Updated all 47 tests to use builtin + one external pattern instead of multiple externals. Added test_second_external_rejected to verify the enforcement. * feat(memory): add ByteRover memory provider plugin Implements the ByteRover integration (from PR #3499 by hieuntg81) as a MemoryProvider plugin instead of direct run_agent.py modifications. ByteRover provides persistent memory via the brv CLI — a hierarchical knowledge tree with tiered retrieval (fuzzy text then LLM-driven search). Local-first with optional cloud sync. Plugin capabilities: - prefetch: background brv query for relevant context - sync_turn: curate conversation turns (threaded, non-blocking) - on_memory_write: mirror built-in memory writes to brv - on_pre_compress: extract insights before context compression Tools (3): - brv_query: search the knowledge tree - brv_curate: store facts/decisions/patterns - brv_status: check CLI version and context tree state Profile isolation: working directory at $HERMES_HOME/byterover/ (scoped per profile). Binary resolution cached with thread-safe double-checked locking. All write operations threaded to avoid blocking the agent (curate can take 120s with LLM processing). * fix(memory): thread remaining sync_turns, fix holographic, add config key Plugin fixes: - Hindsight: thread sync_turn (was blocking up to 30s via _run_in_thread) - RetainDB: thread sync_turn (was blocking on HTTP POST) - Both: shutdown now joins sync threads alongside prefetch threads Holographic retrieval fixes: - reason(): removed dead intersection_key computation (bundled but never used in scoring). Now reuses pre-computed entity_residuals directly, moved role_content encoding outside the inner loop. - contradict(): added _MAX_CONTRADICT_FACTS=500 scaling guard. Above 500 facts, only checks the most recently updated ones to avoid O(n^2) explosion (~125K comparisons at 500 is acceptable). Config: - Added memory.provider key to DEFAULT_CONFIG ("" = builtin only). No version bump needed (deep_merge handles new keys automatically). * feat(memory): extract Honcho as a MemoryProvider plugin Creates plugins/honcho-memory/ as a thin adapter over the existing honcho_integration/ package. All 4 Honcho tools (profile, search, context, conclude) move from the normal tool registry to the MemoryProvider interface. The plugin delegates all work to HonchoSessionManager — no Honcho logic is reimplemented. It uses the existing config chain: $HERMES_HOME/honcho.json -> ~/.honcho/config.json -> env vars. Lifecycle hooks: - initialize: creates HonchoSessionManager via existing client factory - prefetch: background dialectic query - sync_turn: records messages + flushes to API (threaded) - on_memory_write: mirrors user profile writes as conclusions - on_session_end: flushes all pending messages This is a prerequisite for the MemoryManager wiring in run_agent.py. Once wired, Honcho goes through the same provider interface as all other memory plugins, and the scattered Honcho code in run_agent.py can be consolidated into the single MemoryManager integration point. * feat(memory): wire MemoryManager into run_agent.py Adds 8 integration points for the external memory provider plugin, all purely additive (zero existing code modified): 1. Init (~L1130): Create MemoryManager, find matching plugin provider from memory.provider config, initialize with session context 2. Tool injection (~L1160): Append provider tool schemas to self.tools and self.valid_tool_names after memory_manager init 3. System prompt (~L2705): Add external provider's system_prompt_block alongside existing MEMORY.md/USER.md blocks 4. Tool routing (~L5362): Route provider tool calls through memory_manager.handle_tool_call() before the catchall handler 5. Memory write bridge (~L5353): Notify external provider via on_memory_write() when the built-in memory tool writes 6. Pre-compress (~L5233): Call on_pre_compress() before context compression discards messages 7. Prefetch (~L6421): Inject provider prefetch results into the current-turn user message (same pattern as Honcho turn context) 8. Turn sync + session end (~L8161, ~L8172): sync_all() after each completed turn, queue_prefetch_all() for next turn, on_session_end() + shutdown_all() at conversation end All hooks are wrapped in try/except — a failing provider never breaks the agent. The existing memory system, Honcho integration, and all other code paths are completely untouched. Full suite: 7222 passed, 4 pre-existing failures. * refactor(memory): remove legacy Honcho integration from core Extracts all Honcho-specific code from run_agent.py, model_tools.py, toolsets.py, and gateway/run.py. Honcho is now exclusively available as a memory provider plugin (plugins/honcho-memory/). Removed from run_agent.py (-457 lines): - Honcho init block (session manager creation, activation, config) - 8 Honcho methods: _honcho_should_activate, _strip_honcho_tools, _activate_honcho, _register_honcho_exit_hook, _queue_honcho_prefetch, _honcho_prefetch, _honcho_save_user_observation, _honcho_sync - _inject_honcho_turn_context module-level function - Honcho system prompt block (tool descriptions, CLI commands) - Honcho context injection in api_messages building - Honcho params from __init__ (honcho_session_key, honcho_manager, honcho_config) - HONCHO_TOOL_NAMES constant - All honcho-specific tool dispatch forwarding Removed from other files: - model_tools.py: honcho_tools import, honcho params from handle_function_call - toolsets.py: honcho toolset definition, honcho tools from core tools list - gateway/run.py: honcho params from AIAgent constructor calls Removed tests (-339 lines): - 9 Honcho-specific test methods from test_run_agent.py - TestHonchoAtexitFlush class from test_exit_cleanup_interrupt.py Restored two regex constants (_SURROGATE_RE, _BUDGET_WARNING_RE) that were accidentally removed during the honcho function extraction. The honcho_integration/ package is kept intact — the plugin delegates to it. tools/honcho_tools.py registry entries are now dead code (import commented out in model_tools.py) but the file is preserved for reference. Full suite: 7207 passed, 4 pre-existing failures. Zero regressions. * refactor(memory): restructure plugins, add CLI, clean gateway, migration notice Plugin restructure: - Move all memory plugins from plugins/<name>-memory/ to plugins/memory/<name>/ (byterover, hindsight, holographic, honcho, mem0, openviking, retaindb) - New plugins/memory/__init__.py discovery module that scans the directory directly, loading providers by name without the general plugin system - run_agent.py uses load_memory_provider() instead of get_plugin_memory_providers() CLI wiring: - hermes memory setup — interactive curses picker + config wizard - hermes memory status — show active provider, config, availability - hermes memory off — disable external provider (built-in only) - hermes honcho — now shows migration notice pointing to hermes memory setup Gateway cleanup: - Remove _get_or_create_gateway_honcho (already removed in prev commit) - Remove _shutdown_gateway_honcho and _shutdown_all_gateway_honcho methods - Remove all calls to shutdown methods (4 call sites) - Remove _honcho_managers/_honcho_configs dict references Dead code removal: - Delete tools/honcho_tools.py (279 lines, import was already commented out) - Delete tests/gateway/test_honcho_lifecycle.py (131 lines, tested removed methods) - Remove if False placeholder from run_agent.py Migration: - Honcho migration notice on startup: detects existing honcho.json or ~/.honcho/config.json, prints guidance to run hermes memory setup. Only fires when memory.provider is not set and not in quiet mode. Full suite: 7203 passed, 4 pre-existing failures. Zero regressions. * feat(memory): standardize plugin config + add per-plugin documentation Config architecture: - Add save_config(values, hermes_home) to MemoryProvider ABC - Honcho: writes to $HERMES_HOME/honcho.json (SDK native) - Mem0: writes to $HERMES_HOME/mem0.json - Hindsight: writes to $HERMES_HOME/hindsight/config.json - Holographic: writes to config.yaml under plugins.hermes-memory-store - OpenViking/RetainDB/ByteRover: env-var only (default no-op) Setup wizard (hermes memory setup): - Now calls provider.save_config() for non-secret config - Secrets still go to .env via env vars - Only memory.provider activation key goes to config.yaml Documentation: - README.md for each of the 7 providers in plugins/memory/<name>/ - Requirements, setup (wizard + manual), config reference, tools table - Consistent format across all providers The contract for new memory plugins: - get_config_schema() declares all fields (REQUIRED) - save_config() writes native config (REQUIRED if not env-var-only) - Secrets use env_var field in schema, written to .env by wizard - README.md in the plugin directory * docs: add memory providers user guide + developer guide New pages: - user-guide/features/memory-providers.md — comprehensive guide covering all 7 shipped providers (Honcho, OpenViking, Mem0, Hindsight, Holographic, RetainDB, ByteRover). Each with setup, config, tools, cost, and unique features. Includes comparison table and profile isolation notes. - developer-guide/memory-provider-plugin.md — how to build a new memory provider plugin. Covers ABC, required methods, config schema, save_config, threading contract, profile isolation, testing. Updated pages: - user-guide/features/memory.md — replaced Honcho section with link to new Memory Providers page - user-guide/features/honcho.md — replaced with migration redirect to the new Memory Providers page - sidebars.ts — added both new pages to navigation * fix(memory): auto-migrate Honcho users to memory provider plugin When honcho.json or ~/.honcho/config.json exists but memory.provider is not set, automatically set memory.provider: honcho in config.yaml and activate the plugin. The plugin reads the same config files, so all data and credentials are preserved. Zero user action needed. Persists the migration to config.yaml so it only fires once. Prints a one-line confirmation in non-quiet mode. * fix(memory): only auto-migrate Honcho when enabled + credentialed Check HonchoClientConfig.enabled AND (api_key OR base_url) before auto-migrating — not just file existence. Prevents false activation for users who disabled Honcho, stopped using it (config lingers), or have ~/.honcho/ from a different tool. * feat(memory): auto-install pip dependencies during hermes memory setup Reads pip_dependencies from plugin.yaml, checks which are missing, installs them via pip before config walkthrough. Also shows install guidance for external_dependencies (e.g. brv CLI for ByteRover). Updated all 7 plugin.yaml files with pip_dependencies: - honcho: honcho-ai - mem0: mem0ai - openviking: httpx - hindsight: hindsight-client - holographic: (none) - retaindb: requests - byterover: (external_dependencies for brv CLI) * fix: remove remaining Honcho crash risks from cli.py and gateway cli.py: removed Honcho session re-mapping block (would crash importing deleted tools/honcho_tools.py), Honcho flush on compress, Honcho session display on startup, Honcho shutdown on exit, honcho_session_key AIAgent param. gateway/run.py: removed honcho_session_key params from helper methods, sync_honcho param, _honcho.shutdown() block. tests: fixed test_cron_session_with_honcho_key_skipped (was passing removed honcho_key param to _flush_memories_for_session). * fix: include plugins/ in pyproject.toml package list Without this, plugins/memory/ wouldn't be included in non-editable installs. Hermes always runs from the repo checkout so this is belt- and-suspenders, but prevents breakage if the install method changes. * fix(memory): correct pip-to-import name mapping for dep checks The heuristic dep.replace('-', '_') fails for packages where the pip name differs from the import name: honcho-ai→honcho, mem0ai→mem0, hindsight-client→hindsight_client. Added explicit mapping table so hermes memory setup doesn't try to reinstall already-installed packages. * chore: remove dead code from old plugin memory registration path - hermes_cli/plugins.py: removed register_memory_provider(), _memory_providers list, get_plugin_memory_providers() — memory providers now use plugins/memory/ discovery, not the general plugin system - hermes_cli/main.py: stripped 74 lines of dead honcho argparse subparsers (setup, status, sessions, map, peer, mode, tokens, identity, migrate) — kept only the migration redirect - agent/memory_provider.py: updated docstring to reflect new registration path - tests: replaced TestPluginMemoryProviderRegistration with TestPluginMemoryDiscovery that tests the actual plugins/memory/ discovery system. Added 3 new tests (discover, load, nonexistent). * chore: delete dead honcho_integration/cli.py and its tests cli.py (794 lines) was the old 'hermes honcho' command handler — nobody calls it since cmd_honcho was replaced with a migration redirect. Deleted tests that imported from removed code: - tests/honcho_integration/test_cli.py (tested _resolve_api_key) - tests/honcho_integration/test_config_isolation.py (tested CLI config paths) - tests/tools/test_honcho_tools.py (tested the deleted tools/honcho_tools.py) Remaining honcho_integration/ files (actively used by the plugin): - client.py (445 lines) — config loading, SDK client creation - session.py (991 lines) — session management, queries, flush * refactor: move honcho_integration/ into the honcho plugin Moves client.py (445 lines) and session.py (991 lines) from the top-level honcho_integration/ package into plugins/memory/honcho/. No Honcho code remains in the main codebase. - plugins/memory/honcho/client.py — config loading, SDK client creation - plugins/memory/honcho/session.py — session management, queries, flush - Updated all imports: run_agent.py (auto-migration), hermes_cli/doctor.py, plugin __init__.py, session.py cross-import, all tests - Removed honcho_integration/ package and pyproject.toml entry - Renamed tests/honcho_integration/ → tests/honcho_plugin/ * docs: update architecture + gateway-internals for memory provider system - architecture.md: replaced honcho_integration/ with plugins/memory/ - gateway-internals.md: replaced Honcho-specific session routing and flush lifecycle docs with generic memory provider interface docs * fix: update stale mock path for resolve_active_host after honcho plugin migration * fix(memory): address review feedback — P0 lifecycle, ABC contract, honcho CLI restore Review feedback from Honcho devs (erosika): P0 — Provider lifecycle: - Remove on_session_end() + shutdown_all() from run_conversation() tail (was killing providers after every turn in multi-turn sessions) - Add shutdown_memory_provider() method on AIAgent for callers - Wire shutdown into CLI atexit, reset_conversation, gateway stop/expiry Bug fixes: - Remove sync_honcho=False kwarg from /btw callsites (TypeError crash) - Fix doctor.py references to dead 'hermes honcho setup' command - Cache prefetch_all() before tool loop (was re-calling every iteration) ABC contract hardening (all backwards-compatible): - Add session_id kwarg to prefetch/sync_turn/queue_prefetch - Make on_pre_compress() return str (provider insights in compression) - Add *kwargs to on_turn_start() for runtime context - Add on_delegation() hook for parent-side subagent observation - Document agent_context/agent_identity/agent_workspace kwargs on initialize() (prevents cron corruption, enables profile scoping) - Fix docstring: single external provider, not multiple Honcho CLI restoration: - Add plugins/memory/honcho/cli.py (from main's honcho_integration/cli.py with imports adapted to plugin path) - Restore full hermes honcho command with all subcommands (status, peer, mode, tokens, identity, enable/disable, sync, peers, --target-profile) - Restore auto-clone on profile creation + sync on hermes update - hermes honcho setup now redirects to hermes memory setup fix(memory): wire on_delegation, skip_memory for cron/flush, fix ByteRover return type - Wire on_delegation() in delegate_tool.py — parent's memory provider is notified with task+result after each subagent completes - Add skip_memory=True to cron scheduler (prevents cron system prompts from corrupting user representations — closes #4052) - Add skip_memory=True to gateway flush agent (throwaway agent shouldn't activate memory provider) - Fix ByteRover on_pre_compress() return type: None -> str * fix(honcho): port profile isolation fixes from PR #4632 Ports 5 bug fixes found during profile testing (erosika's PR #4632): 1. 3-tier config resolution — resolve_config_path() now checks $HERMES_HOME/honcho.json → ~/.hermes/honcho.json → ~/.honcho/config.json (non-default profiles couldn't find shared host blocks) 2. Thread host=_host_key() through from_global_config() in cmd_setup, cmd_status, cmd_identity (--target-profile was being ignored) 3. Use bare profile name as aiPeer (not host key with dots) — Honcho's peer ID pattern is ^[a-zA-Z0-9_-]+$, dots are invalid 4. Wrap add_peers() in try/except — was fatal on new AI peers, killed all message uploads for the session 5. Gate Honcho clone behind --clone/--clone-all on profile create (bare create should be blank-slate) Also: sanitize assistant_peer_id via _sanitize_id() * fix(tests): add module cleanup fixture to test_cli_provider_resolution test_cli_provider_resolution._import_cli() wipes tools.*, cli, and run_agent from sys.modules to force fresh imports, but had no cleanup. This poisoned all subsequent tests on the same xdist worker — mocks targeting tools.file_tools, tools.send_message_tool, etc. patched the NEW module object while already-imported functions still referenced the OLD one. Caused ~25 cascade failures: send_message KeyError, process_registry FileNotFoundError, file_read_guards timeouts, read_loop_detection file-not-found, mcp_oauth None port, and provider_parity/codex_execution stale tool lists. Fix: autouse fixture saves all affected modules before each test and restores them after, matching the pattern in test_managed_browserbase_and_modal.py.	2026-04-02 15:33:51 -07:00
Teknium	acea9ee20b	fix(tests): fix 11 real test failures + major cascade poisoner (#4570 ) Three root causes addressed: 1. AIAgent no longer defaults base_url to OpenRouter (9 tests) Tests that assert OpenRouter-specific behavior (prompt caching, reasoning extra_body, provider preferences) need explicit base_url and model set on the agent. Updated test_run_agent.py and test_provider_parity.py. 2. Credential pool auto-seeding from host env (2 tests) test_auxiliary_client.py tests for Anthropic OAuth and custom endpoint fallback were not mocking _select_pool_entry, so the host's credential pool interfered. Added pool + codex mocks. 3. sys.modules corruption cascade (major - ~250 tests) test_managed_modal_environment.py replaced sys.modules entries (tools, hermes_cli, agent packages) with SimpleNamespace stubs but had NO cleanup fixture. Every subsequent test in the process saw corrupted imports: 'cannot import get_config_path from <unknown module name>' and 'module tools has no attribute environments'. Added _restore_tool_and_agent_modules autouse fixture matching the pattern in test_managed_browserbase_and_modal.py. This was also the root cause of CI failures (104 failed on main).	2026-04-02 08:43:06 -07:00
Ben Barclay	a2e56d044b	Merge branch 'main' into rewbs/tool-use-charge-to-subscription	2026-04-02 11:00:35 +11:00
Teknium	1515e8c8f2	fix: rewrite test mock secrets and add redaction fixture The original test file had mock secrets corrupted by secret-redaction tooling before commit — the test values (sk-ant...l012) didn't actually trigger the PREFIX_RE regex, so 4 of 10 tests were asserting against values that never appeared in the input. - Replace truncated mock values with proper fake keys built via string concatenation (avoids tool redaction during file writes) - Add _ensure_redaction_enabled autouse fixture to patch the module-level _REDACT_ENABLED constant, matching the pattern from test_redact.py	2026-04-01 12:03:56 -07:00
0xbyt4	712aa44325	security: block secret exfiltration via browser URLs and auxiliary LLM calls Three exfiltration vectors closed: 1. Browser URL exfil — agent could embed secrets in URL params and navigate to attacker-controlled server. Now scans URLs for known API key patterns before navigating (browser_navigate, web_extract). 2. Browser snapshot leak — page displaying env vars or API keys would send secrets to auxiliary LLM via _extract_relevant_content before run_agent.py's redaction layer sees the result. Now redacts snapshot text before the auxiliary call. 3. Camofox annotation leak — accessibility tree text sent to vision LLM could contain secrets visible on screen. Now redacts annotation context before the vision call. 10 new tests covering URL blocking, snapshot redaction, and annotation redaction for both browser and camofox backends.	2026-04-01 12:03:56 -07:00
Teknium	e0abf2416d	fix: restore _config_version to 11 (reverted by stale-branch merge in #4419 ) (#4440 ) PR #4419 was based on pre-credential-pools main where _config_version was 10. The squash merge downgraded it from 11 (set by #2647) back to 10. Also fixes the test assertion.	2026-04-01 04:34:04 -07:00
Teknium	f6ada27d1c	feat(skills): size limits for agent writes + fuzzy matching for patch (#4414 ) * feat(skills): add content size limits for agent-created skills Agent writes via skill_manage (create/edit/patch/write_file) are now constrained to prevent unbounded growth: - SKILL.md and supporting files: 100,000 character limit - Supporting files: additional 1 MiB byte limit - Patches on oversized hand-placed skills that reduce the size are allowed (shrink path), but patches that grow beyond the limit are rejected Hand-placed skills and hub-installed skills have NO hard limit — they load and function normally regardless of size. Hub installs get a warning in the log if SKILL.md exceeds 100k chars. This mirrors the memory system's char_limit pattern. Without this, the agent auto-grows skills indefinitely through iterative patches (hermes-agent-dev reached 197k chars / 72k tokens — 40x larger than the largest skill in the entire skills.sh ecosystem). Constants: MAX_SKILL_CONTENT_CHARS (100k), MAX_SKILL_FILE_BYTES (1MiB) Tests: 14 new tests covering all write paths and edge cases * feat(skills): add fuzzy matching to skill patch _patch_skill now uses the same 8-strategy fuzzy matching engine (tools/fuzzy_match.py) as the file patch tool. Handles whitespace normalization, indentation differences, escape sequences, and block-anchor matching. Eliminates exact-match failures when agents patch skills with minor formatting mismatches.	2026-04-01 04:19:19 -07:00
Teknium	70744add15	feat(browser): add persistent Camofox sessions and VNC URL discovery (salvage #4400 ) (#4419 ) Adds two Camofox features: 1. Persistent browser sessions: new `browser.camofox.managed_persistence` config option. When enabled, Hermes sends a deterministic profile-scoped userId to Camofox so the server maps it to a persistent browser profile directory. Cookies, logins, and browser state survive across restarts. Default remains ephemeral (random userId per session). 2. VNC URL discovery: Camofox /health endpoint returns vncPort when running in headed mode. Hermes constructs the VNC URL and includes it in navigate responses so the agent can share it with users. Also fixes camofox_vision bug where call_llm response object was passed directly to json.dumps instead of extracting .choices[0].message.content. Changes from original PR: - Removed browser_evaluate tool (separate feature, needs own PR) - Removed snapshot truncation limit change (unrelated) - Config.yaml only for managed_persistence (no env var, no version bump) - Rewrote tests to use config mock instead of env var - Reverted package-lock.json churn Co-authored-by: analista <psikonetik@gmail.com.com>	2026-04-01 04:18:50 -07:00
Teknium	ef2ae3e48f	fix(file_tools): refresh staleness timestamp after writes (#4390 ) After a successful write_file or patch, update the stored read timestamp to match the file's new modification time. Without this, consecutive edits by the same task (read → write → write) would false-warn on the second write because the stored timestamp still reflected the original read, not the first write. Also renames the internal tracker key from 'file_mtimes' to 'read_timestamps' for clarity.	2026-04-01 00:50:08 -07:00
Teknium	f04986029c	feat(file_tools): detect stale files on write and patch (#4345 ) Track file mtime when read_file is called. When write_file or patch subsequently targets the same file, compare the current mtime against the recorded one. If they differ (external edit, concurrent agent, user change), include a _warning in the result advising the agent to re-read. The write still proceeds — this is a soft signal, not a hard block. Key design points: - Per-task isolation: task A's reads don't affect task B's writes. - Files never read produce no warning (not enforcing read-before-write). - mtime naturally updates after the agent's own writes, so the warning only fires on external changes, not the agent's own edits. - V4A multi-file patches check all target paths. Tests: 10 new tests covering write staleness, patch staleness, never-read files, cross-task isolation, and the helper function.	2026-03-31 14:49:00 -07:00
Teknium	e3f8347be3	feat(file_tools): harden read_file with size guard, dedup, and device blocking (#4315 ) * feat(file_tools): harden read_file with size guard, dedup, and device blocking Three improvements to read_file_tool to reduce wasted context tokens and prevent process hangs: 1. Character-count guard: reads that produce more than 100K characters (≈25-35K tokens across tokenisers) are rejected with an error that tells the model to use offset+limit for a smaller range. The effective cap is min(file_size, 100K) so small files that happen to have long lines aren't over-penalised. Large truncated files also get a hint nudging toward targeted reads. 2. File-read deduplication: when the same (path, offset, limit) is read a second time and the file hasn't been modified (mtime unchanged), return a lightweight stub instead of re-sending the full content. Writes and patches naturally change mtime, so post-edit reads always return fresh content. The dedup cache is cleared on context compression — after compression the original read content is summarised away, so the model needs the full content again. 3. Device path blocking: paths like /dev/zero, /dev/random, /dev/stdin etc. are rejected before any I/O to prevent process hangs from infinite-output or blocking-input devices. Tests: 17 new tests covering all three features plus the dedup-reset- on-compression integration. All 52 file-read tests pass (35 existing + 17 new). Full tool suite (2124 tests) passes with 0 failures. * feat: make file_read_max_chars configurable, add docs Add file_read_max_chars to DEFAULT_CONFIG (default 100K). read_file_tool reads this on first call and caches for the process lifetime. Users on large-context models can raise it; users on small local models can lower it. Also adds a 'File Read Safety' section to the configuration docs explaining the char limit, dedup behavior, and example values.	2026-03-31 12:53:19 -07:00
Teknium	7f78deebe7	fix: apply same path traversal checks to config-based credential files _load_config_files() had the same hermes_home / item pattern without containment checks. While config.yaml is user-controlled (lower threat than skill frontmatter), defense in depth prevents exploitation via config injection or copy-paste mistakes.	2026-03-31 12:16:37 -07:00
maymuneth	a97641b9f2	fix(security): reject path traversal in credential file registration	2026-03-31 12:16:37 -07:00
0xbyt4	08171c1c31	fix: allow voice mode in WSL when PulseAudio bridge is configured WSL detection was treated as a hard fail, blocking voice mode even when audio worked via PulseAudio bridge. Now PULSE_SERVER env var presence makes WSL a soft notice instead of a blocking warning. Device query failures in WSL with PULSE_SERVER are also treated as non-blocking.	2026-03-31 12:13:33 -07:00
Teknium	cca0996a28	fix(browser): skip SSRF check for local backends (Camofox, headless Chromium) (#4292 ) The SSRF protection added in #3041 blocks all private/internal addresses unconditionally in browser_navigate(). This prevents legitimate local use cases (localhost apps, LAN devices) when using Camofox or the built-in headless Chromium without a cloud provider. The check is only meaningful for cloud backends (Browserbase, BrowserUse) where the agent could reach internal resources on a remote machine. Local backends give the user full terminal and network access already — the SSRF check adds zero security value. Add _is_local_backend() helper that returns True when Camofox is active or no cloud provider is configured. Both the pre-navigation and post-redirect SSRF checks now skip when running locally. The browser.allow_private_urls config option remains available as an explicit opt-out for cloud mode.	2026-03-31 10:40:13 -07:00
Teknium	8d59881a62	feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647 ) * feat(auth): add same-provider credential pools and rotation UX Add same-provider credential pooling so Hermes can rotate across multiple credentials for a single provider, recover from exhausted credentials without jumping providers immediately, and configure that behavior directly in hermes setup. - agent/credential_pool.py: persisted per-provider credential pools - hermes auth add/list/remove/reset CLI commands - 429/402/401 recovery with pool rotation in run_agent.py - Setup wizard integration for pool strategy configuration - Auto-seeding from env vars and existing OAuth state Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com> Salvaged from PR #2647 * fix(tests): prevent pool auto-seeding from host env in credential pool tests Tests for non-pool Anthropic paths and auth remove were failing when host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials were present. The pool auto-seeding picked these up, causing unexpected pool entries in tests. - Mock _select_pool_entry in auxiliary_client OAuth flag tests - Clear Anthropic env vars and mock _seed_from_singletons in auth remove test * feat(auth): add thread safety, least_used strategy, and request counting - Add threading.Lock to CredentialPool for gateway thread safety (concurrent requests from multiple gateway sessions could race on pool state mutations without this) - Add 'least_used' rotation strategy that selects the credential with the lowest request_count, distributing load more evenly - Add request_count field to PooledCredential for usage tracking - Add mark_used() method to increment per-credential request counts - Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current() with lock acquisition - Add tests: least_used selection, mark_used counting, concurrent thread safety (4 threads × 20 selects with no corruption) * feat(auth): add interactive mode for bare 'hermes auth' command When 'hermes auth' is called without a subcommand, it now launches an interactive wizard that: 1. Shows full credential pool status across all providers 2. Offers a menu: add, remove, reset cooldowns, set strategy 3. For OAuth-capable providers (anthropic, nous, openai-codex), the add flow explicitly asks 'API key or OAuth login?' — making it clear that both auth types are supported for the same provider 4. Strategy picker shows all 4 options (fill_first, round_robin, least_used, random) with the current selection marked 5. Remove flow shows entries with indices for easy selection The subcommand paths (hermes auth add/list/remove/reset) still work exactly as before for scripted/non-interactive use. * fix(tests): update runtime_provider tests for config.yaml source of truth (#4165) Tests were using OPENAI_BASE_URL env var which is no longer consulted after #4165. Updated to use model config (provider, base_url, api_key) which is the new single source of truth for custom endpoint URLs. * feat(auth): support custom endpoint credential pools keyed by provider name Custom OpenAI-compatible endpoints all share provider='custom', making the provider-keyed pool useless. Now pools for custom endpoints are keyed by 'custom:<normalized_name>' where the name comes from the custom_providers config list (auto-generated from URL hostname). - Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)' - load_pool('custom:name') seeds from custom_providers api_key AND model.api_key when base_url matches - hermes auth add/list now shows custom endpoints alongside registry providers - _resolve_openrouter_runtime and _resolve_named_custom_runtime check pool before falling back to single config key - 6 new tests covering custom pool keying, seeding, and listing * docs: add Excalidraw diagram of full credential pool flow Comprehensive architecture diagram showing: - Credential sources (env vars, auth.json OAuth, config.yaml, CLI) - Pool storage and auto-seeding - Runtime resolution paths (registry, custom, OpenRouter) - Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh) - CLI management commands and strategy configuration Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g * fix(tests): update setup wizard pool tests for unified select_provider_and_model flow The setup wizard now delegates to select_provider_and_model() instead of using its own prompt_choice-based provider picker. Tests needed: - Mock select_provider_and_model as no-op (provider pre-written to config) - Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it) - Pre-write model.provider to config so the pool step is reached * docs: add comprehensive credential pool documentation - New page: website/docs/user-guide/features/credential-pools.md Full guide covering quick start, CLI commands, rotation strategies, error recovery, custom endpoint pools, auto-discovery, thread safety, architecture, and storage format. - Updated fallback-providers.md to reference credential pools as the first layer of resilience (same-provider rotation before cross-provider) - Added hermes auth to CLI commands reference with usage examples - Added credential_pool_strategies to configuration guide * chore: remove excalidraw diagram from repo (external link only) * refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns - _load_config_safe(): replace 4 identical try/except/import blocks - _iter_custom_providers(): shared generator for custom provider iteration - PooledCredential.extra dict: collapse 11 round-trip-only fields (token_type, scope, client_id, portal_base_url, obtained_at, expires_in, agent_key_id, agent_key_expires_in, agent_key_reused, agent_key_obtained_at, tls) into a single extra dict with __getattr__ for backward-compatible access - _available_entries(): shared exhaustion-check between select and peek - Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical) - SimpleNamespace replaces class _Args boilerplate in auth_commands - _try_resolve_from_custom_pool(): shared pool-check in runtime_provider Net -17 lines. All 383 targeted tests pass. --------- Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-31 03:10:01 -07:00
Nils	50302ed70a	fix(tools): make browser SSRF check configurable via browser.allow_private_urls (#4198 ) * fix(tools): skip SSRF check in local browser mode The SSRF protection added in #3041 blocks all private/internal addresses unconditionally in browser_navigate(). This prevents legitimate local development use cases (localhost testing, LAN device access) when using the local Chromium backend. The SSRF check is only meaningful for cloud browsers (Browserbase, BrowserUse) where the agent could reach internal resources on a remote machine. In local mode, the user already has full terminal and network access, so the check adds no security value. This change makes the SSRF check conditional on _get_cloud_provider(), keeping full protection in cloud mode while allowing private addresses in local mode. * fix(tools): make SSRF check configurable via browser.allow_private_urls Replace unconditional SSRF check with a configurable setting. Default (False) keeps existing security behavior. Setting to True allows navigating to private/internal IPs for local dev and LAN use cases. --------- Co-authored-by: Nils (Norya) <nils@begou.dev>	2026-03-31 02:11:55 -07:00
Teknium	d30ea65c9b	fix: URL-based auth for third-party Anthropic endpoints + CI test fixes (#4148 ) * fix(tests): mock sys.stdin.isatty for cmd_model TTY guard * fix(tests): update camofox snapshot format + trajectory compressor mock path - test_browser_camofox: mock response now uses snapshot format (accessibility tree) - test_trajectory_compressor: mock _get_async_client instead of setting async_client directly * fix: URL-based auth detection for third-party Anthropic endpoints + test fixes Reverts the key-prefix approach from #4093 which broke JWT and managed key OAuth detection. Instead, detects third-party endpoints by URL: if base_url is set and isn't anthropic.com, it's a proxy (Azure AI Foundry, AWS Bedrock, etc.) that uses x-api-key regardless of key format. Auth decision chain is now: 1. _requires_bearer_auth(url) → MiniMax → Bearer 2. _is_third_party_anthropic_endpoint(url) → Azure/Bedrock → x-api-key 3. _is_oauth_token(key) → OAuth on direct Anthropic → Bearer 4. else → x-api-key Also includes test fixes from PR #4051 by @erosika: - Mock sys.stdin.isatty for cmd_model TTY guard - Update camofox snapshot format mock - Fix trajectory compressor async client mock path --------- Co-authored-by: Erosika <eri@plasticlabs.ai>	2026-03-30 20:36:56 -07:00
Robin Fernandes	1b7473e702	Fixes and refactors enabled by recent updates to main.	2026-03-31 09:29:59 +09:00
Robin Fernandes	1126284c97	Merge branch 'main' into rewbs/tool-use-charge-to-subscription	2026-03-31 09:29:43 +09:00
Robin Fernandes	6e4598ce1e	Merge branch 'main' into rewbs/tool-use-charge-to-subscription	2026-03-31 08:48:54 +09:00
Teknium	950f69475f	feat(browser): add Camofox local anti-detection browser backend (#4008 ) Camofox-browser is a self-hosted Node.js server wrapping Camoufox (Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set, all 11 browser tools route through the Camofox REST API instead of the agent-browser CLI. Maps 1:1 to the existing browser tool interface: - Navigate, snapshot, click, type, scroll, back, press, close - Get images, vision (screenshot + LLM analysis) - Console (returns empty with note — camofox limitation) Setup: npm start in camofox-browser dir, or docker run -p 9377:9377 Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env Advantages over Browserbase (cloud): - Free (no per-session API costs) - Local (zero network latency for browser ops) - Anti-detection at C++ level (bypasses Cloudflare/Google bot detection) - Works offline, Docker-ready Files: - tools/browser_camofox.py: Full REST backend (~400 lines) - tools/browser_tool.py: Routing at each tool function - hermes_cli/config.py: CAMOFOX_URL env var entry - tests/tools/test_browser_camofox.py: 20 tests	2026-03-30 13:18:42 -07:00
Teknium	37825189dd	fix(skills): validate hub bundle paths before install (#3986 ) Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>	2026-03-30 08:37:19 -07:00
Teknium	1e896b0251	fix: resolve 7 failing CI tests (#3936 ) 1. matrix voice: _on_room_message_media unconditionally overwrote media_urls with the image cache path (always None for non-images), wiping the locally-cached voice path. Now only overrides when cached_path is truthy. 2. cli_tools_command: /tools disable no longer prompts for confirmation (input() removed in earlier commit to fix TUI hang), but tests still expected the old Y/N prompt flow. Updated tests to match current behavior (direct apply + session reset). 3. slack app_mention: connect() was refactored for multi-workspace (creates AsyncWebClient per token), but test only mocked the old self._app.client path. Added AsyncWebClient and acquire_scoped_lock mocks. 4. website_policy: module-level _cached_policy from earlier tests caused fast-path return of None. Added invalidate_cache() before assertion. 5. codex 401 refresh: already passing on current main (fixed by intervening commit).	2026-03-30 08:10:14 -07:00
Teknium	5148682b43	feat: mount skills directory into all remote backends with live sync (#3890 ) Skills with scripts/, templates/, and references/ subdirectories need those files available inside sandboxed execution environments. Previously the skills directory was missing entirely from remote backends. Live sync — files stay current as credentials refresh and skills update: - Docker/Singularity: bind mounts are inherently live (host changes visible immediately) - Modal: _sync_files() runs before each command with mtime+size caching, pushing only changed credential and skill files (~13μs no-op overhead) - SSH: rsync --safe-links before each command (naturally incremental) - Daytona: _upload_if_changed() with mtime+size caching before each command Security — symlink filtering: - Docker/Singularity: sanitized temp copy when symlinks detected - Modal/Daytona: iter_skills_files() skips symlinks - SSH: rsync --safe-links skips symlinks pointing outside source tree - Temp dir cleanup via atexit + reuse across calls Non-root user support: - SSH: detects remote home via echo $HOME, syncs to $HOME/.hermes/ - Daytona: detects sandbox home before sync, uploads to $HOME/.hermes/ - Docker/Modal/Singularity: run as root, /root/.hermes/ is correct Also: - credential_files.py: fix name/path key fallback in required_credential_files - Singularity, SSH, Daytona: gained credential file support - 14 tests covering symlink filtering, name/path fallback, iter_skills_files	2026-03-30 02:45:41 -07:00
Teknium	b4ceb541a7	fix(terminal): preserve partial output when command times out (#3868 ) When a command timed out, all captured output was discarded — the agent only saw 'Command timed out after Xs' with zero context. Now returns the buffered output followed by a timeout marker, matching the existing interrupt path behavior. Salvaged from PR #3286 by @binhnt92. Co-authored-by: nguyen binh <binhnt92@users.noreply.github.com>	2026-03-29 21:51:44 -07:00
Robin Fernandes	1cbb1b99cc	Gate tool-gateway behind an env var, so it's not in users' faces until we're ready. Even if users enable it, it'll be blocked server-side for now, until we unlock for non-admin users on tool-gateway.	2026-03-30 13:28:10 +09:00
Teknium	2d607d36f6	fix(security): catch sensitive path writes in approval checks (#3859 ) Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>	2026-03-29 20:57:57 -07:00

... 3 4 5 6 7 ...

766 Commits