hermes-agent-features

Author	SHA1	Message	Date
Grey0202	a219a0a4df	fix(anthropic): strip top-level oneOf/allOf/anyOf from tool input_schema Extends the existing _normalize_tool_input_schema to also drop top-level union keywords that Anthropic's tool schema validator rejects with HTTP 400. Several upstream and plugin tools ship schemas with a top-level oneOf/allOf/anyOf (common for Pydantic discriminated unions). The existing strip_nullable_unions pass only handles anyOf-with-null patterns; a non-null top-level union keyword sails through and hits the API. Salvage of #16471: approach folded into the existing normalize helper rather than introducing a parallel _sanitize_input_schema function, to avoid two schema-munging code paths running against the same input. Co-authored-by: Grey0202 <grey0202@users.noreply.github.com>	2026-05-04 03:17:35 -07:00
charliekerfoot	412f2389f1	fix(google_oauth): close TOCTOU window when saving credentials	2026-05-04 03:16:19 -07:00
pander	6b88f46c54	fix(compressor): trigger fallback on timeout errors alongside model-not-found Previously only HTTP 404/503 and specific error strings triggered a fallback to the main model when the summary model was unavailable. Timeout errors (HTTP 408/429/502/504, or error strings containing 'timeout') entered a short cooldown instead, leaving context to grow unbounded for the rest of the session. Add _is_timeout detection alongside _is_model_not_found so that transient timeout errors on the summary model also trigger immediate fallback to the main model, preventing compression failure from cascading. Closes #15935	2026-05-04 03:10:53 -07:00
flobo3	ba8337464d	fix(gemini): extract usageMetadata from streaming chunks for token tracking	2026-05-04 02:33:30 -07:00
B1GGersnow	dc63ad0ad2	fix(anthropic): cap max_tokens at 65536 for Qwen models via DashScope DashScope's Anthropic-compatible endpoint enforces max_tokens ∈ [1, 65536]. Adding "qwen3" to _ANTHROPIC_OUTPUT_LIMITS prevents 400 errors that were misclassified as context overflow, triggering premature compression. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-04 02:31:05 -07:00
nftpoetrist	e2211b2683	fix(compressor): reset _summary_failure_cooldown_until in on_session_reset() on_session_reset() cleared _previous_summary, _last_summary_error, and _ineffective_compression_count but left _summary_failure_cooldown_until intact. When a transient summary error sets a 60 s cooldown (or 600 s for a missing-provider RuntimeError) and the user immediately runs /reset or /new, the cooldown carries into the new session. If the new session reaches the compression threshold before the cooldown expires, _generate_summary() returns None early, middle turns are silently dropped without a summary, and the agent continues with no indication that compaction was skipped. Fix: set _summary_failure_cooldown_until = 0.0 in on_session_reset(), matching the value assigned in __init__ and symmetric with the other per-session fields already cleared there. Fixes #15547	2026-05-04 02:30:31 -07:00
daixin1204	744079ffe6	fix(curator): prevent false-positive consolidation from substring matching _classify_removed_skills used naive 'in' substring matching to detect whether a removed skill's name appeared in skill_manage arguments. Short/common skill names (api, git, test, foo, etc.) matched incorrectly when they appeared as substrings of longer words in file paths (references/api-design.md) or content (latest, testing). Replace with field-aware matching: - file_path: needle must match a complete filename stem or directory name, with -/_ normalised for variant tolerance - content fields: word-boundary regex (\b) prevents embedding in longer words Also add 3 regression tests covering the false-positive scenarios.	2026-05-04 01:21:23 -07:00
nftpoetrist	808fee151d	fix(auxiliary): propagate explicit_api_key to _try_anthropic() _try_anthropic() lacked the explicit_api_key parameter added to _try_openrouter() in #18768. When resolve_provider_client() is called with provider="anthropic" and an explicit key (e.g. from a fallback_model entry with api_key set), the key was silently ignored — _try_anthropic() always fell back to resolve_anthropic_token(), so the fallback returned None,None for users without a default Anthropic credential configured. Fix: add explicit_api_key: str = None to _try_anthropic() and use explicit_api_key or <pool/env fallback> in both the pool-present and no-pool paths. Pass explicit_api_key=explicit_api_key at the call site in resolve_provider_client(). Symmetric with the _try_openrouter() fix. No behavior change when explicit_api_key is None.	2026-05-03 17:00:55 -07:00
Teknium	b58db237e4	fix(kanban): drop worker identity claim from KANBAN_GUIDANCE (#19427) KANBAN_GUIDANCE layer 3 of the system prompt started with 'You are a Kanban worker', overriding the profile's SOUL.md identity at layer 1. Profiles with strict role boundaries (e.g. a reviewer profile that never writes code) still executed implementation tasks because the kanban identity claim diluted SOUL's. Drop the identity line. Layer 3 now describes the task-execution protocol only; SOUL.md remains the sole identity slot. Fixes #19351	2026-05-03 16:59:00 -07:00
0xKingBack	3c42024539	fix(curator): pass auxiliary curator api_key/base_url into runtime resolution Curator review fork now forwards per-slot credentials from auxiliary.curator and legacy curator.auxiliary to resolve_runtime_provider, matching the canonical aux task schema. Add regression tests for binding and main fallback.	2026-05-03 16:55:16 -07:00
sprmn24	408dd8aa28	fix(compressor): skip non-string tool content in dedup pass to prevent AttributeError	2026-05-03 15:28:30 -07:00
Zyproth	dfdd7b6e6f	fix(codex-transport): preserve request override headers for xai responses	2026-05-03 15:25:45 -07:00
kshitij	457c7b76cd	feat(openrouter): add response caching support (#19132) Enable OpenRouter's response caching feature (beta) via X-OpenRouter-Cache headers. When enabled, identical API requests return cached responses for free (zero billing), reducing both latency and cost. Configuration via config.yaml: openrouter: response_cache: true # default: on response_cache_ttl: 300 # 1-86400 seconds Changes: - Add openrouter config section to DEFAULT_CONFIG (response_cache + TTL) - Add build_or_headers() in auxiliary_client.py that builds attribution headers plus optional cache headers based on config - Replace inline _OR_HEADERS dicts with build_or_headers() at all 5 sites: run_agent.py __init__, _apply_client_headers_for_base_url(), and auxiliary_client.py _try_openrouter() + _to_async_client() - Add _check_openrouter_cache_status() method to AIAgent that reads X-OpenRouter-Cache-Status from streaming response headers and logs HIT/MISS status - Document in cli-config.yaml.example - Add 28 tests (22 unit + 6 integration) Ref: https://openrouter.ai/docs/guides/features/response-caching	2026-05-03 01:54:24 -07:00
liuhao1024	af98122793	fix(auxiliary): propagate explicit_api_key to _try_openrouter() When resolve_provider_client() passes explicit_api_key for OpenRouter auxiliary tasks, _try_openrouter() now accepts and honors this parameter instead of silently ignoring it and falling back to OPENROUTER_API_KEY env var. Root cause: _try_openrouter() had no explicit_api_key parameter, so even when callers wanted to pass a runtime credential pool key, it could not be used. Fix: - Add explicit_api_key: str = None parameter to _try_openrouter() - Prioritize explicit_api_key over pool key and env var - Update resolve_provider_client() call site to pass explicit_api_key Regression coverage: - Test that explicit_api_key is passed to OpenAI client when provided - Test that fallback to OPENROUTER_API_KEY still works when explicit_api_key is None Closes #18338	2026-05-02 02:27:49 -07:00
Frank Song	2ef1ad280b	fix: prefer ~/.hermes/.env over os.environ when seeding credential pool When _seed_from_env() reads API keys to populate the credential pool, it should treat ~/.hermes/.env as the authoritative source — not os.environ. Stale env vars inherited from parent shell processes (Codex CLI, test scripts, etc.) can shadow deliberate changes to the .env file, causing auth.json to cache an outdated key that leads to silent 401 errors. This is especially visible with OpenRouter: if a parent process exported OPENROUTER_API_KEY=test-key-fresh and the user later updates .env with a valid key, restarting Hermes still picks up the stale os.environ value, writes it back to auth.json, and all API calls fail with 401. Fixes #18254	2026-05-02 02:00:32 -07:00
liuhao1024	9bf260472b	fix(tools): deduplicate tool names at API boundary for Vertex/Azure/Bedrock Providers like Google Vertex, Azure, and Amazon Bedrock reject API requests with duplicate tool names (HTTP 400: 'Tool names must be unique'). The upstream injection paths in run_agent.py already dedup after PR #17335, but two API-boundary functions pass tools through without checking: - agent/auxiliary_client.py: _build_call_kwargs() (all non-Anthropic providers in chat_completions mode) - agent/anthropic_adapter.py: convert_tools_to_anthropic() (Anthropic Messages API path) Add defensive dedup guards at both sites. Duplicates are dropped with a warning log, converting a hard 400 failure into a recoverable condition. This is intentionally conservative — the root-cause dedup in run_agent.py is the primary defense; these guards add resilience against future injection-path regressions. Includes 8 new tests covering unique passthrough, duplicate removal, empty/None edge cases. Closes #18478	2026-05-02 01:51:51 -07:00
Teknium	c73594fe41	fix(skills): rescan skill_commands cache when platform scope changes (#18739) The process-global `_skill_commands` dict in agent/skill_commands.py was seeded by whichever platform scanned first, and `get_skill_commands()` only rescanned when the cache was empty. In a long-lived gateway process serving multiple platforms (Telegram + Discord + Slack), the first platform's `skills.platform_disabled` view was silently inherited by the others, so a skill disabled for Telegram would also disappear from Discord's slash menu, and vice versa. Track the platform scope the cache was populated for (`_skill_commands_platform`) and rescan in `get_skill_commands()` when the currently-active platform no longer matches. Platform resolution uses the same precedence as `_is_skill_disabled`: `HERMES_PLATFORM` env var then `HERMES_SESSION_PLATFORM` from the gateway session context. Fixes #14536 Salvages #14570 by LeonSGP43. Co-authored-by: LeonSGP <leon@sgp43.com>	2026-05-02 01:36:53 -07:00
Teknium	97acd66b4c	fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671 ) (#18731 ) * fix(curator): authoritative absorbed_into declarations on skill delete Closes #18671. The classification pipeline that feeds cron-ref rewriting used to infer consolidation vs pruning from two brittle signals: the curator model's post-hoc YAML summary block, and a substring heuristic scanning other tool calls for the removed skill's name. Both miss in real consolidations — the model forgets the YAML under reasoning pressure, and the heuristic misses when the umbrella's patch content describes the absorbed behavior abstractly instead of naming the old slug. When both miss, the skill falls through to 'no-evidence fallback' pruned, and #18253's cron rewriter drops the cron ref entirely instead of mapping it to the umbrella. Same observable symptom as pre-#18253: 'Skill(s) not found and skipped' at the next cron run. The fix makes the model declare intent at the moment of deletion. skill_manage(action='delete') now accepts absorbed_into: - absorbed_into='<umbrella>' -> consolidated, target must exist on disk - absorbed_into='' -> explicit prune, no forwarding target - missing -> legacy path, falls through to heuristic/YAML The curator reconciler reads these declarations off llm_meta.tool_calls BEFORE either the YAML block or the substring heuristic. Declaration wins. Fallback logic stays intact for backward compat with any caller (human or older curator conversation) that doesn't populate the arg. Changes - tools/skill_manager_tool.py: add absorbed_into param to skill_manage + _delete_skill. Validate target exists when non-empty. Reject absorbed_into=<self>. Wire through dispatcher + registry + schema. - agent/curator.py: new _extract_absorbed_into_declarations() walks tool calls for skill_manage(delete) with the arg. _reconcile_classification accepts absorbed_declarations= and treats them as authoritative. 
Curator prompt updated to require the arg on every delete. - Tests: 7 new skill_manager tests covering the tool contract (valid target, empty string, nonexistent target, self-reference, whitespace, backward compat, dispatcher plumbing). 11 new curator tests covering the extractor + authoritative reconciler path + mixed-legacy-and-declared runs. Validation - 307/307 targeted tests pass (curator + cron + skill_manager suites). - E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing all 3. Model emits NO YAML block. Heuristic misses (patch prose doesn't name old slugs). Delete calls carry absorbed_into. Result: both PR skills correctly classified 'consolidated' + cron rewritten ['pr-review-format', 'pr-review-checklist', 'stale-junk'] -> ['hermes-agent-dev']; stale-junk pruned via absorbed_into=''. - E2E backward-compat: delete without absorbed_into, model emits YAML -> routed via existing 'model' source, cron still rewritten correctly. * feat(curator): capture + restore cron skill links across snapshot/rollback Before this, rolling back a curator run restored the skills tree but cron jobs still pointed at the umbrella skills the curator had rewritten them to. The user would see their old narrow skills back on disk but their cron jobs still configured with the merged umbrella, which is not actually 'back to how it was'. Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json alongside the skills tarball, as cron-jobs.json. The manifest gets a new 'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI confirm dialog) can surface what's in the snapshot. If jobs.json is missing/unreadable/malformed, snapshot proceeds without cron data; the skills backup is the core guarantee, cron is additive. Rollback side: after the skills extract succeeds, the new _restore_cron_skill_links() reconciles the backed-up jobs into the live jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and only on jobs matched by id.
Everything else about a cron job — schedule, last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live state the user or scheduler has modified since the snapshot; overwriting it would regress unrelated activity. Reconciliation rules: - Job in backup AND live, skills differ → skills restored. - Job in backup AND live, skills match → no-op. - Job in backup, NOT in live → skipped (user deleted it after snapshot; their choice is later than the snapshot). - Job in live, NOT in backup → untouched (user created it after snapshot). - Snapshot missing cron-jobs.json at all → rollback still succeeds, reports 'not captured' (older pre-feature snapshots keep working). Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the scheduler uses, so rollback doesn't race tick(). Also: - hermes_cli/curator.py: rollback confirm dialog now shows 'cron jobs: N (will be restored for skill-link fields only)' when the snapshot has cron data, or 'not in snapshot (<reason>)' otherwise. - rollback()'s message string includes a 'cron links: ...' clause summarizing the reconciliation outcome. Tests - 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json captured-as-raw, full rollback-restores-skills-and-cron, rollback touches only skill fields, rollback skips user-deleted jobs, rollback leaves user-created jobs untouched, rollback still works with pre-feature snapshot that has no cron-jobs.json, standalone unit test on _restore_cron_skill_links exercising the full report shape. Validation - 484/484 targeted tests pass (curator + cron + skill_manager suites). - E2E: real snapshot_skills, real cron rewrite, real rollback. Before: ['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage']. After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id, name, prompt) preserved across the round trip.	2026-05-02 01:29:57 -07:00
Teknium	77c0bc6b13	fix(curator): defer first run and add --dry-run preview (#18373) (#18389) * fix(curator): defer first run and add --dry-run preview (#18373) Curator was meant to run 7 days after install, not on the very first gateway tick. On a fresh install (no .curator_state), should_run_now() returned True immediately because last_run_at was None, so the gateway cron ticker fired Curator against a fresh skill library moments after 'hermes update'. Combined with the binary 'agent-created' provenance model (anything not bundled and not hub-installed), this consolidated hand-authored user workflow skills without consent. Changes: - should_run_now(): first observation seeds last_run_at='now' and returns False. The next real pass fires one full interval_hours later (7 days by default), matching the original design intent. - hermes curator run --dry-run: produces the same review report without applying automatic transitions OR permitting the LLM to call skill_manage / terminal mv. A DRY-RUN banner is prepended to the prompt and the caller skips apply_automatic_transitions. State is NOT advanced so a preview doesn't defer the next scheduled real pass. - hermes update: prints a one-liner on fresh installs pointing at --dry-run, pause, and the docs. Silent on steady state. - Docs: curator.md and cli-commands.md explain the deferred first-run behavior and warn that hand-written SKILL.md files share the 'agent-created' bucket, with guidance to pin or preview before the first pass. Tests: - test_first_run_defers replaces the old 'first run always eligible' assertion: same fixture, inverted expectation. - test_maybe_run_curator_defers_on_fresh_install covers the gateway tick path end-to-end. - Three new dry-run tests cover state-advance suppression, prompt banner injection, and apply_automatic_transitions skipping. Fixes #18373.
* feat(curator): pre-run backup + rollback (#18373) Every real curator pass now snapshots ~/.hermes/skills/ into ~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling apply_automatic_transitions or the LLM review. If a run consolidates or archives something the user didn't want touched, 'hermes curator rollback' restores the tree in one command. Dry-run is skipped: no mutation means no snapshot needed. Changes: - agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed by the skills hub). Extract refuses absolute paths and .. components, and uses tarfile's filter='data' on Python 3.12+. Rollback takes a pre-rollback safety snapshot FIRST, stages the current tree into .rollback-staging-<ts>/ so the extract lands in an empty dir, and cleans the staging dir on success. A failed extract restores the staged contents. - agent/curator.py: run_curator_review() calls curator_backup.snapshot_skills(reason='pre-curator-run') before apply_automatic_transitions. Best-effort: a failed snapshot logs at debug and the run continues (a transient disk issue shouldn't silently disable curator forever). - hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator rollback' subcommands. rollback supports --list, --id <ts>, -y. - hermes_cli/config.py: curator.backup.{enabled, keep} config block with sane defaults (enabled=true, keep=5). - Docs: curator.md gets a 'Backups and rollback' section; cli-commands.md table gets the new rows.
Tests (new file tests/agent/test_curator_backup.py, 16 cases): - snapshot creates tarball + manifest with correct counts - snapshot excludes .curator_backups/ (recursion guard) and .hub/ - snapshot disabled via config returns None without creating anything - snapshot uniquifies ids within the same second (-01 suffix) - prune honors keep count, newest-first - list_backups + _resolve_backup cover newest-default and unknown-id - rollback restores a deleted skill with content intact - rollback is itself undoable — safety snapshot shows up in list_backups - rollback with no snapshots returns an error - rollback refuses tarballs with absolute paths or .. components - real curator runs take a 'pre-curator-run' snapshot; dry-runs do not All curator tests: 210 passing locally.	2026-05-01 09:49:59 -07:00
teknium1	2af8b8ff37	fix(moonshot): also strip nullable/enum after anyOf collapse The anyOf collapse in _repair_schema returned early, skipping the nullable-strip and enum-cleanup steps. When a schema had anyOf [{enum: [..., null, '']}, {type: null}] alongside a parent-level 'nullable: true', collapsing to the single non-null branch produced a merged node that still had both 'nullable' and the bad enum values — Moonshot would still 400 on it. Fix: fall through to Rules 1/3 when the collapse produces a single merged node; only return early for the multi-branch case (pure anyOf preservation) or when there was no null branch to remove. Adds a test that locks in the combined-case expectation.	2026-04-30 23:14:31 -07:00
Hendrix	9ca72a69a7	fix(moonshot): fill missing type before enum cleanup to handle anyOf branches without explicit type When a schema node inside anyOf has enum values but no explicit 'type', Rule 3 (enum cleanup) ran before _fill_missing_type, so node_type was None and the enum was never cleaned. Moonshot then rejected the schema with 'enum value (<nil>) does not match any type in [string]'. Fix: reorder operations — fill missing type first, strip nullable, then clean enum. This ensures enum cleanup always has a type to check. Also fixes test expectation: empty string in enum is now correctly stripped (Moonshot rejects it too). Closes #16875	2026-04-30 23:14:31 -07:00
Teknium	e2eb561e8e	fix(curator): rewrite cron job skill refs after consolidation (#18253) When the curator consolidates skill X into umbrella Y, any cron job that listed X in its skills field would fail to load X at run time; the scheduler logs a warning and skips it, so the scheduled job runs without the instructions it was scheduled to follow. cron.jobs.rewrite_skill_refs(consolidated, pruned) now updates jobs in-place: consolidated names route to the umbrella target (dedup when umbrella is already present), pruned names are dropped. agent.curator._write_run_report calls it after classification, best-effort so a cron-side failure never breaks the curator itself. Results are recorded in run.json (counts.cron_jobs_rewritten + full cron_rewrites payload), a separate cron_rewrites.json for convenience when jobs were touched, and a section in REPORT.md. Reported by @tombielecki.	2026-04-30 23:04:50 -07:00
Teknium	f0dc919f92	fix(compression): include system prompt + tool schemas in token estimates (#18265 ) The user-visible /compress banner and the post-compression last_prompt_tokens writeback both counted only the raw message transcript (chars/4). With a 15KB system prompt and 30 tool schemas (~26KB), a 4-message transcript that looks like ~45 tokens to the transcript-only estimator is really ~10.5K tokens of request pressure — a 234x gap. Two user-facing consequences: - Banner shows 'Compressing … (~45 tokens)…' while compression is actually firing on 10K+ tokens of real pressure, confusing users about why compression triggered (reported by @codecovenant on X; #6217). - Post-compression last_prompt_tokens writeback omits tool schemas, so the next should_compress() check compares real usage against a stale underestimate — compression triggers late, potentially past the model's context limit on small-context models (#14695). Swap estimate_messages_tokens_rough() for estimate_request_tokens_rough() at every user-visible banner and at the post-compression writeback. estimate_request_tokens_rough() already existed for exactly this purpose and includes system prompt + tool schemas. Touched call sites: - run_agent.py: post-compression last_prompt_tokens writeback, post-tool call should_compress() fallback when provider usage is missing - cli.py: /compress banner + summary - gateway/run.py: gateway /compress banner + summary - tui_gateway/server.py: TUI /compress status + summary - acp_adapter/server.py: ACP /compact before/after Left intentionally alone: - Session-hygiene fallback and the 'no agent' /status path in gateway/run.py — no agent instance is in scope to query for system prompt/tools, and the existing 30-50% overestimate wobble on hygiene is safety-accepted. - Verbose-mode 'Request size' logging — informational only, already counts system prompt via api_messages[0]. 
Also relabels the feedback line from 'Rough transcript estimate' to 'Approx request size' so the metric label matches what it actually measures. Credits: diagnoses from @devilardis (#14695) and @Jackten (#6217); user report @codecovenant on X (2026-04-30). Closes #14695 Closes #6217	2026-04-30 23:03:54 -07:00
Teknium	8fa44b1724	fix(guardrails): preserve display _detect_tool_failure semantics The initial guardrail PR consolidated failure classification by pointing display._detect_tool_failure at the new classify_tool_failure helper, which was strictly broader: it flagged any JSON result with "success": false / "failed": true / non-empty "error", plus plain-text "traceback" and "error:" prefixes. That would uptick the user-visible [error] tag on tools that return {"success": false} as a benign signal (memory fullness, todo state, etc.) and feed the failure-streak counter at the same time. Restore display._detect_tool_failure to its pre-PR semantics verbatim. Tighten classify_tool_failure (the guardrail's internal safety-fallback used only when callers don't pass failed=) to match _detect_tool_failure exactly, so the two never disagree. Production callers in run_agent.py already pass an explicit failed= derived from _detect_tool_failure, so the guardrail counter is driven by the same signal the CLI shows.	2026-04-30 20:43:15 -07:00
Mind-Dragon	0704589ceb	fix(agent): make tool loop guardrails warning-first	2026-04-30 20:43:15 -07:00
Mind-Dragon	58b89965c8	fix(agent): add tool-call loop guardrails	2026-04-30 20:43:15 -07:00
Teknium	0ddc8aba68	fix(fallback): let custom_providers shadow built-in aliases When a user defines `custom_providers: [{name: kimi, ...}]` and references `provider: kimi` from fallback_model or the main config, the built-in alias rewriting (`kimi` → `kimi-coding`) was hijacking the request before the named-custom lookup ran. `_get_named_custom_provider` also refused to return a match when the raw name resolved to any built-in (including aliases), so the custom endpoint was unreachable. Fix at both layers of the resolution chain so every caller benefits, not just `_try_activate_fallback`: - hermes_cli/runtime_provider.py: narrow `_get_named_custom_provider`'s built-in-wins guard to canonical provider names only. An alias like `kimi` that resolves to a different canonical (`kimi-coding`) no longer blocks the custom lookup; a canonical name like `nous` still does. - agent/auxiliary_client.py: in `resolve_provider_client`, try the named-custom lookup with the original (pre-alias-normalization) name before the alias-normalized one, so aliased requests reach the user's custom entry. Also honour `explicit_base_url` and `explicit_api_key` in the API-key provider branch so callers that pass explicit hints (e.g. fallback activation) can override the registered defaults. Tests added for: - custom `kimi` shadowing built-in alias (regression for #15743) - custom `nous` NOT shadowing canonical built-in (behaviour preserved) - bare `kimi` without any custom entry still routing to built-in - explicit base_url/api_key override on the API-key provider branch Original PR #17827 by @Feranmi10 identified the same bug class and implemented a narrower fix in `_try_activate_fallback`; this reshapes the fix to live in the shared resolution layer so all callers benefit. Fixes #15743 Co-authored-by: Feranmi10 <89228157+Feranmi10@users.noreply.github.com>	2026-04-30 20:18:44 -07:00
0z!	b194617d00	fix(context_compressor): off-by-one in tail protection for short conversations	2026-04-30 20:00:01 -07:00
Stephen Schoettler	b29b709a71	fix(agent): sanitize Codex tool-call history summaries	2026-04-30 19:58:46 -07:00
Yukipukii1	75483b6db1	fix(curator): preserve last_report_path in state	2026-04-30 19:45:59 -07:00
Teknium	c868425467	feat(kanban): durable multi-profile collaboration board (#17805 ) Salvage of PR #16100 onto current main (after emozilla's #17514 fix that unblocks plugin Pydantic body validation). History preserved on the standing `feat/kanban-standing` branch; this squashes the 22 iterative commits into one clean landing. What this lands: - SQLite kernel (hermes_cli/kanban_db.py) — durable task board with tasks, task_links, task_runs, task_comments, task_events, kanban_notify_subs tables. WAL mode, atomic claim via CAS, tenant-namespaced, skills JSON array per task, max-runtime timeouts, worker heartbeats, idempotency keys, circuit breaker on repeated spawn failures, crash detection via /proc/<pid>/status, run history preserved across attempts. - Dispatcher — runs inside the gateway by default (`kanban.dispatch_in_gateway: true`). Ticks every 60s, reclaims stale claims, promotes ready tasks, spawns `hermes -p <assignee> chat -q "work kanban task <id>"` with HERMES_KANBAN_TASK + HERMES_KANBAN_WORKSPACE env. Auto-loads `--skills kanban-worker` plus any per-task skills. Health telemetry warns on stuck ready queue. - Structured tool surface (tools/kanban_tools.py) — 7 tools (kanban_show, kanban_complete, kanban_block, kanban_heartbeat, kanban_comment, kanban_create, kanban_link). Gated on HERMES_KANBAN_TASK via check_fn so zero schema footprint in normal sessions. - System-prompt guidance (agent/prompt_builder.py KANBAN_GUIDANCE) injected only when kanban tools are active. - Dashboard plugin (plugins/kanban/dashboard/) — Linear-style board UI: triage/todo/ready/running/blocked/done columns, drag-drop, inline create, task drawer with markdown, comments, run history, dependency editor, bulk ops, lanes-by-profile grouping, WS-driven live refresh. Matches active dashboard theme via CSS variables. 
- CLI: `hermes kanban init|create|list|show|assign|link|unlink|claim|comment|complete|block|unblock|archive|tail|dispatch|context|init|gc|watch|stats|notify|log|heartbeat|runs|assignees` + `/kanban` slash in-session. - Worker + orchestrator skills (skills/devops/kanban-worker + kanban-orchestrator): pattern library for good summary/metadata shapes, retry diagnostics, block-reason examples, fan-out patterns. - Per-task force-loaded skills: `--skill <name>` (repeatable), stored as JSON, threaded through to dispatcher argv as one `--skills X` pair per skill alongside the built-in kanban-worker. Dashboard + CLI + tool parity. - Deprecation of standalone `hermes kanban daemon`: stub exits 2 with migration guidance; `--force` escape hatch for headless hosts. - Docs (website/docs/user-guide/features/kanban.md + kanban-tutorial.md) with 11 dashboard screenshots walking through four user stories (Solo Dev, Fleet Farming, Role Pipeline, Circuit Breaker). - Tests (251 passing): kernel schema + migration + CAS atomicity, dispatcher logic, circuit breaker, crash detection, max-runtime timeouts, claim lifecycle, tenant isolation, idempotency keys, per-task skills round-trip + validation + dispatcher argv, tool surface (7 tools × round-trip + error paths), dashboard REST (CRUD + bulk + links + warnings), gateway-embedded dispatcher (config gate, env override, graceful shutdown), CLI deprecation stub, migration from legacy schemas. Gateway integration: - GatewayRunner._kanban_dispatcher_watcher: new asyncio background task, symmetric with _kanban_notifier_watcher. Runs dispatch_once via asyncio.to_thread so SQLite WAL never blocks the loop. Sleeps in 1s slices for snappy shutdown. Respects HERMES_KANBAN_DISPATCH_IN_GATEWAY=0 env override for debugging. - Config: new `kanban` section in DEFAULT_CONFIG with `dispatch_in_gateway: true` (default) + `dispatch_interval_seconds: 60`. Additive: no _config_version bump needed.
Forward-compat: - workflow_template_id / current_step_key columns on tasks (v1 writes NULL; v2 will use them for routing). - task_runs holds claim machinery (claim_lock, claim_expires, worker_pid, last_heartbeat_at) so multi-attempt history is first- class from day one. Closes #16102. Co-authored-by: emozilla <emozilla@nousresearch.com>	2026-04-30 13:36:47 -07:00
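The "atomic claim via CAS" mentioned in the kanban commit can be illustrated with a conditional `UPDATE` whose row count tells the caller whether it won the claim. This is a minimal sketch only: the `tasks` table layout, the `claimed_by` column, and the `claim_task` helper are hypothetical and are not taken from `hermes_cli/kanban_db.py`.

```python
import sqlite3


def claim_task(conn, task_id, worker):
    """Compare-and-swap claim: succeeds only if the task is still 'ready'.

    Hypothetical helper; the real kernel keeps its claim fields in task_runs.
    """
    cur = conn.execute(
        "UPDATE tasks SET status = 'running', claimed_by = ? "
        "WHERE id = ? AND status = 'ready'",
        (worker, task_id),
    )
    conn.commit()
    # rowcount is 1 when this caller flipped the row, 0 when someone else did.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, claimed_by TEXT)"
)
conn.execute("INSERT INTO tasks (id, status) VALUES (1, 'ready')")

first = claim_task(conn, 1, "worker-a")   # wins the claim
second = claim_task(conn, 1, "worker-b")  # loses: status is no longer 'ready'
```

Because the status check and the write happen in one statement, two workers racing for the same task cannot both observe `rowcount == 1`.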
lsdsjy	b9b9ee3e6c	fix(deepseek): preserve v4 reasoning_content on replay	2026-04-30 11:18:39 -07:00
y0shualee	f4b76fa272	fix: use skill activity in curator status Treat skill views and edits as activity when curator reports and applies lifecycle transitions, so recently loaded or patched skills are not displayed or transitioned as never used. Adds regression tests for activity derivation, automatic transitions, and CLI status output.	2026-04-30 10:31:47 -07:00
Teknium	8b290a5908	feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941 ) * fix(curator): split 'archived' into consolidated vs pruned in run reports Users who watched a curator run saw skills like 'anthropic-api' listed under 'Skills archived' and interpreted that as pruning — but the curator had actually absorbed those skills into a new umbrella (e.g. 'llm-providers') during the same run. The directory gets archived for safety (all removals are recoverable), but the content still lives under a different name. Users then 'restored' what they thought were deleted skills and ended up with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella). Classify removed skills using this run's skill_manage tool calls: - consolidated: content absorbed into a surviving/newly-created skill (evidenced by a skill_manage write_file/patch/create/edit whose target is a different skill AND whose file_path/content references the removed skill's name) - pruned: archived without consolidation evidence (truly stale) REPORT.md now shows two distinct sections: - 'Consolidated into umbrella skills' — with `removed → merged into umbrella` - 'Pruned — archived for staleness' — pure staleness archives run.json schema additions (backward compatible): - counts.consolidated_this_run, counts.pruned_this_run - consolidated: [{name, into, evidence}, ...] - pruned: [names] - archived: retained as the union for backward compat Also: relabel the auto-transitions 'archived' counter to 'archived (no LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass archives. Tests: 9 new tests in test_curator_classification.py covering consolidation evidence parsing (write_file/patch/create), hyphen/underscore name variants, self-reference rejection, destination-must-exist, mixed runs, and malformed-JSON fallback safety. Existing test_report_md_is_human_readable updated to cover the new section names. 
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified end-to-end. * feat(curator): hybrid model-declared + heuristic classification Extend the consolidated-vs-pruned split with LLM-authored intent: 1. Curator prompt now requires a structured YAML block at the end of the final response (consolidations / prunings with short rationale). 2. _parse_structured_summary() extracts it tolerantly — missing block, malformed YAML, partial lists all fall back to heuristic cleanly. 3. _reconcile_classification() merges model intent with the tool-call heuristic: - Model wins on rationale when its umbrella exists post-run - Model hallucination (umbrella doesn't exist) is downgraded to the heuristic's finding, or pruned if there's no evidence either - Heuristic catches model omission — consolidations the model enumerated tools for but forgot to list get surfaced with a '(detected via tool-call audit)' tag 4. REPORT.md now shows per-row rationale alongside 'removed → umbrella' and flags audit-only rows so the user knows why no reason is shown. Backward compat: run.json's 'archived' field (union) is preserved. 'pruned' is now a list of dicts with {name, source, reason}; 'pruned_names' is the flat-name list for legacy consumers. Tests: 15 new covering YAML parse edge cases (malformed, empty lists, bare-string entries, missing fields), reconciler rules (model wins, hallucination fallback, heuristic catches omission, prune with reason), and an end-to-end report-render test with all four paths exercised.	2026-04-30 10:31:23 -07:00
oak	4e296dcdda	fix(auxiliary): pass raw base_url to _maybe_wrap_anthropic for correct transport detection (#17467 ) Fixes HTTP 404 errors when using Anthropic-compatible providers (Kimi Coding, MiniMax, MiniMax-CN) for auxiliary tasks. Root cause: `_to_openai_base_url()` rewrites `/anthropic` → `/v1` so the OpenAI SDK hits the right endpoint. But the rewritten URL was then passed to `_maybe_wrap_anthropic`, whose `_endpoint_speaks_anthropic_messages` detector only fires on `/anthropic` or `api.kimi.com/coding`. Detector saw `/v1` → returned False → no Anthropic wrap → 404 on every aux call. Fix: preserve the raw base_url before rewriting and pass it to `_maybe_wrap_anthropic` for transport detection, while still giving the rewritten URL to the OpenAI client constructor. Closes #17705, #17413, #17086, #10469. Co-authored-by: oak <chengoak@users.noreply.github.com>	2026-04-30 10:18:42 -07:00
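The root cause described in the auxiliary base_url fix can be sketched as follows: detection must run on the raw URL, while the OpenAI client receives the rewritten one. The function names below are simplified stand-ins for `_to_openai_base_url` and `_endpoint_speaks_anthropic_messages`, and their bodies are assumptions based only on the commit message.

```python
def to_openai_base_url(base_url):
    """Rewrite a trailing /anthropic to /v1 (assumed behaviour)."""
    stripped = base_url.rstrip("/")
    if stripped.endswith("/anthropic"):
        return stripped[: -len("/anthropic")] + "/v1"
    return base_url


def speaks_anthropic_messages(base_url):
    """Detector that only fires on /anthropic or api.kimi.com/coding."""
    stripped = base_url.rstrip("/")
    return stripped.endswith("/anthropic") or "api.kimi.com/coding" in stripped


def plan_aux_client(raw_base_url):
    """Keep the raw URL for detection; hand the rewritten URL to the client."""
    return {
        "client_base_url": to_openai_base_url(raw_base_url),
        "wrap_anthropic": speaks_anthropic_messages(raw_base_url),
    }


plan = plan_aux_client("https://api.minimax.io/anthropic")
# The pre-fix behaviour: detecting on the rewritten URL misses the endpoint.
buggy = speaks_anthropic_messages(to_openai_base_url("https://api.minimax.io/anthropic"))
```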
Bartok9	4178ab3c07	fix(skills): wire bump_use() into skill invocation and preload paths (#17782 ) bump_use() existed and was tested but had zero production call sites — use_count stayed 0 for all skills, breaking Curator's stale-detection logic which relies on last_used_at. Wire bump_use() into: 1. build_skill_invocation_message() — when a user invokes /skill-name 2. build_preloaded_skills_prompt() — when a skill is preloaded at session start Both are the canonical 'a skill is actively being used' moments, distinct from 'browsing' (bump_view in skill_view tool call). Closes #17782	2026-04-30 05:07:34 -07:00
Leone Parise	eda1d516dc	fix(skills): exclude .archive from skill index walk Archived skills (moved to ~/.hermes/skills/.archive/ by the curator) were still surfaced in the <available_skills> system prompt under a fake '.archive' category, causing the agent to load and try to use deprecated skills. The os.walk in iter_skill_index_files() only excluded .git/.github/.hub. Add '.archive' to EXCLUDED_SKILL_DIRS, and to the two other places that hardcode the same exclusion tuple (gateway/run.py and agent/skill_commands.py).	2026-04-30 04:59:22 -07:00
Teknium	e8e5985ce6	fix(curator): seed defaults on update, create logs/curator dir, defer fire import (#17927 ) Three fixes bundled for curator reliability on existing installs and broken/partial installs: 1. run_agent.py: defer `import fire` into the __main__ block. `fire` is only used by `fire.Fire(main)` when running run_agent.py directly as a CLI — it is NOT needed for library usage. Importing it at module top made `from run_agent import AIAgent` from a daemon thread (e.g. the curator's forked review agent) crash with ModuleNotFoundError on broken/partial installs where `fire` isn't present. 2. hermes_cli/config.py: add version 22 → 23 migration that writes the `curator` + `auxiliary.curator` sections to config.yaml with their defaults, only filling keys the user hasn't overridden. Existing configs from before PR #16049 / the April 2026 `auxiliary.curator` unification had neither section on disk, so users couldn't see or edit the settings in their config.yaml (runtime deep-merge papered over it at read time, but the file never reflected reality). 3. hermes_cli/config.py: `ensure_hermes_home()` now pre-creates `~/.hermes/logs/curator/` alongside cron/sessions/logs/memories on every CLI launch. Managed-mode (NixOS) variant mkdir's it defensively after the activation-script existence checks, since the activation script may not know about this subpath. 4. agent/curator.py: `_reports_root()` mkdir's the dir at call time as belt-and-suspenders for entry paths that bypass both ensure_hermes_home() and the v23 migration (gateway-only installs, bare library use). E2E validated in isolated HERMES_HOME: fresh install gets full defaults seeded; partial-override config keeps user's `enabled: false` and custom `interval_hours` while filling the missing keys; re-running the migration is a no-op.	2026-04-30 04:52:28 -07:00
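The "only filling keys the user hasn't overridden" behaviour of the 22 → 23 migration amounts to a recursive fill of missing keys. The sketch below illustrates that idea with a hypothetical `fill_missing` helper; the default values shown are illustrative and are not the real `DEFAULT_CONFIG`.

```python
import copy


def fill_missing(user_cfg, defaults):
    """Return a copy of user_cfg with absent keys filled from defaults."""
    merged = dict(user_cfg)
    for key, default in defaults.items():
        if key not in merged:
            # Key never set by the user: seed the default.
            merged[key] = copy.deepcopy(default)
        elif isinstance(default, dict) and isinstance(merged[key], dict):
            # Both sides are sections: recurse so user values survive.
            merged[key] = fill_missing(merged[key], default)
    return merged


# Illustrative defaults only (assumed values, not the shipped ones).
defaults = {
    "curator": {"enabled": True, "interval_hours": 24},
    "auxiliary": {"curator": {"timeout": 600}},
}
user = {"curator": {"enabled": False}}

migrated = fill_missing(user, defaults)
```

Running the same fill a second time changes nothing, which matches the commit's note that re-running the migration is a no-op.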
Rob Moen	0dd373ec43	fix(context): honor model.context_length for Ollama num_ctx and all display paths When a user sets model.context_length in config.yaml, the value was only used for Hermes' internal compression decisions (context_compressor) but NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF metadata (often 256K+) and allocates that much VRAM regardless of the user's config — causing OOM on smaller GPUs like the P100 (16GB). Root cause: two separate context values existed independently: - context_compressor.context_length = config value (e.g. 65536) ✓ - _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config Changes: 1. Cap Ollama num_ctx to config context_length (run_agent.py) When model.context_length is explicitly set and no explicit ollama_num_ctx override exists, cap the auto-detected GGUF value to the user's context_length. This is the core fix — it prevents Ollama from allocating more VRAM than the user budgeted. 2. Pass config_context_length through all secondary call sites Several paths called get_model_context_length() without the config override, falling through to the 256K default fallback: - cli.py: @-reference expansion and /model switch display - gateway/run.py: @-reference expansion and /model switch display - tui_gateway/server.py: @-reference expansion - hermes_cli/model_switch.py: resolve_display_context_length() 3. Normalize root-level context_length in config (hermes_cli/config.py) _normalize_root_model_keys() now migrates root-level context_length into the model section, matching existing behavior for provider and base_url. Users who wrote `context_length: 65536` at the YAML root instead of under `model:` had it silently ignored. 4. Fix misleading comments (agent/model_metadata.py) DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K as two comments stated. Tests: 3 new tests for root-level context_length normalization. All existing context_length tests pass (96 tests).	2026-04-30 04:31:23 -07:00
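The capping rule in change 1 of the Ollama context fix reduces to a small precedence function. This is a sketch of the rule as the commit message states it; `resolve_num_ctx` and its parameter names are hypothetical and do not appear in `run_agent.py`.

```python
def resolve_num_ctx(detected_ctx, config_context_length=None, explicit_num_ctx=None):
    """Pick the num_ctx to send to Ollama.

    Precedence (per the commit message): an explicit num_ctx override wins;
    otherwise an explicitly configured context_length caps the GGUF value;
    otherwise the auto-detected GGUF value is used unchanged.
    """
    if explicit_num_ctx is not None:
        return explicit_num_ctx
    if config_context_length is not None:
        return min(detected_ctx, config_context_length)
    return detected_ctx


capped = resolve_num_ctx(256000, config_context_length=65536)
uncapped = resolve_num_ctx(256000)
```

With the values from the commit message, a 256000-token GGUF default is capped to the user's 65536, so Ollama no longer sizes its allocation for the larger window.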
briandevans	cc5b9fb581	fix(transport): omit thinking_config for Gemma on the gemini provider (#17426 ) The `gemini` provider also serves Gemma (e.g. `gemma-4-31b-it`) and historically other Google models like PaLM. Those reject `extra_body.thinking_config` with HTTP 400: Unknown name "thinking_config": Cannot find field `_build_gemini_thinking_config()` was unconditionally producing a config dict for any model on the `gemini` / `google-gemini-cli` provider, which `ChatCompletionsTransport.build_kwargs` then dropped into `extra_body["thinking_config"]`. The result: every chat turn for Gemma users on the gemini provider blew up at the API edge. The fix is the same shape Hermes already uses for the Gemini-2.5 vs Gemini-3 family clamping: normalise the model id, strip an `OpenRouter`-style `google/` prefix, and short-circuit early when the result doesn't start with `gemini`. We return `None` rather than `{"includeThoughts": False}`, because the API rejects the field name itself — even the polite "off" form trips the same 400. Three regression tests cover Gemma with reasoning enabled, Gemma with reasoning disabled, and the `google/gemma-…` OpenRouter-style id; the existing Gemini-2.5 / Gemini-3 / `google/gemini-…` cases keep passing because the Gemini guard fires after the prefix strip. Fixes #17426 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 04:29:04 -07:00
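The guard described in the Gemma fix (normalise the id, strip a `google/` prefix, return `None` for anything that is not a Gemini model) might look roughly like this. It is a simplified sketch: the real `_build_gemini_thinking_config()` also handles the Gemini-2.5 vs Gemini-3 clamping, which is omitted here, and the returned dict shape for Gemini models is an assumption.

```python
def build_thinking_config(model, reasoning_enabled):
    """Return a thinking config for Gemini models, or None for everything else."""
    model_id = model.strip().lower()
    # Strip an OpenRouter-style vendor prefix before checking the family.
    if model_id.startswith("google/"):
        model_id = model_id[len("google/"):]
    if not model_id.startswith("gemini"):
        # Gemma and other non-Gemini models reject the field name itself,
        # so even an "off" config must not be sent.
        return None
    return {"includeThoughts": bool(reasoning_enabled)}


gemma_cfg = build_thinking_config("gemma-4-31b-it", True)
gemini_cfg = build_thinking_config("google/gemini-2.5-flash", True)
```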
Teknium	0da968e521	fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868 ) Voscko reported curator.auxiliary.provider/model was advertised in the docs but ignored — the review fork read only model.provider/default. The narrow fix would wire the one-off key through, but that leaves curator as a parallel system: not in `hermes model` → auxiliary picker, not in the dashboard Models tab, missing per-task base_url/api_key/timeout/ extra_body. Unify curator with the rest of the aux task system so `hermes model` and the dashboard configure it like every other aux task. Four sources of truth updated: - hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary (timeout=600 since reviews run long), drop the one-off curator.auxiliary block from DEFAULT_CONFIG.curator. - hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass') to _AUX_TASKS so the CLI picker offers it. - hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the dashboard REST endpoint accepts it. - web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard Models tab renders the task. agent/curator.py _resolve_review_model() now reads auxiliary.curator first (canonical), falls back to legacy curator.auxiliary (with an info log asking users to migrate), then falls back to the main chat model. Pre-unification users keep working. Docs updated: docs/user-guide/features/curator.md now points at `hermes model` → auxiliary → Curator and the dashboard Models tab. Tests: 6 unit tests on _resolve_review_model (auto default, canonical slot honored, partial override fallback, legacy fallback with deprecation log assertion, new-wins-over-legacy, empty-config safety) plus a cross-registry test that curator is wired into all four sources of truth. test_aux_tasks_keys_all_exist_in_default_config already covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant. Reported by Voscko on Discord.	2026-04-30 02:46:01 -07:00
Teknium	ce0c3ae493	fix(aux): remove hardcoded Codex fallback model, drop Codex from auto chain (#17765 ) The _CODEX_AUX_MODEL constant had already rotated twice in 6 weeks (gpt-5.3-codex -> gpt-5.2-codex -> now broken again at gpt-5.2-codex) because ChatGPT-account Codex gates which models it accepts via an undocumented, shifting allow-list that OpenAI publishes no changelog for. Any pinned default will keep going stale. Issue #17533 reports the current breakage: every ChatGPT-account auxiliary fallback fails with HTTP 400 "model is not supported" and the 60s pause loop degrades long sessions. Rather than reset the clock with another stale pin (PR #17544 proposes gpt-5.2-codex -> gpt-5.4), remove the hardcoded second-order Codex fallback entirely: - Delete `_CODEX_AUX_MODEL`. - Drop `_try_codex` from `_get_provider_chain()` (the auto chain now ends at api-key providers; 4 rungs instead of 5). - Rename `_try_codex() -> _build_codex_client(model)` and require an explicit model from the caller. No more guessing. - `resolve_provider_client("openai-codex", model=None)` now warns and returns (None, None) instead of silently guessing a stale model ID. - Remove `_try_codex` from the `provider="custom"` fallback ladder (same stale-constant trap). - `_resolve_strict_vision_backend("openai-codex")` routes through `resolve_provider_client` so the caller's explicit model is honored. Codex-main users are unaffected: Step 1 of `_resolve_auto` already uses `main_provider` + `main_model` directly and passes the user's configured Codex model through `resolve_provider_client`, which never touched `_CODEX_AUX_MODEL`. Per-task overrides (`auxiliary.<task>.provider/model`) continue to work and are the supported way to route specific aux tasks through Codex. Users whose main provider fails with a payment/connection error and who have ONLY ChatGPT-account Codex auth will now see the 60s pause without a stale-model-rejection noise line in between -- same outcome, cleaner failure. 
Closes #17533. Supersedes #17544 (which resets the clock on the same stale-constant problem).	2026-04-29 23:23:50 -07:00
Stephen Schoettler	f73364b1c4	fix(ci): stabilize main test suite regressions (#17660 ) * fix: stabilize main test suite regressions * test(agent): update MiniMax normalization expectation * test: stabilize remaining CI assertions * test: harden config helper monkeypatching * test: harden CI-only assertions * fix(agent): propagate fast streaming interrupts	2026-04-29 23:18:55 -07:00
Teknium	828d3a320b	fix(anthropic): reactive recovery for OAuth 1M-context beta rejection (#17752 ) Keep context-1m-2025-08-07 in OAuth requests by default so 1M-capable subscriptions retain full context. When Anthropic rejects a request with 400 'long context beta is not yet available for this subscription', disable the beta for the rest of the session, rebuild the client, and retry once. Addresses #17680 (thanks @JayGwod for the clean reproduction) without forcing every OAuth user off the 1M context window. Changes: - agent/error_classifier.py: new FailoverReason.oauth_long_context_beta_forbidden; pattern matches 400 + 'long context beta' + 'not yet available'. Narrow enough that the existing 429 tier-gate pattern keeps its own reason. - agent/anthropic_adapter.py: _common_betas_for_base_url, build_anthropic_client, build_anthropic_kwargs gain drop_context_1m_beta kwarg. Default=False (1M stays). OAuth OAUTH_ONLY_BETAS unchanged. - agent/transports/anthropic.py: build_kwargs forwards the flag. - run_agent.py: self._oauth_1m_beta_disabled flag, retry-once guard, recovery branch next to the image-shrink path. _rebuild_anthropic_client honors the flag. The main build_kwargs call site threads it through for fast-mode extra_headers. - hermes_cli/doctor.py, hermes_cli/models.py: sibling OAuth /v1/models probes get the same reactive retry — previously they'd falsely report the Anthropic API as unreachable for affected subscriptions. Tests: 2190 tests/agent/ + 94 adjacent integration tests pass. New unit tests cover the classifier pattern (including the collision guard against the 429 tier-gate) and the drop_context_1m_beta adapter behavior (default keeps 1M, flag strips only 1M while preserving every other beta).	2026-04-29 21:56:54 -07:00
teknium1	dd2d1ba5e6	refactor(reload-skills): queue note for next turn, drop cache invalidation + agent tool Salvage-follow-up to @shannonsands's /reload-skills PR. Trims the feature to match the design: user-initiated rescan, no prompt-cache reset, no new schema surface, no phantom user turn, and the next-turn note carries each added/removed skill's 60-char description (not just its name). Changes vs the original PR: * Drop the in-process skills prompt-cache clear in reload_skills(). Skills are invoked at runtime via /skill-name, skills_list, or skill_view — they don't need to live in the system prompt for the model to use them. Keeping the cache intact preserves prefix caching across the reload so /reload-skills pays no cache-reset cost. (MCP has to break the cache because tool schemas must be known at conversation start; skills do not.) * Drop the skills_reload agent tool and SKILLS_RELOAD_SCHEMA from tools/skills_tool.py, plus the four skills_reload enumerations in toolsets.py. No new schema surface — agents can already see a freshly- installed skill via skill_view / skills_list the moment it's on disk. * Replace the phantom 'role: user' turn injection with a one-shot queued note. CLI uses self._pending_skills_reload_note (same pattern as _pending_model_switch_note, prepended to the next API call and cleared). Gateway uses self._pending_skills_reload_notes[session_key]. The note is prepended to the NEXT real user message in this session, so message alternation stays intact and nothing out-of-band is persisted to the transcript. * reload_skills() now returns added/removed as [{'name': str, 'description': str}, ...] (description truncated to 60 chars — matches the curator / gateway adapter budget). The injected next-turn note formats each entry as 'name — description' so the model can actually reason about which new skills to call without running skills_list first. * Only emit the note when the diff is non-empty. 
On empty diff, print 'No new skills detected' and do nothing else. * Tests rewritten to cover the queue semantics, the description payload, and a regression guard that the prompt-cache snapshot is preserved.	2026-04-29 21:07:47 -07:00
Shannon Sands	7966560fb5	feat(skills): /reload-skills slash command + skills_reload agent tool Adds a public reload path for the in-process skill caches so newly installed (or removed) skills become visible mid-session without a gateway restart. Mirrors the shape of /reload-mcp. Three surfaces: * /reload-skills slash command — CLI (cli.py) and gateway (gateway/run.py), with /reload_skills alias for Telegram autocomplete and an explicit Discord registration. * skills_reload agent tool (tools/skills_tool.py) — lets agents/subagents pick up freshly-installed skills via tool call. * agent.skill_commands.reload_skills() — shared helper that clears _skill_commands, _SKILLS_PROMPT_CACHE (in-process LRU), and the on-disk .skills_prompt_snapshot.json, then returns an added/removed diff plus the new total count. Tested: * tests/agent/test_skill_commands_reload.py (9 cases) * tests/cli/test_cli_reload_skills.py (3 cases) * tests/gateway/test_reload_skills_command.py (4 cases) Use case: NemoClaw / OpenShell-style sandboxed orchestrators that drop skills into ~/.hermes/skills mid-session, plus agentic flows where the agent itself installs a skill via the shell tool and needs it bound without a gateway restart. The Python helper clear_skills_system_prompt_cache(clear_snapshot=True) already exists internally — this PR just exposes it via slash command and tool.	2026-04-29 21:07:47 -07:00
Nanako0129	c5a5e586d7	fix(gemini): nest OpenAI-compat thinking config under google	2026-04-29 12:10:40 -07:00
teknium1	40a98fb0fa	feat(minimax-oauth): full integration with peer OAuth providers Close integration gaps discovered by auditing qwen-oauth's file coverage. These are surfaces the original salvage missed — they all existed on main and were added in the 747 commits since PR #15203 was opened. Coverage added: - agent/credential_pool.py: seed pool from auth.json providers.minimax-oauth so `hermes auth list` reflects logged-in state and `hermes auth remove minimax-oauth <N>` works through the standard flow. - agent/credential_sources.py: register RemovalStep for minimax-oauth with suppression-aware `_clear_auth_store_provider`. - agent/models_dev.py: PROVIDER_TO_MODELS_DEV mapping (-> 'minimax' family). - hermes_cli/providers.py: HermesOverlay entry (anthropic_messages transport, oauth_external auth_type, api.minimax.io/anthropic base). - hermes_cli/model_normalize.py: add to _MATCHING_PREFIX_STRIP_PROVIDERS so `minimax-oauth/MiniMax-M2.7` in config.yaml gets correctly repaired. - hermes_cli/status.py: render MiniMax OAuth block in `hermes doctor` (logged-in / region / expires_at / error). - hermes_cli/web_server.py: register in OAUTH_PROVIDER_REGISTRY + dispatch branch in _resolve_provider_status so the dashboard auth page shows it. - website/docs/integrations/providers.md: full 'MiniMax (OAuth)' section. - website/docs/reference/cli-commands.md: --provider enum. - website/docs/user-guide/features/fallback-providers.md: fallback table row. - scripts/release.py AUTHOR_MAP: amanning3390 mapping (CI gate).	2026-04-29 09:53:42 -07:00
Adam Manning	0b2f1bb27b	feat(agent): wire MiniMax-M2.7 for minimax-oauth provider Wire MiniMax-M2.7 and MiniMax-M2.7-highspeed into the model catalog, CLI model picker, and agent auxiliary/metadata subsystems. Changes: - hermes_cli/models.py: - Add 'minimax-oauth' to _PROVIDER_MODELS with MiniMax-M2.7 and MiniMax-M2.7-highspeed - Add ProviderEntry('minimax-oauth', 'MiniMax (OAuth)', ...) to CANONICAL_PROVIDERS near existing minimax entries - Add aliases: minimax-portal, minimax-global, minimax_oauth in _PROVIDER_ALIASES - hermes_cli/main.py: - Add 'minimax-oauth' to provider_labels dict - Insert 'minimax-oauth' into providers list in select_provider_and_model() near the other minimax entries - Add 'minimax-oauth' to --provider argparse choices - Add _model_flow_minimax_oauth() function: ensures login via _login_minimax_oauth(), resolves runtime credentials, prompts for model selection, saves model choice and config - Add dispatch elif branch for selected_provider == 'minimax-oauth' - agent/auxiliary_client.py: - Add 'minimax-oauth': 'MiniMax-M2.7-highspeed' to _API_KEY_PROVIDER_AUX_MODELS - Add 'minimax-oauth' to _ANTHROPIC_COMPAT_PROVIDERS set - agent/model_metadata.py: - Add 'minimax-oauth' to _PROVIDER_PREFIXES frozenset - MiniMax-M2.7 context length (200_000) already covered by the existing 'minimax' substring match in DEFAULT_CONTEXT_LENGTHS	2026-04-29 09:53:42 -07:00
vominh1919	fd5479a4fc	fix: preserve DeepSeek thinking blocks on Anthropic replay (#16748 ) DeepSeek's /anthropic endpoint requires thinking blocks to be replayed in multi-turn conversations for reasoning continuity. The existing code classified api.deepseek.com as a generic third-party endpoint and stripped ALL thinking blocks, causing HTTP 400 from DeepSeek. Fix: add _is_deepseek_anthropic_endpoint() detector (following the Kimi precedent) and a dedicated branch that strips only signed Anthropic blocks while preserving unsigned ones synthesised from reasoning_content. This follows the exact same pattern as the Kimi exemption (issue #13848) and does not change behavior for any other third-party endpoint (Azure, Bedrock, MiniMax, etc.). Fixes NousResearch/hermes-agent#16748	2026-04-29 08:10:29 -07:00
Teknium	1bedc836b5	docs(onboarding): lead OpenClaw residue banner with migrate, warn that cleanup breaks OpenClaw (#17507 ) The ~/.openclaw/ detection banner (#16327) had two problems flagged in #16629: 1. It only pitched 'hermes claw cleanup' (destructive archive) and never mentioned 'hermes claw migrate' — the actual non-destructive path that ports config/memory/skills into Hermes. 2. The copy anthropomorphized the bug ('the agent can still get confused', 'dutifully reads') and framed OpenClaw as a competitor to eliminate ('instead of Hermes's'). Rewrite so migrate leads, cleanup is a clearly-labelled follow-up with a warning that archiving breaks OpenClaw for users still running it. Closes #16629	2026-04-29 08:08:36 -07:00
Teknium	83c288da01	fix(anthropic): broaden Kimi thinking-suppression to custom endpoints (#17455 ) The guard that drops Anthropic's `thinking` kwarg for Kimi endpoints was matched on `https://api.kimi.com/coding` only. Users configuring a custom Kimi-compatible gateway (or an official Moonshot host) with `api_mode: anthropic_messages` fall through to the generic third-party path, which strips thinking blocks AND still sends `thinking={enabled,...}` → upstream rejects with HTTP 400 "reasoning_content is missing in assistant tool call message at index N" on the next request after a tool call. Replace `_is_kimi_coding_endpoint` callers (history replay + thinking kwarg gate) with `_is_kimi_family_endpoint(base_url, model)` that also matches the `api.kimi.com` / `moonshot.ai` / `moonshot.cn` hosts and Kimi/Moonshot family model names (`kimi-`, `moonshot-`, `k1.`, `k2.`, …) for custom / proxied endpoints. Keeps the UA-header check in `build_anthropic_client` URL-only — the `claude-code/0.1.0` header is an official-Kimi contract. Plumbs optional `model` through `convert_messages_to_anthropic` so the unsigned reasoning_content→thinking block synthesised for Kimi's history validation survives the third-party signature-stripping pass on custom hosts too. Closes #17057.	2026-04-29 06:35:42 -07:00
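A family-level detector along the lines of `_is_kimi_family_endpoint(base_url, model)` could combine a host check with a model-name prefix check. The sketch below uses only the hosts and prefixes that the commit message spells out (its list ends with an ellipsis, which is left unfilled); the function body is an assumption, not the adapter's code.

```python
from urllib.parse import urlparse

# Only the values named in the commit message; the real lists may be longer.
KIMI_HOSTS = ("api.kimi.com", "moonshot.ai", "moonshot.cn")
KIMI_MODEL_PREFIXES = ("kimi-", "moonshot-", "k1.", "k2.")


def is_kimi_family_endpoint(base_url, model=""):
    """True when either the host or the model name looks Kimi/Moonshot."""
    host = (urlparse(base_url or "").hostname or "").lower()
    host_match = any(host == h or host.endswith("." + h) for h in KIMI_HOSTS)
    model_match = (model or "").lower().startswith(KIMI_MODEL_PREFIXES)
    return host_match or model_match


official = is_kimi_family_endpoint("https://api.moonshot.ai/anthropic")
proxied = is_kimi_family_endpoint("https://gateway.example.com/v1", "kimi-k2")
other = is_kimi_family_endpoint("https://api.deepseek.com/anthropic", "deepseek-chat")
```

Matching on the model name is what lets a custom or proxied host, whose URL gives no hint, still take the Kimi path.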
vominh1919	7141cda967	fix: narrow Anthropic adapter dot-mangling to Claude models only The normalize_model_name() function unconditionally converted dots to hyphens in all model names. This caused non-Anthropic models (e.g. gpt-5.4) to be mangled to gpt-5-4 when routed through the Anthropic adapter path, resulting in HTTP 404 from the backend. Now only applies dot-to-hyphen conversion for models starting with "claude-" or "anthropic/", which are the actual Anthropic model IDs. Fixes NousResearch/hermes-agent#17171 Related: #7421, #13061, #16417	2026-04-29 06:34:57 -07:00
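The narrowed rule can be shown in a few lines. This sketch covers only the dot-to-hyphen step described in the commit; any other normalisation the real `normalize_model_name()` performs is not reproduced, and the Claude model ids in the usage lines are illustrative.

```python
def normalize_model_name(model):
    """Convert dots to hyphens only for Anthropic model ids."""
    if model.startswith(("claude-", "anthropic/")):
        return model.replace(".", "-")
    # Non-Anthropic ids such as gpt-5.4 pass through untouched.
    return model


kept = normalize_model_name("gpt-5.4")
converted = normalize_model_name("claude-opus-4.7")
```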
Teknium	ff687c019e	fix(aux): skip kimi-coding in vision auto-detect (closes #17076 ) (#17451 ) * docs(anthropic): correct OAuth scope to Max plan + extra usage credits only The previous docs pass (#17399) overstated what Anthropic OAuth works with. In practice Hermes can only route against a Claude Max plan that has purchased extra usage credits — the base Max allowance is not consumed, and Claude Pro is not supported at all. Without Max + extra credits, users must fall back to an ANTHROPIC_API_KEY (pay-per-token). Updates the four pages touched in #17399: - integrations/providers.md - user-guide/features/credential-pools.md - reference/environment-variables.md - getting-started/quickstart.md * fix(aux): skip kimi-coding in vision auto-detect (closes #17076) Kimi Coding Plan's /coding endpoint (Anthropic Messages wire) has no image_in capability — Kimi's own docs confirm and suggest switching to a vision-capable model. Vision lives on the separate Kimi Platform (api.moonshot.ai, OpenAI-wire, pay-as-you-go). When the user has kimi-coding as main provider and auxiliary.vision.provider=auto, resolve_vision_provider_client was handing back an AnthropicAuxiliaryClient wrapped around /coding which 404'd on every vision request. Add a _PROVIDERS_WITHOUT_VISION frozenset ({kimi-coding, kimi-coding-cn}) and gate the main-provider vision branch on membership. On a skip the auto-detect falls through to OpenRouter → Nous like any other main-provider-unavailable case. Explicit per-task overrides (auxiliary.vision.provider=kimi-coding) are unaffected — the skip only applies when the caller is in auto mode. Tests: 4 new targeted tests in TestVisionAutoSkipsKimiCoding covering the skip path, CN variant, explicit-override passthrough, and a guard against accidental skip-list widening.	2026-04-29 06:10:23 -07:00
Teknium	13683c0842	feat(memory): notify providers on mid-process session_id rotation (#17409 ) Fixes #6672 Memory providers now receive on_session_switch() whenever AIAgent.session_id rotates mid-process — /resume, /branch, /reset, /new, and context compression. Before this, providers that cached per-session state in initialize() (Hindsight's _session_id, _document_id, accumulated _session_turns, _turn_counter) kept writing into the old session's record after the agent had moved on. MemoryProvider ABC ------------------ - New optional hook on_session_switch(new_session_id, *, parent_session_id='', reset=False, **kwargs) with no-op default for backward compat. reset=True signals /reset or /new — providers should flush accumulated per-session buffers. reset=False for /resume, /branch, compression where the logical conversation continues. MemoryManager ------------- - on_session_switch() fans the hook out to every registered provider. Isolated try/except per provider — one bad provider can't block others. - Empty/None new_session_id is a no-op to avoid corrupting provider state during shutdown paths. run_agent.py ------------ - _sync_external_memory_for_turn now passes session_id=self.session_id into sync_all() and queue_prefetch_all(). Providers with defensive session_id updates in sync_turn (Hindsight already had this at plugins/memory/hindsight/__init__.py:1199) now actually receive the current id. - Compression block at ~L8884 already notified the context engine of the rollover; now also calls _memory_manager.on_session_switch(reason='compression'). cli.py ------ - new_session() fires reset=True, reason='new_session' so providers flush buffers. - _handle_resume_command fires reset=False, reason='resume' with the previous session as parent_session_id. - _handle_branch_command fires reset=False, reason='branch' with the parent session_id already captured for the DB parent link. 
gateway/run.py -------------- - _handle_resume_command now evicts the cached AIAgent, mirroring /branch and /reset. The next message rebuilds a fresh agent whose memory provider initialize() runs with the correct session_id — matches the pattern the gateway already uses for provider state cross-session transitions. Hindsight reference implementation ---------------------------------- - plugins/memory/hindsight/__init__.py adds on_session_switch that: updates _session_id, mints a fresh _document_id (prevents vectorize-io/hindsight#1303 overwrite), and clears _session_turns / _turn_counter / _turn_index so in-flight batches don't flush under the new document id. parent_session_id only overwritten when provided (avoids clobbering on a bare switch). Tests ----- - tests/agent/test_memory_session_switch.py: new dedicated file. ABC default no-op, manager fan-out, failure isolation, empty-id no-op, session_id propagation through sync_all/queue_prefetch_all, Hindsight state transitions for every reset/non-reset case, parent preservation. - tests/cli/test_branch_command.py: new test verifying /branch fires the hook with correct parent_session_id + reset=False + reason. - tests/gateway/test_resume_command.py: new test verifying /resume evicts the cached agent. - tests/run_agent/test_memory_sync_interrupted.py: updated existing assertions to account for the session_id kwarg on sync_all and queue_prefetch_all. E2E verified (real imports, tmp HERMES_HOME): - /resume: session_id updates, doc_id fresh, buffers cleared, parent set - /branch: session_id forks, parent links to original - /new: reset=True clears accumulated state - compression: reason='compression' propagated, lineage preserved - Empty id: no-op, state preserved - Legacy provider without on_session_switch: no crash Reported by @nicoloboschi (Hindsight maintainer); related scope-widening comment by @kidonng extending coverage to compression.	2026-04-29 04:57:22 -07:00
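The fan-out with per-provider failure isolation described under MemoryManager can be sketched as below. The class bodies are assumptions made for illustration (including treating the hook's extra arguments as keyword-only); they are not the code in the repository.

```python
import logging

logger = logging.getLogger("memory_manager_sketch")


class MemoryProvider:
    """Base class with a no-op hook, so legacy providers keep working."""

    def on_session_switch(self, new_session_id, *, parent_session_id="",
                          reset=False, **kwargs):
        return None


class MemoryManager:
    def __init__(self, providers):
        self.providers = list(providers)

    def on_session_switch(self, new_session_id, **kwargs):
        # Empty or None ids are ignored to avoid corrupting provider state.
        if not new_session_id:
            return
        for provider in self.providers:
            try:
                provider.on_session_switch(new_session_id, **kwargs)
            except Exception:
                # One failing provider must not block the others.
                logger.exception("on_session_switch failed for %r", provider)
```

A caller would invoke it as, for example, `manager.on_session_switch("s2", parent_session_id="s1", reset=False, reason="resume")`, with `reason` travelling through `**kwargs`.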
Oluwadare Feranmi	860ff445f6	fix(usage_pricing): add MiniMax-M2.7 pricing for minimax and minimax-cn providers Fixes #16825. Sessions using MiniMax-M2.7 via minimax-cn showed estimated_cost_usd=0.0 and cost_status='unknown' because neither provider had a billing route or pricing entry. Adds official_docs_snapshot entries ($0.30/M input, $1.20/M output) for both minimax and minimax-cn, and adds explicit routing in resolve_billing_route so both resolve to billing_mode='official_docs_snapshot' instead of falling through to 'unknown'.	2026-04-29 04:56:50 -07:00
Teknium	21676e80cc	Revert "fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path (#16957)" (#17397) This reverts commit `023f5c74b1`.	2026-04-29 03:55:03 -07:00
Teknium	bc0d8a941e	feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307) Every curator pass now emits a dated report directory under `~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files: - `run.json` — machine-readable full record (before/after snapshot, state transitions, all tool calls, model/provider, timing, full LLM final response untruncated, error if any) - `REPORT.md` — human-readable markdown: model + duration header, auto-transition counts, LLM consolidation stats, archived-this-run list, new-skills-this-run list, state transitions, the full LLM final summary, and a recovery footer pointing at the archive + the `hermes curator restore` command Reports live under `logs/curator/`, not inside `skills/` — they're operational telemetry, not user-authored skill data, and belong alongside `agent.log` / `gateway.log`. Internals: - `_run_llm_review()` now returns a dict (final, summary, model, provider, tool_calls, error) instead of a bare truncated string so the reporter has full fidelity - Report writer is fully best-effort — any failure logs at DEBUG and never breaks the curator itself. Same-second rerun gets a numeric suffix so reports can't clobber each other - Report path stamped into `.curator_state` as `last_report_path` - `hermes curator status` surfaces a "last report:" line so users can immediately open the latest run Tests (all green): - 7 new tests in tests/agent/test_curator_reports.py covering: report location (logs not skills), both files written, run.json shape and diff accuracy, markdown structure, error path still writes, state transitions captured, same-second runs get unique dirs - Existing test_run_review_synchronous_invokes_llm_stub updated to stub the new dict-returning _run_llm_review signature Live E2E: ran a synchronous pass against a 1-skill test collection with a stubbed LLM; report written correctly, state stamped with last_report_path, markdown human-readable, run.json machine-parseable.	2026-04-28 23:23:11 -07:00
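The "same-second rerun gets a numeric suffix" rule could look something like the sketch below. The function name `make_report_dir` and the exact suffix format (`-2`, `-3`, ...) are assumptions for illustration; the commit message only says a numeric suffix is used.

```python
from datetime import datetime
from pathlib import Path


def make_report_dir(base: Path, now: datetime) -> Path:
    """Create a dated report directory, never reusing an existing one."""
    stamp = now.strftime("%Y%m%d-%H%M%S")
    candidate = base / stamp
    suffix = 1
    while True:
        try:
            # exist_ok=False makes the collision check and the create one step.
            candidate.mkdir(parents=True, exist_ok=False)
            return candidate
        except FileExistsError:
            suffix += 1
            candidate = base / f"{stamp}-{suffix}"  # assumed suffix format
```

Letting `mkdir` raise on collision avoids a separate exists-then-create check, so two runs in the same second cannot end up sharing a directory.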
teknium1	fa9383d27b	feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations Based on three live test runs against 346 agent-created skills on the author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator prompt needed three sharpenings before it consistently produced real umbrella consolidation instead of passive audit output: Umbrella-first framing. The original 'decide keep/patch/archive/ consolidate' framing lets opus default to 'keep' whenever two skills aren't byte-identical. The new prompt explicitly tells the reviewer that pairwise distinctness is the wrong bar — the right question is 'would a human maintainer write this as N separate skills, or one skill with N labeled subsections?' Expect 10-25 prefix clusters; merge each into an umbrella via one of three methods. Three concrete consolidation methods. (a) Merge into an existing umbrella (patch the broadest skill, archive siblings); (b) Create a new umbrella SKILL.md (skill_manage action=create); (c) Demote session-specific detail into references/, templates/, or scripts/ under the umbrella via skill_manage action=write_file, then archive the narrow sibling. This matches the support-file vocabulary the review-prompt side already uses (PR #17213). Two observed bailouts pre-empted: 'usage counters are zero so I can't judge' (rule 4: judge on content, not use_count) and 'each has a distinct trigger' (rule 5: pairwise distinctness is the wrong bar). Config-aware parent inheritance. _run_llm_review() was building AIAgent() without explicit provider/model, hitting an auto-resolve path that returned empty credentials → HTTP 400 'No models provided' against OpenRouter. Fork now inherits the user's main provider and model (via load_config + resolve_runtime_provider) before spawning — runs on whatever the user is currently on, OAuth-backed or pool-backed included. Unbounded iteration ceiling. max_iterations=8 was way too low for an umbrella-build pass over hundreds of skills. 
A live pass takes 50-100 API calls (scanning, clustering, skill_view'ing candidates, patching umbrellas, mv'ing siblings). Raised to 9999 — the natural stopping criterion is 'no more clusters worth processing', not an arbitrary tool-call budget. Tests updated: test_curator_review_prompt_has_invariants accepts DO NOT / MUST NOT and drops 'keep' from the required-verb set (the umbrella-first prompt correctly deemphasizes 'keep' as a first-class decision label since passive keep-everything is the failure mode being prevented). Added test_curator_review_prompt_is_umbrella_first asserting the umbrella framing, class-level thinking, references/ + templates/ + scripts/ support-file mentions, and the 'use_count is not evidence of value' pre-emption. Added test_curator_review_prompt_offers_support_file_actions asserting skill_manage action=create and action=write_file are both named. Live validation on author's setup: - Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome - Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella - Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.	2026-04-28 22:33:33 -07:00
Teknium	a12f7aa8bb	fix(curator): default cycle is every 7 days, not 24 hours Weekly is closer to how skill churn actually works — most agent-created skills don't change multiple times per day, so a daily review is pure cost without benefit. Bumping the default to 7 days reduces aux-model spend while still catching drift and staleness on the timescales that matter (30d stale, 90d archive). Changes: - DEFAULT_INTERVAL_HOURS: 24 -> 168 (7 days) - config.yaml default: interval_hours: 24 -> 24 * 7 - CLI status line renders as '7d' when interval is a whole-day multiple - Test `test_old_run_eligible` decoupled from the exact default: it now uses 2 * get_interval_hours() so future tweaks don't break it	2026-04-28 22:33:33 -07:00
Teknium	c8b7e7268a	refactor(curator): point review prompt at existing tools The LLM review prompt mentioned bespoke `archive_skill` and `pin_skill` tools that are not registered as model tools. Swap the prompt to rely on the real surface: - skill_manage action=patch — for patching and consolidation - terminal — to `mv` skill dirs into .archive/ Also drop `pin` from the model's decision list — pinning is a user opt-out for `hermes curator pin <skill>`, not something the model should do autonomously. Decision list is now: keep / patch / consolidate / archive. Tests updated: prompt-invariant test now asserts the existing tools are referenced and that bespoke tool names do NOT appear. New test prevents `pin` from being re-added as a model decision.	2026-04-28 22:33:33 -07:00
Teknium	bc79e227e6	feat(curator): background skill maintenance (issue #7816 ) Adds the Curator — an auxiliary-model background task that periodically reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage, transitions unused skills through active → stale → archived, and spawns a forked AIAgent to consolidate overlaps and patch drift. Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI startup and gateway boot when the last run is older than interval_hours (default 24) AND the agent has been idle for min_idle_hours (default 2). Invariants (all load-bearing): - Never touches bundled or hub-installed skills (.bundled_manifest + .hub/lock.json double-filter) - Never auto-deletes — archive only. Archives are recoverable via `hermes curator restore <skill>` - Pinned skills bypass all auto-transitions - Uses the aux client; never touches the main session's prompt cache New files: - tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes, provenance filter - agent/curator.py — orchestrator: config, idle gating, state-machine transitions (pure, no LLM), forked-agent review prompt - hermes_cli/curator.py — `hermes curator {status,run,pause,resume, pin,unpin,restore}` subcommand - tests/tools/test_skill_usage.py — 29 tests - tests/agent/test_curator.py — 25 tests Modified files (surgical patches): - tools/skills_tool.py — bump view_count on successful skill_view - tools/skill_manager_tool.py — bump patch_count on skill_manage patch/edit/write_file/remove_file; forget record on delete - hermes_cli/config.py — add curator: section to DEFAULT_CONFIG - hermes_cli/commands.py — add /curator CommandDef with subcommands - hermes_cli/main.py — register `hermes curator` subparser via register_cli() from hermes_cli.curator - cli.py — /curator slash-command dispatch + startup hook - gateway/run.py — gateway-boot hook (mirrors CLI) Validation: - 54 new tests across skill_usage + curator, all passing in 3s - 346 tests across all touched 
files' neighbors green - 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green - CLI smoke: `hermes curator status/pause/resume` work end-to-end Companion to PR #16026 (class-first skill review prompt) — together they form a loop: the review prompt stops near-duplicate skill creation at the source, and the curator prunes/consolidates what still accumulates. Refs #7816.	2026-04-28 22:33:33 -07:00
Mil Wang (from Dev Box)	88602376d4	fix: resolve external_dirs relative to HERMES_HOME instead of cwd (#9949) Relative entries in skills.external_dirs were resolved against the process cwd via Path.resolve(), making them silently fail when Hermes was launched from a different directory. Resolve relative paths against get_hermes_home() for consistent behavior across CLI, gateway, and cron contexts. Absolute paths and env-var/tilde expansion are unchanged.	2026-04-28 22:29:09 -07:00
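A minimal sketch of the resolution rule in the commit above, assuming a hypothetical helper `resolve_external_dir` that is handed the Hermes home directory explicitly (the real code obtains it from `get_hermes_home()`):

```python
import os
from pathlib import Path


def resolve_external_dir(entry: str, hermes_home: Path) -> Path:
    """Resolve one skills.external_dirs entry (illustrative only)."""
    # Env-var and tilde expansion happen first, as before the fix.
    expanded = Path(os.path.expanduser(os.path.expandvars(entry)))
    if expanded.is_absolute():
        return expanded
    # Relative entries are anchored at HERMES_HOME, not the process cwd.
    return (hermes_home / expanded).resolve()
```

With this shape the result no longer depends on the directory Hermes was launched from.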
Teknium	8c892c1453	refactor(redact): canonical mask_secret helper; fix status.py DIM drift (#17207 ) Three modules independently implemented the same "preserve head+tail of a secret, mask the middle" logic with slightly different behaviors that had started to drift: hermes_cli/config.py redact_key — 12-char floor, 4+4, DIM '(not set)' hermes_cli/status.py redact_key — 12-char floor, 4+4, plain '(not set)' ← drift hermes_cli/dump.py _redact — 12-char floor, 4+4, empty string The visible bug: 'hermes status' displayed the '(not set)' placeholder in plain text while 'hermes config' showed it in dim text. Same concept, inconsistent UI. Introduces mask_secret() in agent/redact.py as the canonical helper, with head/tail/floor/placeholder/empty kwargs. The three call sites become one-line wrappers that differ only in the 'empty' handling: config.redact_key → mask_secret(k, empty=color('(not set)', Colors.DIM)) status.redact_key → mask_secret(k, empty=color('(not set)', Colors.DIM)) dump._redact → mask_secret(v) # empty → '' agent.redact._mask_token (log redactor, different policy: 18-char floor, 6+4 visible, '*' on empty) also ports to mask_secret but retains its own empty-case handling to preserve the historical '' return. Net: the three display-time redactors now agree on formatting, the canonical helper lives in one place, and future tweaks (e.g. adding bullet-point masking, changing the head/tail widths) happen once. Verified: - 3/3 tests/hermes_cli/test_web_server.py::TestRedactKey pass - 89/89 agent/tests/test_redact.py + tests/tools/test_browser_secret_exfil.py + tests/hermes_cli/test_redact_config_bridge.py pass - Live 'hermes status', 'hermes config', 'hermes dump' all render the same way they did before (verified against actual env with real keys: OpenRouter, Firecrawl, Browserbase, FAL, Tinker all show 'prefix...suffix'; Kimi shows '**' at <12 chars; unset shows '(not set)' uniformly). 
Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 21:04:35 -07:00
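As an illustration of the canonical helper described above, here is a sketch of a `mask_secret` with the head/tail/floor/placeholder/empty keyword arguments the commit names. The defaults follow the "12-char floor, 4+4" policy from the message; the body and the fixed `'***'` returned for short secrets are assumptions, not the project's implementation.

```python
def mask_secret(value, *, head=4, tail=4, floor=12,
                placeholder="...", empty=""):
    """Keep the head and tail of a secret and mask the middle (sketch)."""
    if not value:
        # Call sites differ only in what they show for an unset secret.
        return empty
    if len(value) < floor:
        # Too short to reveal any characters safely (assumed mask).
        return "***"
    return f"{value[:head]}{placeholder}{value[-tail:]}"


print(mask_secret("sk-abcdefghijklmnop"))      # sk-a...mnop
print(mask_secret(None, empty="(not set)"))    # (not set)
```

Each display-time wrapper then reduces to a single call that passes its own `empty` value.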
Rugved Somwanshi	a0105a7f81	chore(agent): drop drift from rebasing	2026-04-28 12:27:36 -07:00
Rugved Somwanshi	01ad0aacaf	fix(tui): show correct context length	2026-04-28 12:27:36 -07:00
Rugved Somwanshi	214ca943ac	feat(agent): add lmstudio integration	2026-04-28 12:27:36 -07:00
Teknium	b5128a751b	perf(startup): lazy-import OpenAI, Anthropic, Firecrawl, account_usage (#17046 ) * perf(startup): lazy-import OpenAI, Anthropic, Firecrawl, account_usage Four heavy SDK/module imports are now deferred off the hot startup path. Net savings on cold module imports: cli 1200 → 958 ms (-242) run_agent 1220 → 901 ms (-319) tools.web_tools 711 → 423 ms (-288) agent.anthropic_adapter 230 → 15 ms (-215) agent.auxiliary_client 253 → 68 ms (-185) Four independent changes in one PR since they all use the same pattern and share the same risk profile (heavy SDK import → lazy proxy or function-local import): 1. tools/web_tools.py: 'from firecrawl import Firecrawl' moved into _get_firecrawl_client(), which is only called when backend='firecrawl'. Users on Exa/Tavily/ Parallel pay zero firecrawl cost. 2. cli.py + gateway/run.py: 'from agent.account_usage import ...' moved into the /limits handlers. account_usage transitively pulls the OpenAI SDK chain; only needed when the user runs /limits. 3. agent/anthropic_adapter.py: 'try: import anthropic as _anthropic_sdk' replaced with a cached '_get_anthropic_sdk()' accessor. The three usage sites (build_anthropic_client, build_anthropic_bedrock_client, read_claude_code_credentials_from_keychain) now resolve via the accessor. All pre-existing test patches of 'agent.anthropic_adapter._anthropic_sdk' keep working because the accessor respects any value already in module globals. 4. agent/auxiliary_client.py AND run_agent.py: 'from openai import OpenAI' replaced with an '_OpenAIProxy()' module- level object that looks like the OpenAI class but imports the SDK on first call/isinstance check. This preserves: - 15+ in-module OpenAI(...) 
construction sites in auxiliary_client and the single site in run_agent's _create_openai_client (Python's function-scope name lookup finds the proxy, forwards the call); - 'patch("agent.auxiliary_client.OpenAI", ...)' and 'patch("run_agent.OpenAI", ...)' test patterns used by 28+ test files (patch replaces the module attribute as usual). Tried two alternatives first: - 'from openai._client import OpenAI' — doesn't skip openai/__init__.py (the audit's hypothesis here was wrong). - Module-level __getattr__ — works for external access but Python function-scope name resolution skips __getattr__, so in-module OpenAI(...) calls NameError. Note: 'openai' still loads on 'import cli' because cli.py -> neuter_async_httpx_del() -> openai._base_client, and run_agent.py -> code_execution_tool.py (module-level build_execute_code_schema) -> _load_config() -> 'from cli import CLI_CONFIG'. Deferring those is a separate, larger change — out of scope for this PR. The savings above all come from avoiding the openai/, anthropic/, and firecrawl/* top-level type-tree imports on paths that don't need them. Verified: - 302/302 tests in tests/agent/{test_anthropic_adapter, test_bedrock_1m_context, test_minimax_provider, test_anthropic_keychain} pass. Two pre-existing failures on main unchanged. - 106/106 tests/agent/test_auxiliary_client.py pass (1 pre-existing fail). - 97/97 tests/run_agent/test_create_openai_client_kwargs_isolation.py, test_plugin_context_engine_init.py, test_invalid_context_length_warning.py, test_api_max_retries_config.py, tests/hermes_cli/test_gemini_provider.py, test_ollama_cloud_provider.py pass (1 pre-existing fail). - Live hermes chat smoke: 2 turns + /model switch + tool calls, zero errors in the 57-line agent.log window. - Module-level import of run_agent + auxiliary_client + anthropic_adapter no longer pulls 'anthropic' or 'firecrawl' at all. 
* fix(gateway): restore top-level account_usage import for test-patch surface CI caught two failures in tests/gateway/test_usage_command.py that I missed locally: AttributeError: 'module' object at gateway.run has no attribute 'fetch_account_usage' The test uses monkeypatch.setattr('gateway.run.fetch_account_usage', ...) to inject a fake account-fetch call. Moving the import inside the handler deleted that module-level attribute, breaking the patch surface. Restoring the top-level import in gateway/run.py gives up the ~230 ms gateway-boot savings from that one lazy, but: 1. the gateway is a long-running daemon — boot cost is paid once per install, not per turn; 2. the other four lazy-imports (firecrawl, openai, anthropic, cli's account_usage) remain in place and still account for the bulk of the savings reported in the PR body; 3. preserving the patch surface keeps the established 'gateway.run.fetch_account_usage' monkeypatch pattern working without touching tests. Verified: tests/gateway/test_usage_command.py — 8 passed, 0 failed. Full targeted sweep (2336 tests across agent/gateway/hermes_cli/run_agent): 2332 passed, 4 failed — all 4 pre-existing on main. --------- Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 09:38:42 -07:00
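The module-level proxy technique from this PR can be sketched with a standard-library class standing in for the heavy SDK. `LazyClassProxy` is a hypothetical name, and `fractions.Fraction` is used only so the example runs without third-party packages; the real `_OpenAIProxy` may differ in detail.

```python
import importlib


class LazyClassProxy:
    """Looks like a class at the call site, imports it on first use."""

    def __init__(self, module_name, attr):
        self._module_name = module_name
        self._attr = attr
        self._target = None

    def _resolve(self):
        if self._target is None:
            # The expensive import is paid here, not at module import time.
            module = importlib.import_module(self._module_name)
            self._target = getattr(module, self._attr)
        return self._target

    def __call__(self, *args, **kwargs):
        # Proxy(...) constructs an instance of the real class.
        return self._resolve()(*args, **kwargs)

    def __instancecheck__(self, instance):
        # Makes isinstance(obj, proxy) behave like isinstance(obj, RealClass).
        return isinstance(instance, self._resolve())


# Module-level name that in-module code and test patches can both reference.
Fraction = LazyClassProxy("fractions", "Fraction")

half = Fraction(1, 2)
print(half + half == 1)            # True
print(isinstance(half, Fraction))  # True
```

Because the proxy is an ordinary module attribute, function bodies that call `Fraction(...)` find it through normal global lookup, and `patch("module.Fraction", ...)` still replaces it, which is the property the commit says a module-level `__getattr__` could not provide.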
Teknium	1d8b9e6458	fix(auxiliary): auto-detect Anthropic Messages transport for all aux clients (#17027 ) Auxiliary tasks (title_generation, vision, compression, web_extract, session_search) now pick the correct wire protocol based on the endpoint, not just on which resolve_provider_client branch built the client. Fixes 404s on Kimi Coding Plan and any other named provider whose endpoint speaks Anthropic Messages. Root cause: the 'api_key' branch of resolve_provider_client (and the Step 2 fallback chain inside _resolve_auto) always built a plain OpenAI client regardless of what the endpoint actually spoke. For provider=kimi-coding + model=kimi-for-coding, that meant: POST https://api.kimi.com/coding/v1/chat/completions { "model": "kimi-for-coding", ... } → 404 resource_not_found_error The /coding route only accepts the Anthropic Messages shape (the main agent already uses api_mode=anthropic_messages for it). Earlier fixes (#16819, #22ddac4b1) patched the anonymous-custom, named-custom, and external-process branches — but the named api_key branch (kimi-coding, minimax, zai, future /anthropic providers) was the fourth sibling and never got the same treatment. Fix: one module-level helper _maybe_wrap_anthropic() that rewraps a plain OpenAI client in AnthropicAuxiliaryClient when: - api_mode is explicitly 'anthropic_messages', OR - the URL ends in '/anthropic', OR - the host is api.kimi.com + path contains '/coding', OR - the host is api.anthropic.com. Wired into _wrap_if_needed (covers all resolve_provider_client branches that already go through it) and into the Step 2 api_key fallback chain inside _resolve_auto. Explicit api_mode still wins: passing api_mode='chat_completions' forces OpenAI wire, and already- wrapped specialized adapters (Codex, Gemini native, CopilotACP) pass through unchanged. 
E2E verified: - resolve_provider_client('kimi-coding', 'kimi-for-coding') → AnthropicAuxiliaryClient (was plain OpenAI, which 404'd) - _resolve_auto Step 1 for kimi-coding runtime → AnthropicAuxiliaryClient - resolve_provider_client('openrouter', ...) → plain OpenAI (no regression) - api_mode='chat_completions' override → plain OpenAI (explicit wins) Tests: - tests/agent/test_auxiliary_transport_autodetect.py (new): 21 tests covering URL detection, wrap decisions, and integration. - 204/205 existing auxiliary tests pass (1 pre-existing failure on main, unrelated to this change). Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 06:50:14 -07:00
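The four detection rules listed for `_maybe_wrap_anthropic()` can be expressed as a small predicate. This sketch covers only the decision, not the client rewrapping; the function name `speaks_anthropic_messages` is hypothetical, and treating any explicit `api_mode` other than `'anthropic_messages'` as "do not wrap" is an assumption.

```python
from urllib.parse import urlparse


def speaks_anthropic_messages(base_url, api_mode=None):
    """Return True when the endpoint should get the Anthropic Messages wire."""
    if api_mode is not None:
        # An explicit api_mode wins over URL heuristics.
        return api_mode == "anthropic_messages"
    parsed = urlparse(base_url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.rstrip("/")
    if path.endswith("/anthropic"):
        return True
    if host == "api.kimi.com" and "/coding" in path:
        return True
    return host == "api.anthropic.com"


print(speaks_anthropic_messages("https://api.kimi.com/coding/v1"))  # True
print(speaks_anthropic_messages("https://openrouter.ai/api/v1"))    # False
```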
Teknium	6085d7a93e	chore: remove unused imports and dead locals (ruff F401, F841) (#17010 ) Mechanical cleanup across 43 files — removes 46 unused imports (F401) and 14 unused local variables (F841) detected by `ruff check --select F401,F841`. Net: -49 lines. Also fixes a latent NameError in rl_cli.py where `get_hermes_home()` was called at module line 32 before its import at line 65 — the module never imported successfully on main. The ruff audit surfaced this because it correctly saw the symbol as imported-but-unused (the call happened before the import ran); the fix moves the import to the top of the file alongside other stdlib imports. One `# noqa: F401` kept in hermes_cli/status.py for `subprocess`: tests monkeypatch `hermes_cli.status.subprocess` as a regression guard that systemctl isn't called on Termux, so the name must exist at module scope even though the module body doesn't reference it. Docstring explains the reason. Also fixes an invalid `# noqa:` directive in gateway/platforms/discord.py:308 that lacked a rule code. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 06:46:45 -07:00
Teknium	391f1ca1f4	feat(aux): translate extra_body.reasoning into Codex Responses API (#17004 ) Auxiliary callers that configure reasoning via auxiliary.<task>.extra_body.reasoning were having that config silently dropped by the Codex Responses adapter — it only forwarded messages/model/tools through to responses.stream(), never translating chat.completions-shaped reasoning hints into the Responses API's top-level reasoning + include fields. Mirror the main-agent translation from agent/transports/codex.py: - extra_body.reasoning.effort → resp_kwargs.reasoning.{effort, summary:"auto"} - 'minimal' → 'low' clamp (Codex backend rejects 'minimal') - Always include ['reasoning.encrypted_content'] when reasoning is enabled - {'enabled': False} → omit reasoning and include entirely - Non-dict reasoning values are ignored defensively Reported by @OP (Apr 26 feedback bundle). ## Changes - agent/auxiliary_client.py: _CodexCompletionsAdapter.create() now reads and translates extra_body.reasoning before calling responses.stream() - tests/agent/test_auxiliary_client.py: 9 new tests covering all effort levels, the minimal→low clamp, the disabled path, the no-op paths, and defensive handling of wrong-shape inputs Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 05:47:42 -07:00
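The translation rules in the commit above map naturally onto one small function. This is a sketch under stated assumptions: `translate_reasoning` is a hypothetical name, and treating a reasoning dict without a string `effort` as a no-op is a guess at the "no-op paths" the message mentions.

```python
def translate_reasoning(extra_body):
    """Map chat.completions-style reasoning hints to Responses API kwargs."""
    if not isinstance(extra_body, dict):
        return {}
    reasoning = extra_body.get("reasoning")
    if not isinstance(reasoning, dict):
        return {}  # non-dict values are ignored defensively
    if reasoning.get("enabled") is False:
        return {}  # disabled: omit both reasoning and include
    effort = reasoning.get("effort")
    if not isinstance(effort, str):
        return {}  # assumed no-op when no effort is configured
    if effort == "minimal":
        effort = "low"  # the Codex backend rejects 'minimal'
    return {
        "reasoning": {"effort": effort, "summary": "auto"},
        "include": ["reasoning.encrypted_content"],
    }


print(translate_reasoning({"reasoning": {"effort": "minimal"}}))
# {'reasoning': {'effort': 'low', 'summary': 'auto'}, 'include': ['reasoning.encrypted_content']}
```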
Teknium	06164a7b28	fix(codex): resync pool entry from auth.json after reauth (#17001 ) When openai-codex tokens expire or the ChatGPT account hits a 429 window, the pool entry gets marked STATUS_EXHAUSTED with last_error_reset_at many hours in the future. If the user then runs `hermes model` / `hermes auth openai-codex` to reauth, fresh tokens land in ~/.hermes/auth.json but the pool entry stayed frozen behind its reset_at — every request kept failing with 'credential pool: no available entries (all exhausted or empty)' until the original window elapsed. _available_entries() already had auth.json/credentials-file resync branches for anthropic/claude_code and nous/device_code; openai-codex was missing. Added _sync_codex_entry_from_auth_store() mirroring the nous version (reads state["tokens"][{access,refresh}_token] + state["last_refresh"]) and wired it into the exhausted-entry resync loop. Also softens the 'codex CLI not found' doctor warning — native device-code OAuth does not require the Codex binary, only importing existing Codex CLI tokens does. Downgraded to an info line. Reported on Discord by p1aceho1der: Codex stalled indefinitely after a rate-limit reset, reauth didn't help, and doctor falsely warned that the codex CLI was required. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 05:43:09 -07:00
teknium1	529eb29b6a	fix(gemini): clamp Flash thinkingLevel to documented low/medium/high set Gemini 3 Flash documents low/medium/high as the accepted thinkingLevel values. The salvaged bridge was forwarding Hermes' "minimal" effort to Flash verbatim, which is not a documented Gemini level and risks a 400 from the native adapter. Clamp minimal->low on Flash (matching how Pro already clamps minimal+low down), and funnel anything outside {low, medium, high} into medium to keep the request valid by construction. No behaviour change for the documented effort levels.	2026-04-28 05:38:23 -07:00
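The Flash clamp described above amounts to a three-way mapping. A minimal sketch, with `clamp_flash_thinking_level` as a hypothetical name:

```python
def clamp_flash_thinking_level(effort):
    """Map a Hermes effort label onto Flash's documented thinkingLevel set."""
    if effort == "minimal":
        return "low"  # 'minimal' is not a documented Gemini level
    if effort in {"low", "medium", "high"}:
        return effort  # documented levels pass through unchanged
    return "medium"    # anything else is funnelled into a valid value


print(clamp_flash_thinking_level("minimal"))  # low
print(clamp_flash_thinking_level("xhigh"))    # medium
```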
Nanako0129	dbbe2d1973	fix(gemini): bridge reasoning_config into thinking_config for chat-completions routes	2026-04-28 05:38:23 -07:00
teknium1	315a11a76f	chore(prompt): tell telegram models to prefer bullets over tables Telegram has no native table syntax. The gateway auto-rewrites pipe tables into row-group bullets (see previous commit), but letting models know up front means they emit the clean form directly instead of relying on post-processing to synthesize headings. Also helps users whose MEMORY.md formatting policies were being overridden — the platform hint now carries the guidance.	2026-04-28 05:37:50 -07:00
Teknium	b61d9b297a	refactor: consolidate symlink-safe atomic replace into shared helper Extract the islink/realpath guard from the 16743 fix into a single atomic_replace() helper in utils.py, then migrate every os.replace() call site in the codebase to use it. The original PR #16777 correctly identified and fixed the bug, but only patched 9 of ~24 call sites. The same bug class (managed deployments that symlink state files silently losing the link on every write) still existed at auth.json, sessions file, gateway config, env_loader, webhook subscriptions, debug store, model catalog, pairing, google OAuth, nous rate guard, and more. Rather than add another 10+ copies of the same three-line guard, consolidate into atomic_replace(tmp, target) which: - resolves symlinks via os.path.realpath before os.replace - returns the resolved real path so callers can re-apply permissions - is a drop-in replacement for os.replace at the use sites Changes: - utils.py: new atomic_replace() helper + atomic_json_write / atomic_yaml_write now call it instead of inlining the guard - 16 files: all os.replace() call sites migrated to atomic_replace() - agent/{google_oauth, nous_rate_guard, shell_hooks}.py - cron/jobs.py - gateway/{pairing, session, platforms/telegram}.py - hermes_cli/{auth, config, debug, env_loader, model_catalog, webhook}.py - tools/{memory_tool, skill_manager_tool, skills_sync}.py Tests: tests/test_atomic_replace_symlinks.py pins the invariant for atomic_replace + atomic_json_write + atomic_yaml_write, covers plain files, first-time creates, broken symlinks, and permission preservation. Refs #16743 Builds on #16777 by @vominh1919.	2026-04-28 04:58:22 -07:00
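A sketch of what a helper with the behaviour listed above might look like. The signature `atomic_replace(tmp, target)` and the returned real path come from the commit message; the body is an assumption built on `os.path.realpath` and `os.replace`.

```python
import os


def atomic_replace(tmp_path, target_path):
    """Replace target with tmp without destroying a symlink at target."""
    # Resolve the link first so the file it points to is overwritten,
    # rather than the link itself being swapped for a regular file.
    real_target = os.path.realpath(target_path)
    os.replace(tmp_path, real_target)
    # Returned so callers can re-apply permissions to the real file.
    return real_target
```

For a plain file or a first-time create, `os.path.realpath` just returns the absolute path, so the helper behaves like a direct `os.replace`. The temporary file still has to live on the same filesystem as the resolved target for the rename to stay atomic.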
vominh1919	3ab97a32d1	fix: preserve symlinks during atomic file writes (#16743) os.replace(tmp, path) replaces the symlink itself with a regular file, breaking users who symlink config.yaml, SOUL.md, or .env from ~/.hermes/ to a dotfiles repo or managed profile package. Fix: resolve symlinks via os.path.realpath() before os.replace(), so the real file is overwritten in-place while the symlink survives. Fixed in 7 files covering all os.replace call sites: - utils.py (atomic_json_write, atomic_yaml_write — fixes save_config) - hermes_cli/config.py (env sanitizer, save_env_value, remove_env_value) - tools/skill_manager_tool.py (_atomic_write_text — SOUL.md writes) - tools/memory_tool.py (memory file writes) - tools/skills_sync.py (manifest writes) - cron/jobs.py (job state + output file writes) - agent/shell_hooks.py (hook file writes) Fixes NousResearch/hermes-agent#16743	2026-04-28 04:58:22 -07:00
阿泥豆	4aa0a7c195	fix(error-classifier): add insufficient balance to billing patterns DeepSeek API returns HTTP 400 with 'Insufficient Balance' message when account funds are depleted. This pattern was not in _BILLING_PATTERNS, causing the error to be misclassified instead of triggering billing exhaustion handling (e.g., fallback to alternate provider). Suggested by teknium1 in PR review of #15586.	2026-04-28 04:58:09 -07:00
Teknium	0f473d643d	refactor(schema): consolidate nullable-union stripping in schema_sanitizer Adds tools.schema_sanitizer.strip_nullable_unions as the single implementation for collapsing anyOf/oneOf nullable unions. Both the MCP input-schema normalizer and the Anthropic tool-schema guard now delegate to it instead of re-implementing the same walk three times. The global sanitizer also gains a final pass so any tool that slips past the two earlier hooks (plugin tools, non-MCP custom tools with Pydantic-shaped schemas) still gets safe input_schemas on Anthropic. - tools/schema_sanitizer.py: * New public strip_nullable_unions(schema, keep_nullable_hint=True). * _sanitize_single_tool() calls it as a final pass (hint preserved so coerce_tool_args can still map string "null" to None). - tools/mcp_tool.py: _normalize_mcp_input_schema delegates. - agent/anthropic_adapter.py: _normalize_tool_input_schema delegates with keep_nullable_hint=False (Anthropic does not recognize nullable). No behavioral change for the fix itself; tests (73/73 targeted + E2E across MCP→sanitizer→Anthropic paths) pass.	2026-04-28 04:58:03 -07:00
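To make the nullable-union collapse concrete, here is an illustrative reimplementation. Only the name and the `keep_nullable_hint` parameter come from the commit message; the traversal, the merge order, and the use of a `nullable: true` key as the hint are assumptions, so treat this as a sketch of the idea rather than the code in `tools/schema_sanitizer.py`.

```python
def strip_nullable_unions(schema, keep_nullable_hint=True):
    """Collapse anyOf/oneOf [X, null] unions down to X (sketch)."""
    if isinstance(schema, list):
        return [strip_nullable_unions(s, keep_nullable_hint) for s in schema]
    if not isinstance(schema, dict):
        return schema
    # Walk nested schemas first (properties, items, and so on).
    out = {k: strip_nullable_unions(v, keep_nullable_hint)
           for k, v in schema.items()}
    for key in ("anyOf", "oneOf"):
        branches = out.get(key)
        if not isinstance(branches, list):
            continue
        non_null = [b for b in branches
                    if not (isinstance(b, dict) and b.get("type") == "null")]
        had_null = len(non_null) < len(branches)
        if had_null and len(non_null) == 1 and isinstance(non_null[0], dict):
            out.pop(key)
            # Sibling keys such as "description" override the branch's own.
            out = {**non_null[0], **out}
            if keep_nullable_hint:
                out["nullable"] = True  # assumed form of the hint
    return out
```

Unions with two or more non-null branches are left untouched here, which matches the gap the later top-level oneOf/allOf/anyOf fix in this log was written to close.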
Pony.Ma	02ae152222	fix(mcp): normalize nullable tool schemas	2026-04-28 04:58:03 -07:00
Ruda Porto Filgueiras	a23f18cc3e	fix(bedrock): add live model discovery and region resolution for non-US regions provider_model_ids("bedrock") fell through to a static _PROVIDER_MODELS table containing only hardcoded us.* model IDs. Users configured for non-US AWS regions (eu-central-1, ap-northeast-1, etc.) saw wrong or no models in /model and autocomplete. Root causes fixed: 1. models.py: provider_model_ids() now calls discover_bedrock_models() keyed by the resolved region before falling back to the static table. A new bedrock_model_ids_or_none() helper in bedrock_adapter.py consolidates the discover -> extract IDs -> fallback pattern used by all three call sites. 2. providers.py: registers bedrock in HERMES_OVERLAYS with transport=bedrock_converse and auth_type=aws_sdk so get_provider("bedrock") and resolve_provider_full("bedrock") work. 3. model_switch.py: list_authenticated_providers() sections 2 and 3 detect AWS credentials via has_aws_credentials() for aws_sdk overlays and use live discovery for the model list. 4. bedrock_adapter.py: resolve_bedrock_region() reads the configured region from botocore.session before falling back to us-east-1, covering users who set their region in ~/.aws/config via a named profile rather than env vars. 5. tui_gateway/server.py: passes provider= to get_model_context_length() so context window lookups work correctly for the Bedrock provider.	2026-04-28 03:53:11 -07:00
Teknium	023f5c74b1	fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path (#16957 ) * fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path OAuth requests now identify as Hermes on the wire. Removed: - "You are Claude Code, Anthropic's official CLI for Claude." system prompt prepend - Hermes Agent → Claude Code / Nous Research → Anthropic system-prompt substitutions - mcp_ tool-name prefix on outgoing tool schemas + message history - Matching mcp_ strip on inbound tool_use blocks (strip_tool_prefix path removed from AnthropicTransport.normalize_response, + all 5 call sites in run_agent.py and auxiliary_client.py) - user-agent: claude-cli/<v> (external, cli) and x-app: cli headers on the Messages API client Added: - OAuth path strips context-1m-2025-08-07 — Anthropic rejects OAuth requests carrying it with HTTP 400 'This authentication style is incompatible with the long context beta header.' Kept (auth plumbing, not identity spoofing): - _is_oauth_token classifier and is_oauth flag threading - Bearer vs x-api-key auth routing - _OAUTH_ONLY_BETAS (claude-code-20250219, oauth-2025-04-20) — backend requires these on the OAuth-gated Messages endpoint - _OAUTH_CLIENT_ID (Claude Code's) — Anthropic doesn't issue OAuth creds to third parties; this is the only way the login flow works - claude-cli/<v> User-Agent on the OAuth token exchange + refresh endpoints at platform.claude.com/v1/oauth/token — bare requests get Cloudflare 1010 blocked Verified live against api.anthropic.com with a fresh sk-ant-oat01-* token: - claude-haiku-4-5 simple message: HTTP 200, 'OK' response - claude-haiku-4-5 tool call: HTTP 200, stop_reason=tool_use, tool named 'terminal' (no mcp_ prefix) round-tripped correctly - Outgoing wire: no user-agent, no x-app, real Hermes identity in system prompt, real tool name in schema Closes/supersedes #16820 (mcp_ PascalCase normalization patch — no longer needed since the mcp_ round-trip is gone). 
* fix(anthropic): resolve_anthropic_token() reads credential pool first Close the gap where ~/.hermes/auth.json → credential_pool.anthropic (where hermes login + dashboard PKCE flow write OAuth tokens) was not in resolve_anthropic_token()'s source list. Before: users who authed via hermes login got the token written into the pool, but legacy fallback code paths (auxiliary_client, models catalog fetch, explicit-runtime path) that call resolve_anthropic_token() saw None and raised 'No Anthropic credentials found' — even though the token was sitting in auth.json. New priority 1: pool.select() with env-sourced entries skipped. Skipping env:* entries preserves the existing env-var priority logic further down the chain (static env OAuth → refreshable Claude Code upgrade via _prefer_refreshable_claude_code_token). Surfaced while writing the hermes-agent-dev skill playbook for 'finding a live OAuth token for an E2E test'. --------- Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 03:51:17 -07:00
simonweng	a6a6cf047d	feat(providers): add tencent-tokenhub provider support Registers tencent-tokenhub (https://tokenhub.tencentmaas.com/v1) as a new API-key provider with model tencent/hy3-preview (256K context). - PROVIDER_REGISTRY entry + TOKENHUB_API_KEY / TOKENHUB_BASE_URL env vars - Aliases: tencent, tokenhub, tencent-cloud, tencentmaas - openai_chat transport with is_tokenhub branch for top-level reasoning_effort (Hy3 is a reasoning model) - tencent/hy3-preview:free added to OpenRouter curated list - 60+ tests (provider registry, aliases, runtime resolution, credentials, model catalog, URL mapping, context length) - Docs: integrations/providers.md, environment-variables.md, model-catalog.json Author: simonweng <simonweng@tencent.com> Salvaged from PR #16860 onto current main (resolved conflicts with #16935 Azure Anthropic env-var hint tests and the --provider choices= list removal in chat_parser).	2026-04-28 03:45:52 -07:00
Teknium	e63364b8df	revert: computer-use cua-driver (PR #16919) (#16927) Reverts PR #16919 (commits `dad10a78d`, `413ee1a28`, `b4a8031b2`, `afb958829`) which was merged prematurely. Restoring the pre-merge state so #14817 and #15328 can be revisited as standing PRs. Reverted commits: - `afb958829` fix(computer-use): harden image-rejection fallback + AUTHOR_MAP - `b4a8031b2` fix(computer-use): unwrap _multimodal tool results - `413ee1a28` feat(computer-use): background focus-safe backend - `dad10a78d` feat(computer-use): cua-driver backend, universal any-model schema Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 01:57:21 -07:00
teknium1	22ddac4b14	fix(auxiliary): widen URL rewrite + main_runtime to sibling custom branches Follow-up to PR #16819 applying the same treatment to the two sibling fallback sites in resolve_provider_client() that carry the identical bug class as the anonymous-custom branch: - Named custom provider (providers: / custom_providers: config entries): apply _to_openai_base_url() on the OpenAI-wire path (chat_completions / codex_responses), leave custom_base untouched on the anthropic_messages path where the /anthropic surface is intentional. Prefer main_runtime.get('model') over _read_main_model() so the entry model still wins first. The ImportError fallback for anthropic_messages now redoes query-param extraction against the rewritten URL so the final OpenAI client hits /v1. - external_process branch (copilot-acp): same main_runtime.get('model') fallback before _read_main_model() so auxiliary tasks on this provider track live /model switches instead of stale config.yaml. Keeps the fix consistent across all three custom-endpoint fallback sites in resolve_provider_client().	2026-04-28 01:47:25 -07:00
crayfish-ai	f3371c39a4	fix(auxiliary): custom provider URL rewrite + main_runtime model for title gen - auxiliary_client: apply _to_openai_base_url() to custom base_url (fixes /anthropic → /v1 rewrite missing for provider="custom") - auxiliary_client: use main_runtime.get("model") instead of _read_main_model() so auxiliary tasks follow system default model changes - title_generator: thread main_runtime through generate_title → auto_title_session → maybe_auto_title - cli.py / gateway/run.py: pass main_runtime to maybe_auto_title - tests: update mock assertions for new main_runtime parameter	2026-04-28 01:47:25 -07:00
ddupont	413ee1a286	feat(computer-use): background focus-safe backend — set_value, structured windows, MIME detection Extends the cua-driver computer-use backend to drive backgrounded macOS windows without stealing keyboard or mouse focus from the foreground app. All changes target the cua-driver MCP backend and the shared dispatcher. ## cua_backend.py Window-aware capture: capture() now calls list_windows + get_window_state instead of the removed capture tool. Prefers structuredContent.windows (MCP 2024-11-05+ cua-driver) for zero-parse window enumeration; falls back to regex-parsed text for older builds. Stores the selected (pid, window_id) as sticky context so subsequent action calls do not need a redundant round-trip. Action routing: click/scroll/type_text/key all carry the sticky pid (and window_id for element-indexed clicks). type_text routes through type_text_chars (individual key events) rather than AX attribute write -- WebKit AXTextFields reject attribute writes from backgrounded processes. Key parsing: _parse_key_combo splits cmd+s-style strings into (key, [modifiers]) and routes to hotkey (modifier present) or press_key (bare key) -- cua-driver actual tool names. set_value method: new set_value(value, element) calls the cua-driver set_value MCP tool. For AXPopUpButton / HTML select in a backgrounded Safari, AXPress opens the native macOS popup which closes immediately when the app is non-frontmost; set_value AX-presses the matching child option directly (no menu required, no focus steal). focus_app: reimplemented as a pure window-selector (enumerates list_windows, sets sticky pid/window_id) without ever raising the window or stealing focus. list_apps: fixed tool name from listApps to list_apps; handles plain-text response via regex when structured data is absent. Structured-content extraction: _extract_tool_result now surfaces structuredContent from MCP results, enabling the list_windows window array without text parsing. 
Helpers: _parse_windows_from_text, _parse_elements_from_tree, _split_tree_text, _parse_key_combo extracted as module-level functions. ## schema.py Added set_value to the action enum with a description explaining when to prefer it over click (select/popup elements, sliders, no focus steal). Added value field for set_value payloads. ## tool.py Routed set_value action through _dispatch to backend.set_value. Added set_value to _DESTRUCTIVE_ACTIONS (approval-gated). Fixed MIME-type detection in _capture_response: cua-driver may return JPEG; detect from base64 magic bytes (/9j/ -> image/jpeg, else image/png) rather than hardcoding image/png. ## agent/display.py + run_agent.py Guard _detect_tool_failure and result-preview logic against non-string function_result values: multimodal tool results (dicts with _multimodal=True) are not string-sliceable; treat them as successes and fall back to str() for length/preview.	2026-04-28 01:46:36 -07:00
Teknium	dad10a78d0	feat(computer-use): cua-driver backend, universal any-model schema Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. 
- CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919` — deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions — the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.	2026-04-28 01:46:36 -07:00
Teknium	8081425a1c	feat(security): make secret redaction off by default (#16794) Flips security.redact_secrets from true to false in DEFAULT_CONFIG, and the HERMES_REDACT_SECRETS env-var fallback in agent/redact.py now requires explicit opt-in ("1"/"true"/"yes"/"on") to enable. New installs and users without a security.redact_secrets key get pass-through tool output. Existing users whose config.yaml explicitly sets redact_secrets: true keep redaction on — the config-yaml -> env-var bridges in hermes_cli/main.py and gateway/run.py still honor their setting. Also updates the inline config comments, website docs, and the hermes-agent skill so /hermes config set security.redact_secrets true is now the documented way to turn it on.	2026-04-27 21:24:08 -07:00
Teknium	a7cdd4133c	fix(bedrock): send context-1m-2025-08-07 beta so Opus 4.6/4.7 get 1M context (#16793) On AWS Bedrock (and Azure AI Foundry), Claude Opus 4.6/4.7 and Sonnet 4.6 are capped at 200K context unless the request carries the `context-1m-2025-08-07` beta header. On native Anthropic (api.anthropic.com) 1M went GA so the header is a harmless no-op, but Bedrock/Azure still gate it as beta as of 2026-04. Hermes was advertising 1M in model_metadata.py (`claude-opus-4-7: 1000000`) while silently sending a request without the beta — so Bedrock users saw a 200K ceiling with no error message, and no config knob unblocked it. Claude Code sends this header by default, which is why the same Bedrock credentials worked there. - Add `context-1m-2025-08-07` to `_COMMON_BETAS` (alongside interleaved thinking and fine-grained tool streaming). - Strip it in `_common_betas_for_base_url` for MiniMax bearer-auth endpoints — they host their own models, not Claude, so Anthropic beta headers are irrelevant and could risk rejection. - Attach `_COMMON_BETAS` as `default_headers` on the AnthropicBedrock client. Previously that constructor passed no betas at all, so native Anthropic had the 1M unlock via default_headers but Bedrock didn't. - Fast-mode per-request `extra_headers` already rebuilds from `_common_betas_for_base_url`, so it picks up the 1M beta automatically. Reported by user 'Rodmar' on Discord: Bedrock Opus 4.7 stuck at 200K while same credentials worked in Claude Code.	2026-04-27 20:41:36 -07:00
Teknium	6ea5699e3f	fix(compression): notify users when configured aux model fails even if main-model fallback recovers (#16775) A misconfigured auxiliary.compression.model is a user-fixable problem that silent recovery would hide. The previous retry-on-main logic transparently swallowed aux-model failures whenever the fallback succeeded, leaving the user's broken config in place and racking up future failures. Track the aux-model failure on the compressor alongside the existing fallback-placeholder fields: - _last_aux_model_failure_model: str | None - _last_aux_model_failure_error: str | None Both are set at the moment the aux model errors (captured before summary_model is cleared for retry), regardless of whether the retry succeeds. Cleared at compress() start and on on_session_reset() so a clean run doesn't leak stale warnings. Surface at three places: - gateway hygiene auto-compress: ℹ note to the platform adapter (thread_id preserved) - gateway /compress command: ℹ line appended to the reply - CLI via _emit_warning: deduped on (model, error) so repeat compactions don't spam Distinct from the existing ⚠️ dropped-turns warning — different severity, different emoji, explicit 'context is intact' reassurance.	2026-04-27 20:08:23 -07:00
Teknium	94b26f3ec9	fix(compression): retry summary on main model for unknown errors before giving up (#16774) The existing retry-on-main path in _generate_summary only fires for errors that match the _is_model_not_found heuristic (404/503, 'model_not_found', 'does not exist', 'no available channel'). Other misconfiguration errors — 400s from aggregators, provider-specific 'no route' strings, opaque rejections — fall straight through to the transient-cooldown branch, which drops N turns of context and inserts a static placeholder. Losing context is almost always worse than one extra summary attempt. Add a best-effort retry-on-main for the unknown-error branch, guarded by the same invariants as the existing fast-path retry: only when summary_model differs from main, and only once per compressor (_summary_model_fallen_back). Tests cover: 404 fast-path fallback still works, unknown 400 now falls back, same-model aux skips retry (no infinite loop), and a double-failure (aux + main) stops at 2 calls.	2026-04-27 19:25:57 -07:00
iamagenius00	e7f2204a07	fix(compression): reset _last_summary_error at start of compress() The per-call reset block at the top of compress() cleared _last_summary_dropped_count and _last_summary_fallback_used but not _last_summary_error. Functionally this didn't break the gateway warning path (callers gate on _last_summary_fallback_used first, and _last_summary_error is overwritten on the next failure), but it left the three tracking fields inconsistent — anyone reading _last_summary_error standalone after a successful compress would see a stale value from a previous failed compress. Reset all three together so the per-call contract is uniform.	2026-04-27 19:18:13 -07:00
iamagenius00	5c56805a74	fix(compression): align fallback placeholder wording with gateway warning The fallback placeholder said "N conversation turns were removed" while the gateway warning said "N historical message(s) were removed". Use "messages" in both so users don't wonder if the two counters refer to different things.	2026-04-27 19:18:13 -07:00
iamagenius00	dfdc4276e8	fix(compression): notify gateway users when summary generation fails When auxiliary compression's summary LLM call fails (e.g. model 404, auxiliary model misconfigured), the compressor still drops the selected turns and inserts a static fallback placeholder — the dropped context is unrecoverable. Previously the only signal of this was a WARNING in agent.log. Gateway users (Telegram/Discord/etc.) had no way to know context was lost because the existing _emit_warning path requires a status_callback, and the gateway hygiene path uses a temporary _hyg_agent with quiet_mode=True and no callback wired up. Changes: - ContextCompressor: track _last_summary_fallback_used and _last_summary_dropped_count on each compress() call. Cleared at the start of compress() and on session reset. - gateway/run.py hygiene: after auto-compress, inspect the temp agent's compressor; if fallback was used, send a visible ⚠️ warning to the user via the platform adapter (TG/Discord/etc.) including dropped count and the underlying error. - gateway/run.py /compress: append the same warning to the manual compress reply so users running /compress see the failure too. Acceptance: - Summary success: no user-visible warning (unchanged). - Summary failure on gateway hygiene: user receives a TG/Discord message with dropped count + error + remediation hint. - Summary failure on /compress: warning appended to the command reply. - CLI status_callback / _emit_warning path is untouched. - Test coverage: two new tests verify the tracking fields are set on failure and cleared on subsequent success.	2026-04-27 19:18:13 -07:00
Erosika	49e3a1d8ee	style: trim verbose comment blocks added by previous commit	2026-04-27 12:37:33 -07:00
Erosika	e553f6f3e4	fix(memory): narrow scrub surface to known wrapper boundaries Reviewer pushback on the original boundary-hardening commits — three overreach points pulled plugin-specific policy into shared core paths: 1. gateway/run.py hardcoded a '## Honcho Context' literal split for vision-LLM output. Plugin-format heading in framework code; could truncate legitimate output naturally containing that header. Drop the literal split; keep generic sanitize_context (the wrapper strip is plugin-agnostic). Plugin-specific cleanup belongs at the provider boundary, not the shared gateway path. 2. run_agent.run_conversation scrubbed user_message and persist_user_message before the conversation loop. User text is sacred — if a user types a literal <memory-context> tag we must not silently delete it. The producer (build_memory_context_block) is the only legitimate emitter; user input should never need the reverse op. 3. _build_assistant_message scrubbed model output before persistence. Same hazard: would silently mutate legitimate documentation/code the model emits containing the literal markers. The streaming scrubber catches real leaks delta-by-delta before content is concatenated; persist-time scrub was redundant belt-and-suspenders. 4. _fire_stream_delta stripped leading newlines from every delta unless a paragraph break flag was set. Mid-stream '\n' is legitimate markdown — lists, code fences, paragraph breaks — and chunk boundaries are arbitrary. Narrow lstrip to the very first delta of the stream only (so stale provider preamble still gets cleaned on turn start, but mid-stream formatting survives). Plus: build_memory_context_block now logs a warning when its defensive sanitize_context strips something — surfaces buggy providers returning pre-wrapped text instead of silently double-fencing. Net architectural change: scrub surface collapses from 8 sites to 3 (StreamingContextScrubber on output deltas, plugin→backend send, build_memory_context_block input-validation). 
Plugin-specific strings stay out of shared runtime paths. User input and persisted assistant output are no longer mutated. Tests: rescoped TestMemoryContextSanitization (helper-correctness only, no source-inspection of removed call sites), updated vision tests to drop '## Honcho Context' literal-split assertions, updated _build_assistant_message persistence test to assert preservation. Added: cross-turn scrubber reset, build_memory_context_block warn-on-violation, mid-stream newline preservation (plain + code fence).	2026-04-27 12:37:33 -07:00
Erosika	5ce5b17a42	fix(honcho): buffer partial memory-context spans across stream deltas sanitize_context() uses a non-greedy block regex that needs both <memory-context> open and close tags present in a single string. When a provider streams the fenced memory block across multiple deltas (typical for recalled-context leaks — the payload often arrives in 10+ 1-80 char chunks), the per-delta sanitize stripped the lone open/close tags via _FENCE_TAG_RE but let the payload in between flow straight to the UI. Adds StreamingContextScrubber: a small stateful scrubber that tracks open/close tag pairs across deltas, holds back partial-tag tails at chunk boundaries, and discards span contents wholesale (including the system-note line that fragments across deltas). Wired into _fire_stream_delta; reset per user turn; benign trailing partial-tag tails are flushed at the end of each model call. Mid-span interruption (provider drops closing tag) drops the orphaned content rather than leaking it — truncated answer > leaked memory. Follow-up to #13672 (@dontcallmejames).	2026-04-27 12:37:33 -07:00
kshitijk4poor	56724147ef	fix(providers/gmi): post-salvage review fixes - config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version is 22, so all users are past version 17 and would never be prompted for GMI_API_KEY on upgrade — consistent with how arcee was added) - auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview, stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview) - test_gmi_provider.py: fix malformed write_text() call in doctor test (was: write_text("GMI_API_KEY=* encoding="utf-8") → missing closing quote, wrote literal string 'GMI_API_KEY=* encoding=' to .env file) - test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions to match new cheaper default - docs/integrations/providers.md: add 'gmi' to inline 'Supported providers' fallback list (was only in the table, not the inline list at line ~1181) - docs/reference/cli-commands.md: add 'gmi' to --provider choices list	2026-04-27 11:17:59 -07:00
Isaac Huang	c53fcb0173	feat(providers): add GMI Cloud as a first-class API-key provider (#11955) Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider with built-in auth, aliases, model catalog, CLI entry points, auxiliary client routing, context length resolution, doctor checks, env var tracking, and docs. - auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL) - providers.py: HermesOverlay with extra_env_vars for models.dev detection - models.py: curated slash-form model catalog; live /v1/models fetch - main.py: 'gmi' in _named_custom_provider_map and --provider choices - model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated context-length probe block (GMI's /models has authoritative data) - auxiliary_client.py: alias entries; _compat_model fix for slash-form models on cached aggregator-style clients; gmi aux default model - doctor.py: GMI in provider connectivity checks - config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS - conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix) - docs: providers.md, environment-variables.md, fallback-providers.md, configuration.md, quickstart.md (expands provider table) Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local>	2026-04-27 11:17:59 -07:00
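
Note on 5ce5b17a42: the cross-delta scrubbing described in that commit (track open/close tags across deltas, hold back partial-tag tails at chunk boundaries, discard span contents) can be sketched roughly as follows. This is an illustration under assumptions, not the project's StreamingContextScrubber: the class name `SketchScrubber`, the `feed`/`flush` methods, and the helper `_partial_tail_len` are hypothetical, and only the `<memory-context>` tag pair is taken from the commit message.

```python
OPEN_TAG = "<memory-context>"
CLOSE_TAG = "</memory-context>"


def _partial_tail_len(text: str, tag: str) -> int:
    """Length of the longest proper prefix of `tag` that ends `text`."""
    for n in range(min(len(tag) - 1, len(text)), 0, -1):
        if text.endswith(tag[:n]):
            return n
    return 0


class SketchScrubber:
    """Hypothetical stand-in for a stateful stream scrubber."""

    def __init__(self) -> None:
        self._buf = ""         # held-back text that may be the start of a tag
        self._in_span = False  # True while between the open and close tags

    def feed(self, delta: str) -> str:
        """Return the portion of the stream that is safe to display."""
        data = self._buf + delta
        self._buf = ""
        out = []
        while data:
            tag = CLOSE_TAG if self._in_span else OPEN_TAG
            idx = data.find(tag)
            if idx != -1:
                if not self._in_span:
                    out.append(data[:idx])  # visible text before the span
                data = data[idx + len(tag):]
                self._in_span = not self._in_span
                continue
            # No full tag: hold back a possible partial tag at the chunk end.
            keep = _partial_tail_len(data, tag)
            if not self._in_span:
                out.append(data[:len(data) - keep])
            self._buf = data[len(data) - keep:] if keep else ""
            break
        return "".join(out)

    def flush(self) -> str:
        """End of a model call: emit a benign tail, drop an unclosed span."""
        tail = "" if self._in_span else self._buf
        self._buf = ""
        self._in_span = False
        return tail


scrubber = SketchScrubber()
chunks = ["Hello <mem", "ory-context>sec", "ret</memory-", "context> world"]
visible = "".join(scrubber.feed(c) for c in chunks) + scrubber.flush()
print(repr(visible))
```

In this sketch an unclosed span is dropped at `flush()`, mirroring the commit's stated preference for a truncated answer over leaked memory. The system-note handling mentioned in the commit is not modeled.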


772 Commits
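
Note on 413ee1a286: the MIME fix in tool.py is described as detecting JPEG from the base64 prefix `/9j/` and otherwise assuming PNG. A minimal sketch of that rule, assuming the screenshot arrives as a base64 string; the function name `guess_image_mime` is hypothetical and not taken from the repository:

```python
import base64


def guess_image_mime(b64_data: str) -> str:
    """Guess the MIME type of a base64-encoded screenshot.

    JPEG files begin with bytes FF D8 FF, which base64-encode to "/9j/".
    Anything else is treated as PNG, as in the commit description.
    """
    return "image/jpeg" if b64_data.startswith("/9j/") else "image/png"


# Build tiny payloads from the real file signatures to exercise the rule.
jpeg_b64 = base64.b64encode(b"\xff\xd8\xff\xe0").decode("ascii")
png_b64 = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")
print(guess_image_mime(jpeg_b64), guess_image_mime(png_b64))
```

Checking the encoded prefix avoids decoding the whole payload, but the "else PNG" branch is only a default: other formats would be mislabeled.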