hermes-agent-features

Author	SHA1	Message	Date
Teknium	c73594fe41	fix(skills): rescan skill_commands cache when platform scope changes (#18739 ) The process-global `_skill_commands` dict in agent/skill_commands.py was seeded by whichever platform scanned first, and `get_skill_commands()` only rescanned when the cache was empty. In a long-lived gateway process serving multiple platforms (Telegram + Discord + Slack), the first platform's `skills.platform_disabled` view was silently inherited by the others — so a skill disabled for Telegram would also disappear from Discord's slash menu, and vice versa. Track the platform scope the cache was populated for (`_skill_commands_platform`) and rescan in `get_skill_commands()` when the currently-active platform no longer matches. Platform resolution uses the same precedence as `_is_skill_disabled`: `HERMES_PLATFORM` env var then `HERMES_SESSION_PLATFORM` from the gateway session context. Fixes #14536 Salvages #14570 by LeonSGP43. Co-authored-by: LeonSGP <leon@sgp43.com>	2026-05-02 01:36:53 -07:00
Teknium	97acd66b4c	fix(curator): authoritative absorbed_into on delete + restore cron skill links on rollback (#18671 ) (#18731 ) * fix(curator): authoritative absorbed_into declarations on skill delete Closes #18671. The classification pipeline that feeds cron-ref rewriting used to infer consolidation vs pruning from two brittle signals: the curator model's post-hoc YAML summary block, and a substring heuristic scanning other tool calls for the removed skill's name. Both miss in real consolidations — the model forgets the YAML under reasoning pressure, and the heuristic misses when the umbrella's patch content describes the absorbed behavior abstractly instead of naming the old slug. When both miss, the skill falls through to 'no-evidence fallback' pruned, and #18253's cron rewriter drops the cron ref entirely instead of mapping it to the umbrella. Same observable symptom as pre-#18253: 'Skill(s) not found and skipped' at the next cron run. The fix makes the model declare intent at the moment of deletion. skill_manage(action='delete') now accepts absorbed_into: - absorbed_into='<umbrella>' -> consolidated, target must exist on disk - absorbed_into='' -> explicit prune, no forwarding target - missing -> legacy path, falls through to heuristic/YAML The curator reconciler reads these declarations off llm_meta.tool_calls BEFORE either the YAML block or the substring heuristic. Declaration wins. Fallback logic stays intact for backward compat with any caller (human or older curator conversation) that doesn't populate the arg. Changes - tools/skill_manager_tool.py: add absorbed_into param to skill_manage + _delete_skill. Validate target exists when non-empty. Reject absorbed_into=<self>. Wire through dispatcher + registry + schema. - agent/curator.py: new _extract_absorbed_into_declarations() walks tool calls for skill_manage(delete) with the arg. _reconcile_classification accepts absorbed_declarations= and treats them as authoritative. Curator prompt updated to require the arg on every delete. - Tests: 7 new skill_manager tests covering the tool contract (valid target, empty string, nonexistent target, self-reference, whitespace, backward compat, dispatcher plumbing). 11 new curator tests covering the extractor + authoritative reconciler path + mixed-legacy-and- declared runs. Validation - 307/307 targeted tests pass (curator + cron + skill_manager suites). - E2E #18671 repro: 3 narrow skills, 1 umbrella, cron job referencing all 3. Model emits NO YAML block. Heuristic misses (patch prose doesn't name old slugs). Delete calls carry absorbed_into. Result: both PR skills correctly classified 'consolidated' + cron rewritten ['pr-review-format', 'pr-review-checklist', 'stale-junk'] -> ['hermes-agent-dev']; stale-junk pruned via absorbed_into=''. - E2E backward-compat: delete without absorbed_into, model emits YAML -> routed via existing 'model' source, cron still rewritten correctly. * feat(curator): capture + restore cron skill links across snapshot/rollback Before this, rolling back a curator run restored the skills tree but cron jobs still pointed at the umbrella skills the curator had rewritten them to. The user would see their old narrow skills back on disk but their cron jobs still configured with the merged umbrella — not actually 'back to how it was'. Snapshot side: snapshot_skills() now captures ~/.hermes/cron/jobs.json alongside the skills tarball, as cron-jobs.json. The manifest gets a new 'cron_jobs' block with {backed_up, jobs_count} so rollback (and the CLI confirm dialog) can surface what's in the snapshot. If jobs.json is missing/unreadable/malformed, snapshot proceeds without cron data — the skills backup is the core guarantee; cron is additive. Rollback side: after the skills extract succeeds, the new _restore_cron_skill_links() reconciles the backed-up jobs into the live jobs.json SURGICALLY. Only 'skills' and 'skill' fields are restored, and only on jobs matched by id. Everything else about a cron job — schedule, last_run_at, next_run_at, enabled, prompt, workdir, hooks — is live state the user or scheduler has modified since the snapshot; overwriting it would regress unrelated activity. Reconciliation rules: - Job in backup AND live, skills differ → skills restored. - Job in backup AND live, skills match → no-op. - Job in backup, NOT in live → skipped (user deleted it after snapshot; their choice is later than the snapshot). - Job in live, NOT in backup → untouched (user created it after snapshot). - Snapshot missing cron-jobs.json at all → rollback still succeeds, reports 'not captured' (older pre-feature snapshots keep working). Writes go through cron.jobs.save_jobs under the same _jobs_file_lock the scheduler uses, so rollback doesn't race tick(). Also: - hermes_cli/curator.py: rollback confirm dialog now shows 'cron jobs: N (will be restored for skill-link fields only)' when the snapshot has cron data, or 'not in snapshot (<reason>)' otherwise. - rollback()'s message string includes a 'cron links: ...' clause summarizing the reconciliation outcome. Tests - 9 new cases: snapshot-with-cron, snapshot-without-cron, malformed-json captured-as-raw, full rollback-restores-skills-and-cron, rollback touches only skill fields, rollback skips user-deleted jobs, rollback leaves user-created jobs untouched, rollback still works with pre-feature snapshot that has no cron-jobs.json, standalone unit test on _restore_cron_skill_links exercising the full report shape. Validation - 484/484 targeted tests pass (curator + cron + skill_manager suites). - E2E: real snapshot_skills, real cron rewrite, real rollback. Before: ['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage']. After curator: ['hermes-agent-dev']. After rollback: ['pr-review-format', 'pr-review-checklist', 'pr-triage-salvage']. Non-skill fields (id, name, prompt) preserved across the round trip.	2026-05-02 01:29:57 -07:00
Teknium	77c0bc6b13	fix(curator): defer first run and add --dry-run preview (#18373 ) (#18389 ) * fix(curator): defer first run and add --dry-run preview (#18373) Curator was meant to run 7 days after install, not on the very first gateway tick. On a fresh install (no .curator_state), should_run_now() returned True immediately because last_run_at was None — so the gateway cron ticker fired Curator against a fresh skill library moments after 'hermes update'. Combined with the binary 'agent-created' provenance model (anything not bundled and not hub-installed), this consolidated hand-authored user workflow skills without consent. Changes: - should_run_now(): first observation seeds last_run_at='now' and returns False. The next real pass fires one full interval_hours later (7 days by default), matching the original design intent. - hermes curator run --dry-run: produces the same review report without applying automatic transitions OR permitting the LLM to call skill_manage / terminal mv. A DRY-RUN banner is prepended to the prompt and the caller skips apply_automatic_transitions. State is NOT advanced so a preview doesn't defer the next scheduled real pass. - hermes update: prints a one-liner on fresh installs pointing at --dry-run, pause, and the docs. Silent on steady state. - Docs: curator.md and cli-commands.md explain the deferred first-run behavior and warn that hand-written SKILL.md files share the 'agent-created' bucket, with guidance to pin or preview before the first pass. Tests: - test_first_run_defers replaces the old 'first run always eligible' assertion — same fixture, inverted expectation. - test_maybe_run_curator_defers_on_fresh_install covers the gateway tick path end-to-end. - Three new dry-run tests cover state-advance suppression, prompt banner injection, and apply_automatic_transitions skipping. Fixes #18373. * feat(curator): pre-run backup + rollback (#18373) Every real curator pass now snapshots ~/.hermes/skills/ into ~/.hermes/skills/.curator_backups/<utc-iso>/skills.tar.gz before calling apply_automatic_transitions or the LLM review. If a run consolidates or archives something the user didn't want touched, 'hermes curator rollback' restores the tree in one command. Dry-run is skipped — no mutation means no snapshot needed. Changes: - agent/curator_backup.py (new): tar.gz snapshot + safe rollback. The snapshot excludes .curator_backups/ (would recurse) and .hub/ (managed by the skills hub). Extract refuses absolute paths and .. components, and uses tarfile's filter='data' on Python 3.12+. Rollback takes a pre-rollback safety snapshot FIRST, stages the current tree into .rollback-staging-<ts>/ so the extract lands in an empty dir, and cleans the staging dir on success. A failed extract restores the staged contents. - agent/curator.py: run_curator_review() calls curator_backup. snapshot_skills(reason='pre-curator-run') before apply_automatic_ transitions. Best-effort — a failed snapshot logs at debug and the run continues (a transient disk issue shouldn't silently disable curator forever). - hermes_cli/curator.py: new 'hermes curator backup' and 'hermes curator rollback' subcommands. rollback supports --list, --id <ts>, -y. - hermes_cli/config.py: curator.backup.{enabled, keep} config block with sane defaults (enabled=true, keep=5). - Docs: curator.md gets a 'Backups and rollback' section; cli-commands .md table gets the new rows. Tests (new file tests/agent/test_curator_backup.py, 16 cases): - snapshot creates tarball + manifest with correct counts - snapshot excludes .curator_backups/ (recursion guard) and .hub/ - snapshot disabled via config returns None without creating anything - snapshot uniquifies ids within the same second (-01 suffix) - prune honors keep count, newest-first - list_backups + _resolve_backup cover newest-default and unknown-id - rollback restores a deleted skill with content intact - rollback is itself undoable — safety snapshot shows up in list_backups - rollback with no snapshots returns an error - rollback refuses tarballs with absolute paths or .. components - real curator runs take a 'pre-curator-run' snapshot; dry-runs do not All curator tests: 210 passing locally.	2026-05-01 09:49:59 -07:00
teknium1	2af8b8ff37	fix(moonshot): also strip nullable/enum after anyOf collapse The anyOf collapse in _repair_schema returned early, skipping the nullable-strip and enum-cleanup steps. When a schema had anyOf [{enum: [..., null, '']}, {type: null}] alongside a parent-level 'nullable: true', collapsing to the single non-null branch produced a merged node that still had both 'nullable' and the bad enum values — Moonshot would still 400 on it. Fix: fall through to Rules 1/3 when the collapse produces a single merged node; only return early for the multi-branch case (pure anyOf preservation) or when there was no null branch to remove. Adds a test that locks in the combined-case expectation.	2026-04-30 23:14:31 -07:00
Hendrix	9ca72a69a7	fix(moonshot): fill missing type before enum cleanup to handle anyOf branches without explicit type When a schema node inside anyOf has enum values but no explicit 'type', Rule 3 (enum cleanup) ran before _fill_missing_type, so node_type was None and the enum was never cleaned. Moonshot then rejected the schema with 'enum value (<nil>) does not match any type in [string]'. Fix: reorder operations — fill missing type first, strip nullable, then clean enum. This ensures enum cleanup always has a type to check. Also fixes test expectation: empty string in enum is now correctly stripped (Moonshot rejects it too). Closes #16875	2026-04-30 23:14:31 -07:00
Teknium	e2eb561e8e	fix(curator): rewrite cron job skill refs after consolidation (#18253 ) When the curator consolidates skill X into umbrella Y, any cron job that listed X in its skills field would fail to load X at run time — the scheduler logs a warning and skips it, so the scheduled job runs without the instructions it was scheduled to follow. cron.jobs.rewrite_skill_refs(consolidated, pruned) now updates jobs in-place: consolidated names route to the umbrella target (dedup when umbrella is already present), pruned names are dropped. agent.curator._write_run_report calls it after classification, best-effort so a cron-side failure never breaks the curator itself. Results are recorded in run.json (counts.cron_jobs_rewritten + full cron_rewrites payload), a separate cron_rewrites.json for convenience when jobs were touched, and a section in REPORT.md. Reported by @tombielecki.	2026-04-30 23:04:50 -07:00
Mind-Dragon	0704589ceb	fix(agent): make tool loop guardrails warning-first	2026-04-30 20:43:15 -07:00
Mind-Dragon	58b89965c8	fix(agent): add tool-call loop guardrails	2026-04-30 20:43:15 -07:00
sprmn24	adaee2c72c	test(skill_utils): add regression tests for non-dict metadata in extract_skill_conditions The fix for this bug (isinstance guard) was merged via commit `3ff9e010`, but test coverage was not included. Adding 4 tests: - dict metadata with hermes keys (normal case) - string metadata (bug case — previously caused AttributeError) - None metadata - missing metadata key	2026-04-30 20:37:15 -07:00
Teknium	0ddc8aba68	fix(fallback): let custom_providers shadow built-in aliases When a user defines `custom_providers: [{name: kimi, ...}]` and references `provider: kimi` from fallback_model or the main config, the built-in alias rewriting (`kimi` → `kimi-coding`) was hijacking the request before the named-custom lookup ran. `_get_named_custom_provider` also refused to return a match when the raw name resolved to any built-in (including aliases), so the custom endpoint was unreachable. Fix at both layers of the resolution chain so every caller benefits, not just `_try_activate_fallback`: - hermes_cli/runtime_provider.py: narrow `_get_named_custom_provider`'s built-in-wins guard to canonical provider names only. An alias like `kimi` that resolves to a different canonical (`kimi-coding`) no longer blocks the custom lookup; a canonical name like `nous` still does. - agent/auxiliary_client.py: in `resolve_provider_client`, try the named- custom lookup with the original (pre-alias-normalization) name before the alias-normalized one, so aliased requests reach the user's custom entry. Also honour `explicit_base_url` and `explicit_api_key` in the API-key provider branch so callers that pass explicit hints (e.g. fallback activation) can override the registered defaults. Tests added for: - custom `kimi` shadowing built-in alias (regression for #15743) - custom `nous` NOT shadowing canonical built-in (behaviour preserved) - bare `kimi` without any custom entry still routing to built-in - explicit base_url/api_key override on the API-key provider branch Original PR #17827 by @Feranmi10 identified the same bug class and implemented a narrower fix in `_try_activate_fallback`; this reshapes the fix to live in the shared resolution layer so all callers benefit. Fixes #15743 Co-authored-by: Feranmi10 <89228157+Feranmi10@users.noreply.github.com>	2026-04-30 20:18:44 -07:00
Teknium	8b7b074df9	test(context_compressor): regression test for PR #17025 tail-protection off-by-one When len(messages) <= protect_tail_count and a token budget is set, the previous formula min(protect_tail_count, len(result) - 1) under-protected the tail by one, allowing the oldest message to be summarized. The test fails on the buggy formula (pruned == 1) and passes on the fix (pruned == 0, tool content preserved verbatim).	2026-04-30 20:00:01 -07:00
Stephen Schoettler	b29b709a71	fix(agent): sanitize Codex tool-call history summaries	2026-04-30 19:58:46 -07:00
Yukipukii1	75483b6db1	fix(curator): preserve last_report_path in state	2026-04-30 19:45:59 -07:00
lsdsjy	b9b9ee3e6c	fix(deepseek): preserve v4 reasoning_content on replay	2026-04-30 11:18:39 -07:00
y0shualee	f4b76fa272	fix: use skill activity in curator status Treat skill views and edits as activity when curator reports and applies lifecycle transitions, so recently loaded or patched skills are not displayed or transitioned as never used.\n\nAdds regression tests for activity derivation, automatic transitions, and CLI status output.	2026-04-30 10:31:47 -07:00
Teknium	8b290a5908	feat(curator): split archived into consolidated vs pruned with model + heuristic classification (#17941 ) * fix(curator): split 'archived' into consolidated vs pruned in run reports Users who watched a curator run saw skills like 'anthropic-api' listed under 'Skills archived' and interpreted that as pruning — but the curator had actually absorbed those skills into a new umbrella (e.g. 'llm-providers') during the same run. The directory gets archived for safety (all removals are recoverable), but the content still lives under a different name. Users then 'restored' what they thought were deleted skills and ended up with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella). Classify removed skills using this run's skill_manage tool calls: - consolidated: content absorbed into a surviving/newly-created skill (evidenced by a skill_manage write_file/patch/create/edit whose target is a different skill AND whose file_path/content references the removed skill's name) - pruned: archived without consolidation evidence (truly stale) REPORT.md now shows two distinct sections: - 'Consolidated into umbrella skills' — with `removed → merged into umbrella` - 'Pruned — archived for staleness' — pure staleness archives run.json schema additions (backward compatible): - counts.consolidated_this_run, counts.pruned_this_run - consolidated: [{name, into, evidence}, ...] - pruned: [names] - archived: retained as the union for backward compat Also: relabel the auto-transitions 'archived' counter to 'archived (no LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass archives. Tests: 9 new tests in test_curator_classification.py covering consolidation evidence parsing (write_file/patch/create), hyphen/underscore name variants, self-reference rejection, destination-must-exist, mixed runs, and malformed-JSON fallback safety. Existing test_report_md_is_human_readable updated to cover the new section names. E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified end-to-end. * feat(curator): hybrid model-declared + heuristic classification Extend the consolidated-vs-pruned split with LLM-authored intent: 1. Curator prompt now requires a structured YAML block at the end of the final response (consolidations / prunings with short rationale). 2. _parse_structured_summary() extracts it tolerantly — missing block, malformed YAML, partial lists all fall back to heuristic cleanly. 3. _reconcile_classification() merges model intent with the tool-call heuristic: - Model wins on rationale when its umbrella exists post-run - Model hallucination (umbrella doesn't exist) is downgraded to the heuristic's finding, or pruned if there's no evidence either - Heuristic catches model omission — consolidations the model enumerated tools for but forgot to list get surfaced with a '(detected via tool-call audit)' tag 4. REPORT.md now shows per-row rationale alongside 'removed → umbrella' and flags audit-only rows so the user knows why no reason is shown. Backward compat: run.json's 'archived' field (union) is preserved. 'pruned' is now a list of dicts with {name, source, reason}; 'pruned_names' is the flat-name list for legacy consumers. Tests: 15 new covering YAML parse edge cases (malformed, empty lists, bare-string entries, missing fields), reconciler rules (model wins, hallucination fallback, heuristic catches omission, prune with reason), and an end-to-end report-render test with all four paths exercised.	2026-04-30 10:31:23 -07:00
briandevans	cc5b9fb581	fix(transport): omit thinking_config for Gemma on the gemini provider (#17426 ) The `gemini` provider also serves Gemma (e.g. `gemma-4-31b-it`) and historically other Google models like PaLM. Those reject `extra_body.thinking_config` with HTTP 400: Unknown name "thinking_config": Cannot find field `_build_gemini_thinking_config()` was unconditionally producing a config dict for any model on the `gemini` / `google-gemini-cli` provider, which `ChatCompletionsTransport.build_kwargs` then dropped into `extra_body["thinking_config"]`. The result: every chat turn for Gemma users on the gemini provider blew up at the API edge. The fix is the same shape Hermes already uses for the Gemini-2.5 vs Gemini-3 family clamping: normalise the model id, strip an `OpenRouter`-style `google/` prefix, and short-circuit early when the result doesn't start with `gemini`. We return `None` rather than `{"includeThoughts": False}`, because the API rejects the field name itself — even the polite "off" form trips the same 400. Three regression tests cover Gemma with reasoning enabled, Gemma with reasoning disabled, and the `google/gemma-…` OpenRouter-style id; the existing Gemini-2.5 / Gemini-3 / `google/gemini-…` cases keep passing because the Gemini guard fires after the prefix strip. Fixes #17426 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 04:29:04 -07:00
Teknium	0da968e521	fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868 ) Voscko reported curator.auxiliary.provider/model was advertised in the docs but ignored — the review fork read only model.provider/default. The narrow fix would wire the one-off key through, but that leaves curator as a parallel system: not in `hermes model` → auxiliary picker, not in the dashboard Models tab, missing per-task base_url/api_key/timeout/ extra_body. Unify curator with the rest of the aux task system so `hermes model` and the dashboard configure it like every other aux task. Four sources of truth updated: - hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary (timeout=600 since reviews run long), drop the one-off curator.auxiliary block from DEFAULT_CONFIG.curator. - hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass') to _AUX_TASKS so the CLI picker offers it. - hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the dashboard REST endpoint accepts it. - web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard Models tab renders the task. agent/curator.py _resolve_review_model() now reads auxiliary.curator first (canonical), falls back to legacy curator.auxiliary (with an info log asking users to migrate), then falls back to the main chat model. Pre-unification users keep working. Docs updated: docs/user-guide/features/curator.md now points at `hermes model` → auxiliary → Curator and the dashboard Models tab. Tests: 6 unit tests on _resolve_review_model (auto default, canonical slot honored, partial override fallback, legacy fallback with deprecation log assertion, new-wins-over-legacy, empty-config safety) plus a cross-registry test that curator is wired into all four sources of truth. test_aux_tasks_keys_all_exist_in_default_config already covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant. Reported by Voscko on Discord.	2026-04-30 02:46:01 -07:00
Teknium	ce0c3ae493	fix(aux): remove hardcoded Codex fallback model, drop Codex from auto chain (#17765 ) The _CODEX_AUX_MODEL constant had already rotated twice in 6 weeks (gpt-5.3-codex -> gpt-5.2-codex -> now broken again at gpt-5.2-codex) because ChatGPT-account Codex gates which models it accepts via an undocumented, shifting allow-list that OpenAI publishes no changelog for. Any pinned default will keep going stale. Issue #17533 reports the current breakage: every ChatGPT-account auxiliary fallback fails with HTTP 400 "model is not supported" and the 60s pause loop degrades long sessions. Rather than reset the clock with another stale pin (PR #17544 proposes gpt-5.2-codex -> gpt-5.4), remove the hardcoded second-order Codex fallback entirely: - Delete `_CODEX_AUX_MODEL`. - Drop `_try_codex` from `_get_provider_chain()` (the auto chain now ends at api-key providers; 4 rungs instead of 5). - Rename `_try_codex() -> _build_codex_client(model)` and require an explicit model from the caller. No more guessing. - `resolve_provider_client("openai-codex", model=None)` now warns and returns (None, None) instead of silently guessing a stale model ID. - Remove `_try_codex` from the `provider="custom"` fallback ladder (same stale-constant trap). - `_resolve_strict_vision_backend("openai-codex")` routes through `resolve_provider_client` so the caller's explicit model is honored. Codex-main users are unaffected: Step 1 of `_resolve_auto` already uses `main_provider` + `main_model` directly and passes the user's configured Codex model through `resolve_provider_client`, which never touched `_CODEX_AUX_MODEL`. Per-task overrides (`auxiliary.<task>.provider/model`) continue to work and are the supported way to route specific aux tasks through Codex. Users whose main provider fails with a payment/connection error and who have ONLY ChatGPT-account Codex auth will now see the 60s pause without a stale-model-rejection noise line in between -- same outcome, cleaner failure. Closes #17533. Supersedes #17544 (which resets the clock on the same stale-constant problem).	2026-04-29 23:23:50 -07:00
Stephen Schoettler	f73364b1c4	fix(ci): stabilize main test suite regressions (#17660 ) * fix: stabilize main test suite regressions * test(agent): update MiniMax normalization expectation * test: stabilize remaining CI assertions * test: harden config helper monkeypatching * test: harden CI-only assertions * fix(agent): propagate fast streaming interrupts	2026-04-29 23:18:55 -07:00
Teknium	828d3a320b	fix(anthropic): reactive recovery for OAuth 1M-context beta rejection (#17752 ) Keep context-1m-2025-08-07 in OAuth requests by default so 1M-capable subscriptions retain full context. When Anthropic rejects a request with 400 'long context beta is not yet available for this subscription', disable the beta for the rest of the session, rebuild the client, and retry once. Addresses #17680 (thanks @JayGwod for the clean reproduction) without forcing every OAuth user off the 1M context window. Changes: - agent/error_classifier.py: new FailoverReason.oauth_long_context_beta_forbidden; pattern matches 400 + 'long context beta' + 'not yet available'. Narrow enough that the existing 429 tier-gate pattern keeps its own reason. - agent/anthropic_adapter.py: _common_betas_for_base_url, build_anthropic_client, build_anthropic_kwargs gain drop_context_1m_beta kwarg. Default=False (1M stays). OAuth OAUTH_ONLY_BETAS unchanged. - agent/transports/anthropic.py: build_kwargs forwards the flag. - run_agent.py: self._oauth_1m_beta_disabled flag, retry-once guard, recovery branch next to the image-shrink path. _rebuild_anthropic_client honors the flag. The main build_kwargs call site threads it through for fast-mode extra_headers. - hermes_cli/doctor.py, hermes_cli/models.py: sibling OAuth /v1/models probes get the same reactive retry — previously they'd falsely report the Anthropic API as unreachable for affected subscriptions. Tests: 2190 tests/agent/ + 94 adjacent integration tests pass. New unit tests cover the classifier pattern (including the collision guard against the 429 tier-gate) and the drop_context_1m_beta adapter behavior (default keeps 1M, flag strips only 1M while preserving every other beta).	2026-04-29 21:56:54 -07:00
teknium1	dd2d1ba5e6	refactor(reload-skills): queue note for next turn, drop cache invalidation + agent tool Salvage-follow-up to @shannonsands's /reload-skills PR. Trims the feature to match the design: user-initiated rescan, no prompt-cache reset, no new schema surface, no phantom user turn, and the next-turn note carries each added/removed skill's 60-char description (not just its name). Changes vs the original PR: * Drop the in-process skills prompt-cache clear in reload_skills(). Skills are invoked at runtime via /skill-name, skills_list, or skill_view — they don't need to live in the system prompt for the model to use them. Keeping the cache intact preserves prefix caching across the reload so /reload-skills pays no cache-reset cost. (MCP has to break the cache because tool schemas must be known at conversation start; skills do not.) * Drop the skills_reload agent tool and SKILLS_RELOAD_SCHEMA from tools/skills_tool.py, plus the four skills_reload enumerations in toolsets.py. No new schema surface — agents can already see a freshly- installed skill via skill_view / skills_list the moment it's on disk. * Replace the phantom 'role: user' turn injection with a one-shot queued note. CLI uses self._pending_skills_reload_note (same pattern as _pending_model_switch_note, prepended to the next API call and cleared). Gateway uses self._pending_skills_reload_notes[session_key]. The note is prepended to the NEXT real user message in this session, so message alternation stays intact and nothing out-of-band is persisted to the transcript. * reload_skills() now returns added/removed as [{'name': str, 'description': str}, ...] (description truncated to 60 chars — matches the curator / gateway adapter budget). The injected next-turn note formats each entry as 'name — description' so the model can actually reason about which new skills to call without running skills_list first. * Only emit the note when the diff is non-empty. On empty diff, print 'No new skills detected' and do nothing else. * Tests rewritten to cover the queue semantics, the description payload, and a regression guard that the prompt-cache snapshot is preserved.	2026-04-29 21:07:47 -07:00
Shannon Sands	7966560fb5	feat(skills): /reload-skills slash command + skills_reload agent tool Adds a public reload path for the in-process skill caches so newly installed (or removed) skills become visible mid-session without a gateway restart. Mirrors the shape of /reload-mcp. Three surfaces: * /reload-skills slash command — CLI (cli.py) and gateway (gateway/run.py), with /reload_skills alias for Telegram autocomplete and an explicit Discord registration. * skills_reload agent tool (tools/skills_tool.py) — lets agents/subagents pick up freshly-installed skills via tool call. * agent.skill_commands.reload_skills() — shared helper that clears _skill_commands, _SKILLS_PROMPT_CACHE (in-process LRU), and the on-disk .skills_prompt_snapshot.json, then returns an added/removed diff plus the new total count. Tested: * tests/agent/test_skill_commands_reload.py (9 cases) * tests/cli/test_cli_reload_skills.py (3 cases) * tests/gateway/test_reload_skills_command.py (4 cases) Use case: NemoClaw / OpenShell-style sandboxed orchestrators that drop skills into ~/.hermes/skills mid-session, plus agentic flows where the agent itself installs a skill via the shell tool and needs it bound without a gateway restart. The Python helper clear_skills_system_prompt_cache(clear_snapshot=True) already exists internally — this PR just exposes it via slash command and tool.	2026-04-29 21:07:47 -07:00
Teknium	31f70d1f2a	fix(ci): recover 38 failing tests on main (#17642 ) CI Tests workflow has been red on main for 40+ consecutive runs. This commit recovers every failure visible in run 25130722163 (most recent completed run prior to this PR). Root causes, by group: Test-mock drift after product landed (fix: update mocks) - test_mcp_structured_content / test_mcp_dynamic_discovery (6 tests): product added _rpc_lock (#02ae15222) and _schedule_tools_refresh (#1350d12b0) without updating sibling test files. Install a real asyncio.Lock inside the fake run-loop and patch at _schedule_tools_refresh. - test_session.py: renamed normalize_whatsapp_identifier → canonical_ whatsapp_identifier upstream; keep a local alias so the legacy tests keep working. - test_run_progress_topics Slack DM test: PR #8006 made Slack default tool_progress=off; explicitly set it to 'all' in the test fixture so the progress-callback path still runs. Also read tool_progress_callback at call time rather than freezing it in FakeAgent.__init__ — production assigns it AFTER construction. - test_tui_gateway_server session-create/close race: session.create now defers _start_agent_build behind a 50ms timer — wait for the build thread to enter _make_agent before closing, otherwise the orphan- cleanup path never runs. - test_protocol session.resume: product get_messages_as_conversation now takes include_ancestors kwarg; accept *_kwargs in the test stub. - test_copilot_acp_client redaction: redactor is OFF by default (snapshots HERMES_REDACT_SECRETS at import); patch agent.redact._REDACT_ENABLED=True for the duration of the test. - test_minimax_provider: after #17171, dots in non-Anthropic model names stay dots even with preserve_dots=False. Assert the new invariant rather than the old 'broken for MiniMax' behavior. - test_update_autostash: updater now scans `ps -A` for dashboard PIDs; the test's catch-all subprocess.run stub needed stdout/stderr fields. - test_accretion_caps: read_timestamps dict is populated lazily when os.path.getmtime succeeds. Use .get("read_timestamps", {}) to tolerate CI filesystems where the stat races file creation. Change-detector tests (fix: rewrite as structural invariants) - test_credential_sources_registry_has_expected_steps: was a frozen set comparison that broke when minimax-oauth was added. Rewrite as an invariant check (every step has description, no dupes, core steps present) per AGENTS.md 'don't write change-detector tests'. xdist ordering / test pollution (fix: reset state, use module-local patches) - test_setup vercel: sibling test saved VERCEL_PROJECT_ID='project' to os.environ via save_env_value() and never cleared it. monkeypatch.delenv the VERCEL_ vars in the link-file test. - test_clipboard TestIsWsl: GitHub Actions is on Azure VMs whose real /proc/version often contains 'microsoft'. Patching builtins.open with mock_open didn't reliably intercept hermes_constants.is_wsl's call in xdist workers that had already cached _wsl_detected=True from an earlier test. Patch hermes_constants.open directly and add teardown_method to reset the cache after each test. Pytest-asyncio cancellation hangs (fix: bound product await with timeout) - test_session_split_brain_11016 (3 params) + test_gateway_shutdown cancel-inflight: under pytest-asyncio 1.3.0, 'await task' and 'asyncio.gather(cancelled_tasks)' can stall for 30s when the cancelled task's finally block awaits typing-task cleanup. Bound both with asyncio.wait_for(..., timeout=5.0) and asyncio.shield — the stragglers are released from adapter tracking and allowed to finish unwinding in the background. This is also a legitimate hardening: a wedged finally shouldn't stall the caller's dispatch or a gateway shutdown. Orphan UI config (fix: merge tiny tab into messaging category) - test_web_server test_no_single_field_categories: the telegram.reactions config field lived in its own 'telegram' schema category with no siblings. Fold it under 'discord' via _CATEGORY_MERGE so the dashboard doesn't render an orphan single-field tab. Local verification: 38/38 originally-failing tests pass; 4044/4044 gateway tests pass; 684/684 targeted subset (all 16 touched test files) passes.	2026-04-29 20:05:32 -07:00
Nanako0129	c5a5e586d7	fix(gemini): nest OpenAI-compat thinking config under google	2026-04-29 12:10:40 -07:00
teknium1	fa3338c171	test(anthropic): regression guard for DeepSeek /anthropic thinking replay Covers the #16748 fix: - unsigned thinking blocks synthesised from reasoning_content survive replay - non-latest assistant turns keep their thinking (DeepSeek validates every turn) - signed Anthropic blocks are stripped (DeepSeek can't validate them) - cache_control is stripped from thinking blocks - OpenAI-compat base (api.deepseek.com without /anthropic) is NOT matched - non-DeepSeek third parties (minimax) keep the generic strip-all behaviour	2026-04-29 08:10:29 -07:00
teknium1	0a5ee01e48	fix(hindsight): route flush-on-switch through writer queue, not raw thread Follow-up to the cherry-picked PR #17447. The original flush spawned a bare threading.Thread for the buffer-flush path, overwriting self._sync_thread — which is aliased to the long-lived writer thread. Two consequences: 1. No serialization with the writer queue. If old-session retains were still queued in _retain_queue, the flush ran concurrently with the writer and both threads could call aretain_batch against the same document_id. 2. The pre-spawn 'self._sync_thread.join(timeout=5.0)' tried to join the long-lived writer, which never exits, so the join was a no-op that just timed out — never actually serialized anything. Fix: enqueue the flush closure on _retain_queue via _ensure_writer + put(). Natural FIFO ordering behind any pending retains, no new thread, no broken join. Shutdown-aware so it doesn't enqueue after teardown. Tests updated to drain via _retain_queue.join() instead of the stale _sync_thread.join(). Added regression guard test_flush_serializes_behind_pending_retains_via_writer_queue that blocks the writer mid-retain to prove the flush waits in FIFO behind the old retain. Also seeds _retain_queue / _shutting_down / stubbed _ensure_writer on the bare-object test helper in test_memory_session_switch.py so that path doesn't blow up under the new queue-enqueue. tests/plugins/memory/test_hindsight_provider.py + tests/agent/test_memory_session_switch.py: 103/103 passing.	2026-04-29 08:09:03 -07:00
Nicolò Boschi	c38dac742b	fix(hindsight): flush buffered turns and drop stale prefetch on session switch Two data-loss / leak gaps in HindsightMemoryProvider.on_session_switch introduced by #17409. 1. Buffered turns silently lost when retain_every_n_turns > 1. on_session_switch unconditionally cleared _session_turns without flushing. Users who batched every N>1 turns and switched mid-batch (/reset, /new, /resume, /branch, or context compression) had those buffered turns disappear. Same data-loss class as the shutdown race, different lifecycle event. Note commit_memory_session() -> on_session_end() runs before on_session_switch on /reset, but Hindsight doesn't implement on_session_end so the buffer survives that step and dies at clear time. /resume, /branch, and compression skip commit_memory_session entirely so an on_session_end impl wouldn't help them anyway. Fix: snapshot the old _session_id, _document_id, _parent_session_id, _turn_index, and _session_turns; spawn one final retain that lands under the OLD document_id; then rotate state. Metadata is built synchronously against the old self._* so session_id / lineage tags on the flushed item all reference the prior session consistently. 2. Stale _prefetch_result leaks across switch. If queue_prefetch ran in the old session and the result hadn't been consumed by prefetch() yet, on_session_switch left the cached recall text in place. The next session's first prefetch() call would return text mined from the prior session's bank/query. Fix: join any in-flight _prefetch_thread (3s bounded — matches shutdown()), then clear _prefetch_result under _prefetch_lock before rotating session_id. Tests ----- - tests/plugins/memory/test_hindsight_provider.py (TestSessionSwitchBufferFlush): - buffered turns flushed under OLD document_id with OLD lineage tags - empty buffer => no spurious retain - _prefetch_result cleared on switch - in-flight prefetch thread is awaited before clear (no race) - tests/agent/test_memory_session_switch.py: factory extended to seed the attrs the new flush path reads (_retain_source, _platform, _bank_id, prefetch state, etc.) and stub _run_hindsight_operation so existing switch-state assertions keep passing without network setup.	2026-04-29 08:09:03 -07:00
Teknium	1bedc836b5	docs(onboarding): lead OpenClaw residue banner with migrate, warn that cleanup breaks OpenClaw (#17507 ) The ~/.openclaw/ detection banner (#16327) had two problems flagged in #16629: 1. It only pitched 'hermes claw cleanup' (destructive archive) and never mentioned 'hermes claw migrate' — the actual non-destructive path that ports config/memory/skills into Hermes. 2. The copy anthropomorphized the bug ('the agent can still get confused', 'dutifully reads') and framed OpenClaw as a competitor to eliminate ('instead of Hermes's'). Rewrite so migrate leads, cleanup is a clearly-labelled follow-up with a warning that archiving breaks OpenClaw for users still running it. Closes #16629	2026-04-29 08:08:36 -07:00
Teknium	83c288da01	fix(anthropic): broaden Kimi thinking-suppression to custom endpoints (#17455 ) The guard that drops Anthropic's `thinking` kwarg for Kimi endpoints was matched on `https://api.kimi.com/coding` only. Users configuring a custom Kimi-compatible gateway (or an official Moonshot host) with `api_mode: anthropic_messages` fall through to the generic third-party path, which strips thinking blocks AND still sends `thinking={enabled,...}` → upstream rejects with HTTP 400 "reasoning_content is missing in assistant tool call message at index N" on the next request after a tool call. Replace `_is_kimi_coding_endpoint` callers (history replay + thinking kwarg gate) with `_is_kimi_family_endpoint(base_url, model)` that also matches the `api.kimi.com` / `moonshot.ai` / `moonshot.cn` hosts and Kimi/Moonshot family model names (`kimi-`, `moonshot-`, `k1.`, `k2.`, …) for custom / proxied endpoints. Keeps the UA-header check in `build_anthropic_client` URL-only — the `claude-code/0.1.0` header is an official-Kimi contract. Plumbs optional `model` through `convert_messages_to_anthropic` so the unsigned reasoning_content→thinking block synthesised for Kimi's history validation survives the third-party signature-stripping pass on custom hosts too. Closes #17057.	2026-04-29 06:35:42 -07:00
Teknium	ff687c019e	fix(aux): skip kimi-coding in vision auto-detect (closes #17076 ) (#17451 ) * docs(anthropic): correct OAuth scope to Max plan + extra usage credits only The previous docs pass (#17399) overstated what Anthropic OAuth works with. In practice Hermes can only route against a Claude Max plan that has purchased extra usage credits — the base Max allowance is not consumed, and Claude Pro is not supported at all. Without Max + extra credits, users must fall back to an ANTHROPIC_API_KEY (pay-per-token). Updates the four pages touched in #17399: - integrations/providers.md - user-guide/features/credential-pools.md - reference/environment-variables.md - getting-started/quickstart.md * fix(aux): skip kimi-coding in vision auto-detect (closes #17076) Kimi Coding Plan's /coding endpoint (Anthropic Messages wire) has no image_in capability — Kimi's own docs confirm and suggest switching to a vision-capable model. Vision lives on the separate Kimi Platform (api.moonshot.ai, OpenAI-wire, pay-as-you-go). When the user has kimi-coding as main provider and auxiliary.vision.provider=auto, resolve_vision_provider_client was handing back an AnthropicAuxiliaryClient wrapped around /coding which 404'd on every vision request. Add a _PROVIDERS_WITHOUT_VISION frozenset ({kimi-coding, kimi-coding-cn}) and gate the main-provider vision branch on membership. On a skip the auto-detect falls through to OpenRouter → Nous like any other main-provider-unavailable case. Explicit per-task overrides (auxiliary.vision.provider=kimi-coding) are unaffected — the skip only applies when the caller is in auto mode. Tests: 4 new targeted tests in TestVisionAutoSkipsKimiCoding covering the skip path, CN variant, explicit-override passthrough, and a guard against accidental skip-list widening.	2026-04-29 06:10:23 -07:00
Teknium	13683c0842	feat(memory): notify providers on mid-process session_id rotation (#17409 ) Fixes #6672 Memory providers now receive on_session_switch() whenever AIAgent.session_id rotates mid-process — /resume, /branch, /reset, /new, and context compression. Before this, providers that cached per-session state in initialize() (Hindsight's _session_id, _document_id, accumulated _session_turns, _turn_counter) kept writing into the old session's record after the agent had moved on. MemoryProvider ABC ------------------ - New optional hook on_session_switch(new_session_id, , parent_session_id='', reset=False, *kwargs) with no-op default for backward compat. reset=True signals /reset or /new — providers should flush accumulated per-session buffers. reset=False for /resume, /branch, compression where the logical conversation continues. MemoryManager ------------- - on_session_switch() fans the hook out to every registered provider. Isolated try/except per provider — one bad provider can't block others. - Empty/None new_session_id is a no-op to avoid corrupting provider state during shutdown paths. run_agent.py ------------ - _sync_external_memory_for_turn now passes session_id=self.session_id into sync_all() and queue_prefetch_all(). Providers with defensive session_id updates in sync_turn (Hindsight already had this at plugins/memory/hindsight/__init__.py:1199) now actually receive the current id. - Compression block at ~L8884 already notified the context engine of the rollover; now also calls _memory_manager.on_session_switch(reason='compression'). cli.py ------ - new_session() fires reset=True, reason='new_session' so providers flush buffers. - _handle_resume_command fires reset=False, reason='resume' with the previous session as parent_session_id. - _handle_branch_command fires reset=False, reason='branch' with the parent session_id already captured for the DB parent link. gateway/run.py -------------- - _handle_resume_command now evicts the cached AIAgent, mirroring /branch and /reset. The next message rebuilds a fresh agent whose memory provider initialize() runs with the correct session_id — matches the pattern the gateway already uses for provider state cross-session transitions. Hindsight reference implementation ---------------------------------- - plugins/memory/hindsight/__init__.py adds on_session_switch that: updates _session_id, mints a fresh _document_id (prevents vectorize-io/hindsight#1303 overwrite), and clears _session_turns / _turn_counter / _turn_index so in-flight batches don't flush under the new document id. parent_session_id only overwritten when provided (avoids clobbering on a bare switch). Tests ----- - tests/agent/test_memory_session_switch.py: new dedicated file. ABC default no-op, manager fan-out, failure isolation, empty-id no-op, session_id propagation through sync_all/queue_prefetch_all, Hindsight state transitions for every reset/non-reset case, parent preservation. - tests/cli/test_branch_command.py: new test verifying /branch fires the hook with correct parent_session_id + reset=False + reason. - tests/gateway/test_resume_command.py: new test verifying /resume evicts the cached agent. - tests/run_agent/test_memory_sync_interrupted.py: updated existing assertions to account for the session_id kwarg on sync_all and queue_prefetch_all. E2E verified (real imports, tmp HERMES_HOME): - /resume: session_id updates, doc_id fresh, buffers cleared, parent set - /branch: session_id forks, parent links to original - /new: reset=True clears accumulated state - compression: reason='compression' propagated, lineage preserved - Empty id: no-op, state preserved - Legacy provider without on_session_switch: no crash Reported by @nicoloboschi (Hindsight maintainer); related scope-widening comment by @kidonng extending coverage to compression.	2026-04-29 04:57:22 -07:00
Teknium	21676e80cc	Revert "fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path (#16957 )" (#17397 ) This reverts commit `023f5c74b1`.	2026-04-29 03:55:03 -07:00
Teknium	bc0d8a941e	feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307 ) Every curator pass now emits a dated report directory under `~/.hermes/logs/curator/{YYYYMMDD-HHMMSS}/` with two files: - `run.json` — machine-readable full record (before/after snapshot, state transitions, all tool calls, model/provider, timing, full LLM final response untruncated, error if any) - `REPORT.md` — human-readable markdown: model + duration header, auto-transition counts, LLM consolidation stats, archived-this-run list, new-skills-this-run list, state transitions, the full LLM final summary, and a recovery footer pointing at the archive + the `hermes curator restore` command Reports live under `logs/curator/`, not inside `skills/` — they're operational telemetry, not user-authored skill data, and belong alongside `agent.log` / `gateway.log`. Internals: - `_run_llm_review()` now returns a dict (final, summary, model, provider, tool_calls, error) instead of a bare truncated string so the reporter has full fidelity - Report writer is fully best-effort — any failure logs at DEBUG and never breaks the curator itself. Same-second rerun gets a numeric suffix so reports can't clobber each other - Report path stamped into `.curator_state` as `last_report_path` - `hermes curator status` surfaces a "last report:" line so users can immediately open the latest run Tests (all green): - 7 new tests in tests/agent/test_curator_reports.py covering: report location (logs not skills), both files written, run.json shape and diff accuracy, markdown structure, error path still writes, state transitions captured, same-second runs get unique dirs - Existing test_run_review_synchronous_invokes_llm_stub updated to stub the new dict-returning _run_llm_review signature Live E2E: ran a synchronous pass against a 1-skill test collection with a stubbed LLM; report written correctly, state stamped with last_report_path, markdown human-readable, run.json machine-parseable.	2026-04-28 23:23:11 -07:00
teknium1	fa9383d27b	feat(curator): umbrella-first prompt, inherit parent config, unbounded iterations Based on three live test runs against 346 agent-created skills on the author's own setup (~6.5 min, opus-4.7, 86 API calls), the curator prompt needed three sharpenings before it consistently produced real umbrella consolidation instead of passive audit output: Umbrella-first framing. The original 'decide keep/patch/archive/ consolidate' framing lets opus default to 'keep' whenever two skills aren't byte-identical. The new prompt explicitly tells the reviewer that pairwise distinctness is the wrong bar — the right question is 'would a human maintainer write this as N separate skills, or one skill with N labeled subsections?' Expect 10-25 prefix clusters; merge each into an umbrella via one of three methods. Three concrete consolidation methods. (a) Merge into an existing umbrella (patch the broadest skill, archive siblings); (b) Create a new umbrella SKILL.md (skill_manage action=create); (c) Demote session-specific detail into references/, templates/, or scripts/ under the umbrella via skill_manage action=write_file, then archive the narrow sibling. This matches the support-file vocabulary the review-prompt side already uses (PR #17213). Two observed bailouts pre-empted: 'usage counters are zero so I can't judge' (rule 4: judge on content, not use_count) and 'each has a distinct trigger' (rule 5: pairwise distinctness is the wrong bar). Config-aware parent inheritance. _run_llm_review() was building AIAgent() without explicit provider/model, hitting an auto-resolve path that returned empty credentials → HTTP 400 'No models provided' against OpenRouter. Fork now inherits the user's main provider and model (via load_config + resolve_runtime_provider) before spawning — runs on whatever the user is currently on, OAuth-backed or pool-backed included. Unbounded iteration ceiling. max_iterations=8 was way too low for an umbrella-build pass over hundreds of skills. A live pass takes 50-100 API calls (scanning, clustering, skill_view'ing candidates, patching umbrellas, mv'ing siblings). Raised to 9999 — the natural stopping criterion is 'no more clusters worth processing', not an arbitrary tool-call budget. Tests updated: test_curator_review_prompt_has_invariants accepts DO NOT / MUST NOT and drops 'keep' from the required-verb set (the umbrella-first prompt correctly deemphasizes 'keep' as a first-class decision label since passive keep-everything is the failure mode being prevented). Added test_curator_review_prompt_is_umbrella_first asserting the umbrella framing, class-level thinking, references/ + templates/ + scripts/ support-file mentions, and the 'use_count is not evidence of value' pre-emption. Added test_curator_review_prompt_offers_support_file_actions asserting skill_manage action=create and action=write_file are both named. Live validation on author's setup: - Run 1 (old prompt): 3 archives, stopped after surveying — typical passive outcome - Run 2 (consolidation prompt): 44 archives, 3 patches, surfaced the 50-skill mlops reorg duplicate bug but didn't umbrella - Run 3 (this prompt): 249 archives + 18 new class-level umbrellas created, reducing agent-created skills from 346 → 118 with every archived skill's content preserved as references/ under its umbrella. Pinned skill untouched. Full report in PR description.	2026-04-28 22:33:33 -07:00
Teknium	a12f7aa8bb	fix(curator): default cycle is every 7 days, not 24 hours Weekly is closer to how skill churn actually works — most agent-created skills don't change multiple times per day, so a daily review is pure cost without benefit. Bumping the default to 7 days reduces aux-model spend while still catching drift and staleness on the timescales that matter (30d stale, 90d archive). Changes: - DEFAULT_INTERVAL_HOURS: 24 -> 168 (7 days) - config.yaml default: interval_hours: 24 -> 24 * 7 - CLI status line renders as '7d' when interval is a whole-day multiple - Test `test_old_run_eligible` decoupled from the exact default: it now uses 2 * get_interval_hours() so future tweaks don't break it	2026-04-28 22:33:33 -07:00
Teknium	0d31864e3b	fix(curator): defense-in-depth gates against bundled/hub skills Previous invariants only gated the primary entry points (apply_automatic_transitions, archive_skill, CLI pin). Several paths were unprotected: - bump_view / bump_use / bump_patch / set_state / set_pinned wrote usage records unconditionally, which is confusing noise in .usage.json even though the review list filtered them out - restore_skill did not check whether a bundled skill now shadows the archived name - CLI unpin was asymmetric with CLI pin — it had no gate Fixes: - _mutate() (the shared counter / state writer) now drops silently when the skill is not agent-created. .usage.json never gains a record for a bundled or hub-installed skill. - restore_skill() refuses to restore under a name that is now bundled or hub-installed (would shadow upstream). - CLI unpin gate matches CLI pin. New tests: - 5 provenance-guard tests on skill_usage (one per mutator) - 1 end-to-end test that hammers every mutator at a bundled skill and a hub skill, asserts both are untouched on disk, and asserts the sidecar stays clean - 2 CLI tests proving pin/unpin refuse bundled skills symmetrically 64/64 tests passing (29 skill_usage + 27 curator + 8 new guards).	2026-04-28 22:33:33 -07:00
Teknium	c8b7e7268a	refactor(curator): point review prompt at existing tools The LLM review prompt mentioned bespoke `archive_skill` and `pin_skill` tools that are not registered as model tools. Swap the prompt to rely on the real surface: - skill_manage action=patch — for patching and consolidation - terminal — to `mv` skill dirs into .archive/ Also drop `pin` from the model's decision list — pinning is a user opt-out for `hermes curator pin <skill>`, not something the model should do autonomously. Decision list is now: keep / patch / consolidate / archive. Tests updated: prompt-invariant test now asserts the existing tools are referenced and that bespoke tool names do NOT appear. New test prevents `pin` from being re-added as a model decision.	2026-04-28 22:33:33 -07:00
Teknium	bc79e227e6	feat(curator): background skill maintenance (issue #7816 ) Adds the Curator — an auxiliary-model background task that periodically reviews AGENT-CREATED skills and keeps the collection tidy: tracks usage, transitions unused skills through active → stale → archived, and spawns a forked AIAgent to consolidate overlaps and patch drift. Default: enabled, inactivity-triggered (no cron daemon). Runs on CLI startup and gateway boot when the last run is older than interval_hours (default 24) AND the agent has been idle for min_idle_hours (default 2). Invariants (all load-bearing): - Never touches bundled or hub-installed skills (.bundled_manifest + .hub/lock.json double-filter) - Never auto-deletes — archive only. Archives are recoverable via `hermes curator restore <skill>` - Pinned skills bypass all auto-transitions - Uses the aux client; never touches the main session's prompt cache New files: - tools/skill_usage.py — sidecar .usage.json telemetry, atomic writes, provenance filter - agent/curator.py — orchestrator: config, idle gating, state-machine transitions (pure, no LLM), forked-agent review prompt - hermes_cli/curator.py — `hermes curator {status,run,pause,resume, pin,unpin,restore}` subcommand - tests/tools/test_skill_usage.py — 29 tests - tests/agent/test_curator.py — 25 tests Modified files (surgical patches): - tools/skills_tool.py — bump view_count on successful skill_view - tools/skill_manager_tool.py — bump patch_count on skill_manage patch/edit/write_file/remove_file; forget record on delete - hermes_cli/config.py — add curator: section to DEFAULT_CONFIG - hermes_cli/commands.py — add /curator CommandDef with subcommands - hermes_cli/main.py — register `hermes curator` subparser via register_cli() from hermes_cli.curator - cli.py — /curator slash-command dispatch + startup hook - gateway/run.py — gateway-boot hook (mirrors CLI) Validation: - 54 new tests across skill_usage + curator, all passing in 3s - 346 tests across all touched files' neighbors green - 2783 tests across hermes_cli/ + gateway/test_run_progress_topics.py green - CLI smoke: `hermes curator status/pause/resume` work end-to-end Companion to PR #16026 (class-first skill review prompt) — together they form a loop: the review prompt stops near-duplicate skill creation at the source, and the curator prunes/consolidates what still accumulates. Refs #7816.	2026-04-28 22:33:33 -07:00
Rugved Somwanshi	01ad0aacaf	fix(tui): show correct context length	2026-04-28 12:27:36 -07:00
Rugved Somwanshi	214ca943ac	feat(agent): add lmstudio integration	2026-04-28 12:27:36 -07:00
Teknium	1d8b9e6458	fix(auxiliary): auto-detect Anthropic Messages transport for all aux clients (#17027 ) Auxiliary tasks (title_generation, vision, compression, web_extract, session_search) now pick the correct wire protocol based on the endpoint, not just on which resolve_provider_client branch built the client. Fixes 404s on Kimi Coding Plan and any other named provider whose endpoint speaks Anthropic Messages. Root cause: the 'api_key' branch of resolve_provider_client (and the Step 2 fallback chain inside _resolve_auto) always built a plain OpenAI client regardless of what the endpoint actually spoke. For provider=kimi-coding + model=kimi-for-coding, that meant: POST https://api.kimi.com/coding/v1/chat/completions { "model": "kimi-for-coding", ... } → 404 resource_not_found_error The /coding route only accepts the Anthropic Messages shape (the main agent already uses api_mode=anthropic_messages for it). Earlier fixes (#16819, #22ddac4b1) patched the anonymous-custom, named-custom, and external-process branches — but the named api_key branch (kimi-coding, minimax, zai, future /anthropic providers) was the fourth sibling and never got the same treatment. Fix: one module-level helper _maybe_wrap_anthropic() that rewraps a plain OpenAI client in AnthropicAuxiliaryClient when: - api_mode is explicitly 'anthropic_messages', OR - the URL ends in '/anthropic', OR - the host is api.kimi.com + path contains '/coding', OR - the host is api.anthropic.com. Wired into _wrap_if_needed (covers all resolve_provider_client branches that already go through it) and into the Step 2 api_key fallback chain inside _resolve_auto. Explicit api_mode still wins: passing api_mode='chat_completions' forces OpenAI wire, and already- wrapped specialized adapters (Codex, Gemini native, CopilotACP) pass through unchanged. E2E verified: - resolve_provider_client('kimi-coding', 'kimi-for-coding') → AnthropicAuxiliaryClient (was plain OpenAI, which 404'd) - _resolve_auto Step 1 for kimi-coding runtime → AnthropicAuxiliaryClient - resolve_provider_client('openrouter', ...) → plain OpenAI (no regression) - api_mode='chat_completions' override → plain OpenAI (explicit wins) Tests: - tests/agent/test_auxiliary_transport_autodetect.py (new): 21 tests covering URL detection, wrap decisions, and integration. - 204/205 existing auxiliary tests pass (1 pre-existing failure on main, unrelated to this change). Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 06:50:14 -07:00
Teknium	391f1ca1f4	feat(aux): translate extra_body.reasoning into Codex Responses API (#17004 ) Auxiliary callers that configure reasoning via auxiliary.<task>.extra_body.reasoning were having that config silently dropped by the Codex Responses adapter — it only forwarded messages/model/tools through to responses.stream(), never translating chat.completions-shaped reasoning hints into the Responses API's top-level reasoning + include fields. Mirror the main-agent translation from agent/transports/codex.py: - extra_body.reasoning.effort → resp_kwargs.reasoning.{effort, summary:"auto"} - 'minimal' → 'low' clamp (Codex backend rejects 'minimal') - Always include ['reasoning.encrypted_content'] when reasoning is enabled - {'enabled': False} → omit reasoning and include entirely - Non-dict reasoning values are ignored defensively Reported by @OP (Apr 26 feedback bundle). ## Changes - agent/auxiliary_client.py: _CodexCompletionsAdapter.create() now reads and translates extra_body.reasoning before calling responses.stream() - tests/agent/test_auxiliary_client.py: 9 new tests covering all effort levels, the minimal→low clamp, the disabled path, the no-op paths, and defensive handling of wrong-shape inputs Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 05:47:42 -07:00
Teknium	06164a7b28	fix(codex): resync pool entry from auth.json after reauth (#17001 ) When openai-codex tokens expire or the ChatGPT account hits a 429 window, the pool entry gets marked STATUS_EXHAUSTED with last_error_reset_at many hours in the future. If the user then runs `hermes model` / `hermes auth openai-codex` to reauth, fresh tokens land in ~/.hermes/auth.json but the pool entry stayed frozen behind its reset_at — every request kept failing with 'credential pool: no available entries (all exhausted or empty)' until the original window elapsed. _available_entries() already had auth.json/credentials-file resync branches for anthropic/claude_code and nous/device_code; openai-codex was missing. Added _sync_codex_entry_from_auth_store() mirroring the nous version (reads state["tokens"][{access,refresh}_token] + state["last_refresh"]) and wired it into the exhausted-entry resync loop. Also softens the 'codex CLI not found' doctor warning — native device-code OAuth does not require the Codex binary, only importing existing Codex CLI tokens does. Downgraded to an info line. Reported on Discord by p1aceho1der: Codex stalled indefinitely after a rate-limit reset, reauth didn't help, and doctor falsely warned that the codex CLI was required. Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 05:43:09 -07:00
teknium1	529eb29b6a	fix(gemini): clamp Flash thinkingLevel to documented low/medium/high set Gemini 3 Flash documents low/medium/high as the accepted thinkingLevel values. The salvaged bridge was forwarding Hermes' "minimal" effort to Flash verbatim, which is not a documented Gemini level and risks a 400 from the native adapter. Clamp minimal->low on Flash (matching how Pro already clamps minimal+low down), and funnel anything outside {low, medium, high} into medium to keep the request valid by construction. No behaviour change for the documented effort levels.	2026-04-28 05:38:23 -07:00
Nanako0129	dbbe2d1973	fix(gemini): bridge reasoning_config into thinking_config for chat-completions routes	2026-04-28 05:38:23 -07:00
Pony.Ma	02ae152222	fix(mcp): normalize nullable tool schemas	2026-04-28 04:58:03 -07:00
Ruda Porto Filgueiras	37551ee53e	test(bedrock): add model picker and region routing tests 25 new tests (all Bedrock API calls mocked, no real AWS creds needed): tests/hermes_cli/test_bedrock_model_picker.py (20 tests): - provider_model_ids("bedrock") uses live discovery, returns regional model IDs, falls back gracefully on empty/exception, resolves all bedrock aliases (aws, aws-bedrock, amazon-bedrock) to live discovery - list_authenticated_providers() section 2: bedrock appears with AWS creds, model list from discover_bedrock_models(), total_models matches, is_current flag works, absent creds hides bedrock, discovery failure does not crash, no duplicate entries - Region routing: botocore profile eu-central-1 yields eu.* model IDs end-to-end; env var takes priority over botocore profile - providers.py overlay: exists with correct transport/auth_type, label is non-empty, all aliases normalize to bedrock tests/agent/test_bedrock_adapter.py (5 tests): - resolve_bedrock_region() botocore profile fallback, botocore failure fallback, us-east-1 hard fallback (with botocore mocked)	2026-04-28 03:53:11 -07:00
Teknium	023f5c74b1	fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path (#16957 ) * fix(anthropic): remove Claude Code fingerprinting from OAuth Messages API path OAuth requests now identify as Hermes on the wire. Removed: - "You are Claude Code, Anthropic's official CLI for Claude." system prompt prepend - Hermes Agent → Claude Code / Nous Research → Anthropic system-prompt substitutions - mcp_ tool-name prefix on outgoing tool schemas + message history - Matching mcp_ strip on inbound tool_use blocks (strip_tool_prefix path removed from AnthropicTransport.normalize_response, + all 5 call sites in run_agent.py and auxiliary_client.py) - user-agent: claude-cli/<v> (external, cli) and x-app: cli headers on the Messages API client Added: - OAuth path strips context-1m-2025-08-07 — Anthropic rejects OAuth requests carrying it with HTTP 400 'This authentication style is incompatible with the long context beta header.' Kept (auth plumbing, not identity spoofing): - _is_oauth_token classifier and is_oauth flag threading - Bearer vs x-api-key auth routing - _OAUTH_ONLY_BETAS (claude-code-20250219, oauth-2025-04-20) — backend requires these on the OAuth-gated Messages endpoint - _OAUTH_CLIENT_ID (Claude Code's) — Anthropic doesn't issue OAuth creds to third parties; this is the only way the login flow works - claude-cli/<v> User-Agent on the OAuth token exchange + refresh endpoints at platform.claude.com/v1/oauth/token — bare requests get Cloudflare 1010 blocked Verified live against api.anthropic.com with a fresh sk-ant-oat01-* token: - claude-haiku-4-5 simple message: HTTP 200, 'OK' response - claude-haiku-4-5 tool call: HTTP 200, stop_reason=tool_use, tool named 'terminal' (no mcp_ prefix) round-tripped correctly - Outgoing wire: no user-agent, no x-app, real Hermes identity in system prompt, real tool name in schema Closes/supersedes #16820 (mcp_ PascalCase normalization patch — no longer needed since the mcp_ round-trip is gone). * fix(anthropic): resolve_anthropic_token() reads credential pool first Close the gap where ~/.hermes/auth.json → credential_pool.anthropic (where hermes login + dashboard PKCE flow write OAuth tokens) was not in resolve_anthropic_token()'s source list. Before: users who authed via hermes login got the token written into the pool, but legacy fallback code paths (auxiliary_client, models catalog fetch, explicit-runtime path) that call resolve_anthropic_token() saw None and raised 'No Anthropic credentials found' — even though the token was sitting in auth.json. New priority 1: pool.select() with env-sourced entries skipped. Skipping env:* entries preserves the existing env-var priority logic further down the chain (static env OAuth → refreshable Claude Code upgrade via _prefer_refreshable_claude_code_token). Surfaced while writing the hermes-agent-dev skill playbook for 'finding a live OAuth token for an E2E test'. --------- Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 03:51:17 -07:00
Teknium	e63364b8df	revert: computer-use cua-driver (PR #16919 ) (#16927 ) Reverts PR #16919 (commits `dad10a78d`, `413ee1a28`, `b4a8031b2`, `afb958829`) which was merged prematurely. Restoring the pre-merge state so #14817 and #15328 can be revisited as standing PRs. Reverted commits: - `afb958829` fix(computer-use): harden image-rejection fallback + AUTHOR_MAP - `b4a8031b2` fix(computer-use): unwrap _multimodal tool results - `413ee1a28` feat(computer-use): background focus-safe backend - `dad10a78d` feat(computer-use): cua-driver backend, universal any-model schema Co-authored-by: teknium1 <teknium@users.noreply.github.com>	2026-04-28 01:57:21 -07:00
crayfish-ai	f3371c39a4	fix(auxiliary): custom provider URL rewrite + main_runtime model for title gen - auxiliary_client: apply _to_openai_base_url() to custom base_url (fixes /anthropic → /v1 rewrite missing for provider="custom") - auxiliary_client: use main_runtime.get("model") instead of _read_main_model() so auxiliary tasks follow system default model changes - title_generator: thread main_runtime through generate_title → auto_title_session → maybe_auto_title - cli.py / gateway/run.py: pass main_runtime to maybe_auto_title - tests: update mock assertions for new main_runtime parameter	2026-04-28 01:47:25 -07:00
Teknium	dad10a78d0	feat(computer-use): cua-driver backend, universal any-model schema Background macOS desktop control via cua-driver MCP — does NOT steal the user's cursor or keyboard focus, works with any tool-capable model. Replaces the Anthropic-native `computer_20251124` approach from the abandoned #4562 with a generic OpenAI function-calling schema plus SOM (set-of-mark) captures so Claude, GPT, Gemini, and open models can all drive the desktop via numbered element indices. - `tools/computer_use/` package — swappable ComputerUseBackend ABC + CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary). - Universal `computer_use` tool with one schema for all providers. Actions: capture (som/vision/ax), click, double_click, right_click, middle_click, drag, scroll, type, key, wait, list_apps, focus_app. - Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style `content: [text, image_url]` parts) that flows through handle_function_call into the tool message. Anthropic adapter converts into native `tool_result` image blocks; OpenAI-compatible providers get the parts list directly. - Image eviction in convert_messages_to_anthropic: only the 3 most recent screenshots carry real image data; older ones become text placeholders to cap per-turn token cost. - Context compressor image pruning: old multimodal tool results have their image parts stripped instead of being skipped. - Image-aware token estimation: each image counts as a flat 1500 tokens instead of its base64 char length (~1MB would have registered as ~250K tokens before). - COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset is active. - Session DB persistence strips base64 from multimodal tool messages. - Trajectory saver normalises multimodal messages to text-only. - `hermes tools` post-setup installs cua-driver via the upstream script and prints permission-grant instructions. - CLI approval callback wired so destructive computer_use actions go through the same prompt_toolkit approval dialog as terminal commands. - Hard safety guards at the tool level: blocked type patterns (curl\|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash, force delete, lock screen, log out). - Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic) workflow guide. - Docs: `user-guide/features/computer-use.md` plus reference catalog entries. 44 new tests in tests/tools/test_computer_use.py covering schema shape (universal, not Anthropic-native), dispatch routing, safety guards, multimodal envelope, Anthropic adapter conversion, screenshot eviction, context compressor pruning, image-aware token estimation, run_agent helpers, and universality guarantees. 469/469 pass across tests/tools/test_computer_use.py + the affected agent/ test suites. - `model_tools.py` provider-gating: the tool is available to every provider. Providers without multi-part tool message support will see text-only tool results (graceful degradation via `text_summary`). - Anthropic server-side `clear_tool_uses_20250919` — deferred; client-side eviction + compressor pruning cover the same cost ceiling without a beta header. - macOS only. cua-driver uses private SkyLight SPIs (SLEventPostToPid, SLPSPostEventRecordTo, _AXObserverAddNotificationAndCheckRemote) that can break on any macOS update. Pin with HERMES_CUA_DRIVER_VERSION. - Requires Accessibility + Screen Recording permissions — the post-setup prints the Settings path. Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic- native schema). Credit @0xbyt4 for the original #3816 groundwork whose context/eviction/token design is preserved here in generic form.	2026-04-28 01:46:36 -07:00
Teknium	a7cdd4133c	fix(bedrock): send context-1m-2025-08-07 beta so Opus 4.6/4.7 get 1M context (#16793 ) On AWS Bedrock (and Azure AI Foundry), Claude Opus 4.6/4.7 and Sonnet 4.6 are capped at 200K context unless the request carries the `context-1m-2025-08-07` beta header. On native Anthropic (api.anthropic.com) 1M went GA so the header is a harmless no-op, but Bedrock/Azure still gate it as beta as of 2026-04. Hermes was advertising 1M in model_metadata.py (`claude-opus-4-7: 1000000`) while silently sending a request without the beta — so Bedrock users saw a 200K ceiling with no error message, and no config knob unblocked it. Claude Code sends this header by default, which is why the same Bedrock credentials worked there. - Add `context-1m-2025-08-07` to `_COMMON_BETAS` (alongside interleaved thinking and fine-grained tool streaming). - Strip it in `_common_betas_for_base_url` for MiniMax bearer-auth endpoints — they host their own models, not Claude, so Anthropic beta headers are irrelevant and could risk rejection. - Attach `_COMMON_BETAS` as `default_headers` on the AnthropicBedrock client. Previously that constructor passed no betas at all, so native Anthropic had the 1M unlock via default_headers but Bedrock didn't. - Fast-mode per-request `extra_headers` already rebuilds from `_common_betas_for_base_url`, so it picks up the 1M beta automatically. Reported by user 'Rodmar' on Discord: Bedrock Opus 4.7 stuck at 200K while same credentials worked in Claude Code.	2026-04-27 20:41:36 -07:00
Teknium	6ea5699e3f	fix(compression): notify users when configured aux model fails even if main-model fallback recovers (#16775 ) A misconfigured auxiliary.compression.model is a user-fixable problem that silent recovery would hide. The previous retry-on-main logic transparently swallowed aux-model failures whenever the fallback succeeded, leaving the user's broken config in place and racking up future failures. Track the aux-model failure on the compressor alongside the existing fallback-placeholder fields: - _last_aux_model_failure_model: str \| None - _last_aux_model_failure_error: str \| None Both are set at the moment the aux model errors (captured before summary_model is cleared for retry), regardless of whether the retry succeeds. Cleared at compress() start and on on_session_reset() so a clean run doesn't leak stale warnings. Surface at three places: - gateway hygiene auto-compress: ℹ note to the platform adapter (thread_id preserved) - gateway /compress command: ℹ line appended to the reply - CLI via _emit_warning: deduped on (model, error) so repeat compactions don't spam Distinct from the existing ⚠️ dropped-turns warning — different severity, different emoji, explicit 'context is intact' reassurance.	2026-04-27 20:08:23 -07:00
Teknium	94b26f3ec9	fix(compression): retry summary on main model for unknown errors before giving up (#16774 ) The existing retry-on-main path in _generate_summary only fires for errors that match the _is_model_not_found heuristic (404/503, 'model_not_found', 'does not exist', 'no available channel'). Other misconfiguration errors — 400s from aggregators, provider-specific 'no route' strings, opaque rejections — fall straight through to the transient-cooldown branch, which drops N turns of context and inserts a static placeholder. Losing context is almost always worse than one extra summary attempt. Add a best-effort retry-on-main for the unknown-error branch, guarded by the same invariants as the existing fast-path retry: only when summary_model differs from main, and only once per compressor (_summary_model_fallen_back). Tests cover: 404 fast-path fallback still works, unknown 400 now falls back, same-model aux skips retry (no infinite loop), and a double-failure (aux + main) stops at 2 calls.	2026-04-27 19:25:57 -07:00
iamagenius00	dfdc4276e8	fix(compression): notify gateway users when summary generation fails When auxiliary compression's summary LLM call fails (e.g. model 404, auxiliary model misconfigured), the compressor still drops the selected turns and inserts a static fallback placeholder — the dropped context is unrecoverable. Previously the only signal of this was a WARNING in agent.log. Gateway users (Telegram/Discord/etc.) had no way to know context was lost because the existing _emit_warning path requires a status_callback, and the gateway hygiene path uses a temporary _hyg_agent with quiet_mode=True and no callback wired up. Changes: - ContextCompressor: track _last_summary_fallback_used and _last_summary_dropped_count on each compress() call. Cleared at the start of compress() and on session reset. - gateway/run.py hygiene: after auto-compress, inspect the temp agent's compressor; if fallback was used, send a visible ⚠️ warning to the user via the platform adapter (TG/Discord/etc.) including dropped count and the underlying error. - gateway/run.py /compress: append the same warning to the manual compress reply so users running /compress see the failure too. Acceptance: - Summary success: no user-visible warning (unchanged). - Summary failure on gateway hygiene: user receives a TG/Discord message with dropped count + error + remediation hint. - Summary failure on /compress: warning appended to the command reply. - CLI status_callback / _emit_warning path is untouched. - Test coverage: two new tests verify the tracking fields are set on failure and cleared on subsequent success.	2026-04-27 19:18:13 -07:00
Erosika	49e3a1d8ee	style: trim verbose comment blocks added by previous commit	2026-04-27 12:37:33 -07:00
Erosika	e553f6f3e4	fix(memory): narrow scrub surface to known wrapper boundaries Reviewer pushback on the original boundary-hardening commits — three overreach points pulled plugin-specific policy into shared core paths: 1. gateway/run.py hardcoded a '## Honcho Context' literal split for vision-LLM output. Plugin-format heading in framework code; could truncate legitimate output naturally containing that header. Drop the literal split; keep generic sanitize_context (the wrapper strip is plugin-agnostic). Plugin-specific cleanup belongs at the provider boundary, not the shared gateway path. 2. run_agent.run_conversation scrubbed user_message and persist_user_message before the conversation loop. User text is sacred — if a user types a literal <memory-context> tag we must not silently delete it. The producer (build_memory_context_block) is the only legitimate emitter; user input should never need the reverse op. 3. _build_assistant_message scrubbed model output before persistence. Same hazard: would silently mutate legitimate documentation/code the model emits containing the literal markers. The streaming scrubber catches real leaks delta-by-delta before content is concatenated; persist-time scrub was redundant belt-and-suspenders. 4. _fire_stream_delta stripped leading newlines from every delta unless a paragraph break flag was set. Mid-stream '\n' is legitimate markdown — lists, code fences, paragraph breaks — and chunk boundaries are arbitrary. Narrow lstrip to the very first delta of the stream only (so stale provider preamble still gets cleaned on turn start, but mid-stream formatting survives). Plus: build_memory_context_block now logs a warning when its defensive sanitize_context strips something — surfaces buggy providers returning pre-wrapped text instead of silently double-fencing. Net architectural change: scrub surface collapses from 8 sites to 3 (StreamingContextScrubber on output deltas, plugin→backend send, build_memory_context_block input-validation). Plugin-specific strings stay out of shared runtime paths. User input and persisted assistant output are no longer mutated. Tests: rescoped TestMemoryContextSanitization (helper-correctness only, no source-inspection of removed call sites), updated vision tests to drop '## Honcho Context' literal-split assertions, updated _build_assistant_message persistence test to assert preservation. Added: cross-turn scrubber reset, build_memory_context_block warn-on- violation, mid-stream newline preservation (plain + code fence).	2026-04-27 12:37:33 -07:00
Erosika	3b2edb347d	fix(gateway): scrub memory-context leaks from vision auto-analysis output fixes #5719 The auxiliary vision LLM called by gateway._enrich_message_with_vision can echo its injected Honcho system prompt back into the image description. That description gets embedded verbatim into the enriched user message, so recalled memory (personal facts, dialectic output) surfaces into a user-visible bubble. Strips both forms of leak before embedding: - <memory-context>...</memory-context> fenced blocks (sanitize_context) - trailing '## Honcho Context' sections (header + everything after) Plus regression tests: - tests/agent/test_streaming_context_scrubber.py — 13 tests on the stateful scrubber (whole block, split tags, false-positive partial tags, unterminated span, reset, case-insensitivity) - tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on _fire_stream_delta covering the realistic 7-chunk leak scenario and the cross-turn scrubber reset - tests/gateway/test_vision_memory_leak.py — 4 tests covering the vision auto-analysis boundary (clean pass-through, '## Honcho Context' header, fenced block, both patterns together)	2026-04-27 12:37:33 -07:00
kshitijk4poor	56724147ef	fix(providers/gmi): post-salvage review fixes - config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version is 22, so all users are past version 17 and would never be prompted for GMI_API_KEY on upgrade — consistent with how arcee was added) - auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview, stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview) - test_gmi_provider.py: fix malformed write_text() call in doctor test (was: write_text("GMI_API_KEY=* encoding="utf-8") → missing closing quote, wrote literal string 'GMI_API_KEY=* encoding=' to .env file) - test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions to match new cheaper default - docs/integrations/providers.md: add 'gmi' to inline 'Supported providers' fallback list (was only in the table, not the inline list at line ~1181) - docs/reference/cli-commands.md: add 'gmi' to --provider choices list	2026-04-27 11:17:59 -07:00
Isaac Huang	c53fcb0173	feat(providers): add GMI Cloud as a first-class API-key provider (#11955 ) Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider with built-in auth, aliases, model catalog, CLI entry points, auxiliary client routing, context length resolution, doctor checks, env var tracking, and docs. - auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL) - providers.py: HermesOverlay with extra_env_vars for models.dev detection - models.py: curated slash-form model catalog; live /v1/models fetch - main.py: 'gmi' in _named_custom_provider_map and --provider choices - model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated context-length probe block (GMI's /models has authoritative data) - auxiliary_client.py: alias entries; _compat_model fix for slash-form models on cached aggregator-style clients; gmi aux default model - doctor.py: GMI in provider connectivity checks - config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS - conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix) - docs: providers.md, environment-variables.md, fallback-providers.md, configuration.md, quickstart.md (expands provider table) Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local>	2026-04-27 11:17:59 -07:00
hermes-agent-dhabibi	8402ba150e	fix(copilot): send vision header for Copilot vision requests Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route. Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision. Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>	2026-04-27 08:35:50 -07:00
Teknium	ec671c4154	feat(image-input): native multimodal routing based on model vision capability (#16506 ) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto \| native \| text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). * feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.	2026-04-27 06:27:59 -07:00
Teknium	4a2ee6c162	fix(title-gen): surface auxiliary failures via _emit_auxiliary_failure Closes #15775. Title generation swallowed exceptions at debug level and returned None, so a depleted auxiliary provider (e.g. OpenRouter 402) silently left sessions with NULL titles. Reporter observed 45 untitled sessions accumulated over 19 days with no user-visible indication. - agent/title_generator.py: accept optional failure_callback, bump log to WARNING, invoke callback on call_llm exception (swallowing callback errors so nothing can crash the fire-and-forget worker thread). - cli.py, gateway/run.py: pass agent._emit_auxiliary_failure as the callback so failures route through the existing user-visible warning channel. - tests: cover callback fires / errors are swallowed / no-callback legacy behavior / maybe_auto_title forwards kwarg to worker.	2026-04-26 21:49:34 -07:00
briandevans	943465235e	fix(compressor): guard against bare-string items in multimodal content list raw_content from message["content"] can be a list that contains bare strings, not only dicts. The previous `p.get("text", "")` call raised AttributeError on string items, crashing context compression for any session that had a message with mixed content. Guard with isinstance checks: dict → .get("text"), str → len(p), fallback → len(str(p)). Adds a regression test covering the bare-string case that would have AttributeError'd on the pre-fix code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-26 21:48:09 -07:00
briandevans	cfc8befe65	fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens _find_tail_cut_by_tokens called len(content) to estimate message tokens. When content is a list of blocks (multimodal: text + image_url), len() returns block count (e.g. 2) rather than character count, so a message with 500 chars of text was counted as ~10 tokens instead of ~135. This caused the backward walk to exhaust all messages before hitting the budget ceiling; the head_end safeguard then forced cut = n - min_tail, shrinking the protected tail to the bare minimum and preventing effective compression of long multimodal conversations. Fix mirrors the existing pattern in _prune_old_tool_results (line 487): sum(len(p.get("text", "")) for p in raw_content) if isinstance(raw_content, list) else len(raw_content) Tests: 3 new cases in TestTokenBudgetTailProtection — regression guard (confirms the test fails with the bug), plain-string regression guard, and image-only block edge case. Fixes #16087. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-26 21:48:09 -07:00
Teknium	6c87371815	fix(openclaw-migration): case-preserving brand rewrite + one-time ~/.openclaw residue banner (#16327 ) Two related fixes for OpenClaw-residue problems after an OpenClaw→Hermes migration (especially migrations done via OpenClaw's own tool, which doesn't archive the source directory). 1. optional-skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py: rebrand_text() was rewriting ~/.openclaw/config.yaml → ~/.Hermes/config.yaml (capital H — a directory that doesn't exist). Now case-preserving: "OpenClaw" → "Hermes" (prose), but "openclaw" → "hermes" (so filesystem paths land on the real Hermes home). Regex logic unchanged — replacement function now checks if the matched text was all-lowercase and emits the replacement in the matching case. 2. agent/onboarding.py + cli.py: one-time startup banner the first time Hermes launches and finds ~/.openclaw/. Tells the user to run `hermes claw cleanup` to archive it, gated on the existing onboarding seen-flag framework (onboarding.seen.openclaw_residue_cleanup in config.yaml). Fires once per install; re-running requires wiping that flag or running cleanup directly. Tests: - 4 new TestDetectOpenclawResidue tests (present / absent / file-instead- of-dir / default-home smoke) - 2 TestOpenclawResidueHint tests (content check) - 2 TestOpenclawResidueSeenFlag tests (flag isolation + round-trip) - test_rebrand_text_preserves_filesystem_path_casing regression test with 4 scenarios including the exact ~/.openclaw/config.yaml case - Existing test_rebrand_text_* tests updated to the new case-preserving contract (lowercase input → lowercase output) Co-authored-by: teknium1 <teknium@noreply.github.com>	2026-04-26 20:57:26 -07:00
Teknium	e19854d893	fix(shell_hooks): parse hooks_auto_accept as strict bool/string, not bool() (#16322 ) `_resolve_effective_accept()` used `return bool(cfg_val)` for the `hooks_auto_accept` config key. In Python, `bool("false")` is `True`, so a user setting `hooks_auto_accept: "false"` (quoted YAML string) in `config.yaml` would silently enable auto-approval of every shell hook, bypassing the consent prompt entirely. Replace the coercion with the same type-aware parsing already used for the HERMES_ACCEPT_HOOKS env var three lines above: bool passthrough, strings checked against {1,true,yes,on} case-insensitively, everything else (including "false", None, 0, ints) rejected. Add TestHooksAutoAcceptParsing guarding the regression across all four value shapes (bool, string-truthy, string-falsy, missing/None). Reported by @sprmn24 in #16244.	2026-04-26 20:48:35 -07:00
Teknium	635253b918	feat(busy): add 'steer' as a third display.busy_input_mode option (#16279 ) Enter while the agent is busy can now inject the typed text via /steer — arriving at the agent after the next tool call — instead of interrupting (current default) or queueing for the next turn. Changes: - cli.py: keybinding honors busy_input_mode='steer' by calling agent.steer(text) on the UI thread (thread-safe), with automatic fallback to 'queue' when the agent is missing, steer() is unavailable, images are attached, or steer() rejects the payload. /busy accepts 'steer' as a fourth argument alongside queue/interrupt/status. - gateway/run.py: busy-message handler and the PRIORITY running-agent path both route through running_agent.steer() when the mode is 'steer', with the same fallback-to-queue safety net. Ack wording tells users their message was steered into the current run. Restart-drain queueing now also activates for 'steer' so messages aren't lost across restarts. - agent/onboarding.py: first-touch hint has a steer branch for both CLI and gateway. - hermes_cli/commands.py: /busy args_hint updated to include steer, and 'steer' is registered as a subcommand (completions). - hermes_cli/web_server.py: dashboard select widget offers steer. - hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py: inline docs updated. - website/docs/user-guide/cli.md + messaging/index.md: documented. - Tests: steer set/status path for /busy; onboarding hints; _load_busy_input_mode accepts steer; busy-session ack exercises steer success + two fallback-to-queue branches. Requested on X by @CodingAcct. Default is unchanged (interrupt).	2026-04-26 18:21:29 -07:00
Teknium	9a70260490	Revert "feat(onboarding): port first-touch hints to the TUI (#16054 )" (#16062 ) This reverts commit `ffd2621039`.	2026-04-26 06:31:37 -07:00
Teknium	ffd2621039	feat(onboarding): port first-touch hints to the TUI (#16054 ) PR #16046 added /busy and /verbose hints to the classic CLI and the gateway runner but skipped the Ink TUI (and therefore the dashboard /chat page, which embeds the TUI via PTY). This extends the same latch to the TUI with TUI-native wording. The TUI's busy-input model is not the /busy knob from the CLI — single Enter while busy auto-queues, double Enter on an empty line interrupts. The new busy-input hint teaches THAT gesture instead of telling the user to flip a config that does not apply. Changes: - agent/onboarding.py — add busy_input_hint_tui() + tool_progress_hint_tui() - tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into _on_tool_complete for the 30s/tool_progress=all path. Same config.yaml latch so each hint fires at most once per install across CLI, gateway, and TUI combined. - ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse + onboarding.hint event - ui-tui/src/app/createGatewayEventHandler.ts — render the hint event as sys() - ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first busy enqueue - tests/agent/test_onboarding.py — +3 cases for TUI hint shape - tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim - website/docs/user-guide/tui.md — new 'Interrupting and queueing' section explaining the TUI's double-Enter model and the hints Validation: scripts/run_tests.sh tests/agent/test_onboarding.py \ tests/tui_gateway/test_protocol.py \ tests/gateway/test_busy_session_ack.py -> 66 passed npm --prefix ui-tui run type-check -> clean npm --prefix ui-tui run lint -> clean npm --prefix ui-tui run build -> clean	2026-04-26 06:24:19 -07:00
Teknium	83c1c201f6	feat(onboarding): contextual first-touch hints for /busy and /verbose (#16046 ) Instead of a blocking first-run questionnaire, show a one-time hint the first time the user hits each behavior fork: 1. First message while the agent is working — appends a hint to the busy-ack explaining the /busy queue vs /busy interrupt knob, phrased to match the mode that was just applied (don't tell a queue-mode user to switch to queue). 2. First tool that runs for >= 30s in the noisiest progress mode (tool_progress: all) — prints a hint about /verbose to cycle display modes (all -> new -> off -> verbose). Gated on /verbose actually being usable on the surface: always shown on CLI; on gateway only shown when display.tool_progress_command is enabled. Each hint is latched in config.yaml under onboarding.seen.<flag>, so it fires exactly once per install across CLI, gateway, and cron, then never again. Users can wipe the section to re-see hints. New: - agent/onboarding.py — is_seen / mark_seen / hint strings, shared by both CLI and gateway. - onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in load_cli_config defaults (cli.py). No _config_version bump — deep merge handles new keys. Wired: - gateway/run.py: _handle_active_session_busy_message appends the hint after building the ack. progress_callback tracks tool.completed duration and queues the tool-progress hint into the progress bubble. - cli.py: CLI input loop appends the busy-input hint on the first busy Enter; _on_tool_progress appends the tool-progress hint on the first >=30s tool completion. In-memory CLI_CONFIG is also updated so subsequent fires in the same process are suppressed immediately. All writes go through atomic_yaml_write and are wrapped in try/except so onboarding can never break the input/busy-ack paths.	2026-04-26 06:06:27 -07:00
Teknium	438db0c7b0	fix(cli): /model picker honors provider-specific context caps (#16030 ) `_apply_model_switch_result` (the interactive `/model` picker's confirmation path) printed `ModelInfo.context_window` straight from models.dev, which reports the vendor-wide value (1.05M for gpt-5.5 on openai). ChatGPT Codex OAuth caps the same slug at 272K, so the picker showed 1M while the runtime (compressor, gateway `/model`, typed `/model <name>`) correctly used 272K — the classic 'sometimes 1M, sometimes 272K' mismatch on a single model. Both display paths now go through `resolve_display_context_length()`, matching the fix that `_handle_model_switch` received earlier. Also bump the stale last-resort fallback in DEFAULT_CONTEXT_LENGTHS (`gpt-5.5: 400000 -> 1050000`) to match the real OpenAI API value; the 272K Codex cap is already enforced via the Codex-OAuth branch, so the fallback now reflects what every non-Codex probe-miss should see. Tests: adds `test_apply_model_switch_result_context.py` with three scenarios (Codex cap wins, OpenRouter shows 1.05M, resolver-empty falls back to ModelInfo). Updates the existing non-Codex fallback test to assert 1.05M (the correct value). ## Validation \| path \| before \| after \| \|-------------------------------\|-----------\|-----------\| \| picker -> gpt-5.5 on Codex \| 1,050,000 \| 272,000 \| \| picker -> gpt-5.5 on OpenAI \| 1,050,000 \| 1,050,000 \| \| picker -> gpt-5.5 on OpenRouter \| 1,050,000 \| 1,050,000 \| \| typed /model gpt-5.5 on Codex \| 272,000 \| 272,000 \|	2026-04-26 05:43:31 -07:00
zkl	2ccdadcca6	fix(deepseek): bump V4 family context window to 1M tokens #14934 added deepseek-v4-pro / deepseek-v4-flash to the DeepSeek native provider but the context-window lookup still falls back to the existing "deepseek" substring entry (128K). DeepSeek V4 ships with a 1M context window, so any caller relying on get_model_context_length() for pre-flight token budgeting (compression, context warnings) under-counts by ~8x. Add explicit lowercase entries for the four DeepSeek model ids that ship 1M context: - deepseek-v4-pro - deepseek-v4-flash - deepseek-chat (legacy alias, server-side maps to v4-flash non-thinking) - deepseek-reasoner (legacy alias, server-side maps to v4-flash thinking) Longest-key-first substring matching means these explicit entries also cover the vendor-prefixed forms (deepseek/deepseek-v4-pro on OpenRouter and Nous Portal) without regressing the existing 128K fallback for older / unknown DeepSeek model ids on custom endpoints. Source: https://api-docs.deepseek.com/zh-cn/quick_start/pricing	2026-04-26 05:32:54 -07:00
Teknium	192e7eb21f	fix(nous): don't trip cross-session rate breaker on upstream-capacity 429s (#15898 ) Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi, MiMo, Hermes) behind one endpoint. Before this fix, any 429 on any of those models recorded a cross-session file breaker that blocked EVERY model on Nous for the cooldown window -- even though the caller's own RPM/RPH/TPM/TPH buckets were healthy. Users hit a DeepSeek V4 Pro capacity error, restarted, switched to Kimi 2.6, and still got 'Nous Portal rate limit active -- resets in 46m 53s'. Nous already emits the full x-ratelimit-* header suite on every response (captured by rate_limit_tracker into agent._rate_limit_state). We now gate the breaker on that data: trip it only when either the 429's own headers or the last-known-good state show a bucket with remaining == 0 AND a reset window >= 60s. Upstream-capacity 429s (healthy buckets everywhere, but upstream out of capacity) fall through to normal retry/fallback and the breaker is never written. Note: the in-memory 'restart TUI/gateway to clear' workaround circulated in Discord does NOT work -- the breaker is file-backed at ~/.hermes/rate_limits/nous.json. The workaround for users still affected by a bad state file is to delete it. Reported in Discord by CrazyDok1 and KYSIV (Apr 2026).	2026-04-26 04:53:42 -07:00
Teknium	125de02056	fix(context): honor custom_providers context_length on /model switch + bump probe tier to 256K (#15844 ) Fixes #15779. Custom-provider per-model context_length (`custom_providers[].models.<id>.context_length`) is now honored across every resolution path, not just agent startup. Also adds 256K as the top probe tier and default fallback. ## What changed New helper `hermes_cli.config.get_custom_provider_context_length()` — single source of truth for the per-model override lookup, with trailing-slash-insensitive base-url matching. `agent.model_metadata.get_model_context_length()` gains an optional `custom_providers=` kwarg (step 0b — runs after explicit `config_context_length` but before every other probe). Wired through five call sites that previously either duplicated the lookup or ignored it entirely: - `run_agent.py` startup — refactored to use the new helper (dedups legacy inline loop, keeps invalid-value warning) - `AIAgent.switch_model()` — re-reads custom_providers from live config on every /model switch - `hermes_cli.model_switch.resolve_display_context_length()` — new `custom_providers=` kwarg - `gateway/run.py` /model confirmation (picker callback + text path) - `gateway/run.py` `_format_session_info` (/info) ## Context probe tiers `CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]` — was `[128_000, ...]`. `DEFAULT_FALLBACK_CONTEXT` follows tier[0], so unknown models now default to 256K. The stale `128000` literal in the OpenRouter metadata-miss path is replaced with `DEFAULT_FALLBACK_CONTEXT` for consistency. ## Repro (from #15779) ```yaml custom_providers: - name: my-custom-endpoint base_url: https://example.invalid/v1 model: gpt-5.5 models: gpt-5.5: context_length: 1050000 ``` `/model gpt-5.5 --provider custom:my-custom-endpoint` → previously "Context: 128,000", now "Context: 1,050,000". ## Tests - `tests/hermes_cli/test_custom_provider_context_length.py` — new file, 19 tests covering the helper, step-0b integration, and the 256K tier invariants - `tests/hermes_cli/test_model_switch_context_display.py` — added regression tests for #15779 through the display resolver - `tests/gateway/test_session_info.py` — updated default-fallback assertion (128K → 256K) - `tests/agent/test_model_metadata.py` — updated tier assertions for the new top tier	2026-04-25 18:47:53 -07:00
nerijusas	81e01f6ee9	fix(agent): preserve Codex message items for replay	2026-04-25 18:22:06 -07:00
Teknium	ea01bdcebe	refactor(memory): remove flush_memories entirely (#15696 ) The AIAgent.flush_memories pre-compression save, the gateway _flush_memories_for_session, and everything feeding them are obsolete now that the background memory/skill review handles persistent memory extraction. Problems with flush_memories: - Pre-dates the background review loop. It was the only memory-save path when introduced; the background review now fires every 10 user turns on CLI and gateway alike, which is far more frequent than compression or session reset ever triggered flush. - Blocking and synchronous. Pre-compression flush ran on the live agent before compression, blocking the user-visible response. - Cache-breaking. Flush built a temporary conversation prefix (system prompt + memory-only tool list) that diverged from the live conversation's cached prefix, invalidating prompt caching. The gateway variant spawned a fresh AIAgent with its own clean prompt for each finalized session — still cache-breaking, just in a different process. - Redundant. Background review runs in the live conversation's session context, gets the same content, writes to the same memory store, and doesn't break the cache. Everything flush_memories claimed to preserve is already covered. What this removes: - AIAgent.flush_memories() method (~248 LOC in run_agent.py) - Pre-compression flush call in _compress_context - flush_memories call sites in cli.py (/new + exit) - GatewayRunner._flush_memories_for_session + _async_flush_memories (and the 3 call sites: session expiry watcher, /new, /resume) - 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks, hermes tools UI task list, auxiliary_client docstrings - _memory_flush_min_turns config + init - #15631's headroom-deduction math in _check_compression_model_feasibility (headroom was only needed because flush dragged the full main-agent system prompt along; the compression summariser sends a single user-role prompt so new_threshold = aux_context is safe again) - The dedicated test files and assertions that exercised flush-specific paths What this renames (with read-time backcompat on sessions.json): - SessionEntry.memory_flushed -> SessionEntry.expiry_finalized. The session-expiry watcher still uses the flag to avoid re-running finalize/eviction on the same expired session; the new name reflects what it now actually gates. from_dict() reads 'expiry_finalized' first, falls back to the legacy 'memory_flushed' key so existing sessions.json files upgrade seamlessly. Supersedes #15631 and #15638. Tested: 383 targeted tests pass across run_agent/, agent/, cli/, and gateway/ session-boundary suites. No behavior regressions — background memory review continues to handle persistent memory extraction on both CLI and gateway.	2026-04-25 08:21:14 -07:00
Teknium	3c1c65e754	fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633 ) Generalize the temperature-specific 400 retry that shipped in PR #15621 so the same reactive strategy covers any provider that rejects an arbitrary request parameter — — not just temperature. - agent/auxiliary_client.py: * New _is_unsupported_parameter_error(exc, param): matches the same six phrasings the old temperature detector did plus 'unrecognized parameter' and 'invalid parameter', against any named param. * _is_unsupported_temperature_error is now a thin back-compat wrapper so existing imports and tests keep working. * The max_tokens → max_completion_tokens retry branch in call_llm and async_call_llm now (a) gates on 'max_tokens is not None' so we do not pop a key that was never set and silently substitute a None value on the retry, and (b) also matches the generic helper in addition to the legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking up phrasings like 'Unknown parameter: max_tokens' that previously slipped through. - tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering the generic detector across params, the back-compat wrapper, and the two hardenings to the max_tokens retry branch (None gate + generic phrasing). Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR also proposed the reactive temperature retry which landed independently via PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages the remaining hardening ideas onto current main.	2026-04-25 05:50:34 -07:00
Ash Rowan Vale 🌿	facea84559	fix(auxiliary): retry without temperature when any provider rejects it Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature' across all providers/models — not just Codex Responses. The same backend can accept temperature for some models and reject it for others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does not scale. call_llm / async_call_llm now detect the concrete 'unsupported parameter: temperature' 400 and transparently retry once without temperature. Kimi's server-managed omission and Opus 4.7+'s proactive strip stay in place — this is the safety net for everything else. Changes: - agent/auxiliary_client.py: add _is_unsupported_temperature_error helper; wire into both sync and async call_llm paths before the existing max_tokens/payment/auth retry ladder - tests/agent/test_unsupported_temperature_retry.py: 19 tests covering detector phrasings, sync + async retry, no-retry-without-temperature, and non-temperature 400s not triggering the retry Builds on PR #15620 (codex_responses fallback) which stripped temperature up front for that one api_mode. This PR closes the gap for every other provider/model combo via reactive retry. Credit: retry approach and detector originate from @BlueBirdBack's PR #15578. Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>	2026-04-25 05:27:17 -07:00
vominh1919	5401a0080d	fix: recalculate token budgets on model switch in ContextCompressor update_model() recalculated threshold_tokens but left tail_token_budget and max_summary_tokens at their __init__ values. When switching from a 200K model to 32K, the tail budget stayed at ~20K tokens (62% of 32K) instead of the intended ~10%. Adds budget recalculation in update_model() and 2 regression tests.	2026-04-25 15:07:56 +05:30
helix4u	6a957a74bc	fix(memory): add write origin metadata	2026-04-24 14:37:55 -07:00
Andre Kurait	a9ccb03ccc	fix(bedrock): evict cached boto3 client on stale-connection errors ## Problem When a pooled HTTPS connection to the Bedrock runtime goes stale (NAT timeout, VPN flap, server-side TCP RST, proxy idle cull), the next Converse call surfaces as one of: * botocore.exceptions.ConnectionClosedError / ReadTimeoutError / EndpointConnectionError / ConnectTimeoutError * urllib3.exceptions.ProtocolError * A bare AssertionError raised from inside urllib3 or botocore (internal connection-pool invariant check) The agent loop retries the request 3x, but the cached boto3 client in _bedrock_runtime_client_cache is reused across retries — so every attempt hits the same dead connection pool and fails identically. Only a process restart clears the cache and lets the user keep working. The bare-AssertionError variant is particularly user-hostile because str(AssertionError()) is an empty string, so the retry banner shows: ⚠️ API call failed: AssertionError 📝 Error: with no hint of what went wrong. ## Fix Add two helpers to agent/bedrock_adapter.py: * is_stale_connection_error(exc) — classifies exceptions that indicate dead-client/dead-socket state. Matches botocore ConnectionError + HTTPClientError subtrees, urllib3 ProtocolError / NewConnectionError, and AssertionError raised from a frame whose module name starts with urllib3., botocore., or boto3.. Application-level AssertionErrors are intentionally excluded. * invalidate_runtime_client(region) — per-region counterpart to the existing reset_client_cache(). Evicts a single cached client so the next call rebuilds it (and its connection pool). Wire both into the Converse call sites: * call_converse() / call_converse_stream() in bedrock_adapter.py (defense-in-depth for any future caller) * The two direct client.converse(kwargs) / client.converse_stream(kwargs) call sites in run_agent.py (the paths the agent loop actually uses) On a stale-connection exception, the client is evicted and the exception re-raised unchanged. The agent's existing retry loop then builds a fresh client on the next attempt and recovers without requiring a process restart. ## Tests tests/agent/test_bedrock_adapter.py gets three new classes (14 tests): * TestInvalidateRuntimeClient — per-region eviction correctness; non-cached region returns False. * TestIsStaleConnectionError — classifies botocore ConnectionClosedError / EndpointConnectionError / ReadTimeoutError, urllib3 ProtocolError, library-internal AssertionError (both urllib3.* and botocore.* frames), and correctly ignores application-level AssertionError and unrelated exceptions (ValueError, KeyError). * TestCallConverseInvalidatesOnStaleError — end-to-end: stale error evicts the cached client, non-stale error (validation) leaves it alone, successful call leaves it cached. All 116 tests in test_bedrock_adapter.py pass. Signed-off-by: Andre Kurait <andrekurait@gmail.com>	2026-04-24 07:26:07 -07:00
Tranquil-Flow	7dc6eb9fbf	fix(agent): handle aws_sdk auth type in resolve_provider_client Bedrock's aws_sdk auth_type had no matching branch in resolve_provider_client(), causing it to fall through to the "unhandled auth_type" warning and return (None, None). This broke all auxiliary tasks (compression, memory, summarization) for Bedrock users — the main conversation loop worked fine, but background context management silently failed. Add an aws_sdk branch that creates an AnthropicAuxiliaryClient via build_anthropic_bedrock_client(), using boto3's default credential chain (IAM roles, SSO, env vars, instance metadata). Default auxiliary model is Haiku for cost efficiency. Closes #13919	2026-04-24 07:26:07 -07:00
Andre Kurait	b290297d66	fix(bedrock): resolve context length via static table before custom-endpoint probe ## Problem `get_model_context_length()` in `agent/model_metadata.py` had a resolution order bug that caused every Bedrock model to fall back to the 128K default context length instead of reaching the static Bedrock table (200K for Claude, etc.). The root cause: `bedrock-runtime.<region>.amazonaws.com` is not listed in `_URL_TO_PROVIDER`, so `_is_known_provider_base_url()` returned False. The resolution order then ran the custom-endpoint probe (step 2) before the Bedrock branch (step 4b), which: 1. Treated Bedrock as a custom endpoint (via `_is_custom_endpoint`). 2. Called `fetch_endpoint_model_metadata()` → `GET /models` on the bedrock-runtime URL (Bedrock doesn't serve this shape). 3. Fell through to `return DEFAULT_FALLBACK_CONTEXT` (128K) at the "probe-down" branch — never reaching the Bedrock static table. Result: users on Bedrock saw 128K context for Claude models that actually support 200K on Bedrock, causing premature auto-compression. ## Fix Promote the Bedrock branch from step 4b to step 1b, so it runs before the custom-endpoint probe at step 2. The static table in `bedrock_adapter.py::get_bedrock_context_length()` is the authoritative source for Bedrock (the ListFoundationModels API doesn't expose context window sizes), so there's no reason to probe `/models` first. The original step 4b is replaced with a one-line breadcrumb comment pointing to the new location, to make the resolution-order docstring accurate. ## Changes - `agent/model_metadata.py` - Add step 1b: Bedrock static-table branch (unchanged predicate, moved). - Remove dead step 4b block, replace with breadcrumb comment. - Update resolution-order docstring to include step 1b. - `tests/agent/test_model_metadata.py` - New `TestBedrockContextResolution` class (3 tests): - `test_bedrock_provider_returns_static_table_before_probe`: confirms `provider="bedrock"` hits the static table and does NOT call `fetch_endpoint_model_metadata` (regression guard). - `test_bedrock_url_without_provider_hint`: confirms the `bedrock-runtime.*.amazonaws.com` host match works without an explicit `provider=` hint. - `test_non_bedrock_url_still_probes`: confirms the probe still fires for genuinely-custom endpoints (no over-reach). ## Testing pytest tests/agent/test_model_metadata.py -q # 83 passed in 1.95s (3 new + 80 existing) ## Risk Very low. - Predicate is identical to the original step 4b — no behaviour change for non-Bedrock paths. - Original step 4b was dead code for the user-facing case (always hit the 128K fallback first), so removing it cannot regress behaviour. - Bedrock path now short-circuits before any network I/O — faster too. - `ImportError` fall-through preserved so users without `boto3` installed are unaffected. ## Related - This is a prerequisite for accurate context-window accounting on Bedrock — the fix for #14710 (stale-connection client eviction) depends on correct context sizing to know when to compress. Signed-off-by: Andre Kurait <andrekurait@gmail.com>	2026-04-24 07:26:07 -07:00
Qi Ke	f2fba4f9a1	fix(anthropic): auto-detect Bedrock model IDs in normalize_model_name (#12295 ) Bedrock model IDs use dots as namespace separators (anthropic.claude-opus-4-7, us.anthropic.claude-sonnet-4-5-v1:0), not version separators. normalize_model_name() was unconditionally converting all dots to hyphens, producing invalid IDs that Bedrock rejects with HTTP 400/404. This affected both the main agent loop (partially mitigated by _anthropic_preserve_dots in run_agent.py) and all auxiliary client calls (compression, session_search, vision, etc.) which go through _AnthropicCompletionsAdapter and never pass preserve_dots=True. Fix: add _is_bedrock_model_id() to detect Bedrock namespace prefixes (anthropic., us., eu., ap., jp., global.) and skip dot-to-hyphen conversion for these IDs regardless of the preserve_dots flag.	2026-04-24 07:26:07 -07:00
Wooseong Kim	be6b83562d	fix(aux): force anthropic oauth refresh after 401 Co-Authored-By: Paperclip <noreply@paperclip.ing>	2026-04-24 07:14:00 -07:00
5park1e	e1106772d9	fix: re-auth on stale OAuth token; read Claude Code credentials from macOS Keychain Bug 3 — Stale OAuth token not detected in 'hermes model': - _model_flow_anthropic used 'has_creds = bool(existing_key)' which treats any non-empty token (including expired OAuth tokens) as valid. - Added existing_is_stale_oauth check: if the only credential is an OAuth token (sk-ant- prefix) with no valid cc_creds fallback, mark it stale and force the re-auth menu instead of silently accepting a broken token. Bug 4 — macOS Keychain credentials never read: - Claude Code >=2.1.114 migrated from ~/.claude/.credentials.json to the macOS Keychain under service 'Claude Code-credentials'. - Added _read_claude_code_credentials_from_keychain() using the 'security' CLI tool; read_claude_code_credentials() now tries Keychain first then falls back to JSON file. - Non-Darwin platforms return None from Keychain read immediately. Tests: - tests/agent/test_anthropic_keychain.py: 11 cases covering Darwin-only guard, security command failures, JSON parsing, fallback priority. - tests/hermes_cli/test_anthropic_model_flow_stale_oauth.py: 8 cases covering stale OAuth detection, API key passthrough, cc_creds fallback. Refs: #12905	2026-04-24 07:14:00 -07:00
vlwkaos	f7f7588893	fix(agent): only set rate-limit cooldown when leaving primary; add tests	2026-04-24 05:35:43 -07:00
Teknium	ba44a3d256	fix(gemini): fail fast on missing API key + surface it in hermes dump (#15133 ) Two small fixes triggered by a support report where the user saw a cryptic 'HTTP 400 - Error 400 (Bad Request)!!1' (Google's GFE HTML error page, not a real API error) on every gemini-2.5-pro request. The underlying cause was an empty GOOGLE_API_KEY / GEMINI_API_KEY, but nothing in our output made that diagnosable: 1. hermes_cli/dump.py: the api_keys section enumerated 23 providers but omitted Google entirely, so users had no way to verify from 'hermes dump' whether the key was set. Added GOOGLE_API_KEY and GEMINI_API_KEY rows. 2. agent/gemini_native_adapter.py: GeminiNativeClient.__init__ accepted an empty/whitespace api_key and stamped it into the x-goog-api-key header, which made Google's frontend return a generic HTML 400 long before the request reached the Generative Language backend. Now we raise RuntimeError at construction with an actionable message pointing at GOOGLE_API_KEY/GEMINI_API_KEY and aistudio.google.com. Added a regression test that covers '', ' ', and None.	2026-04-24 05:35:17 -07:00
konsisumer	785d168d50	fix(credential_pool): add Nous OAuth cross-process auth-store sync Concurrent Hermes processes (e.g. cron jobs) refreshing a Nous OAuth token via resolve_nous_runtime_credentials() write the rotated tokens to auth.json. The calling process's pool entry becomes stale, and the next refresh against the already-rotated token triggers a 'refresh token reuse' revocation on the Nous Portal. _sync_nous_entry_from_auth_store() reads auth.json under the same lock used by resolve_nous_runtime_credentials, and adopts the newer token pair before refreshing the pool entry. This complements #15111 (which preserved the obtained_at timestamps through seeding). Partial salvage of #10160 by @konsisumer — only the agent/credential_pool.py changes + the 3 Nous-specific regression tests. The PR also touched 10 unrelated files (Dockerfile, tips.py, various tool tests) which were dropped as scope creep. Regression tests: - test_sync_nous_entry_from_auth_store_adopts_newer_tokens - test_sync_nous_entry_noop_when_tokens_match - test_nous_exhausted_entry_recovers_via_auth_store_sync	2026-04-24 05:20:05 -07:00
vominh1919	461899894e	fix: increment request_count in least_used pool strategy The least_used strategy selected entries via min(request_count) but never incremented the counter. All entries stayed at count=0, so the strategy degenerated to fill_first behavior with no actual load balancing. Now increments request_count after each selection and persists the update.	2026-04-24 05:20:05 -07:00
MestreY0d4-Uninter	7d2f93a97f	fix: set HOME for Copilot ACP subprocesses Pass an explicit HOME into Copilot ACP child processes so delegated ACP runs do not fail when the ambient environment is missing HOME. Prefer the per-profile subprocess home when available, then fall back to HOME, expanduser('~'), pwd.getpwuid(...), and /home/openclaw. Add regression tests for both profile-home preference and clean HOME fallback. Refs #11068.	2026-04-24 05:09:08 -07:00
Teknium	78450c4bd6	fix(nous-oauth): preserve obtained_at in pool + actionable message on RT reuse (#15111 ) Two narrow fixes motivated by #15099. 1. _seed_from_singletons() was dropping obtained_at, agent_key_obtained_at, expires_in, and friends when seeding device_code pool entries from the providers.nous singleton. Fresh credentials showed up with obtained_at=None, which broke downstream freshness-sensitive consumers (self-heal hooks, pool pruning by age) — they treated just-minted credentials as older than they actually were and evicted them. 2. When the Nous Portal OAuth 2.1 server returns invalid_grant with 'Refresh token reuse detected' in the error_description, rewrite the message to explain the likely cause (an external process consumed the rotated RT without persisting it back) and the mitigation. The generic reuse message led users to report this as a Hermes persistence bug when the actual trigger was typically a third-party monitoring script calling /api/oauth/token directly. Non-reuse errors keep their original server description untouched. Closes #15099. Regression tests: - tests/agent/test_credential_pool.py::test_nous_seed_from_singletons_preserves_obtained_at_timestamps - tests/hermes_cli/test_auth_nous_provider.py::test_refresh_token_reuse_detection_surfaces_actionable_message - tests/hermes_cli/test_auth_nous_provider.py::test_refresh_non_reuse_error_keeps_original_description	2026-04-24 05:08:46 -07:00
Teknium	3aa1a41e88	feat(gemini): block free-tier keys at setup + surface guidance on 429 (#15100 ) Google AI Studio's free tier (<= 250 req/day for gemini-2.5-flash) is exhausted in a handful of agent turns, so the setup wizard now refuses to wire up Gemini when the supplied key is on the free tier, and the runtime 429 handler appends actionable billing guidance. Setup-time probe (hermes_cli/main.py): - `_model_flow_api_key_provider` fires one minimal generateContent call when provider_id == 'gemini' and classifies the response as free/paid/unknown via x-ratelimit-limit-requests-per-day header or 429 body containing 'free_tier'. - Free -> print block message, refuse to save the provider, return. - Paid -> 'Tier check: paid' and proceed. - Unknown (network/auth error) -> 'could not verify', proceed anyway. Runtime 429 handler (agent/gemini_native_adapter.py): - `gemini_http_error` appends billing guidance when the 429 error body mentions 'free_tier', catching users who bypass setup by putting GOOGLE_API_KEY directly in .env. Tests: 21 unit tests for the probe + error path, 4 tests for the setup-flow block. All 67 existing gemini tests still pass.	2026-04-24 04:46:17 -07:00
Teknium	346601ca8d	fix(context): invalidate stale Codex OAuth cache entries >= 400k (#15078 ) PR #14935 added a Codex-aware context resolver but only new lookups hit the live /models probe. Users who had run Hermes on gpt-5.5 / 5.4 BEFORE that PR already had the wrong value (e.g. 1,050,000 from models.dev) persisted in ~/.hermes/context_length_cache.yaml, and the cache-first lookup in get_model_context_length() returns it forever. Symptom (reported in the wild by Ludwig, min heo, Gaoge on current main at `6051fba9d`, which is AFTER #14935): * Startup banner shows context usage against 1M * Compression fires late and then OpenAI hard-rejects with 'context length will be reduced from 1,050,000 to 128,000' around the real 272k boundary. Fix: when the step-1 cache returns a value for an openai-codex lookup, check whether it's >= 400k. Codex OAuth caps every slug at 272k (live probe values) so anything at or above 400k is definitionally a pre-#14935 leftover. Drop that entry from the on-disk cache and fall through to step 5, which runs the live /models probe and repersists the correct value (or 272k from the hardcoded fallback if the probe fails). Non-Codex providers and legitimately-cached Codex entries at 272k are untouched. Changes: - agent/model_metadata.py: * _invalidate_cached_context_length() — drop a single entry from context_length_cache.yaml and rewrite the file. * Step-1 cache check in get_model_context_length() now gates provider=='openai-codex' entries >= 400k through invalidation instead of returning them. Tests (3 new in TestCodexOAuthContextLength): - stale 1.05M Codex entry is dropped from disk AND re-resolved through the live probe to 272k; unrelated cache entries survive. - fresh 272k Codex entry is respected (no probe call, no invalidation). - non-Codex 1M entries (e.g. anthropic/claude-opus-4.6 on OpenRouter) are unaffected — the guard is strictly scoped to openai-codex. Full tests/agent/test_model_metadata.py: 88 passed.	2026-04-24 04:46:07 -07:00
Teknium	18f3fc8a6f	fix(tests): resolve 17 persistent CI test failures (#15084 ) Make the main-branch test suite pass again. Most failures were tests still asserting old shapes after recent refactors; two were real source bugs. Source fixes: - tools/mcp_tool.py: _kill_orphaned_mcp_children() slept 2s on every shutdown even when no tracked PIDs existed, making test_shutdown_is_parallel measure ~3s for 3 parallel 1s shutdowns. Early-return when pids is empty. - hermes_cli/tips.py: tip 105 was 157 chars; corpus max is 150. Test fixes (mostly stale mock targets / missing fixture fields): - test_zombie_process_cleanup, test_agent_cache: patch run_agent.cleanup_vm (the local name bound at import), not tools.terminal_tool.cleanup_vm. - test_browser_camofox: patch tools.browser_camofox.load_config, not hermes_cli.config.load_config (the source module, not the resolved one). - test_flush_memories_codex._chat_response_with_memory_call: add finish_reason, tool_call.id, tool_call.type so the chat_completions transport normalizer doesn't AttributeError. - test_concurrent_interrupt: polling_tool signature now accepts messages= kwarg that _invoke_tool() passes through. - test_minimax_provider: add _fallback_chain=[] to the __new__'d agent so switch_model() doesn't AttributeError. - test_skills_config: SKILLS_DIR MagicMock + .rglob stopped working after the scanner switched to agent.skill_utils.iter_skill_index_files (os.walk-based). Point SKILLS_DIR at a real tmp_path and patch agent.skill_utils.get_external_skills_dirs. - test_browser_cdp_tool: browser_cdp toolset was intentionally split into 'browser-cdp' (commit `96b0f3700`) so its stricter check_fn doesn't gate the whole browser toolset; test now expects 'browser-cdp'. - test_registry: add tools.browser_dialog_tool to the expected builtin-discovery set (PR #14540 added it). - test_file_tools TestPatchHints: patch_tool surfaces hints as a '_hint' key on the JSON payload, not inline '[Hint: ...' text. - test_write_deny test_hermes_env: resolve .env via get_hermes_home() so the path matches the profile-aware denylist under hermetic HERMES_HOME. - test_checkpoint_manager test_falls_back_to_parent: guard the walk-up so a stray /tmp/pyproject.toml on the host doesn't pick up /tmp as the project root. - test_quick_commands: set cli.session_id in the __new__'d CLI so the alias-args path doesn't trip AttributeError when fuzzy-matching leaks a skill command across xdist test distribution.	2026-04-24 03:46:46 -07:00
Teknium	1f9c368622	fix(gemini): drop integer/number/boolean enums from tool schemas (#15082 ) Gemini's Schema validator requires every `enum` entry to be a string, even when the parent `type` is integer/number/boolean. Discord's `auto_archive_duration` parameter (`type: integer, enum: [60, 1440, 4320, 10080]`) tripped this on every request that shipped the full tool catalog to generativelanguage.googleapis.com, surfacing as `Gateway: Non-retryable client error: Gemini HTTP 400 (INVALID_ARGUMENT) Invalid value ... (TYPE_STRING), 60` and aborting the turn. Sanitize by dropping the `enum` key when the declared type is numeric or boolean and any entry is non-string. The `type` and `description` survive, so the model still knows the allowed values; the tool handler keeps its own runtime validation. Other providers (OpenAI, OpenRouter, Anthropic) are unaffected — the sanitizer only runs for native Gemini / cloudcode adapters. Reported by @selfhostedsoul on Discord with hermes debug share.	2026-04-24 03:40:00 -07:00
Nicecsh	2e2de124af	fix(aux): normalize GitHub Copilot provider slugs Keep auxiliary provider resolution aligned with the switch and persisted main-provider paths when models.dev returns github-copilot slugs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-24 03:33:29 -07:00
Teknium	b2e124d082	refactor(commands): drop /provider, /plan handler, and clean up slash registry (#15047 ) * refactor(commands): drop /provider and clean up slash registry * refactor(commands): drop /plan special handler — use plain skill dispatch	2026-04-24 03:10:52 -07:00

1 2 3 4 5 ...

411 Commits