hermes-agent-features

Author	SHA1	Message	Date
Teknium	70d7f79bef	refactor(steer): simplify injection marker to 'User guidance:' prefix (#13340 ) The mid-run steer marker was '[USER STEER (injected mid-run, not tool output): <text>]'. Replaced with a plain two-newline-prefixed 'User guidance: <text>' suffix. Rationale: the marker lives inside the tool result's content string regardless of whether the tool returned JSON, plain text, an MCP result, or a plugin result. The bracketed tag read like structured metadata that some tools (terminal, execute_code) could confuse with their own output formatting. A plain labelled suffix works uniformly across every content shape we produce. Behavior unchanged: - Still injected into the last tool-role message's content. - Still preserves multimodal (Anthropic) content-block lists by appending a text block. - Still drained at both sites added in #12959 and #13205 — per-tool drain between individual calls, and pre-API-call drain at the top of each main-loop iteration. Checked Codex's equivalent (pending_input / inject_user_message_without_turn in codex-rs/core): they record mid-turn user input as a real role:user message via record_user_prompt_and_emit_turn_item(). That's cleaner for their Responses-API model but not portable to Chat Completions where role alternation after tool_calls is strict. Embedding the guidance in the last tool result remains the correct placement for us. Validation: all 21 tests in tests/run_agent/test_steer.py pass.	2026-04-20 22:18:49 -07:00
jerilynzheng	b117538798	feat: attribution default_headers for ai-gateway provider Requests through Vercel AI Gateway now carry referrerUrl / appName / User-Agent attribution so traffic shows up in the gateway's analytics. Adds _AI_GATEWAY_HEADERS in auxiliary_client and a new ai-gateway.vercel.sh branch in _apply_client_headers_for_base_url.	2026-04-20 21:02:28 -07:00
Teknium	999dc43899	fix(steer): drain pending steer before each API call, not just after tool execution (#13205 ) When /steer is sent during an API call (model thinking), the steer text sits in _pending_steer until after the next tool batch — which may never come if the model returns a final response. In that case the steer is only delivered as a post-run follow-up, defeating the purpose. Add a pre-API-call drain at the top of the main loop: before building api_messages, check _pending_steer and inject into the last tool result in the messages list. This ensures steers sent during model thinking are visible on the very next API call. If no tool result exists yet (first iteration), the steer is restashed for the post-tool drain to pick up — injecting into a user message would break role alternation. Three new tests cover the pre-API-call drain: injection into last tool result, restash when no tool message exists, and backward scan past non-tool messages.	2026-04-20 16:06:17 -07:00
Teknium	3cba81ebed	fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157 ) Kimi's gateway selects the correct temperature server-side based on the active mode (thinking -> 1.0, non-thinking -> 0.6). Sending any temperature value — even the previously "correct" one — conflicts with gateway-managed defaults. Replaces the old approach of forcing specific temperature values (0.6 for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel that tells all call sites to strip the temperature key from API kwargs entirely. Changes: - agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model() prefix check (covers all kimi-* models), _fixed_temperature_for_model() returns sentinel for kimi models. _build_call_kwargs() strips temp. - run_agent.py: _build_api_kwargs, flush_memories, and summary generation paths all handle the sentinel by popping/omitting temperature. - trajectory_compressor.py: _effective_temperature_for_model returns None for kimi (sentinel mapped), direct client calls use kwargs dict to conditionally include temperature. - mini_swe_runner.py: same sentinel handling via wrapper function. - 6 test files updated: all 'forces temperature X' assertions replaced with 'temperature not in kwargs' assertions. Net: -76 lines (171 added, 247 removed). Inspired by PR #13137 (@kshitijk4poor).	2026-04-20 12:23:05 -07:00
Teknium	9725b452a1	fix: extract _repair_tool_call_arguments helper, add tests, bound loop Follow-up for PR #12252 salvage: - Extract 75-line inline repair block to _repair_tool_call_arguments() module-level helper for testability and readability - Remove redundant 'import re as _re' (re already imported at line 33) - Bound the while-True excess-delimiter removal loop to 50 iterations - Add 17 tests covering all 6 repair stages - Add sirEven to AUTHOR_MAP in release.py	2026-04-20 05:12:55 -07:00
Sanjays2402	570f8bab8f	fix(compression): exclude completion tokens from compression trigger (#12026 ) Cherry-picked from PR #12481 by @Sanjays2402. Reasoning models (GLM-5.1, QwQ, DeepSeek R1) inflate completion_tokens with internal thinking tokens. The compression trigger summed prompt_tokens + completion_tokens, causing premature compression at ~42% actual context usage instead of the configured 50% threshold. Now uses only prompt_tokens — completion tokens don't consume context window space for the next API call. - 3 new regression tests - Added AUTHOR_MAP entry for @Sanjays2402 Closes #12026	2026-04-20 05:12:10 -07:00
Teknium	f683132c1d	feat(api-server): inline image inputs on /v1/chat/completions and /v1/responses (#12969 ) OpenAI-compatible clients (Open WebUI, LobeChat, etc.) can now send vision requests to the API server. Both endpoints accept the canonical OpenAI multimodal shape: Chat Completions: {type: text\|image_url, image_url: {url, detail?}} Responses: {type: input_text\|input_image, image_url: <str>, detail?} The server validates and converts both into a single internal shape that the existing agent pipeline already handles (Anthropic adapter converts, OpenAI-wire providers pass through). Remote http(s) URLs and data:image/* URLs are supported. Uploaded files (file, input_file, file_id) and non-image data: URLs are rejected with 400 unsupported_content_type. Changes: - gateway/platforms/api_server.py - _normalize_multimodal_content(): validates + normalizes both Chat and Responses content shapes. Returns a plain string for text-only content (preserves prompt-cache behavior on existing callers) or a canonical [{type:text\|image_url,...}] list when images are present. - _content_has_visible_payload(): replaces the bare truthy check so a user turn with only an image no longer rejects as 'No user message'. - _handle_chat_completions and _handle_responses both call the new helper for user/assistant content; system messages continue to flatten to text. - Codex conversation_history, input[], and inline history paths all share the same validator. No duplicated normalizers. - run_agent.py - _summarize_user_message_for_log(): produces a short string summary ('[1 image] describe this') from list content for logging, spinner previews, and trajectory writes. Fixes AttributeError when list user_message hit user_message[:80] + '...' / .replace(). - _chat_content_to_responses_parts(): module-level helper that converts chat-style multimodal content to Responses 'input_text'/'input_image' parts. Used in _chat_messages_to_responses_input for Codex routing. - _preflight_codex_input_items() now validates and passes through list content parts for user/assistant messages instead of stringifying. - tests/gateway/test_api_server_multimodal.py (new, 38 tests) - Unit coverage for _normalize_multimodal_content, including both part formats, data URL gating, and all reject paths. - Real aiohttp HTTP integration on /v1/chat/completions and /v1/responses verifying multimodal payloads reach _run_agent intact. - 400 coverage for file / input_file / non-image data URL. - tests/run_agent/test_run_agent_multimodal_prologue.py (new) - Regression coverage for the prologue no-crash contract. - _chat_content_to_responses_parts round-trip coverage. - website/docs/user-guide/features/api-server.md - Inline image examples for both endpoints. - Updated Limitations: files still unsupported, images now supported. Validated live against openrouter/anthropic/claude-opus-4.6: POST /v1/chat/completions → 200, vision-accurate description POST /v1/responses → 200, same image, clean output_text POST /v1/chat/completions [file] → 400 unsupported_content_type POST /v1/responses [input_file] → 400 unsupported_content_type POST /v1/responses [non-image data URL] → 400 unsupported_content_type Closes #5621, #8253, #4046, #6632. Co-authored-by: Paul Bergeron <paul@gamma.app> Co-authored-by: zhangxicen <zhangxicen@example.com> Co-authored-by: Manuel Schipper <manuelschipper@users.noreply.github.com> Co-authored-by: pradeep7127 <pradeep7127@users.noreply.github.com>	2026-04-20 04:16:13 -07:00
Teknium	4f24db4258	fix(compression): enforce 64k floor on aux model + auto-correct threshold (#12898 ) Context compression silently failed when the auxiliary compression model's context window was smaller than the main model's compression threshold (e.g. GLM-4.5-air at 131k paired with a 150k threshold). The feasibility check warned but the session kept running and compression attempts errored out mid-conversation. Two changes in _check_compression_model_feasibility(): 1. Hard floor: if detected aux context < MINIMUM_CONTEXT_LENGTH (64k), raise ValueError so the session refuses to start. Mirrors the existing main-model rejection at AIAgent.__init__ line 1600. A compression model below 64k cannot summarise a full threshold-sized window. 2. Auto-correct: when aux context is >= 64k but below the computed threshold, lower the live compressor's threshold_tokens to aux_context (and update threshold_percent to match so later update_model() calls stay in sync). Warning reworded to say what was done and how to persist the fix in config.yaml. Only ValueError re-raises; other exceptions in the check remain swallowed as non-fatal.	2026-04-20 00:56:04 -07:00
kshitijk4poor	e485bc60cd	test(kimi): cover api.moonshot.cn direct-call regressions\n\n- add run_agent coverage for the Moonshot China endpoint\n- add sync/async trajectory compressor coverage for api.moonshot.cn	2026-04-20 00:32:06 -07:00
Teknium	65a31ee0d5	fix(anthropic): complete third-party Anthropic-compatible provider support (#12846 ) Third-party gateways that speak the native Anthropic protocol (MiniMax, Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end with the same feature set as direct api.anthropic.com callers. Synthesizes eight stale community PRs into one consolidated change. Five fixes: - URL detection: consolidate three inline `endswith("/anthropic")` checks in runtime_provider.py into the shared _detect_api_mode_for_url helper. Third-party /anthropic endpoints now auto-resolve to api_mode=anthropic_messages via one code path instead of three. - OAuth leak-guard: all five sites that assign `_is_anthropic_oauth` (__init__, switch_model, _try_refresh_anthropic_client_credentials, _swap_credential, _try_activate_fallback) now gate on `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips Claude-Code identity injection on third-party endpoints. Previously only 2 of 5 sites were guarded. - Prompt caching: new method `_anthropic_prompt_cache_policy()` returns `(should_cache, use_native_layout)` per endpoint. Replaces three inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')` call-site flag. Native Anthropic and third-party Anthropic gateways both get the native cache_control layout; OpenRouter gets envelope layout. Layout is persisted in `_primary_runtime` so fallback restoration preserves the per-endpoint choice. - Auxiliary client: `_try_custom_endpoint` honors `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient` instead of silently downgrading to an OpenAI-wire client. Degrades gracefully to OpenAI-wire when the anthropic SDK isn't installed. - Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py) clears stale `api_key`/`api_mode` when switching to a built-in provider, so a previous MiniMax custom endpoint's credentials can't leak into a later OpenRouter session. - Truncation continuation: length-continuation and tool-call-truncation retry now cover `anthropic_messages` in addition to `chat_completions` and `bedrock_converse`. Reuses the existing `_build_assistant_message` path via `normalize_anthropic_response()` so the interim message shape is byte-identical to the non-truncated path. Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent, tests/agent, tests/hermes_cli all pass (4554 passed). Synthesized from (credits preserved via Co-authored-by trailers): #7410 @nocoo — URL detection helper #7393 @keyuyuan — OAuth 5-site guard #7367 @n-WN — OAuth guard (narrower cousin, kept comment) #8636 @sgaofen — caching helper + native-vs-proxy layout split #10954 @Only-Code-A — caching on anthropic_messages+Claude #7648 @zhongyueming1121 — aux client anthropic_messages branch #6096 @hansnow — /model switch clears stale api_mode #9691 @TroyMitchell911 — anthropic_messages truncation continuation Closes: #7366, #8294 (third-party Anthropic identity + caching). Supersedes: #7410, #7367, #7393, #8636, #10954, #7648, #6096, #9691. Rejects: #9621 (OpenAI-wire caching with incomplete blocklist — risky), #7242 (superseded by #9691, stale branch), #8321 (targets smart_model_routing which was removed in #12732). Co-authored-by: nocoo <nocoo@users.noreply.github.com> Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com> Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com> Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com> Co-authored-by: Only-Code-A <bxzt2006@163.com> Co-authored-by: zhongyueming <mygamez@163.com> Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com> Co-authored-by: Troy Mitchell <i@troy-y.org>	2026-04-19 22:43:09 -07:00
Teknium	c9b833feb3	fix(ci): unblock test suite + cut ~2s of dead Z.AI probes from every AIAgent CI on main had 7 failing tests. Five were stale test fixtures; one (agent cache spillover timeout) was covering up a real perf regression in AIAgent construction. The perf bug: every AIAgent.__init__ calls _check_compression_model_feasibility → resolve_provider_client('auto') → _resolve_api_key_provider which iterates PROVIDER_REGISTRY. When it hits 'zai', it unconditionally calls resolve_api_key_provider_credentials → _resolve_zai_base_url → probes 8 Z.AI endpoints with an empty Bearer token (all 401s), ~2s of pure latency per agent, even when the user has never touched Z.AI. Landed in `9e844160` (PR for credential-pool Z.AI auto-detect) — the short-circuit when api_key is empty was missing. _resolve_kimi_base_url had the same shape; fixed too. Test fixes: - tests/gateway/test_voice_command.py: _make_adapter helpers were missing self._voice_locks (added in PR #12644, 7 call sites — all updated). - tests/test_toolsets.py: test_hermes_platforms_share_core_tools asserted equality, but hermes-discord has discord_server (DISCORD_BOT_TOKEN-gated, discord-only by design). Switched to subset check. - tests/run_agent/test_streaming.py: test_tool_name_not_duplicated_when_resent_per_chunk missing api_key/base_url — classic pitfall (PR #11619 fixed 16 of these; this one slipped through on a later commit). - tests/tools/test_discord_tool.py: TestConfigAllowlist caplog assertions fail in parallel runs because AIAgent(quiet_mode=True) globally sets logging.getLogger('tools').setLevel(ERROR) and xdist workers are persistent. Autouse fixture resets the 'tools' and 'tools.discord_tool' levels per test. Validation: tests/cron + voice + agent_cache + streaming + toolsets + command_guards + discord_tool: 550/550 pass tests/hermes_cli + tests/gateway: 5713/5713 pass AIAgent construction without Z.AI creds: 2.2s → 0.24s (9x)	2026-04-19 19:18:19 -07:00
kshitijk4poor	50d6799389	fix: propagate kimi base-url temperature overrides Follow up salvaged PR #12668 by threading base_url through the remaining direct-call sites so kimi-k2.5 uses temperature=1.0 on api.moonshot.ai and keeps 0.6 on api.kimi.com/coding. Add focused regression tests for run_agent, trajectory_compressor, and mini_swe_runner.	2026-04-19 18:54:35 -07:00
Teknium	aa5bd09232	fix(tests): unstick CI — sweep stale tests from recent merges (#12670 ) One source fix (web_server category merge) + five test updates that didn't travel with their feature PRs. All 13 failures on the 04-19 CI run on main are now accounted for (5 already self-healed on main; 8 fixed here). Changes - web_server.py: add code_execution → agent to _CATEGORY_MERGE (new singleton section from #11971 broke no-single-field-category invariant). - test_browser_camofox_state: bump hardcoded _config_version 18 → 19 (also from #11971). - test_registry: add browser_cdp_tool (#12369) and discord_tool (#4753) to the expected built-in tool set. - test_run_agent::test_tool_call_accumulation: rewrite fragment chunks — #0f778f77 switched streaming name-accumulation from += to = to fix MiniMax/NIM duplication; the test still encoded the old fragment-per-chunk premise. - test_concurrent_interrupt::_Stub: no-op _apply_pending_steer_to_tool_results — #12116 added this call after concurrent tool batches; the hand-rolled stub was missing it. - test_codex_cli_model_picker: drop the two obsolete tests that asserted auto-import from ~/.codex/auth.json into the Hermes auth store. #12360 explicitly removed that behavior (refresh-token reuse races with Codex CLI / VS Code); adoption is now explicit via `hermes auth openai-codex`. Remaining 3 tests in the file (normal path, Claude Code fallback, negative case) still cover the picker. Validation - scripts/run_tests.sh across all 6 affected files + surrounding tests (54 tests total) all green locally.	2026-04-19 12:39:58 -07:00
Teknium	d48d6fadff	test(run_agent): pin proxy-env forwarding through keepalive transport Adds a regression guard for the #11277 → proxy-bypass regression fixed in 42b394c3. With HTTPS_PROXY / HTTP_PROXY / ALL_PROXY set, the custom httpx transport used for TCP keepalives must still route requests through an HTTPProxy pool; without proxy env, no HTTPProxy mount should exist. Also maps zrc <zhurongcheng@rcrai.com> → heykb in scripts/release.py AUTHOR_MAP so the salvage PR passes the author-attribution CI check.	2026-04-19 11:44:43 -07:00
Teknium	f1fe29d1c3	feat(providers): extend request_timeout_seconds to all client paths Follow-up on top of mvanhorn's cherry-picked commit. Original PR only wired request_timeout_seconds into the explicit-creds OpenAI branch at run_agent.py init; router-based implicit auth, native Anthropic, and the fallback chain were still hardcoded to SDK defaults. - agent/anthropic_adapter.py: build_anthropic_client() accepts an optional timeout kwarg (default 900s preserved when unset/invalid). - run_agent.py: resolve per-provider/per-model timeout once at init; apply to Anthropic native init + post-refresh rebuild + stale/interrupt rebuilds + switch_model + _restore_primary_runtime + the OpenAI implicit-auth path + _try_activate_fallback (with immediate client rebuild so the first fallback request carries the configured timeout). - tests: cover anthropic adapter kwarg honoring; widen mock signatures to accept the new timeout kwarg. - docs/example: clarify that the knob now applies to every transport, the fallback chain, and rebuilds after credential rotation.	2026-04-19 11:23:00 -07:00
kshitijk4poor	7bd1a3a4b1	test(compression): cover real init feasibility override	2026-04-19 10:40:26 -07:00
kshitijk4poor	045b28733e	fix(compression): resolve missing config attribute in feasibility check Commit `4a9c3565` added a reference to `self.config` in `_check_compression_model_feasibility()` to pass the user-configured `auxiliary.compression.context_length` to `get_model_context_length()`. However, `AIAgent` never stores the loaded config dict as an instance attribute — the config is loaded into a local variable `_agent_cfg` in `__init__()` and discarded after init. This causes an `AttributeError: 'AIAgent' object has no attribute 'config'` on every session start when compression is enabled, caught by the try/except and logged as a non-fatal DEBUG message. Fix: store the loaded config as `self._config` in `__init__()` and update the reference in the feasibility check to use `self._config`.	2026-04-19 10:40:26 -07:00
helix4u	cd59af17cc	fix(agent): silence quiet_mode in python library use	2026-04-19 00:28:25 -07:00
helix4u	7b1a11b971	fix(memory): keep Honcho provider opt-in	2026-04-18 22:50:55 -07:00
Tranquil-Flow	ec48ec5530	fix(agent): strip <think> blocks from stored assistant content Inline reasoning tags in an assistant message's content field leak to every downstream consumer: messaging platforms (#8878, #9568), API replay of prior turns, session transcript, CLI recap, generated session titles, and context compression. _extract_reasoning() already captures the reasoning text into msg['reasoning'] separately, so the raw tags in content are redundant. Stripping once at the storage boundary in _build_assistant_message() cleans the content for every downstream path in one place — no per-platform or per-path stripper needed. Measured impact on a real MiniMax M2.7-highspeed session (per @luoyejiaoe-source, #9306): 55% of assistant messages started with <think> blocks, 51/100 session titles were polluted, 16% content-size reduction. 3 new regression tests in TestBuildAssistantMessage: closed-pair strip with reasoning capture, no-think-tag passthrough, and unterminated-block strip. Resolves #8878 and #9568. Originally proposed as PR #9250.	2026-04-18 19:19:24 -07:00
Teknium	9489d1577d	fix(agent): strip unterminated <think> blocks from visible content Providers served via NIM (MiniMax M2.7, some Moonshot/DeepSeek proxies) sometimes drop the closing </think> tag, leaving raw reasoning in the assistant's content field. _strip_think_blocks()'s closed-pair regex is non-greedy so it only matches complete blocks — any orphan <think>...EOF survived the stripper and leaked to users (#8878, #9568, #10408). Adds an unterminated-tag pass that fires when an open reasoning tag sits at a block boundary (start of text or after a newline) with no matching close. Everything from that tag to end of string is stripped. The block-boundary check mirrors gateway/stream_consumer.py's filter so models that mention <think> in prose are not over-stripped. Also makes the closed-pair regexes consistently case-insensitive so <THINK>...</THINK> and <Thinking>...</Thinking> are handled uniformly — previously the mixed-case open tag would bypass the closed-pair pass and be caught by the unterminated-tag pass, taking trailing visible content with it. 6 new regression tests in TestStripThinkBlocks covering: unterminated <think>, unterminated <thought>, multi-line unterminated, line-start orphan with preserved prefix, prose-mention non-regression, mixed-case closed pairs. The implementation is inspired by @luinbytes's PR #10408 report of the NIM/MiniMax symptom. This commit does not include the 💭/🧠 emoji regexes from that PR — those glyphs are Hermes CLI display decorations, not model content markers.	2026-04-18 19:19:24 -07:00
jarvischer	0f778f7768	fix: prevent tool name duplication in streaming accumulator (MiniMax/NVIDIA NIM) Based on #11984 by @maxchernin. Fixes #8259. Some providers (MiniMax M2.7 via NVIDIA NIM) resend the full function name in every streaming chunk instead of only the first. The old accumulator used += which concatenated them into 'read_fileread_file'. Changed to simple assignment (=), matching the OpenAI Node SDK, LiteLLM, and Vercel AI SDK patterns. Function names are atomic identifiers delivered complete — no provider splits them across chunks, so concatenation was never correct semantics.	2026-04-18 12:50:32 -07:00
Teknium	2edebedc9e	feat(steer): /steer <prompt> injects a mid-run note after the next tool call (#12116 ) * feat(steer): /steer <prompt> injects a mid-run note after the next tool call Adds a new slash command that sits between /queue (turn boundary) and interrupt. /steer <text> stashes the message on the running agent and the agent loop appends it to the LAST tool result's content once the current tool batch finishes. The model sees it as part of the tool output on its next iteration. No interrupt is fired, no new user turn is inserted, and no prompt cache invalidation happens beyond the normal per-turn tool-result churn. Message-role alternation is preserved — we only modify an existing role:"tool" message's content. Wiring ------ - hermes_cli/commands.py: register /steer + add to ACTIVE_SESSION_BYPASS_COMMANDS. - run_agent.py: add _pending_steer state, AIAgent.steer(), _drain_pending_steer(), _apply_pending_steer_to_tool_results(); drain at end of both parallel and sequential tool executors; clear on interrupt; return leftover as result['pending_steer'] if the agent exits before another tool batch. - cli.py: /steer handler — route to agent.steer() when running, fall back to the regular queue otherwise; deliver result['pending_steer'] as next turn. - gateway/run.py: running-agent intercept calls running_agent.steer(); idle-agent path strips the prefix and forwards as a regular user message. - tui_gateway/server.py: new session.steer JSON-RPC method. - ui-tui: SessionSteerResponse type + local /steer slash command that calls session.steer when ui.busy, otherwise enqueues for the next turn. Fallbacks --------- - Agent exits mid-steer → surfaces in run_conversation result as pending_steer so CLI/gateway deliver it as the next user turn instead of silently dropping it. - All tools skipped after interrupt → re-stashes pending_steer for the caller. - No active agent → /steer reduces to sending the text as a normal message. Tests ----- - tests/run_agent/test_steer.py — accept/reject, concatenation, drain, last-tool-result injection, multimodal list content, thread safety, cleared-on-interrupt, registry membership, bypass-set membership. - tests/gateway/test_steer_command.py — running agent, pending sentinel, missing steer() method, rejected payload, empty payload. - tests/gateway/test_command_bypass_active_session.py — /steer bypasses the Level-1 base adapter guard. - tests/test_tui_gateway_server.py — session.steer RPC paths. 72/72 targeted tests pass under scripts/run_tests.sh. * feat(steer): register /steer in Discord's native slash tree Discord's app_commands tree is a curated subset of slash commands (not derived from COMMAND_REGISTRY like Telegram/Slack). /steer already works there as plain text (routes through handle_message → base adapter bypass → runner), but registering it here adds Discord's native autocomplete + argument hint UI so users can discover and type it like any other first-class command.	2026-04-18 04:17:18 -07:00
Teknium	8322b42c6c	fix(streaming): surface dropped tool-call on mid-stream stall (#12072 ) When streaming died after text was already delivered to the user but before a tool-call's arguments finished streaming, the partial-stream stub at the end of _interruptible_streaming_api_call silently set `tool_calls=None` on the returned message and kept `finish_reason=stop`. The agent treated the turn as complete, the session exited cleanly with code 0, and the attempted action was lost with zero user-facing signal. Live-observed Apr 2026 with MiniMax M2.7 on a ~6-minute audit task: agent streamed 'Let me write the audit:', started emitting a write_file tool call, MiniMax stalled for 240s mid-arguments, the stale-stream detector killed the connection, the stub fired, session ended, no file written, no error shown. Fix: the streaming accumulator now records each tool-call's name into `result['partial_tool_names']` as soon as the name is known. When the stub builder fires after a partial delivery and finds any recorded tool names, it appends a human-visible warning to the stub's content — and also fires it as a live stream delta so the user sees it immediately, not only in the persisted transcript. The next turn's model also sees the warning in conversation history and can retry on its own. Text-only partial streams keep the original bare-recovery behaviour (no warning). Validation: \| Scenario \| Before \| After \| \|---------------------------------------------\|---------------------------\|---------------------------------------------\| \| Stream dies mid tool-call, text already sent \| Silent exit, no indication \| User sees ⚠ warning naming the dropped tool \| \| Text-only partial stream \| Bare recovered text \| Unchanged \| \| tests/run_agent/test_streaming.py \| 24 passed \| 26 passed (2 new) \|	2026-04-18 01:52:06 -07:00
Teknium	20f2258f34	fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907 ) * fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace interrupt() previously only flagged the agent's _execution_thread_id. Tools running inside _execute_tool_calls_concurrent execute on ThreadPoolExecutor worker threads whose tids are distinct from the agent's, so is_interrupted() inside those tools returned False no matter how many times the gateway called .interrupt() — hung ssh / curl / long make-builds ran to their own timeout. Changes: - run_agent.py: track concurrent-tool worker tids in a per-agent set, fan interrupt()/clear_interrupt() out to them, and handle the register-after-interrupt race at _run_tool entry. getattr fallback for the tracker so test stubs built via object.__new__ keep working. - tools/environments/base.py: opt-in _wait_for_process trace (ENTER, per-30s HEARTBEAT with interrupt+activity-cb state, INTERRUPT DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1. - tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target tid, set snapshot) behind the same env flag. - tests: new regression test runs a polling tool on a concurrent worker and asserts is_interrupted() flips to True within ~1s of interrupt(). Second new test guards clear_interrupt() clearing tracked worker bits. Validation: tests/run_agent/ all 762 pass; tests/tools/ interrupt+env subset 216 pass. * fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when quiet_mode=True (the CLI default). This would silently swallow every INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation added in the parent commit — confirmed by running hermes chat -q with the flag and finding zero trace lines in agent.log even though _wait_for_process was clearly executing (subprocess pid existed). Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets its own logger level to INFO at import time, overriding the 'tools' parent-level filter. Scoped to the opt-in case only, so production (quiet_mode default) logs stay quiet as designed. Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes '_wait_for_process ENTER/EXIT' lines to agent.log as expected. * fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses Tool subprocesses spawned by the local environment backend use os.setsid so they run in their own process group. Before this fix, SIGTERM/SIGHUP to the hermes CLI killed the main thread via KeyboardInterrupt but the worker thread running _wait_for_process never got a chance to call _kill_process — Python exited, the child was reparented to init (PPID=1), and the subprocess ran to its natural end (confirmed live: sleep 300 survived 4+ min after SIGTERM to the agent until manual cleanup). Changes: - cli.py _signal_handler (interactive) + _signal_handler_q (-q mode): route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll loop sees the per-thread interrupt flag and calls _kill_process (os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default 1.5s) gives the worker time to complete its SIGTERM+SIGKILL escalation before KeyboardInterrupt unwinds main. - tools/environments/base.py _wait_for_process: wrap the poll loop in try/except (KeyboardInterrupt, SystemExit) so the cleanup fires even on paths the signal handlers don't cover (direct sys.exit, unhandled KI from nested code, etc.). Emits EXCEPTION_EXIT trace line when HERMES_DEBUG_INTERRUPT=1. - New regression test: injects KeyboardInterrupt into a running _wait_for_process via PyThreadState_SetAsyncExc, verifies the subprocess process group is dead within 3s of the exception and that KeyboardInterrupt re-raises cleanly afterward. Validation: \| Before \| After \| \|---------------------------------------------------------\|--------------------\| \| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM \| dies within 2 s \| \| No INTERRUPT DETECTED in trace \| INTERRUPT DETECTED fires + killing process group \| \| tests/tools/test_local_interrupt_cleanup \| 1/1 pass \| \| tests/run_agent/test_concurrent_interrupt \| 4/4 pass \|	2026-04-17 20:39:25 -07:00
helix4u	016ae5c334	fix(kimi): force 0.6 on main chat path	2026-04-17 18:47:01 -07:00
Teknium	3207b9bda0	test: speed up slow tests (backoff + subprocess + IMDS network) (#11797 ) Cuts shard-3 local runtime in half by neutralizing real wall-clock waits across three classes of slow test: ## 1. Retry backoff mocks - tests/run_agent/conftest.py (NEW): autouse fixture mocks jittered_backoff to 0.0 so the `while time.time() < sleep_end` busy-loop exits immediately. No global time.sleep mock (would break threading tests). - test_anthropic_error_handling, test_413_compression, test_run_agent_codex_responses, test_fallback_model: per-file fixtures mock time.sleep / asyncio.sleep for retry / compression paths. - test_retaindb_plugin: cap the retaindb module's bound time.sleep to 0.05s via a per-test shim (background writer-thread retries sleep 2s after errors; tests don't care about exact duration). Plus replace arbitrary time.sleep(N) waits with short polling loops bounded by deadline. ## 2. Subprocess sleeps in production code - test_update_gateway_restart: mock time.sleep. Production code does time.sleep(3) after `systemctl restart` to verify the service survived. Tests mock subprocess.run \u2014 nothing actually restarts \u2014 so the wait is dead time. ## 3. Network / IMDS timeouts (biggest single win) - tests/conftest.py: add AWS_EC2_METADATA_DISABLED=true plus AWS_METADATA_SERVICE_TIMEOUT=1 and ATTEMPTS=1. boto3 falls back to IMDS (169.254.169.254) when no AWS creds are set. Any test hitting has_aws_credentials() / resolve_aws_auth_env_var() (e.g. test_status, test_setup_copilot_acp, anything that touches provider auto-detect) burned ~2-4s waiting for that to time out. - test_exit_cleanup_interrupt: explicitly mock resolve_runtime_provider which was doing real network auto-detect (~4s). Tests don't care about provider resolution \u2014 the agent is already mocked. - test_timezone: collapse the 3-test "TZ env in subprocess" suite into 2 tests by checking both injection AND no-leak in the same subprocess spawn (was 3 \u00d7 3.2s, now 2 \u00d7 4s). ## Validation \| Test \| Before \| After \| \|---\|---\|---\| \| test_anthropic_error_handling (8 tests) \| ~80s \| ~15s \| \| test_413_compression (14 tests) \| ~18s \| 2.3s \| \| test_retaindb_plugin (67 tests) \| ~13s \| 1.3s \| \| test_status_includes_tavily_key \| 4.0s \| 0.05s \| \| test_setup_copilot_acp_skips_same_provider_pool_step \| 8.0s \| 0.26s \| \| test_update_gateway_restart (5 tests) \| ~18s total \| ~0.35s total \| \| test_exit_cleanup_interrupt (2 tests) \| 8s \| 1.5s \| \| Matrix shard 3 local \| 108s \| 50s \| No behavioral contract changed \u2014 tests still verify retry happens, service restart logic runs, etc.; they just don't burn real seconds waiting for it. Supersedes PR #11779 (those changes are included here).	2026-04-17 14:21:22 -07:00
Teknium	d0e1388ca9	fix(tests): make AIAgent constructor calls self-contained (#11755 ) * fix(tests): make AIAgent constructor calls self-contained (no env leakage) Tests in tests/run_agent/ were constructing AIAgent() without passing both api_key and base_url, then relying on leaked state from other tests in the same xdist worker (or process-level env vars) to keep provider resolution happy. Under hermetic conftest + pytest-split, that state is gone and the tests fail with 'No LLM provider configured'. Fix: pass both api_key and base_url explicitly on 47 AIAgent() construction sites across 13 files. AIAgent.__init__ with both set takes the direct-construction path (line 960 in run_agent.py) and skips the resolver entirely. One call site (test_none_base_url_passed_as_none) left alone — that test asserts behavior for base_url=None specifically. This is a prerequisite for any future matrix-split or stricter isolation work, and lands cleanly on its own. Validation: - tests/run_agent/ full: 760 passed, 0 failed (local) - Previously relied on cross-test pollution; now self-contained * fix(tests): update opencode-go model order assertion to match kimi-k2.5-first commit `78a74bb` promoted kimi-k2.5 to first position in model suggestion lists but didn't update this test, which has been failing on main since. Reorder expected list to match the new canonical order.	2026-04-17 12:32:03 -07:00
Teknium	24fa055763	fix(ci): resolve 4 pre-existing main failures (docs lint + 3 stale tests) (#11373 ) * docs: fix ascii-guard border alignment errors Three docs pages had ASCII diagram boxes with off-by-one column alignment issues that failed docs-site-checks CI: - architecture.md: outer box is 71 cols but inner-box content lines and border corners were offset by 1 col, making content-line right border at col 70/72 while top/bottom border was at col 71. Inner boxes also had border corners at cols 19/36/53 but content pipes at cols 20/37/54. Rewrote the diagram with consistent 71-col width throughout, aligned inner boxes at cols 4-19, 22-37, 40-55 with 2-space gaps and 15-space trailing padding. - gateway-internals.md: same class of issue — outer box at 51 cols, inner content lines varied 52-54 cols. Rewrote with consistent 51-col width, inner boxes at cols 4-15, 18-29, 32-43. Also restructured the bottom-half message flow so it's bare text (not half-open box cells) matching the intent of the original. - agent-loop.md line 112-114: box 2 (API thread) content lines had one extra space pushing the right border to col 46 while the top and bottom borders of that box sat at col 45. Trimmed one trailing space from each of the three content lines. All 123 docs files now pass `npm run lint:diagrams`: ✓ Errors: 0 (warnings: 6, non-fatal) Pre-existing failures on main — unrelated to any open PR. * test(setup): accept description kwarg in prompt_choice mock lambdas setup.py's `_curses_prompt_choice` gained an optional `description` parameter (used for rendering context hints alongside the prompt). `prompt_choice` forwards it via keyword arg. The two existing tests mocked `_curses_prompt_choice` with lambdas that didn't accept the new kwarg, so the forwarded call raised TypeError. Fix: add `description=None` to both mock lambda signatures so they absorb the new kwarg without changing behavior. * test(matrix): update stale audio-caching assertion test_regular_audio_has_http_url asserted that non-voice audio messages keep their HTTP URL and are NOT downloaded/cached. That was true when the caching code only triggered on `is_voice_message`. Since `bec02f37` (encrypted-media caching refactor), matrix.py caches all media locally — photos, audio, video, documents — so downstream tools can read them as real files via media_urls. This applies to regular audio too. Renamed the test to `test_regular_audio_is_cached_locally`, flipped the assertions accordingly, and documented the intentional behavior change in the docstring. Other tests in the file (voice-specific caching, message-type detection, reply-to threading) continue to pass. * test(413): allow multi-pass preflight compression run_agent.py's preflight compression runs up to 3 passes in a loop for very large sessions (each pass summarizes the middle N turns, then re-checks tokens). The loop breaks when a pass returns a message list no shorter than its input (can't compress further). test_preflight_compresses_oversized_history used a static mock return value that returned the same 2 messages regardless of input, so the loop ran pass 1 (41 -> 2) and pass 2 (2 -> 2 -> break), making call_count == 2. The assert_called_once() assertion was strictly wrong under the multi-pass design. The invariant the test actually cares about is: preflight ran, and its first invocation received the full oversized history. Replaced the count assertion with those two invariants. * docs: drop '...' from gateway diagram, merge side-by-side boxes ascii-guard 2.3.0 flagged two remaining issues after the initial fix pass: 1. gateway-internals.md L33: the '...' suffix after inner box 3's right border got parsed as 'extra characters after inner-box right border'. Dropped the '...' — the surrounding prose already conveys 'and more platforms' without needing the visual hint. 2. agent-loop.md: ascii-guard can't cleanly parse two side-by-side boxes of different heights (main thread 7 rows, API thread 5 rows). Even equalizing heights didn't help — the linter treats the left box's right border as the end of the diagram. Merged into a single 54-char-wide outer box with both threads labeled as regions inside, keeping the ▶ arrow to preserve the main→API flow direction.	2026-04-16 20:43:41 -07:00
Teknium	5797728ca6	test: regression guards for the keepalive/transport bug class (#10933 ) (#11266 ) Two new tests in tests/run_agent/ that pin the user-visible invariant behind AlexKucera's Discord report (2026-04-16): no matter how a future keepalive / transport fix for #10324 plumbs sockets in, sequential chats on the same AIAgent instance must all succeed. test_create_openai_client_reuse.py (no network, runs in CI): - test_second_create_does_not_wrap_closed_transport_from_first back-to-back _create_openai_client calls must not hand the same http_client (after an SDK close) to the second construction - test_replace_primary_openai_client_survives_repeated_rebuilds three sequential rebuilds via the real _replace_primary_openai_client entrypoint must each install a live client test_sequential_chats_live.py (opt-in, HERMES_LIVE_TESTS=1): - test_three_sequential_chats_across_client_rebuild real OpenRouter round trips, with an explicit _replace_primary_openai_client call between turns 2 and 3. Error-sentinel detector treats 'API call failed after 3 retries' replies as failures instead of letting them pass the naive truthy check (which is how a first draft of this test missed the bug it was meant to catch). Validation: clean main (post-revert, defensive copy present) -> all 4 tests PASS broken #10933 state (keepalive injection, no defensive copy) -> all 4 tests FAIL with precise messages pointing at #10933 Companion to taeuk178's test_create_openai_client_kwargs_isolation.py, which pins the syntactic 'don't mutate input dict' half of the same contract. Together they catch both the specific mechanism of #10933 and any other reimplementation that breaks the sequential-call invariant.	2026-04-16 16:36:33 -07:00
taeuk178	896e7b03e8	fix(run_agent): prevent _create_openai_client from mutating caller kwargs Shallow-copy client_kwargs at the top of _create_openai_client() to prevent in-place mutation from leaking back into self._client_kwargs. Defensive fix that locks the contract for future httpx/transport work. Cherry-picked from #10978 by @taeuk178.	2026-04-16 07:45:22 -07:00
Teknium	77bdad5b02	fix(tests): resolve 12 CI failures + 10 errors across 6 root causes (#11040 ) Group A (3 tests): 'No LLM provider configured' RuntimeError - test_user_message_surrogates_sanitized, test_counters_initialized_in_init, test_openai_prompt_tokens_unchanged - Root cause: AIAgent.__init__ now requires base_url alongside api_key to skip resolve_provider_client() (which returns None when API keys are blanked in CI). Added base_url='http://localhost:1234/v1' to test agent construction. Group B (5 tests): Discord slash command auto-registration - test_auto_registers_missing_gateway_commands, test_auto_registered_command_, test_register_skill_group_ - Root cause: xdist workers that loaded a discord mock WITHOUT app_commands.Command/Group caused _register_slash_commands() to fail silently. Added comprehensive shared discord mock in tests/gateway/conftest.py (same pattern as existing telegram mock). Group C (5 errors): Discord reply mode 'NoneType has no DMChannel' - All TestReplyToText tests - Root cause: FakeDMChannel was not a subclass of real discord.DMChannel, so isinstance() checks in _handle_message failed when running in full suite (real discord installed). Made FakeDMChannel inherit from discord.DMChannel when available. Removed fragile monkeypatch approach. Group D (2 tests): detect_provider_for_model wrong provider - test_openrouter_slug_match (got 'ai-gateway'), test_bare_name_gets_ openrouter_slug (got 'copilot') - Root cause: ai-gateway, copilot, and kilocode are multi-vendor aggregators that list other providers' models (OpenRouter-style slugs). They were being matched in Step 1 before OpenRouter. Added all three to _AGGREGATORS set so they're skipped like nous/openrouter. Group E (1 test): model_flow_custom StopIteration - test_model_flow_custom_saves_verified_v1_base_url - Root cause: 'Display name' prompt was added after the test was written. The input iterator had 5 answers but the flow now asks 6 questions. Added 6th empty string answer. Group F (1 test): Telegram proxy env assertion - test_uses_proxy_env_for_primary_and_fallback_transports - Root cause: _resolve_proxy_url() now checks TELEGRAM_PROXY first (via resolve_proxy_url('TELEGRAM_PROXY')). Test didn't clear this env var, allowing potential leakage from other tests in xdist workers. Added TELEGRAM_PROXY to the cleanup list.	2026-04-16 06:49:36 -07:00
Teknium	fe12042e50	fix: remove context pressure warnings entirely (#11039 ) The gateway compression notifications were already removed in commit `cc63b2d1` (PR #4139), but the agent-level context pressure warnings (85%/95% tiered alerts via _emit_context_pressure) were still firing on both CLI and gateway. Removed: - _emit_context_pressure method and all call sites in run_conversation() - Class-level dedup state (_context_pressure_last_warned, _CONTEXT_PRESSURE_COOLDOWN) - Instance attribute _context_pressure_warned_at - Pressure reset logic in _compress_context - format_context_pressure and format_context_pressure_gateway from agent/display.py - Orphaned ANSI constants that only served these functions - tests/run_agent/test_context_pressure.py (all 361 lines) Compression itself continues to run silently in the background. Closes #3784	2026-04-16 06:44:23 -07:00
Teknium	333cb8251b	fix: improve interrupt responsiveness during concurrent tool execution and follow-up turns (#10935 ) Three targeted fixes for the 'agent stuck on terminal command' report: 1. Concurrent tool wait loop now checks interrupts (run_agent.py) The sequential path checked _interrupt_requested before each tool call, but the concurrent path's wait loop just blocked with 30s timeouts. Now polls every 5s and cancels pending futures on interrupt, giving already-running tools 3s to notice the per-thread interrupt signal. 2. Cancelled concurrent tools get proper interrupt messages (run_agent.py) When a concurrent tool is cancelled or didn't return a result due to interrupt, the tool result message says 'skipped due to user interrupt' instead of a generic error. 3. Typing indicator fires before follow-up turn (gateway/run.py) After an interrupt is acknowledged and the pending message dequeued, the gateway now sends a typing indicator before starting the recursive _run_agent call. This gives the user immediate visual feedback that the system is processing their new message (closing the perceived 'dead air' gap between the interrupt ack and the response). Reported by @_SushantSays.	2026-04-16 02:44:56 -07:00
Teknium	5c397876b9	fix(cli): hint about /v1 suffix when configuring local model endpoints When a user enters a local model server URL (Ollama, vLLM, llama.cpp) without a /v1 suffix during 'hermes model' custom endpoint setup, prompt them to add it. Most OpenAI-compatible local servers require /v1 in the base URL for chat completions to work.	2026-04-16 02:22:09 -07:00
Mibayy	3522a7aa13	feat(ollama): pass think=false to custom providers when reasoning_effort is none When a custom/Ollama provider is used and reasoning_effort is set to 'none' (or enabled: false), inject 'think': false into the request extra_body. Ollama does not recognise the OpenRouter-style 'reasoning' extra_body field, so thinking-capable models (Qwen3, etc.) generate <think> blocks regardless of the reasoning_effort setting. This produces empty-response errors that corrupt session state. The fix adds a provider-specific block in _build_api_kwargs() that sets think=false in extra_body whenever self.provider == 'custom' and reasoning is explicitly disabled. Closes #3191	2026-04-16 02:22:09 -07:00
LeonSGP43	8011aa31ba	fix(agent): continue ollama glm truncation replies	2026-04-16 02:22:09 -07:00
kshitijk4poor	ff5bf0d6c8	fix(tests): resolve CI test failures — pool auto-seeding, stale assertions, mock isolation Salvaged from PR #10643 by kshitijk4poor, updated for current main. Root causes fixed: 1. Telegram xdist mock pollution — new tests/gateway/conftest.py with shared mock that runs at collection time (prevents ChatType=None caching) 2. VIRTUAL_ENV env var leak — monkeypatch.delenv in _detect_venv_dir tests 3. Copilot base_url missing — add fallback in _resolve_runtime_from_pool_entry 4. Stale vision model assertion — zai now uses glm-5v-turbo 5. Reasoning item id intentionally stripped — assert 'id' not in (store=False) 6. Context length warning unreachable — pass base_url to AIAgent in test 7. Kimi provider label updated — 'Kimi / Kimi Coding Plan' matches models.py 8. Google Workspace calendar tests — rewritten for current production code, properly mock subprocess on api_module, removed stale +agenda assertions 9. Credential pool auto-seeding — mock _select_pool_entry / _resolve_auto / _import_codex_cli_tokens to prevent real credentials from leaking into tests	2026-04-15 22:05:21 -07:00
Teknium	cc6e8941db	feat(honcho): context injection overhaul, 5-tool surface, cost safety, session isolation (#10619 ) Salvaged from PR #9884 by erosika. Cherry-picked plugin changes onto current main with minimal core modifications. Plugin changes (plugins/memory/honcho/): - New honcho_reasoning tool (5th tool, splits LLM calls from honcho_context) - Two-layer context injection: base context (summary + representation + card) on contextCadence, dialectic supplement on dialecticCadence - Multi-pass dialectic depth (1-3 passes) with early bail-out on strong signal - Cold/warm prompt selection based on session state - dialecticCadence defaults to 3 (was 1) — ~66% fewer Honcho LLM calls - Session summary injection for conversational continuity - Bidirectional peer targeting on all 5 tools - Correctness fixes: peer param fallback, None guard on set_peer_card, schema validation, signal_sufficient anchored regex, mid->medium level fix Core changes (~20 lines across 3 files): - agent/memory_manager.py: Enhanced sanitize_context() to strip full <memory-context> blocks and system notes (prevents leak from saveMessages) - run_agent.py: gateway_session_key param for stable per-chat Honcho sessions, on_turn_start() call before prefetch_all() for cadence tracking, sanitize_context() on user messages to strip leaked memory blocks - gateway/run.py: skip_memory=True on 2 temp agents (prevents orphan sessions), gateway_session_key threading to main agent Tests: 509 passed (3 skipped — honcho SDK not installed locally) Docs: Updated honcho.md, memory-providers.md, tools-reference.md, SKILL.md Co-authored-by: erosika <erosika@users.noreply.github.com>	2026-04-15 19:12:19 -07:00
helix4u	96cc556055	fix(copilot): preserve base URL and gpt-5-mini routing	2026-04-15 15:04:14 -07:00
Teknium	93b6f45224	fix: always retry on ASCII codec UnicodeEncodeError — don't gate on per-component sanitization The recovery block previously only retried (continue) when one of the per-component sanitization checks (messages, tools, system prompt, headers, credentials) found and stripped non-ASCII content. When the non-ASCII lived only in api_messages' reasoning_content field (which is built from messages['reasoning'] and not checked by the original _sanitize_messages_non_ascii), all checks returned False and the recovery fell through to the normal error path — burning a retry attempt despite _force_ascii_payload being set. Now the recovery always continues (retries) when _is_ascii_codec is detected. The _force_ascii_payload flag guarantees the next iteration runs _sanitize_structure_non_ascii(api_kwargs) on the full API payload, catching any remaining non-ASCII regardless of where it lives. Also adds test for the 'reasoning' field on canonical messages. Fixes #6843	2026-04-15 15:03:28 -07:00
MestreY0d4-Uninter	efd1ddc6e1	fix: sanitize api_messages and extra string fields during ASCII-codec recovery (#6843 ) The ASCII-locale recovery path in run_agent.py sanitized the canonical 'messages' list but left 'api_messages' untouched. api_messages is a separate API-copy built before the retry loop and may carry extra fields (reasoning_content, extra_body entries) that are not present in 'messages'. This caused the retry to still raise UnicodeEncodeError even after the 'System encoding is ASCII — stripped...' log line appeared. Two changes: - _sanitize_messages_non_ascii now walks all extra top-level string fields in each message dict (any key not in {content, name, tool_calls, role}) so reasoning_content and future extras are cleaned in both 'messages' and 'api_messages'. - The ASCII-codec recovery block now also calls sanitize on api_messages and api_kwargs so no non-ASCII survives into the next retry attempt. Adds regression tests covering: - reasoning_content with non-ASCII in api_messages - extra_body with non-ASCII in api_kwargs - canonical messages clean but api_messages dirty Fixes #6843	2026-04-15 15:03:28 -07:00
helix4u	93f6f66872	fix(interrupt): preserve pre-start terminal interrupts	2026-04-15 13:29:57 -07:00
Teknium	6391b46779	fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways (#10200 ) (#10470 ) The _client_cache used event loop id() as part of the cache key, so every new worker-thread event loop created a new entry for the same provider config. In long-running gateways where threads are recycled frequently, this caused unbounded cache growth — each stale entry held an unclosed AsyncOpenAI client with its httpx connection pool, eventually exhausting file descriptors. Fix: remove loop_id from the cache key and instead validate on each async cache hit that the cached loop is the current, open loop. If the loop changed or was closed, the stale entry is replaced in-place rather than creating an additional entry. This bounds cache growth to at most one entry per unique provider config. Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO eviction as defense-in-depth against any remaining unbounded growth. Cross-loop safety is preserved: different event loops still get different client instances (validated by existing test suite). Closes #10200	2026-04-15 13:16:28 -07:00
Teknium	a4e1842f12	fix: strip reasoning item IDs from Responses API input when store=False (#10217 ) With store=False (our default for the Responses API), the API does not persist response items. When reasoning items with 'id' fields were replayed on subsequent turns, the API attempted a server-side lookup for those IDs and returned 404: Item with id 'rs_...' not found. Items are not persisted when store is set to false. The encrypted_content blob is self-contained for reasoning chain continuity — the id field is unnecessary and triggers the failed lookup. Fix: strip 'id' from reasoning items in both _chat_messages_to_responses_input (message conversion) and _preflight_codex_input_items (normalization layer). The id is still used for local deduplication but never sent to the API. Reported by @zuogl448 on GPT-5.4.	2026-04-15 03:19:43 -07:00
Teknium	5d5d21556e	fix: sync client.api_key during UnicodeEncodeError ASCII recovery (#10090 ) The existing recovery block sanitized self.api_key and self._client_kwargs['api_key'] but did not update self.client.api_key. The OpenAI SDK stores its own copy of api_key and reads it dynamically via the auth_headers property on every request. Without this fix, the retry after sanitization would still send the corrupted key in the Authorization header, causing the same UnicodeEncodeError. The bug manifests when an API key contains Unicode lookalike characters (e.g. ʋ U+028B instead of v) from copy-pasting out of PDFs, rich-text editors, or web pages with decorative fonts. httpx hard-encodes all HTTP headers as ASCII, so the non-ASCII char in the Authorization header triggers the error. Adds TestApiKeyClientSync with two tests verifying: - All three key locations are synced after sanitization - Recovery handles client=None (pre-init) without crashing	2026-04-14 22:37:45 -07:00
Teknium	93fe4ead83	fix: warn on invalid context_length format in config.yaml (#10067 ) Previously, non-integer context_length values (e.g. '256K') in config.yaml were silently ignored, causing the agent to fall back to 128K auto-detection with no user feedback. This was confusing for users with custom LiteLLM endpoints expecting larger context. Now prints a clear stderr warning and logs at WARNING level when model.context_length or custom_providers[].models.<model>.context_length cannot be parsed as an integer, telling users to use plain integers (e.g. 256000 instead of '256K'). Reported by community user ChFarhan via Discord.	2026-04-14 22:14:27 -07:00
Teknium	c5688e7c8b	fix(gateway): break compression-exhaustion infinite loop and auto-reset session (#9893 ) When compression fails after max attempts, the agent returns {completed: False, partial: True} but was missing the 'failed' flag. The gateway's agent_failed_early guard checked for 'failed' AND 'not final_response', but _run_agent_blocking always converts errors to final_response — making the guard dead code. This caused the oversized session to persist, creating an infinite fail loop where every subsequent message hits the same compression failure. Changes: - run_agent.py: add 'failed: True' and 'compression_exhausted: True' to all 5 compression-exhaustion return paths - gateway/run.py (_run_agent_blocking): forward 'failed' and 'compression_exhausted' flags through to the caller - gateway/run.py (_handle_message_with_agent): fix agent_failed_early to check bool(failed) without the broken 'not final_response' clause; auto-reset the session when compression is exhausted so the next message starts fresh - Update tests to match new guard logic and add TestCompressionExhaustedFlag test class Closes #9893	2026-04-14 21:18:17 -07:00
Teknium	da528a8207	fix: detect and strip non-ASCII characters from API keys (#6843 ) API keys containing Unicode lookalike characters (e.g. ʋ U+028B instead of v) cause UnicodeEncodeError when httpx encodes the Authorization header as ASCII. This commonly happens when users copy-paste keys from PDFs, rich-text editors, or web pages with decorative fonts. Three layers of defense: 1. Save-time validation (hermes_cli/config.py): _check_non_ascii_credential() strips non-ASCII from credential values when saving to .env, with a clear warning explaining the issue. 2. Load-time sanitization (hermes_cli/env_loader.py): _sanitize_loaded_credentials() strips non-ASCII from credential env vars (those ending in _API_KEY, _TOKEN, _SECRET, _KEY) after dotenv loads them, so the rest of the codebase never sees non-ASCII keys. 3. Runtime recovery (run_agent.py): The UnicodeEncodeError recovery block now also sanitizes self.api_key and self._client_kwargs['api_key'], fixing the gap where message/tool sanitization succeeded but the API key still caused httpx to fail on the Authorization header. Also: hermes_logging.py RotatingFileHandler now explicitly sets encoding='utf-8' instead of relying on locale default (defensive hardening for ASCII-locale systems).	2026-04-14 20:20:31 -07:00
Teknium	19199cd38d	fix: clamp 'minimal' reasoning effort to 'low' on Responses API (#9429 ) GPT-5.4 supports none/low/medium/high/xhigh but not 'minimal'. Users may configure 'minimal' via OpenRouter conventions, which would cause a 400 on native OpenAI. Clamp to 'low' in the codex_responses path before sending.	2026-04-13 23:11:13 -07:00
Gianfranco Piana	eabc0a2f66	feat(plugins): let pre_tool_call hooks block tool execution Plugins can now return {"action": "block", "message": "reason"} from their pre_tool_call hook to prevent a tool from executing. The error message is returned to the model as a tool result so it can adjust. Covers both execution paths: handle_function_call (model_tools.py) and agent-level tools (run_agent.py _invoke_tool + sequential/concurrent). Blocked tools skip all side effects (counter resets, checkpoints, callbacks, read-loop tracker). Adds skip_pre_tool_call_hook flag to avoid double-firing the hook when run_agent.py already checked and then calls handle_function_call. Salvaged from PR #5385 (gianfrancopiana) and PR #4610 (oredsecurity).	2026-04-13 22:01:49 -07:00
helix4u	8680f61f8b	fix(copilot-acp): keep acp runtime off responses path	2026-04-13 16:17:43 -07:00
Teknium	063244bb16	test: add coverage for plugin context engine init (#9071 ) Verify that plugin context engines receive update_model() with correct context_length during AIAgent init — regression test for the ctx -- bug.	2026-04-13 15:00:57 -07:00
yongtenglei	2773b18b56	fix(run_agent): refresh activity during streaming responses Previously, long-running streamed responses could be incorrectly treated as idle by the gateway/cron inactivity timeout even while tokens were actively arriving. The _touch_activity() call (which feeds get_activity_summary() polled by the external timeout) was either called only on the first chunk (chat completions) or not at all (Anthropic, Codex, Codex fallback). Add _touch_activity() on every chunk/event in all four streaming paths so the inactivity monitor knows data is still flowing. Fixes #8760	2026-04-13 10:55:51 -07:00
Teknium	0dd26c9495	fix(tests): fix 78 CI test failures and remove dead test (#9036 ) Production fixes: - voice_mode.py: add is_recording property to AudioRecorder (parity with TermuxAudioRecorder) - cronjob_tools.py: add sms example to deliver description Test fixes: - test_real_interrupt_subagent: add missing _execution_thread_id (fixes 19 cascading failures from leaked _build_system_prompt patch) - test_anthropic_error_handling: add _FakeMessages, override _interruptible_streaming_api_call (6 fixes) - test_ctx_halving_fix: add missing request_overrides attribute (4 fixes) - test_context_token_tracking: set _disable_streaming=True for non-streaming test path (4 fixes) - test_dict_tool_call_args: set _disable_streaming=True (1 fix) - test_provider_parity: add model='gpt-4o' for AIGateway tests to meet 64K minimum context (4 fixes) - test_session_race_guard: add user_id to SessionSource (5 fixes) - test_restart_drain/helpers: add user_id to SessionSource (2 fixes) - test_telegram_photo_interrupts: add user_id to SessionSource - test_interrupt: target thread_id for per-thread interrupt system (2 fixes) - test_zombie_process_cleanup: rewrite with object.__new__ for refactored GatewayRunner.stop() (1 fix) - test_browser_camofox_state: update config version 15->17 (1 fix) - test_trajectory_compressor_async: widen lookback window 10->20 for line-shifted AsyncOpenAI (1 fix) - test_voice_mode: fixed by production is_recording addition (5 fixes) - test_voice_cli_integration: add _attached_images to CLI stub (2 fixes) - test_hermes_logging: explicit propagation/level reset for cross-test pollution defense (1 fix) - test_run_agent: add base_url for OpenRouter detection tests (2 fixes) Deleted: - test_inline_think_blocks_reasoning_only_accepted: tested unimplemented inline <think> handling	2026-04-13 10:50:24 -07:00
kimsr96	b909a9efef	fix: extend ASCII-locale UnicodeEncodeError recovery to full request payload The existing ASCII codec handler only sanitized conversation messages, leaving tool schemas, system prompts, ephemeral prompts, prefill messages, and HTTP headers as unhandled sources of non-ASCII content. On systems with LANG=C or non-UTF-8 locale, Unicode symbols in tool descriptions (e.g. arrows, em-dashes from prompt_builder) and system prompt content would cause UnicodeEncodeError that fell through to the error path. Changes: - Add _sanitize_structure_non_ascii() generic recursive walker for nested dict/list payloads - Add _sanitize_tools_non_ascii() thin wrapper for tool schemas - Add _force_ascii_payload flag: once ASCII locale is detected, all subsequent API calls get proactively sanitized (prevents recurring failures from new tool results bringing fresh Unicode each turn) - Extend the ASCII codec error handler to sanitize: prefill_messages, tool schemas (self.tools), system prompt, ephemeral system prompt, and default HTTP headers - Update stale comment that acknowledged the gap Cherry-picked from PR #8834 (credential pool changes dropped as separate concern).	2026-04-13 05:16:35 -07:00
Teknium	a5bd56eae3	fix: eliminate provider hang dead zones in retry/timeout architecture (#8985 ) Three targeted changes to close the gaps between retry layers that caused users to experience 'No response from provider for 580s' and 'No activity for 15 minutes' despite having 5 layers of retry: 1. Remove non-streaming fallback from streaming path Previously, when all 3 stream retries exhausted, the code fell back to _interruptible_api_call() which had no stale detection and no activity tracking — a black hole that could hang for up to 1800s. Now errors propagate to the main retry loop which has richer recovery (credential rotation, provider fallback, backoff). For 'stream not supported' errors, sets _disable_streaming flag so the main retry loop automatically switches to non-streaming on the next attempt. 2. Add _touch_activity to recovery dead zones The gateway inactivity monitor relies on _touch_activity() to know the agent is alive, but activity was never touched during: - Stale stream detection/kill cycles (180-300s gaps) - Stream retry connection rebuilds - Main retry backoff sleeps (up to 120s) - Error recovery classification Now all these paths touch activity every ~30s, keeping the gateway informed during recovery cycles. 3. Add stale-call detector to non-streaming path _interruptible_api_call() now has the same stale detection pattern as the streaming path: kills hung connections after 300s (default, configurable via HERMES_API_CALL_STALE_TIMEOUT), scaled for large contexts (450s for 50K+ tokens, 600s for 100K+ tokens), disabled for local providers. Also touches activity every ~30s during the wait so the gateway monitor stays informed. Env vars: - HERMES_API_CALL_STALE_TIMEOUT: non-streaming stale timeout (default 300s) - HERMES_STREAM_STALE_TIMEOUT: unchanged (default 180s) Before: worst case ~2+ hours of sequential retries with no feedback After: worst case bounded by gateway inactivity timeout (default 1800s) with continuous activity reporting	2026-04-13 04:55:20 -07:00
Teknium	397eae5d93	fix: recover partial streamed content on connection failure When streaming fails after partial content delivery (e.g. OpenRouter timeout kills connection mid-response), the stub response now carries the accumulated streamed text instead of content=None. Two fixes: 1. The partial-stream stub response includes recovered content from _current_streamed_assistant_text — the text that was already delivered to the user via stream callbacks before the connection died. 2. The empty response recovery chain now checks for partial stream content BEFORE falling back to _last_content_with_tools (prior turn content) or wasting API calls on retries. This prevents: - Showing wrong content from a prior turn - Burning 3+ unnecessary retry API calls - Falling through to '(empty)' when the user already saw content The root cause: OpenRouter has a ~125s inactivity timeout. When Anthropic's SSE stream goes silent during extended reasoning, the proxy kills the connection. The model's text was already partially streamed but the stub discarded it, triggering the empty recovery chain which would show stale prior-turn content or waste retries.	2026-04-13 02:12:01 -07:00
Teknium	bc4e2744c3	test: add tests for compression config_context_length passthrough - Test that auxiliary.compression.context_length from config is forwarded to get_model_context_length (positive case) - Test that invalid/non-integer config values are silently ignored - Fix _make_agent() to set config=None (cherry-picked code reads self.config)	2026-04-12 17:52:34 -07:00
Teknium	d6785dc4d4	fix: empty response recovery for reasoning models (mimo, qwen, GLM) (#8609 ) Three fixes for the (empty) response bug affecting open reasoning models: 1. Allow retries after prefill exhaustion — models like mimo-v2-pro always populate reasoning fields via OpenRouter, so the old 'not _has_structured' guard on the retry path blocked retries for EVERY reasoning model after the 2 prefill attempts. Now: 2 prefills + 3 retries = 6 total attempts before (empty). 2. Reset prefill/retry counters on tool-call recovery — the counters accumulated across the entire conversation, never resetting during tool-calling turns. A model cycling empty→prefill→tools→empty burned both prefill attempts and the third empty got zero recovery. Now counters reset when prefill succeeds with tool calls. 3. Strip think blocks before _truly_empty check — inline <think> content made the string non-empty, skipping both retry paths. Reported by users on Telegram with xiaomi/mimo-v2-pro and qwen3.5 models. Reproduced: qwen3.5-9b emits tool calls as XML in reasoning field instead of proper function calls, causing content=None + tool_calls=None + reasoning with embedded <tool_call> XML. Prefill recovery works but counter accumulation caused permanent (empty) in long sessions.	2026-04-12 15:38:11 -07:00
Teknium	400fe9b2a1	fix: add <thought> stripping to auxiliary_client + tests auxiliary_client.py had its own regex mirroring _strip_think_blocks but was missing the <thought> variant. Also adds test coverage for <thought> paired and orphaned tags.	2026-04-12 12:44:49 -07:00
Harish Kukreja	b1f13a8c5f	fix(agent): route compression aux through live session runtime	2026-04-12 01:34:52 -07:00
Teknium	39cd57083a	refactor: remove budget warning injection system (dead code) The _get_budget_warning() method already returned None unconditionally — the entire budget warning system was disabled. Remove all dead code: - _BUDGET_WARNING_RE regex - _strip_budget_warnings_from_history() function and its call site - Both injection blocks (concurrent + sequential tool execution) - _get_budget_warning() method - 7 tests for the removed functions The budget exhaustion grace call system (_budget_exhausted_injected, _budget_grace_call) is a separate recovery mechanism and is preserved.	2026-04-11 16:56:33 -07:00
asheriif	97b0cd51ee	feat(gateway): surface natural mid-turn assistant messages in chat platforms Add display.interim_assistant_messages config (enabled by default) that forwards completed assistant commentary between tool calls to the user as separate chat messages. Models already emit useful status text like 'I'll inspect the repo first.' — this surfaces it on Telegram, Discord, and other messaging platforms instead of swallowing it. Independent from tool_progress and gateway streaming. Disabled for webhooks. Uses GatewayStreamConsumer when available, falls back to direct adapter send. Tracks response_previewed to prevent double-delivery when interim message matches the final response. Also fixes: cursor not stripped from fallback prefix in stream consumer (affected continuation calculation on no-edit platforms like Signal). Cherry-picked from PR #7885 by asheriif, default changed to enabled. Fixes #5016	2026-04-11 16:21:39 -07:00
Teknium	c8aff74632	fix: prevent agent from stopping mid-task — compression floor, budget overhaul, activity tracking Three root causes of the 'agent stops mid-task' gateway bug: 1. Compression threshold floor (64K tokens minimum) - The 50% threshold on a 100K-context model fired at 50K tokens, causing premature compression that made models lose track of multi-step plans. Now threshold_tokens = max(50% * context, 64K). - Models with <64K context are rejected at startup with a clear error. 2. Budget warning removal — grace call instead - Removed the 70%/90% iteration budget warnings entirely. These injected '[BUDGET WARNING: Provide your final response NOW]' into tool results, causing models to abandon complex tasks prematurely. - Now: no warnings during normal execution. When the budget is actually exhausted (90/90), inject a user message asking the model to summarise, allow one grace API call, and only then fall back to _handle_max_iterations. 3. Activity touches during long terminal execution - _wait_for_process polls every 0.2s but never reported activity. The gateway's inactivity timeout (default 1800s) would fire during long-running commands that appeared 'idle.' - Now: thread-local activity callback fires every 10s during the poll loop, keeping the gateway's activity tracker alive. - Agent wires _touch_activity into the callback before each tool call. Also: docs update noting 64K minimum context requirement. Closes #7915 (root cause was agent-loop termination, not Weixin delivery limits).	2026-04-11 16:18:57 -07:00
Teknium	8160d7a03d	test: add dedup coverage for reasoning item ID deduplication Adds two tests verifying that duplicate reasoning item IDs across multi-turn Codex Responses conversations are correctly deduplicated in both _chat_messages_to_responses_input() and _preflight_codex_input_items().	2026-04-11 14:43:47 -07:00
Teknium	dfc820345d	fix: scope tool interrupt signal per-thread to prevent cross-session leaks (#7930 ) The interrupt mechanism in tools/interrupt.py used a process-global threading.Event. In the gateway, multiple agents run concurrently in the same process via run_in_executor. When any agent was interrupted (user sends a follow-up message), the global flag killed ALL agents' running tools — terminal commands, browser ops, web requests — across all sessions. Changes: - tools/interrupt.py: Replace single threading.Event with a set of interrupted thread IDs. set_interrupt() targets a specific thread; is_interrupted() checks the current thread. Includes a backward- compatible _ThreadAwareEventProxy for legacy _interrupt_event usage. - run_agent.py: Store execution thread ID at start of run_conversation(). interrupt() and clear_interrupt() pass it to set_interrupt() so only this agent's thread is affected. - tools/code_execution_tool.py: Use is_interrupted() instead of directly checking _interrupt_event.is_set(). - tools/process_registry.py: Same — use is_interrupted(). - tests: Update interrupt tests for per-thread semantics. Add new TestPerThreadInterruptIsolation with two tests verifying cross-thread isolation.	2026-04-11 14:02:58 -07:00
Teknium	59e630a64d	fix: update thinking-exhaustion test for think-tag gating The test expected content=None to immediately trigger thinking-exhaustion, but PR #7738 correctly gates that check on _has_think_tags. Without think tags, the agent falls through to normal continuation retry (3 attempts).	2026-04-11 13:47:25 -07:00
Tom Qiao	5910412002	fix: detect truncated tool_calls when finish_reason is not length When API routers rewrite finish_reason from "length" to "tool_calls", truncated JSON arguments bypassed the length handler and wasted 3 retry attempts in the generic JSON validation loop. Now detects truncation patterns in tool call arguments regardless of finish_reason. Fixes #7680 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-04-11 13:47:25 -07:00
Teknium	dafe443beb	feat: warn at session start when compression model context is too small (#7894 ) Two-phase design so the warning fires before the user's first message on every platform: Phase 1 (__init__): _check_compression_model_feasibility() runs during agent construction. Resolves the auxiliary compression model (same chain as call_llm with task='compression'), compares its context length to the main model's compression threshold. If too small, emits via _emit_status() (prints for CLI) and stores the warning in _compression_warning. Phase 2 (run_conversation, first call): _replay_compression_warning() re-sends the stored warning through status_callback — which the gateway wires AFTER construction. The warning is then cleared so it only fires once. This ensures: - CLI users see the warning immediately at startup (right after the context limit line) - Gateway users (Telegram, Discord, Slack, WhatsApp, Signal, Matrix, Mattermost, Home Assistant, DingTalk, etc.) receive it via status_callback('lifecycle', ...) on their first message - logger.warning() always hits agent.log regardless of platform Also warns when no auxiliary LLM provider is configured at all. Entire check wrapped in try/except — never blocks startup. 11 tests covering: core warning logic, boundary conditions, exception safety, two-phase store+replay, gateway callback wiring, and single-delivery guarantee.	2026-04-11 12:01:30 -07:00
Teknium	06e1d9cdd4	fix: resolve three high-impact community bugs (#5819 , #6893 , #3388 ) (#7881 ) Matrix gateway: fix sync loop never dispatching events (#5819) - _sync_loop() called client.sync() but never called handle_sync() to dispatch events to registered callbacks — _on_room_message was registered but never fired for new messages - Store next_batch token from initial sync and pass as since= to subsequent incremental syncs (was doing full initial sync every time) - 17 comments, confirmed by multiple users on matrix.org Feishu docs: add interactive card configuration for approvals (#6893) - Error 200340 is a Feishu Developer Console configuration issue, not a code bug — users need to enable Interactive Card capability and configure Card Request URL - Added required 3-step setup instructions to feishu.md - Added troubleshooting entry for error 200340 - 17 comments from Feishu users Copilot provider drift: detect GPT-5.x Responses API requirement (#3388) - GPT-5.x models are rejected on /v1/chat/completions by both OpenAI and OpenRouter (unsupported_api_for_model error) - Added _model_requires_responses_api() to detect models needing Responses API regardless of provider - Applied in __init__ (covers OpenRouter primary users) and in _try_activate_fallback() (covers Copilot->OpenRouter drift) - Fixed stale comment claiming gateway creates fresh agents per message (it caches them via _agent_cache since the caching was added) - 7 comments, reported on Copilot+Telegram gateway	2026-04-11 11:12:20 -07:00
kshitijk4poor	af9caec44f	fix(qwen): correct context lengths for qwen3-coder models and send max_tokens to portal Based on PR #7285 by @kshitijk4poor. Two bugs affecting Qwen OAuth users: 1. Wrong context window — qwen3-coder-plus showed 128K instead of 1M. Added specific entries before the generic qwen catch-all: - qwen3-coder-plus: 1,000,000 (corrected from PR's 1,048,576 per official Alibaba Cloud docs and OpenRouter) - qwen3-coder: 262,144 2. Random stopping — max_tokens was suppressed for Qwen Portal, so the server applied its own low default. Reasoning models exhaust that on thinking tokens. Now: honor explicit max_tokens, default to 65536 when unset. Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>	2026-04-11 03:29:31 -07:00
Teknium	842e669a13	fix: activate fallback provider on repeated empty responses + user-visible status (#7505 ) When models return empty responses (no content, no tool calls, no reasoning), Hermes previously retried 3 times silently then fell through to '(empty)' — without ever trying the fallback provider chain. Users on GLM-4.5-Air and similar models experienced what appeared to be a complete hang, especially in gateway (Telegram/Discord) contexts where the silent retries produced zero feedback. Changes: - After exhausting 3 empty retries, attempt _try_activate_fallback() before giving up with '(empty)'. If fallback succeeds, reset retry counter and continue the conversation loop with the new provider. - Replace all _vprint() calls in recovery paths with _emit_status(), which surfaces messages through both CLI (_vprint with force=True) and gateway (status_callback -> adapter.send). Users now see: * '⚠️ Empty response from model — retrying (N/3)' during retries * '⚠️ Model returning empty responses — switching to fallback...' * '↻ Switched to fallback: <model> (<provider>)' on success * '❌ Model returned no content after all retries [and fallback]' - Add logger.warning() throughout empty response paths for log file visibility (model name, provider, retry counts). - Upgrade _last_content_with_tools fallback from logger.debug to logger.info + _emit_status so recovery is visible. - Upgrade thinking-only prefill continuation to use _emit_status. Tests: - test_empty_response_triggers_fallback_provider: verifies fallback activation after 3 empty retries produces content from fallback model - test_empty_response_fallback_also_empty_returns_empty: verifies graceful degradation when fallback also returns empty - test_empty_response_emits_status_for_gateway: verifies _emit_status is called during retries so gateway users see feedback Addresses #7180.	2026-04-10 19:15:41 -07:00
Teknium	ea81aa2eec	fix: guard api_kwargs in except handler to prevent UnboundLocalError (#7376 ) When _build_api_kwargs() throws an exception, the except handler in the retry loop referenced api_kwargs before it was assigned. This caused an UnboundLocalError that masked the real error, making debugging impossible for the user. Two _dump_api_request_debug() calls in the except block (non-retryable client error path and max-retries-exhausted path) both accessed api_kwargs without checking if it was assigned. Fix: initialize api_kwargs = None before the retry loop and guard both dump calls. Now the real error surfaces instead of the masking UnboundLocalError. Reported by Discord user gruman0.	2026-04-10 15:12:00 -07:00
angelos	7ccdb74364	fix(delegate): make max_concurrent_children configurable + error on excess `delegate_task` silently truncated batch tasks to 3 — the model sends 5 tasks, gets results for 3, never told 2 were dropped. Now returns a clear tool_error explaining the limit and how to fix it. The limit is configurable via: - delegation.max_concurrent_children in config.yaml (priority 1) - DELEGATION_MAX_CONCURRENT_CHILDREN env var (priority 2) - default: 3 Uses the same _load_config() path as the rest of delegate_task for consistent config priority. Clamps to min 1, warns on non-integer config values. Also removes the hardcoded maxItems: 3 from the JSON schema — the schema was blocking the model from even attempting >3 tasks before the runtime check could fire. The runtime check gives a much more actionable error message. Backwards compatible: default remains 3, existing configs unchanged.	2026-04-10 13:38:14 -07:00
Hermes Audit	2c99b4e79b	fix(unicode): sanitize surrogate metadata and allow two-pass retry	2026-04-10 13:05:01 -07:00
Hermes Audit	71036a7a75	fix: handle UnicodeEncodeError with ASCII codec (#6843 ) Broaden the UnicodeEncodeError recovery to handle systems with ASCII-only locale (LANG=C, Chromebooks) where ANY non-ASCII character causes encoding failure, not just lone surrogates. Changes: - Add _strip_non_ascii() and _sanitize_messages_non_ascii() helpers that strip all non-ASCII characters from message content, name, and tool_calls - Update the UnicodeEncodeError handler to detect ASCII codec errors and fall back to non-ASCII sanitization after surrogate check fails - Sanitize tool_calls arguments and name fields (not just content) - Fix bare .encode() in cli.py suspend handler to use explicit utf-8 - Add comprehensive test suite (17 tests)	2026-04-10 13:05:01 -07:00
Kenny Xie	b730c2955a	fix(model): normalize direct provider ids in auxiliary routing	2026-04-10 05:52:45 -07:00
Kenny Xie	fd5cc6e1b4	fix(model): normalize native provider-prefixed model ids	2026-04-10 05:52:45 -07:00
Ronald Reis	fd3e855d58	fix: pass config_context_length to switch_model context compressor When switching models at runtime, the config_context_length override was not being passed to the new context compressor instance. This meant the user-specified context length from config.yaml was lost after a model switch. - Store _config_context_length on AIAgent instance during __init__ - Pass _config_context_length when creating new ContextCompressor in switch_model - Add test to verify config_context_length is preserved across model switches Fixes: quando estamos alterando o modelo não está alterando o tamanho do contexto	2026-04-10 05:52:45 -07:00
Teknium	957485876b	fix: update 6 test files broken by dead code removal - test_percentage_clamp.py: remove TestContextCompressorUsagePercent class and test_context_compressor_clamped (tested removed get_status() method) - test_credential_pool.py: remove test_mark_used_increments_request_count (tested removed mark_used()), replace active_lease_count() calls with direct _active_leases dict access, remove mark_used from thread test - test_session.py: replace SessionSource.local_cli() factory calls with direct SessionSource construction (local_cli classmethod removed) - test_error_classifier.py: remove test_is_transient_property (tested removed is_transient property on ClassifiedError) - test_delivery.py: remove TestDeliveryRouter class (tested removed resolve_targets method), clean up unused imports - test_skills_hub.py: remove test_is_hub_installed (tested removed is_hub_installed method on HubLockFile)	2026-04-10 03:44:43 -07:00
helix4u	5a8b5f149d	fix(run-agent): rotate credential pool on billing-classified 400s	2026-04-10 03:27:19 -07:00
helix4u	9aedab00f4	fix(run_agent): recover primary client on openai transport errors	2026-04-10 03:21:24 -07:00
Teknium	871313ae2d	fix: clear conversation_history after mid-loop compression to prevent empty sessions (#7001 ) After mid-loop compression (triggered by 413, context_overflow, or Anthropic long-context tier errors), _compress_context() creates a new session in SQLite and resets _last_flushed_db_idx=0. However, conversation_history was not cleared, so _flush_messages_to_session_db() computed: flush_from = max(len(conversation_history=200), _last_flushed_db_idx=0) = 200 messages[200:] → empty (compressed messages < 200) This resulted in zero messages being written to the new session's SQLite store. On resume, the user would see 'Session found but has no messages.' The preflight compression path (line 7311) already had the fix: conversation_history = None This commit adds the same clearing to the three mid-loop compression sites: - Anthropic long-context tier overflow - HTTP 413 payload too large - Generic context_overflow error Reported by Aaryan (Nous community).	2026-04-10 00:14:59 -07:00
Teknium	8394b5ddd2	feat: expand /fast to all OpenAI Priority Processing models (#6960 ) Previously /fast only supported gpt-5.4 and forced a provider switch to openai-codex. Now supports all 13 models from OpenAI's Priority Processing pricing table (gpt-5.4, gpt-5.4-mini, gpt-5.2, gpt-5.1, gpt-5, gpt-5-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, gpt-4o, gpt-4o-mini, o3, o4-mini). Key changes: - Replaced _FAST_MODE_BACKEND_CONFIG with _PRIORITY_PROCESSING_MODELS frozenset - Removed provider-forcing logic — service_tier is now injected into whatever API path the user is already on (Codex Responses, Chat Completions, or OpenRouter passthrough) - Added request_overrides support to chat_completions path in run_agent.py - Updated messaging from 'Codex inference tier' to 'Priority Processing' - Expanded test coverage for all supported models	2026-04-09 22:06:30 -07:00
g-guthrie	d416a69288	feat: add Codex fast mode toggle (/fast command) Add /fast slash command to toggle OpenAI Codex service_tier between normal and priority ('fast') inference. Only exposed for models registered in _FAST_MODE_BACKEND_CONFIG (currently gpt-5.4). - Registry-based backend config for extensibility - Dynamic command visibility (hidden from help/autocomplete for non-supported models) via command_filter on SlashCommandCompleter - service_tier flows through request_overrides from route resolution - Omit max_output_tokens for Codex backend (rejects it) - Persists to config.yaml under agent.service_tier Salvage cleanup: removed simple_term_menu/input() menu (banned), bare /fast now shows status like /reasoning. Removed redundant override resolution in _build_api_kwargs — single source of truth via request_overrides from route. Co-authored-by: Hermes Agent <hermes@nousresearch.com>	2026-04-09 21:54:32 -07:00
AIandI0x1	2d0d05a337	fix(agent): detect truncated streaming tool calls before execution When a streaming response is cut mid-tool-call (connection drop, timeout), the accumulated function.arguments is invalid JSON. The mock response builder defaulted finish_reason to 'stop', so the agent loop treated it as a valid completed turn and tried to execute tools with broken args. Fix: validate tool call arguments with json.loads() during mock response reconstruction. If any are invalid JSON, override finish_reason to 'length'. In the main loop's length handler, if tool calls are present, refuse to execute and return partial=True with a clear error instead of silently failing or wasting retries. Also fixes _thinking_exhausted to not short-circuit when tool calls are present — truncated tool calls are not thinking exhaustion. Original cherry-picked from PR #6776 by AIandI0x1. Closes #6638.	2026-04-09 17:03:54 -07:00
Teknium	3b554bf839	fix: test for suppress_status_output should capture stdout, not mock _vprint The test was mocking _vprint entirely, bypassing the suppress guard. Switch to capturing _print_fn output so the real _vprint runs and the guard suppresses retry noise as intended.	2026-04-09 16:24:53 -07:00
adybag14-cyber	a3aed1bd26	fix(termux): keep quiet chat output parseable	2026-04-09 16:24:53 -07:00
adybag14-cyber	4970705ed3	fix(termux): silence quiet chat tool previews	2026-04-09 16:24:53 -07:00
Teknium	1eabbe905e	fix: retry 3 times when model returns truly empty response (#6488 ) When a model returns no content, no structured reasoning, and no tool calls (common with open models), the agent now silently retries up to 3 times before falling through to (empty). Silent retry (no synthetic messages) keeps the conversation history clean, preserves prompt caching, and respects the no-synthetic-user- injection invariant. Most empty responses from open models are transient (provider hiccups, rate limits, sampling flukes) so a simple retry is sufficient. This fills the last gap in the empty-response recovery chain: 1. _last_content_with_tools fallback (prior tool turn had content) 2. Thinking-only prefill continuation (#5931 — structured reasoning) 3. Empty response silent retry (NEW — truly empty, no reasoning) 4. (empty) terminal (last resort after all retries exhausted) Inline <think> blocks are excluded — the model chose to reason, it just produced no visible text. That differs from truly empty. Tests: - Updated test_truly_empty to expect 4 API calls (1 + 3 retries) - Added test_truly_empty_response_succeeds_on_nudge	2026-04-09 02:06:12 -07:00
Teknium	54db7cbbe1	fix(agent): tiered context pressure warnings + gateway dedup (#6411 ) Combines the approaches from PR #6309 (duan78) and PR #5963 (KUSH42): Tiered warnings (from #5963): - Replaces boolean _context_pressure_warned with float _context_pressure_warned_at - Fires at 85% (orange) and re-fires at 95% (red/critical) - Adds 'compacting context...' status message before compression Gateway dedup (from #6309): - Class-level dict _context_pressure_last_warned survives across AIAgent instances (gateway creates a new instance per message) - 5-minute cooldown per session prevents warning spam - Higher-tier warnings bypass the cooldown (85% → 95% always fires) - Compression reset clears the dedup entry for the session - Stale entries evicted (older than 2x cooldown) to prevent memory leak Does NOT inject into messages — purely user-facing via _safe_print (CLI) and status_callback (gateway). Zero prompt cache impact. Fixes #6309. Fixes #5963.	2026-04-08 21:31:44 -07:00
Teknium	989d4ea43d	fix: set compression_count on mock to avoid TypeError in test The new degradation warning reads compression_count as an int, but the existing test's MagicMock returns a MagicMock object for that attribute, causing '>=' comparison to fail.	2026-04-08 20:54:23 -07:00
konsisumer	42e366f27b	fix(agent): respect config timeout for flush_memories instead of hardcoded 30s The _call_llm() and direct OpenAI fallback paths in flush_memories() both hardcoded timeout=30.0, ignoring the user-configurable value at auxiliary.flush_memories.timeout in config.yaml. Remove the explicit timeout from the auxiliary _call_llm() call so that _get_task_timeout('flush_memories') reads from config. For the direct OpenAI fallback, import and use _get_task_timeout() instead of the hardcoded value. Add two regression tests verifying both code paths respect the config. Fixes #6154	2026-04-08 18:55:33 -07:00
kshitijk4poor	3377017eb4	feat(qwen): add Qwen OAuth provider with portal request support Based on #6079 by @tunamitom with critical fixes and comprehensive tests. Changes from #6079: - Fix: sanitization overwrite bug — Qwen message prep now runs AFTER codex field sanitization, not before (was silently discarding Qwen transforms) - Fix: missing try/except AuthError in runtime_provider.py — stale Qwen credentials now fall through to next provider on auto-detect - Fix: 'qwen' alias conflict — bare 'qwen' stays mapped to 'alibaba' (DashScope); use 'qwen-portal' or 'qwen-cli' for the OAuth provider - Fix: hardcoded ['coder-model'] replaced with live API fetch + curated fallback list (qwen3-coder-plus, qwen3-coder) - Fix: extract _is_qwen_portal() helper + _qwen_portal_headers() to replace 5 inline 'portal.qwen.ai' string checks and share headers between init and credential swap - Fix: add Qwen branch to _apply_client_headers_for_base_url for mid-session credential swaps - Fix: remove suspicious TypeError catch blocks around _prompt_provider_choice - Fix: handle bare string items in content lists (were silently dropped) - Fix: remove redundant dict() copies after deepcopy in message prep - Revert: unrelated ai-gateway test mock removal and model_switch.py comment deletion New tests (30 test functions): - _qwen_cli_auth_path, _read_qwen_cli_tokens (success + 3 error paths) - _save_qwen_cli_tokens (roundtrip, parent creation, permissions) - _qwen_access_token_is_expiring (5 edge cases: fresh, expired, within skew, None, non-numeric) - _refresh_qwen_cli_tokens (success, preserve old refresh, 4 error paths, default expires_in, disk persistence) - resolve_qwen_runtime_credentials (fresh, auto-refresh, force-refresh, missing token, env override) - get_qwen_auth_status (logged in, not logged in) - Runtime provider resolution (direct, pool entry, alias) - _build_api_kwargs (metadata, vl_high_resolution_images, message formatting, max_tokens suppression)	2026-04-08 13:46:30 -07:00
alt-glitch	bbcff8dcd0	fix(tools): address PR review — remove _extract_raw_output, BudgetConfig everywhere, read_file hardening - Remove _extract_raw_output: persist content verbatim (fixes size mismatch bug) - Drop import aliases: import from budget_config directly, one canonical name - BudgetConfig param on maybe_persist_tool_result and enforce_turn_budget - read_file: limit=None signature, pre-read guard fires only when limit omitted (256KB) - Unify binary extensions: file_operations.py imports from binary_extensions.py - Exclude .pdf and .svg from binary set (text-based, agents may inspect) - Remove redundant outer try/except in eval path (internal fallback handles it) - Fix broken tests: update assertion strings for new persistence format - Module-level constants: _PRE_READ_MAX_BYTES, _DEFAULT_READ_LIMIT - Remove redundant pathlib import (Path already at module level) - Update spec.md with IMPLEMENTED annotations and design decisions	2026-04-08 02:24:32 -07:00
alt-glitch	65e24c942e	wip: tool result fixes -- persistence	2026-04-08 02:24:32 -07:00
lesterli	37bf19a29d	fix(codex): align validation with normalization for empty stream output The response validation stage unconditionally marked Codex Responses API replies as invalid when response.output was empty, triggering unnecessary retries and fallback chains. However, _normalize_codex_response can recover from this state by synthesizing output from response.output_text. Now the validation stage checks for output_text before marking the response invalid, matching the normalization logic. Also fixes logging.warning → logger.warning for consistency with the rest of the file. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 17:29:41 -07:00
Siddharth Balyan	f3006ebef9	refactor(tests): re-architect tests + fix CI failures (#5946 ) * refactor: re-architect tests to mirror the codebase * Update tests.yml * fix: add missing tool_error imports after registry refactor * fix(tests): replace patch.dict with monkeypatch to prevent env var leaks under xdist patch.dict(os.environ) can leak TERMINAL_ENV across xdist workers, causing test_code_execution tests to hit the Modal remote path. * fix(tests): fix update_check and telegram xdist failures - test_update_check: replace patch("hermes_cli.banner.os.getenv") with monkeypatch.setenv("HERMES_HOME") — banner.py no longer imports os directly, it uses get_hermes_home() from hermes_constants. - test_telegram_conflict/approval_buttons: provide real exception classes for telegram.error mock (NetworkError, TimedOut, BadRequest) so the except clause in connect() doesn't fail with "catching classes that do not inherit from BaseException" when xdist pollutes sys.modules. * fix(tests): accept unavailable_models kwarg in _prompt_model_selection mock	2026-04-07 17:19:07 -07:00

1 2 3

149 Commits