hermes-agent-features

Author	SHA1	Message	Date
Teknium	bd6b138e85	fix: clean up HTML error messages in CLI display (#3069 ) When API calls fail with HTML error pages (e.g., CloudFlare errors), the CLI was dumping raw HTML content to users like: 📝 Error: <!DOCTYPE html><!--[if lt IE 7]> <html class="no-js ie6... This commit adds a _clean_error_message() utility method that: - Detects HTML content and replaces with user-friendly message - Collapses multiline errors to single line - Truncates overly long errors (>150 chars) - Preserves meaningful error text for regular errors Applied to all user-facing error displays: - API call failure messages (line 6314) - Interrupt error responses (line 6324) - Invalid response error messages (line 6000) Before: 📝 Error: <!DOCTYPE html><!--[if lt IE 7]>... After: 📝 Error: Service temporarily unavailable (HTML error page returned)	2026-03-25 16:39:22 -07:00
Teknium	9792bde31a	fix(agent): count compression restarts toward retry limit (#3070) When context overflow triggers compression, the outer retry loop restarts via continue without incrementing retry_count. If compression reduces messages but not enough to fit the context window, this creates an infinite loop burning API credits: API call → overflow → compress → retry → overflow → compress → ... Increment retry_count on compression restarts so the loop exits after max_retries total attempts. Cherry-picked from PR #2766 by dieutx.	2026-03-25 16:35:17 -07:00
Teknium	f7f30aaab9	fix(streaming): detect and kill stale SSE connections Adds a wall-clock stale stream detector (HERMES_STREAM_STALE_TIMEOUT, default 90s) that force-closes the httpx client when no real chunks arrive, even if SSE keep-alive pings keep the socket alive. Works with the existing streaming retry loop to recover via fresh connection. Made-with: Cursor	2026-03-25 16:07:05 -07:00
Teknium	77bcaba2d7	refactor: consolidate get_hermes_home() and parse_reasoning_effort() (#3062 ) Centralizes two widely-duplicated patterns into hermes_constants.py: 1. get_hermes_home() — Path resolution for ~/.hermes (HERMES_HOME env var) - Was copy-pasted inline across 30+ files as: Path(os.getenv("HERMES_HOME", Path.home() / ".hermes")) - Now defined once in hermes_constants.py (zero-dependency module) - hermes_cli/config.py re-exports it for backward compatibility - Removed local wrapper functions in honcho_integration/client.py, tools/website_policy.py, tools/tirith_security.py, hermes_cli/uninstall.py 2. parse_reasoning_effort() — Reasoning effort string validation - Was copy-pasted in cli.py, gateway/run.py, cron/scheduler.py - Same validation logic: check against (xhigh, high, medium, low, minimal, none) - Now defined once in hermes_constants.py, called from all 3 locations - Warning log for unknown values kept at call sites (context-specific) 31 files changed, net +31 lines (125 insertions, 94 deletions) Full test suite: 6179 passed, 0 failed	2026-03-25 15:54:28 -07:00
Teknium	8bb1d15da4	chore: remove ~100 unused imports across 55 files (#3016 ) Automated cleanup via pyflakes + autoflake with manual review. Changes: - Removed unused stdlib imports (os, sys, json, pathlib.Path, etc.) - Removed unused typing imports (List, Dict, Any, Optional, Tuple, Set, etc.) - Removed unused internal imports (hermes_cli.auth, hermes_cli.config, etc.) - Fixed cli.py: removed 8 shadowed banner imports (imported from hermes_cli.banner then immediately redefined locally — only build_welcome_banner is actually used) - Added noqa comments to imports that appear unused but serve a purpose: - Re-exports (gateway/session.py SessionResetPolicy, tools/terminal_tool.py is_interrupted/_interrupt_event) - SDK presence checks in try/except (daytona, fal_client, discord) - Test mock targets (auxiliary_client.py Path, mcp_config.py get_hermes_home) Zero behavioral changes. Full test suite passes (6162/6162, 2 pre-existing streaming test failures unrelated to this change).	2026-03-25 15:02:03 -07:00
Teknium	94e3d9adbf	fix(agent): restore safe non-streaming fallback after stream failures (#3020) After streaming retries are exhausted on transient errors, fall back to non-streaming instead of propagating the error. Also fall back for any other pre-delivery stream error (not just 'streaming not supported'). Added user-facing message when streaming is not supported by a model/provider, directing users to set display.streaming: false in config.yaml to avoid the fallback delay. Cherry-picked from PR #3008 by kshitijk4poor. Added UX message for streaming-not-supported detection. Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-03-25 12:46:04 -07:00
Teknium	099dfca6db	fix: GLM reasoning-only and max-length handling (#3010 ) - Add 'prompt exceeds max length' to context overflow detection for Z.AI/GLM 400 errors - Extract inline reasoning blocks from assistant content as fallback when no structured reasoning fields are present - Guard inline extraction so structured API reasoning takes priority - Update test for reasoning-only response salvage behavior Cherry-picked from PR #2993 by kshitijk4poor. Added priority guard to fix test_structured_reasoning_takes_priority failure. Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-03-25 12:05:37 -07:00
Teknium	68ab37e891	fix(delegate): give subagents independent iteration budgets (#3004 ) Each subagent now gets its own IterationBudget instead of sharing the parent's. The per-subagent cap is controlled by delegation.max_iterations in config.yaml (default 50). Total iterations across parent + subagents can exceed the parent's max_iterations, but the user retains control via the config setting. Previously, subagents shared the parent's budget, so three parallel subagents configured for max_iterations=50 racing against a parent that already used 60 of 90 would each only get ~10 iterations. Inspired by PR #2928 (Bartok9) which identified the issue (#2873).	2026-03-25 11:29:49 -07:00
Teknium	61949f0af7	Fix (#2997) Co-authored-by: Jack <jvand@DESKTOP-JACK.localdomain>	2026-03-25 11:12:11 -07:00
Teknium	52c5e491f5	fix(session): surface silent SessionDB failures that cause session data loss (#2999 ) * fix(session): surface silent SessionDB failures that cause session data loss SessionDB initialization and operation failures were logged at debug level or silently swallowed, causing sessions to never be indexed in the FTS5 database. This made session_search unable to find affected conversations. In practice, ~48% of sessions can be lost without any visible indication. The JSON session files are still written (separate code path), but the SQLite/FTS5 index gets nothing — making session_search return empty results for affected sessions. Changes: - cli.py: Log warnings (not debug) when SessionDB init fails at both __init__ and _start_session entry points - run_agent.py: Log warnings on create_session, append_message, and compression split failures - run_agent.py: Set _session_db = None after create_session failure to fail fast instead of silently dropping every message for the session Root cause: When gateway restarts or DB lock contention occurs during SessionDB() init, the exception is caught and swallowed. The agent continues running normally — JSON session logs are written to disk — but no messages reach the FTS5 index. * fix: use module logger instead of root logging for SessionDB warnings Follow-up to cherry-picked PR #2939 — the original used logging.warning() (root logger) instead of logger.warning() (module logger) in the 5 new warning calls. Module logger preserves the logger hierarchy and shows the correct module name in log output. --------- Co-authored-by: LucidPaths <lc77@outlook.de>	2026-03-25 11:10:19 -07:00
Teknium	42fec19151	feat: persist reasoning across gateway session turns (schema v6) (#2974) Tested against OpenAI Codex (direct), Anthropic (direct + OAI-compat), and OpenRouter → 6 backends. All reasoning field types (reasoning, reasoning_details, codex_reasoning_items) round-trip through the DB correctly.	2026-03-25 09:47:28 -07:00
Teknium	5dbe2d9d73	fix: skills-sh install fails for deeply nested repo structures (#2980 ) * fix(run_agent): ensure _fire_first_delta() is called for tool generation events Added calls to _fire_first_delta() in the AIAgent class to improve the handling of tool generation events, ensuring timely notifications during the processing of function calls and tool usage. * fix(run_agent): improve timeout handling for chat completions Enhanced the timeout configuration for chat completions in the AIAgent class by introducing customizable connection, read, and write timeouts using environment variables. This ensures more robust handling of API requests during streaming operations. * fix(run_agent): reduce default stream read timeout for chat completions Updated the default stream read timeout from 120 seconds to 60 seconds in the AIAgent class, enhancing the timeout configuration for chat completions. This change aims to improve responsiveness during streaming operations. * fix(run_agent): enhance streaming error handling and retry logic Improved the error handling and retry mechanism for streaming requests in the AIAgent class. Introduced a configurable maximum number of stream retries and refined the handling of transient network errors, allowing for retries with fresh connections. Non-transient errors now trigger a fallback to non-streaming only when appropriate, ensuring better resilience during API interactions. * fix: skills-sh install fails for deeply nested repo structures Skills in repos with deep directory nesting (e.g. cli-tool/components/skills/development/senior-backend/) could not be installed because the candidate path generation and shallow root-dir scan never reached them. Added GitHubSource._find_skill_in_repo_tree() which uses the GitHub Trees API to recursively search the entire repo tree in a single API call. This is used as a final fallback in SkillsShSource._discover_identifier() when the standard candidate paths and shallow scan both fail. 
Fixes installation of skills from repos like davila7/claude-code-templates where skills are nested 4+ levels deep. Reported by user Samuraixheart.	2026-03-25 09:31:05 -07:00
Teknium	fd292e676b	fix: skip KawaiiSpinner when TUI handles tool progress (#2973 ) * docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event The hooks page only documented gateway event hooks (HOOK.yaml system). The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't referenced from the hooks page, which was confusing. Changes: - hooks.md: Add overview table showing both hook systems - hooks.md: Add Plugin Hooks section with available hooks, callback signatures, and example - hooks.md: Add missing session:end gateway event (emitted but undocumented) - hooks.md: Mark pre_llm_call, post_llm_call, on_session_start, on_session_end as planned (defined in VALID_HOOKS but not yet invoked) - hooks.md: Update info box to cross-reference plugin hooks - hooks.md: Fix heading hierarchy (gateway content as subsections) - plugins.md: Add cross-reference to hooks page for full details - plugins.md: Mark planned hooks as (planned) * feat(session_search): add recent sessions mode when query is omitted When session_search is called without a query (or with an empty query), it now returns metadata for the most recent sessions instead of erroring. This lets the agent quickly see what was worked on recently without needing specific keywords. Returns for each session: session_id, title, source, started_at, last_active, message_count, preview (first user message). Zero LLM cost — pure DB query. Current session lineage and child delegation sessions are excluded. The agent can then keyword-search specific sessions if it needs deeper context from any of them. 
* docs: clarify two-mode behavior in session_search schema description * fix(compression): restore sane defaults and cap summary at 12K tokens - threshold: 0.80 → 0.50 (compress at 50%, not 80%) - target_ratio: 0.40 → 0.20, now relative to threshold not total context (20% of 50% = 10% of context as tail budget) - summary ceiling: 32K → 12K (Gemini can't output more than ~12K) - Updated DEFAULT_CONFIG, config display, example config, and tests * fix: browser_vision ignores auxiliary.vision.timeout config (#2901) browser_vision called call_llm() without passing a timeout parameter, so it always used the 30-second default in auxiliary_client.py. This made vision analysis with local models (llama.cpp, ollama) impossible since they typically need more than 30s for screenshot analysis. Now browser_vision reads auxiliary.vision.timeout from config.yaml (same config key that vision_analyze already uses) and passes it through to call_llm(). 
Also bumped the default vision timeout from 30s to 120s in both browser_vision and vision_analyze — 30s is too aggressive for local models and the previous default silently failed for anyone running vision locally. Fixes user report from GamerGB1988. * fix(skills): agent-created skills were incorrectly treated as untrusted community content _resolve_trust_level() didn't handle 'agent-created' source, so it fell through to 'community' trust level. Community policy blocks on any caution or dangerous findings, which meant common patterns like curl with env vars, systemctl, crontab, cloudflared references etc. would block skill creation/patching. The agent-created policy row already existed in INSTALL_POLICY with permissive settings (allow caution, ask on dangerous) but was never reached. Now it is. Fixes reports of skill_manage being blocked by security scanner. * fix(cli): enhance real-time reasoning output by forcing flush of long partial lines Updated the reasoning output mechanism to emit complete lines and force-flush long partial lines, ensuring reasoning is visible in real-time even without newlines. This improves user experience during reasoning sessions. * fix: skip KawaiiSpinner when TUI handles tool progress In the interactive CLI, the agent runs with quiet_mode=True and tool_progress_callback set. The quiet_mode condition triggered KawaiiSpinner for every tool call, but the TUI was already handling progress display via the spinner widget. The KawaiiSpinner writes carriage-return animation through StdoutProxy, triggering run_in_terminal() erase/redraw cycles on every flush. These redundant cycles cause the status bar to ghost into terminal scrollback. The thinking spinner already had this guard (checks thinking_callback). This extends the same pattern to the three tool spinner creation sites: concurrent tools, delegate_task, and single tool execution.	2026-03-25 08:33:44 -07:00
Teknium	7ca22ea11b	fix(compression): restore sane defaults and cap summary at 12K tokens - threshold: 0.80 → 0.50 (compress at 50%, not 80%) - target_ratio: 0.40 → 0.20, now relative to threshold not total context (20% of 50% = 10% of context as tail budget) - summary ceiling: 32K → 12K (Gemini can't output more than ~12K) - Updated DEFAULT_CONFIG, config display, example config, and tests	2026-03-24 18:48:47 -07:00
Teknium	9231a335d4	fix(compression): replace dead summary_target_tokens with ratio-based scaling (#2554 ) The summary_target_tokens parameter was accepted in the constructor, stored on the instance, and never used — the summary budget was always computed from hardcoded module constants (_SUMMARY_RATIO=0.20, _MAX_SUMMARY_TOKENS=8000). This caused two compounding problems: 1. The config value was silently ignored, giving users no control over post-compression size. 2. Fixed budgets (20K tail, 8K summary cap) didn't scale with context window size. Switching from a 1M-context model to a 200K model would trigger compression that nuked 350K tokens of conversation history down to ~30K. Changes: - Replace summary_target_tokens with summary_target_ratio (default 0.40) which sets the post-compression target as a fraction of context_length. Tail token budget and summary cap now scale proportionally: MiniMax 200K → ~80K post-compression GPT-5 1M → ~400K post-compression - Change threshold_percent default: 0.50 → 0.80 (don't fire until 80% of context is consumed) - Change protect_last_n default: 4 → 20 (preserve ~10 full turns) - Summary token cap scales to 5% of context (was fixed 8K), capped at 32K ceiling - Read target_ratio and protect_last_n from config.yaml compression section (both are now configurable) - Remove hardcoded summary_target_tokens=500 from run_agent.py - Add 5 new tests for ratio scaling, clamping, and new defaults	2026-03-24 17:45:49 -07:00
Teknium	8ee4f32819	fix(gateway): use TERMINAL_CWD for context file discovery, not process cwd The gateway process runs from the hermes-agent install directory, so os.getcwd() picks up the repo's AGENTS.md (16k chars) and other dev context files — inflating input tokens by ~10k on every gateway message. Fix: use TERMINAL_CWD (which the gateway sets to MESSAGING_CWD or $HOME) as the cwd for build_context_files_prompt(). In CLI mode, TERMINAL_CWD is the user's actual project directory, so behavior is unchanged. Before: gateway 15-20k input tokens, CLI 6-8k After: gateway ~6-8k input tokens (same as CLI) Reported by keri on Discord.	2026-03-24 17:30:33 -07:00
Teknium	618f15dda9	fix: reorder setup wizard providers — OpenRouter first Move OpenRouter to position 1 in the setup wizard's provider list to match hermes model ordering. Update default selection index and fix test expectations for the new ordering. Setup order: OpenRouter → Nous Portal → Codex → Custom → ...	2026-03-24 12:50:24 -07:00
Teknium	481915587e	fix: update context pressure warnings and token estimates after compaction Reset context pressure warnings and update last_prompt_tokens and last_completion_tokens in the context compressor to prevent stale values from causing excessive warnings and re-triggering compression. This change ensures accurate pressure calculations following the compaction process.	2026-03-24 09:25:10 -07:00
Teknium	ad1bf16f28	chore: remove all remaining mini-swe-agent references Complete cleanup after dropping the mini-swe-agent submodule (PR #2804): - Remove MSWEA_SILENT_STARTUP and MSWEA_GLOBAL_CONFIG_DIR env var settings from cli.py, run_agent.py, hermes_cli/main.py, doctor.py - Remove mini-swe-agent health check from hermes doctor - Remove 'minisweagent' from logger suppression lists - Remove litellm/typer/platformdirs from requirements.txt - Remove mini-swe-agent install steps from install.ps1 (Windows) - Remove mini-swe-agent install steps from website docs - Update all stale comments/docstrings referencing mini-swe-agent in terminal_tool.py, tools/__init__.py, code_execution_tool.py, environments/README.md, environments/agent_loop.py - Remove mini_swe_runner from pyproject.toml py-modules (still exists as standalone script for RL training use) - Shrink test_minisweagent_path.py to empty stub The orphaned mini-swe-agent/ directory on disk needs manual removal: rm -rf mini-swe-agent/	2026-03-24 08:19:23 -07:00
Teknium	02b38b93cb	refactor: remove mini-swe-agent dependency — inline Docker/Modal backends (#2804 ) Drop the mini-swe-agent git submodule. All terminal backends now use hermes-agent's own environment implementations directly. Docker backend: - Inline the `docker run -d` container startup (was 15 lines in minisweagent's DockerEnvironment). Our wrapper already handled execute(), cleanup(), security hardening, volumes, and resource limits. Modal backend: - Import swe-rex's ModalDeployment directly instead of going through minisweagent's 90-line passthrough wrapper. - Bake the _AsyncWorker pattern (from environments/patches.py) directly into ModalEnvironment for Atropos compatibility without monkey-patching. Cleanup: - Remove minisweagent_path.py (submodule path resolution helper) - Remove submodule init/install from install.sh and setup-hermes.sh - Remove mini-swe-agent from .gitmodules - environments/patches.py is now a no-op (kept for backward compat) - terminal_tool.py no longer does sys.path hacking for minisweagent - mini_swe_runner.py guards imports (optional, for RL training only) - Update all affected tests to mock the new direct subprocess calls - Update README.md, CONTRIBUTING.md No functionality change — all Docker, Modal, local, SSH, Singularity, and Daytona backends behave identically. 6093 tests pass.	2026-03-24 07:30:25 -07:00
Teknium	a312ee7b4c	fix(agent): ensure first delta is fired during reasoning updates - Added calls to `_fire_first_delta()` in the `AIAgent` class to ensure that the first delta is triggered for both reasoning and thinking updates. This change improves the handling of delta events during streaming, enhancing the responsiveness of the agent's reasoning capabilities.	2026-03-24 07:16:20 -07:00
Teknium	87e2626cf6	feat(cli, agent): add tool generation callback for streaming updates - Introduced `_on_tool_gen_start` in `HermesCLI` to indicate when tool-call arguments are being generated, enhancing user feedback during streaming. - Updated `AIAgent` to support a new `tool_gen_callback`, notifying the display layer when tool generation starts, allowing for better user experience during large payloads. - Ensured that the callback is triggered appropriately during streaming events to prevent user interface freezing.	2026-03-23 23:10:58 -07:00
Teknium	942f6eac94	fix(run_agent): ensure proper cleanup of OpenAI client in background review Added explicit closing of the OpenAI/httpx client in the background review process to prevent "Event loop is closed" errors. This change ensures that the client is properly cleaned up when the review agent is no longer needed, enhancing stability and resource management.	2026-03-22 16:03:16 -07:00
Teknium	bfe4baa6ed	chore: remove unused imports, dead code, and stale comments Mechanical cleanup — no behavior changes. Unused imports removed: - model_tools.py: import os - run_agent.py: OPENROUTER_MODELS_URL, get_model_context_length - cli.py: Table, VERSION, RELEASE_DATE, resolve_toolset, get_skill_commands - terminal_tool.py: signal, uuid, tempfile, set_interrupt_event, DANGEROUS_PATTERNS, _load_permanent_allowlist, _detect_dangerous_command Dead code removed: - toolsets.py: print_toolset_tree() (zero callers) - browser_tool.py: _get_session_name() (never called) Stale comments removed: - toolsets.py: duplicated/garbled comment line - web_tools.py: 3 aspirational TODO comments from early development	2026-03-22 08:33:34 -07:00
MacroAnarchy	f9c2ad48c2	fix: defer streaming iteration linebreak to prevent blank line stacking Follow-up to `669c60a6` (cherry-pick of PR #2187, fixes #2177). The original fix emits a "\n\n" delta immediately after every _execute_tool_calls() invocation. When the model runs multiple consecutive tool iterations before producing text (common with search → read → analyze flows), each iteration appends its own paragraph break, resulting in 4-6+ blank lines before the actual response. Replace the immediate delta with a deferred flag (_stream_needs_break). _fire_stream_delta() checks the flag and prepends a single "\n\n" only when the first real text delta arrives, so multiple back-to-back tool iterations still produce exactly one paragraph break.	2026-03-22 04:59:12 -07:00
Teknium	34be3f8be6	revert: remove trailing empty assistant message stripping Reverts the sanitizer addition from PR #2466 (originally #2129). We already have _empty_content_retries handling for reasoning-only responses. The trailing strip risks silently eating valid messages and is redundant with existing empty-content handling.	2026-03-22 04:55:34 -07:00
ygd58	5407d12bc6	fix(agent): strip trailing empty assistant messages before API calls to prevent prefill rejection	2026-03-22 04:38:17 -07:00
Bartok Moltbot	e6a708aa04	fix(io): catch ValueError in _SafeWriter for closed file handles (#2428 ) When subagents run in ThreadPoolExecutor threads, the shared stdout handle can close between thread teardown and KawaiiSpinner cleanup. Python raises ValueError (not OSError) for I/O operations on closed files: ValueError: I/O operation on closed file The _SafeWriter class was only catching OSError, missing this case. Changes: - Add ValueError to exception handling in write(), flush(), and isatty() - Update docstring to document the ThreadPoolExecutor teardown scenario Fixes #2428	2026-03-22 04:38:17 -07:00
Teknium	8cb7864110	fix: resolve garbled ANSI escape codes in status printouts (#2262 ) (#2448 ) Two related root causes for the '?[33mTool progress: NEW?[0m' garbling reported on kitty, alacritty, ghostty and gnome-console: 1. /verbose label printing used self.console.print() with Rich markup ([yellow]...[/]). self.console is a plain Rich Console() whose output goes directly to sys.stdout, which patch_stdout's StdoutProxy intercepts and mangles raw ANSI sequences. 2. Context pressure status lines (e.g. 'approaching compaction') from AIAgent._safe_print() had the same problem -- _safe_print() was a @staticmethod that always called builtin print(), bypassing the prompt_toolkit renderer entirely. Fix: - Convert AIAgent._safe_print() from @staticmethod to an instance method that delegates to self._print_fn (defaults to builtin print, preserving all non-CLI behaviour). - After the CLI creates its AIAgent instance, wire self.agent._print_fn to the existing _cprint() helper which routes through prompt_toolkit.print_formatted_text(ANSI(text)). - Rewrite the /verbose feedback labels to use hermes_cli.colors.Colors ANSI constants in f-strings and emit them via _cprint() directly, removing the Rich-markup-inside-patch_stdout anti-pattern. Fixes #2262 Co-authored-by: Animesh Mishra <animesh.m.7523@gmail.com>	2026-03-22 04:07:06 -07:00
Teknium	306e67f32d	fix: fail fast when explicit provider has no API key instead of silent OpenRouter fallback (#2445 ) When a non-OpenRouter provider (e.g. minimax, anthropic) is set in config.yaml but its API key is missing, Hermes silently fell back to OpenRouter, causing confusing 404 errors. Now checks if the user explicitly configured a provider before falling back. Explicit providers raise RuntimeError with a clear message naming the missing env var. Auto/openrouter/custom providers still fall through to OpenRouter as before. Three code paths fixed: - run_agent.py AIAgent.__init__ — main client initialization - auxiliary_client.py call_llm — sync auxiliary calls - auxiliary_client.py call_llm_streaming — async auxiliary calls Based on PR #2272 by @StefanIsMe. Applied manually to fix a pconfig NameError in the original and extend to call_llm_streaming. Co-authored-by: StefanIsMe <StefanIsMe@users.noreply.github.com>	2026-03-22 03:59:29 -07:00
Teknium	669c60a6bb	fix: add iteration boundary linebreak to prevent stream concatenation Cherry-picked from PR #2187 by @devorun. Fixes #2177. When streaming is enabled, text before and after tool calls gets concatenated without separation. Adds a paragraph break delta after _execute_tool_calls() so stream consumers insert proper whitespace between iteration boundaries.	2026-03-21 19:19:26 -07:00
teyrebaz33	bd49bce278	fix(prompt-caching): skip top-level cache_control on role:tool for OpenRouter On the native Anthropic Messages API path, convert_messages_to_anthropic() moves top-level cache_control on role:tool messages inside the tool_result block. On OpenRouter (chat_completions), no such conversion happens — the unexpected top-level field causes a silent hang on the second tool call. Add native_anthropic parameter to _apply_cache_marker() and apply_anthropic_cache_control(). When False (OpenRouter), role:tool messages are skipped entirely. When True (native Anthropic), existing behaviour is preserved. Fixes #2362	2026-03-21 16:54:43 -07:00
Teknium	525caadd8c	fix: prevent Anthropic token leaking to third-party anthropic_messages providers (salvage #2383 ) (#2389 ) * fix: prevent Anthropic token fallback leaking to third-party anthropic_messages providers When provider is minimax/alibaba/etc and MINIMAX_API_KEY is not set, the code fell back to resolve_anthropic_token() sending Anthropic OAuth credentials to third-party endpoints, causing 401 errors. Now only provider=="anthropic" triggers the fallback. Generalizes the Alibaba-specific guard from #1739 to all non-Anthropic providers. * fix: set provider='anthropic' in credential refresh tests Follow-up for cherry-picked PR #2383 — existing tests didn't set agent.provider, which the new guard requires to allow Anthropic token refresh. --------- Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>	2026-03-21 16:42:46 -07:00
Teknium	2a5f86ed6d	Merge pull request #2343 from NousResearch/hermes/hermes-31d7db3b feat: @ context references + Honcho config fixes	2026-03-21 16:10:19 -07:00
Teknium	2c06ec5f51	fix: correct provider check for Alibaba model identity injection PR #2314 checked for provider names 'alibaba-coding-plan' and 'alibaba-coding-plan-anthropic' which don't exist in the provider registry. The provider is always 'alibaba' — the condition was dead code. Fixed to check self.provider == 'alibaba'.	2026-03-21 09:46:26 -07:00
crazywriter1	523d8c38f9	fix: Alibaba/DashScope: preserve model dots (qwen3.5-plus) and fix 401 auth When using Alibaba (DashScope) with an anthropic-compatible endpoint, model names like qwen3.5-plus were being normalized to qwen3-5-plus. Alibaba's API expects the dot. Added preserve_dots parameter to normalize_model_name() and build_anthropic_kwargs(). Also fixed 401 auth: when provider is alibaba or base_url contains dashscope/aliyuncs, use only the resolved API key (DASHSCOPE_API_KEY). Never fall back to resolve_anthropic_token(), and skip Anthropic credential refresh for DashScope endpoints. Cherry-picked from PR #1748 by crazywriter1. Fixes #1739.	2026-03-21 09:38:04 -07:00
Teknium	e183744cb5	feat(honcho): instance-local config via HERMES_HOME, default session strategy to per-directory - Add resolve_config_path(): checks $HERMES_HOME/honcho.json first, falls back to ~/.honcho/config.json. Enables isolated Hermes instances with independent Honcho credentials and settings. - Update CLI and doctor to use resolved path instead of hardcoded global. - Change default session_strategy from per-session to per-directory. Part 1 of #1962 by @erosika.	2026-03-21 09:34:00 -07:00
Teknium	9305164bf3	fix: add None-entry guard to tool_calls loops in run_agent, batch_runner, and mini_swe_runner (#2316) Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>	2026-03-21 07:20:41 -07:00
ygd58	2ea8054304	fix(agent): inject model identity for Alibaba Coding Plan to work around API returning wrong model name	2026-03-21 07:11:08 -07:00
Teknium	58b52dfb2f	Merge pull request #2303 from NousResearch/hermes/hermes-31d7db3b fix: remove synthetic error message injection, fix session resume after repeated failures	2026-03-21 07:03:54 -07:00
Teknium	779619f742	fix: remove synthetic error message injection, fix session resume after repeated failures Two changes to the error handler in the agent loop: 1. Remove the 'if not pending_handled' block that injected fake [System error during processing: ...] messages into conversation history. These polluted history, burned tokens on retries, and could violate role alternation by injecting as role=user. The tool_calls error-result path (role=tool) is preserved. 2. Append the error final_response as an assistant message when hitting the iteration limit, so session resume doesn't produce consecutive user messages.	2026-03-21 06:33:05 -07:00
Teknium	96a5e9fc11	feat(agent): add summary of successful tool actions in review agent Enhanced the review agent to scan and summarize successful tool actions, providing users with a compact overview of updates made during the review process. This includes actions related to memory and user profiles, improving user feedback and interaction clarity.	2026-03-21 06:31:59 -07:00
Teknium	885f88fb60	feat(agent): suppress non-forced output during post-response housekeeping - Introduced a mechanism to mute output after the main response is delivered, ensuring that subsequent tool calls run without cluttering the CLI. - Redirected stdout to devnull during the review agent's execution to prevent any print statements from interfering with the main CLI display. - Added a new attribute `_mute_post_response` to manage output suppression effectively.	2026-03-20 23:54:42 -07:00
Teknium	761a8ad39a	fix(display): show provider and endpoint in API error messages (#2266)	2026-03-20 21:57:53 -07:00
Test	d560f2d1f2	fix(display): show provider and endpoint in API error messages When an API call fails, the error output now shows the provider name, model, and endpoint URL so users can immediately identify which service rejected their request. Auth errors (401/403) get actionable guidance: check key validity, model access, and OpenRouter credits link. Before: 'API call failed (attempt 1/3): PermissionDeniedError' After: 'API call failed (attempt 1/3): PermissionDeniedError Provider: openrouter Model: anthropic/claude-sonnet-4 Endpoint: https://openrouter.ai/api/v1 Your API key was rejected by the provider. Check: • Is the key valid? Run: hermes setup • Does your account have access to anthropic/claude-sonnet-4? • Check credits: https://openrouter.ai/settings/credits'	2026-03-20 21:06:55 -07:00
Teknium	45058b4105	feat: replace inline nudges with background memory/skill review (#2235 ) Remove the memory and skill nudges that were appended directly to user messages, causing backward-looking system instructions to compete with forward-looking user tasks. Found in 43% of user messages across 15 sessions, with confirmed cases of the agent spending tool calls on nudge responses before starting the user's actual request. Replace with a background review agent that runs AFTER the main agent finishes responding: - Spawns a background thread with a snapshot of the conversation - Uses the main model (not auxiliary) for high-precision memory/skill work - Only has memory + skill_manage tools (5 iteration budget) - Shares the memory store for direct writes - Never modifies the main conversation history - Never competes with the user's task for model attention - Zero latency impact (runs after response is delivered) - Same token cost (processes the same context, just on a separate track) The trigger conditions are unchanged (every 10 user turns for memory, after 10+ tool iterations for skills). Only the execution path changes: from inline injection to background fork. Closes #2227. Co-authored-by: Test <test@test.com>	2026-03-20 18:51:31 -07:00
Teknium	4263350c5b	fix: remove post-compression file-read history injection (#2226 ) Remove the [Files already read — do NOT re-read these] user message that was injected into the conversation after context compression. This message used role='user' for system-generated content, creating a fake user turn that confused models about conversation state and could contribute to task-redo behavior. The file_tools.py read tracker (warn on 3rd consecutive read, block on 4th+) already handles re-read prevention inline without injecting synthetic messages. Closes #2224. Co-authored-by: Test <test@test.com>	2026-03-20 14:54:25 -07:00
Test	76bc27199f	fix(cli, agent): improve streaming handling and state management - Updated _stream_delta method in HermesCLI to handle None values, flushing the stream and resetting state for clean tool execution. - Enhanced quiet mode handling in AIAgent to ensure proper display closure before tool execution, preventing display issues with intermediate streamed content. These changes improve the robustness of the streaming functionality and ensure a smoother user experience during tool interactions.	2026-03-20 10:02:42 -07:00
Test	55ce601502	fix: 6 bugs in model metadata, reasoning detection, and delegate tool Cherry-picked from PR #2169 by @0xbyt4. 1. _strip_provider_prefix: skip Ollama model:tag names (qwen:0.5b) 2. Fuzzy match: remove reverse direction that made claude-sonnet-4 resolve to 1M instead of 200K 3. _has_content_after_think_block: reuse _strip_think_blocks() to handle all tag variants (thinking, reasoning, REASONING_SCRATCHPAD) 4. models.dev lookup: elif→if so nous provider also queries models.dev 5. Disk cache fallback: use 5-min TTL instead of full hour so network is retried soon 6. Delegate build: wrap child construction in try/finally so _last_resolved_tool_names is always restored on exception	2026-03-20 08:52:37 -07:00
Teknium	c52353cf8a	feat: context pressure warnings for CLI and gateway (#2159 ) * feat: context pressure warnings for CLI and gateway User-facing notifications as context approaches the compaction threshold. Warnings fire at 60% and 85% of the way to compaction — relative to the configured compression threshold, not the raw context window. CLI: Formatted line with a progress bar showing distance to compaction. Cyan at 60% (approaching), bold yellow at 85% (imminent). ◐ context ▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱▱▱▱▱▱ 60% to compaction 100k threshold (50%) · approaching compaction ⚠ context ▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▰▱▱▱ 85% to compaction 100k threshold (50%) · compaction imminent Gateway: Plain-text notification sent to the user's chat via the new status_callback mechanism (asyncio.run_coroutine_threadsafe bridge, same pattern as step_callback). Does NOT inject into the message stream. The LLM never sees these warnings. Flags reset after each compaction cycle. Files changed: - agent/display.py — format_context_pressure(), format_context_pressure_gateway() - run_agent.py — status_callback param, _context_50/70_warned flags, _emit_context_pressure(), flag reset in _compress_context() - gateway/run.py — _status_callback_sync bridge, wired to AIAgent - tests/test_context_pressure.py — 23 tests * Merge remote-tracking branch 'origin/main' into hermes/hermes-7ea545bf --------- Co-authored-by: Test <test@test.com>	2026-03-20 08:37:36 -07:00
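The progress bar above measures distance to the compression threshold, not to the raw context window. A rough sketch of that arithmetic follows; the function name, signature, and 20-cell width are assumptions chosen to match the sample output, not the actual `format_context_pressure()` implementation.

```python
def context_pressure_bar(tokens_used: int, threshold_tokens: int, width: int = 20) -> str:
    """Hypothetical renderer: progress is relative to the compaction threshold."""
    # Clamp so the bar never underflows or overflows its width.
    used = max(0, min(tokens_used, threshold_tokens))
    filled = used * width // threshold_tokens   # integer math avoids float rounding
    percent = used * 100 // threshold_tokens
    return "▰" * filled + "▱" * (width - filled) + f" {percent}% to compaction"


# With a 100k threshold, 60k tokens fills 12 of 20 cells and 85k fills 17.
approaching = context_pressure_bar(60_000, 100_000)
imminent = context_pressure_bar(85_000, 100_000)
```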
Teknium	88643a1ba9	feat: overhaul context length detection with models.dev and provider-aware resolution (#2158 ) Replace the fragile hardcoded context length system with a multi-source resolution chain that correctly identifies context windows per provider. Key changes: - New agent/models_dev.py: Fetches and caches the models.dev registry (3800+ models across 100+ providers with per-provider context windows). In-memory cache (1hr TTL) + disk cache for cold starts. - Rewritten get_model_context_length() resolution chain: 0. Config override (model.context_length) 1. Custom providers per-model context_length 2. Persistent disk cache 3. Endpoint /models (local servers) 4. Anthropic /v1/models API (max_input_tokens, API-key only) 5. OpenRouter live API (existing, unchanged) 6. Nous suffix-match via OpenRouter (dot/dash normalization) 7. models.dev registry lookup (provider-aware) 8. Thin hardcoded defaults (broad family patterns) 9. 128K fallback (was 2M) - Provider-aware context: same model now correctly resolves to different context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic, 128K on GitHub Copilot). Provider name flows through ContextCompressor. - DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns. models.dev replaces the per-model hardcoding. - CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K] to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M. - hermes model: prompts for context_length when configuring custom endpoints. Supports shorthand (32k, 128K). Saved to custom_providers per-model config. - custom_providers schema extended with optional models dict for per-model context_length (backward compatible). - Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash normalization. Handles all 15 current Nous models. - Anthropic direct: queries /v1/models for max_input_tokens. 
Only works with regular API keys (sk-ant-api*), not OAuth tokens. Falls through to models.dev for OAuth users. Tests: 5574 passed (18 new tests for models_dev + updated probe tiers) Docs: Updated configuration.md context length section, AGENTS.md Co-authored-by: Test <test@test.com>	2026-03-20 06:04:33 -07:00
Teknium	b7b585656b	Merge pull request #2110 from NousResearch/hermes/hermes-5d6932ba fix: session reset + custom provider model switch + honcho base_url	2026-03-20 06:01:44 -07:00
Teknium	aa6416399e	Merge pull request #2161 from NousResearch/hermes/hermes-6757a563 fix(display): show spinners and tool progress during streaming mode	2026-03-20 05:17:55 -07:00
Test	b313751acf	fix(display): show spinners and tool progress during streaming mode When streaming was enabled, two visual feedback mechanisms were completely suppressed: 1. The thinking spinner (TUI toolbar) was skipped because the entire spinner block was gated on 'not self._has_stream_consumers()'. Now the thinking_callback fires in streaming mode too — the raw KawaiiSpinner is still skipped (would conflict with streamed tokens) but the TUI toolbar widget works fine alongside streaming. 2. Tool progress lines (the ┊ feed) were invisible because _vprint was blanket-suppressed when stream consumers existed. But during tool execution, no tokens are actively streaming, so printing is safe. Added an _executing_tools flag that _vprint respects to allow output during tool execution even with stream consumers registered.	2026-03-20 05:14:42 -07:00
Test	b1d05dfe8b	fix(openai): route api.openai.com to Responses API for GPT-5.x Based on PR #1859 by @magi-morph (too stale to cherry-pick, reimplemented). GPT-5.x models reject tool calls + reasoning_effort on /v1/chat/completions with a 400 error directing to /v1/responses. This auto-detects api.openai.com in the base URL and switches to codex_responses mode in three places: - AIAgent.__init__: upgrades chat_completions → codex_responses - _try_activate_fallback(): same routing for fallback model - runtime_provider.py: _detect_api_mode_for_url() for both custom provider and openrouter runtime resolution paths Also extracts _is_direct_openai_url() helper to replace the inline check in _max_tokens_param().	2026-03-20 05:09:41 -07:00
Test	5822711ae6	fix: complete session reset — missing compressor counters + test Follow-up to PR #2101 (InB4DevOps). Adds three missing context compressor resets in reset_session_state(): - compression_count (displayed in status bar) - last_total_tokens - _context_probed (stale context-error flag) Also fixes the test_cli_new_session.py prompt_toolkit mock (missing auto_suggest stub) and adds a regression test for #2099 that verifies all token counters and compressor state are zeroed on /new.	2026-03-20 04:35:17 -07:00
Teknium	b19f5133c3	Merge pull request #2118 from NousResearch/hermes/hermes-e83093f0 feat: show reasoning/thinking blocks when show_reasoning is enabled	2026-03-20 04:35:12 -07:00
Test	b1832faaae	feat: show reasoning/thinking blocks when show_reasoning is enabled - Add <thinking> tag to streaming filter's tag list - When show_reasoning is on, route XML reasoning content to the reasoning display box instead of silently discarding it - Expand _strip_think_blocks to handle all tag variants: <think>, <thinking>, <THINKING>, <reasoning>, <REASONING_SCRATCHPAD>	2026-03-19 19:44:31 -07:00
Teknium	3a9a1bbb84	Merge pull request #2091 from dusterbloom/fix/lmstudio-context-length-detection feat: query local servers for actual context window size	2026-03-19 19:08:21 -07:00
InB4DevOps	fe331ed9bd	fix: Reset token counters on new session for accurate usage display (#2099 )	2026-03-20 01:21:25 +01:00
Peppi Littera	746abf5e28	fix: use reasoning content as response when model only produces think blocks Local models (especially Qwen 3.5) sometimes wrap their entire response inside <think> tags, leaving actual content empty. Previously this caused 3 retries and then an error, wasting tokens and failing the request. Now when retries are exhausted and reasoning_text contains the response, it is used as final_response instead of returning an error. The user sees the actual answer instead of "Model generated only think blocks."	2026-03-20 00:26:36 +01:00
Teknium	e84d952dc0	fix(codex): handle reasoning-only responses and replay path (#2070 ) * fix(codex): treat reasoning-only responses as incomplete, not stop When a Codex Responses API response contains only reasoning items (encrypted thinking state) with no message text or tool calls, the _normalize_codex_response method was setting finish_reason='stop'. This sent the response into the empty-content retry loop, which burned 3 retries and then failed — exactly the pattern Nester reported in Discord. Two fixes: 1. _normalize_codex_response: reasoning-only responses (reasoning_items_raw non-empty but no final_text) now get finish_reason='incomplete', routing them to the Codex continuation path instead of the retry loop. 2. Incomplete handling: also checks for codex_reasoning_items when deciding whether to preserve an interim message, so encrypted reasoning state is not silently dropped when there is no visible reasoning text. Adds 4 regression tests covering: - Unit: reasoning-only → incomplete, reasoning+content → stop - E2E: reasoning-only → continuation → final answer succeeds - E2E: encrypted reasoning items preserved in interim messages * fix(codex): ensure reasoning items have required following item in API input Follow-up to the reasoning-only response fix. Three additional issues found by tracing the full replay path: 1. _chat_messages_to_responses_input: when a reasoning-only interim message was converted to Responses API input, the reasoning items were emitted as the last items with no following item. The Responses API requires a following item after each reasoning item (otherwise: 'missing_following_item' error, as seen in OpenHands #11406). Now emits an empty assistant message as the required following item when content is empty but reasoning items were added. 2. 
Duplicate detection: two consecutive reasoning-only incomplete messages with identical empty content/reasoning but different encrypted codex_reasoning_items were incorrectly treated as duplicates, silently dropping the second response's reasoning state. Now includes codex_reasoning_items in the duplicate comparison. 3. Added tests for both the API input conversion path and the duplicate detection edge case. Research context: verified against OpenCode (uses Vercel AI SDK, no retry loop so avoids the issue), Clawdbot (drops orphaned reasoning blocks entirely), and OpenHands (hit the missing_following_item error). Our approach preserves reasoning continuity while satisfying the API constraint. --------- Co-authored-by: Test <test@test.com>	2026-03-19 10:34:44 -07:00
Teknium	d76fa7fc37	fix: detect context length for custom model endpoints via fuzzy matching + config override (#2051 ) * fix: detect context length for custom model endpoints via fuzzy matching + config override Custom model endpoints (non-OpenRouter, non-known-provider) were silently falling back to 2M tokens when the model name didn't exactly match what the endpoint's /v1/models reported. This happened because: 1. Endpoint metadata lookup used exact match only — model name mismatches (e.g. 'qwen3.5:9b' vs 'Qwen3.5-9B-Q4_K_M.gguf') caused a miss 2. Single-model servers (common for local inference) required exact name match even though only one model was loaded 3. No user escape hatch to manually set context length Changes: - Add fuzzy matching for endpoint model metadata: single-model servers use the only available model regardless of name; multi-model servers try substring matching in both directions - Add model.context_length config override (highest priority) so users can explicitly set their model's context length in config.yaml - Log an informative message when falling back to 2M probe, telling users about the config override option - Thread config_context_length through ContextCompressor and AIAgent init Tests: 6 new tests covering fuzzy match, single-model fallback, config override (including zero/None edge cases). * fix: auto-detect local model name and context length for local servers Cherry-picked from PR #2043 by sudoingX. 
- Auto-detect model name from local server's /v1/models when only one model is loaded (no manual model name config needed) - Add n_ctx_train and n_ctx to context length detection keys for llama.cpp - Query llama.cpp /props endpoint for actual allocated context (not just training context from GGUF metadata) - Strip .gguf suffix from display in banner and status bar - _auto_detect_local_model() in runtime_provider.py for CLI init Co-authored-by: sudo <sudoingx@users.noreply.github.com> * fix: revert accidental summary_target_tokens change + add docs for context_length config - Revert summary_target_tokens from 2500 back to 500 (accidental change during patching) - Add 'Context Length Detection' section to Custom & Self-Hosted docs explaining model.context_length config override --------- Co-authored-by: Test <test@test.com> Co-authored-by: sudo <sudoingx@users.noreply.github.com>	2026-03-19 06:01:16 -07:00
Teknium	a7cc1cf309	fix: support Anthropic-compatible endpoints for third-party providers (#1997 ) Three bugs prevented providers like MiniMax from using their Anthropic-compatible endpoints (e.g. api.minimax.io/anthropic): 1. _VALID_API_MODES was missing 'anthropic_messages', so explicit api_mode config was silently rejected and defaulted to chat_completions. 2. API-key provider resolution hardcoded api_mode to 'chat_completions' without checking model config or detecting Anthropic-compatible URLs. 3. run_agent.py auto-detection only recognized api.anthropic.com, not third-party endpoints using the /anthropic URL convention. Fixes: - Add 'anthropic_messages' to _VALID_API_MODES - API-key providers now check model config api_mode and auto-detect URLs ending in /anthropic - run_agent.py and fallback logic detect /anthropic URL convention - 5 new tests covering all scenarios Users can now either: - Set MINIMAX_BASE_URL=https://api.minimax.io/anthropic (auto-detected) - Set api_mode: anthropic_messages in model config (explicit) - Use custom_providers with api_mode: anthropic_messages Co-authored-by: Test <test@test.com>	2026-03-18 16:26:06 -07:00
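The detection rules in this commit could be sketched roughly as below. The function name and the precedence of explicit config over URL detection are assumptions; the three mode strings and the `/anthropic` URL convention are taken from the commit messages in this log.

```python
from typing import Optional

# Mode names as they appear in the commit messages.
VALID_API_MODES = {"chat_completions", "codex_responses", "anthropic_messages"}


def detect_api_mode(base_url: str, configured: Optional[str] = None) -> str:
    """Hypothetical resolver: an explicit valid api_mode wins, then URL detection."""
    if configured in VALID_API_MODES:
        return configured
    url = (base_url or "").lower().rstrip("/")
    # Recognise Anthropic itself and third parties using the /anthropic convention.
    if "api.anthropic.com" in url or url.endswith("/anthropic"):
        return "anthropic_messages"
    return "chat_completions"


minimax_mode = detect_api_mode("https://api.minimax.io/anthropic")
```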
Teknium	7c7feaa033	Merge pull request #1929 from NousResearch/hermes/hermes-b29f73b2 feat: inject model and provider into system prompt	2026-03-18 04:18:41 -07:00
Test	e99aca98ab	feat: inject model and provider into system prompt Adds model name and provider to the system prompt metadata block, alongside the existing session ID and timestamp. These are frozen at session start and don't change mid-conversation, so they won't break prompt caching.	2026-03-18 04:18:26 -07:00
Teknium	e4a3ffa9c1	feat: use SOUL.md as primary agent identity instead of hardcoded default (#1922 ) SOUL.md now loads in slot #1 of the system prompt, replacing the hardcoded DEFAULT_AGENT_IDENTITY. This lets users fully customize the agent's identity and personality by editing ~/.hermes/SOUL.md without it conflicting with the built-in identity text. When SOUL.md is loaded as identity, it's excluded from the context files section to avoid appearing twice. When SOUL.md is missing, empty, unreadable, or skip_context_files is set, the hardcoded DEFAULT_AGENT_IDENTITY is used as a fallback. The default SOUL.md (seeded on first run) already contains the full Hermes personality, so existing installs are unaffected. Co-authored-by: Test <test@test.com>	2026-03-18 04:11:20 -07:00
Test	e7844e9c8d	Merge origin/main, resolve conflicts (self._base_url_lower)	2026-03-18 04:09:00 -07:00
Teknium	c0c14e60b4	fix: make concurrent tool batching path-aware for file mutations (#1914 ) * Improve tool batching independence checks * fix: address review feedback on path-aware batching - Log malformed/non-dict tool arguments at debug level before falling back to sequential, instead of silently swallowing the error into an empty dict - Guard empty paths in _paths_overlap (unreachable in practice due to upstream filtering, but makes the invariant explicit) - Add tests: malformed JSON args, non-dict args, _paths_overlap unit tests including empty path edge cases - web_crawl is not a registered tool (only web_search/web_extract are); no addition needed to _PARALLEL_SAFE_TOOLS --------- Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-18 03:25:38 -07:00
Teknium	a2440f72f6	feat: use endpoint metadata for custom model context and pricing (#1906 ) * perf: cache base_url.lower() via property, consolidate triple load_config(), hoist set constant run_agent.py: - Add base_url property that auto-caches _base_url_lower on every assignment, eliminating 12+ redundant .lower() calls per API cycle across __init__, _build_api_kwargs, _supports_reasoning_extra_body, and the main conversation loop - Consolidate three separate load_config() disk reads in __init__ (memory, skills, compression) into a single call, reusing the result dict for all three config sections model_tools.py: - Hoist _READ_SEARCH_TOOLS set to module level (was rebuilt inside handle_function_call on every tool invocation) * Use endpoint metadata for custom model context and pricing --------- Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-18 03:04:07 -07:00
Test	5b74df2bfc	fix: OAuth flag stale after refresh/fallback, memory nudge never fires, dead code - Update _is_anthropic_oauth in _try_refresh_anthropic_client_credentials() when token type changes during credential refresh - Set _is_anthropic_oauth in _try_activate_fallback() Anthropic path - Move _turns_since_memory and _iters_since_skill init to __init__ so nudge counters accumulate across run_conversation() calls in CLI mode - Remove unreachable retry_count >= max_retries block after raise Adds 7 regression tests. Salvaged from PR #1797 by @0xbyt4.	2026-03-18 02:19:57 -07:00
max	0c392e7a87	feat: integrate GitHub Copilot providers across Hermes Add first-class GitHub Copilot and Copilot ACP provider support across model selection, runtime provider resolution, CLI sessions, delegated subagents, cron jobs, and the Telegram gateway. This also normalizes Copilot model catalogs and API modes, introduces a Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by resolving Homebrew-installed gh binaries under launchd. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-17 23:40:22 -07:00
Teknium	d1d17f4f0a	feat(compression): add summary_base_url + move compression config to YAML-only - Add summary_base_url config option to compression block for custom OpenAI-compatible endpoints (e.g. zai, DeepSeek, Ollama) - Remove compression env var bridges from cli.py and gateway/run.py (CONTEXT_COMPRESSION_* env vars no longer set from config) - Switch run_agent.py to read compression config directly from config.yaml instead of env vars - Fix backwards-compat block in _resolve_task_provider_model to also fire when auxiliary.compression.provider is 'auto' (DEFAULT_CONFIG sets this, which was silently preventing the compression section's summary_* keys from being read) - Add test for summary_base_url config-to-client flow - Update docs to show compression as config.yaml-only Closes #1591 Based on PR #1702 by @uzaylisak	2026-03-17 04:46:15 -07:00
Teknium	85993fbb5a	feat: pre-call sanitization and post-call tool guardrails (#1732 ) Salvage of PR #1321 by @alireza78a (cherry-picked concept, reimplemented against current main). Phase 1 — Pre-call message sanitization: _sanitize_api_messages() now runs unconditionally before every LLM call. Previously gated on context_compressor being present, so sessions loaded from disk or running without compression could accumulate dangling tool_call/tool_result pairs causing API errors. Phase 2a — Delegate task cap: _cap_delegate_task_calls() truncates excess delegate_task calls per turn to MAX_CONCURRENT_CHILDREN. The existing cap in delegate_tool.py only limits the task array within a single call; this catches multiple separate delegate_task tool_calls in one turn. Phase 2b — Tool call deduplication: _deduplicate_tool_calls() drops duplicate (tool_name, arguments) pairs within a single turn when models stutter. All three are static methods on AIAgent, independently testable. 29 tests covering happy paths and edge cases.	2026-03-17 04:24:27 -07:00
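Phase 2b can be illustrated with a small sketch. The dict shape used for a tool call (`name` plus a JSON-string `arguments`) is an assumption for illustration, and this is not the project's `_deduplicate_tool_calls()` itself.

```python
from typing import Dict, List


def deduplicate_tool_calls(tool_calls: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Hypothetical sketch: drop repeated (tool_name, arguments) pairs, keep first-seen order."""
    seen = set()
    unique = []
    for call in tool_calls:
        # Assumed shape: {"name": ..., "arguments": "<JSON string>"}
        key = (call["name"], call["arguments"])
        if key in seen:
            continue
        seen.add(key)
        unique.append(call)
    return unique


stuttered = [
    {"name": "read_file", "arguments": '{"path": "a.py"}'},
    {"name": "read_file", "arguments": '{"path": "a.py"}'},
    {"name": "read_file", "arguments": '{"path": "b.py"}'},
]
deduped = deduplicate_tool_calls(stuttered)
```

Comparing the raw argument strings means two calls that differ only in JSON key order or whitespace are treated as distinct; whether the real implementation canonicalises arguments first is not stated in the commit.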
Teknium	9f81c11ba0	feat: eager fallback to backup model on rate-limit errors (#1730 ) When a fallback model is configured, switch to it immediately upon detecting rate-limit conditions (429, quota exhaustion, empty/malformed responses) instead of exhausting all retries with exponential backoff. Two eager-fallback checks: 1. Invalid/empty API responses — fallback attempted before retry loop 2. HTTP 429 / rate-limit keyword detection — fallback before backoff Both guarded by _fallback_activated for one-shot semantics. Cherry-picked from PR #1413 by usvimal. Co-authored-by: usvimal <usvimal@users.noreply.github.com>	2026-03-17 04:21:16 -07:00
Teknium	72bcec0ce5	Merge pull request #1723 from NousResearch/fix/compression-attempts-persist fix(core): compression_attempts resets each iteration — allows unlimited compressions	2026-03-17 04:13:54 -07:00
Teknium	d604b9622c	Merge pull request #1722 from NousResearch/fix/run-agent-role-violations fix(core): message role alternation violations in JSON recovery and error handler	2026-03-17 04:13:51 -07:00
teknium1	1264275cc3	fix(core): compression_attempts counter resets each loop iteration compression_attempts was initialized inside the outer while loop, resetting to 0 on every iteration. Since compression triggers a 'continue' back to the top of the loop, the counter never accumulated past 1 — effectively allowing unlimited compression attempts. Move initialization before the outer while loop so the cap of 3 applies across the entire run_conversation() call.	2026-03-17 04:11:32 -07:00
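The scoping bug described here is easy to reproduce in miniature. In the sketch below the counter is initialised once before the loop, so the cap applies across every `continue`. `ContextOverflow`, `call_api`, and `compress` are hypothetical stand-ins, not names from `run_agent.py`.

```python
class ContextOverflow(Exception):
    """Hypothetical stand-in for a context-length error from the API."""


def run_with_compression_cap(call_api, compress, max_compressions: int = 3):
    # Initialised once, BEFORE the loop. Placing this line inside the loop
    # would reset it on every `continue` and make the cap ineffective.
    compression_attempts = 0
    while True:
        try:
            return call_api()
        except ContextOverflow:
            if compression_attempts >= max_compressions:
                raise
            compression_attempts += 1
            compress()
            continue
```

If `call_api` keeps overflowing, the loop makes one initial call plus three post-compression retries and then re-raises, rather than compressing indefinitely.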
teknium1	cd6dc4ef7e	fix(core): message role violations in JSON recovery and error handler Two edge cases could inject messages that violate role alternation: 1. Invalid JSON recovery (line ~5985): After 3 retries of invalid JSON tool args, a user-role recovery message was injected. But the assistant's tool_calls were never appended, so the sequence could become user → user. Fix: append the assistant message with its tool_calls, then respond with proper tool-role error results. 2. System error handler (line ~6238): Always injected a user-role error message, which creates consecutive user messages if the last message was already user. Fix: dynamically choose the role based on the last message to maintain alternation.	2026-03-17 04:10:41 -07:00
teknium1	24282dceb1	fix(core): reset length_continue_retries after successful continuation length_continue_retries and truncated_response_prefix were initialized once before the outer loop and never reset after a successful continuation. If a conversation hit length truncation once (counter=1), succeeded on continuation, did more tool calls, then hit length again, the counter started at 1 instead of 0 — reducing available retries from 3 to 2. The stale truncated_response_prefix would also leak into the next response. Reset both after the prefix is consumed on a successful final response.	2026-03-17 04:05:20 -07:00
Teknium	56e0c90445	Merge pull request #1700 from NousResearch/fix/redacting-formatter-import fix(core): RedactingFormatter NameError when verbose_logging=True	2026-03-17 03:46:49 -07:00
Teknium	d417ba2a48	feat: add route-aware pricing estimates (#1695 ) Salvaged from PR #1563 by @kshitijk4poor. Cherry-picked with authorship preserved. - Route-aware pricing architecture replacing static MODEL_PRICING + heuristics - Canonical usage normalization (Anthropic/OpenAI/Codex API shapes) - Cache-aware billing (separate cache_read/cache_write rates) - Cost status tracking (estimated/included/unknown/actual) - OpenRouter live pricing via models API - Schema migration v4→v5 with billing metadata columns - Removed speculative forward-looking entries - Removed cost display from CLI status bar - Threaded OpenRouter metadata pre-warm Co-authored-by: kshitij <82637225+kshitijk4poor@users.noreply.github.com>	2026-03-17 03:44:44 -07:00
teknium1	c713d01e72	fix(core): move RedactingFormatter import before conditional block RedactingFormatter was imported inside 'if not has_errors_log_handler:' (line 461) but also used unconditionally in the verbose_logging block (line 479). When the error log handler already exists (e.g. second AIAgent in the same process) AND verbose_logging=True, the import was skipped and line 479 raised NameError. Fix: Move the import one level up so it's always available regardless of whether the error log handler already exists.	2026-03-17 03:43:21 -07:00
Teknium	1d5a39e002	fix: thread safety for concurrent subagent delegation (#1672 ) * fix: thread safety for concurrent subagent delegation Five thread-safety fixes that prevent crashes and data races when running multiple subagents concurrently via delegate_task: 1. Remove redirect_stdout/stderr from delegate_tool — mutating global sys.stdout races with the spinner thread when multiple children start concurrently, causing segfaults. Children already run with quiet_mode=True so the redirect was redundant. 2. Split _run_single_child into _build_child_agent (main thread) + _run_single_child (worker thread). AIAgent construction creates httpx/SSL clients which are not thread-safe to initialize concurrently. 3. Add threading.Lock to SessionDB — subagents share the parent's SessionDB and call create_session/append_message from worker threads with no synchronization. 4. Add _active_children_lock to AIAgent — interrupt() iterates _active_children while worker threads append/remove children. 5. Add _client_cache_lock to auxiliary_client — multiple subagent threads may resolve clients concurrently via call_llm(). Based on PR #1471 by peteromallet. * feat: Honcho base_url override via config.yaml + quick command alias type Two features salvaged from PR #1576: 1. Honcho base_url override: allows pointing Hermes at a remote self-hosted Honcho deployment via config.yaml: honcho: base_url: "http://192.168.x.x:8000" When set, this overrides the Honcho SDK's environment mapping (production/local), enabling LAN/VPN Honcho deployments without requiring the server to live on localhost. Uses config.yaml instead of env var (HONCHO_URL) per project convention. 2. Quick command alias type: adds a new 'alias' quick command type that rewrites to another slash command before normal dispatch: quick_commands: sc: type: alias target: /context Supports both CLI and gateway. Arguments are forwarded to the target command. Based on PR #1576 by redhelix. 
--------- Co-authored-by: peteromallet <peteromallet@users.noreply.github.com> Co-authored-by: redhelix <redhelix@users.noreply.github.com>	2026-03-17 02:53:33 -07:00
Teknium	a3ac142c83	fix(core): guard print() calls in run_conversation() against OSError In headless environments (systemd, Docker, nohup) stdout can become unavailable mid-session. Raw print() raises OSError which crashes cron jobs — agent finishes work but delivery never happens because the error handler's own print() also raises OSError. Fix: - Add _safe_print() static method that wraps print() with try/except OSError — silently drops output when stdout is broken - Make _vprint() use _safe_print() — protects all calls through the verbose print path - Convert raw print() calls in run_conversation() hot path to use _safe_print(): starting conversation, interrupt, budget exhausted, preflight compression, context cache, conversation completed - Error handler print (the cascading crash point) gets explicit try/except with logger.error() fallback so diagnostics aren't lost Fixes #845 Closes #1358 (superseded — PR was 323 commits stale with a bug)	2026-03-17 02:41:01 -07:00
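A minimal sketch of the guarded print described in this commit. The boolean return value is an addition for illustration; the commit only says output is silently dropped when stdout is broken.

```python
def safe_print(*args, **kwargs) -> bool:
    """Hypothetical version of _safe_print(): never lets a broken stdout raise.

    Returns True if the text was written, False if it was dropped.
    """
    try:
        print(*args, **kwargs)
        return True
    except OSError:
        # stdout is unavailable (e.g. closed pipe under systemd or nohup):
        # drop the output instead of crashing the caller.
        return False


class BrokenStream:
    """Stand-in for a stdout that has gone away mid-session."""

    def write(self, text: str) -> int:
        raise OSError("stream is closed")

    def flush(self) -> None:
        pass


dropped = safe_print("conversation completed", file=BrokenStream())
```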
Teknium	f2414bfd45	feat: allow custom endpoints to use responses API via api_mode override (#1651) Add HERMES_API_MODE env var and model.api_mode config field to let custom OpenAI-compatible endpoints opt into codex_responses mode without requiring the OpenAI Codex OAuth provider path. - _get_configured_api_mode() reads HERMES_API_MODE env (precedence) then model.api_mode from config.yaml; validates against whitelist - Applied in both _resolve_openrouter_runtime() and _resolve_named_custom_runtime() (original PR only covered openrouter) - Fix _dump_api_request_debug() to show /responses URL when in codex_responses mode instead of always showing /chat/completions - Tests for config override, env override, invalid values, named custom providers, and debug dump URL for both API modes Inspired by PR #1041 by @mxyhi. Co-authored-by: mxyhi <mxyhi@users.noreply.github.com>	2026-03-17 02:04:36 -07:00
Teknium	96dac22194	fix: prevent infinite 400 loop on context overflow + block prompt injection via cache files (#1630, #1558) * fix: prevent infinite 400 failure loop on context overflow (#1630) When a gateway session exceeds the model's context window, Anthropic may return a generic 400 invalid_request_error with just 'Error' as the message. This bypassed the phrase-based context-length detection, causing the agent to treat it as a non-retryable client error. Worse, the failed user message was still persisted to the transcript, making the session even larger on each attempt — creating an infinite loop. Three-layer fix: 1. run_agent.py — Fallback heuristic: when a 400 error has a very short generic message AND the session is large (>40% of context or >80 messages), treat it as a probable context overflow and trigger compression instead of aborting. 2. run_agent.py + gateway/run.py — Don't persist failed messages: when the agent returns failed=True before generating any response, skip writing the user's message to the transcript/DB. This prevents the session from growing on each failure. 3. gateway/run.py — Smarter error messages: detect context-overflow failures and suggest /compact or /reset specifically, instead of a generic 'try again' that will fail identically. * fix(skills): detect prompt injection patterns and block cache file reads Adds two security layers to prevent prompt injection via skills hub cache files (#1558): 1. read_file: blocks direct reads of ~/.hermes/skills/.hub/ directory (index-cache, catalog files). The 3.5MB clawhub_catalog_v1.json was the original injection vector — untrusted skill descriptions in the catalog contained adversarial text that the model executed. 2. skill_view: warns when skills are loaded from outside the trusted ~/.hermes/skills/ directory, and detects common injection patterns in skill content ("ignore previous instructions", "<system>", etc.). Cherry-picked from PR #1562 by ygd58. --------- Co-authored-by: buray <ygd58@users.noreply.github.com>	2026-03-17 01:50:59 -07:00
Teknium	5ada0b95e9	Merge pull request #1609 from 0xbyt4/fix/context-counter-cache-tokens fix: context counter shows cached token count in status bar	2026-03-17 01:45:12 -07:00
Teknium	eaa9ceeb43	Merge pull request #1621 from Death-Incarnate/main fix: isolate test_anthropic_adapter from local credentials	2026-03-17 01:40:39 -07:00
Teknium	3576f44a57	feat: add Vercel AI Gateway provider (#1628) * feat: add Vercel AI Gateway as a first-class provider Adds AI Gateway (ai-gateway.vercel.sh) as a new inference provider with AI_GATEWAY_API_KEY authentication, live model discovery, and reasoning support via extra_body.reasoning. Based on PR #1492 by jerilynzheng. * feat: add AI Gateway to setup wizard, doctor, and fallback providers * test: add AI Gateway to api_key_providers test suite * feat: add AI Gateway to hermes model CLI and model metadata Wire AI Gateway into the interactive model selection menu and add context lengths for AI Gateway model IDs in model_metadata.py. * feat: use claude-haiku-4.5 as AI Gateway auxiliary model * revert: use gemini-3-flash as AI Gateway auxiliary model * fix: move AI Gateway below established providers in selection order --------- Co-authored-by: jerilynzheng <jerilynzheng@users.noreply.github.com> Co-authored-by: jerilynzheng <zheng.jerilyn@gmail.com>	2026-03-17 00:12:16 -07:00
DeadMan	285300528b	fix: isolate test_anthropic_adapter from local credentials Two tests lacked filesystem isolation causing them to pick up real ~/.claude/.credentials.json tokens on machines with Claude Code installed. - test_prefers_oauth_token_over_api_key: add tmp_path, mock Path.home, clear CLAUDE_CODE_OAUTH_TOKEN env - test_falls_back_to_token: same isolation Also commit run_agent.py generic-400 retry fix.	2026-03-16 22:53:32 -07:00
0xbyt4	8d0a96a8bf	fix: context counter shows cached token count in status bar Anthropic prompt caching splits input into cache_read_input_tokens, cache_creation_input_tokens, and non-cached input_tokens. The context counter only read input_tokens (non-cached portion), showing ~3 tokens instead of the real ~18K total. Now includes cached portions for Anthropic native provider only — other providers (OpenAI, OpenRouter, Codex) already include cached tokens in their prompt_tokens field. Before: 3/200K | 0% After: 17.7K/200K | 9%	2026-03-17 05:06:11 +03:00
Teknium	2158c44efd	fix: Anthropic OAuth compatibility — Claude Code identity fingerprinting (#1597) Anthropic routes OAuth/subscription requests based on Claude Code's identity markers. Without them, requests get intermittent 500 errors (~25% failure rate observed). This matches what pi-ai (clawdbot) and OpenCode both implement for OAuth compatibility. Changes (OAuth tokens only — API key users unaffected): 1. Headers: user-agent 'claude-cli/2.1.2 (external, cli)' + x-app 'cli' 2. System prompt: prepend 'You are Claude Code, Anthropic's official CLI' 3. System prompt sanitization: replace Hermes/Nous references 4. Tool names: prefix with 'mcp_' (Claude Code convention for non-native tools) 5. Tool name stripping: remove 'mcp_' prefix from response tool calls Before: 9/12 OK, 1 hard fail, 4 needed retries (~25% error rate) After: 16/16 OK, 0 failures, 0 retries (0% error rate)	2026-03-16 17:08:22 -07:00
Teknium	e6cf1c94a8	Merge pull request #1585 from 0xbyt4/fix/anthropic-error-handling fix(anthropic): retry 429/529 errors and surface error details to users	2026-03-16 15:46:06 -07:00
0xbyt4	d998cac319	fix(anthropic): retry 429/529 errors and surface error details to users - 429 rate limit and 529 overloaded were incorrectly treated as non-retryable client errors, causing immediate failure instead of exponential backoff retry. Users hitting Anthropic rate limits got silent failures or no response at all. - Generic "Sorry, I encountered an unexpected error" now includes error type, details, and status-specific hints (auth, rate limit, overloaded). - Failed agent with final_response=None now surfaces the actual error message instead of returning an empty response.	2026-03-17 01:07:11 +03:00
teknium1	f4d61c168b	merge: resolve conflicts with main (show_cost, turn routing, docker docs)	2026-03-16 14:22:38 -07:00
Teknium	1ecfe68675	feat: improve memory prioritization + aggressive skill updates (inspired by OpenAI Codex) * feat: improve memory prioritization — user preferences over procedural knowledge Inspired by OpenAI Codex's memory prompt improvements (openai/codex#14493) which focus memory writes on user preferences and recurring patterns rather than procedural task details. Key insight: 'Optimize for reducing future user steering — the most valuable memory prevents the user from having to repeat themselves.' Changes: - MEMORY_GUIDANCE (prompt_builder.py): added prioritization hierarchy and the core principle about reducing user steering - MEMORY_SCHEMA (memory_tool.py): reordered WHEN TO SAVE list to put corrections first, added explicit PRIORITY guidance - Memory nudge (run_agent.py): now asks specifically about preferences, corrections, and workflow patterns instead of generic 'anything' - Memory flush (run_agent.py): now instructs to prioritize user preferences and corrections over task-specific details * feat: more aggressive skill creation and update prompting Press harder on skill updates — the agent should proactively patch skills when it encounters issues during use, not wait to be asked. Changes: - SKILLS_GUIDANCE: 'consider saving' → 'save'; added explicit instruction to patch skills immediately when found outdated/wrong - Skills header: added instruction to update loaded skills before finishing if they had missing steps or wrong commands - Skill nudge: more assertive ('save the approach' not 'consider saving'), now also prompts for updating existing skills used in the task - Skill nudge interval: lowered default from 15 to 10 iterations - skill_manage schema: added 'patch it immediately' to update triggers	2026-03-16 06:52:32 -07:00
teknium1	8e07f9ca56	fix: audit fixes — 5 bugs found and resolved Thorough code review found 5 issues across run_agent.py, cli.py, and gateway/: 1. CRITICAL — Gateway stream consumer task never started: stream_consumer_holder was checked BEFORE run_sync populated it. Fixed with async polling pattern (same as track_agent). 2. MEDIUM-HIGH — Streaming fallback after partial delivery caused double-response: if streaming failed after some tokens were delivered, the fallback would re-deliver the full response. Now tracks deltas_were_sent and only falls back when no tokens reached consumers yet. 3. MEDIUM — Codex mode lost on_first_delta spinner callback: _run_codex_stream now accepts on_first_delta parameter, fires it on first text delta. Passed through from _interruptible_streaming_api_call via _codex_on_first_delta instance attribute. 4. MEDIUM — CLI close-tag after-text bypassed tag filtering: text after a reasoning close tag was sent directly to _emit_stream_text, skipping open-tag detection. Now routes through _stream_delta for full filtering. 5. LOW — Removed 140 lines of dead code: old _streaming_api_call method (superseded by _interruptible_streaming_api_call). Updated 13 tests in test_run_agent.py and test_openai_client_lifecycle.py to use the new method name and signature. 4573 tests passing.	2026-03-16 06:35:46 -07:00
teknium1	99369b926c	fix: always fall back to non-streaming on ANY streaming error Previously the fallback only triggered on specific error keywords like 'streaming is not supported'. Many third-party providers have partial or broken streaming — rejecting stream=True, crashing on stream_options, dropping connections mid-stream, returning malformed chunks, etc. Now: any exception during the streaming API call triggers an automatic fallback to the standard non-streaming request path. The error is logged at INFO level for diagnostics but never surfaces to the user. If the fallback also fails, THAT error propagates normally. This ensures streaming is additive — it improves UX when it works but never breaks providers that don't support it. Tests: 2 new (any-error fallback, double-failure propagation), 15 total.	2026-03-16 06:15:09 -07:00
teknium1	ac739e485f	fix(cli): reasoning tag suppression during streaming + fix fallback detection Fixes two issues found during live testing: 1. Reasoning tag suppression: close tags like </REASONING_SCRATCHPAD> that arrive split across stream tokens (e.g. '</REASONING_SCRATCH' + 'PAD>\n\nHello') were being lost because the buffer was discarded. Fix: keep a sliding window of the tail (max close tag length) so partial tags survive across tokens. 2. Streaming fallback detection was too broad — 'stream' matched any error containing that word (including 'stream_options' rejections). Narrowed to specific phrases: 'streaming is not', 'streaming not support', 'does not support stream', 'not available'. Verified with real API calls: streaming works end-to-end with reasoning block suppression, response box framing, and proper fallback to Rich Panel when streaming isn't active.	2026-03-16 05:28:10 -07:00
teknium1	c1ac32737d	feat: unified streaming infrastructure — core delta callbacks for all providers Stage 1 of streaming support. Adds: - stream_delta_callback parameter on AIAgent.__init__ for real-time token delivery - _interruptible_streaming_api_call() handling chat_completions + anthropic_messages - Enhanced _run_codex_stream() to fire delta callbacks during Codex streaming - _fire_stream_delta() fires both display and TTS callbacks - _fire_reasoning_delta() for reasoning content streaming - Tool-call suppression: callbacks only fire on text-only responses - on_first_delta callback for spinner control on first token - Provider fallback: graceful degradation to non-streaming - _has_stream_consumers() unifies stream_delta_callback and _stream_callback checks - Anthropic streaming returns native Message for downstream compatibility Drawing from PRs #922 (unified streaming), #1312 (gateway consumer), #774 (Telegram streaming), #798 (CLI streaming), #1214 (reasoning modes). Credit: jobless0x, OutThisLife, clicksingh, raulvidis.	2026-03-16 05:05:45 -07:00
Teknium	9e845a6e53	feat: major /rollback improvements — enabled by default, diff preview, file-level restore, conversation undo, terminal checkpoints Checkpoint & rollback upgrades: 1. Enabled by default — checkpoints are now on for all new sessions. Zero cost when no file-mutating tools fire. Disable with checkpoints.enabled: false in config.yaml. 2. Diff preview — /rollback diff <N> shows a git diff between the checkpoint and current working tree before committing to a restore. 3. File-level restore — /rollback <N> <file> restores a single file from a checkpoint instead of the entire directory. 4. Conversation undo on rollback — when restoring files, the last chat turn is automatically undone so the agent's context matches the restored filesystem state. 5. Terminal command checkpoints — destructive terminal commands (rm, mv, sed -i, truncate, git reset/clean, output redirects) now trigger automatic checkpoints before execution. Previously only write_file and patch were covered. 6. Change summary in listing — /rollback now shows file count and +insertions/-deletions for each checkpoint. 7. Fixed dead code — removed duplicate _run_git call in list_checkpoints with nonsensical --all if False condition. 8. Updated help text — /rollback with no args now shows available subcommands (diff, file-level restore).	2026-03-16 04:43:37 -07:00
Teknium	dd7921d514	fix(honcho): isolate session routing for multi-user gateway (#1500) Salvaged from PR #1470 by adavyas. Core fix: Honcho tool calls in a multi-session gateway could route to the wrong session because honcho_tools.py relied on process-global state. Now threads session context through the call chain: AIAgent._invoke_tool() → handle_function_call() → registry.dispatch() → handler **kw → _resolve_session_context() Changes: - Add _resolve_session_context() to prefer per-call context over globals - Plumb honcho_manager + honcho_session_key through handle_function_call - Add sync_honcho=False to run_conversation() for synthetic flush turns - Pass honcho_session_key through gateway memory flush lifecycle - Harden gateway PID detection when /proc cmdline is unreadable - Make interrupt test scripts import-safe for pytest-xdist - Wrap BibTeX examples in Jekyll raw blocks for docs build - Fix thread-order-dependent assertion in client lifecycle test - Expand Honcho docs: session isolation, lifecycle, routing internals Dropped from original PR: - Indentation change in _create_request_openai_client that would move client creation inside the lock (causes unnecessary contention) Co-authored-by: adavyas <adavyas@users.noreply.github.com>	2026-03-16 00:23:47 -07:00
Teknium	eb4f0348e1	fix: persist CLI token counts to session DB for /insights Token usage was tracked in-memory during CLI sessions (session_prompt_tokens, session_completion_tokens) but never written to the SQLite session DB. The gateway persisted tokens via session_store.update_session(), but CLI sessions always showed 0 tokens in /insights. Now run_agent.py persists token deltas to the DB after each API call for CLI sessions. Gateway sessions continue to use their existing persist path to avoid double-counting.	2026-03-16 00:23:13 -07:00
Teknium	3f0f4a04a9	fix(agent): skip reasoning extra_body for unsupported OpenRouter models (#1485) * fix(agent): skip reasoning extra_body for models that don't support it Sending reasoning config to models like MiniMax or Nvidia via OpenRouter causes a 400 BadRequestError. Previously, reasoning extra_body was sent to all OpenRouter and Nous models unconditionally. Fix: only send reasoning extra_body when the model slug starts with a known reasoning-capable prefix (deepseek/, anthropic/, openai/, x-ai/, google/gemini-2, qwen/qwen3) or when using Nous Portal directly. Applies to both the main API call path (_build_api_kwargs) and the conversation summary path. Fixes #1083 * test(agent): cover reasoning extra_body gating --------- Co-authored-by: ygd58 <buraysandro9@gmail.com>	2026-03-15 20:42:07 -07:00
Teknium	c564e1c3dc	feat(tools): centralize tool emoji metadata in registry + skin integration (#1484) feat(tools): centralize tool emoji metadata in registry + skin integration	2026-03-15 20:35:24 -07:00
teknium1	210d5ade1e	feat(tools): centralize tool emoji metadata in registry + skin integration - Add 'emoji' field to ToolEntry and 'get_emoji()' to ToolRegistry - Add emoji= to all 50+ registry.register() calls across tool files - Add get_tool_emoji() helper in agent/display.py with 3-tier resolution: skin override → registry default → hardcoded fallback - Replace hardcoded emoji maps in run_agent.py, delegate_tool.py, and gateway/run.py with centralized get_tool_emoji() calls - Add 'tool_emojis' field to SkinConfig so skins can override per-tool emojis (e.g. ares skin could use swords instead of wrenches) - Add 11 tests (5 registry emoji, 6 display/skin integration) - Update AGENTS.md skin docs table Based on the approach from PR #1061 by ForgingAlex (emoji centralization in registry). This salvage fixes several issues from the original: - Does NOT split the cronjob tool (which would crash on missing schemas) - Does NOT change image_generate toolset/requires_env/is_async - Does NOT delete existing tests - Completes the centralization (gateway/run.py was missed) - Hooks into the skin system for full customizability	2026-03-15 20:21:21 -07:00
Teknium	103f7b1ebc	fix: verbose mode shows full untruncated output * fix(cli): silence tirith prefetch install warnings at startup * fix: verbose mode now shows full untruncated tool args, results, content, and think blocks When tool progress is set to 'verbose' (via /verbose or config), the display was still truncating tool arguments to 100 chars, tool results to 100-200 chars, assistant content to 100 chars, and think blocks to 5 lines. This defeated the purpose of verbose mode. Changes: - Tool args: show full JSON args (not truncated to log_prefix_chars) - Tool results: show full result content in both display and debug logs - Assistant content: show full content during tool-call loops - Think blocks: show full reasoning text (not truncated to 5 lines/100 chars) - Auto-enable reasoning display when verbose mode is active - Fix initial agent creation to respect verbose config (was always quiet_mode=True) - Updated verbose label to mention think blocks	2026-03-15 20:03:37 -07:00
teknium1	93a0c0cddd	fix: handle dict tool call arguments from local backends Normalize tool call arguments when OpenAI-compatible backends return parsed dict/list payloads instead of JSON strings. This prevents the .strip() crash during tool-call validation for llama.cpp and similar servers, while preserving existing empty-string and invalid-JSON handling. Adds a focused regression test for dict arguments in the agent loop.	2026-03-15 08:00:19 -07:00
teknium1	f24c00a5bf	fix(config): reload .env over stale shell overrides Hermes startup entrypoints now load ~/.hermes/.env and project fallback env files with user config taking precedence over stale shell-exported values. This makes model/provider/base URL changes in .env actually take effect after restarting Hermes. Adds a shared env loader plus regression coverage, and reproduces the original bug case where OPENAI_BASE_URL and HERMES_INFERENCE_PROVIDER remained stuck on old shell values before import.	2026-03-15 06:46:28 -07:00
teknium1	62abb453d3	Merge origin/main into hermes/hermes-daa73839	2026-03-14 23:44:47 -07:00
teknium1	735a6e7651	fix: convert anthropic image content blocks	2026-03-14 23:41:20 -07:00
0xbyt4	6f85283553	fix: use json.dumps instead of str() for Codex Responses API arguments When the Responses API returns tool call arguments as a dict, str(dict) produces Python repr with single quotes (e.g. {'key': 'val'}) which is invalid JSON. Downstream json.loads() fails silently and the tool gets called with empty arguments, losing all parameters. Affects both function_call and custom_tool_call item types in _normalize_codex_response().	2026-03-14 22:03:53 -07:00
yemi-lagosinternationalmarket	00c5e77724	fix: prevent closed OpenAI client reuse across retries Use per-request OpenAI clients inside _interruptible_api_call so interrupts and transport failures do not poison later retries. Also add closed-client detection/recreation for the shared client and regression tests covering retry and concurrency behavior.	2026-03-14 21:56:00 -07:00
Teknium	fc5443d854	Merge pull request #1360 from NousResearch/hermes/hermes-aa701810 fix: refresh Anthropic OAuth before stale env tokens	2026-03-14 19:53:40 -07:00
teknium1	70ea13eb40	fix: preflight Anthropic auth and prefer Claude store	2026-03-14 19:38:55 -07:00
teknium1	e052c74727	fix: refresh Anthropic OAuth before stale env tokens	2026-03-14 19:22:31 -07:00
teknium1	df5c61b37c	feat: compress cron management into one tool	2026-03-14 12:21:50 -07:00
teknium1	7b10881b9e	fix: persist clean voice transcripts and /voice off state - keep CLI voice prefixes API-local while storing the original user text - persist explicit gateway off state and restore adapter auto-TTS suppression on restart - add regression coverage for both behaviors	2026-03-14 06:14:22 -07:00
0xbyt4	eb34c0b09a	fix: voice pipeline hardening — 7 bug fixes with tests 1. Anthropic + ElevenLabs TTS silence: forward full response to TTS callback for non-streaming providers (choices first, then native content blocks fallback). 2. Subprocess timeout kill: play_audio_file now kills the process on TimeoutExpired instead of leaving zombie processes. 3. Discord disconnect cleanup: leave all voice channels before closing the client to prevent leaked state. 4. Audio stream leak: close InputStream if stream.start() fails. 5. Race condition: read/write _on_silence_stop under lock in audio callback thread. 6. _vprint force=True: show API error, retry, and truncation messages even during streaming TTS. 7. _refresh_level lock: read _voice_recording under _voice_lock.	2026-03-14 14:27:21 +03:00
0xbyt4	cc0a453476	fix: address PR review round 5 — streaming guard, VC auth, history prefix, auto-TTS control 1. Gate _streaming_api_call to chat_completions mode only — Anthropic and Codex fall back to _interruptible_api_call. Preserve Anthropic base_url across all client rebuild paths (interrupt, fallback, 401 refresh). 2. Discord VC synthetic events now use chat_type="channel" instead of defaulting to "dm" — prevents session bleed into DM context. Authorization runs before echoing transcript. Sanitize @everyone/@here in voice transcripts. 3. CLI voice prefix ("[Voice input...]") is now API-call-local only — stripped from returned history so it never persists to session DB or resumed sessions. 4. /voice off now disables base adapter auto-TTS via _auto_tts_disabled_chats set — voice input no longer triggers TTS when voice mode is off.	2026-03-14 14:27:21 +03:00
0xbyt4	0ff1b4ade2	fix: harden web gateway security and fix error swallowing - Use hmac.compare_digest for timing-safe token comparison (3 endpoints) - Default bind to 127.0.0.1 instead of 0.0.0.0 - Sanitize upload filenames with Path.name to prevent path traversal - Add DOMPurify to sanitize marked.parse() output against XSS - Replace add_static with authenticated media handler - Hide token in group chats for /remote-control command - Use ctypes.util.find_library for Opus instead of hardcoded paths - Add force=True to 5 interrupt _vprint calls for visibility - Log Opus decode errors and voice restart failures instead of swallowing	2026-03-14 14:27:21 +03:00
0xbyt4	d646442692	fix: restore Anthropic interrupt handler in _interruptible_api_call Rebase auto-merge silently overwrote main's Anthropic-aware interrupt handler with the older OpenAI-only version. Without this fix, interrupting an Anthropic API call closes the wrong client and leaves token generation running on the Anthropic side.	2026-03-14 14:27:21 +03:00
0xbyt4	a78249230c	fix: address voice mode PR review (streaming TTS, prompt cache, _vprint) Bug A: Replace stale _HAS_ELEVENLABS/_HAS_AUDIO boolean imports with lazy import function calls (_import_elevenlabs, _import_sounddevice). The old constants no longer exist in tts_tool -- the try/except silently swallowed the ImportError, leaving streaming TTS dead. Bug B: Use user message prefix instead of modifying system prompt for voice mode instruction. Changing ephemeral_system_prompt mid-session invalidates the prompt cache. Now the concise-response hint is prepended to the user_message passed to run_conversation while conversation_history keeps the original text. Minor: Add force parameter to _vprint so critical error messages (max retries, non-retryable errors, API failures) are always shown even during streaming TTS playback. Tests: 15 new tests in test_voice_cli_integration.py covering all three fixes -- lazy import activation, message prefix behavior, history cleanliness, system prompt stability, and AST verification that all critical _vprint calls use force=True.	2026-03-14 14:27:20 +03:00
0xbyt4	b859dfab16	fix: address voice mode review feedback 1. Fully lazy imports: sounddevice, numpy, elevenlabs, edge_tts, and openai are never imported at module level. Each is imported only when the feature is explicitly activated, preventing crashes in headless environments (SSH, Docker, WSL, no PortAudio). 2. No core agent loop changes: streaming TTS path extracted from _interruptible_api_call() into separate _streaming_api_call() method. The original method is restored to its upstream form. 3. Configurable key binding: push-to-talk key changed from Ctrl+R (conflicts with readline reverse-search) to Ctrl+B by default. Configurable via voice.push_to_talk_key in config.yaml. 4. Environment detection: new detect_audio_environment() function checks for SSH, Docker, WSL, and missing audio devices before enabling voice mode. Auto-disables with clear warnings in incompatible environments. 5. Graceful degradation: every audio touchpoint (sd.play, sd.InputStream, sd.OutputStream) wrapped in try/except with ImportError/OSError handling. Failures produce warnings, not crashes.	2026-03-14 14:27:20 +03:00
0xbyt4	46db7aeffd	fix: streaming tool call parsing, error handling, and fake HA state mutation - Fix Gemini streaming tool call merge bug: multiple tool calls with same index but different IDs are now parsed as separate calls instead of concatenating names (e.g. ha_call_serviceha_call_service) - Handle partial results in voice mode: show error and stop continuous mode when agent returns partial/failed results with empty response - Fix error display during streaming TTS: error messages are shown in full response box even when streaming box was already opened - Add duplicate sentence filter in TTS: skip near-duplicate sentences from LLM repetition - Fix fake HA server state mutation: turn_on/turn_off/set_temperature correctly update entity states; temperature sensor simulates change when thermostat is adjusted	2026-03-14 14:27:20 +03:00
0xbyt4	b00c5949fc	fix: suppress verbose logs during streaming TTS, improve hallucination filter, stop continuous mode on errors - Add _vprint() helper to suppress log output when stream_callback is active - Expand Whisper hallucination filter with multi-language phrases and regex pattern for repetitive text - Stop continuous voice mode when agent returns a failed result (e.g. 429 rate limit)	2026-03-14 14:26:55 +03:00
0xbyt4	179d9e1a22	feat: add streaming sentence-by-sentence TTS via ElevenLabs Stream audio to speaker as the agent generates tokens instead of waiting for the full response. First sentence plays within ~1-2s of agent starting to respond. - run_agent: add stream_callback to run_conversation/chat, streaming path in _interruptible_api_call accumulates chunks into mock ChatCompletion while forwarding content deltas to callback - tts_tool: add stream_tts_to_speaker() with sentence buffering, think block filtering, markdown stripping, ElevenLabs pcm_24000 streaming to sounddevice OutputStream - cli: wire up streaming TTS pipeline in chat(), detect elevenlabs provider + sounddevice availability, skip batch TTS when streaming is active, signal stop on interrupt Falls back to batch TTS for Edge/OpenAI providers or when elevenlabs/sounddevice are not available. Zero impact on non-voice mode (callback defaults to None).	2026-03-14 14:26:30 +03:00
teknium1	cbbba87099	fix: reuse shared atomic session log helper	2026-03-14 02:56:13 -07:00
alireza78a	f685741481	fix(agent): use atomic write in _save_session_log to prevent data loss	2026-03-14 02:53:01 -07:00
Teknium	1117a21065	Merge pull request #1271 from NousResearch/hermes/hermes-de3d4e49 fix: guard init-time stdio writes	2026-03-14 02:21:39 -07:00
teknium1	936040d8f7	fix: guard init-time stdio writes	2026-03-14 02:19:46 -07:00
Teknium	29176f302e	fix: sanitize chat payloads and provider precedence (#1253) fix: sanitize chat payloads and provider precedence	2026-03-14 00:09:14 -07:00
Adavya Sharma	a628c607f0	fix: preserve chat kwargs identity when no sanitization is needed	2026-03-13 23:59:12 -07:00
Adavya Sharma	358dab52ce	fix: sanitize chat payloads and provider precedence	2026-03-13 23:59:12 -07:00
Eris	c2a7921f3b	fix: prevent logging handler accumulation in gateway mode Use exact Path comparison instead of endswith to detect existing errors.log handlers, avoiding false positives from similarly-named log files.	2026-03-13 23:56:22 -07:00
0xIbra	437ec17125	fix(cli): respect HERMES_HOME in all remaining hardcoded ~/.hermes paths Several files resolved paths via Path.home() / ".hermes" or os.path.expanduser("~/.hermes/..."), bypassing the HERMES_HOME environment variable. This broke isolation when running multiple Hermes instances with distinct HERMES_HOME directories. Replace all hardcoded paths with calls to get_hermes_home() from hermes_cli.config, consistent with the rest of the codebase. Files fixed: - tools/process_registry.py (processes.json) - gateway/pairing.py (pairing/) - gateway/sticker_cache.py (sticker_cache.json) - gateway/channel_directory.py (channel_directory.json, sessions.json) - gateway/config.py (gateway.json, config.yaml, sessions_dir) - gateway/mirror.py (sessions/) - gateway/hooks.py (hooks/) - gateway/platforms/base.py (image_cache/, audio_cache/, document_cache/) - gateway/platforms/whatsapp.py (whatsapp/session) - gateway/delivery.py (cron/output) - agent/auxiliary_client.py (auth.json) - agent/prompt_builder.py (SOUL.md) - cli.py (config.yaml, images/, pastes/, history) - run_agent.py (logs/) - tools/environments/base.py (sandboxes/) - tools/environments/modal.py (modal_snapshots.json) - tools/environments/singularity.py (singularity_snapshots.json) - tools/tts_tool.py (audio_cache) - hermes_cli/status.py (cron/jobs.json, sessions.json) - hermes_cli/gateway.py (logs/, whatsapp session) - hermes_cli/main.py (whatsapp/session) Tests updated to use HERMES_HOME env var instead of patching Path.home(). Closes #892 (cherry picked from commit 78ac1bba43b8b74a934c6172f2c29bb4d03164b9)	2026-03-13 21:32:53 -07:00
Teknium	938e887b4c	fix: keep honcho recall out of cached system prefix (#1201) Attach later-turn Honcho recall to the current-turn user message at API call time instead of appending it to the system prompt. This preserves the stable system-prefix cache while keeping Honcho continuity context available for the turn. Also adds regression coverage for the injection helper and for continuing sessions so Honcho recall stays out of the system prompt.	2026-03-13 21:07:00 -07:00
Teknium	b74facd119	fix: handle YAML null values in session reset policy + configurable API timeout (#1194) * fix: Home Assistant event filtering now closed by default Previously, when no watch_domains or watch_entities were configured, ALL state_changed events passed through to the agent, causing users to be flooded with notifications for every HA entity change. Now events are dropped by default unless the user explicitly configures: - watch_domains: list of domains to monitor (e.g. climate, light) - watch_entities: list of specific entity IDs to monitor - watch_all: true (new option — opt-in to receive all events) A warning is logged at connect time if no filters are configured, guiding users to set up their HA platform config. All 49 gateway HA tests + 52 HA tool tests pass. * docs: update Home Assistant integration documentation - homeassistant.md: Fix event filtering docs to reflect closed-by-default behavior. Add watch_all option. Replace Python dict config example with YAML. Fix defaults table (was incorrectly showing 'all'). Add required configuration warning admonition. - environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section. - messaging/index.md: Add Home Assistant to description, architecture diagram, platform toolsets table, and Next Steps links. * fix(terminal): strip provider env vars from background and PTY subprocesses Extends the env var blocklist from #1157 to also cover the two remaining leaky paths in process_registry.py: - spawn_local() PTY path (line 156) - spawn_local() background Popen path (line 197) Both were still using raw os.environ, leaking provider vars to background processes and interactive PTY sessions. Now uses the same dynamic _HERMES_PROVIDER_ENV_BLOCKLIST from local.py. Explicit env_vars passed to spawn_local() still override the blocklist, matching the existing behavior for callers that intentionally need these. Gap identified by PR #1004 (@PeterFile).
* feat(delegate): add observability metadata to subagent results Enrich delegate_task results with metadata from the child AIAgent: - model: which model the child used - exit_reason: completed | interrupted | max_iterations - tokens.input / tokens.output: token counts - tool_trace: per-tool-call trace with byte sizes and ok/error status Tool trace uses tool_call_id matching to correctly pair parallel tool calls with their results, with a fallback for messages without IDs. Cherry-picked from PR #872 by @omerkaz, with fixes: - Fixed parallel tool call trace pairing (was always updating last entry) - Removed redundant 'iterations' field (identical to existing 'api_calls') - Added test for parallel tool call trace correctness Co-authored-by: omerkaz <omerkaz@users.noreply.github.com> * feat(stt): add free local whisper transcription via faster-whisper Replace OpenAI-only STT with a dual-provider system mirroring the TTS architecture (Edge TTS free / ElevenLabs paid): STT: faster-whisper local (free, default) / OpenAI Whisper API (paid) Changes: - tools/transcription_tools.py: Full rewrite with provider dispatch, config loading, local faster-whisper backend, and OpenAI API backend. Auto-downloads model (~150MB for 'base') on first voice message. Singleton model instance reused across calls. - pyproject.toml: Add faster-whisper>=1.0.0 as core dependency - hermes_cli/config.py: Expand stt config to match TTS pattern with provider selection and per-provider model settings - agent/context_compressor.py: Fix .strip() crash when LLM returns non-string content (dict from llama.cpp, None). Fixes #1100 partially.
- tests/: 23 new tests for STT providers + 2 for compressor fix - docs/: Updated Voice & TTS page with STT provider table, model sizes, config examples, and fallback behavior Fallback behavior: - Local not installed → OpenAI API (if key set) - OpenAI key not set → local whisper (if installed) - Neither → graceful error message to user Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com> * fix: handle YAML null values in session reset policy + configurable API timeout Two fixes from PR #888 by @Jah-yee: 1. SessionResetPolicy.from_dict() — data.get('at_hour', 4) returns None when the YAML key exists with a null value. Now explicitly checks for None and falls back to defaults. Zero remains a valid value. 2. API timeout — hardcoded 900s is now configurable via HERMES_API_TIMEOUT env var. Useful for slow local models (llama.cpp) that need longer. Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com> --------- Co-authored-by: omerkaz <omerkaz@users.noreply.github.com> Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>	2026-03-13 11:16:42 -07:00
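The YAML-null pitfall described in this commit can be sketched as follows. This is a simplified stand-in, not the project's class: the real `SessionResetPolicy` presumably carries more fields, and only `at_hour` and its default of 4 are taken from the commit message.

```python
from dataclasses import dataclass


@dataclass
class SessionResetPolicy:
    at_hour: int = 4

    @classmethod
    def from_dict(cls, data: dict) -> "SessionResetPolicy":
        # data.get("at_hour", 4) would return None when the YAML key exists
        # with a null value, so check for None explicitly.
        at_hour = data.get("at_hour")
        if at_hour is None:
            at_hour = 4
        # An "or 4" shortcut would be wrong here: 0 (midnight) is valid.
        return cls(at_hour=at_hour)
```

The explicit `is None` test is what keeps zero usable as a configured value.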
Teknium	0157253145	Merge pull request #1152 from NousResearch/hermes/hermes-f47f71c0 feat: concurrent tool execution with ThreadPoolExecutor	2026-03-13 03:20:38 -07:00
kshitijk4poor	ccfbf42844	feat: secure skill env setup on load (core #688) When a skill declares required_environment_variables in its YAML frontmatter, missing env vars trigger a secure TUI prompt (identical to the sudo password widget) when the skill is loaded. Secrets flow directly to ~/.hermes/.env, never entering LLM context. Key changes: - New required_environment_variables frontmatter field for skills - Secure TUI widget (masked input, 120s timeout) - Gateway safety: messaging platforms show local setup guidance - Legacy prerequisites.env_vars normalized into new format - Remote backend handling: conservative setup_needed=True - Env var name validation, file permissions hardened to 0o600 - Redact patterns extended for secret-related JSON fields - 12 existing skills updated with prerequisites declarations - ~48 new tests covering skip, timeout, gateway, remote backends - Dynamic panel widget sizing (fixes hardcoded width from original PR) Cherry-picked from PR #723 by kshitijk4poor, rebased onto current main with conflict resolution. Fixes #688 Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-03-13 03:14:04 -07:00
teknium1	5d0d5b191c	feat: concurrent tool execution with ThreadPoolExecutor When the model returns multiple tool calls in a single response, they are now executed concurrently using a thread pool instead of sequentially. This significantly reduces wall-clock time when multiple independent tools are batched (e.g. parallel web_search, read_file, terminal calls). Architecture: - _execute_tool_calls() dispatches to sequential or concurrent path - Single tool calls and batches containing 'clarify' use sequential path - Multiple non-interactive tools use ThreadPoolExecutor (max 8 workers) - Results are collected and appended to messages in original order - _invoke_tool() extracted as shared tool invocation helper Safety: - Pre-flight interrupt check skips all tools if interrupted - Per-tool exception handling: one failure doesn't crash the batch - Result truncation (100k char limit) applied per tool - Budget pressure injection after all tools complete - Checkpoints taken before file-mutating tools - CLI spinner shows batch progress, then per-tool completion messages Tests: 10 new tests covering dispatch logic, ordering, error handling, interrupt behavior, truncation, and _invoke_tool routing.	2026-03-13 02:51:51 -07:00
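The dispatch rule in this commit (sequential for single calls and for batches containing 'clarify', a thread pool of at most 8 workers otherwise, results kept in original order, one failure not crashing the batch) might look roughly like the sketch below. `invoke_tool` is a hypothetical stand-in for the real tool dispatcher, and the function names do not claim to match `_execute_tool_calls()` or `_invoke_tool()` in the codebase.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 8  # worker cap quoted in the commit message


def invoke_tool(call: dict) -> str:
    """Hypothetical tool dispatcher used only for this illustration."""
    if call["name"] == "fail":
        raise RuntimeError("boom")
    return f"{call['name']} ok"


def safe_invoke(call: dict) -> str:
    # Per-tool exception handling: a failure becomes an error result.
    try:
        return invoke_tool(call)
    except Exception as exc:
        return f"Error: {exc}"


def execute_tool_calls(calls: list) -> list:
    # Single calls and batches with an interactive 'clarify' stay sequential.
    if len(calls) <= 1 or any(c["name"] == "clarify" for c in calls):
        return [safe_invoke(c) for c in calls]
    workers = min(MAX_WORKERS, len(calls))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(safe_invoke, calls))
```

Using `Executor.map` is one simple way to get ordered results; the actual implementation may collect futures differently.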
kshitijk4poor	bb3f5ed32a	fix: separate Anthropic OAuth tokens from API keys Persist OAuth/setup tokens in ANTHROPIC_TOKEN instead of ANTHROPIC_API_KEY. Reserve ANTHROPIC_API_KEY for regular Console API keys. Changes: - anthropic_adapter: reorder resolve_anthropic_token() priority — ANTHROPIC_TOKEN first, ANTHROPIC_API_KEY as legacy fallback - config: add save_anthropic_oauth_token() / save_anthropic_api_key() helpers that clear the opposing slot to prevent priority conflicts - config: show_config() prefers ANTHROPIC_TOKEN for display - setup: OAuth login and pasted setup-tokens write to ANTHROPIC_TOKEN - setup: API key entry writes to ANTHROPIC_API_KEY and clears ANTHROPIC_TOKEN - main: same fixes in _run_anthropic_oauth_flow() and _model_flow_anthropic() - main: _has_any_provider_configured() checks ANTHROPIC_TOKEN - doctor: use _is_oauth_token() for correct auth method validation - runtime_provider: updated error message - run_agent: simplified client init to use resolve_anthropic_token() - run_agent: updated 401 troubleshooting messages - status: prefer ANTHROPIC_TOKEN in status display - tests: updated priority test, added persistence helper tests Cherry-picked from PR #1141 by kshitijk4poor, rebased onto current main with unrelated changes (web_policy config, blocklist CLI) removed. Co-authored-by: kshitijk4poor <kshitijk4poor@users.noreply.github.com>	2026-03-13 02:09:52 -07:00
Teknium	9dfa81ab4b	Merge pull request #1125 from NousResearch/hermes/hermes-c877bdeb fix(anthropic): add diagnostic output on 401 auth failures	2026-03-12 19:15:21 -07:00
teknium1	e5b8e06037	fix(anthropic): add diagnostic output on 401 auth failures When Anthropic returns 401 and credential refresh doesn't help, now prints actionable troubleshooting info: - Which auth method was used (Bearer vs x-api-key) - Token prefix for debugging - Common fixes (stale ANTHROPIC_API_KEY, verify key, refresh login) - How to clear stale keys	2026-03-12 19:09:06 -07:00
Teknium	a282322845	Merge pull request #1121 from 0xbyt4/fix/anthropic-adapter-issues fix: anthropic adapter — max_tokens, fallback crash, proxy base_url	2026-03-12 19:07:06 -07:00
Teknium	475dd58a8e	Merge PR #736: feat(honcho): async writes, memory modes, session title integration, setup CLI Authored by erosika. Builds on #38 and #243. Adds async write support, configurable memory modes, context prefetch pipeline, 4 new Honcho tools (honcho_context, honcho_profile, honcho_search, honcho_conclude), full 'hermes honcho' CLI, session strategies, AI peer identity, recallMode A/B, gateway lifecycle management, and comprehensive docs. Cherry-picks fixes from PRs #831/#832 (adavyas). Co-authored-by: erosika <erosika@users.noreply.github.com> Co-authored-by: adavyas <adavyas@users.noreply.github.com>	2026-03-12 19:05:11 -07:00
0xbyt4	22479b053c	fix: anthropic adapter — max_tokens ignored, fallback crash, proxy base_url filtered - Pass self.max_tokens to build_anthropic_kwargs instead of hardcoded None - Add anthropic case to _try_activate_fallback (was only handling openai-codex) - Remove 'anthropic in base_url' filter that blocked custom proxy URLs	2026-03-13 04:22:16 +03:00
teknium1	e976879cf2	merge: resolve conflicts with main (URL update to hermes-agent.nousresearch.com)	2026-03-12 17:49:26 -07:00
teknium1	7f7282c78d	fix(anthropic): guard memory flush tool_calls extraction for Anthropic response format The memory flush path extracted tool_calls from the response assuming OpenAI format (response.choices[0].message.tool_calls). When using the Anthropic client directly (aux unavailable), the response is an Anthropic Message object which has no .choices attribute. Now uses normalize_anthropic_response() to extract tool_calls correctly.	2026-03-12 17:35:01 -07:00
teknium1	aaaba78126	fix(anthropic): final polish — tool ID sanitization, crash guards, temp=1 Remaining issues from deep scan: Adapter (agent/anthropic_adapter.py): - Add _sanitize_tool_id() — Anthropic requires IDs matching [a-zA-Z0-9_-], now strips invalid chars and ensures non-empty (both tool_use and tool_result) - Empty tool result content → '(no output)' placeholder (Anthropic rejects empty) - Set temperature=1 when thinking type='enabled' on older models (required) - normalize_model_name now case-insensitive for 'Anthropic/' prefix - Fix stale docstrings referencing only ~/.claude/.credentials.json Agent loop (run_agent.py): - Guard memory flush path (line ~2684) — was calling self.client.chat.completions which is None in anthropic_messages mode. Now routes through Anthropic client. - Guard summary generation path (line ~3171) — same crash when reaching iteration limit. Now builds proper Anthropic kwargs and normalizes response. - Guard retry summary path (line ~3200) — same fix for the summary retry loop. All three self.client.chat.completions.create() calls outside the main loop now have anthropic_messages branches to prevent NoneType crashes.	2026-03-12 17:23:09 -07:00
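A minimal sketch of the tool ID sanitization described in this commit: drop every character outside `[a-zA-Z0-9_-]` and guarantee a non-empty result. The fallback value `tool_0` is an assumption made for illustration; the commit does not say what the real `_sanitize_tool_id()` substitutes.

```python
import re

_INVALID_ID_CHARS = re.compile(r"[^a-zA-Z0-9_-]")


def sanitize_tool_id(tool_id: str) -> str:
    # Strip characters outside the allowed set.
    cleaned = _INVALID_ID_CHARS.sub("", tool_id or "")
    # Assumed placeholder so the ID is never empty.
    return cleaned or "tool_0"
```

Applying the same function to both the tool_use ID and the matching tool_result ID keeps the pair consistent after sanitization.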
teknium1	4068f20ce9	fix(anthropic): deep scan fixes — auth, retries, edge cases Fixes from comprehensive code review and cross-referencing with clawdbot/OpenCode implementations: CRITICAL: - Add one-shot guard (anthropic_auth_retry_attempted) to prevent infinite 401 retry loops when credentials keep changing - Fix _is_oauth_token(): managed keys from ~/.claude.json are NOT regular API keys (don't start with sk-ant-api). Inverted the logic: only sk-ant-api* is treated as API key auth, everything else uses Bearer auth + oauth beta headers HIGH: - Wrap json.loads(args) in try/except in message conversion — malformed tool_call arguments no longer crash the entire conversation - Raise AuthError in runtime_provider when no Anthropic token found (was silently passing empty string, causing confusing API errors) - Remove broken _try_anthropic() from auxiliary vision chain — the centralized router creates an OpenAI client for api_key providers which doesn't work with Anthropic's Messages API MEDIUM: - Handle empty assistant message content — Anthropic rejects empty content blocks, now inserts '(empty)' placeholder - Fix setup.py existing_key logic — set to 'KEEP' sentinel instead of None to prevent falling through to the auth choice prompt - Add debug logging to _fetch_anthropic_models on failure Tests: 43 adapter tests (2 new for token detection), 3197 total passed	2026-03-12 17:14:22 -07:00
Teknium	39f3c0aeb0	fix: use hermes-agent.nousresearch.com as OpenRouter HTTP-Referer * fix: stop rejecting unlisted models + auto-detect from /models endpoint validate_requested_model() now accepts models not in the provider's API listing with a warning instead of blocking. Removes hardcoded catalog fallback for validation — if API is unreachable, accepts with a warning. Model selection flows (setup + /model command) now probe the provider's /models endpoint to get the real available models. Falls back to hardcoded defaults with a clear warning when auto-detection fails: 'Could not auto-detect models — use Custom model if yours isn't listed.' Z.AI setup no longer excludes GLM-5 on coding plans. * fix: use hermes-agent.nousresearch.com as HTTP-Referer for OpenRouter OpenRouter scrapes the favicon/logo from the HTTP-Referer URL for app rankings. We were sending the GitHub repo URL, which gives us a generic GitHub logo. Changed to the proper website URL so our actual branding shows up in rankings. Changed in run_agent.py (main agent client) and auxiliary_client.py (vision/summarization clients).	2026-03-12 16:20:22 -07:00
teknium1	d7adfe8f61	fix(anthropic): address gaps found in deep-dive audit After studying clawdbot (OpenClaw) and OpenCode implementations: ## Beta headers - Add interleaved-thinking-2025-05-14 and fine-grained-tool-streaming-2025-05-14 as common betas (sent with ALL auth types, not just OAuth) - OAuth tokens additionally get oauth-2025-04-20 - API keys now also get the common betas (previously got none) ## Vision/image support - Add _convert_vision_content() to convert OpenAI multimodal format (image_url blocks) to Anthropic format (image blocks with base64/url source) - Handles both data: URIs (base64) and regular URLs ## Role alternation enforcement - Anthropic strictly rejects consecutive same-role messages (400 error) - Add post-processing step that merges consecutive user/assistant messages - Handles string, list, and mixed content types during merge ## Tool choice support - Add tool_choice parameter to build_anthropic_kwargs() - Maps OpenAI values: auto→auto, required→any, none→omit, name→tool ## Cache metrics tracking - Anthropic uses cache_read_input_tokens / cache_creation_input_tokens (different from OpenRouter's prompt_tokens_details.cached_tokens) - Add api_mode-aware branch in run_agent.py cache stats logging ## Credential refresh on 401 - On 401 error during anthropic_messages mode, re-read credentials via resolve_anthropic_token() (picks up refreshed Claude Code tokens) - Rebuild client if new token differs from current one - Follows same pattern as Codex/Nous 401 refresh handlers ## Tests - 44 adapter tests (8 new: vision conversion, role alternation, tool choice) - Updated beta header tests to verify new structure - Full suite: 3198 passed, 0 regressions	2026-03-12 16:00:46 -07:00
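The tool choice mapping listed in this commit (auto→auto, required→any, none→omit, name→tool) could be expressed as below. This is a sketch under the assumption that a named tool arrives as a plain string; the real `build_anthropic_kwargs()` may accept other shapes. The returned dicts follow the Anthropic Messages API `tool_choice` format.

```python
from typing import Optional


def map_tool_choice(tool_choice: Optional[str]) -> Optional[dict]:
    """Map an OpenAI-style tool_choice value to an Anthropic one.

    Returns None when the parameter should be omitted from the request.
    """
    if tool_choice is None or tool_choice == "none":
        return None  # omit the parameter entirely
    if tool_choice == "auto":
        return {"type": "auto"}
    if tool_choice == "required":
        return {"type": "any"}
    # Assumption: any other string is the name of a specific tool.
    return {"type": "tool", "name": tool_choice}
```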
Teknium	1bb8ed4495	chore: lower default compression threshold from 85% to 50% (#1096) * fix: ClawHub skill install — use /download ZIP endpoint The ClawHub API v1 version endpoint only returns file metadata (path, size, sha256, contentType) without inline content or download URLs. Our code was looking for inline content in the metadata, which never existed, causing all ClawHub installs to fail with: 'no inline/raw file content was available' Fix: Use the /api/v1/download endpoint (same as the official clawhub CLI) to download skills as ZIP bundles and extract files in-memory. Changes: - Add _download_zip() method that downloads and extracts ZIP bundles - Retry on 429 rate limiting with Retry-After header support - Path sanitization and binary file filtering for security - Keep _extract_files() as a fallback for inline/raw content - Also fix nested file lookup (version_data.version.files) * chore: lower default compression threshold from 85% to 50% Triggers context compression earlier — at 50% of the model's context window instead of 85%. Updated in all four places where the default is defined: context_compressor.py, cli.py, run_agent.py, config.py, and gateway/run.py.	2026-03-12 15:51:50 -07:00
teknium1	5e12442b4b	feat: native Anthropic provider with Claude Code credential auto-discovery Add Anthropic as a first-class inference provider, bypassing OpenRouter for direct API access. Uses the native Anthropic SDK with a full format adapter (same pattern as the codex_responses api_mode). ## Auth (three methods, priority order) 1. ANTHROPIC_API_KEY env var (regular API key, sk-ant-api-) 2. ANTHROPIC_TOKEN / CLAUDE_CODE_OAUTH_TOKEN env var (setup-token, sk-ant-oat-) 3. Auto-discovery from ~/.claude/.credentials.json (Claude Code subscription) - Reads Claude Code's OAuth credentials - Checks token expiry with 60s buffer - Setup tokens use Bearer auth + anthropic-beta: oauth-2025-04-20 header - Regular API keys use standard x-api-key header ## Changes by file ### New files - agent/anthropic_adapter.py — Client builder, message/tool/response format conversion, Claude Code credential reader, token resolver. Handles system prompt extraction, tool_use/tool_result blocks, thinking/reasoning, orphaned tool_use cleanup, cache_control. 
- tests/test_anthropic_adapter.py — 36 tests covering all adapter logic ### Modified files - pyproject.toml — Add anthropic>=0.39.0 dependency - hermes_cli/auth.py — Add 'anthropic' to PROVIDER_REGISTRY with three env vars, plus 'claude'/'claude-code' aliases - hermes_cli/models.py — Add model catalog, labels, aliases, provider order - hermes_cli/main.py — Add 'anthropic' to --provider CLI choices - hermes_cli/runtime_provider.py — Add Anthropic branch returning api_mode='anthropic_messages' (before generic api_key fallthrough) - hermes_cli/setup.py — Add Anthropic setup wizard with Claude Code credential auto-discovery, model selection, OpenRouter tools prompt - agent/auxiliary_client.py — Add claude-haiku-4-5 as aux model - agent/model_metadata.py — Add bare Claude model context lengths - run_agent.py — Add anthropic_messages api_mode: * Client init (Anthropic SDK instead of OpenAI) * API call dispatch (_anthropic_client.messages.create) * Response validation (content blocks) * finish_reason mapping (stop_reason -> finish_reason) * Token usage (input_tokens/output_tokens) * Response normalization (normalize_anthropic_response) * Client interrupt/rebuild * Prompt caching auto-enabled for native Anthropic - tests/test_run_agent.py — Update test_anthropic_base_url_accepted to expect native routing, add test_prompt_caching_native_anthropic	2026-03-12 15:47:45 -07:00
Erosika	fefc709b2c	merge: resolve conflict with main in subagent interrupt test	2026-03-12 16:28:57 -04:00
Erosika	0aed9bfde1	refactor(honcho): rename memory tools to Honcho tools, clarify recall mode language Replace "memory tools" with "Honcho tools" and "pre-warmed/prefetch" with "auto-injected context" in all user-facing strings and docs.	2026-03-12 16:26:10 -04:00
Erosika	ae2a5e5743	refactor(honcho): remove local memory mode The "local" memoryMode was redundant with enabled: false. Simplifies the mode system to hybrid and honcho only.	2026-03-12 16:23:34 -04:00
Teknium	73ea5102dc	Merge pull request #1058 from NousResearch/hermes/hermes-465f3702 fix: strip call_id/response_item_id from tool_calls for Mistral compatibility	2026-03-12 08:21:36 -07:00
teknium1	400b8d92b7	fix: strip call_id/response_item_id from tool_calls for Mistral compatibility Mistral's API strictly validates the Chat Completions schema and rejects unknown fields (call_id, response_item_id) with 422. These fields are added by _build_assistant_message() for Codex Responses API support. This fix: - Only strips when targeting Mistral (api.mistral.ai in base_url) - Creates new tool_call dicts instead of mutating originals (shallow copy safety — msg.copy() shares the tool_calls list) - Preserves call_id/response_item_id in the internal message history so _chat_messages_to_responses_input() can still read them if the session falls back to a Codex provider mid-conversation Applied in all 3 API message building locations: - Main conversation loop (run_conversation) - _handle_max_iterations() - flush_memories() Inspired by PR #864 (unmodeled-tyler) which identified the issue but applied the fix unconditionally and mutated originals via shallow copy. Co-authored-by: unmodeled-tyler <unmodeled.tyler@proton.me>	2026-03-12 08:18:27 -07:00
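The copy-instead-of-mutate approach in this commit can be illustrated with a small sketch. The function name is hypothetical; the two field names and the reason for building new dicts (a shallow `msg.copy()` shares the `tool_calls` list) come from the commit message.

```python
STRIPPED_FIELDS = ("call_id", "response_item_id")


def strip_codex_fields(message: dict) -> dict:
    """Return a copy of message whose tool_calls lack Codex-only fields."""
    tool_calls = message.get("tool_calls")
    if not tool_calls:
        return message
    # Build new tool_call dicts so the internal history keeps its fields.
    cleaned = [
        {k: v for k, v in tc.items() if k not in STRIPPED_FIELDS}
        for tc in tool_calls
    ]
    return {**message, "tool_calls": cleaned}
```

The internal message keeps `call_id` and `response_item_id`, so a later fallback to a Codex provider can still read them.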
Teknium	e9c3317158	fix: improve Kimi model selection — auto-detect endpoint, add missing models (#1039) * fix: /reasoning command output ordering, display, and inline think extraction Three issues with the /reasoning command: 1. Output interleaving: The command echo used print() while feedback used _cprint(), causing them to render out-of-order under prompt_toolkit's patch_stdout. Changed echo to use _cprint() so all output renders through the same path in correct order. 2. Reasoning display not working: /reasoning show toggled a flag but reasoning never appeared for models that embed thinking in inline <think> blocks rather than structured API fields. Added fallback extraction in _build_assistant_message to capture <think> block content as reasoning when no structured reasoning fields (reasoning, reasoning_content, reasoning_details) are present. This feeds into both the reasoning callback (during tool loops) and the post-response reasoning box display. 3. Feedback clarity: Added checkmarks to confirm actions, persisted show/hide to config (was session-only before), and aligned the status display for readability. Tests: 7 new tests for inline think block extraction (41 total). * feat: add /reasoning command to gateway (Telegram/Discord/etc) The /reasoning command only existed in the CLI — messaging platforms had no way to view or change reasoning settings. This adds: 1. /reasoning command handler in the gateway: - No args: shows current effort level and display state - /reasoning <level>: sets reasoning effort (none/low/medium/high/xhigh) - /reasoning show|hide: toggles reasoning display in responses - All changes saved to config.yaml immediately 2. Reasoning display in gateway responses: - When show_reasoning is enabled, prepends a 'Reasoning' block with the model's last_reasoning content before the response - Collapses long reasoning (>15 lines) to keep messages readable - Uses last_reasoning from run_conversation result dict 3.
Plumbing: - Added _show_reasoning attribute loaded from config at startup - Propagated last_reasoning through _run_agent return dict - Added /reasoning to help text and known_commands set - Uses getattr for _show_reasoning to handle test stubs * fix: improve Kimi model selection — auto-detect endpoint, add missing models Kimi Coding Plan setup: - New dedicated _model_flow_kimi() replaces the generic API-key flow for kimi-coding. Removes the confusing 'Base URL' prompt entirely — the endpoint is auto-detected from the API key prefix: sk-kimi-* → api.kimi.com/coding/v1 (Kimi Coding Plan) other → api.moonshot.ai/v1 (legacy Moonshot) - Shows appropriate models for each endpoint: Coding Plan: kimi-for-coding, kimi-k2.5, kimi-k2-thinking, kimi-k2-thinking-turbo Moonshot: full model catalog - Clears any stale KIMI_BASE_URL override so runtime auto-detection via _resolve_kimi_base_url() works correctly. Model catalog updates: - Added kimi-for-coding (primary Coding Plan model) and kimi-k2-thinking-turbo to models.py, main.py _PROVIDER_MODELS, and model_metadata.py context windows. - Updated User-Agent from KimiCLI/1.0 to KimiCLI/1.3 (Kimi's coding endpoint whitelists known coding agents via User-Agent sniffing).	2026-03-12 05:58:48 -07:00
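The key-prefix endpoint detection described for Kimi can be sketched as follows. The hosts and the `sk-kimi-` prefix are taken from the commit message; the `https://` scheme and the function name are assumptions, and this does not claim to reproduce `_resolve_kimi_base_url()`.

```python
KIMI_CODING_BASE_URL = "https://api.kimi.com/coding/v1"  # Kimi Coding Plan
MOONSHOT_BASE_URL = "https://api.moonshot.ai/v1"  # legacy Moonshot


def resolve_kimi_base_url(api_key: str) -> str:
    # Keys issued for the Coding Plan carry the sk-kimi- prefix.
    if api_key.startswith("sk-kimi-"):
        return KIMI_CODING_BASE_URL
    return MOONSHOT_BASE_URL
```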
dmahan93	c7fc39bde0	feat: include session ID in system prompt via --pass-session-id flag Adds --pass-session-id CLI flag. When set, the agent's system prompt includes the session ID: Conversation started: Sunday, March 08, 2026 06:32 PM Session ID: 20260308_183200_abc123 Usage: hermes --pass-session-id hermes chat --pass-session-id Implementation threads the flag as a proper parameter through the full chain (main.py → cli.py → run_agent.py) rather than using an env var, avoiding collisions in multi-agent/multitenant setups. Based on PR #726 by dmahan93, reworked to use instance parameter instead of HERMES_PASS_SESSION_ID environment variable. Co-authored-by: dmahan93 <dmahan93@users.noreply.github.com>	2026-03-12 05:51:31 -07:00
teknium1	2192b17670	merge: resolve conflicts with origin/main - gateway/run.py: Take main's _resolve_gateway_model() helper - hermes_cli/setup.py: Re-apply nous-api removal after merge brought it back. Fix provider_idx offset (Custom is now index 3, not 4). - tests/hermes_cli/test_setup.py: Fix custom setup test index (3→4)	2026-03-12 00:29:04 -07:00
teknium1	65356003e3	revert: keep provider preferences for all providers (Nous will proxy) Nous Portal backend will become a transparent proxy for OpenRouter- specific parameters (provider preferences, etc.), so keep sending them to all providers. The reasoning disabled fix is kept (that's a real constraint of the Nous endpoint).	2026-03-11 22:53:06 -07:00
teknium1	a7e5f19528	fix: don't send OpenRouter-specific provider preferences to Nous Portal Two bugs in _build_api_kwargs that broke Nous Portal: 1. Provider preferences (only, ignore, order, sort) are OpenRouter- specific routing features. They were being sent in extra_body to ALL providers, including Nous Portal. When the config had providers_only=['google-vertex'], Nous Portal returned 404 'Inference host not found' because it doesn't have a google-vertex backend. Fix: Only include provider preferences when _is_openrouter is True. 2. Reasoning config with enabled=false was being sent to Nous Portal, which requires reasoning and returns 400 'Reasoning is mandatory for this endpoint and cannot be disabled.' Fix: Omit the reasoning parameter for Nous when enabled=false. Root cause found via HERMES_DUMP_REQUESTS=1 which showed the exact request payload being sent to Nous Portal's inference API.	2026-03-11 22:41:33 -07:00
teknium1	a29801286f	refactor: route main agent client + fallback through centralized router Phase 2 of the provider router migration — route the main agent's client construction and fallback activation through resolve_provider_client() instead of duplicated ad-hoc logic. run_agent.py: - __init__: When no explicit api_key/base_url, use resolve_provider_client(provider, raw_codex=True) for client construction. Explicit creds (from CLI/gateway runtime provider) still construct directly. - _try_activate_fallback: Replace _resolve_fallback_credentials and its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS dicts with a single resolve_provider_client() call. The router handles all provider types (API-key, OAuth, Codex) centrally. - Remove _resolve_fallback_credentials method and both fallback dicts. agent/auxiliary_client.py: - Add raw_codex parameter to resolve_provider_client(). When True, returns the raw OpenAI client for Codex providers instead of wrapping in CodexAuxiliaryClient. The main agent needs this for direct responses.stream() access. 3251 passed, 2 pre-existing unrelated failures.	2026-03-11 21:38:29 -07:00
teknium1	0aa31cd3cb	feat: call_llm/async_call_llm + config slots + migrate all consumers Add centralized call_llm() and async_call_llm() functions that own the full LLM request lifecycle: 1. Resolve provider + model from task config or explicit args 2. Get or create a cached client for that provider 3. Format request args (max_tokens handling, provider extra_body) 4. Make the API call with max_tokens/max_completion_tokens retry 5. Return the response Config: expanded auxiliary section with provider:model slots for all tasks (compression, vision, web_extract, session_search, skills_hub, mcp, flush_memories). Config version bumped to 7. Migrated all auxiliary consumers: - context_compressor.py: uses call_llm(task='compression') - vision_tools.py: uses async_call_llm(task='vision') - web_tools.py: uses async_call_llm(task='web_extract') - session_search_tool.py: uses async_call_llm(task='session_search') - browser_tool.py: uses call_llm(task='vision'/'web_extract') - mcp_tool.py: uses call_llm(task='mcp') - skills_guard.py: uses call_llm(provider='openrouter') - run_agent.py flush_memories: uses call_llm(task='flush_memories') Tests updated for context_compressor and MCP tool. Some test mocks still need updating (15 remaining failures from mock pattern changes, 2 pre-existing).	2026-03-11 20:52:19 -07:00
Erosika	2d35016b94	fix(honcho): harden tool gating and migration peer routing Prevent stale Honcho tool exposure in context/local modes, restore reliable async write retry behavior, and ensure SOUL.md migration uploads target the AI peer instead of the user peer. Also align Honcho CLI key checks with host-scoped apiKey resolution and lock the fixes with regression tests. Made-with: Cursor	2026-03-11 18:21:27 -04:00
Erosika	a0b0dbe6b2	Merge remote-tracking branch 'origin/main' into feat/honcho-async-memory Made-with: Cursor # Conflicts: # cli.py # tests/test_run_agent.py	2026-03-11 12:22:56 -04:00
Teknium	8fa96debc9	Merge pull request #963 from NousResearch/hermes/hermes-cf9f7d54 fix: guard all print() against OSError with _SafeWriter	2026-03-11 09:19:52 -07:00
teknium1	a8409a161f	fix: guard all print() calls against OSError with _SafeWriter When hermes-agent runs as a systemd service, Docker container, or headless daemon, the stdout pipe can become unavailable (idle timeout, buffer exhaustion, socket reset). Any print() call then raises OSError: [Errno 5] Input/output error, crashing run_conversation() and causing cron jobs to fail. Rather than wrapping individual print() calls (68 in run_conversation alone), this adds a transparent _SafeWriter wrapper installed once at the start of run_conversation(). It delegates all writes to the real stdout and silently catches OSError. Zero overhead on the happy path, comprehensive coverage of all print calls including future ones. Fixes #845 Co-authored-by: J0hnLawMississippi <J0hnLawMississippi@users.noreply.github.com>	2026-03-11 09:19:10 -07:00
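A rough sketch of the wrapper idea in this commit: delegate writes to the real stream and swallow `OSError` so a broken stdout pipe cannot crash the caller. The class below is illustrative only and is not the project's `_SafeWriter`; in particular, returning `len(data)` on failure is an assumption.

```python
import sys


class SafeWriter:
    """Wrap a text stream and ignore OSError raised by write/flush."""

    def __init__(self, inner):
        self._inner = inner

    def write(self, data):
        try:
            return self._inner.write(data)
        except OSError:
            # Pretend the write succeeded so print() keeps working.
            return len(data)

    def flush(self):
        try:
            self._inner.flush()
        except OSError:
            pass

    def __getattr__(self, name):
        # Delegate everything else (encoding, isatty, ...) to the real stream.
        return getattr(self._inner, name)


def install_safe_stdout():
    # Install once; later print() calls go through the wrapper.
    if not isinstance(sys.stdout, SafeWriter):
        sys.stdout = SafeWriter(sys.stdout)
```

Installing the wrapper once covers every existing and future `print()` call, which is the design point the commit message makes against wrapping calls individually.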
Erosika	047b118299	fix(honcho): resolve review blockers for merge Address merge-blocking review feedback by removing unsafe signal handler overrides, wiring next-turn Honcho prefetch, restoring per-directory session defaults, and exposing all Honcho tools to the model surface. Also harden prefetch cache access with public thread-safe accessors and remove duplicate browser cleanup code. Made-with: Cursor	2026-03-11 11:46:37 -04:00
Teknium	01d3b31479	Merge PR #785: feat: conditional skill activation based on tool availability Authored by teyrebaz33. Closes #539. feat: conditional skill activation based on tool availability	2026-03-11 08:43:30 -07:00
teknium1	a54405e339	fix: proactive compression after large tool results + Anthropic error detection Two fixes for context overflow handling: 1. Proactive compression after tool execution: The compression check now estimates the next prompt size using real token counts from the last API response (prompt_tokens + completion_tokens) plus a conservative estimate of newly appended tool results (chars // 3 for JSON-heavy content). Previously, should_compress() only checked last_prompt_tokens which didn't account for tool results — so a 130k prompt + 100k chars of tool output would pass the 140k threshold check but fail the 200k API limit. 2. Safety net: Added 'prompt is too long' to context-length error detection phrases. Anthropic returns 'prompt is too long: N tokens > M maximum' on HTTP 400, which wasn't matched by existing phrases. This ensures compression fires even if the proactive check underestimates. Fixes #813	2026-03-11 08:04:52 -07:00
teknium1	683c8b24d4	fix: reduce max_retries to 3 and make ValueError/TypeError non-retryable - max_retries reduced from 6 to 3 — 6 retries with exponential backoff could stall for ~275s total on persistent errors - ValueError and TypeError now detected as non-retryable client errors and abort immediately instead of being retried with backoff (these are local validation/programming errors that will never succeed on retry)	2026-03-11 07:04:46 -07:00
teknium1	d2dee43825	fix: allow tool_choice, parallel_tool_calls, prompt_cache_key in codex preflight _preflight_codex_api_kwargs rejected these three fields as unsupported, but _build_api_kwargs adds them to every codex request. This caused a ValueError before _interruptible_api_call was reached, which was caught by the retry loop and retried with exponential backoff — appearing as an infinite hang in tests (275s total backoff across 6 retries). The fix adds these keys to allowed_keys and passes them through to the normalized request dict. This fixes the hanging test_cron_run_job_codex_path_handles_internal_401_refresh test (now passes in 2.6s instead of timing out).	2026-03-11 07:00:14 -07:00
teknium1	4d873f77c1	feat(cli): add /reasoning command for effort level and display toggle Combined implementation of reasoning management: - /reasoning Show current effort level and display state - /reasoning <level> Set reasoning effort (none, low, medium, high, xhigh) - /reasoning show|on Show model thinking/reasoning in output - /reasoning hide|off Hide model thinking/reasoning from output Effort level changes persist to config and force agent re-init. Display toggle updates the agent callback dynamically without re-init. When display is enabled: - Intermediate reasoning shown as dim [thinking] lines during tool loops - Final reasoning shown in a bordered box above the response - Long reasoning collapsed (5 lines intermediate, 10 lines final) Also adds: - reasoning_callback parameter to AIAgent - last_reasoning in run_conversation result dict - show_reasoning config option (display section, default: false) - Display section in /config output - 34 tests covering both features Combines functionality from PR #789 and PR #790. Co-authored-by: Aum Desai <Aum08Desai@users.noreply.github.com> Co-authored-by: 0xbyt4 <35742124+0xbyt4@users.noreply.github.com>	2026-03-11 06:02:18 -07:00
teknium1	a82ce60294	fix: add missing Responses API parameters for Codex provider Adds tool_choice, parallel_tool_calls, and prompt_cache_key to the Codex Responses API request kwargs — matching what the official Codex CLI sends. - tool_choice: 'auto' — enables the model to proactively call tools. Without this, the model may default to not using tools, which explains reports of the agent claiming it lacks shell access (#747). - parallel_tool_calls: True — allows the model to issue multiple tool calls in a single turn for efficiency. - prompt_cache_key: session_id — enables server-side prompt caching across turns in the same session, reducing latency and cost. Refs #747	2026-03-11 04:28:31 -07:00
teknium1	21ff0d39ad	feat: iteration budget pressure via tool result injection Two-tier warning system that nudges the LLM as it approaches max_iterations, injected into the last tool result JSON rather than as a separate system message: - Caution (70%): {"_budget_warning": "[BUDGET: 42/60...]"} - Warning (90%): {"_budget_warning": "[BUDGET WARNING: 54/60...]"} For JSON tool results, adds a _budget_warning field to the existing dict. For plain text results, appends the warning as text. Key properties: - No system messages injected mid-conversation - No changes to message structure - Prompt cache stays valid - Configurable thresholds (0.7 / 0.9) - Can be disabled: _budget_pressure_enabled = False Inspired by PR #421 (@Bartok9) and issue #414. 8 tests covering thresholds, edge cases, JSON and text injection.	2026-03-11 00:37:24 -07:00
teknium1	b53d5dad67	Merge PR #705: fix: detect, warn, and block file re-read/search loops after context compression Authored by 0xbyt4. Adds read/search loop detection, file history injection after compression, and todo filtering for active items only.	2026-03-10 16:17:03 -07:00
teknium1	c1171fe666	fix: eliminate 3x SQLite message duplication in gateway sessions (#860) Three separate code paths all wrote to the same SQLite state.db with no deduplication, inflating session transcripts by 3-4x: 1. _log_msg_to_db() — wrote each message individually after append 2. _flush_messages_to_session_db() — re-wrote ALL new messages at every _persist_session() call (~18 exit points), with no tracking of what was already written 3. gateway append_to_transcript() — wrote everything a third time after the agent returned Since load_transcript() prefers SQLite over JSONL, the inflated data was loaded on every session resume, causing proportional token waste. Fix: - Remove _log_msg_to_db() and all 16 call sites (redundant with flush) - Add _last_flushed_db_idx tracking in _flush_messages_to_session_db() so repeated _persist_session() calls only write truly new messages - Reset flush cursor on compression (new session ID) - Add skip_db parameter to SessionStore.append_to_transcript() so the gateway skips SQLite writes when the agent already persisted them - Gateway now passes skip_db=True for agent-managed messages, still writes to JSONL as backup Verified: a 12-message CLI session with tool calls produces exactly 12 SQLite rows with zero duplicates (previously would be 36-48). Tests: 9 new tests covering flush deduplication, skip_db behavior, compression reset, and initialization. Full suite passes (2869 tests).	2026-03-10 15:22:44 -07:00
adavyas	87cc5287a8	fix(honcho): enforce local mode and cache-safe warmup	2026-03-10 16:21:42 -04:00
Erosika	0cb639d472	refactor(honcho): rename query_user_context to honcho_context Consistent naming: all honcho tools now prefixed with honcho_ (honcho_context, honcho_search, honcho_profile, honcho_conclude).	2026-03-10 16:21:07 -04:00
Erosika	792be0e8e3	feat(honcho): add honcho_conclude tool for writing facts back to memory New tool lets Hermes persist conclusions about the user (preferences, corrections, project context) directly to Honcho via the conclusions API. Feeds into the user's peer card and representation.	2026-03-10 16:21:07 -04:00
Erosika	c1228e9a4a	refactor(honcho): rename recallMode "auto" to "hybrid" Matches the mental model: hybrid = context + tools, context = context only, tools = tools only.	2026-03-10 16:21:07 -04:00
Erosika	74c214e957	feat(honcho): async memory integration with prefetch pipeline and recallMode Adds full Honcho memory integration to Hermes: - Session manager with async background writes, memory modes (honcho/hybrid/local), and dialectic prefetch for first-turn context warming - Agent integration: prefetch pipeline, tool surface gated by recallMode, system prompt context injection, SIGTERM/SIGINT flush handlers - CLI commands: setup, status, mode, tokens, peer, identity, migrate - recallMode setting (auto | context | tools) for A/B testing retrieval strategies - Session strategies: per-session, per-repo (git tree root), per-directory, global - Polymorphic memoryMode config: string shorthand or per-peer object overrides - 97 tests covering async writes, client config, session resolution, and memory modes	2026-03-10 16:21:07 -04:00
teyrebaz33	1caee06b22	fix: tool call repair — auto-lowercase, fuzzy match, helpful error on unknown tool (#520) - Add _repair_tool_call(): tries lowercase, normalize, then fuzzy match (difflib 0.7) - Replace 3-retry-then-abort with graceful error: model receives helpful message and self-corrects - Conversation stays alive instead of dying on hallucinated tool names Closes #520	2026-03-10 06:54:11 -07:00
teknium1	771969f747	fix: wire up enabled_tools in agent loop + simplify sandbox tool selection Completes the fix started in `8318a51` — handle_function_call() accepted enabled_tools but run_agent.py never passed it. Now both call sites in _execute_tool_calls() pass self.valid_tool_names, so each agent session uses its own tool list instead of the process-global _last_resolved_tool_names (which subagents can overwrite). Also simplifies the redundant ternary in code_execution_tool.py: sandbox_tools is already computed correctly (intersection with session tools, or full SANDBOX_ALLOWED_TOOLS as fallback), so the conditional was dead logic. Inspired by PR #663 (JasonOA888). Closes #662. Tests: 2857 passed.	2026-03-10 06:35:28 -07:00
vincent	b0a5fe8974	fix: continue after output-length truncation	2026-03-10 04:30:19 -07:00
teknium1	899dfdcfb9	Merge PR #616: fix: retry with rebuilt payload after compression Authored by tripledoublev. After context compression on 413/400 errors, the inner retry loop was reusing the stale pre-compression api_messages payload. Fix breaks out of the inner retry loop so the outer loop rebuilds api_messages from the now-compressed messages list. Adds regression test verifying the second request actually contains the compressed payload.	2026-03-10 04:22:42 -07:00
teknium1	f16f2912cf	Merge PR #607: fix: reset all retry counters at start of run_conversation() Authored by 0xbyt4. Adds missing resets for _incomplete_scratchpad_retries and _codex_incomplete_retries to prevent stale counters carrying over between CLI conversations.	2026-03-10 04:17:47 -07:00
teknium1	c1775de56f	feat: filesystem checkpoints and /rollback command Automatic filesystem snapshots before destructive file operations, with user-facing rollback. Inspired by PR #559 (by @alireza78a). Architecture: - Shadow git repos at ~/.hermes/checkpoints/{hash}/ via GIT_DIR - CheckpointManager: take/list/restore, turn-scoped dedup, pruning - Transparent — the LLM never sees it, no tool schema, no tokens - Once per turn — only first write_file/patch triggers a snapshot Integration: - Config: checkpoints.enabled + checkpoints.max_snapshots - CLI flag: hermes --checkpoints - Trigger: run_agent.py _execute_tool_calls() before write_file/patch - /rollback slash command in CLI + gateway (list, restore by number) - Pre-rollback snapshot auto-created on restore (undo the undo) Safety: - Never blocks file operations — all errors silently logged - Skips root dir, home dir, dirs >50K files - Disables gracefully when git not installed - Shadow repo completely isolated from project git Tests: 35 new tests, all passing (2798 total suite) Docs: feature page, config reference, CLI commands reference	2026-03-10 00:49:15 -07:00
teknium1	ee4008431a	fix: stop terminal border flashing with steady cursor and TUI spinner widget Cherry-picked and improved from PR #470 (fixes #464). Problem: On Ubuntu 24.04 with ghostty + tmux, the prompt input box border lines flash due to cursor blink and raw spinner terminal writes conflicting with prompt_toolkit's rendering. Changes: - cli.py: Add CursorShape.BLOCK to Application() to disable cursor blink - cli.py: Add thinking_callback + spinner_widget in TUI layout so thinking status displays as a proper prompt_toolkit widget instead of raw terminal writes that conflict with the TUI renderer - run_agent.py: Add thinking_callback parameter to AIAgent; when set, uses the callback instead of KawaiiSpinner for thinking display What was NOT changed (preserving existing behavior): - agent/display.py: Untouched. KawaiiSpinner _write() stdout capture, _animate() logic, and 0.12s frame interval all preserved. This protects subagent stdout redirection and keeps smooth animations for non-CLI contexts (gateway, batch runner). - Original emoji spinner types (brain/sparkle/pulse/moon/star) preserved for all non-CLI contexts. Fixes from original PR #470: - CursorShape.STEADY_BLOCK -> CursorShape.BLOCK (STEADY_BLOCK doesn't exist in prompt_toolkit 3.0.52) - Removed duplicate self._spinner_text = '' line - Removed redundant nested if-checks Tested: 2706 tests pass, interactive CLI verified via tmux.	2026-03-09 23:26:43 -07:00
teknium1	3e352f8a0d	fix: add upstream guard for non-dict function_args + tests for build_tool_preview Complements PR #453 by 0xbyt4. Adds isinstance(dict) guard in run_agent.py to catch cases where json.loads returns non-dict (e.g. null, list, string) before they reach downstream code. Also adds 15 tests for build_tool_preview covering None args, empty dicts, known/unknown tools, fallback keys, truncation, and all special-cased tools (process, todo, memory, session_search).	2026-03-09 21:01:40 -07:00
teyrebaz33	94023e6a85	feat: conditional skill activation based on tool availability Skills can now declare fallback_for_toolsets, fallback_for_tools, requires_toolsets, and requires_tools in their SKILL.md frontmatter. The system prompt builder filters skills automatically based on which tools are available in the current session. - Add _read_skill_conditions() to parse conditional frontmatter fields - Add _skill_should_show() to evaluate conditions against available tools - Update build_skills_system_prompt() to accept and apply tool availability - Pass valid_tool_names and available toolsets from run_agent.py - Backward compatible: skills without conditions always show; calling build_skills_system_prompt() with no args preserves existing behavior Closes #539	2026-03-09 23:13:39 +03:00
teknium1	1f0944de21	fix: handle non-string content from OpenAI-compatible servers (#759) Some local LLM servers (llama-server, etc.) return message.content as a dict or list instead of a plain string. This caused AttributeError 'dict object has no attribute strip' on every API call. Normalizes content to string immediately after receiving the response: - dict: extracts 'text' or 'content' field, falls back to json.dumps - list: extracts text parts (OpenAI multimodal content format) - other: str() conversion Applied at the single point where response.choices[0].message is read in the main agent loop, so all downstream .strip()/.startswith()/[:100] operations work regardless of server implementation. Closes #759	2026-03-09 03:32:32 -07:00
0xbyt4	4684aaffdc	merge: resolve file_tools.py conflict with origin/main Combine read/search loop detection with main's redact_sensitive_text and truncation hint features. Add tracker reset to TestSearchHints to prevent cross-test state leakage.	2026-03-09 13:21:46 +03:00
teknium1	aedb773f0d	fix: stabilize system prompt across gateway turns for cache hits Two changes to prevent unnecessary Anthropic prompt cache misses in the gateway, where a fresh AIAgent is created per user message: 1. Reuse stored system prompt for continuing sessions: When conversation_history is non-empty, load the system prompt from the session DB instead of rebuilding from disk. The model already has updated memory in its conversation history (it wrote it!), so re-reading memory from disk produces a different system prompt that breaks the cache prefix. 2. Stabilize Honcho context per session: - Only prefetch Honcho context on the first turn (empty history) - Bake Honcho context into the cached system prompt and store to DB - Remove the per-turn Honcho injection from the API call loop This ensures the system message is identical across all turns in a session. Previously, re-fetching Honcho could return different context on each turn, changing the system message and invalidating the cache. Both changes preserve the existing behavior for compression (which invalidates the prompt and rebuilds from scratch) and for the CLI (where the same AIAgent persists and the cached prompt is already stable across turns). Tests: 2556 passed (6 new)	2026-03-09 01:50:58 -07:00
teknium1	35d57ed752	refactor: unified OAuth/API-key credential resolution for fallback Split fallback provider handling into two clean registries: _FALLBACK_API_KEY_PROVIDERS — env-var-based (openrouter, zai, kimi, minimax) _FALLBACK_OAUTH_PROVIDERS — OAuth-based (openai-codex, nous) New _resolve_fallback_credentials() method handles all three cases (OAuth, API key, custom endpoint) and returns a uniform (key, url, mode) tuple. _try_activate_fallback() is now just validation + client build. Adds Nous Portal as a fallback provider — uses the same OAuth flow as the primary provider (hermes login), returns chat_completions mode. OAuth providers get credential refresh for free: the existing 401 retry handlers (_try_refresh_codex/nous_client_credentials) check self.provider, which is set correctly after fallback activation. 4 new tests (nous activation, nous no-login, codex retained). 27 total fallback tests passing, 2548 full suite.	2026-03-08 21:44:48 -07:00
teknium1	5785bd3272	feat: add openai-codex as fallback provider Codex OAuth uses a different auth flow (OAuth tokens, not env vars) and a different API mode (codex_responses, not chat_completions). The fallback now handles this specially: - Resolves credentials via resolve_codex_runtime_credentials() - Sets api_mode to codex_responses - Fails gracefully if no Codex OAuth session exists Also added to the commented-out config.yaml example. 2 new tests (codex activation + graceful failure).	2026-03-08 21:34:15 -07:00
teknium1	b3765c28d0	fix: restrict fallback providers to actual hermes providers Remove hallucinated providers (openai, deepseek, together, groq, fireworks, mistral, gemini, nous) from the fallback provider map. These don't exist in hermes-agent's provider system. The real supported providers for fallback are: openrouter (OPENROUTER_API_KEY) zai (ZAI_API_KEY) kimi-coding (KIMI_API_KEY) minimax (MINIMAX_API_KEY) minimax-cn (MINIMAX_CN_API_KEY) For any other OpenAI-compatible endpoint, users can use the base_url + api_key_env overrides in the config. Also adds Kimi User-Agent header for kimi fallback (matching the main provider system).	2026-03-08 20:49:55 -07:00
teknium1	161436cfdd	feat: simple fallback model for provider resilience When the primary model/provider fails after retries (rate limit, overload, auth errors, connection failures), Hermes automatically switches to a configured fallback model for the remainder of the session. Config (in ~/.hermes/config.yaml): fallback_model: provider: openrouter model: anthropic/claude-sonnet-4 Supports all major providers: OpenRouter, OpenAI, Nous, DeepSeek, Together, Groq, Fireworks, Mistral, Gemini — plus custom endpoints via base_url and api_key_env overrides. Design principles: - Dead simple: one fallback model, not a chain - One-shot: switches once, doesn't ping-pong back - Zero new dependencies: uses existing OpenAI client - Minimal code: ~100 lines in run_agent.py, ~5 lines in cli.py/gateway - Three trigger points: max retries exhausted, non-retryable client errors, and invalid response exhaustion Does NOT trigger on context overflow or payload-too-large errors (those are handled by the existing compression system). Addresses #737. 25 new tests, 2492 total passing.	2026-03-08 20:22:33 -07:00
teknium1	2394e18729	fix: add context to interruption messages for model awareness When the agent is interrupted, the model now receives descriptive context instead of a generic 'Operation interrupted.' string: - Tool skip messages include the tool name: '[Tool execution cancelled — terminal was skipped due to user interrupt]' '[Tool execution skipped — web_search was not started. User sent a new message]' - API call interrupts include timing: 'Operation interrupted: waiting for model response (4.2s elapsed).' - Retry/error interrupts include retry context: 'Operation interrupted: retrying API call after rate limit (retry 2/5).' 'Operation interrupted: handling API error (Timeout: connection timed out).' This helps the model understand what was happening when it was interrupted, reducing wasted iterations spent re-discovering state.	2026-03-08 18:58:23 -07:00
teknium1	60b6abefd9	feat: session naming with unique titles, auto-lineage, rich listing, resume by name - Schema v4: unique title index, migration from v2/v3 - set/get/resolve session titles with uniqueness enforcement - Auto-lineage: context compression auto-numbers titles (Task -> Task #2 -> Task #3) - resolve_session_by_title: auto-latest finds most recent continuation - list_sessions_rich: preview (first 60 chars) + last_active timestamp - CLI: -c accepts optional name arg (hermes -c 'my project') - CLI: /title command with deferred mode (set before session exists) - CLI: sessions list shows Title, Preview, Last Active, ID - 27 new tests (1844 total passing)	2026-03-08 15:20:29 -07:00
0xbyt4	9eee529a7f	fix: detect and warn on file re-read loops after context compression When context compression summarizes conversation history, the agent loses track of which files it already read and re-reads them in a loop. Users report the agent reading the same files endlessly without writing. Root cause: context compression is lossy — file contents and read history are lost in the summary. After compression, the model thinks it hasn't examined the files yet and reads them again. Fix (two-part): 1. Track file reads per task in file_tools.py. When the same file region is read again, include a _warning in the response telling the model to stop re-reading and use existing information. 2. After context compression, inject a structured message listing all files already read in the session with explicit "do NOT re-read" instruction, preserving read history across compression boundaries. Adds 16 tests covering warning detection, task isolation, summary accuracy, tracker cleanup, and compression history injection.	2026-03-08 20:44:42 +03:00
teknium1	19b6f81ee7	fix: allow Anthropic API URLs as custom OpenAI-compatible endpoints Removed the hard block on base_url containing 'api.anthropic.com'. Anthropic now offers an OpenAI-compatible /chat/completions endpoint, so blocking their URL prevents legitimate use. If the endpoint isn't compatible, the API call will fail with a proper error anyway. Removed from: run_agent.py, mini_swe_runner.py Updated test to verify Anthropic URLs are accepted.	2026-03-07 23:36:35 -08:00
Christo Mitov	4447e7d71a	fix: add Kimi Code API support (api.kimi.com/coding/v1) Kimi Code (platform.kimi.ai) issues API keys prefixed sk-kimi- that require: 1. A different base URL: api.kimi.com/coding/v1 (not api.moonshot.ai/v1) 2. A User-Agent header identifying a recognized coding agent Without this fix, sk-kimi- keys fail with 401 (wrong endpoint) or 403 ('only available for Coding Agents') errors. Changes: - Auto-detect sk-kimi- key prefix and route to api.kimi.com/coding/v1 - Send User-Agent: KimiCLI/1.0 header for Kimi Code endpoints - Legacy Moonshot keys (api.moonshot.ai) continue to work unchanged - KIMI_BASE_URL env var override still takes priority over auto-detection - Updated .env.example with correct docs and all endpoint options - Fixed doctor.py health check for Kimi Code keys Reference: https://github.com/MoonshotAI/kimi-cli (platforms.py)	2026-03-07 21:00:12 -05:00
vincent	86eed141af	fix: rebuild compressed payload before retry	2026-03-07 18:55:01 -05:00
teknium1	e64d646bad	Critical: fix bug in new subagent tool call budget to not be session-level but tool call loop level	2026-03-07 10:32:51 -08:00
teknium1	b84f9e410c	feat: default reasoning effort from xhigh to medium Reduces token usage and latency for most tasks by defaulting to medium reasoning effort instead of xhigh. Users can still override via config or CLI flag. Updates code, tests, example config, and docs.	2026-03-07 10:14:19 -08:00
teknium1	23e84de830	refactor: remove model parameter from AIAgent initialization Eliminated the model parameter from the AIAgent class initialization, streamlining the constructor and ensuring consistent behavior across agent instances. This change aligns with recent updates to the task delegation logic.	2026-03-07 09:48:19 -08:00
teknium1	5a711f32b1	fix: enhance payload and context compression handling Added logic to manage multiple compression attempts for large payloads and context length errors. Introduced limits on compression attempts to prevent infinite retries, with appropriate logging and error handling. This ensures better resilience and user feedback when facing compression issues during API calls.	2026-03-07 09:19:07 -08:00
0xbyt4	8c26a057a3	fix: reset all retry counters at start of run_conversation() _incomplete_scratchpad_retries and _codex_incomplete_retries were not reset at the start of run_conversation(). In CLI mode, where the same AIAgent instance is reused across conversations, stale counters from a previous conversation could carry over, causing premature retry exhaustion and partial responses.	2026-03-07 20:12:08 +03:00
teknium1	4d34427cc7	fix: update model version in agent configurations Updated the default model version from "anthropic/claude-sonnet-4-20250514" to "anthropic/claude-sonnet-4.6" across multiple files including AGENTS.md, batch_runner.py, mini_swe_runner.py, and run_agent.py for consistency and to reflect the latest model improvements.	2026-03-07 09:06:37 -08:00
teknium1	0a82396718	feat: shared iteration budget across parent + subagents Subagent tool calls now count toward the same session-wide iteration limit as the parent agent. Previously, each subagent had its own independent counter, so a parent with max_iterations=60 could spawn 3 subagents each doing 50 calls = 150 total tool calls unmetered. Changes: - IterationBudget: thread-safe shared counter (run_agent.py) - consume(): try to use one iteration, returns False if exhausted - refund(): give back one iteration (for execute_code turns) - Thread-safe via Lock (subagents run in ThreadPoolExecutor) - Parent creates the budget, children inherit it via delegate_tool.py - execute_code turns are refunded (don't count against budget) - Default raised from 60 → 90 to account for shared consumption - Per-child cap (50) still applies as a safety valve The per-child max_iterations (default 50) remains as a per-child ceiling, but the shared budget is the hard session-wide limit. A child stops at whichever comes first.	2026-03-07 08:16:37 -08:00
teknium1	5da55ea1e3	fix: sanitize orphaned tool-call/result pairs in message compression Enhance message compression by adding a method to clean up orphaned tool-call and tool-result pairs. This ensures that the API receives well-formed messages, preventing errors related to mismatched IDs. The new functionality includes removing orphaned results and adding stub results for missing calls, improving overall message integrity during compression.	2026-03-07 08:08:00 -08:00
teknium1	69a36a3361	Merge PR #309: fix(timezone): timezone-aware now() for prompt, cron, and execute_code Authored by areu01or00. Adds timezone support via hermes_time.now() helper with IANA timezone resolution (HERMES_TIMEZONE env → config.yaml → server-local). Updates system prompt timestamp, cron scheduling, and execute_code sandbox TZ injection. Includes config migration (v4→v5) and comprehensive test coverage.	2026-03-07 00:04:41 -08:00
Robin Fernandes	bc091eb7ef	fix: implement Nous credential refresh on 401 error for retry logic	2026-03-07 13:34:23 +11:00
teknium1	8ae4a6f824	fix: improve handling of empty responses after tool calls - Added fallback mechanism to utilize previous content when the model generates an empty response after tool calls, reducing unnecessary API retries. - Enhanced logging to indicate when prior content is used as a final response. - Updated logic to ensure that genuine empty responses are retried appropriately, maintaining user experience.	2026-03-06 16:54:31 -08:00
teknium1	3e93db16bd	Merge PR #436: fix: use _max_tokens_param in max-iterations retry path Authored by Farukest. Fixes #435. The retry summary in _handle_max_iterations() hardcoded max_tokens instead of using _max_tokens_param(), which returns max_completion_tokens for direct OpenAI API (required by gpt-4o, o-series). The first attempt already used _max_tokens_param correctly — only the retry path was wrong. Includes 4 tests for _max_tokens_param provider detection.	2026-03-06 04:46:24 -08:00
teknium1	c886333d32	feat: smart context length probing with persistent caching + banner display Replaces the unsafe 128K fallback for unknown models with a descending probe strategy (2M → 1M → 512K → 200K → 128K → 64K → 32K). When a context-length error occurs, the agent steps down tiers and retries. The discovered limit is cached per model+provider combo in ~/.hermes/context_length_cache.yaml so subsequent sessions skip probing. Also parses API error messages to extract the actual context limit (e.g. 'maximum context length is 32768 tokens') for instant resolution. The CLI banner now displays the context window size next to the model name (e.g. 'claude-opus-4 · 200K context · Nous Research'). Changes: - agent/model_metadata.py: CONTEXT_PROBE_TIERS, persistent cache (save/load/get), parse_context_limit_from_error(), get_next_probe_tier() - agent/context_compressor.py: accepts base_url, passes to metadata - run_agent.py: step-down logic in context error handler, caches on success - cli.py + hermes_cli/banner.py: context length in welcome banner - tests: 22 new tests for probing, parsing, and caching Addresses #132. PR #319's approach (8K default) rejected — too conservative.	2026-03-05 16:09:57 -08:00
PercyDikec	938499ddfb	fix: add missing empty-content guard after think-block stripping in retry path	2026-03-05 18:57:59 +03:00
Farukest	e25ad79d5d	fix: use _max_tokens_param in max-iterations retry path The retry summary in _handle_max_iterations hardcodes max_tokens instead of calling _max_tokens_param(). For direct OpenAI API users (gpt-4o, o-series), the correct parameter name is max_completion_tokens. The first attempt at line 2697 already uses _max_tokens_param correctly but the retry path at line 2743 was missed.	2026-03-05 17:49:37 +03:00
teknium1	41adca4e77	fix: strip internal fields from API messages in _handle_max_iterations The flush_memories() and run_conversation() code paths already stripped finish_reason and reasoning from API messages (added in `7a0b377` via PR #253), but _handle_max_iterations() was missed. It was sending raw messages.copy() which could include finish_reason, causing 422 errors on strict APIs like Mistral when the agent hit max iterations. Now strips the same internal fields consistently across all three API call sites.	2026-03-04 21:08:20 -08:00
teknium1	3220bb8aaa	Merge PR #403: Fix context overrun crash with local LLM backends Authored by ch3ronsa. Fixes #348. Adds 'context size' (LM Studio) and 'context window' (Ollama) to context-length error detection phrases so local backend 400 errors trigger compression instead of aborting. Also removes 'error code: 400' from the non-retryable error list as defense in depth.	2026-03-04 17:48:44 -08:00
teknium1	8311e8984b	fix: preflight context compression + error handler ordering for model switches Two fixes for the case where a user switches to a model with a smaller context window while having a large existing session: 1. Preflight compression in run_conversation(): Before the main loop, estimate tokens of loaded history + system prompt. If it exceeds the model's compression threshold (85% of context), compress proactively with up to 3 passes. This naturally handles model switches because the gateway creates a fresh AIAgent per message with the current model's context length. 2. Error handler reordering: Context-length errors (400 with 'maximum context length' etc.) are now checked BEFORE the generic 4xx handler. Previously, OpenRouter's 400-status context-length errors were caught as non-retryable client errors and aborted immediately, never reaching the compression+retry logic. Reported by Sonicrida on Discord: 840-message session (2MB+) crashed after switching from a large-context model to minimax via OpenRouter.	2026-03-04 14:42:41 -08:00
Vicaversa	e9ab711b66	Fix context overrun crash with local LLM backends (fixes #348) Local backends (LM Studio, Ollama, llama.cpp) return HTTP 400 with messages like "Context size has been exceeded" when the context window is full. The error phrase list did not include "context size" or "context window", so these errors fell through to the generic 4xx abort handler instead of triggering compression. Changes: - Move context-length check above generic 4xx handler so it runs first (same pattern as the existing 413 check) - Add "context size" and "context window" to the phrase list - Guard 4xx handler with `not is_context_length_error` to prevent context-related 400s from being treated as non-retryable	2026-03-05 01:12:34 +03:00
teknium1	70a0a5ff4a	fix: exclude current session from session_search results session_search was returning the current session if it matched the query, which is redundant — the agent already has the current conversation context. This wasted an LLM summarization call and a result slot. Added current_session_id parameter to session_search(). The agent passes self.session_id and the search filters out any results where either the raw or parent-resolved session ID matches. Both the raw match and the parent-resolved match are checked to handle child sessions from delegation. Two tests added verifying the exclusion works and that other sessions are still returned.	2026-03-04 06:06:40 -08:00
teknium1	db0521ce0e	Merge PR #184: feat: Home Assistant integration (REST tools + WebSocket gateway) Authored by 0xbyt4. Adds smart home control via REST tools (ha_list_entities, ha_get_state, ha_call_service) with domain blocklist and entity_id validation, plus WebSocket gateway adapter for real-time event monitoring. Also includes Gemini 3 thought_signature preservation fix (extra_content on tool calls) needed for multi-turn tool calling via OpenRouter.	2026-03-03 05:01:39 -08:00
areu01or00	a1c25046a9	fix(timezone): add timezone-aware clock across agent, cron, and execute_code	2026-03-03 18:23:40 +05:30
0xbyt4	aefc330b8f	merge: resolve conflict with main (add mcp + homeassistant extras)	2026-03-03 14:52:22 +03:00
teknium1	4f5ffb8909	fix: NoneType not iterable error when summarizing at max iterations In _handle_max_iterations, the codex_responses path set tools=None to prevent tool calls during summarization. However, the OpenAI SDK's _make_tools() treats None as a valid value (not its Omit sentinel) and tries to iterate over it, causing TypeError: 'NoneType' object is not iterable. Fix: use codex_kwargs.pop('tools', None) to remove the key entirely, so the SDK never receives it and uses its default omit behavior. Fixes #300	2026-03-03 03:42:44 -08:00
teknium1	3c13feed4c	feat: show detailed tool call args in gateway based on config Issue #263: Telegram/Discord/WhatsApp/Slack now show tool call details based on display.tool_progress in config.yaml. Changes: - gateway/run.py: 'verbose' mode shows full args (keys + JSON, 200 char max). 'all' mode preview increased from 40 to 80 chars. Added missing tool emojis (execute_code, delegate_task, clarify, skill_manage, search_files). - agent/display.py: Added execute_code, delegate_task, clarify, skill_manage to primary_args. Added 'code' and 'goal' to fallback keys. - run_agent.py: Pass function_args dict to tool_progress_callback so gateway can format based on its own verbosity config. Config usage: display: tool_progress: verbose # off | new | all | verbose	2026-03-02 05:23:15 -08:00
teknium1	56b53bff6e	Merge PR #229: fix(agent): copy conversation_history to avoid mutating caller's list Authored by Farukest. Fixes #228. # Conflicts: # tests/test_run_agent.py	2026-03-02 04:21:39 -08:00
teknium1	c4ea996612	fix: repair flush sentinel test — mock auxiliary client and add guard The TestFlushSentinelNotLeaked test from PR #227 had two issues: 1. flush_memories() uses get_text_auxiliary_client() which could bypass agent.client entirely — mock it to return (None, None) 2. No assertion that the API was actually called — added guard assert Without these fixes the test passed vacuously (API never called).	2026-03-02 03:21:08 -08:00
teknium1	e27e3a4f8a	Merge PR #223: fix: correct off-by-one in retry exhaustion checks Authored by Farukest. Fixes #222.	2026-03-02 02:54:10 -08:00
teknium1	33ab5cec82	fix: handle None message content across codebase (fixes #276) The OpenAI API returns content: null on assistant messages with tool calls. msg.get('content', '') returns None when the key exists with value None, causing TypeError on len(), string concatenation, and .strip() in downstream code paths. Fixed 4 locations that process conversation messages: - agent/auxiliary_client.py:84 — None passed to API calls - cli.py:1288 — crash on content[:200] and len(content) - run_agent.py:3444 — crash on None.strip() - honcho_integration/session.py:445 — 'None' rendered in transcript 13 other instances were verified safe (already protected, only process user/tool messages, or use the safe pattern). Pattern: msg.get('content', '') → msg.get('content') or '' Fixes #276	2026-03-02 02:23:53 -08:00
Sertug17	7a0b37712f	fix(agent): strip finish_reason from assistant messages to fix Mistral 422 errors (#253) * fix(agent): skip reasoning param for Mistral API to prevent 422 errors * fix(agent): strip finish_reason from assistant messages to fix Mistral 422 errors	2026-03-02 00:35:03 -08:00
teknium1	45d132d098	fix(agent): remove preview truncation in assistant message output Updated the AIAgent class to print the full content of assistant messages without truncation, enhancing visibility of the messages during runtime. This change improves the clarity of communication from the agent.	2026-03-02 00:32:06 -08:00
teknium1	0512ada793	feat(agent): include tools in agent status output Added the tools attribute to the AIAgent class's status output, ensuring that the current tools used by the agent are included in the status information. This enhancement improves the visibility of the agent's capabilities during runtime.	2026-03-02 00:13:41 -08:00
teknium1	47289ba6f1	feat(agent): include system prompt in agent status output Added the system prompt to the AIAgent class's status output, ensuring that the current system prompt is included in the agent's status information. This enhancement improves visibility into the agent's configuration during runtime.	2026-03-01 23:50:54 -08:00
teknium1	e5893075f9	feat(agent): add summary handling for reasoning items Enhanced the AIAgent class to capture and normalize summary information for reasoning items. Implemented logic to handle summaries as lists, ensuring proper formatting for API interactions. Updated tests to validate the inclusion of summaries in reasoning items, both for existing and default cases.	2026-03-01 20:03:03 -08:00
teknium1	8bc2de4ab6	feat(provider-routing): add OpenRouter provider routing configuration Introduced a new `provider_routing` section in the CLI configuration to control how requests are routed across providers when using OpenRouter. This includes options for sorting providers by throughput, latency, or price, as well as allowing or ignoring specific providers, setting the order of provider attempts, and managing data collection policies. Updated relevant classes and documentation to support these features, enhancing flexibility in provider selection.	2026-03-01 18:24:27 -08:00
teknium1	92da8e7e62	feat(agent): enhance reasoning handling and configuration Added support for processing encrypted reasoning content within the AIAgent class. Introduced logic to determine reasoning effort and enable/disable reasoning based on configuration settings. Updated the kwargs to reflect these changes, ensuring proper handling of reasoning parameters during agent execution.	2026-03-01 16:15:20 -08:00
0xbyt4	3fdf03390e	Merge remote-tracking branch 'origin/main' into feature/homeassistant-integration # Conflicts: # run_agent.py	2026-03-01 11:59:12 +03:00
teknium1	177be32b7f	feat(cli): add /usage command to display session token usage Introduced a new command "/usage" in the CLI to show cumulative token usage for the current session. This includes details on prompt tokens, completion tokens, total tokens, API calls, and context state. Updated command documentation to reflect this addition. Enhanced the AIAgent class to track token usage throughout the session.	2026-03-01 00:23:19 -08:00
lila	dd69f16c3e	feat(gateway): expose subagent tool calls and thinking to user (fixes #169) (#186) When subagents run via delegate_task, the user now sees real-time progress instead of silence: CLI: tree-view activity lines print above the delegation spinner 🔀 Delegating: research quantum computing ├─ 💭 "I'll search for papers first..." ├─ 🔍 web_search "quantum computing" ├─ 📖 read_file "paper.pdf" └─ ⠹ working... (18.2s) Gateway (Telegram/Discord): batched progress summaries sent every 5 tool calls to avoid message spam. Remaining tools flushed on subagent completion. Changes: - agent/display.py: add KawaiiSpinner.print_above() to print status lines above an active spinner without disrupting animation. Uses captured stdout (self._out) so it works inside the child's redirect_stdout(devnull). - tools/delegate_tool.py: add _build_child_progress_callback() that creates a per-child callback relaying tool calls and thinking events to the parent's spinner (CLI) or progress queue (gateway). Each child gets its own callback instance, so parallel subagents don't share state. Includes _flush() for gateway batch completion. - run_agent.py: fire tool_progress_callback with '_thinking' event when the model produces text content. Guarded by _delegate_depth > 0 so only subagents fire this (prevents gateway spam from main agent). REASONING_SCRATCHPAD/think/ reasoning XML tags are stripped before display. Tests: 21 new tests covering print_above, callback builder, thinking relay, SCRATCHPAD filtering, batching, flush, thread isolation, delegate_depth guard, and prefix handling.	2026-02-28 23:18:00 -08:00
teknium1	23d0b7af6a	feat(logging): implement persistent error logging for tool failures - Introduce a separate error log for capturing warnings and errors related to tool execution, ensuring detailed inspection of issues post-failure. - Enhance error handling in the AIAgent class to log exceptions with stack traces for better debugging. - Add a similar error logging mechanism in the gateway to streamline debugging processes.	2026-02-28 22:49:58 -08:00
teknium1	95b0610f36	refactor(cli, auth): Add Codex/OpenAI OAuth Support - finalized - Replace `hermes login` with `hermes model` for selecting providers and managing authentication. - Update documentation and CLI commands to reflect the new provider selection process. - Introduce a new redaction system for logging sensitive information. - Enhance Codex model discovery by integrating API fetching and local cache. - Adjust max turns configuration logic for better clarity and precedence. - Improve error handling and user feedback during authentication processes.	2026-02-28 21:56:27 -08:00
teknium1	500f0eab4a	refactor(cli): Finalize OpenAI Codex Integration with OAuth - Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults. - Updated the context compressor's summary target tokens to 2500 for improved performance. - Added external credential detection for Codex CLI to streamline authentication. - Refactored various components to ensure consistent handling of authentication and model selection across the application.	2026-02-28 21:47:51 -08:00
Teknium	5a79e423fe	Merge branch 'main' into codex/align-codex-provider-conventions-mainrepo	2026-02-28 18:13:38 -08:00
teknium1	7f7643cf63	feat(hooks): introduce event hooks system for lifecycle management Add a new hooks system allowing users to run custom code at key lifecycle points in the agent's operation. This includes support for events such as `gateway:startup`, `session:start`, `agent:step`, and more. Documentation for creating hooks and available events has been added to `README.md` and a new `hooks.md` file. Additionally, integrate step callbacks in the agent to facilitate hook execution during tool-calling iterations.	2026-02-28 17:09:26 -08:00
Teknium	31a5cd185a	Merge pull request #174 from Bartok9/fix-think-block-leakage fix: strip <think> blocks from final response to users	2026-02-28 16:43:47 -08:00
Farukest	e87859e82c	fix(agent): copy conversation_history to avoid mutating caller's list	2026-03-01 03:06:13 +03:00
Farukest	de101a8202	fix(agent): strip _flush_sentinel from API messages	2026-03-01 02:51:31 +03:00
Farukest	c33f8d381b	fix: correct off-by-one in retry exhaustion checks The retry exhaustion checks used > instead of >= to compare retry_count against max_retries. Since the while loop condition is retry_count < max_retries, the check retry_count > max_retries can never be true inside the loop. When retries are exhausted, the loop exits and falls through to response.choices[0] on an invalid response, crashing with IndexError instead of returning a proper error.	2026-03-01 02:27:26 +03:00
teknium1	2205b22409	fix(headers): update X-OpenRouter-Categories to include 'productivity'	2026-02-28 10:38:49 -08:00
0xbyt4	dfd50ceccd	fix: preserve Gemini thought_signature in tool call messages Gemini 3 thinking models attach extra_content with thought_signature to function call responses. This must be echoed back on subsequent API calls or the server rejects with a 400 error. The assistant message builder was dropping this field, causing all Gemini 3 Flash/Pro tool-calling flows to fail after the first function call.	2026-02-28 18:10:05 +03:00
teknium1	6366177118	refactor: update context compression configuration to use config.yaml and improve model handling	2026-02-28 04:46:38 -08:00
Bartok9	1e463a8e39	fix: strip <think> blocks from final response to users Fixes #149 The _strip_think_blocks() method existed but was not applied to the final_response in the normal completion path. This caused <think>...</think> XML tags to leak into user-facing responses on all platforms (CLI, Telegram, Discord, Slack, WhatsApp). Changes: - Strip think blocks from final_response before returning in normal path (line ~2600) - Strip think blocks from fallback content when salvaging from prior tool_calls turn Notes: - The raw content with think blocks is preserved in messages[] for trajectory export - this only affects the user-facing final_response - The _has_content_after_think_block() check still uses raw content before stripping, which is correct for detecting think-only responses	2026-02-28 03:06:20 -05:00
Teknium	4a9086b848	Merge branch 'main' into feat/honcho-integration	2026-02-27 23:32:49 -08:00
teknium1	50cb4d5fc7	fix(agent): update error message for unsupported Anthropic API endpoints to clarify usage of OpenRouter	2026-02-27 23:23:31 -08:00
Teknium	2bc9508b7c	Merge pull request #173 from adavyas/fix/anthropic-base-url-guard fix(agent): fail fast on Anthropic native base URLs	2026-02-27 23:22:01 -08:00
teknium1	19f28a633a	fix(agent): enhance 413 error handling and improve conversation history management in tests	2026-02-27 23:04:32 -08:00
Teknium	2c817ce4a5	Merge pull request #153 from tekelala/main fix(agent): handle 413 payload-too-large via compression instead of aborting	2026-02-27 22:57:55 -08:00
adavyas	0c0a2eb0a2	fix(agent): fail fast on Anthropic native base URLs	2026-02-27 21:19:29 -08:00
teknium1	de0829cec3	fix(cli): increase max iterations for child agents and extend API call timeout for improved reliability	2026-02-27 17:35:29 -08:00
tekelala	79bd65034c	fix(agent): handle 413 payload-too-large via compression instead of aborting The 413 "Request Entity Too Large" error from the LLM API was caught by the generic 4xx handler which aborts immediately. This is wrong for 413 — it's a payload-size issue that can be resolved by compressing conversation history. - Intercept 413 before the generic 4xx block and route to _compress_context - Exclude 413 from generic is_client_error detection - Add 'request entity too large' to context-length phrases as safety net - Add tests for 413 compression behavior Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-27 12:21:27 -05:00
teknium1	c77f3da0ce	Cherry-pick 6 bug fixes from PR #76 and update documentation Code fixes (run_agent.py): - Fix off-by-one in _flush_messages_to_session_db skipping one message per flush - Add clear_interrupt() to 3 early-return paths preventing stale interrupt state - Wrap handle_function_call in try/except so tool crashes don't kill the conversation - Replace fragile `is` identity check with _flush_sentinel marker for memory flush cleanup - Fix retry loop off-by-one (6 attempts not 7) - Remove redundant inline `import re`	2026-02-27 03:21:49 -08:00
Bartok Moltbot	8aa531c7fa	fix(gateway): Pass session_db to AIAgent, fixing session_search error When running via the gateway (e.g. Telegram), the session_search tool returned: {"error": "session_search must be handled by the agent loop"} Root cause: - gateway/run.py creates AIAgent without passing session_db= - self._session_db is None in the agent instance - The dispatch condition "elif function_name == 'session_search' and self._session_db" skips when _session_db is None, falling through to the generic error This fix: 1. Initializes self._session_db in GatewayRunner.__init__() 2. Passes session_db to all AIAgent instantiations in gateway/run.py 3. Adds defensive fallback in run_agent.py to return a clear error when session_db is unavailable, instead of falling through Fixes #105	2026-02-27 00:32:17 -05:00
teknium1	58fce0a37b	feat(api): implement dynamic max tokens handling for various providers - Added _max_tokens_param method in AIAgent to return appropriate max tokens parameter based on the provider (OpenAI vs. others). - Updated API calls in AIAgent to utilize the new max tokens handling. - Introduced auxiliary_max_tokens_param function in auxiliary_client for consistent max tokens management across auxiliary clients. - Refactored multiple tools to use auxiliary_max_tokens_param for improved compatibility with different models and providers.	2026-02-26 20:23:56 -08:00
Erosika	70d1abf81b	refactor: run Honcho and USER.md in tandem USER.md stays in system prompt when Honcho is active -- prefetch is additive context, not a replacement. Memory tool user observations write to both USER.md (local) and Honcho (cross-session) simultaneously.	2026-02-26 18:07:33 -05:00
Erosika	1fd0fcddb2	feat: integrate Honcho with USER.md memory system When Honcho is active: - System prompt uses Honcho prefetch instead of USER.md - memory tool target=user add routes to Honcho - MEMORY.md untouched in all cases When disabled, everything works as before. Also wires up contextTokens config to cap prefetch size.	2026-02-26 18:07:17 -05:00
Erosika	ab4bbf2fb2	feat: add Honcho AI-native memory integration Opt-in persistent cross-session user modeling via Honcho. Reads ~/.honcho/config.json as single source of truth (shared with Claude Code, Cursor, and other Honcho-enabled tools). Zero impact when disabled or unconfigured. - honcho_integration/ package (client, session manager, peer resolution) - Host-based config resolution matching claude-honcho/cursor-honcho pattern - Prefetch user context into system prompt per conversation turn - Sync user/assistant messages to Honcho after each exchange - query_user_context tool for mid-conversation dialectic reasoning - Gated activation: requires ~/.honcho/config.json with enabled=true	2026-02-26 18:07:17 -05:00
George Pickett	32070e6bc0	Merge remote-tracking branch 'origin/main' into codex/align-codex-provider-conventions-mainrepo # Conflicts: # cron/scheduler.py # gateway/run.py # tools/delegate_tool.py	2026-02-26 10:56:29 -08:00
Dean Kerr	5a569eb1b6	fix: resolve .env and config paths from HERMES_HOME, not PROJECT_ROOT The `hermes` CLI entry point (hermes_cli/main.py) and the agent runner (run_agent.py) only loaded .env from the project installation directory. After the standard installer, code lives at ~/.hermes/hermes-agent/ but config lives at ~/.hermes/ — so the .env was never found. Aligns these entry points with the pattern already used by gateway/run.py and rl_cli.py: load ~/.hermes/.env first, fall back to project root .env for dev-mode compatibility. Also fixes: - status.py checking .env existence and API keys at PROJECT_ROOT - doctor.py KeyError on tool availability (missing_vars vs env_vars) - doctor.py checking logs/ and Skills Hub at PROJECT_ROOT instead of HERMES_HOME - doctor.py redundant logs/ check (already covered by subdirectory loop) - mini-swe-agent loading config from platformdirs default instead of ~/.hermes/ Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 16:49:14 +11:00
George Pickett	74c662b63a	Harden Codex auth refresh and responses compatibility	2026-02-25 19:27:54 -08:00
George Pickett	91bdb9eb2d	Fix Codex stream fallback for Responses completion gaps	2026-02-25 19:08:11 -08:00
George Pickett	47f16505d2	Omit optional function_call id in Responses replay input	2026-02-25 19:00:11 -08:00
George Pickett	e63986b534	Harden Codex stream handling and ack continuation	2026-02-25 18:56:06 -08:00
George Pickett	ce175d7372	Fix Codex Responses continuation and schema parity	2026-02-25 18:20:41 -08:00
George Pickett	609b19b630	Add OpenAI Codex provider runtime and responses integration (without .agent/PLANS.md)	2026-02-25 18:20:38 -08:00
teknium1	e3cb957a10	refactor: streamline reasoning configuration checks in AIAgent - Simplified the logic for determining support for reasoning based on the base URL by introducing clearer variable names. - Added product attribution for the Nous Portal to the extra body of requests when applicable, enhancing tagging for better tracking.	2026-02-25 16:49:41 -08:00
teknium1	9a858b8d67	add identifier for openrouter calls	2026-02-25 16:34:47 -08:00
teknium1	d72b9eadec	More fixes for windoze	2026-02-25 15:20:42 -08:00
teknium1	f64a87209d	refactor: enhance session content handling in AIAgent and update TTS output path - Introduced a new static method `_clean_session_content` in the `AIAgent` class to convert REASONING_SCRATCHPAD tags to <think> blocks and clean up whitespace in session logs. - Updated the `_save_session_log` method to utilize the cleaned content for assistant messages, ensuring consistency in session logs. - Changed the default output directory for TTS audio files from `~/voice-memos` to `~/.hermes/audio_cache`, reflecting a more appropriate storage location.	2026-02-25 04:22:03 -08:00
teknium1	41df8ee4f5	refactor: enhance interrupt handling in AIAgent class - Updated the `clear_interrupt` method to also reset the global tool interrupt signal, improving the clarity of interrupt management within the agent. - This change ensures that all interrupt states are properly cleared, enhancing the reliability of the agent's operation.	2026-02-25 03:45:47 -08:00
teknium1	681141a526	fix: ansi escapes causing broken terminal cli output	2026-02-24 03:42:12 -08:00
teknium1	e049441d93	feat: add reasoning effort configuration for agent - Introduced a new configuration option for reasoning effort in the CLI, allowing users to specify the level of reasoning the agent should perform before responding. - Updated the CLI and agent initialization to incorporate the reasoning configuration, enhancing the agent's responsiveness and adaptability. - Implemented logic to load reasoning effort from environment variables and configuration files, providing flexibility in agent behavior. - Enhanced the documentation in the example configuration file to clarify the new reasoning effort options available.	2026-02-24 03:30:19 -08:00
teknium1	2bf96ad244	feat: add ephemeral prefill messages and system prompt loading - Implemented functionality to load ephemeral prefill messages from a JSON file, enhancing few-shot priming capabilities for the agent. - Introduced a mechanism to load an ephemeral system prompt from environment variables or configuration files, ensuring dynamic prompt adjustments at API-call time. - Updated the CLI and agent initialization to utilize the new prefill messages and system prompt, improving the overall interaction experience. - Enhanced configuration options with new environment variables for prefill messages and system prompts, allowing for greater customization without persistence.	2026-02-23 23:55:42 -08:00
teknium1	d18c753b3c	refactor: streamline scratchpad handling in AIAgent - Removed static methods for converting and checking <REASONING_SCRATCHPAD> tags, simplifying the codebase. - Replaced calls to the removed methods with direct function calls for better clarity and maintainability. - Updated trajectory saving logic to utilize a dedicated function for improved organization and readability.	2026-02-23 09:55:09 -08:00
teknium1	90af34bc83	feat: enhance interrupt handling and container resource configuration - Introduced a shared interrupt signaling mechanism to allow tools to check for user interrupts during long-running operations. - Updated the AIAgent to handle interrupts more effectively, ensuring in-progress tool calls are canceled and multiple interrupt messages are combined into one prompt. - Enhanced the CLI configuration to include container resource limits (CPU, memory, disk) and persistence options for Docker, Singularity, and Modal environments. - Improved documentation to clarify interrupt behaviors and container resource settings, providing users with better guidance on configuration and usage.	2026-02-23 02:11:33 -08:00
teknium1	c7857dc1d4	feat: enhance AIAgent's tool usage nudges and content handling - Introduced a method to strip <think> blocks from content, improving text visibility. - Implemented counters to reset nudge intervals when memory and skill tools are used, enhancing user guidance. - Captured content from turns with tool calls to provide fallback responses, ensuring continuity in conversation. - Updated nudge logic to remind users about saving memories and creating skills based on interaction patterns.	2026-02-22 21:33:28 -08:00
teknium1	6037b6a5ab	Fix session saving to DB with full conversation history (not just user/assistant messages without tool calls)	2026-02-22 17:10:24 -08:00
teknium1	db23f51bc6	feat: introduce skills management features in AIAgent and CLI - Added skills configuration options in cli-config.yaml.example, including a nudge interval for skill creation reminders. - Implemented skills guidance in AIAgent to prompt users to save reusable workflows after complex tasks. - Enhanced skills indexing in the prompt builder to include descriptions from SKILL.md files for better context. - Updated the agent's behavior to periodically remind users about potential skills during tool-calling iterations.	2026-02-22 13:28:13 -08:00
teknium1	3c6750f37b	feat: enhance memory management features in AIAgent and CLI - Added configuration options for memory nudge interval and flush minimum turns in cli-config.yaml.example. - Implemented memory flushing before conversation reset, clearing, and exit in the CLI to ensure memories are saved. - Introduced a flush_memories method in AIAgent to handle memory persistence before context loss. - Added periodic nudges to remind the agent to consider saving memories based on user interactions.	2026-02-22 10:15:17 -08:00
teknium1	e223b4ac09	Enhance agent guidance with memory and session search tools - Introduced MEMORY_GUIDANCE and SESSION_SEARCH_GUIDANCE to improve agent's contextual awareness and proactive assistance. - Updated AIAgent to conditionally include tool-aware guidance in prompts based on available tools. - Enhanced descriptions in memory and session search schemas for clearer user instructions on when to utilize these features.	2026-02-22 02:31:52 -08:00
teknium1	f072801f38	refactor: remove unused compression model variable in AIAgent - Eliminated the `compression_model` variable from the AIAgent class, as it was not being utilized. - Cleaned up the context compressor initialization for improved clarity and maintainability.	2026-02-22 02:17:33 -08:00
teknium1	ededaaa874	Hermes Agent UX Improvements	2026-02-22 02:16:11 -08:00
teknium1	51b95236f9	refactor: move model metadata functions to agent/model_metadata.py - Relocated functions related to model metadata, including fetch_model_metadata, get_model_context_length, estimate_tokens_rough, and estimate_messages_tokens_rough, to agent/model_metadata.py for better organization and maintainability. - Updated imports in run_agent.py to reflect the new location of these functions.	2026-02-21 22:34:18 -08:00
teknium1	9123cfb5dd	Refactor Terminal and AIAgent cleanup	2026-02-21 22:31:43 -08:00
teknium1	8f6788474b	feat: enhance logging in AIAgent for quiet mode - Added functionality to suppress logging noise from specific modules when in quiet mode, improving user experience in CLI. - Updated terminal_tool.py to change the log level for fallback directory usage from warning to debug, providing clearer context without cluttering logs.	2026-02-21 12:41:05 -08:00
teknium1	c98ee98525	feat: implement interactive prompts for sudo password and command approval in CLI - Added methods for handling sudo password and dangerous command approval prompts using a callback mechanism in cli.py. - Integrated these prompts with the prompt_toolkit UI for improved user experience. - Updated terminal_tool.py to support callback registration for interactive prompts, enhancing the CLI's interactivity. - Introduced a background thread for API calls in run_agent.py to allow for interrupt handling during long-running operations. - Enhanced error handling for interrupted API calls, ensuring graceful degradation of user experience.	2026-02-21 12:15:40 -08:00
teknium1	ecb430effe	refactor: enhance API interaction and message handling in AIAgent - Introduced new methods in run_agent.py for building API keyword arguments and normalizing assistant messages from API responses. - Added functionality for compressing conversation context and managing session state in SQLite. - Improved tool call execution handling, including enhanced logging and error management. - Updated path handling in multiple platform files to utilize pathlib for better compatibility and readability.	2026-02-21 04:17:27 -08:00
teknium1	748fd3db88	refactor: enhance error handling with structured logging across multiple modules - Updated various modules including cli.py, run_agent.py, gateway, and tools to replace silent exception handling with structured logging. - Improved error messages to provide more context, aiding in debugging and monitoring. - Ensured consistent logging practices throughout the codebase, enhancing traceability and maintainability.	2026-02-21 03:32:11 -08:00
teknium1	a885d2f240	refactor: implement structured logging across multiple modules - Introduced logging functionality in cli.py, run_agent.py, scheduler.py, and various tool modules to replace print statements with structured logging. - Enhanced error handling and informational messages to improve debugging and monitoring capabilities. - Ensured consistent logging practices across the codebase, facilitating better traceability and maintenance.	2026-02-21 03:11:11 -08:00
teknium1	3555c6173d	refactor: remove temporary API payload logging and enhance session log structure - Eliminated the `_log_api_payload` method used for temporary debugging, streamlining the codebase. - Updated the `_save_session_log` method to save the full raw session, including all messages and metadata, improving the clarity and completeness of session logs. - Adjusted session log entry to include additional context such as `base_url` and `platform` for better tracking.	2026-02-21 01:26:37 -08:00
teknium1	3976962621	fix: update session logging directory path in README and code - Changed the session logging directory from `~/.hermes-agent/logs/` to `~/.hermes/sessions/` for consistency. - Updated the `run_agent.py` to reflect the new logging path, ensuring session logs are stored correctly alongside gateway sessions.	2026-02-21 01:20:18 -08:00
teknium1	b33ed9176f	feat: update database schema and enhance message persistence - Incremented schema version to 2 and added a new column `finish_reason` to the `messages` table. - Implemented a method to flush un-logged messages to the session database, ensuring data integrity during conversation interruptions. - Enhanced error handling to persist messages in various early-return scenarios, preventing data loss.	2026-02-21 00:05:39 -08:00
teknium1	70dd3a16dc	Cleanup time!	2026-02-20 23:23:32 -08:00
teknium1	cfef34f7a6	feat: add multi-provider authentication and inference provider selection - Implemented a multi-provider authentication system for the Hermes Agent, supporting OAuth for Nous Portal and traditional API key methods for OpenRouter and custom endpoints. - Enhanced CLI with commands for logging in and out of providers, allowing users to authenticate and manage their credentials easily. - Updated configuration options to select inference providers, with detailed documentation on usage and setup. - Improved status reporting to include authentication status and provider details, enhancing user awareness of their current configuration. - Added new files for authentication handling and updated existing components to integrate the new provider system.	2026-02-20 17:24:00 -08:00
teknium1	ba07d9d5e3	feat: enhance task delegation with spinner updates and progress display - Added a spinner to visually indicate task delegation progress in quiet mode, improving user experience during batch processing. - Implemented a method to update spinner text dynamically based on remaining tasks, providing real-time feedback. - Enhanced the `delegate_task` function to include per-task completion messages, ensuring clarity on task status during execution. - Updated the KawaiiSpinner class to allow message updates while running, facilitating better interaction during long-running tasks.	2026-02-20 03:23:23 -08:00
teknium1	90e5211128	feat: implement subagent delegation for task management - Introduced the `delegate_task` tool, allowing the main agent to spawn child AIAgent instances with isolated context for complex tasks. - Supported both single-task and batch processing (up to 3 concurrent tasks) to enhance task management capabilities. - Updated configuration options for delegation, including maximum iterations and default toolsets for subagents. - Enhanced documentation to provide clear guidance on using the delegation feature and its configuration. - Added comprehensive tests to ensure the functionality and reliability of the delegation logic.	2026-02-20 03:15:53 -08:00
teknium1	f9eb5edb96	refactor: rename search tool for clarity and consistency - Updated the tool name from "search" to "search_files" across multiple files to better reflect its functionality. - Adjusted related documentation and descriptions to ensure clarity in usage and expected behavior. - Enhanced the toolset definitions and mappings to incorporate the new naming convention, improving overall consistency in the codebase.	2026-02-20 02:43:57 -08:00
teknium1	783acd712d	feat: implement code execution sandbox for programmatic tool calling - Introduced a new `execute_code` tool that allows the agent to run Python scripts that call Hermes tools via RPC, reducing the number of round trips required for tool interactions. - Added configuration options for timeout and maximum tool calls in the sandbox environment. - Updated the toolset definitions to include the new code execution capabilities, ensuring integration across platforms. - Implemented comprehensive tests for the code execution sandbox, covering various scenarios including tool call limits and error handling. - Enhanced the CLI and documentation to reflect the new functionality, providing users with clear guidance on using the code execution tool.	2026-02-19 23:23:43 -08:00
teknium1	9350e26e68	feat: introduce clarifying questions tool for interactive user engagement - Added a new `clarify_tool` to enable the agent to ask structured multiple-choice or open-ended questions to users. - Implemented callback functionality for user interaction, allowing the platform to handle UI presentation. - Updated the CLI and agent to support clarify questions, including timeout handling and response management. - Enhanced toolset definitions and requirements to include the clarify tool, ensuring availability across platforms.	2026-02-19 20:06:14 -08:00
teknium1	4d5f29c74c	feat: introduce skill management tool for agent-created skills and skills migration to ~/.hermes - Added a new `skill_manager_tool` to enable agents to create, update, and delete their own skills, enhancing procedural memory capabilities. - Updated the skills directory structure to support user-created skills in `~/.hermes/skills/`, allowing for better organization and management. - Enhanced the CLI and documentation to reflect the new skill management functionalities, including detailed instructions on creating and modifying skills. - Implemented a manifest-based syncing mechanism for bundled skills to ensure user modifications are preserved during updates.	2026-02-19 18:25:53 -08:00
teknium1	3f4b494c61	refactor: streamline thinking spinner behavior in AIAgent - Updated the logic for stopping the thinking spinner to improve clarity in tool execution messages. - Removed unnecessary checks for tool calls, simplifying the spinner's stop behavior while maintaining informative output for users.	2026-02-19 01:56:04 -08:00
teknium1	56ee8a5cc6	refactor: remove 'read' action from memory tool and agent logging - Eliminated the 'read' action from the memory tool and related logging in the agent, streamlining the available actions to 'add', 'replace', and 'remove'. - Updated error messages and documentation to reflect the removal of the 'read' action, ensuring clarity in the API's usage.	2026-02-19 01:03:08 -08:00
teknium1	440c244cac	feat: add persistent memory system + SQLite session store Two-part implementation: Part A - Curated Bounded Memory: - New memory tool (tools/memory_tool.py) with MEMORY.md + USER.md stores - Character-limited (2200/1375 chars), § delimited entries - Frozen snapshot injected into system prompt at session start - Model manages pruning via replace/remove with substring matching - Usage indicator shown in system prompt header Part B - SQLite Session Store: - New hermes_state.py with SessionDB class, FTS5 full-text search - Gateway session.py rewritten to dual-write SQLite + legacy JSONL - Compression-triggered session splitting with parent_session_id chains - New session_search tool with Gemini Flash summarization of matched sessions - CLI session lifecycle (create on launch, close on exit) Also: - System prompt now cached per session, only rebuilt on compression (fixes prefix cache invalidation from date/time changes every turn) - Config version bumped to 3, hermes doctor checks for new artifacts - Disabled in batch_runner and RL environments	2026-02-19 00:57:31 -08:00
teknium1	d7cef744ec	Add autocomplete and multiline support in HermesCLI input - Introduced SlashCommandCompleter for command autocompletion, enhancing user experience by suggesting commands as users type. - Enabled multiline input with Shift+Enter, allowing users to enter longer messages more conveniently. - Implemented paste detection to handle large text inputs, saving them to temporary files and replacing them with compact references in the input area. - Updated input area styling and hint display to improve usability and feedback during agent operation.	2026-02-17 21:47:54 -08:00
teknium1	a7f52911e1	Refactor CLI output formatting in AIAgent - Removed ANSI escape codes for color in tool activity messages to simplify output. - Updated the _get_cute_tool_message method to provide a cleaner, more consistent format for various tool activities. - Enhanced readability by aligning messages and removing unnecessary complexity, ensuring a more straightforward user experience.	2026-02-17 21:29:23 -08:00
teknium1	1e31614572	Refactor tool activity messages in AIAgent for improved CLI output - Introduced ANSI escape codes for color-coded CLI messages to enhance readability. - Updated the _get_cute_tool_message method to generate clean, aligned activity lines for various tools, replacing kawaii ASCII art with a more structured format. - Simplified message construction for web tools, terminal commands, and process management, ensuring consistent and scannable output.	2026-02-17 21:26:41 -08:00
teknium1	3b615b0f7a	Enhance tool previews in AIAgent and GatewayRunner - Updated the _build_tool_preview function to include detailed previews for new tools: 'todo', 'send_message', and various 'rl_' tools, improving user feedback during task execution. - Added emoji representations for tools in GatewayRunner, including 'process', 'todo', and 'send_message', to enhance visual clarity in progress messages. - Improved handling of task management and messaging outputs, ensuring more informative and user-friendly interactions.	2026-02-17 17:11:31 -08:00
teknium1	e184f5ab3a	Add todo tool for agent task planning and management Single `todo` tool that reads (no params) or writes (provide todos array with merge flag). In-memory TodoStore on AIAgent, no system prompt mutation, behavioral guidance in tool description only. State re-injected after context compression events. Gateway sessions hydrate from conversation history. Added to all platform toolsets. Also wired into RL agent_loop.py with per-run TodoStore and fixed browser_snapshot user_task passthrough from first user message.	2026-02-17 17:02:33 -08:00
teknium1	6731230d73	Add special handling for 'process' tool in _build_tool_preview function - Enhanced the _build_tool_preview function to include specific formatting for the 'process' tool, displaying action, session_id, data, and timeout when applicable. - This update improves the clarity of tool previews, particularly for actions that require session tracking and timeout management.	2026-02-17 03:18:27 -08:00
teknium1	48b5cfd085	Add skip_context_files option to AIAgent for batch processing - Introduced a new parameter `skip_context_files` in the AIAgent class to control the inclusion of context files (SOUL.md, AGENTS.md, .cursorrules) in the system prompt. - Updated the _process_single_prompt function to set `skip_context_files` to True, preventing pollution of trajectories during batch processing and data generation.	2026-02-16 22:40:31 -08:00
teknium1	84718d183a	Add platform-specific formatting hints and identity for AIAgent - Introduced a default agent identity prompt to ensure consistent behavior across platforms. - Added platform-specific formatting hints for CLI, WhatsApp, Telegram, and Discord to guide the agent's output style. - Updated the AIAgent initialization to accept a platform parameter, enhancing adaptability to different interfaces.	2026-02-12 16:11:16 -08:00
teknium1	3099a2f53c	Add timestamp to active system prompt in AIAgent - Appended the current local date and time to the active system prompt to provide context for the model, addressing potential misinterpretations due to training cutoffs.	2026-02-12 15:59:31 -08:00
teknium1	f5be6177b2	Add Text-to-Speech (TTS) functionality with multiple providers Add tool previews Add AGENTS and SOUL.md support Add Exec Approval	2026-02-12 10:05:08 -08:00
teknium1	153cd5bb44	Refactor skills tool integration and enhance system prompt - Removed the skills_categories tool from the skills toolset, streamlining the skills functionality to focus on skills_list and skill_view. - Updated the system prompt to dynamically build a compact skills index, allowing the model to quickly reference available skills without additional tool calls. - Cleaned up related code and documentation to reflect the removal of skills_categories, ensuring clarity and consistency across the codebase.	2026-02-10 19:48:38 -08:00
teknium1	cfe2f3fe15	Implement interrupt handling for long-running tool executions in AIAgent - Added functionality to signal and terminate long-running terminal commands when a new user message is received, allowing for immediate agent response. - Introduced a global interrupt event in the terminal tool to facilitate early termination of subprocesses. - Updated the AIAgent class to handle interrupts gracefully, ensuring that remaining tool calls are skipped and appropriate messages are returned to maintain valid message sequences.	2026-02-10 16:34:27 -08:00
teknium	1b1307d0d1	Implement Anthropic prompt caching for Claude models via OpenRouter - Introduced a caching strategy that reduces input token costs by ~75% on multi-turn conversations by caching the conversation prefix. - Added functions to apply cache control markers to messages, enhancing efficiency in token usage. - Updated AIAgent to auto-enable prompt caching for Claude models, with configurable cache TTL. - Enhanced logging to track cache hit statistics when caching is active, improving monitoring of token usage.	2026-02-10 06:49:41 +00:00
teknium	dd70d57b9b	Refactor BatchRunner and AIAgent for enhanced reasoning and tool management, improved tool definitions for fileops - Updated `ALL_POSSIBLE_TOOLS` to auto-derive from `TOOL_TO_TOOLSET_MAP` for consistent schema. - Introduced `_extract_reasoning_stats` function to track reasoning coverage in assistant turns. - Enhanced `_process_batch_worker` to discard prompts with no reasoning and aggregate reasoning statistics. - Updated documentation and comments for clarity on new features and changes.	2026-02-08 20:19:14 +00:00
teknium	f12ea1bc02	Enhance BatchRunner and AIAgent with new configuration options, default model now opus 4.6, default summarizer gemini flash 3 - Added `max_tokens`, `reasoning_config`, and `prefill_messages` parameters to `BatchRunner` and `AIAgent` for improved model response control. - Updated CLI to support new options for reasoning effort and prefill messages from a JSON file. - Modified example configuration files to reflect changes in default model and summary model. - Improved error handling for loading prefill messages and reasoning configurations in the CLI. - Updated documentation to include new parameters and usage examples.	2026-02-08 10:49:24 +00:00
teknium1	3c0d0dba49	Update RL tools and enhance configuration management - Modified `model_tools.py` to update default model IDs and add new RL function `rl_test_inference`. - Enhanced `README.md` with installation instructions for submodules and updated API key usage. - Improved `rl_cli.py` to load configuration from `~/.hermes/config.yaml` and set terminal working directory for RL tools. - Updated `run_agent.py` to handle empty string arguments as empty objects for better JSON validation. - Refined installation scripts to ensure submodules are cloned and installed correctly, enhancing setup experience.	2026-02-04 13:57:59 -08:00
teknium1	9bfe185a2e	Implement interrupt handling for agent and CLI input and persistent prompt line at bottom of CLI :) - Enhanced the AIAgent class to support interrupt requests, allowing for graceful interruption of ongoing tasks and processing of new messages. - Updated the HermesCLI to manage user input in a persistent manner, enabling real-time interruption of the agent's conversation. - Introduced a mechanism in the GatewayRunner to handle incoming messages while an agent is running, allowing for immediate response to user commands. - Improved overall user experience by providing feedback during interruptions and ensuring that pending messages are processed correctly.	2026-02-03 16:15:49 -08:00
teknium1	beeb7896e0	Refactor message handling and error logging in agent and gateway - Updated the AIAgent class to extract the first user message for trajectory formatting, improving the accuracy of user queries in the trajectory format. - Enhanced the GatewayRunner to convert transcript history into the agent format, ensuring proper handling of message roles and content. - Adjusted the typing indicator refresh rate to every 2 seconds for better responsiveness. - Improved error handling in the message sending process for the Telegram adapter, implementing a fallback mechanism for Markdown parsing failures, and logging send failures for better debugging.	2026-02-03 15:42:54 -08:00
teknium1	212460289b	Enhance skills tool to have an arg so it is more reliably called, and error handling in agent - Updated the `skills_categories` function to include a `verbose` parameter, allowing users to request skill counts per category. - Modified the `handle_skills_function_call` method to pass the `verbose` argument to `skills_categories`. - Improved error handling in the `AIAgent` class by injecting a recovery message when invalid JSON arguments are detected, guiding users on how to correct their tool calls. - Enhanced the `GatewayRunner` to return a user-friendly error message if the agent fails to generate a final response, improving overall user experience.	2026-02-03 15:26:59 -08:00
teknium1	e7f0ffbf5d	Add tool progress notifications for messaging channels - Introduced a new callback mechanism in the AIAgent class to send tool progress messages during execution, enhancing user feedback in messaging platforms. - Updated the GatewayRunner to support tool progress notifications, allowing users to enable or disable this feature via environment variables. - Enhanced the CLI setup wizard to prompt users for enabling tool progress messages and selecting the notification mode (all or new), improving configuration options. - Updated relevant documentation to reflect the new features and configuration settings for tool progress notifications.	2026-02-03 14:54:43 -08:00
teknium1	7eac4ee9fe	Update agent configuration for maximum tool-calling iterations - Increased the default maximum tool-calling iterations from 20 to 60 in the CLI configuration and related files, allowing for more complex tasks. - Updated documentation and comments to reflect the new recommended range for iterations, enhancing user guidance. - Implemented backward compatibility for loading max iterations from the root-level configuration, ensuring a smooth transition for existing users. - Adjusted the setup wizard to prompt for the maximum iterations setting, improving user experience during configuration.	2026-02-03 14:48:19 -08:00
teknium1	e114f09f70	Implement reasoning extraction and enhance assistant message handling - Added a new method `_extract_reasoning` to extract reasoning content from assistant messages, accommodating multiple formats from various providers. - Updated message handling to ensure all assistant messages include reasoning content for API compatibility, preserving multi-turn reasoning context. - Enhanced logging to capture reasoning details for debugging and analysis. - Modified the TODO.md to reflect changes in planning and task management, emphasizing the need for structured task decomposition and progress tracking.	2026-02-01 22:48:18 -08:00
teknium1	9b4d9452ba	Add context compression feature for long conversations - Implemented automatic context compression to manage long conversations that approach the model's context limit. - Configured the feature to summarize middle turns while protecting the first three and last four turns, ensuring important context is retained. - Added configuration options in `cli-config.yaml` and environment variables for enabling/disabling compression and setting thresholds. - Updated documentation in `README.md`, `cli.md`, and `.env.example` to explain the context compression functionality and its configuration. - Enhanced the `cli.py` to load compression settings into environment variables, ensuring seamless integration with the CLI. - Completed the implementation of context compression as outlined in the TODO list, marking it as a significant enhancement to conversation management.	2026-02-01 18:01:31 -08:00
teknium1	bbeed5b5d1	Enhance session logging and interactive sudo support - Implemented automatic session logging, saving conversation trajectories to the `logs/` directory in JSON format, with each session having a unique identifier. - Updated the CLI to display the session ID in the welcome banner for easy reference. - Introduced an interactive sudo password prompt in CLI mode, allowing users to enter their password with a 45-second timeout, enhancing user experience during command execution. - Documented session logging and interactive sudo features in `README.md`, `cli.md`, and `cli-config.yaml.example` for better user guidance.	2026-02-01 15:36:26 -08:00
teknium1	32254d3010	Add skills guidance to system prompts in run_agent.py - Introduced a default skills guidance prompt to assist the model in checking relevant skills before technical tasks. - Updated the logic in AIAgent to auto-include skills guidance when skills tools are available, enhancing the model's contextual understanding during API calls.	2026-02-01 01:31:59 -08:00
teknium	bc76a032ba	Add a claude code-like CLI - Introduced `cli-config.yaml.example` to provide a template for configuring the CLI behavior, including model settings, terminal tool configurations, agent behavior, and toolsets. - Created `cli.py` for an interactive terminal interface, allowing users to start the Hermes Agent with various options and toolsets. - Added `hermes` launcher script for convenient CLI access. - Updated `model_tools.py` to support quiet mode for suppressing output during tool initialization and execution. - Enhanced logging in various tools to respect quiet mode, improving user experience by reducing unnecessary output. - Added `prompt_toolkit` to `requirements.txt` for improved CLI interaction capabilities. - Created `TODO.md` for future improvements and enhancements to the Hermes Agent framework.	2026-01-31 06:30:48 +00:00
teknium	8e8b6be690	Add timeout configuration for trajectory processing - Updated `trajectory_compression.yaml` to include a new `per_trajectory_timeout` setting, allowing for a timeout of 300 seconds per trajectory. This enhancement helps prevent hanging on problematic entries during processing, improving overall reliability and efficiency in trajectory handling.	2026-01-30 07:34:58 +00:00
teknium	4c05ef0ba8	Enhance logging and tool initialization for improved performance - Updated logging configuration in `run_agent.py` to suppress debug messages from additional third-party libraries, reducing noise in logs. - Enhanced shell scripts for terminal tasks to utilize Singularity for containerized execution, including pre-build SIF image logic and improved logging. - Refactored tool initialization in `mixture_of_agents_tool.py`, `vision_tools.py`, and `web_tools.py` to implement lazy loading of API clients, optimizing resource usage and error handling. - Updated ephemeral system prompts in shell scripts to provide clearer guidance on task execution and resource usage.	2026-01-29 19:59:59 +00:00
teknium	248acf715e	Add browser automation tools and enhance environment configuration - Introduced new browser automation tools in `browser_tool.py` for navigating, interacting with, and extracting content from web pages using the agent-browser CLI and Browserbase cloud execution. - Updated `.env.example` to include new configuration options for Browserbase API keys and session settings. - Enhanced `model_tools.py` and `toolsets.py` to integrate browser tools into the existing tool framework, ensuring consistent access across toolsets. - Updated `README.md` with setup instructions for browser tools and their usage examples. - Added new test script `test_modal_terminal.py` to validate Modal terminal backend functionality. - Improved `run_agent.py` to support browser tool integration and logging enhancements for better tracking of API responses.	2026-01-29 06:10:24 +00:00
teknium	ba19d530ad	Update environment configuration and enhance terminal tool integration - Updated `.env.example` to include new API keys and configuration options for the mini-swe-agent backend, including support for local, Docker, and Modal environments. - Added `.gitmodules` to include mini-swe-agent as a submodule for easier integration. - Refactored `mini_swe_runner.py` to use the updated model format and default to OpenRouter for API calls. - Enhanced `model_tools.py` to support the new terminal tool definitions and ensure compatibility with the mini-swe-agent backend. - Updated `README.md` to reflect changes in setup instructions and environment variable configurations. - Improved `terminal_tool.py` to manage execution environments and lifecycle, ensuring proper cleanup and error handling. - Introduced `terminal_hecate.py` for executing commands on MorphCloud VMs, providing an alternative backend for terminal operations.	2026-01-23 12:26:53 +00:00
teknium	b32cc4b09d	Refactor batch processing with rich progress tracking and update logging in AIAgent - Replaced tqdm with rich for enhanced visual progress tracking in batch processing. - Adjusted logging levels in AIAgent to suppress asyncio debug messages. - Modified datagen script to reduce number of workers for improved performance.	2026-01-14 14:02:59 +00:00
teknium	6e3dbb8d8b	Enhance batch processing with progress tracking and update AIAgent for OpenRouter detection - Integrated tqdm for progress tracking in batch processing, replacing map with imap_unordered for improved performance. - Added base_url attribute in AIAgent to facilitate OpenRouter detection.	2026-01-14 13:46:16 +00:00
teknium	13d360030f	Enhance tool normalization and API integration across modules - Introduced normalization functions for tool statistics and error counts to ensure consistent schema across all trajectory entries, facilitating compatibility with HuggingFace datasets. - Updated batch processing to utilize normalized tool stats and error counts, improving data integrity. - Refactored vision tools and mixture of agents tool to integrate with OpenRouter API, replacing Nous Research API references and updating model configurations. - Enabled reasoning capabilities in API calls for enhanced response quality across various tools. - Improved error handling and API key validation for OpenRouter integration.	2026-01-14 13:40:10 +00:00
teknium	66daebe88f	Implement enhanced response handling and tool call validation in run_agent - Added methods to check for meaningful content after <think> blocks and to retrieve messages up to the last complete assistant turn. - Introduced retry logic for handling truncated responses and invalid JSON arguments in tool calls, with a maximum retry limit. - Improved logging for invalid JSON and empty responses, ensuring better error tracking and handling. - Updated the batch data generation script to adjust dataset file, batch size, and ephemeral system prompt for improved context management.	2026-01-10 13:04:43 +00:00
teknium	4071ba29da	Enhance batch processing and tool validation - Added support for tracking partial results and tool error counts in batch processing. - Implemented filtering of corrupted entries during batch file combination based on valid tool names. - Updated terminal tool to improve command execution and error handling, including retry logic for transient failures. - Refactored model tools to use a simple terminal tool with no session persistence. - Improved logging and error messages for invalid API responses and tool calls. - Introduced chunked processing for large content in web tools to manage size limitations effectively.	2026-01-10 05:56:26 +00:00
Teknium	80d326310e	Merge branch 'main' into speed-upgrades	2026-01-08 01:03:34 -08:00
teknium	6af6ff2a0a	updates for stability and speed	2026-01-08 08:57:51 +00:00
hjc-puro	1614c15bb1	rate limits	2025-11-17 18:35:36 -05:00
hjc-puro	0c618482c4	add logging of prefix of tool call and tool response	2025-11-07 14:43:44 -05:00
hjc-puro	2d8f6c46f1	log first 20 chars	2025-11-07 14:08:06 -05:00
Teknium	4135cf4682	Merge branch 'main' into test	2025-11-04 19:54:40 -08:00
teknium	c82741c3d8	some cleanups	2025-11-05 03:47:17 +00:00
hjc-puro	fbd3a2fdb8	prevent leakage of morph instances between tasks	2025-11-04 03:32:43 -05:00
hjc-puro	a4db3fdee5	fix leakage	2025-11-03 17:42:23 -05:00
teknium	de9c0edc51	some bugfixes	2025-10-15 18:07:06 +00:00
teknium	d36790de91	Add ephemeral system prompt support in batch and agent runners. Update README with usage examples and documentation for the new feature. Ensure prompt is not saved to trajectories.	2025-10-08 02:33:58 +00:00
teknium	0411ca1880	Add environment configuration file, restructure tool imports, and enhance README setup instructions	2025-10-01 09:54:17 +00:00
Teknium	c5386ed7e6	add better logging when requests fail	2025-09-10 00:51:41 -07:00
Teknium	17608c1142	Update to use toolsets and make them easy to create and configure	2025-09-10 00:43:55 -07:00
Teknium	587d1cf720	Fix Web Tools, Upgrade MoA to GPT5, Add Trajectory Saving	2025-08-31 03:04:10 -07:00
Teknium	cde7e64418	add vision model tool, cli updates for exclusive and inclusive toolsets	2025-08-04 00:14:16 -07:00
hjc-puro	a49596cbb2	terminal tool	2025-07-26 04:31:17 +00:00
hjc-puro	122d8788ae	terminal tool	2025-07-25 15:15:36 +00:00
Teknium	21d80ca683	initital commit	2025-07-22 18:32:44 -07:00

775 Commits