_run_async() bridges sync tool handlers to async code. When the handler
is invoked from inside a running event loop (gateway / nested async),
it spawns a worker thread and blocks on future.result(timeout=300).
Before this change, a coroutine that ran past 300s leaked its worker
thread:
- future.cancel() is a no-op on a running ThreadPoolExecutor future
(cancel only works on not-yet-started work).
- pool.shutdown(wait=False, cancel_futures=True) let the caller
proceed but the worker kept running the coroutine until it
returned on its own.
Every tool timeout leaked one thread. In long-lived gateway / RL
sessions this is cumulative.
The fix replaces bare asyncio.run() with a worker wrapper that
creates its own event loop. On timeout, _run_async schedules
task.cancel() on that loop via call_soon_threadsafe, then shuts the
pool down with wait=False so the caller returns immediately. The
coroutine observes CancelledError at its next await and the worker
thread exits cleanly.
Also switches logger.error() to logger.exception() in the top-level
handle_function_call() except block so tool failures produce full
stack traces in errors.log instead of just the message.
Related: #17420 (contributor flagged the leak; the original fix used
pool.shutdown(wait=True) which would have converted the leak into a
hang — caller blocks forever on the same stuck coroutine). Credit
for identifying the leak goes to the contributor.
Co-authored-by: 0z! <162235745+0z1-ghb@users.noreply.github.com>
- _markdown_to_signal docstring claimed SPOILER support but the regex list
never handled ``||...||``. Correct the docstring to match the four
actually-supported styles (BOLD / ITALIC / STRIKETHROUGH / MONOSPACE).
Signal's SPOILER bodyRange would need dedicated ``||spoiler||`` parsing
and is left for a follow-up.
- scripts/release.py: add exiao's noreply email to AUTHOR_MAP so the
contributor-attribution gate accepts their cherry-picked commit.
- Remove dead _lmstudio_loaded_context attribute from run_agent.py (set
but never read — the loaded context is pushed to context_compressor.update_model
which is the actual consumer)
- Cache empty reasoning options with 60s TTL to avoid per-turn HTTP probe
for non-reasoning LM Studio models. Non-empty results cached permanently.
- Extract _lmstudio_server_root(), _lmstudio_request_headers(), and
_lmstudio_fetch_raw_models() shared helpers in models.py — eliminates
URL-strip + auth-header + HTTP-call duplication across probe_lmstudio_models,
ensure_lmstudio_model_loaded, and lmstudio_model_reasoning_options
- Revert runtime_provider.py base_url precedence change: preserve the
established contract (saved config.base_url > env var > default) for all
api_key providers
- Remove unnecessary config version bump 22→23
- Fix TUI test: relax target_model assertion to avoid module-cache flake
- AUTHOR_MAP: added rugved@lmstudio.ai → rugvedS07
Follow-up to PR #16802 (BeliefanX). The original fix read
`agent_history[-1].get("timestamp")` for the tool-tail freshness gate,
but `gateway/run.py` strips the `timestamp` field off all tool/tool_call
rows when building `agent_history` from the raw transcript (see
`clean_msg = {k: v for k, v in msg.items() if k != "timestamp"}`). At
runtime the tool-tail branch always saw `None` and silently took the
legacy-fresh path — the stale-guard never fired for the tool-tail case
it was supposed to cover.
Changes:
- Read the freshness signal from the RAW `history` list (via new
`_last_transcript_timestamp()` helper) BEFORE the strip. Both the
resume_pending branch and the tool-tail branch use this single signal,
replacing the two divergent ones.
- Default window bumped 15 min → 1 hour via new
`_AUTO_CONTINUE_FRESHNESS_SECS_DEFAULT`. The 15-minute default was
shorter than the default `gateway_timeout` of 30 min, so a legitimate
long-running turn interrupted near its timeout boundary and resumed
shortly after would have been misclassified as stale.
- Configurable via `config.yaml` `agent.gateway_auto_continue_freshness`
(bridged to `HERMES_AUTO_CONTINUE_FRESHNESS` at gateway startup — same
pattern as `gateway_timeout`). Set to 0 to disable the gate.
- `_coerce_gateway_timestamp` now explicitly rejects bool (which is a
subclass of int and would otherwise coerce to 0.0/1.0).
- Tests rewritten to exercise the real production data shape: raw
`history` → `_build_agent_history` strip → freshness decision. A
regression guard (`test_stale_tool_tail_with_production_data_shape`)
asserts `agent_history` tool rows carry NO timestamp, protecting
against someone "fixing" the original bug by re-adding the stripped
field (which would break the OpenAI tool-result message contract).
Add BeliefanX to scripts/release.py AUTHOR_MAP.
E2E verified: config.yaml → env var bridge → helper returns configured
value; default 1h window; malformed/empty env var falls back to default;
ISO-Z timestamps parse; ms-epoch coerced; bool rejected.
Follow-up to #15328's vision-unsupported retry branch in run_agent.py.
_strip_images_from_messages() previously deleted any message whose content
was entirely images. That's fine for synthetic user messages injected for
attachment delivery, but it breaks providers for tool-role messages — the
paired tool_call_id on the preceding assistant message ends up unmatched,
which OpenAI-compatible APIs reject with HTTP 400.
Fix: tool-role messages whose content becomes empty are replaced with a
plaintext placeholder that preserves the tool_call_id linkage. Only
non-tool messages are dropped. Added 10 tests covering the role-alternation
invariants + image-type coverage.
Image-rejection detector: expanded phrase list (image content not
supported / multimodal input / vision input / model does not support
image) and gated on 4xx status so transient 5xx errors never get
misinterpreted as 'server said no to images'. Detection is documented as
best-effort English phrase matching.
AUTHOR_MAP: mapped 3820588+ddupont808@users.noreply.github.com to
ddupont808 so release notes attribute the salvage correctly.
Follow-up to the salvaged PR #16867 that added the read path for
agent.disabled_toolsets in _get_platform_tools():
- Document the new config key under a "Global Toolset Disable" section
in website/docs/user-guide/configuration.md, including the precedence
note (global disable overrides per-platform platform_toolsets).
- Map nazirulhafiy@gmail.com -> nazirulhafiy in scripts/release.py
AUTHOR_MAP so release-notes CI attributes the cherry-picked commit.
* fix: clean gateway auxiliary client caches on teardown
* fix(gateway): recover from stale pid files and close cron agents
Two issues were keeping the gateway from surviving long runs:
1. `_cleanup_invalid_pid_path` delegated to `remove_pid_file`, which
refuses to unlink when the file's pid differs from our own. That
safety check exists for the --replace atexit handoff, but it also
applied to stale-record cleanup, so after a crashy exit the pid
file was orphaned: `write_pid_file()`'s O_EXCL create then failed
with `FileExistsError`, and systemd looped on "PID file race lost
to another gateway instance". Unlink unconditionally from this
helper since the caller has already verified the record is dead.
2. The cron scheduler never closed the ephemeral `AIAgent` it creates
per tick, and never swept the process-global auxiliary-client
cache. Over days of 10-minute ticks this leaked subprocesses and
async httpx transports until the gateway hit EMFILE. Release the
agent and call `cleanup_stale_async_clients()` in `run_job`'s
outer `finally`, matching the gateway's own per-turn cleanup.
* chore(release): map bloodcarter@gmail.com -> bloodcarter
---------
Co-authored-by: bloodcarter <bloodcarter@gmail.com>
The CLI renders through prompt_toolkit in non-full-screen mode, so every
repaint uses the renderer's tracked _cursor_pos.y to cursor_up() + erase
before drawing the new frame. Any time that tracked position drifts from
terminal reality, redraws stack on top of stale content instead of
overwriting it. Four user-visible bugs share this root cause.
Fixes:
- #5474 (SIGWINCH ghosts): the resize wrapper previously only handled
column-shrink reflow. Generalize it to force a full screen-clear
(erase_screen + cursor_goto(0,0)) and renderer.reset() on every resize
— covers widen, row-shrink, and multiplexer SIGWINCH-less redraws.
- #8688 (cmux/tmux tab switch): no SIGWINCH fires on focus regain, so
prompt_toolkit has no signal to recover. Add a _force_full_redraw()
helper, bound to Ctrl+L (standard bash/zsh/vim convention) and exposed
as /redraw. Users can manually clear drift without restarting Hermes.
- #14692 (DSR response leaks — ^[[53;1R): resize storms make
prompt_toolkit's CSI 6n queries race past the input parser; the
terminal's reply ends up as literal input text. Add a sibling of the
bracketed-paste sanitizer that strips \x1b[<row>;<col>R and the
caret-escape visible form from paste text, buffer text-filter, and
the input-processing loop.
The idle-redraw removal (#12641) is in the preceding commit from
@foxion37 — keeping them as separate commits preserves attribution.
- scripts/release.py: map sonoyuncudmr@gmail.com -> Sonoyunchu so the
check-attribution CI job and release notes credit Soynchu correctly.
- website/docs/reference/skills-catalog.md: add the airtable row to
the productivity bundled-skills table.
* fix(tui): call maybe_auto_title for TUI sessions (#15961)
The maybe_auto_title() helper is called from cli.py and gateway/run.py
but was never wired into tui_gateway/server.py, so every session started
via 'hermes --tui' landed in state.db with an empty title. Evidence from
the issue reporter: 0/154 TUI sessions titled vs 91/383 CLI.
Mirror the CLI/Gateway pattern: after emitting message.complete, when the
turn finished cleanly, fire-and-forget title generation using the session
key, user prompt, agent response, and current history.
Fixes#15949.
Co-authored-by: math0r-be <math0r-be@github.com>
* chore(release): map math0r-be placeholder email in AUTHOR_MAP
---------
Co-authored-by: math0r-be <math0r-be@github.com>
* fix(/branch): redirect session_log_file and expose branch sessions in list
Two bugs when using /branch:
1. cli.py _handle_branch_command updated agent.session_id but not
agent.session_log_file, so all messages written after branching
landed in the original session's JSON file and the branch never
got its own session_{id}.json on disk.
Fix: mirror the compression-split path (run_agent.py:7579) and
update session_log_file immediately after changing session_id.
2. hermes_state.py list_sessions_rich filtered out every session
with parent_session_id IS NOT NULL to hide sub-agent runs and
compression continuations. Branch sessions share this column, so
they became invisible to `hermes sessions list` and `sessions browse`.
Fix: also include branch children — those whose parent ended with
end_reason='branched' AND whose started_at >= parent.ended_at
(the same timing condition that get_compression_tip uses to
distinguish continuations from live-spawned subagents).
Fixes#14854
Co-Authored-By: Octopus <liyuan851277048@icloud.com>
* chore(release): map octo-patch placeholder email in AUTHOR_MAP
---------
Co-authored-by: octo-patch <octo-patch@github.com>
Co-authored-by: Octopus <liyuan851277048@icloud.com>
Follow-up to #6616 covering the remaining user-injected prompt markers that
the original PR did not touch (reporter's second comment on #6576 explicitly
flagged these). Azure OpenAI Default/DefaultV2 content filters treat any
bracketed [SYSTEM: ...] as prompt-injection and reject with HTTP 400.
Remaining call sites renamed:
- cli.py: background-process notifications (watch_disabled, watch_match,
completion), MCP reload notice (4 live + 1 docstring)
- gateway/run.py: same notification paths + auto-loaded skill banner +
MCP reload notice (5 live + 1 docstring)
- tools/process_registry.py: comment reference
Not renamed:
- environments/hermes_base_env.py '[SYSTEM]\n{content}' — RL training
trajectory rendering only, never sent to Azure, part of a symmetric
[USER]/[ASSISTANT]/[TOOL] scheme.
AUTHOR_MAP: buraysandro9@gmail.com -> ygd58.
Salvage PR #15883 cherry-picked FocusFlow Dev's commit; release-notes
CI needs the AUTHOR_MAP entry to attribute to the PR author's GitHub
login rather than a placeholder.
- New website/docs/guides/azure-foundry.md covering both OpenAI-style
and Anthropic-style endpoints, auto-detection behaviour, gpt-5.x
routing, /v1 stripping, api-version query forwarding, and the
provider: anthropic + Azure URL alternative setup.
- environment-variables.md picks up AZURE_FOUNDRY_API_KEY,
AZURE_FOUNDRY_BASE_URL, AZURE_ANTHROPIC_KEY.
- cli-commands.md includes azure-foundry in the provider choices list.
- configuration.md lists azure-foundry among auxiliary-task providers.
- sidebars.ts wires the new guide into the Guides section.
- scripts/release.py AUTHOR_MAP entries for TechPrototyper,
HangGlidersRule (noreply), and pein892 so the contributor-attribution
CI check does not reject the salvage.
Regression test for #14981. Verifies that _session_expiry_watcher fires
on_session_finalize for each session swept out of the store, matching
the contract documented for /new, /reset, CLI shutdown, and gateway stop.
Verified the test fails cleanly on pre-fix code (hook call list missing
sess-expired) and passes with the fix applied.
Follow-up on top of #15096 cherry-pick:
- Remove spotify_* from _HERMES_CORE_TOOLS (keep only in the 'spotify'
toolset, so the 9 Spotify tool schemas are not shipped to every user).
- Add 'spotify' to CONFIGURABLE_TOOLSETS + _DEFAULT_OFF_TOOLSETS so new
installs get it opt-in via 'hermes tools', matching homeassistant/rl.
- Wire TOOL_CATEGORIES entry pointing at 'hermes auth spotify' for the
actual PKCE login (optional HERMES_SPOTIFY_CLIENT_ID /
HERMES_SPOTIFY_REDIRECT_URI env vars).
- scripts/release.py: map contributor email to GitHub login.
- AUTHOR_MAP entry for 130918800+devorun for #6636 attribution
- test_moa_defaults: was a change-detector tied to the exact frontier
model list — flips red every OpenRouter churn. Rewritten as an
invariant (non-empty, valid vendor/model slugs).
Plugin slash commands now surface as first-class commands in every gateway
enumerator — Discord native slash picker, Telegram BotCommand menu, Slack
/hermes subcommand map — without a separate per-platform plugin API.
The existing 'command:<name>' gateway hook gains a decision protocol via
HookRegistry.emit_collect(): handlers that return a dict with
{'decision': 'deny'|'handled'|'rewrite'|'allow'} can intercept slash
command dispatch before core handling runs, unifying what would otherwise
have been a parallel 'pre_gateway_command' hook surface.
Changes:
- gateway/hooks.py: add HookRegistry.emit_collect() that fires the same
handler set as emit() but collects non-None return values. Backward
compatible — fire-and-forget telemetry hooks still work via emit().
- hermes_cli/plugins.py: add optional 'args_hint' param to
register_command() so plugins can opt into argument-aware native UI
registration (Discord arg picker, future platforms).
- hermes_cli/commands.py: add _iter_plugin_command_entries() helper and
merge plugin commands into telegram_bot_commands() and
slack_subcommand_map(). New is_gateway_known_command() recognizes both
built-in and plugin commands so the gateway hook fires for either.
- gateway/platforms/discord.py: extract _build_auto_slash_command helper
from the COMMAND_REGISTRY auto-register loop and reuse it for
plugin-registered commands. Built-in name conflicts are skipped.
- gateway/run.py: before normal slash dispatch, call emit_collect on
command:<canonical> and honor deny/handled/rewrite/allow decisions.
Hook now fires for plugin commands too.
- scripts/release.py: AUTHOR_MAP entry for @Magaav.
- Tests: emit_collect semantics, plugin command surfacing per platform,
decision protocol (deny/handled/rewrite/allow + non-dict tolerance),
Discord plugin auto-registration + conflict skipping, is_gateway_known_command.
Salvaged from #14131 (@Magaav). Original PR added a parallel
'pre_gateway_command' hook and a platform-keyed plugin command
registry; this re-implementation reuses the existing 'command:<name>'
hook and treats plugin commands as platform-agnostic so the same
capability reaches Telegram and Slack without new API surface.
Co-authored-by: Magaav <73175452+Magaav@users.noreply.github.com>
Add missing AUTHOR_MAP entry for taosiyuan163 whose truncation boundary
fix was adapted into _capture_log_snapshot().
Add regression tests proving: line-boundary truncation keeps the full
first line, mid-line truncation correctly drops the partial fragment.
Follow-up to the cherry-picked PR #13897 fix. Three issues found:
1. CRITICAL: The thinking block synthesised from reasoning_content was
immediately stripped by the third-party signature management code
(Kimi is classified as _is_third_party_anthropic_endpoint). Added a
Kimi-specific carve-out that preserves unsigned thinking blocks while
still stripping Anthropic-signed blocks Kimi can't validate.
2. Empty-string reasoning_content was silently dropped because the
truthiness check ('if reasoning_content and ...') evaluates to False
for ''. Changed to 'isinstance(reasoning_content, str)' so the
tier-3 fallback from _copy_reasoning_content_for_api (which injects
'' for Kimi tool-call messages with no reasoning) actually produces
a thinking block.
3. The thinking block was appended AFTER tool_use blocks. Anthropic
protocol requires thinking -> text -> tool_use ordering. Changed to
blocks.insert(0, ...) to prepend.
Follow-ups on top of salvaged #13923 (@keifergu):
- Print QR poll dot every 3s instead of every 18s so "Fetching
configuration results..." doesn't look hung.
- On "status=success but no bot_info" from the WeCom query endpoint,
log the full payload at WARNING and tell the user we're falling
back to manual entry (was previously a single opaque line).
- Document in the qr_scan_for_bot_info() docstring that the
work.weixin.qq.com/ai/qc/* endpoints are the admin-console web-UI
flow, not the public developer API, and may change without notice.
Also add keifergu@tencent.com to scripts/release.py AUTHOR_MAP so
release notes attribute the feature correctly.
- Wrap child.run_conversation() in a ThreadPoolExecutor with configurable
timeout (delegation.child_timeout_seconds, default 300s) to prevent
indefinite blocking when a subagent's API call or tool HTTP request hangs.
- Add heartbeat stale detection: if a child's api_call_count doesn't
advance for 5 consecutive heartbeat cycles (~2.5 min), stop touching
the parent's activity timestamp so the gateway inactivity timeout
can fire as a last resort.
- Add 'timeout' as a new exit_reason/status alongside the existing
completed/max_iterations/interrupted states.
- Use shutdown(wait=False) on the timeout executor to avoid the
ThreadPoolExecutor.__exit__ deadlock when a child is stuck on
blocking I/O.
Closes#13768
- Description truncated to 60 chars in system prompt (extract_skill_description),
so the 500-char HF workflow description never reached the agent; shortened to
'llama.cpp local GGUF inference + HF Hub model discovery.' (56 chars).
- Restore llama-cpp-python section (basic, chat+stream, embeddings,
Llama.from_pretrained) and frontmatter dependencies entry.
- Fix broken 'Authorization: Bearer ***' curl line (missing closing quote;
llama-server doesn't require auth by default).
When starting the gateway with --replace, concurrent invocations could
leave multiple instances running simultaneously. This happened because
write_pid_file() used a plain overwrite, so the second racer would
silently replace the first process's PID record.
Changes:
- gateway/status.py: write_pid_file() now uses atomic O_CREAT|O_EXCL
creation. If the file already exists, it raises FileExistsError,
allowing exactly one process to win the race.
- gateway/run.py: before writing the PID file, re-check get_running_pid()
and catch FileExistsError from write_pid_file(). In both cases, stop
the runner and return False so the process exits cleanly.
Fixes#11718
Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the
two openai/xai sites in run_agent.py. This finishes the sweep: the same
substring-match false-positive class (e.g. https://api.openai.com.evil/v1,
https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1)
existed in eight more call sites, and the hostname helper was duplicated
in two modules.
- utils: add shared base_url_hostname() (single source of truth).
- hermes_cli/runtime_provider, run_agent: drop local duplicates, import
from utils. Reuse the cached AIAgent._base_url_hostname attribute
everywhere it's already populated.
- agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens
gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg
selection to hostname equality.
- run_agent: native-anthropic check in the Claude-style model branch
and in the AIAgent init provider-auto-detect branch.
- agent/model_metadata: Anthropic /v1/models context-length lookup.
- hermes_cli/providers.determine_api_mode: anthropic / openai URL
heuristics for custom/unknown providers (the /anthropic path-suffix
convention for third-party gateways is preserved).
- tools/delegate_tool: anthropic detection for delegated subagent
runtimes.
- hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint
native-OpenAI detection (paired with deduping the repeated check into
a single is_native_openai boolean per branch).
Tests:
- tests/test_base_url_hostname.py covers the helper directly
(path-containing-host, host-suffix, trailing dot, port, case).
- tests/hermes_cli/test_determine_api_mode_hostname.py adds the same
regression class for determine_api_mode, plus a test that the
/anthropic third-party gateway convention still wins.
Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP.
Follow-up for salvaged PR #3185:
- run_agent.py: pass self.api_key to query_ollama_num_ctx() so Ollama
behind an auth proxy (same issue class as the LM Studio fix) can be
probed successfully.
- scripts/release.py AUTHOR_MAP: map @tannerfokkens-maker's local-hostname
commit email.
- Fix duplicate 'timezone' import in e2e conftest
- Fix test_text_before_command_not_detected asserting send() is awaited
when no agent is present in mock setup (text messages don't produce
command output)
Cherry-picked from PR #13159 by @cdanis.
Adds native media attachment delivery to Signal via signal-cli JSON-RPC
attachments param. Signal messages with media now follow the same
early-return pattern as Telegram/Discord/Matrix — attachments are sent
only with the last chunk to avoid duplicates.
Follow-up fixes on top of the original PR:
- Moved Signal into its own early-return block above the restriction
check (matches Telegram/Discord/Matrix pattern)
- Fixed media_files being sent on every chunk in the generic loop
- Restored restriction/warning guards to simple form (Signal exits early)
- Fixed non-hermetic test writing to /tmp instead of tmp_path