WhatsApp bridge (bridge.js) only sets ptt:true when file extension is .ogg
or .opus, causing mp3/wav files (from Edge TTS, NeuTTS, etc.) to arrive
as file attachments instead of voice bubbles — silently, with no error.
Fix: when audio type is sent with a non-ogg/opus format, run ffmpeg
conversion to ogg/opus in a temp file before sending. This makes
send_voice() self-sufficient regardless of what format the caller provides.
Fallback: if ffmpeg is unavailable, original buffer is sent (previous
behaviour) with a console.warn — no crash.
Addresses veloguardian's review comment on PR #4992.
After PR #13725 replaced the module-level _LOCK_DIR/_LOCK_FILE constants
with a dynamic _get_lock_paths() helper, the xdist-isolation fixture
needs to patch the function instead of the removed constants.
* feat(api-server): X-Hermes-Session-Key header for long-term memory scoping
API Server integrations (Open WebUI, custom web UIs) can now pass a stable
per-channel identifier via X-Hermes-Session-Key that scopes long-term memory
(Honcho, etc.) independently of the transcript-scoped X-Hermes-Session-Id.
This matches the native gateway's session_key / session_id split: one stable
key per assistant channel, many independent transcripts that rotate on /new.
- _create_agent and _run_agent accept gateway_session_key and pass it to
AIAgent(gateway_session_key=...), which is already honored by the Honcho
memory provider (plugins/memory/honcho/client.py resolve_session_name).
- New shared helper _parse_session_key_header applies the same API-key
gate, control-character sanitization, and a 256-char length cap as the
existing session-id header.
- All three agent endpoints honor the header: /v1/chat/completions,
/v1/responses, /v1/runs. JSON and SSE responses echo it back.
- /v1/capabilities advertises session_key_header so clients can
feature-detect.
Closes#20060.
Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com>
* chore: AUTHOR_MAP entry for manateelazycat
---------
Co-authored-by: Andy Stewart <lazycat.manatee@gmail.com>
* fix(curator): protect hub skills by frontmatter name
* test(skill_usage): add mark_agent_created to regression test
The cherry-picked test predates #19618/#19621 which rewrote
list_agent_created_skill_names() to require an explicit
created_by: 'agent' provenance marker. Without mark_agent_created(),
my-skill is excluded from the list and the positive assertion fails.
* feat(curator): add archive and prune subcommands
Adds 'hermes curator archive <skill>' and 'hermes curator prune
[--days N] [--yes] [--dry-run]' alongside the existing status, run,
pause, resume, pin, unpin, restore, backup, rollback verbs.
These are the two genuinely new user-facing verbs requested in #19384.
The other verbs proposed there ('stats' and 'restore') already exist
as 'curator status' and 'curator restore', so no duplicate surface is
added — all skill lifecycle commands live under the single 'hermes
curator' namespace.
- archive: manual archive of an agent-created skill. Refuses pinned
skills with a hint pointing at 'hermes curator unpin'.
- prune: bulk-archive unpinned skills idle for >= N days (default 90).
Falls back to created_at when last_activity_at is null so never-used
skills can still be pruned. --dry-run previews, --yes skips prompt.
Adapted from @elmatadorgh's PR #19454 which placed the same verbs
under 'hermes skills' with a separate hermes_cli/skills_config.py
handler and rich table for stats. The 'stats' and 'restore' parts of
that PR duplicated existing surface, so only archive and prune are
kept, rewritten to match hermes_cli/curator.py's existing plain-text
handler style. Tests rewritten from scratch against the new handlers.
Closes#19384
Co-authored-by: elmatadorgh <coktinbaran5@gmail.com>
---------
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: elmatadorgh <coktinbaran5@gmail.com>
Follow-up on @EmelyanenkoK's feat: add Telegram DM topic-mode sessions.
Three issues:
1. Split-brain session state. After get_or_create_session() returned a
SessionEntry for a topic lane, the handler was mutating
.session_id in place to the binding's target, but never persisting
the switch through SessionStore. The sessions.json session_key →
session_id map kept pointing at the lane's natural id; any reader
that reloaded from disk saw the wrong id. Fixed by routing through
SessionStore.switch_session(), which _save()s the mapping and ends
the old session in SQLite like /resume does.
2. /new inside a topic was a one-message no-op. Reset created a new
session but left the telegram_dm_topic_bindings row pointing at the
old session_id, so the next message's binding lookup switched right
back. Now _handle_reset_command rebinds the topic to the new
session_id after reset.
3. is_telegram_session_linked_to_topic and
list_unlinked_telegram_sessions_for_user both called
apply_telegram_topic_migration() on read, contradicting the PR's
own invariant that migration only runs on explicit /topic opt-in.
They now tolerate missing topic tables and return empty/False.
Also: _telegram_topic_mode_enabled() now only treats True as enabled
(not any truthy return), so test fixtures with MagicMock session_db
don't accidentally flip every DM into lobby mode — this was breaking
4 pre-existing test_status_command tests.
Tests:
- New regression: /new inside a topic must update the binding row
(test_new_inside_telegram_topic_rewrites_binding_to_new_session).
- _make_runner now stubs switch_session so existing restore tests
still exercise the new code path.
Validated end-to-end with real SessionDB + SessionStore:
readers on fresh DB don't create topic tables; enable creates them;
binding override persists across SessionStore restart; /new rebinds
and the new id survives a restart.
Co-authored-by: EmelyanenkoK <emelyanenko.kirill@gmail.com>
Adapted from PR #19188 by @LeonSGP43 — mocks cli_output helpers and
verifies interactive_setup persists credentials to .env without
crashing. Also adds megastary to AUTHOR_MAP.
Open-weight models (DeepSeek, Qwen, GLM) sometimes emit tool calls like
`{"urls": "https://a.com"}` when the tool schema declares
`type: array`. The call was JSON-valid but semantically wrong, and
`coerce_tool_args` would pass the bare string through — the tool then
failed with a confusing type error.
`coerce_tool_args` now wraps non-list, non-null values in a
single-element list when the schema declares `array`. Strings still go
through `_coerce_value` first so JSON-encoded arrays
(`'["a","b"]'`) parse correctly and nullable `"null"` still
becomes `None`. `None` itself is preserved — tools with sensible
defaults already handle it, and we don't want to silently mask a
deliberate null.
Salvaged from #19652 (NikolayGusev-astra) — the broader validate-then-
repair layer had several issues (duplicated existing coercion,
mis-classified `old_string` as a path field, prepended non-JSON
prefixes to tool results that break downstream JSON parsing, hardcoded
offset/limit defaults unsuitable for non-read_file tools). The one
genuinely new capability is wrapping bare scalars, which is implemented
here directly inside the existing coercion path.
Co-authored-by: Nikolay Gusev <ngusev@astralinux.ru>
Follow-up to @changchun989's cherry-pick: reverts the validate-via-
normalize change so validate_profile_name remains a strict regex check
on the input AS-GIVEN. Callers that accept mixed-case user input
(dashboard UI, CLI args, import flows) call normalize_profile_name()
first, then validate the result. This keeps validate honest about
what the on-disk directory name must look like — e.g. ' jules '
(trailing whitespace) is now rejected instead of silently trimmed
and accepted.
- validate_profile_name: strict lowercase/regex check again, 'UPPER'
back in the invalid-names parametrize
- 8 call sites in profiles.py (create_profile, delete_profile,
set_active_profile, export_profile, import_profile, rename_profile,
resolve_profile_env, plus the clone_from branch): swap the
normalize-then-validate order
- scripts/release.py: add changchun989@proton.me -> changchun989 to
AUTHOR_MAP so CI doesn't block on the unmapped contributor email
All kanban + profile tests pass (268 across test_profiles.py +
test_kanban_db.py + test_kanban_core_functionality.py, plus 73 in
test_kanban_tools.py + test_kanban_dashboard_plugin.py).
Closes#18498.
The contributor's PR silently swallowed ValueError from
SessionDB.set_session_title() with bare except Exception: pass.
Users typing /new <title> with an already-in-use title got an
untitled session and no feedback.
Changes:
- cli.py: catch ValueError from both sanitize_title() and
set_session_title(); print the error and mark the session
untitled in the banner (never echo the rejected title back).
- gateway/run.py: append a warning note to the reset reply on
title rejection; reflect the accepted title in the header.
- Add regression tests for the duplicate-title path in CLI and
gateway.
Also map exx@example.com -> @exxmen in scripts/release.py.
Fixes#18722
get_due_jobs() now recomputes next_run_at via compute_next_run() for
cron/interval jobs that arrived with null next_run_at (e.g. via direct
jobs.json edits) instead of silently skipping them. _resolve_origin()
guards with isinstance(origin, dict), and _deliver_result() now routes
through _resolve_origin() so string/non-dict origins no longer crash
the ticker.
References: references #18735 (open competing fix from automated bulk PR touching 79 files); this PR is a focused single-issue contribution and adds the missing interval-recovery test variant
Co-Authored-By: Claude <noreply@anthropic.com>
Follow-up on #9925 cherry-pick adding two additional tests:
- bytes content hashes identically to its str-decoded form
- mixed bytes+str bundle hash equals the on-disk content_hash from
skills_guard (the production invariant used to detect drift)
Also map dodofun@126.com and 1615063567@qq.com in AUTHOR_MAP so the
CI contributor check passes for the cherry-picked commit.
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: zhao0112 <1615063567@qq.com>
The _send_qqbot function was hardcoded to use the guild channel
endpoint (/channels/{id}/messages), which fails for C2C private
chats and QQ groups with 'channel does not exist' (code 11263).
This change tries the appropriate endpoints in order:
1. /channels/{id}/messages (guild channels)
2. /v2/users/{id}/messages (C2C private chats)
3. /v2/groups/{id}/messages (QQ groups)
Fixes active sending to QQBot C2C and group recipients.
The MiniMax OAuth API endpoints have moved from api.minimax.io to
account.minimax.io and the old paths now respond with HTTP 307.
httpx defaults to follow_redirects=False (unlike requests), so the
device-code and token-refresh flows fail with "Temporary Redirect".
Adds follow_redirects=True to the two httpx.Client instances in
hermes_cli/auth.py used by the MiniMax OAuth flow. This is forward-
compatible -- if endpoints move again, the redirect chain is
followed automatically.
Repro before patch:
curl -i -X POST https://api.minimax.io/oauth/code # -> 307
curl -i -X POST https://api.minimax.io/oauth/token # -> 307
Verified end-to-end against a real MiniMax Plus account on macOS;
the existing tests/test_minimax_oauth.py suite (15 tests) still
passes.
Apply agent.redact.redact_sensitive_text with force=True to log content
captured by _capture_log_snapshot before it reaches upload_to_pastebin.
On-disk logs are untouched. Compatible with the off-by-default local
redaction policy from #16794: this is upload-time-only and applies
regardless of security.redact_secrets because the public paste service
is the leak surface. A visible banner is prepended to each uploaded log
paste so reviewers know redaction was applied. --no-redact preserves
deliberate unredacted sharing for maintainer-coordinated cases.
The bug-report, setup-help, and feature-request issue templates direct
users to run hermes debug share and paste the resulting public URLs.
With redaction off by default per #16794, those uploads have been
carrying credentials onto paste.rs and dpaste.com.
force=True is non-negotiable: without it, redact_sensitive_text
short-circuits at agent/redact.py:322 when the env var is unset, so the
fix would silently be a no-op for its target audience. A regression
test pins this down.
Fixes#19316
The whatsapp-bridge pulls @whiskeysockets/baileys at a pinned git
commit whose transitive dep tree ships protobufjs <7.5.5, triggering
GHSA-xq3m-2v4x-88gg (critical, arbitrary code execution). npm audit
reported 3 cascading criticals: protobufjs, @whiskeysockets/libsignal-node
(pulls protobufjs), and baileys itself (effect rollup).
Fix: add npm overrides block pinning protobufjs to ^7.5.5. Deduplicates
to a single 7.5.6 copy at node_modules/protobufjs that both libsignal-node
and any other consumers resolve through normal module resolution.
Why not bump baileys: npm-published baileys@6.17.16 is deprecated by the
maintainers (wrong version), 7.0.0-rc.* still pulls the same vulnerable
libsignal-node, and upstream Baileys HEAD adds a 4th vuln (music-metadata).
The override is the minimal, behavior-preserving fix.
Validation:
- npm audit: 3 critical -> 0 vulnerabilities
- node -e "import('@whiskeysockets/baileys')" -> all 5 named exports
(makeWASocket, useMultiFileAuthState, DisconnectReason,
fetchLatestBaileysVersion, downloadMediaMessage) resolve
- node bridge.js loads all modules and reaches Express bind
(exits only on EADDRINUSE because the live gateway owns :3000)
- Single deduped protobufjs@7.5.6 in the tree
- TestClampCommandNamesTriples: unit tests for 3-tuple support in
_clamp_command_names (short names, long names, collisions, multiple
entries, backward compat with 2-tuples)
- TestDiscordSkillCmdKeyDispatch: integration test through the full
discord_skill_commands pipeline verifying long skill names retain
their original cmd_key after clamping
- Add contributor CharlieKerfoot to AUTHOR_MAP
Four callsites hardcoded Path.home() / '.hermes' with no HERMES_HOME
check, breaking Docker deployments and profile isolation (hermes -p):
- plugins/hermes-achievements/dashboard/plugin_api.py:
state_path(), snapshot_path(), checkpoint_path() bare-literal paths
- scripts/profile-tui.py:
DEFAULT_STATE_DB and DEFAULT_LOG defaults ignored HERMES_HOME
- hermes_cli/slack_cli.py:
except-Exception fallback for slack-manifest.json dump
- optional-skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py:
--target argparse default
Use get_hermes_home() (with an ImportError shim for the standalone
scripts) or 'os.environ.get("HERMES_HOME") or str(Path.home()/".hermes")'
where importing hermes_constants is impractical.
E2E-verified: with HERMES_HOME=/tmp/x all three achievements paths and
both profile-tui defaults route under /tmp/x.
Salvaged from #18068 (original scope was broader mechanical cleanup
claiming 23 callsites were buggy; most were already respecting
HERMES_HOME via os.environ.get(key, default) — only these 4 had no env
check at all). Credit: @web-dev0521.
DeepSeek V4 Pro tightened thinking-mode validation and rejects empty-string
reasoning_content with HTTP 400:
The reasoning content in the thinking mode must be passed back to the API.
run_agent.py injected "" at three fallback sites — the tool-call pad in
_build_assistant_message and both injection branches of
_copy_reasoning_content_for_api (cross-provider poison guard + unconditional
thinking pad). All three now emit " " (single space), which satisfies the
non-empty check on V4 Pro without leaking fabricated reasoning.
Also upgrades stale empty-string placeholders on replay: sessions persisted
before this change have reasoning_content="" pinned at creation time; when
the active provider enforces thinking-mode echo, the replay path now rewrites
"" -> " " so existing users don't 400 on their first V4 Pro turn after
updating. Non-thinking providers still round-trip "" verbatim.
Updates 9 existing assertions + adds 2 regression tests (stale-placeholder
upgrade, non-thinking verbatim preservation).
Refs #15250, #17400.
Closes#17341.
_process_message_background snapshotted callback_generation from the
interrupt event at the TOP of the task — before the handler ran.
_hermes_run_generation is only set on the event by
GatewayRunner._bind_adapter_run_generation during
_handle_message_with_agent, which runs DURING the handler await. The
early snapshot always captured None, which then flowed into
pop_post_delivery_callback(..., generation=None) in the finally block.
In pop_post_delivery_callback, generation=None with a tuple-registered
entry (generation, callback) bypasses the ownership check — it pops and
fires the callback regardless of which run owns it. Result: a stale run
could fire a fresher run's post-delivery callback (e.g. a
background-review notification attributed to the wrong turn).
Fix: move the snapshot into the finally block, after the handler has
run and _hermes_run_generation has been bound to the current run.
Regression test added: simulates a stale handler at generation=1 and a
fresher callback registered at generation=2. Pre-fix: snapshot=None →
pop fires the generation=2 callback under generation=1's ownership
("newer" fires). Post-fix: snapshot=1 → pop skips the mismatched
entry, callback stays in the dict for the correct run to claim.
Verified: test FAILS on current main (captures "newer" in fired list),
PASSES with this fix.
Salvaged from PR #12565 (the callback-ownership portion only; the
/status totals portion was already fixed on main in 7abc9ce4d via #17158).
Co-authored-by: Oxidane-bot <1317078257maroon@gmail.com>
reset_session() creates a fresh SessionEntry with created_at == updated_at,
but get_or_create_session() bumps updated_at on the next inbound message,
causing _is_new_session in _handle_message_with_agent to evaluate False.
The topic/channel skill auto-load gate (group_topics, channel_skill_bindings)
silently skips the first message after a manual reset.
Add an is_fresh_reset flag on SessionEntry, set by reset_session() and
consumed once by the message handler. Kept distinct from was_auto_reset
because that flag also drives a 'session expired due to inactivity'
user-facing notice and a context-note prepend — both wrong for an
explicit /new or /reset.
Persisted through to_dict/from_dict so the flag survives gateway
restart between /reset and the next message.
Fixes#6508
Co-authored-by: warabe1122 <45554392+warabe1122@users.noreply.github.com>
Co-authored-by: willy-scr <187001140+willy-scr@users.noreply.github.com>
- Move the disabled-ack guard above the debounce so we don't stamp
_busy_ack_ts[session_key] when no ack was actually sent. Harmless
(never read when disabled) but cosmetically off.
- Document display.busy_ack_enabled in user-guide/messaging/index.md
and HERMES_GATEWAY_BUSY_ACK_ENABLED in reference/environment-variables.md.
- Add JezzaHehn to scripts/release.py AUTHOR_MAP for contributor credit.
Follow-up to #17491 (Jezza Hehn).
Moves the here-now skill under optional-skills/productivity/here-now/ so
it's discoverable via the Skills Hub but not installed by default, and
tightens the SKILL.md description to a single line to match sibling
optional-skill descriptions.
Install with:
hermes skills install official/productivity/here-now
Closes#378
Builds on #16855 (@lsdsjy) which fixed DeepSeek v4 reasoning_content
replay via model_extra fallback + capturing tool_calls at method entry.
Kimi / Moonshot thinking mode enforces the same echo-back contract and
hits the same 400 when a tool-call turn is persisted without
reasoning_content.
- _build_assistant_message: pad branch now uses _needs_thinking_reasoning_pad()
(DeepSeek OR Kimi) instead of _needs_deepseek_tool_reasoning() alone.
- Extract _needs_thinking_reasoning_pad() and reuse it in
_copy_reasoning_content_for_api so both sites share one predicate.
- tests/run_agent/test_deepseek_reasoning_content_echo.py: add
TestBuildAssistantMessagePadsStrictProviders parametrized over DeepSeek
(attr=None, attr-absent), Kimi (attr=None), Moonshot (via base_url),
and an OpenRouter negative control that must NOT pad. Proven to fail
2/5 cases on Kimi/Moonshot without this change.
- scripts/release.py: add AUTHOR_MAP entries for lsdsjy and season179.
Refs #17400.
Co-authored-by: season179 <season.saw@gmail.com>
Widen #17639 to the fourth sibling site (tools/skills_tool.py _EXCLUDED_SKILL_DIRS)
and register leoneparise in scripts/release.py AUTHOR_MAP so CI release script
resolves the contributor.
Ports PR #17888's send_multiple_images ABC to every gateway platform that
has a native multi-attachment API, so images arrive as a single bundled
message instead of N separate ones.
Native overrides:
- Telegram: send_media_group (10 photos per album, chunks over); animated
GIFs peeled off and routed through send_animation (albums don't support
animations)
- Discord: channel.send(files=[...]) (10 attachments per message, chunks
over); URL images downloaded into BytesIO so they render inline; forum
channels use create_thread with files=[...]
- Slack: files_upload_v2(file_uploads=[...]) (10 per call, chunks over);
respects thread_ts; records thread participation
- Mattermost: single post with file_ids list (5 per post — Mattermost cap,
chunks over)
- Email: single SMTP message with multiple MIME attachments (no chunk cap,
SMTP size governs); remote URLs remain linked in body (parity with
existing send_image)
All platforms fall back to the base per-image loop on any failure, so a
single bad image in a batch never loses the rest.
Matrix, WhatsApp, and single-attachment platforms (BlueBubbles, Feishu,
WeCom, WeChat, DingTalk) continue to use the base default loop — their
server APIs only accept one attachment per message anyway.
Tests: adds tests/gateway/test_send_multiple_images.py with 19 targeted
tests covering base default loop, chunking, animation peel-off, fallback
paths, and empty-batch no-ops across all five new overrides.
Co-authored-by: Maxence Groine <maxence@groine.fr>
Adds a deterministic pre-check on top of htsh's exception-based fallback:
before calling /content/abstract or /content/overview on a non-pseudo URI,
probe /api/v1/fs/stat. If the server says the URI is a file, route straight
to /content/read instead of eating a failing 500 round-trip.
This is the same idea pty819 and chennest independently landed in PRs
#12757 and #12937 — merged here on top of htsh's broader fix so we keep
pseudo-URI normalization and v0.3.3 browse-shape handling while avoiding
the slow exception path on servers that return a raised 500 every time.
The exception fallback from #5886 stays in place for environments where
fs/stat is unavailable or returns an unfamiliar shape.
Also credits pty819, chennest, and htsh in AUTHOR_MAP so future release
notes attribute them correctly.
Fixes the xdist collision that broke CI on PR #17764, and structurally
prevents future plugin-adapter tests from reintroducing it.
Problem
-------
tests/gateway/test_teams.py (new in this PR) and tests/gateway/test_irc_adapter.py
(already on main) both followed the same anti-pattern:
sys.path.insert(0, str(_REPO_ROOT / 'plugins' / 'platforms' / '<name>'))
from adapter import <Adapter>
Every platform plugin ships its own adapter.py, so the bare
'from adapter import ...' races for sys.modules['adapter']. Whichever test
collected first in a given xdist worker won; the other crashed at
collection with ImportError, and the polluted sys.path cascaded into 19
unrelated test failures across tools/, hermes_cli/, and run_agent/ in the
same worker.
Fix
---
1. tests/gateway/_plugin_adapter_loader.py (new): shared helper
load_plugin_adapter('<name>') that imports plugins/platforms/<name>/adapter.py
via importlib.util under the unique module name plugin_adapter_<name>.
Zero sys.path mutation, no possibility of collision.
2. tests/gateway/test_irc_adapter.py and tests/gateway/test_teams.py:
migrated to the helper. All 'from adapter import ...' statements
(including the ones inside test methods) are replaced with module-level
attribute access on the loaded module.
3. tests/gateway/conftest.py: new pytest_configure guard that AST-scans
every test_*.py under tests/gateway/ at session start and fails the
run with a pointer to the helper if any test uses sys.path.insert into
plugins/platforms/ OR a bare 'import adapter' / 'from adapter import'.
Runs on the xdist controller only (skipped in workers). The next plugin
adapter test that tries to reintroduce this pattern gets rejected at
collection time with a clear remediation message.
4. scripts/release.py: add aamirjawaid@microsoft.com -> heyitsaamir to
AUTHOR_MAP so the check-attribution workflow passes.
Validation
----------
scripts/run_tests.sh tests/gateway/ 4194 passed
scripts/run_tests.sh tests/gateway/test_{teams,irc}* 72 passed (both orderings)
scripts/run_tests.sh <11 prev-failing test files> 398 passed
Guard triggers correctly on both Path-operator and string-literal forms
of the anti-pattern.
- SQL: add `model != ''` to both queries in /api/analytics/models so
sessions with empty-string model (pre-existing data integrity,
confirmed in production DB: ~107 sessions) no longer render as
blank-header cards.
- ModelsPage: drop the arbitrary slashIdx < 20 length gate in
shortModelName / modelProvider. The gate was fragile for longer
vendor prefixes (e.g. `deepseek-ai/...`). Strip on the first /
unconditionally. Rename modelProvider -> modelVendor to avoid
confusion with the billing provider column.
- scripts/release.py: add AUTHOR_MAP entry for yatesjalex.
Self-review caught several errors in the previous commit:
Frontmatter
- Replace non-standard `requires_runtime` / `requires_tooling` fields with
the documented `compatibility:` field (parsed by tools/skills_tool.py).
- Drop the `audit-v5` author tag I added unnecessarily.
MODEL_LOADERS catalog
- Remove `IPAdapterUnifiedLoader` (input `preset` is an enum, not a file).
- Remove `IPAdapterInsightFaceLoader` and `InsightFaceLoader` (input
`provider` is a GPU backend selector, not a model file). These would have
flagged enum values like "STANDARD" or "CUDA" as missing model files.
- Add "NB:" comment explaining `BasicGuider` has no `cfg` input
(the original PARAM_PATTERNS entry would never have matched).
- Remove `SamplerCustomAdvanced.noise_seed` from PARAM_PATTERNS — that
node takes a NOISE input from RandomNoise, not a seed field directly.
NODE_TO_PACKAGE registry slugs
- Verified all 18 packages against api.comfy.org and fixed:
- `comfyui-essentials` → `comfyui_essentials` (underscore, not hyphen)
- `comfyui-gguf` → `ComfyUI-GGUF` (case-sensitive)
- `comfyui-photomaker-plus` → `ComfyUI-PhotoMaker-Plus`
- `comfyui-wanvideowrapper` → `ComfyUI-WanVideoWrapper`
- ComfyUI-HunyuanVideoWrapper isn't on the registry; surface a git-URL
install hint via new NODE_TO_GIT_URL fallback so the user can install
via ComfyUI-Manager's /manager/queue/install endpoint.
Wrong class names
- `Canny` → `CannyEdgePreprocessor` (controlnet-aux registers the latter,
the former never appears in /object_info).
- Add `Zoe_DepthAnythingPreprocessor` and `AnimalPosePreprocessor` while
fixing controlnet-aux.
- Remove `Reroute (rgthree)` (rgthree's Reroute is JS-only — no Python
class, never appears in /object_info).
- Add `Display Int (rgthree)` (sibling of Display Any).
- Move `UltralyticsDetectorProvider` from `comfyui-impact-pack` to
`comfyui-impact-subpack` (separate package, registered there).
Tests
- Update test_packages_are_safe_for_shell to accept case-mixed slugs (the
registry uses both ComfyUI- and comfyui_ prefixes inconsistently). Replaced
the lowercase-only assertion with a shell-safe regex check.
- 117 tests still pass (105 unit + 8 cloud + 4 cross-host).
Attribution
- Add `SHL0MS@users.noreply.github.com` mapping to scripts/release.py
AUTHOR_MAP so check-attribution CI passes.
Layers a programmatic hardware-feasibility check on top of the v4 skill
so the agent doesn't silently push users toward a local install they
can't actually run. The official comfy-cli supports --nvidia / --amd /
--m-series / --cpu, but has no guard against "4 GB laptop GPU on SDXL"
or "Intel Mac falling back to CPU" — both route to comfy-cli paths in
the original table and then fail on first workflow.
- scripts/hardware_check.py: detect OS/arch/GPU (NVIDIA nvidia-smi,
AMD rocm-smi, Apple M1+ via arm64+sysctl, Intel Arc via clinfo),
VRAM, system/unified RAM. Emits JSON
{verdict: ok|marginal|cloud, recommended_install_path, comfy_cli_flag}
with practical thresholds: discrete GPU >=6 GB VRAM minimum,
Apple Silicon >=16 GB unified memory minimum, Intel Mac -> cloud,
no accelerator -> cloud. comfy_cli_flag maps directly to
`comfy install` so the agent can stitch the whole flow together.
- scripts/comfyui_setup.sh: runs hardware_check.py first when no
explicit flag is passed. If verdict=cloud, refuses to install
locally, prints Comfy Cloud URL + an override command, exits 2.
Otherwise auto-selects the right --nvidia/--amd/--m-series flag
for `comfy install`. Surfaces marginal-verdict notes to the user.
- SKILL.md Setup & Onboarding: adds mandatory Step 0 "Check If This
Machine Can Run ComfyUI Locally" ahead of the Path A-E selection.
Documents the verdict thresholds inline, ties verdict + comfy_cli_flag
to the install paths, and updates the path-choice table so
"verdict: cloud" is the first row. Quick-Start "Detect Environment"
block extended to include the hardware check. Verification
checklist gains a hardware-check gate.
- Frontmatter setup.help rewritten to point at hardware_check.py
first. Version bumped 4.0.0 -> 4.1.0.
Close integration gaps discovered by auditing qwen-oauth's file coverage.
These are surfaces the original salvage missed — they all existed on
main and were added in the 747 commits since PR #15203 was opened.
Coverage added:
- agent/credential_pool.py: seed pool from auth.json providers.minimax-oauth
so `hermes auth list` reflects logged-in state and
`hermes auth remove minimax-oauth <N>` works through the standard flow.
- agent/credential_sources.py: register RemovalStep for minimax-oauth
with suppression-aware `_clear_auth_store_provider`.
- agent/models_dev.py: PROVIDER_TO_MODELS_DEV mapping (-> 'minimax' family).
- hermes_cli/providers.py: HermesOverlay entry (anthropic_messages transport,
oauth_external auth_type, api.minimax.io/anthropic base).
- hermes_cli/model_normalize.py: add to _MATCHING_PREFIX_STRIP_PROVIDERS so
`minimax-oauth/MiniMax-M2.7` in config.yaml gets correctly repaired.
- hermes_cli/status.py: render MiniMax OAuth block in `hermes doctor`
(logged-in / region / expires_at / error).
- hermes_cli/web_server.py: register in OAUTH_PROVIDER_REGISTRY + dispatch
branch in _resolve_provider_status so the dashboard auth page shows it.
- website/docs/integrations/providers.md: full 'MiniMax (OAuth)' section.
- website/docs/reference/cli-commands.md: --provider enum.
- website/docs/user-guide/features/fallback-providers.md: fallback table row.
- scripts/release.py AUTHOR_MAP: amanning3390 mapping (CI gate).
_run_async() bridges sync tool handlers to async code. When the handler
is invoked from inside a running event loop (gateway / nested async),
it spawns a worker thread and blocks on future.result(timeout=300).
Before this change, a coroutine that ran past 300s leaked its worker
thread:
- future.cancel() is a no-op on a running ThreadPoolExecutor future
(cancel only works on not-yet-started work).
- pool.shutdown(wait=False, cancel_futures=True) let the caller
proceed but the worker kept running the coroutine until it
returned on its own.
Every tool timeout leaked one thread. In long-lived gateway / RL
sessions this is cumulative.
The fix replaces bare asyncio.run() with a worker wrapper that
creates its own event loop. On timeout, _run_async schedules
task.cancel() on that loop via call_soon_threadsafe, then shuts the
pool down with wait=False so the caller returns immediately. The
coroutine observes CancelledError at its next await and the worker
thread exits cleanly.
Also switches logger.error() to logger.exception() in the top-level
handle_function_call() except block so tool failures produce full
stack traces in errors.log instead of just the message.
Related: #17420 (contributor flagged the leak; the original fix used
pool.shutdown(wait=True) which would have converted the leak into a
hang — caller blocks forever on the same stuck coroutine). Credit
for identifying the leak goes to the contributor.
Co-authored-by: 0z! <162235745+0z1-ghb@users.noreply.github.com>
- _markdown_to_signal docstring claimed SPOILER support but the regex list
never handled ``||...||``. Correct the docstring to match the four
actually-supported styles (BOLD / ITALIC / STRIKETHROUGH / MONOSPACE).
Signal's SPOILER bodyRange would need dedicated ``||spoiler||`` parsing
and is left for a follow-up.
- scripts/release.py: add exiao's noreply email to AUTHOR_MAP so the
contributor-attribution gate accepts their cherry-picked commit.
- Remove dead _lmstudio_loaded_context attribute from run_agent.py (set
but never read — the loaded context is pushed to context_compressor.update_model
which is the actual consumer)
- Cache empty reasoning options with 60s TTL to avoid per-turn HTTP probe
for non-reasoning LM Studio models. Non-empty results cached permanently.
- Extract _lmstudio_server_root(), _lmstudio_request_headers(), and
_lmstudio_fetch_raw_models() shared helpers in models.py — eliminates
URL-strip + auth-header + HTTP-call duplication across probe_lmstudio_models,
ensure_lmstudio_model_loaded, and lmstudio_model_reasoning_options
- Revert runtime_provider.py base_url precedence change: preserve the
established contract (saved config.base_url > env var > default) for all
api_key providers
- Remove unnecessary config version bump 22→23
- Fix TUI test: relax target_model assertion to avoid module-cache flake
- AUTHOR_MAP: added rugved@lmstudio.ai → rugvedS07
The contributor's PR (#16750) scoped the fix to run_setup_wizard() and
explicitly punted the two sibling sites. Both have the identical
[ -e /dev/tty ] pattern followed by a < /dev/tty redirect and crash in
Docker the same way:
- scripts/install.sh:732 install_system_packages() -- apt sudo prompt
fallback. sudo ... < /dev/tty dies with the same ENXIO.
- scripts/install.sh:1395 maybe_start_gateway() -- gateway-install gate,
same function path as the wizard reproducer.
Fix both with the same (: </dev/tty) 2>/dev/null probe, and parametrize
the regression test over all three gated functions so any future
regression is caught regardless of which site breaks.
In Docker builds the `/dev/tty` device node is present in the mount
namespace, so `[ -e /dev/tty ]` returns true — but opening it fails
with `ENXIO: No such device or address`. Under the old gate the
"no terminal available" skip never triggered, the setup wizard ran,
and the build aborted a few lines later when bash tried `< /dev/tty`:
/tmp/install.sh: line 1347: /dev/tty: No such device or address
Replace the existence check with `(: </dev/tty) 2>/dev/null`, which
actually attempts to open /dev/tty in a subshell. The probe succeeds
when piped from `curl | bash` on a real terminal (the wizard's intended
use case) and fails cleanly in Docker build / CI contexts so the skip
kicks in before the redirect can crash.
Add a regression test that statically asserts run_setup_wizard does not
gate on the bare existence check and that the open-based probe is in
place.
Fixes#16746.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to PR #16802 (BeliefanX). The original fix read
`agent_history[-1].get("timestamp")` for the tool-tail freshness gate,
but `gateway/run.py` strips the `timestamp` field off all tool/tool_call
rows when building `agent_history` from the raw transcript (see
`clean_msg = {k: v for k, v in msg.items() if k != "timestamp"}`). At
runtime the tool-tail branch always saw `None` and silently took the
legacy-fresh path — the stale-guard never fired for the tool-tail case
it was supposed to cover.
Changes:
- Read the freshness signal from the RAW `history` list (via new
`_last_transcript_timestamp()` helper) BEFORE the strip. Both the
resume_pending branch and the tool-tail branch use this single signal,
replacing the two divergent ones.
- Default window bumped 15 min → 1 hour via new
`_AUTO_CONTINUE_FRESHNESS_SECS_DEFAULT`. The 15-minute default was
shorter than the default `gateway_timeout` of 30 min, so a legitimate
long-running turn interrupted near its timeout boundary and resumed
shortly after would have been misclassified as stale.
- Configurable via `config.yaml` `agent.gateway_auto_continue_freshness`
(bridged to `HERMES_AUTO_CONTINUE_FRESHNESS` at gateway startup — same
pattern as `gateway_timeout`). Set to 0 to disable the gate.
- `_coerce_gateway_timestamp` now explicitly rejects bool (which is a
subclass of int and would otherwise coerce to 0.0/1.0).
- Tests rewritten to exercise the real production data shape: raw
`history` → `_build_agent_history` strip → freshness decision. A
regression guard (`test_stale_tool_tail_with_production_data_shape`)
asserts `agent_history` tool rows carry NO timestamp, protecting
against someone "fixing" the original bug by re-adding the stripped
field (which would break the OpenAI tool-result message contract).
Add BeliefanX to scripts/release.py AUTHOR_MAP.
E2E verified: config.yaml → env var bridge → helper returns configured
value; default 1h window; malformed/empty env var falls back to default;
ISO-Z timestamps parse; ms-epoch coerced; bool rejected.
Follow-up to #15328's vision-unsupported retry branch in run_agent.py.
_strip_images_from_messages() previously deleted any message whose content
was entirely images. That's fine for synthetic user messages injected for
attachment delivery, but it breaks providers for tool-role messages — the
paired tool_call_id on the preceding assistant message ends up unmatched,
which OpenAI-compatible APIs reject with HTTP 400.
Fix: tool-role messages whose content becomes empty are replaced with a
plaintext placeholder that preserves the tool_call_id linkage. Only
non-tool messages are dropped. Added 10 tests covering the role-alternation
invariants + image-type coverage.
Image-rejection detector: expanded phrase list (image content not
supported / multimodal input / vision input / model does not support
image) and gated on 4xx status so transient 5xx errors never get
misinterpreted as 'server said no to images'. Detection is documented as
best-effort English phrase matching.
AUTHOR_MAP: mapped 3820588+ddupont808@users.noreply.github.com to
ddupont808 so release notes attribute the salvage correctly.
Follow-up to the salvaged PR #16867 that added the read path for
agent.disabled_toolsets in _get_platform_tools():
- Document the new config key under a "Global Toolset Disable" section
in website/docs/user-guide/configuration.md, including the precedence
note (global disable overrides per-platform platform_toolsets).
- Map nazirulhafiy@gmail.com -> nazirulhafiy in scripts/release.py
AUTHOR_MAP so release-notes CI attributes the cherry-picked commit.
* fix: clean gateway auxiliary client caches on teardown
* fix(gateway): recover from stale pid files and close cron agents
Two issues were keeping the gateway from surviving long runs:
1. `_cleanup_invalid_pid_path` delegated to `remove_pid_file`, which
refuses to unlink when the file's pid differs from our own. That
safety check exists for the --replace atexit handoff, but it also
applied to stale-record cleanup, so after a crashy exit the pid
file was orphaned: `write_pid_file()`'s O_EXCL create then failed
with `FileExistsError`, and systemd looped on "PID file race lost
to another gateway instance". Unlink unconditionally from this
helper since the caller has already verified the record is dead.
2. The cron scheduler never closed the ephemeral `AIAgent` it creates
per tick, and never swept the process-global auxiliary-client
cache. Over days of 10-minute ticks this leaked subprocesses and
async httpx transports until the gateway hit EMFILE. Release the
agent and call `cleanup_stale_async_clients()` in `run_job`'s
outer `finally`, matching the gateway's own per-turn cleanup.
* chore(release): map bloodcarter@gmail.com -> bloodcarter
---------
Co-authored-by: bloodcarter <bloodcarter@gmail.com>
The CLI renders through prompt_toolkit in non-full-screen mode, so every
repaint uses the renderer's tracked _cursor_pos.y to cursor_up() + erase
before drawing the new frame. Any time that tracked position drifts from
terminal reality, redraws stack on top of stale content instead of
overwriting it. Four user-visible bugs share this root cause.
Fixes:
- #5474 (SIGWINCH ghosts): the resize wrapper previously only handled
column-shrink reflow. Generalize it to force a full screen-clear
(erase_screen + cursor_goto(0,0)) and renderer.reset() on every resize
— covers widen, row-shrink, and multiplexer SIGWINCH-less redraws.
- #8688 (cmux/tmux tab switch): no SIGWINCH fires on focus regain, so
prompt_toolkit has no signal to recover. Add a _force_full_redraw()
helper, bound to Ctrl+L (standard bash/zsh/vim convention) and exposed
as /redraw. Users can manually clear drift without restarting Hermes.
- #14692 (DSR response leaks — ^[[53;1R): resize storms make
prompt_toolkit's CSI 6n queries race past the input parser; the
terminal's reply ends up as literal input text. Add a sibling of the
bracketed-paste sanitizer that strips \x1b[<row>;<col>R and the
caret-escape visible form from paste text, buffer text-filter, and
the input-processing loop.
The idle-redraw removal (#12641) is in the preceding commit from
@foxion37 — keeping them as separate commits preserves attribution.