Background macOS desktop control via cua-driver MCP — does NOT steal the
user's cursor or keyboard focus, works with any tool-capable model.
Replaces the Anthropic-native `computer_20251124` approach from the
abandoned #4562 with a generic OpenAI function-calling schema plus SOM
(set-of-mark) captures so Claude, GPT, Gemini, and open models can all
drive the desktop via numbered element indices.
- `tools/computer_use/` package — swappable ComputerUseBackend ABC +
CuaDriverBackend (stdio MCP client to trycua/cua's cua-driver binary).
- Universal `computer_use` tool with one schema for all providers.
Actions: capture (som/vision/ax), click, double_click, right_click,
middle_click, drag, scroll, type, key, wait, list_apps, focus_app.
- Multimodal tool-result envelope (`_multimodal=True`, OpenAI-style
`content: [text, image_url]` parts) that flows through
handle_function_call into the tool message. Anthropic adapter converts
into native `tool_result` image blocks; OpenAI-compatible providers
get the parts list directly.
- Image eviction in convert_messages_to_anthropic: only the 3 most
recent screenshots carry real image data; older ones become text
placeholders to cap per-turn token cost.
- Context compressor image pruning: old multimodal tool results have
their image parts stripped instead of being skipped.
- Image-aware token estimation: each image counts as a flat 1500 tokens
instead of its base64 char length (~1MB would have registered as
~250K tokens before).
- COMPUTER_USE_GUIDANCE system-prompt block — injected when the toolset
is active.
- Session DB persistence strips base64 from multimodal tool messages.
- Trajectory saver normalises multimodal messages to text-only.
- `hermes tools` post-setup installs cua-driver via the upstream script
and prints permission-grant instructions.
- CLI approval callback wired so destructive computer_use actions go
through the same prompt_toolkit approval dialog as terminal commands.
- Hard safety guards at the tool level: blocked type patterns
(curl|bash, sudo rm -rf, fork bomb), blocked key combos (empty trash,
force delete, lock screen, log out).
- Skill `apple/macos-computer-use/SKILL.md` — universal (model-agnostic)
workflow guide.
- Docs: `user-guide/features/computer-use.md` plus reference catalog
entries.
44 new tests in tests/tools/test_computer_use.py covering schema
shape (universal, not Anthropic-native), dispatch routing, safety
guards, multimodal envelope, Anthropic adapter conversion, screenshot
eviction, context compressor pruning, image-aware token estimation,
run_agent helpers, and universality guarantees.
469/469 pass across tests/tools/test_computer_use.py + the affected
agent/ test suites.
- `model_tools.py` provider-gating: the tool is available to every
provider. Providers without multi-part tool message support will see
text-only tool results (graceful degradation via `text_summary`).
- Anthropic server-side `clear_tool_uses_20250919` — deferred;
client-side eviction + compressor pruning cover the same cost ceiling
without a beta header.
- macOS only. cua-driver uses private SkyLight SPIs
(SLEventPostToPid, SLPSPostEventRecordTo,
_AXObserverAddNotificationAndCheckRemote) that can break on any macOS
update. Pin with HERMES_CUA_DRIVER_VERSION.
- Requires Accessibility + Screen Recording permissions — the post-setup
prints the Settings path.
Supersedes PR #4562 (pyautogui/Quartz foreground backend, Anthropic-
native schema). Credit @0xbyt4 for the original #3816 groundwork whose
context/eviction/token design is preserved here in generic form.
Flips security.redact_secrets from true to false in DEFAULT_CONFIG, and
the HERMES_REDACT_SECRETS env-var fallback in agent/redact.py now
requires explicit opt-in ("1"/"true"/"yes"/"on") to enable.
New installs and users without a security.redact_secrets key get pass-
through tool output. Existing users whose config.yaml explicitly sets
redact_secrets: true keep redaction on — the config-yaml -> env-var
bridges in hermes_cli/main.py and gateway/run.py still honor their
setting.
Also updates the inline config comments, website docs, and the
hermes-agent skill so /hermes config set security.redact_secrets true
is now the documented way to turn it on.
Port https://github.com/blader/humanizer (MIT, v2.5.1, 16k stars) into
the built-in skills under skills/creative/humanizer/. Based on Wikipedia's
'Signs of AI writing' guide (WikiProject AI Cleanup) — detects 29 AI-writing
patterns and rewrites them to sound human.
Hermes-native adaptations:
- Description (<60 chars) explains what it's for: 'Humanize text: strip
AI-isms and add real voice.'
- 'When to use this skill' section — trigger phrases (humanize, de-AI,
de-slop, un-ChatGPT, rewrite to not sound like an LLM) plus guidance to
apply it to the agent's own output (release notes, PR descriptions, docs).
- 'How to use it in Hermes' — maps the three real input paths (inline,
file via read_file/patch/write_file, voice-calibration sample) onto the
tools the agent actually has. Drops Claude Code's allowed-tools block.
- Converted frontmatter to Hermes format (metadata.hermes.tags, category,
homepage, related_skills).
Attribution preserved:
- Original author Siqi Chen (@blader) credited in frontmatter and body.
- Full MIT LICENSE copied verbatim alongside SKILL.md.
- Wikipedia / WikiProject AI Cleanup credited.
- 29 patterns, personality/soul section, and full worked example kept
verbatim from the source (29,914 chars).
Validated end-to-end against a clean HERMES_HOME:
- sync_skills() copies skills/creative/humanizer/ including LICENSE.
- skills_list(category='creative') returns the 48-char description.
- skill_view(name='humanizer') returns the full body with all 29 patterns,
personality/soul, attribution, and Hermes tool refs (read_file, patch,
write_file) intact.
Adds four new reference docs covering common TD use cases not previously
documented in the skill:
- animation.md: LFOs, timers, keyframes, easing, time references
- midi-osc.md: MIDI controllers, OSC routing, TouchOSC, multi-machine sync
- particles.md: POPs and particleSOP — emission, forces, collisions, render
- projection-mapping.md: windowCOMP, corner pin, mesh warp, edge blending
Also clarifies the SKILL.md tool quick reference: adds td_screen_point_to_global
and notes that 4 admin/dev-mode tools (td_project_quit, td_test_session,
td_dev_log, td_clear_dev_log) live only in mcp-tools.md to keep the main
reference focused on creative workflows.
No SKILL.md workflow or critical-rules changes. References load on demand
so no token-budget impact at session start.
For 14 of 74 compressed skills, the original description contained
trigger keywords, technique counts, attribution, or use-case phrases
not covered by the existing body content. Prepends a 'When to use' /
'What's inside' block near the top so the agent still has the full
context when the skill is loaded.
Skills salvaged:
- codex, ascii-video, creative-ideation, excalidraw, manim-video, p5js
- gif-search, heartmula, youtube-content
- lm-evaluation-harness, obliteratus, vllm, axolotl
- powerpoint
Remaining 60 skills were verified to already cover the dropped content
in their existing body sections (When to Use, overview, intro prose)
or had short descriptions fully captured by the new compressed form.
Target: every skill's description fits in a one-line gateway menu and
leads with trigger keywords an agent would match on. Drops filler like
'Use this skill to', 'A skill for', 'This skill provides'.
Before: max description length was 791 chars (architecture-diagram),
74 of 81 built-in skills were >60 chars.
After: max 60, mean 54, all 81 built-in skills <=60.
Rewritten with double-quoted YAML scalars to preserve Chinese/arrow
glyphs (baoyu-comic, yuanbao, youtube-content).
- claude-design: 'Design one-off HTML artifacts (landing, deck, prototype).' (57)
- popular-web-designs: '54 real design systems (Stripe, Linear, Vercel) as HTML/CSS.' (60)
- design-md: "Author/validate/export Google's DESIGN.md token spec files." (59)
Also adds an inline callout near the top of claude-design pointing to
popular-web-designs and design-md so the cross-reference lands even
without reading the full decision table.
- claude-design: design process + taste for one-off HTML artifacts
- popular-web-designs: 54 ready-to-paste design systems (Stripe/Linear/etc.)
- design-md: formal DESIGN.md token spec file authoring
Adds a comparison table to claude-design's 'When To Use' section and
reciprocal pointers in design-md and popular-web-designs. Also corrects
claude-design author attribution to BadTechBandit.
Follow-up to #16323 — the UrlSource adapter is shipped but four
user-facing docs surfaces still only listed the hub-identifier forms.
- user-guide/features/skills.md: add ``url`` to the Supported-hub-sources
table; add a new "#### 8. Direct URL (`url`)" section explaining scope
(single-file SKILL.md only), name-resolution order (frontmatter → URL
slug → interactive prompt → --name flag), and both TTY and
non-interactive usage. Add two URL examples to the install-examples
block near the top of the page.
- reference/cli-commands.md: two URL install examples + one note
explaining the name-resolution fallback chain.
- guides/work-with-skills.md: one URL-install example alongside the
existing hub-identifier examples.
- skills/autonomous-ai-agents/hermes-agent/SKILL.md: Quick Reference
block's ``hermes skills install`` line now spells out that ID can be
a hub identifier OR a direct SKILL.md URL, and mentions --name for
frontmatter-less skills.
No code changes. No new dependencies. Website builds via the usual
Docusaurus pipeline.
Co-authored-by: teknium1 <teknium@noreply.github.com>
Parse scope from the raw callback URL before stripping the auth code so Flow.fetch_token matches user-granted scopes. Add regression test for dual-scope callbacks.
Made-with: Cursor
Four small tool-description / skill-content tweaks addressing recurring
model mistakes seen in @versun's docx feedback (Kimi 2.6, but the patterns
apply to every model):
1. browser_navigate description: call out .md/.txt/.json/.yaml/.csv/.xml,
raw.githubusercontent.com, and API endpoints as specifically preferring
curl or web_extract. The generic "prefer web_search or web_extract" was
too weak; models kept firing up the browser for plain-text URLs.
2. delegate_task description: two additions.
(a) Pass user language / output-style preferences in 'context' when they
differ from English — otherwise subagents default to English and their
summaries contaminate the final reply (caused the bilingual digest bug).
(b) Subagent summaries are self-reports, not verified facts. For
operations with external side-effects (HTTP uploads, remote writes,
file creation at shared paths), require a verifiable handle (URL, ID,
path) and verify it yourself before claiming success.
3. agent/prompt_builder.py Skills-mandatory block: new explicit line
"Whenever the user asks to configure / set up / modify / install /
enable / disable / troubleshoot Hermes Agent itself, load the
`hermes-agent` skill first." The generic "load what's relevant" didn't
route Hermes-meta questions (like "how do I turn off redaction?") to
the one skill that has the answer.
4. skills/autonomous-ai-agents/hermes-agent/SKILL.md: new "Security &
Privacy Toggles" section covering security.redact_secrets (with the
import-time-snapshot restart-required caveat), privacy.redact_pii,
approvals.mode (manual/smart/off) + --yolo + HERMES_YOLO_MODE, shell
hooks allowlist, and how to disable network/media tools entirely.
Every command verified against the actual config keys — no invented
knobs.
Co-authored-by: teknium1 <teknium@noreply.github.com>
Expand the airtable skill from bare CRUD to a full Hermes-shaped
cookbook matching the linear/notion neighbors, and trim the
description to fit the 60-char system-prompt cutoff.
Hermes-specific additions:
- Explicit 'use the terminal tool with curl — not web_extract or
browser_navigate' guidance, matching the same note in linear.
- Note that AIRTABLE_API_KEY flows from ~/.hermes/.env into the
subprocess automatically via env_passthrough, so curl calls don't
need to re-export it.
- Prefer 'python3 -m json.tool' (always present) over jq (optional)
for pretty-printing, with -s on every curl to keep output clean.
- Read-before-write workflow that resolves record IDs via
filterByFormula instead of guessing.
Cookbook expansion (new vs original):
- Field-type reference table (text, select, multi-select, attachment,
linked record, user) with the exact write-shape Airtable expects.
- typecast flag for auto-coercing values / auto-creating select options.
- performUpsert PATCH for idempotent sync by merge field.
- Batch create/delete endpoints (10-record cap per call).
- Sort + fields query params with URL-encoding (%5B / %5D).
- Named-view query that applies saved filter/sort server-side.
- Full pagination loop template (while loop with offset).
- Common filterByFormula patterns (exact match, contains, AND/OR,
date comparison, NOT empty).
- Rate-limit backoff guidance (Retry-After header, per-base budget).
- Airtable error-code reference (AUTHENTICATION_REQUIRED,
INVALID_PERMISSIONS, MODEL_ID_NOT_FOUND,
INVALID_MULTIPLE_CHOICE_OPTIONS) so the agent can map failures to
user-actionable fixes instead of just retrying.
Also: description trimmed from 183 chars (truncated to 60 in system
prompt, losing 'filter/upsert/delete' trigger terms) down to 59 chars
that render whole: 'Airtable REST API via curl. Records CRUD, filters,
upserts.' Catalog row updated to match.
SKILL.md grew from 115 to 228 lines — still under the 500-line soft
cap and below the linear skill (297 lines) which serves the same
role for GraphQL.
Convert the airtable skill from 'skills.config.airtable.api_key'
(config.yaml, wrong bucket for a secret) to 'prerequisites.env_vars:
[AIRTABLE_API_KEY]' (~/.hermes/.env), matching every other bundled
skill that authenticates with an API token.
Why the original shape was wrong:
- metadata.hermes.config is for non-secret skill settings (paths,
preferences) per references/skill-config-interface.md. Storing a
bearer token under skills.config.* also triggered the documented
'hermes config migrate' nag-on-every-run problem.
- The Quick Reference's 'AIRTABLE_API_KEY=...' bash line couldn't
read skills.config.airtable.api_key anyway — it's a yaml path, not
an env var.
Follow-up polish on the same pass:
- Added version/author/license frontmatter to match notion/linear.
- Added prerequisites.commands: [curl].
- Setup section now specifies the PAT format (pat...) that replaced
legacy 'key...' API keys in Feb 2024, plus the three required scopes
(data.records:read/write, schema.bases:read) and the per-base Access
list requirement.
- Clarified PATCH vs PUT and pagination (100 records/page cap).
- Swapped verification from 'hermes -q ...' (non-deterministic) to a
curl /v0/meta/bases call that returns a verifiable HTTP status code.
Was user-local in ~/.hermes/skills/. Ported into skills/software-development/
so other Hermes users get it and so the related_skills links from
node-inspect-debugger and python-debugpy resolve in-repo.
Frontmatter upgraded to match repo convention (version/author/license/
metadata.hermes.{tags,related_skills}, description rewritten as "Use when ...").
Body expanded with debugging-tactics section pointing at the two new
debugger skills, and additional common-issues / pitfalls entries.
Two new skills under skills/software-development/ for real breakpoint-driven
debugging from the terminal:
- node-inspect-debugger: node --inspect / --inspect-brk, node inspect REPL,
CDP scripting via chrome-remote-interface, attaching to running Node
processes (SIGUSR1), ui-tui-specific recipes, Vitest under debugger,
CPU profiles + heap snapshots.
- python-debugpy: pdb quick reference, breakpoint() workflow, pytest --pdb
(with xdist caveat for scripts/run_tests.sh), post-mortem, debugpy for
remote/attach, remote-pdb as the agent-friendly alternative to DAP,
recipes for tui_gateway/_SlashWorker/subprocess debugging.
skills/feeds/ only contained a category-marker DESCRIPTION.md with no
actual skills in it. Removing the directory and the 'feeds' -> 'Feeds'
display-label mapping in website/scripts/extract-skills.py (the only
other reference in the repo).
New `hermes kanban` CLI subcommand + `/kanban` slash command + skills for
worker and orchestrator profiles. SQLite-backed task board
(~/.hermes/kanban.db) shared across all profiles on the host. Zero
changes to run_agent.py, no new core tools, no tool-schema bloat.
Motivation: delegate_task is a function call — sync fork/join, anonymous
subagent, no resumability, no human-in-the-loop. Kanban is the durable
shape needed for research triage, scheduled ops, digital twins,
engineering pipelines, and fleet work. They coexist (workers may call
delegate_task internally).
What this adds
- hermes_cli/kanban_db.py — schema, CAS claim, dependency resolution,
dispatcher, workspace resolution, worker-context builder.
- hermes_cli/kanban.py — 15-verb CLI surface and shared run_slash()
entry point used by both CLI and gateway.
- skills/devops/kanban-worker — how a profile should work a claimed task.
- skills/devops/kanban-orchestrator — "you are a dispatcher, not a
worker" template with anti-temptation rules.
- /kanban slash command wired into cli.py and gateway/run.py. Bypasses
the running-agent guard (board writes don't touch agent state), so
/kanban unblock can free a stuck worker mid-conversation.
- Design spec at docs/hermes-kanban-v1-spec.pdf — comparative analysis
vs Cline Kanban, Paperclip, NanoClaw, Gemini Enterprise; 8 patterns;
4 user stories; implementation plan; concurrency correctness.
- Docs: website/docs/user-guide/features/kanban.md, CLI reference
updated, sidebar entry added.
Architecture highlights
- Three planes: control (user + gateway), state (board + dispatcher),
execution (pool of profile processes).
- Every worker is a full OS process, spawned as `hermes -p <profile>`.
No in-process subagent swarms — solves NanoClaw's SDK-lifecycle
failure class.
- Atomic claim via SQLite CAS in a BEGIN IMMEDIATE transaction; stale
claims reclaimed 15 min after their TTL expires.
- Tenant namespacing via one nullable column — one specialist fleet
can serve many businesses with data isolation by workspace path.
Tests: 60 targeted tests (schema, CAS atomicity, dependency resolution,
dispatcher, workspace kinds, tenancy, CLI + slash surface). All pass
hermetic via scripts/run_tests.sh.
The ephemeral no-tools side-question variant of /btw confused users who
expected 'by-the-way' to mean 'run this off to the side with tools' —
they'd type /btw and get a toolless agent that couldn't do the work.
/bg worked because it was /background with full tools.
Collapse the two: /btw and /bg both alias to /background. One command,
one behavior, no more gotchas about which variant has tools.
Removed:
- _handle_btw_command in cli.py and gateway/run.py
- _run_btw_task + _active_btw_tasks state in gateway/run.py
- prompt.btw JSON-RPC method + btw.complete event in tui_gateway
- BtwStartResponse type + btw.complete case in ui-tui
- Standalone /btw slash tree registration in Discord
- Standalone btw CommandDef in hermes_cli/commands.py
Updated:
- background CommandDef aliases: (bg,) -> (bg, btw)
- TUI session.ts: local btw handler merged into background
- Docs and tips updated to describe /btw as a /background alias
Adds a 'Video Guide' section pointing at the walkthrough of a Hermes agent
abliterating Gemma with OBLITERATUS, so the agent can surface it when the
user wants a visual overview before running the workflow.
Closes#13626.
Two follow-ups on top of the _hermes_home helper from @jerome-benoit's #12729:
1. Declare a [google] optional extra in pyproject.toml
(google-api-python-client, google-auth-oauthlib, google-auth-httplib2) and
include it in [all]. Packagers (Nix flake, Homebrew) now ship the deps by
default, so `setup.py --check` does not need to shell out to pip at
runtime — the imports succeed and install_deps() is never reached.
This fixes the Nix breakage where pip/ensurepip are stripped.
2. Add `from __future__ import annotations` to setup.py so the PEP 604
`str | None` annotation parses on Python 3.9 (macOS system python).
Previously system python3 SyntaxError'd before any code ran.
install_deps() error message now also points users at the extra instead of
just the raw pip command.
The three google-workspace scripts (setup.py, google_api.py, gws_bridge.py)
each had their own way of resolving HERMES_HOME:
- setup.py imported hermes_constants (crashes outside Hermes process)
- google_api.py used os.getenv inline (no strip, no empty handling)
- gws_bridge.py defined its own local get_hermes_home() (duplicate)
Extract the common logic into _hermes_home.py which:
- Delegates to hermes_constants when available (profile support, etc.)
- Falls back to os.getenv with .strip() + empty-as-unset handling
- Provides display_hermes_home() with ~/ shortening for profiles
All three scripts now import from _hermes_home instead of duplicating.
7 regression tests cover the fallback path: env var override, default
~/.hermes, empty env var, display shortening, profile paths, and
custom non-home paths.
Closes#12722
Three quality improvements on top of #15121 / #15130 / #15135:
1. Tool consolidation (9 → 7)
- spotify_saved_tracks + spotify_saved_albums → spotify_library with
kind='tracks'|'albums'. Handler code was ~90 percent identical
across the two old tools; the merge is a behavioral no-op.
- spotify_activity dropped. Its 'now_playing' action was a duplicate
of spotify_playback.get_currently_playing (both return identical
204/empty payloads). Its 'recently_played' action moves onto
spotify_playback as a new action — history belongs adjacent to
live state.
- Net: each API call ships 2 fewer tool schemas when the Spotify
toolset is enabled, and the action surface is more discoverable
(everything playback-related is on one tool).
2. Spotify skill (skills/media/spotify/SKILL.md)
Teaches the agent canonical usage patterns so common requests don't
balloon into 4+ tool calls:
- 'play X' = one search, then play by URI (not search + scan +
describe + play)
- 'what's playing' = single get_currently_playing (no preflight
get_state chain)
- Don't retry on '403 Premium required' or '403 No active device' —
both require user action
- URI/URL/bare-ID format normalization
- Full failure-mode reference for 204/401/403/429
3. Surfaced in 'hermes setup' tool status
Adds 'Spotify (PKCE OAuth)' to the tool status list when
auth.json has a Spotify access/refresh token. Matches the
homeassistant pattern but reads from auth.json (OAuth-based) rather
than env vars.
Docs updated to reflect the new 7-tool surface, and mention the
companion skill in the 'Using it' section.
Tests: 54 passing (client 22, auth 15, tools_config 35 — 18 = 54 after
renaming/replacing the spotify_activity tests with library +
recently_played coverage). Docusaurus build clean.
* feat(config): make tool output truncation limits configurable
Port from anomalyco/opencode#23770: expose a new `tool_output` config
section so users can tune the hardcoded truncation caps that apply to
terminal output and read_file pagination.
Three knobs under `tool_output`:
- max_bytes (default 50_000) — terminal stdout/stderr cap
- max_lines (default 2000) — read_file pagination cap
- max_line_length (default 2000) — per-line cap in line-numbered view
All three keep their existing hardcoded values as defaults, so behaviour
is unchanged when the section is absent. Power users on big-context
models can raise them; small-context local models can lower them.
Implementation:
- New `tools/tool_output_limits.py` reads the section with defensive
fallback (missing/invalid values → defaults, never raises).
- `tools/terminal_tool.py` MAX_OUTPUT_CHARS now comes from
get_max_bytes().
- `tools/file_operations.py` normalize_read_pagination() and
_add_line_numbers() now pull the limits at call time.
- `hermes_cli/config.py` DEFAULT_CONFIG gains the `tool_output` section
so `hermes setup` writes defaults into fresh configs.
- Docs page `user-guide/configuration.md` gains a "Tool Output
Truncation Limits" section with large-context and small-context
example configs.
Tests (18 new in tests/tools/test_tool_output_limits.py):
- Default resolution with missing / malformed / non-dict config.
- Full and partial user overrides.
- Coercion of bad values (None, negative, wrong type, str int).
- Shortcut accessors delegate correctly.
- DEFAULT_CONFIG exposes the section with the right defaults.
- Integration: normalize_read_pagination clamps to the configured
max_lines.
* feat(skills): add design-md skill for Google's DESIGN.md spec
Built-in skill under skills/creative/ that teaches the agent to author,
lint, diff, and export DESIGN.md files — Google's open-source
(Apache-2.0) format for describing a visual identity to coding agents.
Covers:
- YAML front matter + markdown body anatomy
- Full token schema (colors, typography, rounded, spacing, components)
- Canonical section order + duplicate-heading rejection
- Component property whitelist + variants-as-siblings pattern
- CLI workflow via 'npx @google/design.md' (lint/diff/export/spec)
- Lint rule reference including WCAG contrast checks
- Common YAML pitfalls (quoted hex, negative dimensions, dotted refs)
- Starter template at templates/starter.md
Package verified live on npm (@google/design.md@0.1.1).
* fix(skills/baoyu-comic): require absolute paths for curl -o downloads
When downloading generated images across several batches of image_generate
calls, relying on persistent-shell CWD is unsafe. The terminal tool's shell
can rotate (TERMINAL_LIFETIME_SECONDS expiry, a failed cd that leaves the
shell somewhere else), and 'curl -fsSL <url> -o relative.png' then silently
writes to the wrong directory with no error.
Update the skill's Step 7 Download step to require absolute -o paths (or
workdir= on the terminal tool) and add a matching pitfall entry referencing
the Apr 2026 incident where pages 06-09 of a 10-page comic landed at the
repo root instead of comic/<slug>/. The agent then spent several turns
claiming the files existed where they didn't.
* fix(skills/baoyu-comic): handle clarify timeouts correctly in Step 2
A clarify timeout returning 'Use your best judgement to make the choice
and proceed' is NOT user consent to default the entire Step 2 questionnaire.
It is a per-question default only. Add guidance at both instruction sites
(SKILL.md User Questions section, references/workflow.md Step 2 header)
telling the agent to:
1. Continue asking the remaining questions in the sequence after a
timeout — each question is an independent consent point.
2. Surface every defaulted choice in the next user-visible message
so the user can correct it when they return. An unreported default
is indistinguishable from never having asked.
Reported live Apr 2026: agent asked style question via clarify, got a
timeout response, and silently defaulted style + narrative focus +
audience + review flags in one pass. User only learned style had
defaulted to 'ohmsha' after the comic was fully generated.
Page prompts are written in Step 5 from the text descriptions in
characters/characters.md — the PNG sheet generated in Step 7.1
cannot be used to write them. Reposition the PNG as a human-facing
review artifact (and reference for later regenerations / manual
edits), and drop the confusing "Character sheet | Strategy" tables
since the embedding rule is uniform.
- Remove PDF merge feature and scripts/ directory (no pdf-lib dep)
- Correct image_generate docs: prompt-only, returns URL; add
curl download step after every call
- Downgrade reference images to text-based trait extraction
(style/palette/scene); character sheet is agent-facing reference
- Unify source file naming on source-{slug}.md across SKILL.md
and workflow.md
Port the upstream baoyu-comic skill to Hermes' tool ecosystem, matching
the earlier baoyu-infographic adaptation:
- metadata namespace openclaw -> hermes (+ tags, homepage)
- drop EXTEND.md preferences system (references/config/ removed,
workflow Step 1.1 removed)
- user prompts via clarify (one question at a time) instead of
AskUserQuestion batches
- image generation via image_generate instead of baoyu-imagine, with
aspect-ratio mapping to landscape/portrait/square
- Windows/PowerShell/WSL shell snippets dropped
- file I/O referenced via Hermes write_file/read_file tools
- CLI-style --flags converted to natural-language options and
user-intent cues (skill matching has no slash command trigger)
Add PORT_NOTES.md documenting the adaptations and a sync procedure.
Art-style/tone/layout reference files are preserved verbatim from
upstream v1.56.1.
Three additive conventions inspired by github.com/atomicmemory/llm-wiki-compiler:
- Paragraph-level provenance: `^[raw/articles/source.md]` markers on pages synthesizing 3+ sources, so readers can trace individual claims without re-reading full source files.
- Raw source content hashing: `sha256:` in raw/ frontmatter enables re-ingest drift detection — skip unchanged sources, flag changed ones.
- Optional `confidence` and `contested` frontmatter fields let lint surface weak or disputed claims without re-reading every page's prose.
Lint gains two new checks (quality signals, source drift) and one expanded check (contradictions now surfaces frontmatter-flagged pages).
Also adds a Related Tools section pointing users who want batch/scheduled compilation at llm-wiki-compiler (Obsidian-compatible, works on the same vault).
All additions are opt-in — existing wikis need no migration. Skill version 2.0.0 -> 2.1.0.
- Description truncated to 60 chars in system prompt (extract_skill_description),
so the 500-char HF workflow description never reached the agent; shortened to
'llama.cpp local GGUF inference + HF Hub model discovery.' (56 chars).
- Restore llama-cpp-python section (basic, chat+stream, embeddings,
Llama.from_pretrained) and frontmatter dependencies entry.
- Fix broken 'Authorization: Bearer ***' curl line (missing closing quote;
llama-server doesn't require auth by default).
xurl v1.1.0 added an optional USERNAME positional to `xurl auth oauth2`
that skips the `/2/users/me` lookup, which has been returning 403/UsernameNotFound
for many devs. Documents the workaround in both setup (step 5) and
troubleshooting.
Reported by @itechnologynet.
Small follow-up inspired by stale PR #2421 (@poojandpatel).
- bakery now searches both shop=bakery AND amenity=bakery in one Overpass
query so indie bakeries tagged either way are returned. Reproduces #2421's
Lawrenceville, NJ test case (The Gingered Peach, WildFlour Bakery).
- Adds tourism=guest_house and tourism=camp_site as first-class categories.
- CATEGORY_TAGS entries can now be a list of (key, value) tuples; new
_tags_for() normaliser + tag_pairs= kwarg on build_overpass_nearby/bbox
union the results in one query. Old single-tuple call sites unchanged
(back-compat preserved).
- SKILL.md: 44 → 46 categories, list updated.
- Setup step 5: add --app my-app to xurl auth oauth2 so token binds to the correct app
- Setup step 6: add xurl auth default my-app to set the named app as default
- Add pitfall callout explaining the empty 'default' profile trap
- Agent Workflow step 2: detect when default app has no oauth2 tokens
- Add Troubleshooting table with common xurl issues (auth errors, unauthorized_client, enrollment, credits, media upload, dashboard UI bug)
- Bump to v1.1.0
Community report by @0xHarryWeb3
Smart model routing (auto-routing short/simple turns to a cheap model
across providers) was opt-in and disabled by default. This removes the
feature wholesale: the routing module, its config keys, docs, tests, and
the orchestration scaffolding it required in cli.py / gateway/run.py /
cron/scheduler.py.
The /fast (Priority Processing / Anthropic fast mode) feature kept its
hooks into _resolve_turn_agent_config — those still build a route dict
and attach request_overrides when the model supports it; the route now
just always uses the session's primary model/provider rather than
running prompts through choose_cheap_model_route() first.
Also removed:
- DEFAULT_CONFIG['smart_model_routing'] block and matching commented-out
example sections in hermes_cli/config.py and cli-config.yaml.example
- _load_smart_model_routing() / self._smart_model_routing on GatewayRunner
- self._smart_model_routing / self._active_agent_route_signature on
HermesCLI (signature kept; just no longer initialised through the
smart-routing pipeline)
- route_label parameter on HermesCLI._init_agent (only set by smart
routing; never read elsewhere)
- 'Smart Model Routing' section in website/docs/integrations/providers.md
- tip in hermes_cli/tips.py
- entries in hermes_cli/dump.py + hermes_cli/web_server.py
- row in skills/autonomous-ai-agents/hermes-agent/SKILL.md
Tests:
- Deleted tests/agent/test_smart_model_routing.py
- Rewrote tests/agent/test_credential_pool_routing.py to target the
simplified _resolve_turn_agent_config directly (preserves credential
pool propagation + 429 rotation coverage)
- Dropped 'cheap model' test from test_cli_provider_resolution.py
- Dropped resolve_turn_route patches from cli + gateway test_fast_command
— they now exercise the real method end-to-end
- Removed _smart_model_routing stub assignments from gateway/cron test
helpers
Targeted suites: 74/74 in the directly affected test files;
tests/agent + tests/cron + tests/cli pass except 5 failures that
already exist on main (cron silent-delivery + alias quick-command).
find-nearby and the (new) maps optional skill both used OpenStreetMap's
Overpass + Nominatim to answer the same question — 'what's near this
location?' — so shipping both would be duplicate code for overlapping
capability. Consolidate into one active-by-default skill at
skills/productivity/maps/ that is a strict superset of find-nearby.
Moves + deletions:
- optional-skills/productivity/maps/ → skills/productivity/maps/ (active,
no install step needed)
- skills/leisure/find-nearby/ → DELETED (fully superseded)
Upgrades to maps_client.py so it covers everything find-nearby did:
- Overpass server failover — tries overpass-api.de then
overpass.kumi.systems so a single-mirror outage doesn't break the skill
(new overpass_query helper, used by both nearby and bbox)
- nearby now accepts --near "<address>" as a shortcut that auto-geocodes,
so one command replaces the old 'search → copy coords → nearby' chain
- nearby now accepts --category (repeatable) for multi-type queries in
one call (e.g. --category restaurant --category bar), results merged
and deduped by (osm_type, osm_id), sorted by distance, capped at --limit
- Each nearby result now includes maps_url (clickable Google Maps search
link) and directions_url (Google Maps directions from the search point
— only when a ref point is known)
- Promoted commonly-useful OSM tags to top-level fields on each result:
cuisine, hours (opening_hours), phone, website — instead of forcing
callers to dig into the raw tags dict
SKILL.md:
- Version bumped 1.1.0 → 1.2.0, description rewritten to lead with
capability surface
- New 'Working With Telegram Location Pins' section replacing
find-nearby's equivalent workflow
- metadata.hermes.supersedes: [find-nearby] so tooling can flag any
lingering references to the old skill
External references updated:
- optional-skills/productivity/telephony/SKILL.md — related_skills
find-nearby → maps
- website/docs/reference/skills-catalog.md — removed the (now-empty)
'leisure' section, added 'maps' row under productivity
- website/docs/user-guide/features/cron.md — find-nearby example
usages swapped to maps
- tests/tools/test_cronjob_tools.py, tests/hermes_cli/test_cron.py,
tests/cron/test_scheduler.py — fixture string values swapped
- cli.py:5290 — /cron help-hint example swapped
Not touched:
- RELEASE_v0.2.0.md — historical record, left intact
E2E-verified live (Nominatim + Overpass, one query each):
- nearby --near "Times Square" --category restaurant --category bar → 3 results,
sorted by distance, all with maps_url, directions_url, cuisine, phone, website
where OSM had the tags
All 111 targeted tests pass across tests/cron/, tests/tools/, tests/hermes_cli/.
External services can now push plain-text notifications to a user's chat
via the webhook adapter without invoking the agent. Set deliver_only=true
on a route and the rendered prompt template becomes the literal message
body — dispatched directly to the configured target (Telegram, Discord,
Slack, GitHub PR comment, etc.).
Reuses all existing webhook infrastructure: HMAC-SHA256 signature
validation, per-route rate limiting, idempotency cache, body-size limits,
template rendering with dot-notation, home-channel fallback. No new HTTP
server, no new auth scheme, no new port.
Use cases: Supabase/Firebase webhooks → user notifications, monitoring
alert forwarding, inter-agent pings, background job completion alerts.
Changes:
- gateway/platforms/webhook.py: new _direct_deliver() helper + early
dispatch branch in _handle_webhook when deliver_only=true. Startup
validation rejects deliver_only with deliver=log.
- hermes_cli/main.py + hermes_cli/webhook.go: --deliver-only flag on
subscribe; list/show output marks direct-delivery routes.
- website/docs/user-guide/messaging/webhooks.md: new Direct Delivery
Mode section with config example, CLI example, response codes.
- skills/devops/webhook-subscriptions/SKILL.md: document --deliver-only
with use cases (bumped to v1.1.0).
- tests/gateway/test_webhook_deliver_only.py: 14 new tests covering
agent bypass, template rendering, status codes, HMAC still enforced,
idempotency still applies, rate limit still applies, startup
validation, and direct-deliver dispatch.
Validation: 78 webhook tests pass (64 existing + 14 new). E2E verified
with real aiohttp server + real urllib POST — agent not invoked, target
adapter.send() called with rendered template, duplicate delivery_id
suppressed.
Closes the gap identified in PR #12117 (thanks to @H1an1 / Antenna team)
without adding a second HTTP ingress server.
- Remove orphan skills/creative/touchdesigner/references/pitfalls.md
left over from the rename commit (git add-then-edit instead of git mv
meant the old file never got deleted).
- Honour $HERMES_HOME in setup.sh and SKILL.md setup invocation so
profile-aware installs work correctly.
- Fix troubleshooting.md config path to use $HERMES_HOME instead of
hardcoding ~/.hermes/.
- Add touchdesigner-mcp entries to skills-catalog.md and
optional-skills-catalog.md for parity with blender-mcp/meme-generation.
New skill: creative/touchdesigner — control a running TouchDesigner
instance via REST API. Build real-time visual networks programmatically.
Architecture:
Hermes Agent -> HTTP REST (curl) -> TD WebServer DAT -> TD Python env
Key features:
- Custom API handler (scripts/custom_api_handler.py) that creates a
self-contained WebServer DAT + callback in TD. More reliable than the
official mcp_webserver_base.tox which frequently fails module imports.
- Discovery-first workflow: never hardcode TD parameter names. Always
probe the running instance first since names change across versions.
- Persistent setup: save the TD project once with the API handler baked
in. TD auto-opens the last project on launch, so port 9981 is live
with zero manual steps after first-time setup.
- Works via curl in execute_code (no MCP dependency required).
- Optional MCP server config for touchdesigner-mcp-server npm package.
Skill structure (2823 lines total):
SKILL.md (209 lines) — setup, workflow, key rules, operator reference
references/pitfalls.md (276 lines) — 24 hard-won lessons
references/operators.md (239 lines) — all 6 operator families
references/network-patterns.md (589 lines) — audio-reactive, generative,
video processing, GLSL, instancing, live performance recipes
references/mcp-tools.md (501 lines) — 13 MCP tool schemas
references/python-api.md (443 lines) — TD Python scripting patterns
references/troubleshooting.md (274 lines) — connection diagnostics
scripts/custom_api_handler.py (140 lines) — REST API handler for TD
scripts/setup.sh (152 lines) — prerequisite checker
Tested on TouchDesigner 099 Non-Commercial (macOS/darwin).
Swap the social-media/xitter skill (third-party wrapper around
Infatoshi/x-cli) for a new social-media/xurl skill wrapping
xdevplatform/xurl — the official X API CLI from the X developer
platform team.
Why:
- xurl is officially maintained by the X dev platform team
- OAuth 2.0 PKCE with auto-refresh + multi-app / multi-user support
(vs. xitter's 5-env-var OAuth 1.0a + single account)
- Credentials stored in ~/.xurl managed by xurl itself — no manual
env var juggling for users
- Substantially larger API surface: DMs, follows, blocks, mutes,
media upload, streaming, and raw v2 endpoint access
- Ships stronger agent-safety guardrails (forbidden-flag list,
no --verbose in agent mode, never-read-~/.xurl rule)
Adaptation:
- Ported the openclaw SKILL.md (which the xdevplatform team seeded)
to Hermes frontmatter conventions (prerequisites.commands, platforms,
metadata.hermes.tags/homepage) — dropped openclaw-specific metadata
- Added a Hermes-oriented one-time user setup section so the agent
knows to direct the user to run auth commands themselves, never
execute them with inline secrets
- Preserved the mandatory secret-safety rules verbatim
- Attribution block credits xdevplatform, openclaw, and the Hermes
port
Docs: updated website/docs/reference/skills-catalog.md to replace
the xitter row with xurl.
Three tightly-scoped built-in skill consolidations to reduce redundancy in
the available_skills listing injected into every system prompt:
1. gguf-quantization → llama-cpp (merged)
GGUF is llama.cpp's format; two skills covered the same toolchain. The
merged llama-cpp skill keeps the full K-quant table + imatrix workflow
from gguf and the ROCm/benchmarks/supported-models sections from the
original llama-cpp. All 5 reference files preserved.
2. grpo-rl-training → fine-tuning-with-trl (folded in)
GRPO isn't a framework, it's a trainer inside TRL. Moved the 17KB
deep-dive SKILL.md to references/grpo-training.md and the working
template to templates/basic_grpo_training.py. TRL's GRPO workflow
section now points to both. Atropos skill's related_skills updated.
3. guidance → optional-skills/mlops/
Dropped from built-in. Outlines (still built-in) covers the same
structured-generation ground with wider adoption. Listed in the
optional catalog for users who specifically want Guidance.
Net: 3 fewer built-in skill lines in every system prompt, zero content
loss. Contributor authorship preserved via git rename detection.
Previous pass assumed both skills would always be loaded together, so
each description pointed at the other ('use concept-diagrams instead').
That breaks when only one skill is active — the agent reads 'use the
other skill' and there is no other skill.
Now each skill's description and scope section is fully self-contained:
- States what it's best suited for
- Lists subjects where a more specialized skill (if available) would be
a better fit, naming them only as 'consider X if available'
- Explicitly offers itself as a general SVG diagram fallback when no
more specialized skill exists
An agent loading either skill alone gets unambiguous guidance; an
agent with both loaded still gets useful routing via the 'consider X
if available' hints and the related_skills metadata.
Both skills generate SVG system diagrams, but for very different subjects
and aesthetics. The old descriptions didn't make the split clear, so an
agent loading either one couldn't confidently pick.
Changes:
- Rewrote both frontmatter descriptions to state the scope up front plus
an explicit 'for X, use the other skill instead' pointer.
- Added a symmetric 'When to use this skill vs <other>' decision table
to the top of each SKILL.md body, so the guidance is visible whether
the agent is reading frontmatter or full content.
- Added architecture-diagram <-> concept-diagrams to each other's
related_skills metadata.
Rule of thumb baked into both skills:
software/cloud infra -> architecture-diagram
physical / scientific / educational -> concept-diagrams
llm-wiki was the only shipped skill using metadata.hermes.config, which
caused 'hermes update' and 'hermes config migrate' to prompt for a wiki
directory on every run — even for users who have never touched the skill
— because 'enabled' is opt-out (all shipped skills count as enabled unless
explicitly disabled). Declining the prompt didn't persist anything, so
the nag fired again on every update.
Switch llm-wiki to the env var + runtime default pattern that obsidian and
google-workspace already use: WIKI_PATH env var, default $HOME/wiki. No
prompting infrastructure, no config.yaml touch, no nag loop.
Changes:
- skills/research/llm-wiki/SKILL.md: remove metadata.hermes.config,
document WIKI_PATH env var in the Wiki Location section, update the
orientation snippet and initialization guidance.
- Docs: replace llm-wiki's wiki.path examples with a generic 'myplugin.path'
placeholder across configuration.md, features/skills.md, and
creating-skills.md so users don't try to set skills.config.wiki.path
expecting llm-wiki to use it.
- skills-catalog.md: mention WIKI_PATH instead of skills.config.wiki.path.
E2E verified: discover_all_skill_config_vars() and get_missing_skill_config_vars()
both return 0 entries after this change, so the prompt branch in migrate_config()
no longer fires.
The metadata.hermes.config feature stays in place for third-party skills
that genuinely need structured config, but built-ins now prefer env vars.
* fix: show correct env var name in provider API key error (#9506)
The error message for missing provider API keys dynamically built
the env var name as PROVIDER_API_KEY (e.g. ALIBABA_API_KEY), but
some providers use different names (alibaba uses DASHSCOPE_API_KEY).
Users following the error message set the wrong variable.
Fix: look up the actual env var from PROVIDER_REGISTRY before
building the error. Falls back to the dynamic name if the registry
lookup fails.
Closes#9506
* fix: five HERMES_HOME profile-isolation leaks (#5947)
Bug A: Thread session_title from session_db to memory provider init kwargs
so honcho can derive chat-scoped session keys instead of falling back to
cwd-based naming that merges all gateway users into one session.
Bug B: Replace 14 hardcoded ~/.hermes/skills/ paths across 10 skill files
with HERMES_HOME-aware alternatives (${HERMES_HOME:-$HOME/.hermes} in
shell, os.environ.get('HERMES_HOME', ...) in Python).
Bug C: install.sh now respects HERMES_HOME env var and adds --hermes-home
flag. Previously --dir only set INSTALL_DIR while HERMES_HOME was always
hardcoded to $HOME/.hermes.
Bug D: Remove hardcoded ~/.hermes/honcho.json fallback in resolve_config_path().
Non-default profiles no longer silently inherit the default profile's honcho
config. Falls through to ~/.honcho/config.json (global) instead.
Bug E: Guard _edit_skill, _patch_skill, _delete_skill, _write_file, and
_remove_file against writing to skills found in external_dirs. Skills
outside the local SKILLS_DIR are now read-only from the agent's perspective.
Closes#5947
Adds --from flag to gmail send and gmail reply commands, allowing agents
to customize the From header display name when sharing the same email
account. Usage: --from '"Agent Name" <user@example.com>'
Also syncs repo google_api.py with the deployed standalone implementation
(replaces outdated gws_bridge thin wrapper), adds dedicated docs page
under Features > Skills, and updates sidebar navigation.
Requested by community user @Maxime44.
Port of Cocoon AI's architecture-diagram-generator (MIT) as a Hermes skill.
Generates professional dark-themed system architecture diagrams as standalone
HTML/SVG files. Self-contained output, no dependencies.
- SKILL.md with design system specs, color palette, layout rules
- HTML template with all component types, arrow styles, legend examples
- Fits alongside excalidraw in creative/ category
Source: https://github.com/Cocoon-AI/architecture-diagram-generator
PR #4654 replaced ml-paper-writing with research-paper-writing, preserving
the writing philosophy and reference files but dropping the dedicated
'Sources Behind This Guidance' attribution table from the SKILL.md body.
Re-adds:
- The researcher attribution table (Nanda, Farquhar, Gopen & Swan, Lipton,
Steinhardt, Perez, Karpathy) with affiliations and links to SKILL.md
- Orchestra Research credit as original compiler of the writing philosophy
- 'Origin & Attribution' section in sources.md documenting the full chain:
Nanda blog → Orchestra skill → teknium integration → SHL0MS expansion
Generate project ideas through creative constraints. Constraint + direction
= creativity.
Core skill (SKILL.md, 147 lines):
- 15 curated constraints across 3 categories: developers, makers, anyone
- Developer-focused prompts: 'solve your own itch', 'the CLI tool that
should exist', 'automate the annoying thing', 'nothing new except glue'
- Matching table: maps user mood/intent to appropriate constraints
- Complete worked example with 3 concrete project ideas
- Output format for consistent, actionable idea presentation
Extended library (references/full-prompt-library.md, 110 lines):
- 30+ additional constraints: communication, screens, philosophy,
transformation, identity, scale, starting points
Constraint approach inspired by wttdotm.com/prompts.html. Adapted for
software development and general-purpose ideation.
Adds opt-in creative thinking frameworks to ascii-video, p5js, and
manim-video skills, based on Lluminate (joelsimon.net/lluminate).
Only engaged when the user explicitly asks for creative, experimental,
or unconventional output. Straightforward requests are unaffected.
Each skill gets 2-3 strategies matched to its domain:
- ascii-video: Forced Connections, Conceptual Blending, Oblique Strategies
- p5js: Conceptual Blending, SCAMPER, Distance Association
- manim-video: SCAMPER, Assumption Reversal
Strategies sourced from creativity research (Boden, Eno, de Bono,
Koestler, Fauconnier & Turner, Osborn), formalized for LLM prompting
by Lluminate.
Migrate the google-workspace skill from custom Python API wrappers
(google-api-python-client) to Google's official Rust CLI gws
(googleworkspace/cli). Add gws_bridge.py for headless-compatible
token refresh. Fix partial OAuth scope handling.
Co-authored-by: spideystreet <dhicham.pro@gmail.com>
Cherry-picked from PR #6713
/pr <anything> silently resolved to /prompt via the shortest-match
tiebreaker in prefix expansion, permanently overwriting the system
prompt and persisting to config. The command's functionality (setting
agent.system_prompt) is available via config.yaml and /personality
covers the common use case.
Removes: CommandDef, dispatch branch, _handle_prompt_command handler,
docs references, and updates subcommand extraction test.
Comprehensive cleanup across 80 files based on automated (ruff, pyflakes, vulture)
and manual analysis of the entire codebase.
Changes by category:
Unused imports removed (~95 across 55 files):
- Removed genuinely unused imports from all major subsystems
- agent/, hermes_cli/, tools/, gateway/, plugins/, cron/
- Includes imports in try/except blocks that were truly unused
(vs availability checks which were left alone)
Unused variables removed (~25):
- Removed dead variables: connected, inner, channels, last_exc,
source, new_server_names, verify, pconfig, default_terminal,
result, pending_handled, temperature, loop
- Dropped unused argparse subparser assignments in hermes_cli/main.py
(12 instances of add_parser() where result was never used)
Dead code removed:
- run_agent.py: Removed dead ternary (None if False else None) and
surrounding unreachable branch in identity fallback
- run_agent.py: Removed write-only attribute _last_reported_tool
- hermes_cli/providers.py: Removed dead @property decorator on
module-level function (decorator has no effect outside a class)
- gateway/run.py: Removed unused MCP config load before reconnect
- gateway/platforms/slack.py: Removed dead SessionSource construction
Undefined name bugs fixed (would cause NameError at runtime):
- batch_runner.py: Added missing logger = logging.getLogger(__name__)
- tools/environments/daytona.py: Added missing Dict and Path imports
Unnecessary global statements removed (14):
- tools/terminal_tool.py: 5 functions declared global for dicts
they only mutated via .pop()/[key]=value (no rebinding)
- tools/browser_tool.py: cleanup thread loop only reads flag
- tools/rl_training_tool.py: 4 functions only do dict mutations
- tools/mcp_oauth.py: only reads the global
- hermes_time.py: only reads cached values
Inefficient patterns fixed:
- startswith/endswith tuple form: 15 instances of
x.startswith('a') or x.startswith('b') consolidated to
x.startswith(('a', 'b'))
- len(x)==0 / len(x)>0: 13 instances replaced with pythonic
truthiness checks (not x / bool(x))
- in dict.keys(): 5 instances simplified to in dict
- Redefined unused name: removed duplicate _strip_mdv2 import in
send_message_tool.py
Other fixes:
- hermes_cli/doctor.py: Replaced undefined logger.debug() with pass
- hermes_cli/config.py: Consolidated chained .endswith() calls
Test results: 3934 passed, 17 failed (all pre-existing on main),
19 skipped. Zero regressions.
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
Add geometry mobjects, movement/creation animations, and LaTeX
environments to the skill's reference docs. All verified against
Manim CE v0.20.1.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace Hyaxia/blogwatcher with JulienTant/blogwatcher-cli fork which adds:
- Docker support with BLOGWATCHER_DB env var for persistent storage
- SQL injection prevention
- SSRF protection (blocks private IPs/metadata endpoints)
- HTML scraping fallback when RSS unavailable
- OPML import from Feedly/Inoreader/NewsBlur
- Category filtering for articles
- Direct binary downloads (no Go required)
- Migration guide from original blogwatcher
Binary name changed: blogwatcher -> blogwatcher-cli
Community contribution by Ao (JulienTant).
Closes discussion about Docker compatibility.
- avoid hard-coded ~/.hermes paths in the setup and API shorthands
- prefer HERMES_HOME with a sane default to /Users/peteradams/.hermes
- keep the examples aligned with profile-aware Hermes installs
- fall back to adding the repo root to sys.path when hermes_constants is not importable
- fixes direct execution of setup.py and google_api.py from the repo checkout
- keeps the upstream PR scoped to the google-workspace compatibility fix
Adds obsidian-headless (npm) setup guide to the Obsidian Integration
section — Node 22+, ob login, sync-create-remote, sync-setup, systemd
service for continuous background sync. Covers the full headless
workflow for agents running on servers syncing to Obsidian desktop on
other devices.
Skills can now declare config.yaml settings via metadata.hermes.config
in their SKILL.md frontmatter. Values are stored under skills.config.*
namespace, prompted during hermes config migrate, shown in hermes config
show, and injected into the skill context at load time.
Also adds the llm-wiki skill (Karpathy's LLM Wiki pattern) as the first
skill to use the new config interface, declaring wiki.path.
Skill config interface (new):
- agent/skill_utils.py: extract_skill_config_vars(), discover_all_skill_config_vars(),
resolve_skill_config_values(), SKILL_CONFIG_PREFIX
- agent/skill_commands.py: _inject_skill_config() injects resolved values
into skill messages as [Skill config: ...] block
- hermes_cli/config.py: get_missing_skill_config_vars(), skill config
prompting in migrate_config(), Skill Settings in show_config()
LLM Wiki skill (skills/research/llm-wiki/SKILL.md):
- Three-layer architecture (raw sources, wiki pages, schema)
- Three operations (ingest, query, lint)
- Session orientation, page thresholds, tag taxonomy, update policy,
scaling guidance, log rotation, archiving workflow
Docs: creating-skills.md, configuration.md, skills.md, skills-catalog.md
Closes#5100
Manim's Pango text renderer produces broken kerning with proportional
fonts (Helvetica, Inter, SF Pro, Arial) at all sizes and resolutions.
Characters overlap and spacing is inconsistent. This is a fundamental
Pango limitation.
Changes:
- Recommend Menlo (monospace) as the default font for ALL text
- Proportional fonts only acceptable for large titles (>=48, short strings)
- Set minimum font_size=18 for readability
- Update all code examples to use MONO='Menlo' pattern
- Remove Inter/Helvetica/SF Pro from recommendations
Production pipeline for creating 3Blue1Brown-style animated videos
using Manim Community Edition. The agent handles the full workflow:
creative planning, Python code generation, rendering, scene stitching,
audio muxing, and iterative refinement.
Modes: concept explainers, equation derivations, algorithm
visualizations, data stories, architecture diagrams, paper explainers,
3D visualizations.
9 reference files, setup verification script, README.
All API references verified against ManimCommunity/manim source.