For Hermes / future agents: This is the canonical planning/spec document for building a standalone, GitHub-distributable Hermes memory plugin that provides human-like continuity without bloating the active prompt. Use the subagent-driven-development skill to implement this spec task-by-task. Use strict TDD for every production behavior.
Working name: hermes-continuity-memory
Goal: Build a pluggable, local-first Hermes memory provider that stores rich long-term context externally, retrieves compact relevant summaries per turn, and lets Hermes reconstruct deeper details on demand without maintaining a fork of hermes-agent.
Primary distribution target: Standalone GitHub repository installable into $HERMES_HOME/plugins/continuity/ and enabled via memory.provider: continuity.
Architecture: Implement a MemoryProvider plugin named continuity. Use SQLite + FTS5 as the required baseline. Store structured ContextRecords with provenance/source references. Enforce strict retrieval budgets in prefetch(). Add optional vector/hybrid retrieval only after schema, evals, and FTS retrieval are stable.
Non-goal: Replace Hermes built-in MEMORY.md / USER.md, skills, or session_search. This plugin complements them.
Hermes has several useful memory/context systems today:
- MEMORY.md and USER.md: tiny curated Tier 0 memory, always injected at session start.
- Skills: explicit procedural memory, loaded on demand.
- Session search: on-demand archive search across previous conversations.
- Context compression: active-session summarization near context limits.
- External memory provider API: optional provider hooks for prefetch/sync/session-end.
What is missing is a local-first continuity layer that behaves more like human memory:
- It stores many experiences, summaries, lessons, decisions, and project facts.
- It does not keep all of them active in the prompt.
- It retrieves only a small relevant working set for the current turn.
- It can link compact summaries back to exact source details.
- It is inspectable, testable, and safe against stale or untrusted context.
The plugin may store thousands/millions of records, but prefetch() must return a small bounded context block.
Default target:
max_records: 5
max_chars: 1800
Normal retrieval should return compact records:
Lesson: delegate_task can timeout with api_calls:0 after exactly 300s; repeated delegation may be wasteful.
Sources: session:abc turns 44-59
It should not inject raw transcript/log chunks unless explicitly requested by a tool or later expansion phase.
Every extracted memory should contain source refs where possible:
- session id
- approximate turn range
- file path
- git commit
- command/test artifact
- user-provided source label
Repeated procedures should become Hermes skills. The continuity plugin may store procedure_candidate records and suggest skill creation, but it should not replace skills.
MEMORY.md / USER.md remain the small, high-confidence, always-injected layer. The continuity plugin may mirror/index those writes, but built-in memory is still the source of always-active facts.
All automatic prefetch output must be framed as recalled informational context. It must not be treated as system/developer/user instruction. Hermes already wraps provider prefetch in <memory-context>; the plugin should not emit instruction-like text.
Before tuning retrieval, create fixture records and queries. Retrieval quality should be measured with deterministic tests.
The plugin must use the existing MemoryProvider API:
class MemoryProvider:
    name: str
    def is_available(self) -> bool: ...
    def initialize(self, session_id: str, **kwargs) -> None: ...
    def system_prompt_block(self) -> str: ...
    def prefetch(self, query: str, *, session_id: str = "") -> str: ...
    def queue_prefetch(self, query: str, *, session_id: str = "") -> None: ...
    def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None: ...
    def on_session_end(self, messages: list[dict]) -> None: ...
    def on_pre_compress(self, messages: list[dict]) -> str: ...
    def on_memory_write(self, action: str, target: str, content: str) -> None: ...
    def on_delegation(self, task: str, result: str, *, child_session_id: str = "", **kwargs) -> None: ...
    def get_tool_schemas(self) -> list[dict]: ...
    def handle_tool_call(self, tool_name: str, args: dict, **kwargs) -> str: ...
    def shutdown(self) -> None: ...
Plugin install locations:
$HERMES_HOME/plugins/continuity/
or bundled upstream later:
plugins/memory/continuity/
Activation:
memory:
  provider: continuity
Standalone repository target:
hermes-continuity-memory/
├── README.md
├── LICENSE
├── pyproject.toml
├── plugin.yaml
├── continuity/
│ ├── __init__.py # MemoryProvider + register(ctx)
│ ├── models.py # dataclasses and validation
│ ├── store.py # SQLite schema, migrations, CRUD, FTS
│ ├── retrieval.py # FTS/scoring/hybrid retrieval
│ ├── extraction.py # session/turn consolidation
│ ├── project.py # project/repo identity detection
│ ├── tools.py # tool schemas and handlers
│ ├── security.py # prompt-injection/secret filtering
│ ├── config.py # config loading/defaults
│ ├── embeddings.py # optional phase 5 vector support
│ └── install.py # optional local install helper
├── tests/
│ ├── fixtures/
│ │ └── continuity_memory/
│ │ ├── records.jsonl
│ │ ├── queries.jsonl
│ │ └── sessions.jsonl
│ ├── test_provider_discovery.py
│ ├── test_store.py
│ ├── test_retrieval.py
│ ├── test_tools.py
│ ├── test_extraction.py
│ ├── test_project.py
│ ├── test_security.py
│ └── test_eval_fixtures.py
└── scripts/
├── install.sh
└── run-tests.sh
Installed Hermes plugin layout:
$HERMES_HOME/plugins/continuity/
├── __init__.py
├── plugin.yaml
├── models.py
├── store.py
├── retrieval.py
├── extraction.py
├── project.py
├── tools.py
├── security.py
├── config.py
└── embeddings.py
Required logical fields:
from __future__ import annotations  # lets ContextRecord reference SourceRef before it is defined
from dataclasses import dataclass
@dataclass
class ContextRecord:
    id: str
    kind: str
    scope_user: str | None
    scope_project: str | None
    scope_repo: str | None
    scope_profile: str | None
    title: str
    summary: str
    details: str | None
    tags: list[str]
    entities: list[str]
    sources: list[SourceRef]
    confidence: float
    importance: float
    trust_level: str
    created_at: str
    updated_at: str
    last_accessed_at: str | None
    last_confirmed_at: str | None
    expires_at: str | None
    source_hash: str | None
    embedding_text: str | None
@dataclass
class SourceRef:
    type: str  # session, file, git, tool, user, memory, delegation
    id: str | None
    uri: str | None
    range: str | None
    label: str | None
    created_at: str | None
Examples:
{"type":"session","id":"2026-04-guard-proxy","range":"turns 44-59","label":"delegate timeout debugging"}
{"type":"file","uri":"~/src/guard-proxy/src/proxy.rs","label":"timeout handling implementation"}
MVP kinds:
- user_preference
- environment_fact
- project_convention
- lesson_learned
- decision
- session_summary
- artifact_summary
- procedure_candidate
- open_question
- active_project_state
Initial trust levels:
- explicit_user
- built_in_memory_mirror
- agent_observed
- session_extracted
- tool_observed
- remote_untrusted
Default for extracted records: session_extracted.
Records from built-in memory writes: built_in_memory_mirror.
User-created via tool: explicit_user.
Baseline must be SQLite + FTS5.
CREATE TABLE IF NOT EXISTS schema_migrations (
version INTEGER PRIMARY KEY,
applied_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS context_records (
id TEXT PRIMARY KEY,
kind TEXT NOT NULL,
scope_user TEXT,
scope_project TEXT,
scope_repo TEXT,
scope_profile TEXT,
title TEXT NOT NULL,
summary TEXT NOT NULL,
details TEXT,
tags_json TEXT NOT NULL DEFAULT '[]',
entities_json TEXT NOT NULL DEFAULT '[]',
sources_json TEXT NOT NULL DEFAULT '[]',
confidence REAL NOT NULL DEFAULT 0.5,
importance REAL NOT NULL DEFAULT 0.5,
trust_level TEXT NOT NULL DEFAULT 'session_extracted',
created_at TEXT NOT NULL,
updated_at TEXT NOT NULL,
last_accessed_at TEXT,
last_confirmed_at TEXT,
expires_at TEXT,
source_hash TEXT,
embedding_text TEXT
);
CREATE VIRTUAL TABLE IF NOT EXISTS context_records_fts USING fts5(
id UNINDEXED,
title,
summary,
details,
tags,
entities
);
CREATE TABLE IF NOT EXISTS session_observations (
id TEXT PRIMARY KEY,
session_id TEXT NOT NULL,
turn_index INTEGER,
user_content TEXT,
assistant_content TEXT,
created_at TEXT NOT NULL,
consolidated INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS retrieval_events (
id TEXT PRIMARY KEY,
query TEXT NOT NULL,
selected_ids_json TEXT NOT NULL,
skipped_ids_json TEXT NOT NULL DEFAULT '[]',
scores_json TEXT NOT NULL DEFAULT '{}',
max_chars INTEGER,
created_at TEXT NOT NULL
);
Default:
$HERMES_HOME/continuity/continuity.db
Config override:
memory:
  continuity:
    db_path: /custom/path/continuity.db
- SQLite DB initializes idempotently.
- All migrations are tracked and re-runnable.
- CRUD round-trips all ContextRecord fields.
- FTS entries update on insert/update/delete.
- Tests use temporary HERMES_HOME or explicit temp DB path.
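One way to keep the FTS index in step with the base table is to write both inside a single transaction. The sketch below is illustrative only: it uses a reduced three-column schema, and the helper names open_store, upsert_record, delete_record, and search are assumptions, not part of the spec. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3

# Reduced schema for illustration; the real tables carry many more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS context_records (
    id TEXT PRIMARY KEY,
    title TEXT NOT NULL,
    summary TEXT NOT NULL
);
CREATE VIRTUAL TABLE IF NOT EXISTS context_records_fts USING fts5(
    id UNINDEXED, title, summary
);
"""


def open_store(path=":memory:"):
    """Open the database and apply the schema; safe to call repeatedly."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_record(conn, rec_id, title, summary):
    """Write the record and refresh its FTS row in one transaction."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO context_records (id, title, summary) VALUES (?, ?, ?)",
            (rec_id, title, summary),
        )
        conn.execute("DELETE FROM context_records_fts WHERE id = ?", (rec_id,))
        conn.execute(
            "INSERT INTO context_records_fts (id, title, summary) VALUES (?, ?, ?)",
            (rec_id, title, summary),
        )


def delete_record(conn, rec_id):
    """Remove the record and its FTS row together."""
    with conn:
        conn.execute("DELETE FROM context_records WHERE id = ?", (rec_id,))
        conn.execute("DELETE FROM context_records_fts WHERE id = ?", (rec_id,))


def search(conn, query):
    """Return matching ids, best match first. `query` uses FTS5 query syntax,
    so raw user text would need escaping before being passed in."""
    rows = conn.execute(
        "SELECT id FROM context_records_fts WHERE context_records_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]
```

Deleting and re-inserting the FTS row on every upsert is the simplest approach; triggers or an external-content FTS table are alternatives worth weighing once the schema settles.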
Default config namespace:
memory:
  provider: continuity
  continuity:
    db_path: null
    max_records: 5
    max_chars: 1800
    min_score: 0.1
    include_sources: true
    include_scores: false
    save_turn_observations: true
    consolidate_on_session_end: true
    consolidate_min_turns: 3
    session_strategy: per-repo  # per-repo | per-directory | global | per-session
    project_scope_boost: 1.0
    stale_after_days: 180
    vector:
      enabled: false
      backend: none  # none | sqlite-vec | lancedb
      embedding_model: null
Definition of done:
- Config loads with sane defaults if absent.
- Env/config failures degrade safely.
- Bad config values are clamped or ignored with debug warnings, not fatal crashes.
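A minimal sketch of the "clamp or ignore" behavior, assuming the raw continuity config arrives as a plain dict (or nothing at all). The BOUNDS table and the load_config name are hypothetical; the spec does not define allowed ranges.

```python
import logging

logger = logging.getLogger("continuity.config")

DEFAULTS = {"max_records": 5, "max_chars": 1800, "min_score": 0.1}

# Hypothetical bounds chosen for illustration only.
BOUNDS = {
    "max_records": (1, 20),
    "max_chars": (200, 8000),
    "min_score": (0.0, 1.0),
}


def load_config(raw):
    """Merge user config over defaults, never raising on bad input."""
    cfg = dict(DEFAULTS)
    if not isinstance(raw, dict):
        return cfg  # absent or malformed config: fall back to defaults
    for key, (low, high) in BOUNDS.items():
        if key not in raw:
            continue
        value = raw[key]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            logger.debug("ignoring non-numeric %s=%r", key, value)
            continue
        clamped = min(max(value, low), high)
        if clamped != value:
            logger.debug("clamping %s from %r to %r", key, value, clamped)
        # Keep the same numeric type as the default (int or float).
        cfg[key] = type(DEFAULTS[key])(clamped)
    return cfg
```

For example, load_config({"max_records": 500}) keeps the provider usable by capping the value rather than failing at startup.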
MVP must not require embeddings.
Candidate generation:
- FTS query over title/summary/details/tags/entities.
- Tag/entity exact-match scan.
- Optional recent/high-importance fallback if no FTS hit and project scope matches.
Scoring formula v1:
score =
fts_score
+ exact_entity_bonus
+ tag_bonus
+ scope_bonus
+ importance_bonus
+ confidence_bonus
+ recency_bonus
- stale_penalty
- trust_penalty
Specific constants should be in retrieval.py and covered by tests.
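The formula above could be sketched as follows. Every weight here is a placeholder picked for illustration, and the Candidate shape is a hypothetical stand-in for whatever retrieval.py ends up passing around. The sketch assumes fts_score has already been normalized so that higher is better; SQLite's bm25() assigns numerically lower values to better matches, so a real implementation would need to flip or rescale it first.

```python
from dataclasses import dataclass, field
from typing import Optional

# Placeholder weights, not tuned values.
EXACT_ENTITY_BONUS = 0.5
TAG_BONUS = 0.25
SCOPE_BONUS = 1.0
IMPORTANCE_WEIGHT = 0.3
CONFIDENCE_WEIGHT = 0.3
RECENCY_WEIGHT = 0.2
STALE_PENALTY = 0.5
TRUST_PENALTY = {"remote_untrusted": 0.5, "session_extracted": 0.1}


@dataclass
class Candidate:
    fts_score: float  # assumed normalized: higher means a better lexical match
    entities: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    scope_project: Optional[str] = None
    importance: float = 0.5
    confidence: float = 0.5
    age_days: float = 0.0
    trust_level: str = "session_extracted"


def score(c, query_terms, current_project=None, stale_after_days=180):
    """Combine lexical, metadata, and freshness signals into one number."""
    terms = {t.lower() for t in query_terms}
    total = c.fts_score
    if terms & {e.lower() for e in c.entities}:
        total += EXACT_ENTITY_BONUS
    if terms & {t.lower() for t in c.tags}:
        total += TAG_BONUS
    if current_project and c.scope_project == current_project:
        total += SCOPE_BONUS
    total += IMPORTANCE_WEIGHT * c.importance
    total += CONFIDENCE_WEIGHT * c.confidence
    # Recency decays linearly to zero at the staleness threshold.
    total += RECENCY_WEIGHT * max(0.0, 1.0 - c.age_days / stale_after_days)
    if c.age_days > stale_after_days:
        total -= STALE_PENALTY
    total -= TRUST_PENALTY.get(c.trust_level, 0.0)
    return total
```

Keeping each term as a named constant makes it straightforward to write one test per bonus or penalty.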
prefetch(query) input is the current user message. The plugin should enrich internally with:
- current project id
- git repo remote if detectable
- active profile if passed to initialize
- session id
- platform/user id if passed
Automatic prefetch output should be compact and deterministic:
Relevant continuity memory:
1. [lesson_learned | project=guard-proxy | confidence=0.92]
delegate_task can return status:timeout with api_calls:0 after exactly 300s; repeated delegation may keep failing, so work inline.
Sources: session:2026-04-... turns 44-59
No raw transcripts by default.
prefetch() must guarantee:
- at most max_records
- at most max_chars
- no partial malformed records
- gracefully return "" when no records qualify
- Retrieval eval fixtures pass.
- Exact-term queries retrieve expected exact records.
- Fuzzy-ish lexical queries retrieve expected tagged/entity records when possible.
- Project scope boosts relevant records above global distractors.
- Low confidence/stale records are demoted.
- prefetch() never exceeds configured max_chars.
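The budget guarantees for prefetch() can be met by assembling the output from whole records only and counting characters before appending. This is a sketch under assumptions: ScoredRecord is a hypothetical minimal shape, format_prefetch is an illustrative name, and the output layout is simplified relative to the example format shown earlier.

```python
from dataclasses import dataclass


@dataclass
class ScoredRecord:
    # Hypothetical minimal shape; the real ContextRecord has more fields.
    kind: str
    summary: str
    sources: str
    score: float


def format_prefetch(records, max_records=5, max_chars=1800):
    """Return a bounded context block built only from whole records."""
    header = "Relevant continuity memory:"
    blocks = []
    used = len(header)
    # Highest score first; ties broken by summary so output is deterministic.
    ranked = sorted(records, key=lambda r: (-r.score, r.summary))
    for rec in ranked:
        if len(blocks) >= max_records:
            break
        block = f"{len(blocks) + 1}. [{rec.kind}]\n{rec.summary}\nSources: {rec.sources}"
        cost = len(block) + 1  # +1 for the newline joining it to the output
        if used + cost > max_chars:
            continue  # skip a record that does not fit; never truncate one
        blocks.append(block)
        used += cost
    if not blocks:
        return ""
    return "\n".join([header, *blocks])
```

Skipping an oversized record instead of cutting it keeps every emitted entry well-formed, at the cost of occasionally passing over the top-ranked hit.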
continuity_status returns provider status:
{
"success": true,
"initialized": true,
"db_path": "...",
"record_count": 42,
"session_id": "...",
"current_project": "...",
"max_records": 5,
"max_chars": 1800,
"vector_enabled": false
}
continuity_search inputs:
{
"query": "delegate timeout",
"limit": 5,
"scope_project": "guard-proxy"
}
Returns ranked records with scores and source refs.
continuity_get: Fetch one record by id, including full details and sources.
continuity_add: Add an explicit record. Must validate kind, summary, and trust level, and scan for unsafe content.
continuity_delete: Delete one record by id.
continuity_expand: Phase 2+. Given a record id, retrieve source details if available from local observations/session refs. In the MVP this may return sources only, with not_implemented for raw expansion.
- Every tool returns valid JSON string.
- Tool schemas validate under OpenAI function schema shape.
- Unknown tool names return JSON error.
- Tools are safe if provider is uninitialized.
- Tests cover success and error cases.
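A possible shape for the tool dispatcher that satisfies these points. It is written as a standalone function for brevity, whereas the provider method takes (tool_name, args, **kwargs); the provider_state dict and the HANDLERS table are assumptions made for this sketch.

```python
import json


def continuity_status(state, args):
    """Report status; works even when the provider was never initialized."""
    return {"success": True, "initialized": bool(state.get("initialized"))}


# Hypothetical registry mapping tool names to handler callables.
HANDLERS = {"continuity_status": continuity_status}


def handle_tool_call(tool_name, args, provider_state=None):
    """Always return a JSON string, including for unknown tools and errors."""
    state = provider_state or {}
    handler = HANDLERS.get(tool_name)
    if handler is None:
        return json.dumps({"success": False, "error": f"unknown tool: {tool_name}"})
    try:
        return json.dumps(handler(state, args or {}))
    except Exception as exc:  # a tool failure must not propagate into Hermes
        return json.dumps({"success": False, "error": str(exc)})
```

Routing every path through json.dumps means the "valid JSON string" requirement can be checked with a single json.loads per test case.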
sync_turn(user_content, assistant_content) should optionally persist a lightweight observation to session_observations.
Definition of done:
- Turns are stored crash-safely when save_turn_observations is true.
- Large messages are truncated or summarized according to config.
- Secrets are filtered before storage where feasible.
on_session_end(messages) should:
- Skip if disabled or too few turns.
- Build extraction input from session messages or stored observations.
- Ask an LLM or deterministic extractor to produce candidate records.
- Validate candidate JSON.
- Scan for secrets/injection.
- Dedupe/merge with existing records.
- Store records with source refs.
- Mark observations consolidated.
MVP may use a conservative heuristic extractor and explicit tools before introducing LLM extraction if standalone plugin access to an auxiliary LLM is inconvenient. If using LLM extraction, it must degrade gracefully when no provider is available.
If using an LLM, extraction prompt must instruct:
- Extract durable facts only.
- Prefer concise summaries.
- Do not save temporary TODOs or raw logs.
- Preserve source refs.
- Output strict JSON.
- Do not store secrets.
- Use allowed kind values only.
- Meaningful session fixture produces expected records.
- Trivial session fixture produces no records.
- Duplicate lesson across sessions updates/merges instead of duplicating.
- Unsafe content is rejected.
- Extraction failure does not break Hermes shutdown/session end.
Detect current project from:
- TERMINAL_CWD
- os.getcwd()
- git rev-parse --show-toplevel
- git remote get-url origin
- fallback path hash
Project identity fields:
project_id: str
repo_root: str | None
repo_remote: str | None
display_name: str
A project capsule is a special ContextRecord kind:
kind = active_project_state
or a separate table if needed later.
Capsule sections:
- identity
- architecture summary
- verification commands
- known pitfalls
- user decisions
- recent work
- open questions
When current project matches, capsule records should receive strong scope bonus. The capsule should often be retrieved instead of many scattered project records.
- Same git repo from subdir maps to same project id.
- Non-git directories get stable ids.
- Project capsule updates merge instead of append endlessly.
- Capsule prefetch stays under budget.
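The project detection order described earlier in this section could be sketched like this. The detect_project name, the injectable run_git parameter, and the choice to hash the remote URL (falling back to repo root, then path) are assumptions for illustration; the spec only lists the signals to consult.

```python
import hashlib
import os
import subprocess


def _git(args, cwd):
    """Run a git command, returning stripped stdout or None on any failure."""
    try:
        proc = subprocess.run(
            ["git", *args], cwd=cwd, capture_output=True, text=True, timeout=5
        )
    except (OSError, subprocess.SubprocessError):
        return None
    if proc.returncode != 0:
        return None
    return proc.stdout.strip() or None


def detect_project(cwd=None, run_git=_git):
    """Derive a stable project identity from the environment and git metadata."""
    cwd = os.path.abspath(cwd or os.environ.get("TERMINAL_CWD") or os.getcwd())
    repo_root = run_git(["rev-parse", "--show-toplevel"], cwd)
    repo_remote = run_git(["remote", "get-url", "origin"], cwd) if repo_root else None
    # Prefer the most stable identifier available; the path is the fallback.
    basis = repo_remote or repo_root or cwd
    project_id = hashlib.sha256(basis.encode("utf-8")).hexdigest()[:16]
    return {
        "project_id": project_id,
        "repo_root": repo_root,
        "repo_remote": repo_remote,
        "display_name": os.path.basename(repo_root or cwd) or basis,
    }
```

Passing run_git as a parameter lets tests substitute a fake and avoid depending on a real repository or an installed git binary.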
Before storing or injecting any record, scan for:
- prompt injection phrases
- hidden unicode/invisible characters
- exfiltration patterns
- obvious secrets/tokens/API keys
- private key blocks
Adapt patterns from Hermes tools/memory_tool.py.
Automatic retrieval must never emit:
System instruction:
Developer instruction:
Ignore previous instructions
as active instruction. If such content appears in a source, sanitize or block.
Retrieved records should include trust labels internally. Low-trust or remote-derived content should require higher relevance to inject.
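A small scanner along these lines could back the storage and injection checks. The patterns below are illustrative examples written for this sketch, not the ones in Hermes tools/memory_tool.py, and the scan and strip_fence_tags names are hypothetical.

```python
import re
import unicodedata

# Illustrative patterns only; a real list would be adapted from Hermes.
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(all\s+)?previous\s+instructions", re.IGNORECASE),
    re.compile(r"^\s*(system|developer)\s+instruction\s*:", re.IGNORECASE | re.MULTILINE),
]
SECRET_PATTERNS = [
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    re.compile(r"\b(api[_-]?key|token|secret)\s*[:=]\s*\S{16,}", re.IGNORECASE),
]
FENCE_TAG = re.compile(r"</?memory-context[^>]*>", re.IGNORECASE)


def scan(text):
    """Return a list of finding labels; an empty list means nothing obvious."""
    findings = []
    if any(p.search(text) for p in INJECTION_PATTERNS):
        findings.append("prompt_injection")
    if any(p.search(text) for p in SECRET_PATTERNS):
        findings.append("secret")
    # Unicode category Cf covers zero-width and other invisible format characters.
    if any(unicodedata.category(ch) == "Cf" for ch in text):
        findings.append("invisible_unicode")
    return findings


def strip_fence_tags(text):
    """Remove nested <memory-context> tags from text bound for prefetch output."""
    return FENCE_TAG.sub("", text)
```

Returning labels rather than a boolean lets the caller decide per finding whether to reject the record or sanitize it.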
- Injection-pattern records are rejected or sanitized.
- Secret-like records are rejected.
- Prefetch output strips dangerous fence tags or nested <memory-context> tags.
- Security tests cover obvious attack patterns.
Do not hardcode one vector DB into core logic.
Interface:
class VectorBackend:
    def is_available(self) -> bool: ...
    def upsert(self, record_id: str, text: str) -> None: ...
    def delete(self, record_id: str) -> None: ...
    def search(self, query: str, limit: int) -> list[VectorHit]: ...
Preferred optional backends:
- sqlite-vec if packaging/install is stable.
- LanceDB if it is the easiest local embedded vector backend.
FTS-only mode must remain fully supported.
Embed compact embedding_text, not raw transcripts:
{kind} {scope_project} {title} {summary} tags:{tags} entities:{entities}
Vector results provide candidates and a score component, but exact/entity/project matching must still matter.
- Plugin works without vector deps.
- Vector tests mock backend scores for determinism.
- Fuzzy recall improves on eval fixtures.
- Exact recall does not regress.
tests/fixtures/continuity_memory/records.jsonl:
{"id":"r_delegate_timeout","kind":"lesson_learned","scope_project":"guard-proxy","title":"delegate_task timeout before API calls","summary":"delegate_task can return status:timeout with api_calls:0 after exactly 300s; repeated delegation is usually wasteful, so work inline.","tags":["delegate_task","timeout","hermes"],"entities":["api_calls:0","300s"],"confidence":0.95,"importance":0.9}
tests/fixtures/continuity_memory/queries.jsonl:
{"query":"the subagent timed out without doing anything again","scope_project":"guard-proxy","must_include":["r_delegate_timeout"],"must_not_include":[],"top_k":3}
MVP tests:
- must-include appears in top-K
- must-not-include absent from top-K
- injected output under char budget
- deterministic order for ties
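The first two MVP checks can be expressed as one small helper that any retrieval implementation plugs into. This is a sketch: load_jsonl and check_query are hypothetical names, and retrieve is assumed to be a callable taking (query, scope_project) and returning ranked record ids.

```python
import json


def load_jsonl(text):
    """Parse JSON Lines content, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def check_query(case, retrieve):
    """Evaluate one queries.jsonl case against a retrieval callable."""
    top_k = case.get("top_k", 3)
    ids = retrieve(case["query"], case.get("scope_project"))[:top_k]
    missing = [i for i in case.get("must_include", []) if i not in ids]
    leaked = [i for i in case.get("must_not_include", []) if i in ids]
    return {"passed": not missing and not leaked, "missing": missing, "leaked": leaked}
```

Reporting missing and leaked ids separately makes a failing fixture easier to diagnose than a bare boolean.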
Later metrics:
- MRR
- recall@K
- precision@K
- budget-normalized utility
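For reference, the first three metrics have standard definitions that fit in a few lines. Budget-normalized utility is left out here because the spec does not define it. The function names are illustrative, and MRR is the mean of reciprocal_rank over all queries.

```python
def reciprocal_rank(ranked_ids, relevant):
    """1/rank of the first relevant id, or 0.0 if none was retrieved."""
    for position, rec_id in enumerate(ranked_ids, start=1):
        if rec_id in relevant:
            return 1.0 / position
    return 0.0


def recall_at_k(ranked_ids, relevant, k):
    """Fraction of relevant ids that appear in the top k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for rec_id in ranked_ids[:k] if rec_id in relevant)
    return hits / len(relevant)


def precision_at_k(ranked_ids, relevant, k):
    """Fraction of the top k results that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for rec_id in ranked_ids[:k] if rec_id in relevant)
    return hits / k


def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked_ids, relevant) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)
```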
- exact error strings
- fuzzy paraphrases
- project-scoped recall
- global user preference recall
- stale records
- low-confidence records
- distractor records
- unsafe records
- Evaluation test suite runs locally without Hermes credentials.
- FTS-only baseline has documented expected failures for truly semantic cases.
- Hybrid/vector phase improves those cases without breaking exact cases.
Goal: Make the project repeatable before coding behavior.
- Finalize this spec.
- Write README.md with architecture and install target.
- Create fixture records and queries.
- Create initial failing tests for provider discovery, store, retrieval, tools.
- Spec exists and future agents can follow it without chat context.
- Eval fixtures exist.
- Test skeleton exists and fails because implementation is absent.
- No production plugin behavior implemented before failing tests.
python -m pytest tests/ -q
Expected initially: fails due to missing implementation.
Goal: Hermes can discover and load a no-op continuity provider.
- Create continuity/__init__.py with ContinuityProvider.
- Add register(ctx).
- Implement name, is_available, initialize, get_tool_schemas, handle_tool_call for continuity_status only.
- Add plugin.yaml.
- Add install script to copy plugin into $HERMES_HOME/plugins/continuity/.
- load_memory_provider("continuity") returns provider.
- provider.is_available() returns true.
- initialize() creates data directory.
- continuity_status returns valid JSON.
- Unit tests for provider discovery with temp plugin path.
- Unit tests for continuity_status.
- Optional manual Hermes smoke test:
hermes config set memory.provider continuity
hermes chat -q "Use continuity_status and tell me if continuity memory is initialized" --toolsets memory
Goal: Store structured records and retrieve them deterministically using FTS + metadata.
- Implement ContextRecord and SourceRef dataclasses.
- Implement ContinuityStore migrations.
- Implement CRUD.
- Implement FTS indexing.
- Implement retrieval scoring.
- Implement eval fixture runner.
- CRUD tests pass.
- FTS tests pass.
- Eval fixture exact/project tests pass.
- No vector dependencies required.
python -m pytest tests/test_store.py tests/test_retrieval.py tests/test_eval_fixtures.py -q
Goal: prefetch(query) returns compact relevant memory under budget.
- Load config defaults.
- Implement project detection.
- Implement prefetch() candidate retrieval.
- Implement output formatting.
- Enforce max_records and max_chars.
- Log retrieval event to DB.
- prefetch() returns empty for no match.
- prefetch() returns relevant compact context for fixture queries.
- Output never exceeds budget.
- Output includes source refs if configured.
- Output contains no raw transcript by default.
python -m pytest tests/test_provider_prefetch.py tests/test_project.py tests/test_eval_fixtures.py -q
Manual smoke test with seeded DB:
hermes chat -q "the subagent timed out without doing anything again" --toolsets memory
Expected: Hermes receives/references relevant continuity memory.
Goal: User/agent can inspect, search, add, delete, and debug records.
- Implement continuity_search.
- Implement continuity_get.
- Implement continuity_add.
- Implement continuity_delete.
- Expand continuity_status.
- Add debug info for last retrieval event.
- All tools return JSON.
- Error cases are safe and clear.
- Added records are searchable.
- Deleted records no longer appear.
- Status shows DB path, counts, budget, vector state.
python -m pytest tests/test_tools.py -q
Goal: Plugin learns from sessions without stuffing everything into prompt.
- Implement sync_turn() observation storage.
- Implement safe truncation/redaction for large turn content.
- Implement conservative session-end extraction.
- Implement candidate validation.
- Implement dedupe/merge.
- Store records with source refs.
- Meaningful session fixture extracts expected records.
- Trivial session fixture extracts no records.
- Duplicate records merge.
- Unsafe extracted content is rejected.
- Failed extraction does not crash session shutdown.
python -m pytest tests/test_extraction.py tests/test_security.py -q
Goal: Make repo continuation smart without injecting lots of old session records.
- Implement git/path project identity.
- Add capsule creation/update logic.
- Merge project facts/decisions/pitfalls into capsule.
- Prefer capsule retrieval when current project matches.
- Add capsule status/search support.
- Same repo from nested dirs maps to same project.
- Capsule summarizes project facts without endless append growth.
- Capsule is retrieved for project-scoped queries.
- Capsule output remains bounded.
python -m pytest tests/test_project.py tests/test_project_capsule.py -q
Goal: Improve fuzzy recall without making vector dependencies mandatory.
- Define vector backend abstraction.
- Add optional backend implementation.
- Generate embedding text for records.
- Add vector candidate generation.
- Combine with FTS/entity/project scoring.
- Add vector eval cases.
- FTS-only mode still works without vector deps.
- Vector mode improves fuzzy eval cases.
- Exact eval cases do not regress.
- Tests use mocked vector backend for deterministic CI.
python -m pytest tests/test_vectors.py tests/test_hybrid_retrieval.py tests/test_eval_fixtures.py -q
Goal: Make high-signal Hermes built-in memory writes searchable in continuity without changing user behavior.
Hermes built-in MEMORY.md / USER.md should remain tiny Tier 0 always-injected memory. This phase mirrors built-in memory writes into structured continuity records so the same curated facts can also be scoped, searched, source-backed, indexed, and included in project capsules.
Implement ContinuityProvider.on_memory_write(action, target, content).
Supported actions:
- add: create or upsert a deterministic mirror record for content.
- remove: delete the deterministic mirror record for content when possible.
- replace: if only the new content is available, upsert the new mirror record and do not guess the old record id.
Supported targets:
- user: mirror as user_preference unless a conservative classifier can choose a better allowed kind.
- memory: mirror as environment_fact unless a conservative classifier can choose lesson_learned, decision, or project_convention.
Record requirements:
- deterministic id from target + normalized sanitized content hash, e.g. mirror_user_<hash> or mirror_memory_<hash>.
- trust_level = built_in_memory_mirror.
- source ref: SourceRef(type="built_in_memory", id=target, label=f"Hermes memory {action}").
- scope_project should be the current project display name when project context is available and content appears project-specific; global/user preferences may remain unscoped.
- content must pass the same secret redaction / prompt-injection rejection path used for observations.
- excessive content must be truncated before storage.
- vector indexing must happen through the provider's normal record upsert path when vector mode is enabled.
Do not use an LLM for this phase. Use deterministic rules only.
Suggested classifier:
- target user + preference-like wording (prefers, likes, wants, call them, timezone) → user_preference.
- target memory + lesson, gotcha, pitfall, beware, workaround, quirk → lesson_learned.
- target memory + decided, declined, chose, decision → decision.
- target memory + uses, project, repo, convention, test, verify, build → project_convention when project-scoped, otherwise environment_fact.
- fallback: target user → user_preference; target memory → environment_fact.
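The classifier rules and the deterministic id could be sketched as below. The function names are hypothetical, matching is done on whole lowercase words (so "tests" would not match "test"), and the sanitization step the spec requires before hashing is omitted for brevity.

```python
import hashlib
import re

LESSON_WORDS = {"lesson", "gotcha", "pitfall", "beware", "workaround", "quirk"}
DECISION_WORDS = {"decided", "declined", "chose", "decision"}
CONVENTION_WORDS = {"uses", "project", "repo", "convention", "test", "verify", "build"}


def normalize(content):
    """Collapse whitespace and lowercase so equivalent writes hash identically."""
    return re.sub(r"\s+", " ", content).strip().lower()


def mirror_id(target, content):
    """Deterministic id; a real implementation would sanitize content first."""
    digest = hashlib.sha256(normalize(content).encode("utf-8")).hexdigest()[:16]
    return f"mirror_{target}_{digest}"


def classify(target, content, project_scoped=False):
    """Pick an allowed kind using keyword rules only (no LLM)."""
    if target == "user":
        # Both the preference-wording rule and the fallback give the same kind.
        return "user_preference"
    words = set(re.findall(r"[a-z]+", normalize(content)))
    if words & LESSON_WORDS:
        return "lesson_learned"
    if words & DECISION_WORDS:
        return "decision"
    if words & CONVENTION_WORDS and project_scoped:
        return "project_convention"
    return "environment_fact"
```

Because the id depends only on target and normalized content, a later remove action can recompute it without any lookup.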
- Add deterministic mirror id helper and content normalization helper.
- Add conservative memory-write classifier.
- Implement on_memory_write() add/remove/replace behavior.
- Ensure mirrored records use source refs and built_in_memory_mirror trust.
- Route mirrored upserts through vector-aware provider helper.
- Update project capsule generation to naturally include mirrored project records.
- Add tests for add, remove, replace, classification, redaction, scope, retrieval, and vector indexing.
- Add docs/PHASE7_5.md documenting behavior and non-goals.
- Built-in memory writes create searchable continuity records.
- Removed built-in memory content deletes the matching mirrored record when deterministic id can be computed.
- Mirrored records are source-backed as built_in_memory.
- Mirroring does not duplicate skill contents or transcript blobs.
- Secrets are redacted and obvious prompt-injection-like content is rejected.
- FTS-only and vector-enabled modes both pass.
- Existing retrieval and project capsule tests do not regress.
uv run ruff check .
uv run pytest tests/test_memory_mirror_phase7_5.py tests/test_provider_phase7.py tests/test_project_capsule_phase6.py -q
uv run pytest tests/ -q
Goal: Connect semantic continuity memories to procedural Hermes skills without copying skill contents into the memory store.
A continuity memory captures what is true or what happened. A Hermes skill captures how to act. This phase lets a memory say "this skill was relevant to this memory" so retrieval can suggest procedural context when a remembered situation recurs.
- Do not store skill bodies in SQLite.
- Do not automatically load skills from the memory provider.
- Do not build a general memory graph or broad memory-to-memory spreading activation system in this phase.
- Do not persist weak embedding-similarity links between every related record.
Add a sparse skill-link table:
CREATE TABLE IF NOT EXISTS context_record_skill_links (
record_id TEXT NOT NULL,
skill_name TEXT NOT NULL,
reason TEXT,
confidence REAL NOT NULL DEFAULT 0.5,
source TEXT NOT NULL DEFAULT 'agent_observed',
created_at TEXT NOT NULL,
last_used_at TEXT,
PRIMARY KEY(record_id, skill_name),
FOREIGN KEY(record_id) REFERENCES context_records(id) ON DELETE CASCADE
);
skill_name should use canonical Hermes skill names such as:
software-development/systematic-debugging
devops/openbsd-rust-dev-loop
github/github-pr-workflow
- continuity_add may accept optional skill_links.
- continuity_get returns skill links for the record.
- continuity_search returns skill links for each result.
- prefetch() may include at most 1–2 compact skill hints per selected memory, subject to the same prompt budget.
- Skill hints should be informational, not instructions, e.g. Related skills: devops/openbsd-rust-dev-loop.
- Skill links should be explicit or high-confidence only.
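One detail worth showing for the skill-link table: SQLite only enforces ON DELETE CASCADE when foreign keys are switched on for the connection. The sketch below uses a cut-down context_records table and omits the source and last_used_at columns; the connect, upsert_skill_link, and list_skill_links bodies are assumptions about how the store methods might look.

```python
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS context_records (
    id TEXT PRIMARY KEY,
    summary TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS context_record_skill_links (
    record_id TEXT NOT NULL,
    skill_name TEXT NOT NULL,
    reason TEXT,
    confidence REAL NOT NULL DEFAULT 0.5,
    created_at TEXT NOT NULL,
    PRIMARY KEY(record_id, skill_name),
    FOREIGN KEY(record_id) REFERENCES context_records(id) ON DELETE CASCADE
);
"""


def connect(path=":memory:"):
    conn = sqlite3.connect(path)
    # Foreign-key enforcement is off by default and must be enabled per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def upsert_skill_link(conn, record_id, skill_name, reason=None, confidence=0.5):
    """Insert a link, or update reason/confidence while keeping created_at."""
    now = datetime.now(timezone.utc).isoformat()
    with conn:
        conn.execute(
            """
            INSERT INTO context_record_skill_links
                (record_id, skill_name, reason, confidence, created_at)
            VALUES (?, ?, ?, ?, ?)
            ON CONFLICT(record_id, skill_name) DO UPDATE SET
                reason = excluded.reason,
                confidence = excluded.confidence
            """,
            (record_id, skill_name, reason, confidence, now),
        )


def list_skill_links(conn, record_id):
    rows = conn.execute(
        "SELECT skill_name, reason, confidence FROM context_record_skill_links "
        "WHERE record_id = ? ORDER BY skill_name",
        (record_id,),
    )
    return rows.fetchall()
```

The ON CONFLICT upsert form needs SQLite 3.24 or newer; INSERT OR REPLACE is a fallback if older builds must be supported, though it resets created_at.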
Suggested SkillLink fields:
from dataclasses import dataclass, field
@dataclass(frozen=True)
class SkillLink:
    record_id: str
    skill_name: str
    reason: str | None = None
    confidence: float = 0.5
    source: str = "agent_observed"
    created_at: str = field(default_factory=utc_now)  # utc_now: project helper returning an ISO timestamp
    last_used_at: str | None = None
Allowed source values should be constrained at the model layer, for example:
explicit_user
agent_observed
session_extracted
built_in_memory_mirror
- Add SkillLink model and validation.
- Add schema migration for context_record_skill_links.
- Add store methods: upsert_skill_link(), delete_skill_link(), list_skill_links(record_id), and optional replace_skill_links(record_id, links).
- Extend continuity_add to accept optional skill_links and persist them.
- Extend continuity_get and continuity_search responses with skill_links.
- Extend prefetch formatting with compact, budgeted Related skills: lines.
- Add tests for add/get/search/prefetch/delete cascade behavior.
- Add docs/PHASE7_6.md documenting skill-link semantics and non-goals.
- Records can have zero or more explicit skill links.
- Skill links are returned by inspection/search tools.
- Prefetch can show compact related-skill hints without blowing budget.
- Deleting a record removes its skill links.
- No skill content is duplicated in continuity memory.
- Existing memory retrieval behavior does not regress.
Memory-to-memory links are deferred. If added later, prefer a separate sparse edge table with explicit relation types such as supports, supersedes, contradicts, duplicates, derived_from, follow_up, and resolved_by.
Do not persist weak similarity links by default. Similarity should usually remain a query-time retrieval/scoring concern.
If a future phase adds relationship strength, use strength or confidence with strict caps such as:
max_link_hops = 1
min_link_strength = 0.7
max_related_records = 2
uv run ruff check .
uv run pytest tests/test_skill_links_phase7_6.py tests/test_tools_phase4.py tests/test_prefetch_phase3.py -q
uv run pytest tests/ -q
Goal: Make this usable as a GitHub plugin, not a local hack.
- Write installation docs.
- Write configuration docs.
- Write troubleshooting docs.
- Add example eval dataset.
- Add scripts/install.sh.
- Add CI workflow.
- Tag initial release.
- Fresh Hermes user can install plugin from README.
- Plugin can be enabled with memory.provider: continuity.
- Tests pass in CI.
- README clearly states limitations and safety model.
Manual clean install test in temp Hermes home:
export HERMES_HOME=$(mktemp -d)
./scripts/install.sh
hermes config set memory.provider continuity
hermes memory status
Future agents implementing this spec must follow:
- Use TDD for each behavior.
- Keep tasks small.
- Run targeted tests after each task.
- Run broader suite before finishing phase.
- Do not add vector search before FTS/store/evals are stable.
- Do not alter Hermes upstream unless the plugin API truly blocks the work.
- Prefer plugin-only solutions.
- If upstream change is needed, document exact reason and propose minimal patch.
Workaround: plugin is selected provider. Do not require Honcho/Mem0 concurrently.
Future upstream idea: provider composition or multiple external providers.
Workaround: use side effects only; extract/store before compression.
Future upstream idea: wire returned text into compression prompt.
Workaround: plugin detects cwd/git/project itself.
Future upstream idea: pass runtime context object to provider hooks.
Workaround: expose provider tools and optional plugin CLI.
Future upstream idea: generic context debug UI.
- Plugin loads.
- SQLite store works.
- Manual add/search/status tools work.
- FTS retrieval and prefetch under budget work.
- Basic docs exist.
- Session consolidation works.
- Project capsules work.
- Eval fixtures pass.
- Security scanner blocks obvious unsafe records.
- CI passes.
- Robust install docs.
- Crash-safe turn observations.
- Good retrieval debug story.
- Optional vector backend if stable, otherwise explicitly deferred.
- Proven in real Hermes sessions without prompt bloat.
- Should extraction use Hermes' auxiliary LLM client, or remain heuristic/plugin-local for portability?
- Should raw session observations be encrypted or expire by default?
- What is the default retention policy?
- Should project capsules be user-editable markdown as well as DB records?
- Should continuity_expand integrate with Hermes session DB directly or only plugin-stored observations?
- Which vector backend is most compatible with Hermes' packaging constraints?
- How should the plugin avoid duplicating facts already in MEMORY.md / USER.md while still indexing them?
For the next implementation session, start with Phase 0:
- Create standalone repo or plugin directory.
- Add fixture records/queries.
- Add failing provider discovery and store tests.
- Implement only enough skeleton to make discovery/status tests pass.
Do not implement retrieval before tests and fixtures exist.