fix(embeddings): silent NULL embeddings after marketplace upgrades #168
efenocchi wants to merge 11 commits into
The embed daemon was failing silently after every marketplace plugin
upgrade because Node's bundle-relative module resolution could not find
`@huggingface/transformers`. The package is installed once at
`~/.hivemind/embed-deps/` by `hivemind embeddings install`, but new
versioned plugin cache dirs land without a `node_modules` symlink, so
the daemon's bare `import("@huggingface/transformers")` walks up to a
location that does not have the package. Each embed request then
returned `Cannot find package`, the client coerced it to a null
embedding, and `sessions.message_embedding` columns were written as
NULL with no surface error.
Rework `NomicEmbedder.load()` to resolve the package explicitly via
`createRequire(pathToFileURL("~/.hivemind/embed-deps/")).resolve(...)`
followed by `import(pathToFileURL(absMain))`. This bypasses Node's
upward walk entirely, so the daemon resolves transformers correctly
regardless of which bundle path it was spawned from. The bare-specifier
import remains as a fallback for dev-tree usage where the package is
colocated. If both fail, the thrown error message mentions
`hivemind embeddings install` so the failure is actionable in logs.
Tests use DI to inject the importer so they run identically on
machines that already have `~/.hivemind/embed-deps/` populated (which
would otherwise shadow `vi.mock("@huggingface/transformers")`).
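The explicit two-stage resolution can be sketched as follows. `resolveFromDir` is an illustrative helper, not the real `NomicEmbedder.load()`: it reproduces only the `createRequire(pathToFileURL(...)).resolve(...)` step, with the dynamic `import(pathToFileURL(absMain))` and the bare-specifier fallback left to the caller.

```typescript
import { createRequire } from "node:module";
import { pathToFileURL } from "node:url";

// Resolve `specifier` as if required from inside `depsDir`, bypassing the
// upward node_modules walk from the caller's own bundle location.
function resolveFromDir(depsDir: string, specifier: string): string {
  // Trailing slash matters: createRequire treats the URL as a file *inside*
  // the directory, so resolution starts at depsDir/node_modules.
  const base = depsDir.endsWith("/") ? depsDir : depsDir + "/";
  return createRequire(pathToFileURL(base)).resolve(specifier);
}
```

When the package is absent, Node throws the familiar `MODULE_NOT_FOUND`, which is the point where the actionable install-hint error gets layered on.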
Replace the implicit "embeddings on by default" + `HIVEMIND_EMBEDDINGS`
env-var override with an explicit, persistent opt-in stored on disk.
The new contract:
- `~/.deeplake/config.json` → `embeddings.enabled: boolean` is the
sole source of truth, shared across all agents (claude-code, codex,
cursor, hermes, pi) because they all read the same `~/.deeplake/`.
- Embeddings run only when `enabled === true`.
- The legacy `HIVEMIND_EMBEDDINGS` env var is read EXACTLY ONCE — on
the first run that has no `embeddings.enabled` key — to seed the
persistent value. Migration rule: env=`false` or unset writes
`enabled: false`; any truthy value writes `enabled: true`. After
the seed is written, the env var is never consulted again.
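The one-shot seed can be sketched as a pure function over the parsed config and the env value. The helper name and the exact truthiness rule for the env string are assumptions; the contract above only pins down "false"/unset to `false` and any truthy value to `true`:

```typescript
type EmbeddingsConfig = { embeddings?: { enabled?: boolean } };

// Seed embeddings.enabled from the legacy env var exactly once: if the key
// already exists, the env var is ignored forever after.
function seedEmbeddingsEnabled(
  cfg: EmbeddingsConfig,
  env: string | undefined,
): EmbeddingsConfig {
  if (cfg.embeddings?.enabled !== undefined) return cfg; // seed already written
  // Assumed truthiness: unset, empty, and the literal "false" count as false.
  const enabled = env !== undefined && env !== "" && env !== "false";
  return { ...cfg, embeddings: { ...cfg.embeddings, enabled } };
}
```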
New module `src/user-config.ts` provides `readUserConfig`,
`writeUserConfig` (atomic write + deep merge), `getEmbeddingsEnabled`
(with one-shot migration), and `setEmbeddingsEnabled`. Path is
overridable via `HIVEMIND_CONFIG_PATH` for tests.
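A minimal sketch of the read/merge/atomic-write cycle such a module needs. The function names mirror the ones listed above, but the bodies are illustrative, assuming a one-level nested merge and a write-to-temp-then-rename for atomicity:

```typescript
import { mkdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

type UserConfig = Record<string, unknown>;

function readUserConfig(path: string): UserConfig {
  try {
    return JSON.parse(readFileSync(path, "utf-8")) as UserConfig;
  } catch {
    return {}; // missing or corrupt file reads as empty config
  }
}

function writeUserConfig(path: string, patch: UserConfig): void {
  const merged: UserConfig = { ...readUserConfig(path) };
  for (const [k, v] of Object.entries(patch)) {
    const prev = merged[k];
    const bothObjects =
      v !== null && typeof v === "object" && !Array.isArray(v) &&
      prev !== null && typeof prev === "object" && !Array.isArray(prev);
    // One-level nested merge: {embeddings: {...old, ...new}}.
    merged[k] = bothObjects ? { ...(prev as object), ...(v as object) } : v;
  }
  mkdirSync(dirname(path), { recursive: true });
  const tmp = path + ".tmp";
  writeFileSync(tmp, JSON.stringify(merged, null, 2) + "\n", "utf-8");
  renameSync(tmp, path); // atomic replace on POSIX
}
```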
`src/embeddings/disable.ts` no longer reads the env var directly.
`EmbeddingsStatus` renames the env-disabled variant to `user-disabled`,
which now reflects both legacy env-disabled and the new config-disabled
cases (both fold into the same user-driven opt-out). The transformers
probe is reordered to match the daemon's import resolution order
(canonical shared-deps first, bundle walk fallback), eliminating the
prior probe/use asymmetry where the probe could succeed and the daemon
still throw MODULE_NOT_FOUND.
A vitest setupFile (`tests/test-setup.ts`) pins `HIVEMIND_CONFIG_PATH`
to a per-process tmp dir so tests never mutate the developer's real
`~/.deeplake/config.json`, and defaults the test environment to
`HIVEMIND_EMBEDDINGS=true` so suites that don't explicitly exercise
the disabled path keep running with embeddings on.
Tests that previously set `HIVEMIND_EMBEDDINGS=false` to exercise the
disabled path now write a throwaway config file with
`embeddings.enabled: false` and point `HIVEMIND_CONFIG_PATH` at it.
Previously `hivemind embeddings install` ≡ `enable` (both did the heavy
deps + symlink work) and `disable` ≡ `uninstall` (both removed symlinks).
With the new persistent-config contract that mapping is wrong:
opting in / out of embeddings should be a lightweight config flip, while
managing the on-disk install is a separate, heavier operation.
New surface (all reflected in `--help` and every agent's SessionStart
injection):
hivemind embeddings install    (heavy) npm-installs @huggingface/transformers
                               into ~/.hivemind/embed-deps, symlinks every
                               detected agent plugin to it, and sets
                               embeddings.enabled=true.
hivemind embeddings enable     (light) sets embeddings.enabled=true. Warns if
                               shared deps are missing.
hivemind embeddings disable    (light) sets embeddings.enabled=false and
                               SIGTERMs the running daemon + clears its
                               sock/pid files so the change takes effect
                               immediately, instead of waiting 10 min for
                               idle-out. Shared deps stay on disk.
hivemind embeddings uninstall  (heavy) removes every agent's symlink into the
  [--prune]                    shared deps, optionally prunes the shared dir
                               itself, sets enabled=false, and SIGTERMs the
                               daemon.
hivemind embeddings status     Extended to show the config flag state
                               alongside the deps + per-agent state, and a
                               one-line action hint when the two disagree.
The CLI dispatcher matches these subcommands exactly (no more
install/enable aliasing), and `--with-embeddings` runs the heavy install
path. Per-agent SessionStart blocks for claude-code, codex, cursor, and
hermes now advertise all five subcommands so the model can suggest the
right one without guessing (per agents-deployment-session-start-injection
skill).
A best-effort `killEmbedDaemon()` helper reads the standard
`/tmp/hivemind-embed-${uid}.pid`, SIGTERMs the process, and unlinks
both the pidfile and socket. Tolerant of every missing-file combination.
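A sketch of that helper under the stated conventions. The pidfile path is the one quoted above; the `.sock` sibling name and the exact error handling are assumptions:

```typescript
import { readFileSync, unlinkSync } from "node:fs";

// Best-effort teardown: every step tolerates missing files, stale pids,
// and already-dead daemons.
function killEmbedDaemon(uid: number): void {
  const pidPath = `/tmp/hivemind-embed-${uid}.pid`;
  const sockPath = `/tmp/hivemind-embed-${uid}.sock`; // assumed socket name
  try {
    const pid = Number.parseInt(readFileSync(pidPath, "utf-8").trim(), 10);
    if (Number.isFinite(pid)) {
      try {
        process.kill(pid, "SIGTERM"); // daemon may already be gone
      } catch { /* ESRCH / EPERM: ignore */ }
    }
  } catch { /* no pidfile: nothing to signal */ }
  for (const p of [pidPath, sockPath]) {
    try {
      unlinkSync(p); // tolerate any missing-file combination
    } catch { /* already absent */ }
  }
}
```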
The embed daemon socket is per-UID, not per-plugin-version. After a
marketplace plugin upgrade replaces the bundle, the older daemon
process keeps its socket alive (up to 10 minutes of idle-out), so every
new session on every newer plugin version connects to the same stuck
daemon. When that stuck daemon can't resolve transformers from its own
(now-orphaned) bundle path, it returns MODULE_NOT_FOUND on every embed
call, and the rest of the session writes NULL into the embedding column
with no surface error. We've now seen this in production: the local
log shows ~30 minutes of `embed err: Cannot find package` lines on a
freshly-upgraded plugin.
Three additions:
1. **Hello handshake** (`protocol.ts` + `daemon.ts` + `client.ts`).
First connect per `EmbedClient` instance sends `{ op: "hello" }`;
the daemon answers with its own `daemonPath` (= `process.argv[1]`)
and `pid`. If the running daemon's path doesn't match the client's
configured `daemonEntry`, the client SIGTERMs the daemon and clears
its socket + pidfile so the next call spawns a fresh daemon from
the current bundle. Verified at most once per EmbedClient.
2. **Stuck-daemon recycle on transformers error** (`client.ts`).
Embed responses matching `isTransformersMissingError` (the wrapper
we throw from `defaultImportTransformers`, plus Node's standard
MODULE_NOT_FOUND form) trigger the same recycle. Process-local
guard so only the first failing call kills + cleans up.
3. **Visible one-time notification** (`client.ts`).
On the same transformers-missing trigger, enqueue a `warn`-severity
notification (`id: "embed-deps-missing"`, dedupKey carries the
error detail) so the next SessionStart drain tells the user to run
`hivemind embeddings install`. Suppressed when
`embeddingsStatus() === "user-disabled"` — users who explicitly
opted out via config don't get nagged. Process-local dedup so a
single capture session enqueues at most one notification even if
the embed call fires many times.
Net effect: a poisoned daemon survives at most one failed embed; the
first session after a plugin upgrade recycles it; the user gets a
clear, actionable notification instead of silent NULLs in the
sessions table.
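The recycle decision in item 1 reduces to a small pure predicate. This sketch covers only the comparison; socket transport, SIGTERM, and file cleanup are elided, and treating a missing `daemonPath` as "leave the daemon alone" is an assumption:

```typescript
type HelloResponse = { daemonPath?: string; pid?: number };

// Should the client recycle the running daemon after a plugin upgrade?
// Recycle only when the daemon reports a path that differs from the
// client's configured daemonEntry.
function shouldRecycle(hello: HelloResponse, daemonEntry: string): boolean {
  if (!hello.daemonPath) return false; // pre-handshake daemon: assumed no-op
  return hello.daemonPath !== daemonEntry;
}
```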
`hivemind embeddings install` symlinks `<pluginDir>/node_modules` to
`~/.hivemind/embed-deps/node_modules` so Node's standard module resolution
finds @huggingface/transformers from anywhere inside `<pluginDir>/bundle/…`.
That works for the plugin version present at install time. But Claude Code's
marketplace auto-upgrades drop new versioned cache dirs
(`cache/hivemind/hivemind/0.7.27/`, `0.7.28/`, …) WITHOUT the symlink. The
user would have to manually re-run `hivemind embeddings install` after every
upgrade, and most won't, so embeddings silently degrade.
New helper `src/embeddings/self-heal.ts` runs from each agent's capture hook
on every invocation. The first capture under a new plugin version creates the
symlink atomically (symlink to a `.tmp` suffix, then rename); subsequent
calls are O(1) no-ops once `already-linked` is observed.
Conservative behavior: the helper NEVER
- Clobbers an existing real `node_modules` directory.
- Overrides a symlink that points elsewhere to a valid target (the user
  installed their own dependency tree).
- Acts when the shared-deps `node_modules` doesn't exist (returns
  `shared-deps-missing`; the notification path covers the user-facing
  surface).
- Acts when the `bundleDir` basename isn't `bundle` (guards against the
  source-tree path being passed during tests; without this gate, a test
  importing `src/hooks/capture.ts` would symlink `src/node_modules` to the
  user's real shared deps).
What it DOES heal:
- Missing link → create.
- Dangling symlink (target deleted out from under it) → remove; the next
  call re-creates it.
Wired into all four capture hooks (claude-code, codex, cursor, hermes) at
top level after the bundleDir is computed. Gated on `!embeddingsDisabled()`
so user-disabled installs don't accumulate symlinks they don't want.
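The heal-and-guard logic can be sketched as one function. The return strings echo the description (`already-linked`, the `.tmp`-then-rename step), but this is a simplified model, not the real `src/embeddings/self-heal.ts`; the `shared-deps-missing` and `bundle`-basename gates are omitted, and `owns-own-node-modules` is an assumed status name:

```typescript
import { lstatSync, renameSync, rmSync, statSync, symlinkSync } from "node:fs";
import { join } from "node:path";

// Ensure <pluginDir>/node_modules is a symlink to the shared deps dir.
function ensureLink(pluginDir: string, sharedNodeModules: string): string {
  const link = join(pluginDir, "node_modules");
  try {
    const st = lstatSync(link);
    if (st.isDirectory()) return "owns-own-node-modules"; // never clobber a real dir
    if (st.isSymbolicLink()) {
      try {
        statSync(link); // follows the link; throws if the target is gone
        return "already-linked";
      } catch {
        rmSync(link); // dangling: remove, then fall through to re-create
      }
    }
  } catch { /* nothing at the link path yet */ }
  // Atomic create: symlink under a temp name, then rename over the final path.
  const tmp = link + ".tmp";
  rmSync(tmp, { force: true });
  symlinkSync(sharedNodeModules, tmp);
  renameSync(tmp, link); // atomic on POSIX
  return "linked";
}
```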
Per the project testing philosophy: source tests prove the helpers
are correct, bundle tests prove the build didn't drop them, re-inline
an old pattern, or otherwise regress on the shipped artifact. A
30-second reviewer guardrail.
For each of claude-code, codex, cursor, hermes:
- `bundle/embeddings/embed-daemon.js` contains the canonical
shared-deps path fragments (".hivemind" + "embed-deps"),
`createRequire` (proving the explicit-path resolver survived
bundling), and the actionable error string "hivemind embeddings
install" (proving the error message users will see in logs is in
the shipped artifact).
- `bundle/capture.js` invokes `ensurePluginNodeModulesLink` (the
self-heal helper), carries the `embed-deps-missing` notification
dedupKey, and still names the `user-disabled` status (proving the
opt-out guard survives bundling).
For the CLI:
- `bundle/cli.js` recognises all five embeddings subcommands
(install/enable/disable/uninstall/status) and references
`~/.deeplake/config.json` so the SessionStart injection text and
the dispatcher agree.
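The scan style these guards use can be modeled as a plain substring check over the shipped artifact. `scanBundle` is illustrative; the real suites are vitest tests, but the core assertion is the same:

```typescript
import { readFileSync } from "node:fs";

// Treat the built bundle as opaque text and report which critical strings
// did NOT survive bundling; an empty result means no regression.
function scanBundle(bundlePath: string, needles: string[]): string[] {
  const src = readFileSync(bundlePath, "utf-8");
  return needles.filter((n) => !src.includes(n));
}
```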
Final step before staging per bundle-rebuild-before-staging skill.
All five source commits in this branch (C1–C5) modify shared
`src/embeddings/` and per-agent capture hooks, so every agent's
`bundle/` artifacts need refreshing:
claude-code/bundle/{capture,embed-daemon,session-start,…}.js
codex/bundle/{capture,embed-daemon,stop,…}.js
cursor/bundle/{capture,embed-daemon,…}.js
hermes/bundle/{capture,embed-daemon,…}.js
pi/bundle/wiki-worker.js
bundle/cli.js
embeddings/embed-daemon.js (standalone daemon)
No source changes — `npm run build` output only. The companion
bundle-scan guards in tests/claude-code/embeddings-bundle-scan.test.ts
pass against these artifacts.
📝 Walkthrough
Adds a hello handshake to the embeddings protocol, client verification and recycle, user-config–backed enablement with env migration, transformers shared-deps resolution, a CLI install/enable/disable/uninstall/status split, plugin node_modules self-heal, and broad bundle/help/test updates.
Sequence Diagram(s):
sequenceDiagram
participant CLI
participant UserConfig
participant Daemon
participant Client
CLI->>UserConfig: set embeddings.enabled = true/false
Client->>Daemon: hello {id}
Daemon-->>Client: {daemonPath, pid, protocolVersion}
Client->>Daemon: embed {text}
Daemon-->>Client: {embedding|error}
Client-->>UserConfig: enqueue notification (deps missing) [when enabled]
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120+ minutes
Coverage Report
Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via
File Coverage — 18 files changed
Generated for commit 93ec452.
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/cursor/cursor-capture-hook.test.ts (1)
Lines 272-289: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Missing cleanup: temp directory leaks on every test run.
The test creates a temp directory and config file but never removes them. This will accumulate garbage in the system temp dir over repeated test runs.
🧹 Proposed fix to clean up temp resources
  it("user-disabled embeddings short-circuit to NULL without invoking EmbedClient", async () => {
    stdinMock.mockResolvedValue({
      conversation_id: "sid-emb-3",
      hook_event_name: "beforeSubmitPrompt",
      prompt: "disabled",
    });
    // Point user-config at a throwaway path that says enabled:false.
    const { writeFileSync, mkdtempSync } = await import("node:fs");
+   const { rmSync } = await import("node:fs");
    const { tmpdir } = await import("node:os");
    const { join } = await import("node:path");
    const dir = mkdtempSync(join(tmpdir(), "cursor-cap-disabled-"));
    const cfgPath = join(dir, "config.json");
    writeFileSync(cfgPath, JSON.stringify({ embeddings: { enabled: false } }), "utf-8");
-   await runHook({ HIVEMIND_CONFIG_PATH: cfgPath });
+   try {
+     await runHook({ HIVEMIND_CONFIG_PATH: cfgPath });
+   } finally {
+     rmSync(dir, { recursive: true, force: true });
+   }
    const sql = queryMock.mock.calls[0][0] as string;
    expect(sql).toContain("'::jsonb, NULL,");
    expect(sql).toMatch(/, message_embedding,/);
  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/cursor/cursor-capture-hook.test.ts` around lines 272 - 289, The test creates a temp directory (mkdtempSync -> dir) and config file (cfgPath) but never removes them; update the test around runHook({ HIVEMIND_CONFIG_PATH: cfgPath }) in cursor-capture-hook.test.ts to ensure the temp directory is cleaned (use fs.rmSync or fs.promises.rm with recursive/force) in a finally block or after/teardown so the temp dir and config file created by writeFileSync are removed regardless of test outcome; reference mkdtempSync, writeFileSync, cfgPath, dir and runHook when implementing the cleanup.
🧹 Nitpick comments (1)
hermes/bundle/shell/deeplake-shell.js (1)
Lines 67811-67823: 💤 Low value
`deepMerge` performs only one level of nested merging, not recursive. Nested objects are spread (`{...baseVal, ...patchVal}`) rather than recursively merged. For the current `embeddings.enabled` use case this is fine, but deeper structures like `{a: {b: {c: 1}}}` patched with `{a: {b: {d: 2}}}` would lose `c`. Consider renaming to `shallowMergeNested` or adding a comment clarifying the single-level constraint if deeper configs are anticipated.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@hermes/bundle/shell/deeplake-shell.js` around lines 67811 - 67823, The function deepMerge only merges nested objects one level deep (it spreads baseVal and patchVal rather than recursively merging), so rename deepMerge to shallowMergeNested (or equivalent) and add a concise JSDoc/comment on the function explaining it intentionally performs only a single-level nested merge; update any callers referencing deepMerge to use the new name (search for deepMerge in the bundle) so callers and future maintainers understand the single-level constraint.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@bundle/cli.js`:
- Around line 3825-3837: The killEmbedDaemon function currently reads a PID from
pidPathFor and sends SIGTERM blindly; instead verify the PID refers to the embed
daemon before signaling: after reading pid (readFileSync8) and before calling
process.kill, check the live process identity (e.g., inspect /proc/<pid>/cmdline
or use a small handshake via socketPathFor to confirm it responds as the embed
daemon) and only signal when that validation succeeds; if validation fails treat
the pidfile/socket as stale and remove pidPath/socketPath rather than killing an
unrelated process.
- Around line 3774-3777: The linkAgent function currently always calls
symlinkForce(join11(install.pluginDir, "node_modules")) which unlinks any
preexisting path and causes EISDIR if an agent already has a real node_modules
directory; before calling symlinkForce, check the fs.lstat/lstatSync on the
target link (the join11(..., "node_modules") path): if it exists and is a real
directory (stats.isDirectory() && !stats.isSymbolicLink()) rename or move it
(e.g., append ".own" or similar) and log that we preserved the agent's own
node_modules, otherwise if it’s a symlink or file remove it and then call
symlinkForce(SHARED_NODE_MODULES, link); update linkAgent to perform this guard
using the same link variable so symlinkForce and SHARED_NODE_MODULES usage
remain unchanged.
In `@claude-code/bundle/capture.js`:
- Around line 1766-1785: The current branch that handles a dangling symlink
(when linkStat.isSymbolicLink() is true) removes the stale symlink with
rmSync(link) and returns without recreating it, leaving self-heal incomplete;
change the catch block inside the dangling-link path so that after successfully
rmSync(link) you immediately recreate the symlink pointing at target using
symlinkSync(target, link[, type]) (choose 'junction' for Windows if needed), and
then return a success kind (e.g., "recreated-link" with link and target) instead
of "stale-link-removed"; update any callers/consumers expecting
"stale-link-removed" to handle the new return kind if necessary.
In `@claude-code/bundle/pre-tool-use.js`:
- Around line 1316-1343: In verifyDaemonOnce, don't set this.helloVerified at
the top; instead only set it after confirming a compatible hello response — move
the assignment to after you've confirmed hello.daemonPath exists and equals
this.daemonEntry (i.e., after the equality check where currently you return on
match), so that failed probes, missing daemonPath, or path mismatches (and
subsequent recycleDaemon calls) do not permanently mark the client as verified;
keep existing behavior for early returns when !this.daemonEntry or on exceptions
by not setting helloVerified in those code paths.
In `@claude-code/bundle/shell/deeplake-shell.js`:
- Around line 68288-68292: resolveEmbedDaemonPath currently builds the daemon
path as join(dirname(fileURLToPath(import.meta.url)), "embeddings",
"embed-daemon.js") which points to bundle/shell/embeddings/embed-daemon.js but
the actual daemon lives in the sibling bundle/embeddings directory; update
resolveEmbedDaemonPath (same function called when spawning EmbedClient) to
include the parent-traversal segment (add ".." into the join parts, mirroring
the pattern used by resolveGrepEmbedDaemonPath) so the join becomes dirname ->
".." -> "embeddings" -> "embed-daemon.js".
In `@codex/bundle/shell/deeplake-shell.js`:
- Around line 68290-68292: resolveEmbedDaemonPath currently constructs the path
"embeddings/embed-daemon.js" relative to the shell bundle directory, which
resolves to codex/bundle/shell/embeddings/embed-daemon.js but the actual bundled
file is at codex/bundle/embeddings/embed-daemon.js; update the path construction
in resolveEmbedDaemonPath so it points to the parent
"embeddings/embed-daemon.js" (i.e., join the dirname of import.meta.url with
".." and "embeddings" then "embed-daemon.js") so daemonEntry references the
correct bundled file.
In `@codex/bundle/stop.js`:
- Around line 1184-1187: enqueueNotification currently always appends the
notification which allows duplicates across processes; change it to honor
n.dedupKey by reading the existing queue via readQueue(), checking if any queued
item has the same dedupKey (and skip push if found), and only push/writeQueue
when dedupKey is absent or unique; reference the enqueueNotification function
and use readQueue/writeQueue and the notification object's dedupKey field to
implement the guard so duplicate "embed-deps-missing" entries are not persisted.
In `@cursor/bundle/shell/deeplake-shell.js`:
- Around line 67910-67918: The embed() flow can send the request on a stale
socket because verifyDaemonOnce() may recycle the daemon; fix by making
verifyDaemonOnce() indicate when it recycled (e.g., return a boolean or the
fresh socket) and then, inside embed(), re-acquire the current socket before
calling sendAndWait() so the request is sent to the new daemon; update embed()
(and the similar block around the 67942-67968 range) to check the return from
verifyDaemonOnce(), refresh/get the socket (the same socket variable passed to
sendAndWait) when a recycle occurred, and only then construct/send the req via
this.sendAndWait().
- Around line 68182-68184: The current isTransformersMissingError function is
too broad because it matches any "MODULE_NOT_FOUND"; change it to only return
true when the missing-module error specifically references transformers-related
packages. Update isTransformersMissingError to check err.code ===
'MODULE_NOT_FOUND' (or the error string) AND test err.message/err for package
names like `@huggingface/transformers`, transformers, hivemind, or accelerate
(e.g. /(`@huggingface`\/transformers|transformers|hivemind|accelerate)/i) so
unrelated missing-module errors are not misclassified.
- Around line 68290-68292: The resolveEmbedDaemonPath function currently builds
a path to "embeddings/embed-daemon.js" relative to the shell bundle directory,
but the emitted daemon actually lives in the parent bundle's embeddings folder;
update resolveEmbedDaemonPath (and the join11/dirname5/fileURLToPath usage) to
ascend one directory before "embeddings" (e.g.
join11(dirname5(fileURLToPath(import.meta.url)), "..", "embeddings",
"embed-daemon.js")) so the resolved path points to
bundle/embeddings/embed-daemon.js instead of shell/embeddings/embed-daemon.js.
In `@hermes/bundle/capture.js`:
- Around line 614-618: enqueueNotification currently always appends
notifications to the persisted queue so cross-process runs re-enqueue “one-time”
warnings; change enqueueNotification to read the queued items (via readQueue),
check for an existing notification with the same unique id/type (e.g.
"embed-deps-missing") or a oneTime flag before pushing, and only push if not
already present, then writeQueue; use the same identifying key the runtime uses
for suppression (the embed-deps-missing id or a new oneTime property) so
_signalledMissingDeps in-memory suppression remains complementary to the
persisted dedupe.
- Around line 1099-1118: The stale-symlink branch currently removes the dangling
link and returns { kind: "stale-link-removed" }, which prevents the caller from
getting a usable node_modules link; instead, after successfully rmSync(link) in
the catch block you should not return immediately but fall through to the same
symlink-creation path used when the link does not exist. Concretely: in the
block under if (linkStat.isSymbolicLink()) where you catch a failed statSync,
remove the early return ({ kind: "stale-link-removed" }); after rmSync(link)
continue execution so the subsequent code that creates the symlink (the same
logic that runs when no link exists) runs and returns the successful "linked"
(or existing success) result; keep use of readlinkSync, statSync, rmSync and the
existing result kinds consistent.
In `@hermes/bundle/wiki-worker.js`:
- Around line 650-651: The current isTransformersMissingError detector is too
broad because it matches any "MODULE_NOT_FOUND"; update
isTransformersMissingError to only treat errors as transformer-missing when the
message explicitly references transformer-related identifiers (for example
'@huggingface/transformers', the word 'transformers' near the MODULE_NOT_FOUND
text, or the specific install hint 'hivemind embeddings install'), rather than
matching bare MODULE_NOT_FOUND; locate the isTransformersMissingError function
and tighten the regex/logic to require transformer-specific context (or prefer
checking for a transformer-specific sentinel error code returned by the daemon
if available) so unrelated packaging/runtime failures are not misclassified.
In `@src/embeddings/client.ts`:
- Around line 128-214: Add focused unit tests for EmbedClient to hit the
conditional branches in verifyDaemonOnce, handleTransformersMissing, and
recycleDaemon: mock sendAndWait to return a HelloResponse with no daemonPath to
assert no recycle; mock sendAndWait to return a mismatched daemonPath and assert
recycleDaemon called once and _recycledStuckDaemon becomes true (and calling
verifyDaemonOnce again does not re-trigger recycle); for
handleTransformersMissing stub embeddingsStatus to return "user-disabled" and
verify enqueueNotification is not called, and stub it to return enabled and
verify enqueueNotification is called once and _signalledMissingDeps is set; for
recycleDaemon mock readFileSync to supply a pid, spy process.kill and unlinkSync
to verify pid kill + socket/pid unlink path and also test the null reportedPid
branch where pidfile is missing. Reset module-level flags (_recycledStuckDaemon,
_signalledMissingDeps) between tests and use spies/mocks for sendAndWait,
recycleDaemon, enqueueNotification, embeddingsStatus, readFileSync, unlinkSync,
and process.kill to assert expected side effects.
In `@tests/claude-code/session-start-setup-hook.test.ts`:
- Around line 230-235: This test sets process.env.EMBEDDINGS_DISABLED_FOR_TEST
but never clears it, causing later tests to inherit the disabled state; after
calling runHook(...) in the "skips warmup..." test, restore or delete
process.env.EMBEDDINGS_DISABLED_FOR_TEST (e.g., delete
process.env.EMBEDDINGS_DISABLED_FOR_TEST or set it back to its original value)
so subsequent calls to runHook() behave deterministically; update the test
around runHook, embedWarmupMock, and debugLogMock assertions to ensure the env
key is cleaned up before the test ends.
In `@tests/hermes/hermes-capture-hook.test.ts`:
- Around line 275-281: Test mutates HIVEMIND_CONFIG_PATH and leaves a temp dir;
wrap the runHook call in a try/finally that saves the original
process.env.HIVEMIND_CONFIG_PATH, sets it to cfgPath, and in finally restores
the original env value (or deletes the key if it was undefined) and removes the
temp directory (use fs.rmSync(dir, { recursive: true, force: true }) or
fs.rmdirSync for compatibility). Locate the code around runHook, cfgPath,
mkdtempSync, writeFileSync and update the test to import/removal functions and
perform cleanup to avoid leaking state between tests.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: c577da79-afa6-4d82-9398-f8195e33ee25
📒 Files selected for processing (61)
bundle/cli.js
claude-code/bundle/capture.js
claude-code/bundle/embeddings/embed-daemon.js
claude-code/bundle/pre-tool-use.js
claude-code/bundle/session-start-setup.js
claude-code/bundle/session-start.js
claude-code/bundle/shell/deeplake-shell.js
claude-code/bundle/wiki-worker.js
codex/bundle/capture.js
codex/bundle/embeddings/embed-daemon.js
codex/bundle/pre-tool-use.js
codex/bundle/session-start.js
codex/bundle/shell/deeplake-shell.js
codex/bundle/stop.js
codex/bundle/wiki-worker.js
cursor/bundle/capture.js
cursor/bundle/embeddings/embed-daemon.js
cursor/bundle/pre-tool-use.js
cursor/bundle/session-start.js
cursor/bundle/shell/deeplake-shell.js
cursor/bundle/wiki-worker.js
embeddings/embed-daemon.js
hermes/bundle/capture.js
hermes/bundle/embeddings/embed-daemon.js
hermes/bundle/pre-tool-use.js
hermes/bundle/session-start.js
hermes/bundle/shell/deeplake-shell.js
hermes/bundle/wiki-worker.js
pi/bundle/wiki-worker.js
src/cli/embeddings.ts
src/cli/index.ts
src/embeddings/client.ts
src/embeddings/daemon.ts
src/embeddings/disable.ts
src/embeddings/nomic.ts
src/embeddings/protocol.ts
src/embeddings/self-heal.ts
src/hooks/capture.ts
src/hooks/codex/capture.ts
src/hooks/codex/session-start.ts
src/hooks/cursor/capture.ts
src/hooks/cursor/session-start.ts
src/hooks/hermes/capture.ts
src/hooks/hermes/session-start.ts
src/hooks/session-start-setup.ts
src/hooks/session-start.ts
src/user-config.ts
tests/claude-code/embeddings-bundle-scan.test.ts
tests/claude-code/embeddings-client.test.ts
tests/claude-code/embeddings-disable.test.ts
tests/claude-code/embeddings-nomic.test.ts
tests/claude-code/embeddings-self-heal.test.ts
tests/claude-code/session-start-setup-hook.test.ts
tests/claude-code/user-config.test.ts
tests/claude-code/wiki-worker-plugin-version.test.ts
tests/cli/cli-embeddings.test.ts
tests/cli/cli-index.test.ts
tests/cursor/cursor-capture-hook.test.ts
tests/hermes/hermes-capture-hook.test.ts
tests/test-setup.ts
vitest.config.ts
  function linkAgent(install) {
-   const link = join10(install.pluginDir, "node_modules");
+   const link = join11(install.pluginDir, "node_modules");
    symlinkForce(SHARED_NODE_MODULES, link);
    log(`  Embeddings linked ${install.id.padEnd(20)} -> shared deps`);
Handle real node_modules/ directories before relinking.
Line 3776 blindly routes every install through symlinkForce(), which uses unlinkSync() on any preexisting path. If an agent already has a real node_modules directory—the state you already surface as owns-own-node-modules in status—hivemind embeddings install will throw EISDIR and stop partway through the install.
Suggested guard
function linkAgent(install) {
const link = join11(install.pluginDir, "node_modules");
+ try {
+ const st = lstatSync2(link);
+ if (st.isDirectory() && !st.isSymbolicLink()) {
+ warn(` Embeddings ${install.id.padEnd(20)} has its own node_modules; skipping relink`);
+ return;
+ }
+ } catch {
+ }
symlinkForce(SHARED_NODE_MODULES, link);
log(` Embeddings linked ${install.id.padEnd(20)} -> shared deps`);
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@bundle/cli.js` around lines 3774 - 3777, The linkAgent function currently
always calls symlinkForce(join11(install.pluginDir, "node_modules")) which
unlinks any preexisting path and causes EISDIR if an agent already has a real
node_modules directory; before calling symlinkForce, check the
fs.lstat/lstatSync on the target link (the join11(..., "node_modules") path): if
it exists and is a real directory (stats.isDirectory() &&
!stats.isSymbolicLink()) rename or move it (e.g., append ".own" or similar) and
log that we preserved the agent's own node_modules, otherwise if it’s a symlink
or file remove it and then call symlinkForce(SHARED_NODE_MODULES, link); update
linkAgent to perform this guard using the same link variable so symlinkForce and
SHARED_NODE_MODULES usage remain unchanged.
```js
function killEmbedDaemon() {
  const uid = typeof process.getuid === "function" ? process.getuid() : userInfo().uid;
  const pidPath = pidPathFor(String(uid));
  const sockPath = socketPathFor(String(uid));
  let pid = null;
  try {
    pid = Number.parseInt(readFileSync8(pidPath, "utf-8").trim(), 10);
    if (Number.isFinite(pid)) {
      try {
        process.kill(pid, "SIGTERM");
      } catch {
      }
    }
```
Verify the pidfile target before sending SIGTERM.
Line 3834 kills whatever process currently owns that PID. If the embed daemon crashes and leaves a stale pidfile behind, later PID reuse can make hivemind embeddings disable or uninstall terminate an unrelated user process. Reuse the daemon hello/path verification here, or at minimum confirm the live process still matches the embed daemon before signaling; otherwise just clean up the stale socket/pid files.
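A minimal sketch of such a guard, under assumptions of our own (the helper name and the Linux-only `/proc` check are illustrative, not the bundle's code):

```typescript
import { readFileSync } from "node:fs";

// Hypothetical helper: true only when the live process's argv mentions
// `needle` (e.g. "embed-daemon"). For a missing/exited PID — or on
// platforms without /proc — it returns false, so callers treat the
// pidfile as stale and clean up instead of signaling.
function cmdlineContains(pid: number, needle: string): boolean {
  try {
    // /proc/<pid>/cmdline is the NUL-separated argv on Linux.
    return readFileSync(`/proc/${pid}/cmdline`, "utf-8").includes(needle);
  } catch {
    return false;
  }
}
```

`killEmbedDaemon` would then signal only when `cmdlineContains(pid, "embed-daemon")` holds, and otherwise just delete the stale sock/pid files.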
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@bundle/cli.js` around lines 3825 - 3837, The killEmbedDaemon function
currently reads a PID from pidPathFor and sends SIGTERM blindly; instead verify
the PID refers to the embed daemon before signaling: after reading pid
(readFileSync8) and before calling process.kill, check the live process identity
(e.g., inspect /proc/<pid>/cmdline or use a small handshake via socketPathFor to
confirm it responds as the embed daemon) and only signal when that validation
succeeds; if validation fails treat the pidfile/socket as stale and remove
pidPath/socketPath rather than killing an unrelated process.
```js
if (linkStat.isSymbolicLink()) {
  let existingTarget;
  try {
    existingTarget = readlinkSync(link);
  } catch (e) {
    return { kind: "error", detail: `readlink failed: ${e instanceof Error ? e.message : String(e)}` };
  }
  if (existingTarget === target) {
    return { kind: "already-linked", target, link };
  }
  try {
    statSync(link);
    return { kind: "linked-elsewhere", link, existingTarget };
  } catch {
    try {
      rmSync(link);
    } catch {
    }
    return { kind: "stale-link-removed", link, danglingTarget: existingTarget };
  }
```
Recreate the symlink in the same dangling-link path.
When node_modules is a dangling symlink, this branch removes it and returns immediately. That leaves the current capture run without a repaired link, so self-heal only completes on a second hook execution.
💡 Proposed fix
```diff
   } catch {
     try {
       rmSync(link);
     } catch {
     }
-    return { kind: "stale-link-removed", link, danglingTarget: existingTarget };
+    const recreated = createSymlinkAtomic(target, link);
+    if (recreated.kind === "linked") {
+      return { kind: "stale-link-replaced", link, target, danglingTarget: existingTarget };
+    }
+    return recreated;
   }
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@claude-code/bundle/capture.js` around lines 1766 - 1785, The current branch
that handles a dangling symlink (when linkStat.isSymbolicLink() is true) removes
the stale symlink with rmSync(link) and returns without recreating it, leaving
self-heal incomplete; change the catch block inside the dangling-link path so
that after successfully rmSync(link) you immediately recreate the symlink
pointing at target using symlinkSync(target, link[, type]) (choose 'junction'
for Windows if needed), and then return a success kind (e.g., "recreated-link"
with link and target) instead of "stale-link-removed"; update any
callers/consumers expecting "stale-link-removed" to handle the new return kind
if necessary.
```js
function writeQueue(q) {
  const path = queuePath();
  const home = resolve2(homedir3());
  if (!resolve2(path).startsWith(home + "/") && resolve2(path) !== home) {
    throw new Error(`notifications-queue write blocked: ${path} is outside ${home}`);
  }
  mkdirSync2(join4(home, ".deeplake"), { recursive: true, mode: 448 });
  const tmp = `${path}.${process.pid}.tmp`;
  writeFileSync2(tmp, JSON.stringify(q, null, 2), { mode: 384 });
  renameSync(tmp, path);
}
function enqueueNotification(n) {
  const q = readQueue();
  q.queue.push(n);
  writeQueue(q);
```
Serialize notification-queue writes across processes.
enqueueNotification() does a read/modify/write on a shared JSON file with no lock, so two hook processes can race and the later rename will drop the earlier enqueue. That makes the new persistent queue lossy under concurrent hook/CLI activity.
```js
async verifyDaemonOnce(sock) {
  if (this.helloVerified)
    return;
  this.helloVerified = true;
  if (!this.daemonEntry)
    return;
  const id = String(++this.nextId);
  const req = { op: "hello", id };
  let resp;
  try {
    resp = await this.sendAndWait(sock, req);
  } catch (e) {
    log4(`hello probe failed (treating as compatible): ${e instanceof Error ? e.message : String(e)}`);
    return;
  }
  const hello = resp;
  if (!hello.daemonPath) {
    log4(`hello returned no daemonPath; skipping mismatch check`);
    return;
  }
  if (hello.daemonPath === this.daemonEntry)
    return;
  if (_recycledStuckDaemon)
    return;
  _recycledStuckDaemon = true;
  log4(`daemon path mismatch \u2014 running=${hello.daemonPath} expected=${this.daemonEntry}; recycling`);
  this.recycleDaemon(hello.pid);
}
```
Only mark helloVerified after a compatible hello response.
Right now a failed probe or a path mismatch sets the flag permanently, so later connects on the same EmbedClient skip the mismatch check entirely.
Suggested fix
```diff
 async verifyDaemonOnce(sock) {
   if (this.helloVerified)
     return;
-  this.helloVerified = true;
   if (!this.daemonEntry)
     return;
   const id = String(++this.nextId);
   const req = { op: "hello", id };
   let resp;
@@
   const hello = resp;
   if (!hello.daemonPath) {
     log4(`hello returned no daemonPath; skipping mismatch check`);
     return;
   }
-  if (hello.daemonPath === this.daemonEntry)
+  if (hello.daemonPath === this.daemonEntry) {
+    this.helloVerified = true;
     return;
+  }
   if (_recycledStuckDaemon)
     return;
   _recycledStuckDaemon = true;
   log4(`daemon path mismatch — running=${hello.daemonPath} expected=${this.daemonEntry}; recycling`);
   this.recycleDaemon(hello.pid);
```

📝 Committable suggestion
```js
async verifyDaemonOnce(sock) {
  if (this.helloVerified)
    return;
  if (!this.daemonEntry)
    return;
  const id = String(++this.nextId);
  const req = { op: "hello", id };
  let resp;
  try {
    resp = await this.sendAndWait(sock, req);
  } catch (e) {
    log4(`hello probe failed (treating as compatible): ${e instanceof Error ? e.message : String(e)}`);
    return;
  }
  const hello = resp;
  if (!hello.daemonPath) {
    log4(`hello returned no daemonPath; skipping mismatch check`);
    return;
  }
  if (hello.daemonPath === this.daemonEntry) {
    this.helloVerified = true;
    return;
  }
  if (_recycledStuckDaemon)
    return;
  _recycledStuckDaemon = true;
  log4(`daemon path mismatch \u2014 running=${hello.daemonPath} expected=${this.daemonEntry}; recycling`);
  this.recycleDaemon(hello.pid);
}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@claude-code/bundle/pre-tool-use.js` around lines 1316 - 1343, In
verifyDaemonOnce, don't set this.helloVerified at the top; instead only set it
after confirming a compatible hello response — move the assignment to after
you've confirmed hello.daemonPath exists and equals this.daemonEntry (i.e.,
after the equality check where currently you return on match), so that failed
probes, missing daemonPath, or path mismatches (and subsequent recycleDaemon
calls) do not permanently mark the client as verified; keep existing behavior
for early returns when !this.daemonEntry or on exceptions by not setting
helloVerified in those code paths.
```js
if (linkStat.isSymbolicLink()) {
  let existingTarget;
  try {
    existingTarget = readlinkSync(link);
  } catch (e) {
    return { kind: "error", detail: `readlink failed: ${e instanceof Error ? e.message : String(e)}` };
  }
  if (existingTarget === target) {
    return { kind: "already-linked", target, link };
  }
  try {
    statSync(link);
    return { kind: "linked-elsewhere", link, existingTarget };
  } catch {
    try {
      rmSync(link);
    } catch {
    }
    return { kind: "stale-link-removed", link, danglingTarget: existingTarget };
  }
```
Repair dangling node_modules symlinks in the same call.
When the existing link is stale, this branch deletes it and returns stale-link-removed, but the caller ignores that result. That means the first post-upgrade hook run still has no usable node_modules link and only a later invocation can actually heal the bundle.
Suggested fix
```diff
 if (linkStat.isSymbolicLink()) {
   let existingTarget;
   try {
     existingTarget = readlinkSync(link);
   } catch (e) {
     return { kind: "error", detail: `readlink failed: ${e instanceof Error ? e.message : String(e)}` };
   }
   if (existingTarget === target) {
     return { kind: "already-linked", target, link };
   }
   try {
     statSync(link);
     return { kind: "linked-elsewhere", link, existingTarget };
   } catch {
     try {
       rmSync(link);
     } catch {
     }
-    return { kind: "stale-link-removed", link, danglingTarget: existingTarget };
+    return createSymlinkAtomic(target, link);
   }
 }
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@hermes/bundle/capture.js` around lines 1099 - 1118, The stale-symlink branch
currently removes the dangling link and returns { kind: "stale-link-removed" },
which prevents the caller from getting a usable node_modules link; instead,
after successfully rmSync(link) in the catch block you should not return
immediately but fall through to the same symlink-creation path used when the
link does not exist. Concretely: in the block under if
(linkStat.isSymbolicLink()) where you catch a failed statSync, remove the early
return ({ kind: "stale-link-removed" }); after rmSync(link) continue execution
so the subsequent code that creates the symlink (the same logic that runs when
no link exists) runs and returns the successful "linked" (or existing success)
result; keep use of readlinkSync, statSync, rmSync and the existing result kinds
consistent.
```js
function isTransformersMissingError(err) {
  return /(@huggingface\/transformers|hivemind embeddings install|MODULE_NOT_FOUND)/i.test(err);
```
Narrow the transformers-missing detector.
Matching bare MODULE_NOT_FOUND will also classify unrelated daemon packaging/runtime failures as missing shared deps, which recycles the daemon and tells users to run hivemind embeddings install even when that cannot fix the problem. Restrict this check to transformers-specific messages, or better yet, have the daemon return a dedicated sentinel error code.
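A narrowed version might look like this (a sketch only; the dedicated sentinel error code the comment prefers would replace string matching entirely):

```typescript
// Treat MODULE_NOT_FOUND as "shared deps missing" only when the message
// also names transformers; unrelated packaging failures no longer match.
function isTransformersMissingError(err: string): boolean {
  if (/@huggingface\/transformers|hivemind embeddings install/i.test(err)) return true;
  return /MODULE_NOT_FOUND/.test(err) && /transformers/i.test(err);
}
```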
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@hermes/bundle/wiki-worker.js` around lines 650 - 651, The current
isTransformersMissingError detector is too broad because it matches any
"MODULE_NOT_FOUND"; update isTransformersMissingError to only treat errors as
transformer-missing when the message explicitly references transformer-related
identifiers (for example '@huggingface/transformers', the word 'transformers'
near the MODULE_NOT_FOUND text, or the specific install hint 'hivemind
embeddings install'), rather than matching bare MODULE_NOT_FOUND; locate the
isTransformersMissingError function and tighten the regex/logic to require
transformer-specific context (or prefer checking for a transformer-specific
sentinel error code returned by the daemon if available) so unrelated
packaging/runtime failures are not misclassified.
```ts
/**
 * Send a `hello` on first successful connect per EmbedClient instance.
 * If the daemon answers with a path that doesn't match our configured
 * daemonEntry — typical after a marketplace upgrade replaced the bundle
 * — SIGTERM the daemon + clear sock/pid so the next call spawns from the
 * current bundle. We mark `helloVerified` even on mismatch so we don't
 * re-issue the hello against the next, fresh connection.
 */
private async verifyDaemonOnce(sock: Socket): Promise<void> {
  if (this.helloVerified) return;
  this.helloVerified = true;
  if (!this.daemonEntry) return; // no expectation to verify against
  const id = String(++this.nextId);
  const req: HelloRequest = { op: "hello", id };
  let resp: DaemonResponse;
  try {
    resp = await this.sendAndWait(sock, req);
  } catch (e: unknown) {
    // Daemon doesn't understand `hello` (older protocol) or connection
    // hiccup. Don't kill on a transient — let embed proceed and surface
    // any real problem there.
    log(`hello probe failed (treating as compatible): ${e instanceof Error ? e.message : String(e)}`);
    return;
  }
  const hello = resp as HelloResponse;
  if (!hello.daemonPath) {
    log(`hello returned no daemonPath; skipping mismatch check`);
    return;
  }
  if (hello.daemonPath === this.daemonEntry) return;
  if (_recycledStuckDaemon) return; // already recycled this process
  _recycledStuckDaemon = true;
  log(`daemon path mismatch — running=${hello.daemonPath} expected=${this.daemonEntry}; recycling`);
  this.recycleDaemon(hello.pid);
}

/**
 * On a transformers-missing error from the daemon, SIGTERM the stuck
 * daemon (the bundle daemon that can't find its deps) and clear
 * sock/pid so the next call spawns fresh. Also enqueue a one-time
 * notification telling the user to run `hivemind embeddings install`
 * — but only when the user has opted in. Suppressed when
 * embeddingsStatus() === "user-disabled" so we don't nag users who
 * explicitly chose to turn embeddings off.
 */
private handleTransformersMissing(detail: string): void {
  if (!_recycledStuckDaemon) {
    _recycledStuckDaemon = true;
    this.recycleDaemon(null);
  }
  if (_signalledMissingDeps) return;
  _signalledMissingDeps = true;
  let status: string;
  try { status = embeddingsStatus(); } catch { status = "enabled"; }
  if (status === "user-disabled") return; // user said no, don't nag
  try {
    enqueueNotification({
      id: "embed-deps-missing",
      severity: "warn",
      title: "Hivemind embeddings disabled — deps missing",
      body: `Semantic memory search is off because @huggingface/transformers is not installed where the daemon can find it. Run \`hivemind embeddings install\` to enable.`,
      dedupKey: { reason: "transformers-missing", detail: detail.slice(0, 200) },
    });
  } catch (e: unknown) {
    // Best-effort: never let a notification write failure escape into
    // the capture hot path.
    log(`enqueue embed-deps-missing failed: ${e instanceof Error ? e.message : String(e)}`);
  }
}

/**
 * Best-effort SIGTERM + sock/pid cleanup. Tolerant of every missing-file
 * combination and dead-PID cases.
 */
private recycleDaemon(reportedPid: number | null): void {
  let pid: number | null = reportedPid;
  if (pid === null) {
    try {
      pid = Number.parseInt(readFileSync(this.pidPath, "utf-8").trim(), 10);
    } catch { /* no pidfile */ }
  }
  if (Number.isFinite(pid) && pid !== null && pid > 0) {
    try { process.kill(pid, "SIGTERM"); } catch { /* already dead */ }
  }
  try { unlinkSync(this.socketPath); } catch { /* not present */ }
  try { unlinkSync(this.pidPath); } catch { /* not present */ }
}
```
Add targeted branch tests for new client recovery paths to unblock CI.
Branch coverage for src/embeddings/client.ts is below threshold (79.74% vs required 80%), and the new conditional paths here are likely the gap. Please add focused tests for branches like: hello response missing daemonPath, mismatch-triggered recycle (single-shot), user-disabled notification suppression, and pidfile fallback inside recycleDaemon.
Also applies to: 338-389
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/embeddings/client.ts` around lines 128 - 214, Add focused unit tests for
EmbedClient to hit the conditional branches in verifyDaemonOnce,
handleTransformersMissing, and recycleDaemon: mock sendAndWait to return a
HelloResponse with no daemonPath to assert no recycle; mock sendAndWait to
return a mismatched daemonPath and assert recycleDaemon called once and
_recycledStuckDaemon becomes true (and calling verifyDaemonOnce again does not
re-trigger recycle); for handleTransformersMissing stub embeddingsStatus to
return "user-disabled" and verify enqueueNotification is not called, and stub it
to return enabled and verify enqueueNotification is called once and
_signalledMissingDeps is set; for recycleDaemon mock readFileSync to supply a
pid, spy process.kill and unlinkSync to verify pid kill + socket/pid unlink path
and also test the null reportedPid branch where pidfile is missing. Reset
module-level flags (_recycledStuckDaemon, _signalledMissingDeps) between tests
and use spies/mocks for sendAndWait, recycleDaemon, enqueueNotification,
embeddingsStatus, readFileSync, unlinkSync, and process.kill to assert expected
side effects.
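The single-shot semantics those tests should pin down can be illustrated with a self-contained toy (the real suite would exercise `EmbedClient` with vitest mocks; this miniature models only the branch logic, following the reviewer-suggested semantics of verifying only on a matching hello):

```typescript
type Hello = { daemonPath?: string };

// Toy model of the client's hello-verification branches (illustrative
// names; not the real EmbedClient). Only a matching daemonPath marks
// the client verified; failures and mismatches leave it unverified.
class MiniClient {
  helloVerified = false;
  recycled = 0;
  constructor(private readonly daemonEntry: string) {}

  verify(hello: Hello | Error): void {
    if (this.helloVerified) return;
    if (hello instanceof Error) return;   // transient probe failure
    if (!hello.daemonPath) return;        // older protocol, no path
    if (hello.daemonPath === this.daemonEntry) {
      this.helloVerified = true;          // compatible: stop probing
      return;
    }
    this.recycled += 1;                   // mismatch: recycle, stay unverified
  }
}
```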
| it("skips warmup when the user has disabled embeddings in config", async () => { | ||
| await runHook({ EMBEDDINGS_DISABLED_FOR_TEST: "1" }); | ||
| expect(embedWarmupMock).not.toHaveBeenCalled(); | ||
| expect(debugLogMock).toHaveBeenCalledWith( | ||
| "embed daemon warmup skipped: HIVEMIND_EMBEDDINGS=false", | ||
| "embed daemon warmup skipped: embeddings disabled in ~/.deeplake/config.json (run `hivemind embeddings enable` to opt in)", | ||
| ); |
Clear EMBEDDINGS_DISABLED_FOR_TEST after this test to avoid env leakage.
Line 231 sets a process-wide env var, but runHook() only updates keys provided per call. Later tests that call runHook() without this key can inherit the disabled state and become order-dependent.
💡 Proposed fix
it("skips warmup when the user has disabled embeddings in config", async () => {
- await runHook({ EMBEDDINGS_DISABLED_FOR_TEST: "1" });
- expect(embedWarmupMock).not.toHaveBeenCalled();
- expect(debugLogMock).toHaveBeenCalledWith(
- "embed daemon warmup skipped: embeddings disabled in ~/.deeplake/config.json (run `hivemind embeddings enable` to opt in)",
- );
+ try {
+ await runHook({ EMBEDDINGS_DISABLED_FOR_TEST: "1" });
+ expect(embedWarmupMock).not.toHaveBeenCalled();
+ expect(debugLogMock).toHaveBeenCalledWith(
+ "embed daemon warmup skipped: embeddings disabled in ~/.deeplake/config.json (run `hivemind embeddings enable` to opt in)",
+ );
+ } finally {
+ delete process.env.EMBEDDINGS_DISABLED_FOR_TEST;
+ }
});🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/claude-code/session-start-setup-hook.test.ts` around lines 230 - 235,
This test sets process.env.EMBEDDINGS_DISABLED_FOR_TEST but never clears it,
causing later tests to inherit the disabled state; after calling runHook(...) in
the "skips warmup..." test, restore or delete
process.env.EMBEDDINGS_DISABLED_FOR_TEST (e.g., delete
process.env.EMBEDDINGS_DISABLED_FOR_TEST or set it back to its original value)
so subsequent calls to runHook() behave deterministically; update the test
around runHook, embedWarmupMock, and debugLogMock assertions to ensure the env
key is cleaned up before the test ends.
```ts
const { writeFileSync, mkdtempSync } = await import("node:fs");
const { tmpdir } = await import("node:os");
const { join } = await import("node:path");
const dir = mkdtempSync(join(tmpdir(), "hermes-cap-disabled-"));
const cfgPath = join(dir, "config.json");
writeFileSync(cfgPath, JSON.stringify({ embeddings: { enabled: false } }), "utf-8");
await runHook({ HIVEMIND_CONFIG_PATH: cfgPath });
```
Restore HIVEMIND_CONFIG_PATH and clean temp dir after the test.
This test mutates process env and creates a temp directory but does not restore/cleanup, which can leak state into later tests in the same worker.
Suggested fix
```diff
-const { writeFileSync, mkdtempSync } = await import("node:fs");
+const { writeFileSync, mkdtempSync, rmSync } = await import("node:fs");
 const { tmpdir } = await import("node:os");
 const { join } = await import("node:path");
 const dir = mkdtempSync(join(tmpdir(), "hermes-cap-disabled-"));
 const cfgPath = join(dir, "config.json");
 writeFileSync(cfgPath, JSON.stringify({ embeddings: { enabled: false } }), "utf-8");
-await runHook({ HIVEMIND_CONFIG_PATH: cfgPath });
+const prevConfigPath = process.env.HIVEMIND_CONFIG_PATH;
+try {
+  await runHook({ HIVEMIND_CONFIG_PATH: cfgPath });
+} finally {
+  if (prevConfigPath === undefined) delete process.env.HIVEMIND_CONFIG_PATH;
+  else process.env.HIVEMIND_CONFIG_PATH = prevConfigPath;
+  rmSync(dir, { recursive: true, force: true });
+}
```

📝 Committable suggestion
```ts
const { writeFileSync, mkdtempSync, rmSync } = await import("node:fs");
const { tmpdir } = await import("node:os");
const { join } = await import("node:path");
const dir = mkdtempSync(join(tmpdir(), "hermes-cap-disabled-"));
const cfgPath = join(dir, "config.json");
writeFileSync(cfgPath, JSON.stringify({ embeddings: { enabled: false } }), "utf-8");
const prevConfigPath = process.env.HIVEMIND_CONFIG_PATH;
try {
  await runHook({ HIVEMIND_CONFIG_PATH: cfgPath });
} finally {
  if (prevConfigPath === undefined) delete process.env.HIVEMIND_CONFIG_PATH;
  else process.env.HIVEMIND_CONFIG_PATH = prevConfigPath;
  rmSync(dir, { recursive: true, force: true });
}
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/hermes/hermes-capture-hook.test.ts` around lines 275 - 281, Test
mutates HIVEMIND_CONFIG_PATH and leaves a temp dir; wrap the runHook call in a
try/finally that saves the original process.env.HIVEMIND_CONFIG_PATH, sets it to
cfgPath, and in finally restores the original env value (or deletes the key if
it was undefined) and removes the temp directory (use fs.rmSync(dir, {
recursive: true, force: true }) or fs.rmdirSync for compatibility). Locate the
code around runHook, cfgPath, mkdtempSync, writeFileSync and update the test to
import/removal functions and perform cleanup to avoid leaking state between
tests.
Two issues surfaced during real e2e verification against the
test_plugin sandbox table.
1) **CJS default-export unwrap in nomic.ts.**
`createRequire(base).resolve("@huggingface/transformers")` honors the
package's `"require"` conditional and returns the path to the CJS
bundle (`./dist/transformers.node.cjs`). A subsequent dynamic
`import(pathToFileURL(absMain).href)` wraps the CJS module as
`{ default: <exports>, __esModule: true }`, so the daemon's
`mod.env.allowLocalModels = false` line threw
`Cannot set properties of undefined`. Added a
`normalizeTransformersModule` helper that returns `mod.default`
when it carries `pipeline`, else returns the bare module — works
for both the CJS-resolved-by-require path and the ESM-resolved-by-
import path (dev tree).
2) **Recycle on `unknown op` from hello.**
A pre-handshake daemon (i.e. anything before this PR lands) answers
`{ op: "hello" }` with `{ id, error: "unknown op" }` and no
`daemonPath`. The previous check skipped the mismatch path in that
case ("no daemonPath; skipping mismatch check") — meaning a stuck
older daemon would keep poisoning sessions forever. Now treat a
missing `daemonPath` the same as a path mismatch: the running
daemon doesn't speak the current protocol, so it can't be trusted
and gets recycled.
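The unwrap in (1) can be sketched as follows (our reconstruction, not the real nomic.ts helper; field names beyond `pipeline`/`env`/`default` are assumptions):

```typescript
type TransformersLike = {
  pipeline?: unknown;
  env?: Record<string, unknown>;
  default?: TransformersLike;
};

// A CJS bundle resolved through the package's "require" condition arrives
// from dynamic import() as { default: exports, __esModule: true }, while a
// true ESM build exposes `pipeline` at the top level. Return whichever
// layer actually carries the API.
function normalizeTransformersModule(mod: TransformersLike): TransformersLike {
  if (mod.default && typeof mod.default.pipeline === "function") return mod.default;
  return mod;
}
```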
End-to-end verification on the test_plugin org's `sessions_test`
table: with both fixes in place, a real capture hook run produced a
row with `len=768` for `message_embedding` — the first non-NULL
embedding written to that table since the regression was filed.
…gent thrash)
The hello-handshake recycle was too aggressive. The original logic
fired SIGTERM whenever the running daemon's `daemonPath` differed
from the client's expected `daemonEntry`. On a single-agent machine
that catches the legitimate "marketplace upgrade replaced the
bundle, old daemon still running with old code" case. But on
multi-agent machines (a Hivemind user running claude-code + codex,
or anyone using pi) it causes endless thrash:
- claude-code spawns daemon at <cc-bundle>/embed-daemon.js
- codex fires capture, connects, hello returns cc's path,
codex expects codex's path → MISMATCH → recycle the working
daemon
- codex spawns its own daemon at <cod-bundle>/embed-daemon.js
- claude-code's next capture → mismatch → recycle codex's
- ...forever
Differentiate "stale (GC'd) bundle" from "different but
functionally-equivalent bundle" via filesystem check:
- `!hello.daemonPath` → older daemon, no handshake support → recycle
- `daemonPath !== entry` AND `!existsSync(daemonPath)` → orphaned bundle (GC'd) → recycle
- `daemonPath !== entry` AND `existsSync(daemonPath)` → multi-agent share → KEEP DAEMON
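Under those rules, the decision reduces to a small predicate (a sketch with a helper name of our own, using Node's `existsSync`):

```typescript
import { existsSync } from "node:fs";

type Decision = "recycle" | "keep";

// Decide what to do with a running daemon after the hello handshake.
function recycleDecision(helloDaemonPath: string | undefined, expectedEntry: string): Decision {
  if (!helloDaemonPath) return "recycle";           // pre-handshake daemon
  if (helloDaemonPath === expectedEntry) return "keep";
  // Different path: orphaned (GC'd) bundle vs. another live agent's bundle.
  return existsSync(helloDaemonPath) ? "keep" : "recycle";
}
```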
The marketplace-upgrade case still works because Claude Code's
plugin-cache-gc eventually prunes old versioned dirs. Until then,
the embed-error trigger (still in place) catches a stuck daemon
whose code is buggy regardless of path.
Verified live: spawned claude-code's daemon, then connected as
codex from a separate client. Codex got its embedding via
claude-code's daemon (vec length 768) without triggering recycle.
Socket and daemon survived intact.
Pi is also covered without changes — it passes no `daemonEntry`
(uses the canonical shared daemon at ~/.hivemind/embed-deps/
embed-daemon.js). When pi runs alongside any agent, its expected
entry differs from the running agent's bundle path, but both files
exist → no recycle, pi happily reuses whatever's warm.
Openclaw is out of scope — it doesn't embed locally (uses MCP
contracts that delegate to the cloud).
Summary
Embeddings have been silently failing in production. The capture hook spawns the embed daemon from the marketplace plugin bundle path (`~/.claude/plugins/cache/hivemind/hivemind/<version>/bundle/embeddings/embed-daemon.js`), but `@huggingface/transformers` is installed only at `~/.hivemind/embed-deps/node_modules/` — reachable from the bundle via a one-time symlink that `hivemind embeddings install` creates. Every Claude Code marketplace auto-upgrade drops a new versioned cache dir without the symlink, so the bundle daemon's `import("@huggingface/transformers")` returns MODULE_NOT_FOUND, `embed()` returns null, and `sessions.message_embedding` writes through as NULL forever after. The user-facing log shows ~30 minutes of `embed err: Cannot find package` lines.

This PR is a fix-it-forever overhaul of the embed daemon lifecycle:
**Explicit transformers resolver in the daemon.** `nomic.ts` now loads transformers via `createRequire(pathToFileURL("~/.hivemind/embed-deps/")).resolve()` followed by an absolute-URL dynamic import, with a bare-specifier fallback for dev trees. The daemon resolves transformers regardless of which bundle path it was spawned from.
**Persistent opt-in via `~/.deeplake/config.json`.** The legacy `HIVEMIND_EMBEDDINGS` env var is read ONCE on first run to seed `embeddings.enabled`, then never consulted again. Source of truth is now a real file shared across all 4 agents.
**New CLI surface.** `hivemind embeddings install` is the heavy path (deps + symlinks + `enabled: true`). `enable`/`disable` are light config flips. `uninstall [--prune]` is the heavy counterpart (removes symlinks + optionally wipes shared deps). Every agent's SessionStart injection advertises the full surface.
**Hello handshake + stuck-daemon recycle.** The socket is per-UID, not per-version, so one stuck daemon poisons every subsequent session for up to 10 minutes of idle-out. The client now sends a `hello` on first connect; on path mismatch (typical post-upgrade) it SIGTERMs the daemon and clears its sock/pid. The same recycle fires if the daemon returns a transformers-missing error during `embed()`. The next session spawns fresh from the current bundle.

**Visible one-time notification.** On transformers-missing, the client enqueues a warn-severity notification routed through the existing SessionStart drain so the user sees a clear "run `hivemind embeddings install`" message. Suppressed when the user has explicitly disabled embeddings — no nag for opt-outs.
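The recycle decision described above reduces to a pure predicate, roughly like this (types and names are illustrative, not the client's actual API):

```typescript
interface HelloResponse {
  bundlePath: string; // path the daemon reports it was spawned from
}

// Recycle when the running daemon is stale (spawned from an old
// versioned bundle) or alive-but-broken (cannot load transformers).
function shouldRecycle(
  hello: HelloResponse,
  currentBundlePath: string,
  embedError?: string,
): boolean {
  // Typical post-upgrade state: per-UID socket, old daemon still up.
  if (hello.bundlePath !== currentBundlePath) return true;
  // Transformers-missing during embed(): restart from the current
  // bundle, which may have a repaired node_modules symlink.
  if (embedError !== undefined && embedError.includes("Cannot find package")) {
    return true;
  }
  return false;
}
```

On a `true` result the client SIGTERMs the daemon and removes its sock/pid files so the next session spawns fresh.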
**Self-heal symlink across cached versions.** Each capture hook runs `ensurePluginNodeModulesLink({ bundleDir })` at top level. The first capture under a new plugin version (post-marketplace-upgrade) creates `<pluginDir>/node_modules → ~/.hivemind/embed-deps/node_modules` atomically; subsequent calls are O(1) no-ops. Conservative — never clobbers a real `node_modules`, never overrides a symlink to a different valid target, but cleans up dangling links so it can recreate them on the next call. Refuses to act outside the production `bundle/` layout (guard against tests).

**Bundle-scan guards.** New `tests/claude-code/embeddings-bundle-scan.test.ts` reads each agent's shipped `embed-daemon.js`, `capture.js`, and the CLI bundle, and asserts every literal string the runtime depends on survives bundling. A 30-second reviewer guardrail.
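The conservative rules of the symlink self-heal above can be modelled as a pure decision function (a sketch under assumed names; the real hook applies the chosen action with `lstat`/`symlink`/`rename`):

```typescript
// Observed state of <pluginDir>/node_modules before acting.
type LinkState =
  | { kind: "absent" }
  | { kind: "real-dir" }
  | { kind: "symlink"; dest: string; targetExists: boolean };

type LinkAction = "create" | "noop" | "leave" | "recreate";

function planLink(
  state: LinkState,
  sharedTarget: string, // ~/.hivemind/embed-deps/node_modules
  isBundleLayout: boolean,
): LinkAction {
  if (!isBundleLayout) return "leave"; // refuse outside production layout
  switch (state.kind) {
    case "absent":
      return "create"; // first capture under a new plugin version
    case "real-dir":
      return "leave"; // never clobber a real node_modules
    case "symlink":
      if (state.dest === sharedTarget) return "noop"; // O(1) steady state
      if (state.targetExists) return "leave"; // valid link elsewhere
      return "recreate"; // dangling: clean up and relink
  }
}
```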
## Behavioral migration

| Before | After |
| --- | --- |
| `HIVEMIND_EMBEDDINGS=false` env var | `~/.deeplake/config.json` → `embeddings.enabled: false` |
| `EmbeddingsStatus = "env-disabled"` | `EmbeddingsStatus = "user-disabled"` |
| `enableEmbeddings()` (heavy) | `installEmbeddings()` |
| `disableEmbeddings({ prune })` (heavy) | `uninstallEmbeddings({ prune })` |
| (none) | `enableEmbeddings()` (light) — flip config flag, warn if deps missing |
| (none) | `disableEmbeddings()` (light) — flip flag + SIGTERM daemon |

Seeding rule: `false`/unset → `enabled: false`; else `enabled: true`.

Most users have the env var unset, so on first run after this lands they'll get `enabled: false` and need to run `hivemind embeddings enable` or `install` to opt back in. That's intentional — explicit consent instead of implicit on-by-default.
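The one-shot seeding step might look like the following (a sketch; `seedEmbeddingsFlag` is a hypothetical name, not the PR's export):

```typescript
interface DeeplakeConfig {
  embeddings?: { enabled?: boolean };
}

// Read the legacy env var exactly once: only when the persistent
// config has no explicit embeddings.enabled value yet.
function seedEmbeddingsFlag(
  legacyEnv: string | undefined,
  existing: DeeplakeConfig,
): DeeplakeConfig {
  if (existing.embeddings?.enabled !== undefined) {
    return existing; // config already decided; env var is ignored
  }
  // false/unset → enabled: false; anything else → enabled: true
  const enabled = !(legacyEnv === undefined || legacyEnv === "false");
  return { ...existing, embeddings: { ...existing.embeddings, enabled } };
}
```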
## Test plan

- `npm test` → 2457 passing (up from ~2371 on `main`). `tsc --noEmit` clean.
- Unit tests cover the symlink self-heal fix (existing real dir, existing symlink-elsewhere, dangling symlink, non-bundle layout).
- Manual E2E (against test tables):
  - `node codex/bundle/commands/auth-login.js org switch test_plugin`; point `sessionsTableName` at `sessions_test` in `src/config.ts`, rebuild, reinstall plugin.
  - `kill $(cat /tmp/hivemind-embed-$(id -u).pid 2>/dev/null) ; rm -f /tmp/hivemind-embed-$(id -u).{sock,pid}`
  - `SELECT COUNT(*) AS total, COUNT(message_embedding) AS with_embed FROM sessions_test WHERE creation_date > NOW() - INTERVAL '5 minutes';` → expect `with_embed == total`.
  - Revert `src/config.ts` before merging.
- `hivemind embeddings disable` → next capture writes NULL, no daemon spawn, no notification.
- `hivemind embeddings enable` → daemon spawns, embedding column populated.
- `rm <marketplace-cache>/<version>/node_modules`

## Out of scope (filed separately)

- `~/.deeplake/hook-debug.log`: a different auth-layer issue, orthogonal to embeddings.
## Summary by CodeRabbit

**New Features**

- `hivemind embeddings install` to manage transformer dependencies separately from enablement
- `hivemind embeddings uninstall [--prune]` for full cleanup with optional dependency pruning
- `~/.deeplake/config.json` for cross-session preference preservation

**Bug Fixes**