
hivemind_search / Grep recall: tier memory-first, fall back to sessions — current UNION ALL blocks on the slow sessions branch even when memory already has the answer #176

@kaghni


Symptom

Every hivemind memory recall is slow whenever a query against the sessions table takes more than ~10 seconds, even if the answer is already available from the memory (summaries) table after ~1.5 seconds. Agents see Search failed: Query timeout after 10000ms and either fall back to other tools (hivemind_index + hivemind_read, ~2× more round-trips) or, worse, fail the whole call. End-user-visible effects: Telegram responses arrive ~15-20s late, Claude Code Grep for memory hits the SDK abort, etc.

Where

Shared across every hivemind agent path. src/shell/grep-core.ts:searchDeeplakeTables is the single primitive. Every consumer routes through it:

Agent path                                                   Call site
Claude Code / codex / hermes: Grep PreToolUse intercept      src/shell/grep-interceptor.ts:131,150
Cursor: grep/rg PreToolUse intercept                         src/hooks/cursor/pre-tool-use.ts:1
Direct hook (Claude Code grep-direct path)                   src/hooks/grep-direct.ts:330
Openclaw hivemind_search agent tool                          openclaw/src/index.ts:792, 908
MCP server memory_search tool (Cline / Roo / Kilo / etc.)    src/mcp/server.ts:81

The function's own header confirms the design intent:

Runs both halves in a single UNION ALL query so each grep = one round-trip. … semantic catches conceptual matches that lexical can't express. De-duplicate by path in the outer layer; when a path appears in both halves, the semantic score wins.

i.e. one round-trip is by design, optimized for fewer Deeplake calls. That assumption breaks once sessions query latency exceeds the SDK timeout.
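The "semantic score wins" rule from that header can be sketched as follows. This is a minimal illustration, not the code in grep-core.ts: the SearchRow shape and the dedupeByPath body are assumptions made for the example.

```typescript
// Hypothetical row shape; the real rows in grep-core.ts may differ.
interface SearchRow {
  path: string;
  score: number;
  source: "lexical" | "semantic";
}

// Keep one row per path. A semantic row replaces a lexical row for the
// same path; otherwise the first row seen for a path is kept.
function dedupeByPath(rows: SearchRow[]): SearchRow[] {
  const byPath = new Map<string, SearchRow>();
  for (const row of rows) {
    const existing = byPath.get(row.path);
    const semanticWins =
      existing !== undefined &&
      existing.source === "lexical" &&
      row.source === "semantic";
    if (existing === undefined || semanticWins) {
      // Map.set on an existing key keeps the original insertion position.
      byPath.set(row.path, row);
    }
  }
  return Array.from(byPath.values());
}
```

Whatever form the real outer layer takes, a two-phase design would need the same rule applied after the phases are merged, not only inside a single query result.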

Why sessions is slow and memory is fast: measured

Captured live against org=activeloop, ws=hivemind (2026-05-17 02:33 UTC) using raw curl, bypassing the SDK timeout:

SELECT MAX(creation_date) FROM "sessions"                            → 2.27 s   ✓
SELECT * FROM "sessions" LIMIT 0                                     → 0.20 s   ✓ (schema only)
SELECT path, message::text FROM "sessions" WHERE message::text ILIKE → 16.02 s  ✗ exceeds 10s SDK timeout
(memory UNION ALL sessions) ILIKE                                    → 10.48 s  ✗

So the UNION ALL itself takes ~10.5s, just slightly over the 10s AbortSignal.timeout(10000) in src/deeplake-api.ts:44. The hivemind plugin sees a timeout and reports Search failed: Query timeout after 10000ms, even though Deeplake returns HTTP 200 with valid data on the server side. The memory branch alone would finish in ~1.5s.

Tested with hits from kw = "openclaw" — both tables had matches, both contributed rows in the (otherwise discarded) response.
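To make the margin concrete, the measurements above can be compared against the SDK budget. The numbers are the ones reported in this issue; the overshootMs helper and the labels used as keys are illustrative only.

```typescript
// Timeout from the issue: AbortSignal.timeout(10000) in src/deeplake-api.ts.
const SDK_TIMEOUT_MS = 10000;

// Latencies measured with raw curl (see above), converted to milliseconds.
const measuredMs: Record<string, number> = {
  "sessions MAX(creation_date)": 2270,
  "sessions LIMIT 0": 200,
  "sessions ILIKE": 16020,
  "memory UNION ALL sessions ILIKE": 10480,
};

// Hypothetical helper: a positive result means the query ran past the budget.
function overshootMs(latencyMs: number, timeoutMs: number = SDK_TIMEOUT_MS): number {
  return latencyMs - timeoutMs;
}

// Queries that the SDK would have aborted.
const timedOut = Object.keys(measuredMs).filter(
  (label) => overshootMs(measuredMs[label]) > 0,
);
```

On these figures the combined query misses the budget by 480 ms, so a small timeout bump would mask the problem today but leaves no headroom as sessions grows.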

Why this matters now

Two PRs in flight have made the symptom more visible:

Proposed: tier the search

Move from one-call UNION ALL to a two-phase pattern. Phase A returns immediately on memory hit; Phase B fires only on miss.

// Sketch — src/shell/grep-core.ts
export async function searchDeeplakeTables(api, memoryTable, sessionsTable, opts) {
  // Phase A: small/fast table only
  const memoryOpts = { ...opts, scope: "memory-only" };
  const memoryRows = await api.query(buildMemoryOnlySql(memoryOpts));
  if (memoryRows.length >= (opts.minHitThreshold ?? 3) || opts.scope === "memory-only") {
    return memoryRows;
  }
  // Phase B: only on miss / few hits, with a higher per-call timeout
  // (env or config-overridable, separate from the per-table memory timeout).
  const sessionRows = await api.query(buildSessionsOnlySql(opts), { timeoutMs: 30_000 });
  return dedupeByPath([...memoryRows, ...sessionRows]);
}

Net effect for the common case:

  • Memory has a relevant hit → return in ~1.5s instead of ~10.5s
  • Memory is sparse → fall through to sessions, no functionality lost
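Both cases can be checked without a live Deeplake workspace by stubbing the API and recording which tables get queried. This is a condensed, hypothetical version of the sketch above: Row, QueryApi, makeStubApi and tieredSearch are names invented for the example, and the stub takes a table name instead of SQL.

```typescript
// Hypothetical shapes, for illustration only.
interface Row {
  path: string;
  snippet: string;
}
type Table = "memory" | "sessions";
interface QueryApi {
  query(table: Table): Promise<Row[]>;
}

// Stub API that returns canned rows and records every table it was asked for.
function makeStubApi(memory: Row[], sessions: Row[]): { api: QueryApi; calls: Table[] } {
  const calls: Table[] = [];
  const api: QueryApi = {
    async query(table: Table): Promise<Row[]> {
      calls.push(table);
      return table === "memory" ? memory : sessions;
    },
  };
  return { api, calls };
}

// Phase A queries memory; Phase B runs only when memory has too few hits.
async function tieredSearch(api: QueryApi, minHitThreshold = 3): Promise<Row[]> {
  const memoryRows = await api.query("memory");
  if (memoryRows.length >= minHitThreshold) {
    return memoryRows;
  }
  const sessionRows = await api.query("sessions");
  // Memory rows take precedence when a path appears in both tables.
  const seen = new Set(memoryRows.map((row) => row.path));
  return [...memoryRows, ...sessionRows.filter((row) => !seen.has(row.path))];
}
```

With three or more memory hits the stub records a single "memory" call; with fewer, it records "memory" then "sessions", which is the extra round-trip listed under trade-offs.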

Trade-offs:

  • Adds one extra round-trip in the sparse case (acceptable — that case is rare and already slow)
  • Score-blending becomes per-phase (current code does semantic > lexical deduping in the outer layer; needs a re-think when results come from two separate queries)
  • searchOpts.contentScanOnly regex-path filtering (line 793 in openclaw bundle, line ~620 in source) needs to stay correct across both phases

Alternative ideas

  1. Add scope: "memory" | "sessions" | "both" to SearchOptions. Have the agent's tool-call (or the prompt nudge) prefer scope: memory first; the LLM can fall through to scope: sessions if memory is empty. Same effect, smaller refactor, but pushes the decision into the agent instead of the library.
  2. Server-side fix (Deeplake team). Index message::text for ILIKE OR use message_embedding for semantic-only search. Removes the underlying slowness, no client change. But out of our control; long lead time.
  3. Per-table timeouts in the SDK (HIVEMIND_QUERY_TIMEOUT_MEMORY_MS vs ..._SESSIONS_MS). Doesn't fix UNION ALL waiting for both branches — even if sessions gets a 30s budget, memory results still wait 10s to come back. Only helps if we already split the query.
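Alternatives 1 and 3 could share a small amount of plumbing. The sketch below assumes the scope values named in alternative 1; tablesForScope and timeoutFromEnv are hypothetical helpers, and the environment variable name is passed in by the caller because the issue only spells out the memory variable in full.

```typescript
// Scope values as proposed in alternative 1.
type SearchScope = "memory" | "sessions" | "both";
type TableName = "memory" | "sessions";

// Hypothetical helper: which tables to query, in order, for a given scope.
function tablesForScope(scope: SearchScope): TableName[] {
  switch (scope) {
    case "memory":
      return ["memory"];
    case "sessions":
      return ["sessions"];
    case "both":
      return ["memory", "sessions"];
  }
}

// Hypothetical helper: read a per-table timeout from an env-like object,
// falling back when the variable is unset, empty, non-numeric, or not positive.
function timeoutFromEnv(
  env: Record<string, string | undefined>,
  name: string,
  fallbackMs: number,
): number {
  const raw = env[name];
  const parsed = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}
```

In Node, a caller would pass process.env as the first argument, for example timeoutFromEnv(process.env, "HIVEMIND_QUERY_TIMEOUT_MEMORY_MS", 10000). As noted in alternative 3, the per-table value only matters once each table is queried separately.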

Acceptance criteria

  • hivemind_search "openclaw" returns hits within ~3s on the same dataset where today's UNION ALL takes ~10.5s
  • Recall coverage unchanged: hits that today come from sessions still surface (after memory miss)
  • Score / ranking semantics documented for the two-phase case (no silent regression on --use-semantic callers)
  • Same change benefits all hivemind agents (CC PreToolUse, codex, cursor, hermes, pi, openclaw, MCP) — single code path

Related

Notes for the implementer

The header comment on searchDeeplakeTables already establishes that the hybrid lexical+semantic branch is one round-trip by design. That's a meaningful axis to preserve: if we move to two-phase, both the lexical+semantic hybrid path and the pure-lexical fast path should still cost one round-trip per phase. The constraint relaxes from "one round-trip" to "one round-trip per table."

Telegram round-trip measurement on the live gateway (2026-05-17): a memory-recall question hit hivemind_search first, timed out at 10s, fell back to hivemind_index (succeeded in ~2s) and hivemind_read on a specific summary (~1s), and the agent answered correctly. Total latency ~15s. With tiered search it would be ~3s.
