hivemind_search / Grep recall: tier memory-first, fall back to sessions — current UNION ALL blocks on the slow sessions branch even when memory already has the answer

## Symptom

Every hivemind memory recall is slow whenever a query against the `sessions` table takes longer than ~10 seconds, even if the answer is already available from the `memory` (summaries) table after ~1.5 seconds. Agents see `Search failed: Query timeout after 10000ms` and either fall back to other tools (`hivemind_index` + `hivemind_read`, ~2× more round-trips) or, worse, fail the whole call. User-visible effects: Telegram responses arrive ~15-20s late, Claude Code `Grep` for memory hits the SDK abort, etc.

## Where

Shared across **every** hivemind agent path. `src/shell/grep-core.ts:searchDeeplakeTables` is the single primitive. Every consumer routes through it:

| Agent path | Call site |
|---|---|
| Claude Code / codex / hermes — `Grep` PreToolUse intercept | `src/shell/grep-interceptor.ts:131,150` |
| Cursor — `grep`/`rg` PreToolUse intercept | `src/hooks/cursor/pre-tool-use.ts:1` |
| Direct hook (Claude Code grep-direct path) | `src/hooks/grep-direct.ts:330` |
| Openclaw `hivemind_search` agent tool | `openclaw/src/index.ts:792, 908` |
| MCP server `memory_search` tool (Cline / Roo / Kilo / etc.) | `src/mcp/server.ts:81` |

The function's own header confirms the design intent:

> Runs both halves in a single UNION ALL query so each grep = one round-trip. … semantic catches conceptual matches that lexical can't express. De-duplicate by path in the outer layer; when a path appears in both halves, the semantic score wins.

i.e. **one round-trip is by design**, optimized for fewer Deeplake calls. That assumption breaks once `sessions` query latency exceeds the SDK timeout.

## Why both tables are slow vs fast — measured

Captured live against `org=activeloop, ws=hivemind` (2026-05-17 02:33 UTC) using raw curl, bypassing the SDK timeout:

```
SELECT MAX(creation_date) FROM "sessions"                            → 2.27 s   ✓
SELECT * FROM "sessions" LIMIT 0                                     → 0.20 s   ✓ (schema only)
SELECT path, message::text FROM "sessions" WHERE message::text ILIKE → 16.02 s  ✗ exceeds 10s SDK timeout
(memory UNION ALL sessions) ILIKE                                    → 10.48 s  ✗
```

So the UNION ALL itself takes ~10.5s, just over the 10s `AbortSignal.timeout(10000)` in `src/deeplake-api.ts:44`. The hivemind plugin aborts client-side and reports `Search failed: Query timeout after 10000ms`, even though Deeplake goes on to return HTTP 200 with valid data. The memory branch alone would finish in ~1.5s.

Tested with hits from `kw = "openclaw"` — both tables had matches, both contributed rows in the (otherwise discarded) response.

## Why this matters now

Two PRs in flight have made the symptom more visible:

- **#124** (just merged path): openclaw's blocking auto-recall was REPLACED with the agent-initiated `hivemind_search` tool. Lazy/tool-only flow exposes the per-tool-call latency directly to the user instead of hiding it inside a system-prompt build step. **Net win** (the agent can fall back), but the slow case is now user-visible.
- **#170** (ClawHub static-scan compliance): inlined `process.env.HIVEMIND_QUERY_TIMEOUT_MS` to `undefined` in the openclaw bundle. So the existing env-var override I added to mitigate this no longer works on openclaw. Bump-the-timeout escape hatch is gone — we need a real fix.

## Proposed: tier the search

Move from one-call UNION ALL to a two-phase pattern. **Phase A returns immediately on memory hit; Phase B fires only on miss.**

```ts
// Sketch — src/shell/grep-core.ts
export async function searchDeeplakeTables(api, memoryTable, sessionsTable, opts) {
  // Phase A: small/fast table only
  const memoryOpts = { ...opts, scope: "memory-only" };
  const memoryRows = await api.query(buildMemoryOnlySql(memoryOpts));
  if (memoryRows.length >= (opts.minHitThreshold ?? 3) || opts.scope === "memory-only") {
    return memoryRows;
  }
  // Phase B: only on miss / few hits, with a higher per-call timeout
  // (env or config-overridable, separate from the per-table memory timeout).
  const sessionRows = await api.query(buildSessionsOnlySql(opts), { timeoutMs: 30_000 });
  return dedupeByPath([...memoryRows, ...sessionRows]);
}
```

Net effect for the common case:
- Memory has a relevant hit → return in ~1.5s instead of ~10.5s
- Memory is sparse → fall through to sessions, no functionality lost

Trade-offs:
- Adds one extra round-trip in the sparse case (acceptable — that case is rare and already slow)
- Score-blending becomes per-phase (current code does `semantic > lexical` deduping in the outer layer; needs a re-think when results come from two separate queries)
- `searchOpts.contentScanOnly` regex-path filtering (line 793 in openclaw bundle, line ~620 in source) needs to stay correct across both phases
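The score-blending trade-off above can be made concrete with a small sketch of what a cross-phase `dedupeByPath` might look like. This is illustrative only: the `SearchRow` shape and the `semanticScore` field are assumptions, not the actual types in `src/shell/grep-core.ts`. It carries over the rule from the function header (when a path appears in both halves, the semantic score wins) and otherwise keeps the first row seen, which in the two-phase flow is the memory row.

```typescript
// Hypothetical row shape; the real one in src/shell/grep-core.ts may differ.
interface SearchRow {
  path: string;
  content: string;
  source: "memory" | "sessions";
  // Assumed to be set only on rows produced by a semantic branch.
  semanticScore?: number;
}

// Merge rows from Phase A and Phase B, keeping one row per path.
// A row with a semantic score beats a lexical-only row for the same path;
// between two semantic rows, the higher score wins; between two lexical
// rows, the first one seen is kept.
function dedupeByPath(rows: SearchRow[]): SearchRow[] {
  const best = new Map<string, SearchRow>();
  for (const row of rows) {
    const prev = best.get(row.path);
    if (prev === undefined) {
      best.set(row.path, row);
      continue;
    }
    const prevScore = prev.semanticScore ?? -Infinity;
    const rowScore = row.semanticScore ?? -Infinity;
    if (rowScore > prevScore) {
      best.set(row.path, row);
    }
  }
  // Map iteration follows first-insertion order, so memory rows stay first.
  return Array.from(best.values());
}
```

Because the merge is a pure function over the concatenated rows, the same ranking rule applies whether the rows came from one UNION ALL or from two separate queries, which may make the "no silent regression" criterion easier to test.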

## Alternative ideas

1. **Add `scope: "memory" | "sessions" | "both"` to `SearchOptions`.** Have the agent's tool-call (or the prompt nudge) prefer `scope: memory` first; the LLM can fall through to `scope: sessions` if memory is empty. Same effect, smaller refactor, but pushes the decision into the agent instead of the library.
2. **Server-side fix (Deeplake team).** Index `message::text` for ILIKE OR use `message_embedding` for semantic-only search. Removes the underlying slowness, no client change. But out of our control; long lead time.
3. **Per-table timeouts in the SDK** (`HIVEMIND_QUERY_TIMEOUT_MEMORY_MS` vs `..._SESSIONS_MS`). Doesn't fix UNION ALL waiting for both branches — even if `sessions` gets a 30s budget, memory results still wait 10s to come back. Only helps if we already split the query.
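Alternatives 1 and 3 combine naturally once the query is split: a `scope` value selects which tables to hit, and each resulting query carries its own timeout. The following is a minimal sketch under assumptions. `planQueries`, `PlannedQuery`, and the default budgets are hypothetical names and values (10 s mirrors the current SDK timeout, 30 s mirrors the Phase B budget in the proposal), not existing API.

```typescript
type SearchScope = "memory" | "sessions" | "both";

// Hypothetical description of one query to issue.
interface PlannedQuery {
  table: "memory" | "sessions";
  timeoutMs: number;
}

// Assumed defaults, not values read from the real SDK configuration.
const DEFAULT_TIMEOUTS_MS = { memory: 10_000, sessions: 30_000 } as const;

// Turn a scope into an ordered list of per-table queries.
// Memory always comes first so a caller can return early on a hit.
function planQueries(
  scope: SearchScope,
  timeouts: { memory: number; sessions: number } = DEFAULT_TIMEOUTS_MS,
): PlannedQuery[] {
  const plan: PlannedQuery[] = [];
  if (scope === "memory" || scope === "both") {
    plan.push({ table: "memory", timeoutMs: timeouts.memory });
  }
  if (scope === "sessions" || scope === "both") {
    plan.push({ table: "sessions", timeoutMs: timeouts.sessions });
  }
  return plan;
}
```

With a single UNION ALL there is only one request to attach a timeout to, which is why per-table budgets only become meaningful after the split.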

## Acceptance criteria

- [ ] `hivemind_search "openclaw"` returns hits within ~3s on the same dataset where today's UNION ALL takes ~10.5s
- [ ] Recall coverage unchanged: hits that today come from sessions still surface (after memory miss)
- [ ] Score / ranking semantics documented for the two-phase case (no silent regression on `--use-semantic` callers)
- [ ] Same change benefits all hivemind agents (CC PreToolUse, codex, cursor, hermes, pi, openclaw, MCP) — single code path

## Related

- #80 — auto-capture in openclaw doesn't populate `message_embedding`. If we ever do semantic-only on the sessions table, that gap matters more.
- #170 / PR #170 — inlined `HIVEMIND_QUERY_TIMEOUT_MS` for openclaw. Once this tiered design lands, that env-var override is moot for the recall path (timeouts move into the SDK call options, not a global env).
- #124 — replaced openclaw blocking auto-recall with `hivemind_search` tool calls; this issue tightens the latency of those tool calls.

## Notes for the implementer

The header comment on `searchDeeplakeTables` already calls out that the hybrid lexical+semantic branch is "one round-trip by design." That's a meaningful axis to preserve: if we move to two-phase, both the lexical hybrid AND the pure-lexical fast path should still be one round-trip per phase. The constraint relaxes from "one round-trip" to "one round-trip per table."

Telegram round-trip measurement on the live gateway (2026-05-17): a memory-recall question hit `hivemind_search` first, timed out at 10s, fell back to `hivemind_index` (succeeded in ~2s) and `hivemind_read` on a specific summary (~1s), and the agent answered correctly. **Total latency ~15s. With tiered search it would be ~3s.**
