Update context-engine SKILL docs; emphasize search

m1rl0k · m1rl0k · commit 3103c7090d91 · 2026-02-16T13:57:55.000-05:00
Revise the context-engine skill documentation in both .codex/ and skills/ to make `search` the recommended primary entrypoint, add guidance to always use `search` first, and show example auto-routing. Expand and clarify the tool docs: symbol_graph query types (callers, callees, definition, importers), optional graph_query capabilities and fallbacks, new admin/diagnostics commands, error fallbacks, and session/workspace recommendations. Replace many repo_search examples with search, and streamline the index/workspace tooling sections (the skills/ doc drops the qdrant_index_root, qdrant_index, and qdrant_prune instructions along with the workspace_info, list_workspaces, and collection_map entries). Add best practices (TOON format, multi-query patterns, commit-history predictions), plus minor formatting and example updates throughout for clarity and consistency.
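The error-fallback rules this commit adds (the "Error Fallbacks" section of the `.codex/` SKILL.md) can be sketched as a client-side helper. This is a minimal sketch, not code from context-engine: `call_tool(name, args)` is a hypothetical transport that returns a tool's JSON response as a dict, failure is detected using the structured-error convention the docs describe (an `error` field, sometimes `ok: false`), and the argument mapping for each fallback (passing `query` to `info_request`, collapsing graph_query-only types onto `callers`/`callees`) is an assumption.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical transport: call_tool(tool_name, args) -> response dict.
ToolCall = Callable[[str, Dict[str, Any]], Dict[str, Any]]

SYMBOL_GRAPH_TYPES = {"callers", "callees", "definition", "importers"}

# Assumption: graph_query-only types collapse onto the nearest symbol_graph
# type; `depth` is passed through so multi-hop traversal is preserved.
GRAPH_TYPE_MAP = {
    "transitive_callers": "callers",
    "impact": "callers",
    "transitive_callees": "callees",
    "dependencies": "callees",
}


def failed(resp: Dict[str, Any]) -> bool:
    """Structured errors carry an `error` field and sometimes `ok: false`."""
    return "error" in resp or resp.get("ok") is False


def fallback_plan(tool: str, args: Dict[str, Any]) -> List[Tuple[str, Dict[str, Any]]]:
    """Map a failed call to the replacement calls listed under Error Fallbacks."""
    if tool == "context_answer":
        q = args["query"]
        return [("search", {"query": q}),
                ("info_request", {"query": q, "include_explanation": True})]
    if tool == "pattern_search":
        return [("search", {"query": args["query"]})]
    if tool == "graph_query":
        qtype = args.get("query_type", "callers")
        mapped = GRAPH_TYPE_MAP.get(qtype, qtype)
        if mapped not in SYMBOL_GRAPH_TYPES:
            return []  # e.g. `cycles`: no symbol_graph equivalent
        return [("symbol_graph", {"symbol": args["symbol"],
                                  "query_type": mapped,
                                  "depth": args.get("depth", 1)})]
    return []


def call_with_fallback(call_tool: ToolCall, tool: str,
                       args: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Call `tool`; if it errors or times out, run its documented fallbacks."""
    try:
        resp = call_tool(tool, args)
    except TimeoutError:
        resp = {"ok": False, "error": "timeout"}
    if not failed(resp):
        return [resp]
    plan = fallback_plan(tool, args)
    if not plan:
        return [resp]  # no documented fallback: surface the original error
    return [call_tool(name, fb_args) for name, fb_args in plan]
```

Passing the transport in as a parameter keeps the fallback table testable without a running MCP server. `cycles` has no `symbol_graph` equivalent, so the sketch surfaces the original error instead of guessing.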
diff --git a/.codex/skills/context-engine/SKILL.md b/.codex/skills/context-engine/SKILL.md
@@ -7,49 +7,64 @@ description: Hybrid semantic/lexical code search with neural reranking via MCP t
 
 Hybrid vector search (semantic + lexical) with neural reranking for codebase retrieval.
 
+> **IMPORTANT: Always use `search` as your FIRST tool for ANY code exploration, lookup, or question.** It auto-detects intent and routes to the best specialized tool. Only use `repo_search`, `symbol_graph`, or other tools directly when you need specific parameters or features that `search` does not expose (cross-repo, memory, admin). When in doubt, use `search`.
+
 ## Core Decision Tree
 
 ```
 Need to find code?
 ├── UNSURE / GENERAL QUERY → search (RECOMMENDED DEFAULT)
-│   └── Auto-routes to the best tool based on query intent
+│   ├── Auto-routes to the best tool based on query intent
+│   └── Handles: code search, Q&A, tests, config, symbols, imports
 ├── Simple lookup → search OR info_request
 ├── Need filters/control → search OR repo_search
 ├── Search across multiple repos → cross_repo_search
 ├── Want LLM explanation → search OR context_answer
 ├── Find similar patterns → pattern_search (if enabled)
-├── Find relationships → search OR symbol_graph (DEFAULT, always available)
+├── Find relationships
+│   ├── Who calls / who imports / where defined → symbol_graph (DEFAULT, always available)
+│   ├── What does this call → symbol_graph (query_type="callees")
+│   ├── Multi-hop (callers of callers) → symbol_graph (depth=2+)
+│   └── Impact analysis / cycles → graph_query (ONLY if NEO4J/MEMGRAPH enabled)
+├── Git history
+│   ├── Find commits → search_commits_for
+│   └── Predict co-changing files → search_commits_for (predict_related=true)
+├── Blend code + notes → context_search (include_memories=true)
 └── Store/recall knowledge → memory_store, memory_find
 ```
 
 ## Primary Tools
 
-**search** - Unified entry point (RECOMMENDED DEFAULT):
+**search** - ALWAYS USE FIRST (unified entry point, auto-routes):
 ```json
 {"query": "authentication middleware"}
+{"query": "how does caching work?"}          // → routes to context_answer
+{"query": "who calls authenticate()"}        // → routes to symbol_graph
+{"query": "tests for payment processing"}    // → routes to search_tests_for
 ```
-Auto-detects intent and routes to the best tool. Returns:
-```json
-{
-  "ok": true, "intent": "search", "confidence": 0.92,
-  "tool": "repo_search", "result": {...}, "execution_time_ms": 245
-}
-```
-Handles: code search, Q&A, tests, config, symbols, imports. Use specialized tools only for cross-repo, memory, or admin operations.
+Auto-detects intent and routes to the best tool. Returns `{ok, intent, confidence, tool, result, execution_time_ms}`.
+
+Params: `query` (required); optional: `collection`, `limit`, `language`, `under`, `include_snippet`, `compact`, `context_lines`, `ext`, `not_glob`, `path_glob`, `output_format`, `rerank_enabled`.
+
+Use specialized tools directly only for: cross-repo search, memory, admin, or when you need params `search` doesn't expose.
 
 **repo_search** - Direct code search (full control):
 ```json
 {"query": "authentication middleware", "limit": 10, "include_snippet": true}
 ```
 Multi-query: `{"query": ["auth handler", "login validation"]}`
 
-**symbol_graph** - Find callers, definitions, importers (ALWAYS available):
+**symbol_graph** - Find callers, callees, definitions, importers (ALWAYS available):
 ```json
 {"symbol": "authenticate", "query_type": "callers", "limit": 10}
+{"symbol": "authenticate", "query_type": "callees", "limit": 10}
 {"symbol": "UserService", "query_type": "definition"}
 {"symbol": "utils", "query_type": "importers"}
 ```
-Use `depth=2` for multi-hop (callers of callers).
+Query types: `callers`, `callees`, `definition`, `importers`. Use `depth=2` for multi-hop. Falls back to semantic search if no graph hits. Results include ~500-char source snippets.
+
+**graph_query** (OPTIONAL -- only if NEO4J_GRAPH=1 or MEMGRAPH_GRAPH=1):
+Extra query types: `transitive_callers`, `transitive_callees`, `impact`, `dependencies`, `cycles`. If not in your tool list, use `symbol_graph` instead.
 
 **context_answer** - LLM-generated explanation with citations:
 ```json
@@ -82,7 +97,10 @@ Use `depth=2` for multi-hop (callers of callers).
 | `search_config_for` | Find config | `{"query": "database connection"}` |
 | `search_callers_for` | Quick caller search | `{"query": "processPayment"}` |
 | `search_commits_for` | Git history | `{"query": "fixed auth bug"}` |
-| `pattern_search` | Similar code patterns | `{"query": "retry with backoff"}` |
+| `search_commits_for` | Predict co-changing files | `{"path": "src/auth.py", "predict_related": true}` |
+| `change_history_for_path` | File change summary | `{"path": "src/auth.py", "include_commits": true}` |
+| `pattern_search` | Similar code patterns (if enabled) | `{"query": "retry with backoff"}` |
+| `search_importers_for` | Find importers | `{"query": "utils/helpers"}` |
 
 ## Index Management
 
@@ -92,13 +110,23 @@ Use `depth=2` for multi-hop (callers of callers).
 
 ## Best Practices
 
-1. **Use `search` as your default tool** - Auto-routes to the best specialized tool
-2. **NEVER use grep/cat/find for code exploration** - Use MCP tools instead
-3. **Start with `symbol_graph`** for all relationship queries
-4. **Use multi-query** for complex searches: pass 2-3 variations
+1. **ALWAYS start with `search`** - It is your PRIMARY tool. Auto-routes to the best specialized tool. Only fall back to specific tools when you need params `search` doesn't expose.
+2. **NEVER use grep/cat/find for code exploration** - Use MCP tools instead. The only acceptable use is confirming exact literal strings.
+3. **Use `symbol_graph` for relationship queries** - always available, no Neo4j/Memgraph needed
+4. **Use multi-query** for complex searches: pass 2-3 variations as a list
 5. **Two-phase search**: Discovery (`limit=3, compact=true`) → Deep dive (`limit=8, include_snippet=true`)
 6. **Fire parallel calls** - Multiple independent `search`, `repo_search`, `symbol_graph` in one message
 7. **Set session defaults early**: `set_session_defaults(output_format="toon", compact=true)`
+8. **Use TOON format** - `output_format: "toon"` for 60-80% token reduction on exploratory queries
+9. **Use `cross_repo_search`** for multi-repo scenarios instead of manual collection switching
+10. **Predict co-changing files** - `search_commits_for(path=..., predict_related=true)` finds historically coupled files
+
+## Error Fallbacks
+
+- `context_answer` timeout → `search` + `info_request(include_explanation=true)`
+- `pattern_search` unavailable → `search` with structural query terms
+- `graph_query` unavailable → `symbol_graph` (always available)
+- Reaching for grep/Read File → use `search`, `symbol_graph`, or `info_request` instead
 
 ## Filters (for repo_search)
 
diff --git a/skills/context-engine/SKILL.md b/skills/context-engine/SKILL.md
@@ -300,7 +300,17 @@ The `query_signature` encodes control flow: `L` (loops), `B` (branches), `T` (tr
 {"query": "utils/helpers", "limit": 10}
 ```
 
-**symbol_graph** - Symbol graph navigation (callers / definition / importers):
+**symbol_graph** - Symbol graph navigation (callers / callees / definition / importers):
+
+**Query types:**
+| Type | Description |
+|------|-------------|
+| `callers` | Who calls this symbol? |
+| `callees` | What does this symbol call? |
+| `definition` | Where is this symbol defined? |
+| `importers` | Who imports this module/symbol? |
+
+**Examples:**
 ```json
 {"symbol": "ASTAnalyzer", "query_type": "definition", "limit": 10}
 ```
@@ -310,14 +320,17 @@ The `query_signature` encodes control flow: `L` (loops), `B` (branches), `T` (tr
 ```json
 {"symbol": "qdrant_client", "query_type": "importers", "limit": 10}
 ```
+```json
+{"symbol": "authenticate", "query_type": "callees", "limit": 10}
+```
 - Supports `language`, `under`, `depth`, and `output_format` like other tools.
 - Use `depth=2` or `depth=3` for multi-hop traversals (callers of callers).
 - If there are no graph hits, it falls back to semantic search.
 - **Note**: Results are "hydrated" with ~500-char source snippets for immediate context.
 
 **graph_query** - Advanced graph traversals (OPTIONAL — ONLY available when NEO4J_GRAPH=1 or MEMGRAPH_GRAPH=1):
 
-> **If `graph_query` is not in your MCP tool list, it is NOT enabled. Use `symbol_graph` for all graph queries instead. Do NOT error or warn about missing Neo4j.**
+> **If `graph_query` is not in your MCP tool list, it is NOT enabled. Use `symbol_graph` for all graph queries instead. Do NOT error or warn about missing Neo4j/Memgraph.**
 
 ```json
 {"symbol": "normalize_path", "query_type": "impact", "depth": 2}
@@ -334,12 +347,20 @@ The `query_signature` encodes control flow: `L` (loops), `B` (branches), `T` (tr
 |------|-------------|
 | `callers` | Who calls this symbol? (depth 1) |
 | `callees` | What does this symbol call? (depth 1) |
+| `definition` | Where is this symbol defined? |
 | `transitive_callers` | Multi-hop callers (up to depth) |
 | `transitive_callees` | Multi-hop callees (up to depth) |
 | `impact` | What breaks if I change this? (reverse transitive) |
 | `dependencies` | What does this depend on? (calls + imports) |
 | `cycles` | Detect circular dependencies |
 
+**Parameters:**
+- `symbol` - Symbol name to query
+- `query_type` - One of the types above
+- `depth` - Maximum traversal depth (default 1)
+- `limit` - Max results (default 10)
+- `include_paths` - Include file paths in results (bool, optional)
+
 
 
 **search_commits_for** - Search git history:
@@ -382,26 +403,7 @@ Use `context_search` to blend code results with stored memories:
 }
 ```
 
-## Index Management
-
-**qdrant_index_root** - First-time setup or full reindex:
-```json
-{}
-```
-With recreate (drops existing data):
-```json
-{"recreate": true}
-```
-
-**qdrant_index** - Index only a subdirectory:
-```json
-{"subdir": "src/"}
-```
-
-**qdrant_prune** - Remove deleted files from index:
-```json
-{}
-```
+## Admin and Diagnostics
 
 **qdrant_status** - Check index health:
 ```json
@@ -413,23 +415,11 @@ With recreate (drops existing data):
 {}
 ```
 
-## Workspace Tools
-
-**workspace_info** - Get current workspace and collection:
-```json
-{}
-```
-
-**list_workspaces** - List all indexed workspaces:
+**embedding_pipeline_stats** - Get cache efficiency, Bloom filter stats, and pipeline performance:
 ```json
 {}
 ```
 
-**collection_map** - View collection-to-repo mappings:
-```json
-{"include_samples": true}
-```
-
 **set_session_defaults** - Set defaults for session:
 ```json
 {"collection": "my-project", "language": "python"}
@@ -446,8 +436,6 @@ Don't discover at every session start. Trigger when: search returns no/irrelevan
 ```json
 // qdrant_list — discover available collections
 {}
-// collection_map — map repos to collections with sample files
-{"include_samples": true}
 ```
 
 ### Context Switching (Session Defaults = `cd`)
@@ -459,7 +447,7 @@ Treat `set_session_defaults` like `cd` — it scopes ALL subsequent searches:
 {"collection": "backend-api-abc123"}
 
 // One-off peek at another repo (does NOT change session default)
-// repo_search
+// search (or repo_search)
 {"query": "login form", "collection": "frontend-app-def456"}
 ```
 
@@ -472,12 +460,12 @@ NEVER search both repos with the same vague query. Find the **interface boundary
 **Pattern 1 — Interface Handshake (API/RPC):**
 ```json
 // 1. Find client call in frontend
-// repo_search
+// search
 {"query": "login API call", "collection": "frontend-col"}
 // → Found: axios.post('/auth/v1/login', ...)
 
 // 2. Search backend for that exact route
-// repo_search
+// search
 {"query": "'/auth/v1/login'", "collection": "backend-col"}
 ```
 
@@ -488,19 +476,19 @@ NEVER search both repos with the same vague query. Find the **interface boundary
 {"symbol": "UserProfile", "query_type": "importers", "collection": "frontend-col"}
 
 // 2. Find definition in source
-// repo_search
+// search
 {"query": "interface UserProfile", "collection": "shared-lib-col"}
 ```
 
 **Pattern 3 — Event Relay (Pub/Sub):**
 ```json
 // 1. Find producer → extract event name
-// repo_search
+// search
 {"query": "publish event", "collection": "service-a-col"}
 // → Found: bus.publish("USER_CREATED", payload)
 
 // 2. Find consumer with exact event name
-// repo_search
+// search
 {"query": "'USER_CREATED'", "collection": "service-b-col"}
 ```
 
@@ -533,11 +521,11 @@ NEVER search both repos with the same vague query. Find the **interface boundary
 // cross_repo_search
 {"boundary_key": "/api/auth/login", "collection": "backend-col"}
 ```
-Use `cross_repo_search` when you need breadth across repos. Use `repo_search` with explicit `collection` when you need depth in one repo.
+Use `cross_repo_search` when you need breadth across repos. Use `search` (or `repo_search`) with explicit `collection` when you need depth in one repo.
 
 ### Multi-Repo Anti-Patterns
 - **DON'T** search both repos with the same vague query (noisy, confusing)
-- **DON'T** assume the default collection is correct — verify with `collection_map`
+- **DON'T** assume the default collection is correct — verify with `qdrant_list`
 - **DON'T** forget to "cd back" after cross-referencing another repo
 - **DO** extract exact strings (route paths, event names, type names) as search anchors
 
@@ -578,7 +566,7 @@ Tools return structured errors, typically via `error` field and sometimes `ok: f
 ```
 
 Common issues:
-- **Collection not found** - Run `qdrant_index_root` to create the index
+- **Collection not found** - Verify the collection name with `qdrant_list` and confirm the codebase has been indexed
 - **Empty results** - Broaden query, check filters, verify index exists
 - **Timeout on rerank** - Set `rerank_enabled: false` or reduce `limit`
 
@@ -592,8 +580,6 @@ Common issues:
 6. **Include snippets** - Set `include_snippet: true` to see code context in results
 7. **Store decisions** - Use `memory_store` to save architectural decisions and context for later
 8. **Check index health** - Run `qdrant_status` if searches return unexpected results
-9. **Prune after refactors** - Run `qdrant_prune` after moving/deleting files
-10. **Index before search** - Always run `qdrant_index_root` on first use or after cloning a repo
 11. **Use pattern_search for structural matching** - When looking for code with similar control flow (retry loops, error handling), use `pattern_search` instead of `repo_search` (if enabled)
 12. **Describe patterns in natural language** - `pattern_search` understands "retry with backoff" just as well as actual code examples (if enabled)
 13. **Fire independent searches in parallel** - Call multiple `search`, `repo_search`, `symbol_graph`, etc. in the same message block for 2-3x speedup
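Item 13's parallel-call advice can be illustrated client-side with `asyncio.gather`. This is a minimal sketch assuming a hypothetical async `call_tool(name, args)` transport (not a context-engine API): independent calls start together, so their latencies overlap instead of adding up, and results come back in request order. The 2-3x speedup is the document's figure, not something this sketch measures.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Tuple

# Hypothetical async transport: await call_tool(name, args) -> response dict.
AsyncToolCall = Callable[[str, Dict[str, Any]], Awaitable[Dict[str, Any]]]


async def fire_parallel(call_tool: AsyncToolCall,
                        batch: List[Tuple[str, Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Run independent tool calls concurrently; results keep request order."""
    # return_exceptions=True: a failing call yields its exception instead of
    # propagating out of gather, so the other results are kept.
    results = await asyncio.gather(
        *(call_tool(name, args) for name, args in batch),
        return_exceptions=True,
    )
    # Normalize exceptions into the structured-error shape the tools use.
    return [{"ok": False, "error": repr(r)} if isinstance(r, BaseException) else r
            for r in results]


# Example batch: three independent lookups issued together.
BATCH = [
    ("search", {"query": "authentication middleware"}),
    ("symbol_graph", {"symbol": "authenticate", "query_type": "callers", "limit": 10}),
    ("search_tests_for", {"query": "payment processing"}),
]
```

Using `return_exceptions=True` is a design choice: one failed lookup becomes an `{ok: false, error}` dict rather than raising out of the batch and losing the other results.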