Proposal: Implement "Local Hybrid Search" (BM25) to enable offline agent reasoning tests

### **Context**
I've been following the recent discussions on enabling local development without GCP credentials (referencing the need for a `MockRetriever`). While setting up a mock is essential to prevent crashes, a static mock that returns synthetic/dummy data has a major limitation: it makes it impossible to test the Agent's **reasoning logic** or **hallucination checks** offline.

If the retriever returns random text ("Lorem ipsum"), the LLM cannot perform meaningful multi-hop reasoning, rendering local testing ineffective for logic improvements.

### **Proposed Solution**
I propose implementing a `LocalRetriever` fallback in `backend/retrieval.py` that activates automatically when `self.is_enabled` is False (i.e., when GCP keys are missing).

Instead of returning dummy data, this system would:
1.  Load a local `local_knowledge.json` file (which mirrors the production BigQuery schema).
2.  Perform a lightweight **BM25 / Keyword Search** on the title and content chunks.
3.  Return **contextually relevant** results to the LLM based on the user's query.

### **Technical Approach**
I have already prototyped a working script (`local_search.py`) that implements this logic using standard Python libraries (no heavy vector DB dependencies required).

The flow would be:
* **Check Env:** If GCP is missing -> Initialize `LocalRetriever`.
* **Search:** Tokenize query -> Score documents based on term frequency (Title + Content) -> Sort by relevance.
* **Output:** Return `List[RetrievedItem]` identical to the Vertex AI response.

### **Benefits**
* **True Logic Testing:** Developers can verify if the agent correctly summarizes/synthesizes data without spending cloud credits.
* **Hallucination Debugging:** We can test if the agent correctly identifies "Undisclosed" information using a controlled local dataset.
* **Zero Cost:** Fully offline and free.

I have the prototype ready and can submit a PR to integrate this into `backend/retrieval.py`.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Proposal: Implement "Local Hybrid Search" (BM25) to enable offline agent reasoning tests #9

Context

Proposed Solution

Technical Approach

Benefits

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Proposal: Implement "Local Hybrid Search" (BM25) to enable offline agent reasoning tests #9

Description

Context

Proposed Solution

Technical Approach

Benefits

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions