Architecture Enhancements: Hybrid Search Ranking (RRF) & Robust JSON Parsing

Hi INCF Team,

I've been exploring the `knowledge-space-agent` codebase to understand the current RAG pipeline for the upcoming 2026 cycle. I noticed the recent updates to the documentation and decided to dive into the core logic.

I identified two opportunities to improve the agent's search relevance and stability. I would love to open a PR for these if they align with the roadmap:

### 1. Robust JSON Parsing (Error Handling)
**Current State:** The LLM calls in `agents.py` rely on `json.loads(resp.text)`.
**Issue:** Even at low temperatures, models like Gemini Flash often output Markdown formatting (e.g., ` ```json ... ``` `) or conversational filler. This currently causes `JSONDecodeError` exceptions that crash the pipeline.
**Proposed Fix:** Implement a `clean_and_parse_json` helper utility that strips Markdown and handles malformed output gracefully before parsing.
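A minimal sketch of what such a helper could look like. The name `clean_and_parse_json` comes from the proposal above; the `default` parameter, the fallback order, and the regex are my assumptions and not existing code in `agents.py`.

```python
import json
import re
from typing import Any, Optional

# Matches a fenced block such as ```json ... ``` and captures its body.
_FENCE_RE = re.compile(r"```(?:json)?\s*(.*?)```", re.DOTALL | re.IGNORECASE)


def clean_and_parse_json(text: str, default: Optional[Any] = None) -> Any:
    """Best-effort JSON extraction from an LLM response (hypothetical helper).

    Tries, in order: the raw text, the body of the first Markdown code
    fence, and the outermost {...} or [...] span. Returns `default`
    instead of raising if every attempt fails.
    """
    candidates = [text.strip()]

    fenced = _FENCE_RE.search(text)
    if fenced:
        candidates.append(fenced.group(1).strip())

    # Fallback for conversational filler around the payload.
    for open_ch, close_ch in (("{", "}"), ("[", "]")):
        start, end = text.find(open_ch), text.rfind(close_ch)
        if start != -1 and end > start:
            candidates.append(text[start:end + 1])

    for candidate in candidates:
        try:
            return json.loads(candidate)
        except json.JSONDecodeError:
            continue
    return default


print(clean_and_parse_json('```json\n{"intent": "search"}\n```'))
# → {'intent': 'search'}
```

Returning a default keeps the caller in control of what "failed to parse" means (retry, fall back, or log), which may or may not match how the pipeline wants to handle it.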

### 2. Search Ranking Upgrade (Reciprocal Rank Fusion)
**Current State:** `fuse_results` uses a linear weighted sum: `vector_score * 0.6 + keyword_score * 0.4`.
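**Issue:** Keyword scores (BM25/Elastic) are often unbounded (e.g., 10.0+), while vector scores are normalized (0.0–1.0). With the current weights, a keyword score of 10.0 contributes 4.0 to the sum, while even a perfect vector score contributes at most 0.6. In practice, this allows keyword matches to overpower semantic vector matches, negating the benefit of the hybrid approach.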
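**Proposed Fix:** Switch to **Reciprocal Rank Fusion (RRF)**. RRF ignores raw scores and uses only each document's position in each result list: every list contributes `1 / (k + rank)` to the document's fused score, where `k` is a smoothing constant (60 is a common default). Because only ranks matter, neither retriever can dominate through score scale alone.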
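A rough sketch of RRF, assuming each retriever hands back an ordered list of document IDs with the best match first. The function name `rrf_fuse` and the input shape are hypothetical; I have not matched them to the actual signature of `fuse_results`.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def rrf_fuse(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) for every document it contains,
    with rank starting at 1. Documents missing from a list get nothing
    from that list. Assumes no duplicate IDs within a single list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]   # semantic ranking, best first
keyword_hits = ["c", "a", "d"]  # BM25 ranking, best first
print(rrf_fuse([vector_hits, keyword_hits]))
# → ['a', 'c', 'b', 'd']
```

Here `a` wins because it ranks near the top of both lists, regardless of how large the underlying BM25 score was. If per-retriever weighting is still wanted, each contribution could be multiplied by a weight, but that is a variation on the basic scheme.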

I have a working implementation plan for both changes. Would you be open to a PR refactoring these components?

Best,
Somsubhra Nandi
