@sukriti112 sukriti112 commented Nov 19, 2025

Summary

This PR adds a semantic answer cache and fine-grained latency instrumentation to the TokenSmith RAG pipeline, plus a paraphrased benchmark suite to evaluate cache behavior. The goal is to reduce end-to-end latency for repeated/paraphrased questions while preserving answer traceability via citations and logs.

Key Changes

  1. Semantic Answer Cache

    • Introduces an in-memory _SEMANTIC_CACHE keyed by a configuration signature (model path, retrieval knobs, index prefix, etc.).
    • Each cache entry stores:
      • Normalized question text
      • Unit-normalized question embedding (using the FAISS embedder)
      • Final answer string
      • Chunk indices and chunk metadata
      • HyDE text used for retrieval (when enabled)
    • On each query:
      • Embed the incoming question and compute cosine similarity against cached embeddings.
      • If max similarity ≥ 0.85, treat as a semantic cache hit, reuse the answer, and log semantic_cache_hit_seconds.
      • Otherwise, run the normal HyDE → retrieval → ranking → generation pipeline and insert a new cache entry.
    • Cache size is capped (default 50 entries per config) to avoid unbounded growth.
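
The lookup and insert flow described above can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: the `SemanticCache` class, its method names, and the FIFO eviction policy are assumptions (the description states only the 0.85 threshold and the 50-entry cap), and plain Python lists stand in for the FAISS embedder's vectors.

```python
import math

SIM_THRESHOLD = 0.85  # hit threshold stated in the PR description
MAX_ENTRIES = 50      # per-config cap stated in the PR description


def unit(vec):
    """Return vec scaled to unit length (unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class SemanticCache:
    """Hypothetical in-memory cache keyed by a configuration signature."""

    def __init__(self, threshold=SIM_THRESHOLD, max_entries=MAX_ENTRIES):
        self.threshold = threshold
        self.max_entries = max_entries
        self._by_config = {}  # config signature -> list of entry dicts

    def lookup(self, config_sig, embedding):
        """Return the most similar entry if it clears the threshold, else None."""
        query = unit(embedding)
        best, best_sim = None, -1.0
        for entry in self._by_config.get(config_sig, []):
            # Both vectors are unit length, so the dot product is the cosine.
            sim = sum(a * b for a, b in zip(query, entry["embedding"]))
            if sim > best_sim:
                best, best_sim = entry, sim
        if best is not None and best_sim >= self.threshold:
            return best
        return None

    def insert(self, config_sig, question, embedding, answer):
        """Store a new entry, evicting the oldest one when over the cap."""
        entries = self._by_config.setdefault(config_sig, [])
        entries.append({
            "question": question.strip().lower(),  # simple normalization (assumed)
            "embedding": unit(embedding),
            "answer": answer,
        })
        if len(entries) > self.max_entries:
            entries.pop(0)  # FIFO eviction: an assumption, not stated in the PR
```

A caller would try `lookup` first and fall back to the full HyDE → retrieval → ranking → generation pipeline on a miss, calling `insert` with the fresh answer. Keying entries by configuration signature keeps answers produced under one model or retrieval setup from being served under another.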
  2. Per-Stage Latency Instrumentation

    • Extends stage_timings with:
      • hyde_seconds
      • retrieval_seconds
      • ranking_seconds
      • generation_seconds
      • semantic_cache_hit_seconds
    • Each query log entry now shows these timings, enabling:
      • Cold vs. cached comparison per query.
      • Identification of the main latency bottlenecks.
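
One common way to collect per-stage timings like these is a small context manager around each stage. The sketch below is only an illustration of that pattern; the `timed` helper is hypothetical, and the PR may populate `stage_timings` differently.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(stage_timings, key):
    """Add the wall-clock duration of the wrapped block to stage_timings[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record the time even if the stage raises, so failures are still visible.
        elapsed = time.perf_counter() - start
        stage_timings[key] = stage_timings.get(key, 0.0) + elapsed


# Usage: wrap each pipeline stage with the matching key.
stage_timings = {}
with timed(stage_timings, "retrieval_seconds"):
    sum(range(10_000))  # placeholder for the real retrieval call
with timed(stage_timings, "generation_seconds"):
    sum(range(10_000))  # placeholder for the real generation call
```

On a cache hit, only `semantic_cache_hit_seconds` would be recorded, which is what makes the cold vs. cached comparison per query straightforward to read from the logs.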
  3. Paraphrased Benchmark Suite

    • Adds tests/benchmarks_semantic.yaml with 12 paraphrased versions of the existing benchmark questions.
    • These benchmarks are used only to test semantic-cache behavior and do not change base accuracy evaluation.

How to Run

python scripts/run_benchmarks.py --benchmarks tests/benchmarks.yaml tests/benchmarks_semantic.yaml --runs 1
