Add semantic caching, latency logging, and paraphrased benchmarks to TokenSmith #51

sukriti112 · 2025-11-19T09:40:59Z

Summary

This PR adds a semantic answer cache and fine-grained latency instrumentation to the TokenSmith RAG pipeline, plus a paraphrased benchmark suite for evaluating cache behavior. The goal is to reduce end-to-end latency for repeated or paraphrased questions while preserving answer traceability via citations and logs.

Key Changes

Semantic Answer Cache
- Introduces an in-memory _SEMANTIC_CACHE keyed by a configuration signature (model path, retrieval knobs, index prefix, etc.).
- Each cache entry stores:
  - Normalized question text
  - Unit-normalized question embedding (using the FAISS embedder)
  - Final answer string
  - Chunk indices and chunk metadata
  - HyDE text used for retrieval (when enabled)
- On each query:
  - Embed the incoming question and compute cosine similarity against cached embeddings.
  - If the maximum similarity is ≥ 0.85, treat the query as a semantic cache hit: reuse the cached answer and log semantic_cache_hit_seconds.
  - Otherwise, run the normal HyDE → retrieval → ranking → generation pipeline and insert a new cache entry.
- Cache size is capped (default 50 entries per config) to avoid unbounded growth.
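
The lookup and insert flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SemanticCache`, `CacheEntry`, `config_signature`, and `unit_normalize` are hypothetical names, embeddings are plain Python lists rather than FAISS embedder output, and oldest-first eviction is an assumption, since the description only says the cache is capped.

```python
from __future__ import annotations

import hashlib
import json
import math
from dataclasses import dataclass, field

SIMILARITY_THRESHOLD = 0.85  # hit threshold stated in the PR description
MAX_ENTRIES_PER_CONFIG = 50  # default cap stated in the PR description


def config_signature(config: dict) -> str:
    """Hash the settings that affect answers (hypothetical helper)."""
    payload = json.dumps(config, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def unit_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


@dataclass
class CacheEntry:
    question: str            # normalized question text
    embedding: list[float]   # unit-normalized question embedding
    answer: str              # final answer string
    chunk_indices: list[int] = field(default_factory=list)


class SemanticCache:
    def __init__(self, threshold: float = SIMILARITY_THRESHOLD,
                 max_entries: int = MAX_ENTRIES_PER_CONFIG) -> None:
        self.threshold = threshold
        self.max_entries = max_entries
        self._entries: dict[str, list[CacheEntry]] = {}

    def lookup(self, signature: str, embedding: list[float]) -> CacheEntry | None:
        """Return the most similar entry if it clears the threshold."""
        query = unit_normalize(embedding)
        best, best_sim = None, -1.0
        for entry in self._entries.get(signature, []):
            # Both vectors are unit length, so the dot product is the cosine.
            sim = sum(a * b for a, b in zip(query, entry.embedding))
            if sim > best_sim:
                best, best_sim = entry, sim
        if best is not None and best_sim >= self.threshold:
            return best
        return None

    def insert(self, signature: str, question: str, embedding: list[float],
               answer: str, chunk_indices: list[int] | None = None) -> None:
        """Store a new entry, evicting the oldest one past the cap."""
        bucket = self._entries.setdefault(signature, [])
        normalized_question = " ".join(question.lower().split())  # assumed normalization
        bucket.append(CacheEntry(normalized_question, unit_normalize(embedding),
                                 answer, list(chunk_indices or [])))
        if len(bucket) > self.max_entries:
            bucket.pop(0)  # assumption: oldest-first eviction
```

On a miss, the caller would run the full pipeline and then call `insert` with the resulting answer. Because entries are bucketed by signature, changing the model path or any retrieval setting never returns an answer produced under a different configuration.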

Per-Stage Latency Instrumentation
- Extends stage_timings with:
  - hyde_seconds
  - retrieval_seconds
  - ranking_seconds
  - generation_seconds
  - semantic_cache_hit_seconds
- Each query log entry now shows these timings, enabling:
  - Cold vs. cached comparison per query.
  - Identification of the main latency bottlenecks.
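
One way to collect per-stage timings like these is a small context manager around each stage. The sketch below is illustrative only: `timed_stage` is a hypothetical helper, and `stage_timings` is assumed to be a plain dict of stage name to seconds, which may differ from the structure the PR actually uses.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_stage(stage_timings, name):
    """Add the wall-clock duration of the enclosed block to stage_timings[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Accumulate so a stage that runs more than once is summed, not overwritten.
        elapsed = time.perf_counter() - start
        stage_timings[name] = stage_timings.get(name, 0.0) + elapsed


# Example: time two stages of a single query with stand-in workloads.
stage_timings = {}
with timed_stage(stage_timings, "retrieval_seconds"):
    sum(range(10_000))  # placeholder for real retrieval work
with timed_stage(stage_timings, "generation_seconds"):
    sum(range(10_000))  # placeholder for real generation work
```

Recording the elapsed time in a `finally` clause means a stage that raises still gets a timing entry, which helps when diagnosing slow failures.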

Paraphrased Benchmark Suite
- Adds tests/benchmarks_semantic.yaml with 12 paraphrased versions of the existing benchmark questions.
- These benchmarks are used only to test semantic-cache behavior and do not change base accuracy evaluation.

How to Run

python scripts/run_benchmarks.py --benchmarks tests/benchmarks.yaml tests/benchmarks_semantic.yaml --runs 1

Add semantic cache, latency logging and paraphrased benchmarks

00c7024
