A hybrid Retrieval-Augmented Generation (RAG) system for ArXiv research papers. Combines dense retrieval (sentence-transformer embeddings + Chroma), lexical retrieval (BM25), and cross-encoder reranking to provide accurate, citation-backed answers to research questions.
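The description above does not say how the dense and BM25 result lists are merged before reranking. As a minimal sketch, assuming reciprocal rank fusion (RRF) is used for that step, the merge could look like the following. The function name, the `k=60` constant, and the paper IDs are illustrative assumptions, not taken from this repository.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first.

    Each document scores 1 / (k + rank) per list it appears in, with
    ranks starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by ID so ties are deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical candidate lists; the paper IDs are made up for illustration.
dense_hits = ["p3", "p1", "p7"]   # e.g. from the embedding index
bm25_hits = ["p1", "p9", "p3"]    # e.g. from the lexical index
fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Documents found by both retrievers rise to the top of the fused list. In a pipeline like the one described, the cross-encoder would then rescore only the first few fused candidates.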
Note: Ranking metrics (Recall@k, MRR, nDCG@k) are omitted because the ground-truth labels were auto-generated from the retriever itself, so any such scores would measure only self-consistency. Only reproducible system metrics are reported below. Human-labeled evaluation is planned. See results/metrics.md for the full report.
⚡ Latency & Throughput (5,000 papers)
| Component | p50 latency | p95 latency | Avg latency | QPS |
|---|---|---|---|---|
| Retrieval Pipeline | 79.6 ms | 96.9 ms | 90.7 ms | 11.02 |
| End-to-End (incl. LLM) | - | - | 1,059 ms | - |
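For readers who want to reproduce figures like these, here is a minimal sketch of deriving p50, p95, average latency, and QPS from raw per-query timings. It assumes queries run one at a time, so QPS is 1000 divided by the mean latency in milliseconds. The reported numbers look consistent with that (1000 / 90.7 ≈ 11.0), but this is an inference rather than something the report states. The function name and the interpolation method are also assumptions.

```python
import statistics


def summarize_latencies(samples_ms):
    """Return p50, p95, mean latency (ms) and sequential QPS for the samples."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns 99 cut points: index 49 is p50, index 94 is p95.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    mean_ms = statistics.fmean(samples_ms)
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "avg_ms": mean_ms,
        # Assumes one query in flight at a time (no concurrency).
        "qps": 1000.0 / mean_ms,
    }


# Synthetic timings for illustration only; not measurements from this system.
summary = summarize_latencies([float(ms) for ms in range(1, 101)])
print(summary)
```

The mean can sit well above the median when a few queries are slow, which is why a table like this reports percentiles alongside the average.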
Caching Rule: If retrieval_p95 ≥ 200 ms, the system enables a Redis top-k cache with a 1-minute TTL. Retrieval p95 is currently 96.9 ms, below the threshold, so the cache stays disabled.
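The caching rule above can be sketched as a threshold check in front of a TTL cache. This is an illustration under stated assumptions, not the repository's implementation: `TTLCache` is an in-memory stand-in for Redis so the example runs without a server, and every name here (`cache_enabled`, `retrieve_top_k`, `TTLCache`) is hypothetical.

```python
import time

P95_THRESHOLD_MS = 200.0    # threshold from the caching rule
CACHE_TTL_SECONDS = 60.0    # 1-minute TTL from the caching rule


def cache_enabled(retrieval_p95_ms, threshold_ms=P95_THRESHOLD_MS):
    """Caching switches on only when p95 reaches the threshold."""
    return retrieval_p95_ms >= threshold_ms


class TTLCache:
    """In-memory stand-in for the Redis top-k cache (illustration only)."""

    def __init__(self, ttl_seconds=CACHE_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry outlived its TTL: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)


def retrieve_top_k(query, retriever, retrieval_p95_ms, cache):
    """Call the retriever directly, or go through the cache if the rule applies."""
    if not cache_enabled(retrieval_p95_ms):
        return retriever(query)  # cache bypassed, e.g. at p95 = 96.9 ms
    cached = cache.get(query)
    if cached is not None:
        return cached
    results = retriever(query)
    cache.set(query, results)
    return results
```

With the current p95 of 96.9 ms, `cache_enabled(96.9)` is false and every query goes straight to the retriever. At 200 ms or above, repeated queries within the TTL would be served from the cache.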