A RAG (Retrieval-Augmented Generation) system implementing hybrid retrieval and multi-stage retrieval.
- BM25 Sparse Retrieval: Traditional lexical retrieval based on term-frequency statistics
- Dense Retrieval: Generates embeddings with sentence-transformers and uses FAISS for similarity search
- Hybrid Retrieval: Combines BM25 and dense retrieval to capture both lexical matching and semantic similarity (a fusion sketch follows this list)
- TF-IDF reranking (stable and reliable)
- Extensible support for neural reranking models (BGE, Cross-Encoder, etc.)
- Configurable reranking models
- Stage 1: Hybrid retrieval to get candidate documents
- Stage 2: Reranking to improve result quality
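As a rough illustration of the Stage 1 fusion, the sketch below blends min-max-normalized BM25 and dense cosine scores with a tunable weight. The function name and the dense_weight parameter are illustrative assumptions, not this repository's actual API, and the real hybrid_retrieval implementation may combine scores differently.

import numpy as np
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer

def hybrid_scores(query, documents, dense_weight=0.5):
    # Sparse (lexical) scores from BM25 over whitespace-tokenized text
    bm25 = BM25Okapi([doc.lower().split() for doc in documents])
    sparse = np.array(bm25.get_scores(query.lower().split()))

    # Dense (semantic) scores: cosine similarity of normalized embeddings
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    doc_emb = model.encode(documents, normalize_embeddings=True)
    query_emb = model.encode([query], normalize_embeddings=True)
    dense = (doc_emb @ query_emb.T).ravel()

    # Min-max normalize both score lists so they are comparable, then blend
    norm = lambda s: (s - s.min()) / (s.max() - s.min() + 1e-9)
    return dense_weight * norm(dense) + (1 - dense_weight) * norm(sparse)

Documents ranked by this blended score form the candidate set that Stage 2 reranks.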
- Python 3.12+
- UV package manager (recommended) or pip
# Clone repository
git clone <your-repo-url>
cd rag_system
# Install dependencies
uv sync
# Activate virtual environment
source .venv/bin/activate # Linux/Mac
# or .venv\Scripts\activate  # Windows

# Or install the dependencies directly with pip
pip install faiss-cpu rank-bm25 torch transformers numpy scikit-learn

from hybrid_retrieval import multi_stage_retrieval
# Execute query
query = "How do transformers handle long sequences?"
results = multi_stage_retrieval(query)
# View results
for i, (doc, score) in enumerate(results):
    print(f"Document {i+1} (Score: {score:.4f}):")
    print(doc)
    print()

# Custom retrieval parameters
results = multi_stage_retrieval(
    query=query,
    initial_k=10,  # Initially retrieve 10 candidate documents
    final_k=5      # Finally return the 5 most relevant documents
)

from hybrid_retrieval import hybrid_retrieval, simple_rerank
# Hybrid retrieval
hybrid_results = hybrid_retrieval(query, k=5)
# Get documents and rerank
documents = [doc for doc, _ in hybrid_results]
reranked_results = simple_rerank(query, documents, top_k=3)

Project structure:
- hybrid_retrieval.py - Main retrieval and reranking functionality
- multistage_retrieval.py - Simplified version of multi-stage retrieval
- myrag.py - Basic RAG implementation
- main.py - Example main program
- test_*.py - Various test files
- Vector Database: FAISS (Facebook AI Similarity Search)
- Text Embeddings: sentence-transformers/all-MiniLM-L6-v2
- Sparse Retrieval: rank-bm25
- Reranking: scikit-learn (TF-IDF)
- Deep Learning: PyTorch, Transformers
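For reference, the dense-retrieval side amounts to encoding documents with all-MiniLM-L6-v2, adding the vectors to a FAISS index, and searching it with the encoded query. The snippet below is a minimal, self-contained sketch of that idea; the variable names and the two-document corpus are illustrative, not taken from this repository.

import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
documents = [
    "Transformers use self-attention mechanisms to process sequences in parallel.",
    "Recurrent Neural Networks process tokens one step at a time.",
]

# Encode the corpus; inner product on L2-normalized vectors equals cosine similarity
doc_embeddings = model.encode(documents, normalize_embeddings=True).astype("float32")
index = faiss.IndexFlatIP(doc_embeddings.shape[1])
index.add(doc_embeddings)

# Encode the query and retrieve the top-k most similar documents
query_embedding = model.encode(
    ["How do transformers handle long sequences?"], normalize_embeddings=True
).astype("float32")
scores, ids = index.search(query_embedding, 2)
for score, doc_id in zip(scores[0], ids[0]):
    print(f"{score:.4f}  {documents[doc_id]}")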
Query: How do transformers handle long sequences?
Stage 1: Hybrid retrieval...
Initial results:
1. (Score: 0.6543) Transformers use self-attention mechanisms...
2. (Score: 0.5432) Recurrent Neural Networks (RNNs)...
...
Stage 2: Reranking...
Final results:
Document 1 (Score: 0.3388):
Transformers use self-attention mechanisms to process sequences in parallel...
Document 2 (Score: 0.3234):
Long Short-Term Memory (LSTM) networks are a type of RNN...
Some cross-encoder models (like cross-encoder/ms-marco-MiniLM-L-6-v2) produce NaN values in certain environments. Solutions:
- Use BGE reranker models as alternatives
- TF-IDF reranking is implemented as a stable fallback option
- Error handling and a multi-model retry mechanism are in place (see the sketch below)
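To illustrate the fallback idea, the sketch below tries a cross-encoder first and falls back to a TF-IDF cosine-similarity reranker when the scores contain NaN. The helper names (rerank_with_fallback, tfidf_rerank) are hypothetical and not part of this repository's API.

import numpy as np
from sentence_transformers import CrossEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def tfidf_rerank(query, documents, top_k=3):
    # Stable fallback: rank documents by TF-IDF cosine similarity to the query
    matrix = TfidfVectorizer().fit_transform([query] + documents)
    scores = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    order = np.argsort(scores)[::-1][:top_k]
    return [(documents[i], float(scores[i])) for i in order]

def rerank_with_fallback(query, documents, top_k=3):
    try:
        # First attempt: neural cross-encoder reranking
        model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
        scores = model.predict([(query, doc) for doc in documents])
        if np.isnan(scores).any():
            raise ValueError("cross-encoder returned NaN scores")
        order = np.argsort(scores)[::-1][:top_k]
        return [(documents[i], float(scores[i])) for i in order]
    except Exception:
        # Fall back to TF-IDF reranking on NaN scores or model errors
        return tfidf_rerank(query, documents, top_k=top_k)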
- Add More Reranking Models:
  - BGE-reranker-large
  - ColBERT
  - SPLADE
- Performance Optimization:
  - Document index caching
  - Batch processing
  - GPU acceleration
- Feature Enhancement:
  - Query expansion
  - Document chunking strategies
  - Relevance feedback
Issues and Pull Requests are welcome!
MIT License