A production-ready Retrieval-Augmented Generation (RAG) system that combines vector search with LLM capabilities to answer questions from your documents.

Retrieval-Augmented Generation (RAG) is an AI architecture that enhances LLM responses by retrieving relevant context from a knowledge base before generating answers. Grounding the model in your private documents this way sharply reduces hallucinations and lets it answer questions the base model was never trained on.
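At its core the flow is: retrieve the chunks closest to the query, then hand them to the LLM as context. A minimal sketch with a stubbed retriever and generator (keyword overlap stands in for real embedding similarity; none of these names are this project's API):

```javascript
// Toy in-memory "vector store": chunks scored by naive keyword overlap,
// standing in for real embedding similarity.
const chunks = [
  "Transformers use self-attention to process sequences.",
  "Gradient descent minimizes a loss function iteratively.",
  "Tokenization splits text into model-readable units."
];

function retrieve(query, topK = 2) {
  const terms = query.toLowerCase().split(/\W+/);
  return chunks
    .map((text) => ({
      text,
      score: terms.filter((t) => t && text.toLowerCase().includes(t)).length
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

function generate(query, context) {
  // A real system would call an LLM here; this only shows the prompt shape.
  return `Answer "${query}" using:\n${context.map((c) => c.text).join("\n")}`;
}

const question = "How does self-attention work in transformers?";
console.log(generate(question, retrieve(question)));
```

The real system swaps the keyword scorer for Pinecone's vector search and the template for a Groq chat completion, but the retrieve-then-generate shape is the same.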
```
┌──────────────────────────────────────────────────────────────────┐
│                         RAG ARCHITECTURE                         │
└──────────────────────────────────────────────────────────────────┘

    Document              Query              Response
       │                    │                    ▲
       ▼                    ▼                    │
 ┌───────────┐        ┌───────────┐        ┌───────────┐
 │ Chunking  │        │   Embed   │        │    LLM    │
 │  & Embed  │        │   Query   │        │ Generate  │
 └─────┬─────┘        └─────┬─────┘        └─────┬─────┘
       │                    │                    │
       ▼                    ▼                    │
 ┌──────────────────────────────────────────┐    │
 │       PINECONE VECTOR DATABASE           │────┘
 │                                          │  Retrieved
 │   doc-1  similarity: 0.92                │  context
 │   doc-2  similarity: 0.87                │
 │   doc-3  similarity: 0.81                │
 └──────────────────────────────────────────┘
```
| Feature | Description |
|---|---|
| Document Upload | Upload PDFs and text files via the web UI |
| Semantic Search | Find relevant content using vector similarity |
| Reranking | Improve search accuracy with the BGE reranker |
| Chat Interface | Modern, responsive chat UI |
| LLM Integration | Groq's Llama 3.3 70B for fast responses |
| Context Display | View retrieved sources for transparency |
| Session Memory | Multi-turn conversations with context |
```
┌──────────────────────────────────────────────────────────────────┐
│                         SYSTEM OVERVIEW                          │
└──────────────────────────────────────────────────────────────────┘

                          ┌─────────────┐
                          │   Browser   │
                          │  (Chat UI)  │
                          └──────┬──────┘
                                 │
                  HTTP POST /api/chat, /api/ingest
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│                        EXPRESS.JS SERVER                         │
│  ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐   │
│  │    /api/chat    │  │   /api/ingest   │  │   /api/health   │   │
│  │                 │  │                 │  │                 │   │
│  │ • Receive msg   │  │ • Upload file   │  │ • Index stats   │   │
│  │ • RAG search    │  │ • Chunk text    │  │ • Health check  │   │
│  │ • LLM call      │  │ • Store embeds  │  │                 │   │
│  └────────┬────────┘  └────────┬────────┘  └─────────────────┘   │
└───────────┼────────────────────┼─────────────────────────────────┘
            │                    │
            ▼                    ▼
┌───────────────────┐  ┌───────────────────┐  ┌───────────────────┐
│     PINECONE      │  │    PDF LOADER     │  │       GROQ        │
│   Vector Store    │  │   Text Splitter   │  │   LLM (Llama 3)   │
│                   │  │                   │  │                   │
│ • Store vectors   │  │ • Parse PDFs      │  │ • Generate answer │
│ • Semantic search │  │ • Chunk @ 500     │  │ • Tool calling    │
│ • BGE reranking   │  │ • 100 overlap     │  │ • Fast inference  │
└───────────────────┘  └───────────────────┘  └───────────────────┘
```
- Node.js 18+ installed
- Pinecone account (free tier works)
- Groq account (free tier works)
```
git clone <your-repo-url>
cd RAG
npm install
```

Create a `.env` file:

```
# Pinecone - Get from https://console.pinecone.io
PINECONE_API_KEY=pcsk_xxxxxxxxxxxxx

# Groq - Get from https://console.groq.com
GROQ_API_KEY=gsk_xxxxxxxxxxxxx

# Optional
OPENAI_API_KEY=sk-xxxxxxxxxxxxx
```

The index uses integrated embeddings (Pinecone generates embeddings automatically):

```
# First time only - creates index with the llama-text-embed-v2 model
npm run dev
```

Or create the index manually via the Pinecone CLI:

```
pc index create -n rag-embedded-index -m cosine -c aws -r us-east-1 \
  --model llama-text-embed-v2 --field_map text=content
```

Start the server:

```
npm start
# or for development with hot reload:
npm run server
```

Navigate to http://localhost:3000 and start chatting!
```
RAG/
├── server.js        # Express server with API endpoints
├── index.js         # Document ingestion utilities
├── public/
│   └── index.html   # Chat interface (single-page app)
├── data/            # Sample documents
├── uploads/         # Temporary upload storage
├── package.json     # Dependencies
├── .env             # Environment variables
└── README.md        # You are here!
```
```
POST /api/chat
Content-Type: application/json

{
  "message": "What are the key ML concepts?",
  "sessionId": "optional-session-id"
}
```

Response:

```json
{
  "response": "Based on the knowledge base, key ML concepts include...",
  "toolsUsed": ["rag_search"],
  "context": "[Source 1] (Score: 0.92)\nML basics include..."
}
```

```
POST /api/ingest
Content-Type: multipart/form-data

file: <PDF or TXT file>
```

Response:

```json
{
  "success": true,
  "message": "Successfully ingested document.pdf",
  "chunksCreated": 15,
  "totalRecords": 24
}
```

```
GET /api/health
```

Response:

```json
{
  "status": "ok",
  "index": "rag-embedded-index",
  "records": 24
}
```

```
┌──────────────────────────────────────────────────────────────────┐
│                   DOCUMENT INGESTION PIPELINE                    │
└──────────────────────────────────────────────────────────────────┘
```
```
PDF/TXT File
     │
     ▼
┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│   Upload    │──▶│    Parse    │──▶│    Chunk    │──▶│    Store    │
│  (Multer)   │   │ (PDFLoader) │   │ (500 chars) │   │ (Pinecone)  │
└─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘
                                           │
                                           ▼
                                  ┌─────────────────┐
                                  │ Chunk 1: "..."  │
                                  │ Chunk 2: "..."  │
                                  │ Chunk 3: "..."  │
                                  │       ...       │
                                  └─────────────────┘
                                           │
                                           ▼
                                  ┌─────────────────┐
                                  │    Pinecone     │
                                  │   Auto-Embeds   │
                                  │  (llama-text)   │
                                  └─────────────────┘
```
```
┌──────────────────────────────────────────────────────────────────┐
│                      QUERY PROCESSING FLOW                       │
└──────────────────────────────────────────────────────────────────┘

User: "What is transformer architecture?"
        │
        ▼
 ┌─────────────┐
 │  Groq LLM   │ ◀── Decides to call the rag_search tool
 └──────┬──────┘
        │
        ▼
 ┌─────────────────────────────────────────────────────────────┐
 │                       PINECONE SEARCH                       │
 │                                                             │
 │  Query: "transformer architecture"                          │
 │            │                                                │
 │            ▼                                                │
 │  ┌───────────────────────────────────────────────────────┐  │
 │  │               SEMANTIC SEARCH (Top 6)                 │  │
 │  └───────────────────────────────────────────────────────┘  │
 │            │                                                │
 │            ▼                                                │
 │  ┌───────────────────────────────────────────────────────┐  │
 │  │                BGE RERANKER (Top 3)                   │  │
 │  │                                                       │  │
 │  │  #1 [0.92] "Transformer architecture uses attention…" │  │
 │  │  #2 [0.87] "Attention mechanism allows the model…"    │  │
 │  │  #3 [0.81] "Tokenization is the process of…"          │  │
 │  └───────────────────────────────────────────────────────┘  │
 └─────────────────────────────────────────────────────────────┘
        │
        ▼
 ┌─────────────┐
 │  Groq LLM   │ ◀── Generates answer using retrieved context
 └──────┬──────┘
        │
        ▼
"Transformer architecture is a neural network design that uses
 self-attention mechanisms to process sequences..."
```
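The two retrieval stages (semantic top 6, rerank to top 3) can be simulated with made-up scores. The point of the second stage is that a reranker scores each (query, chunk) pair directly, so the final order can differ from raw vector similarity:

```javascript
// Stage 1: pretend these came back from semantic search (topK = 6).
const candidates = [
  { id: "doc-1", similarity: 0.92 },
  { id: "doc-2", similarity: 0.87 },
  { id: "doc-3", similarity: 0.81 },
  { id: "doc-4", similarity: 0.78 },
  { id: "doc-5", similarity: 0.74 },
  { id: "doc-6", similarity: 0.70 }
];

// Stage 2: hypothetical cross-encoder scores per (query, chunk) pair.
// Note doc-4 outranks doc-2 despite lower vector similarity.
const rerankScores = {
  "doc-1": 0.95, "doc-2": 0.60, "doc-3": 0.88,
  "doc-4": 0.91, "doc-5": 0.40, "doc-6": 0.30
};

function rerank(cands, scores, topN = 3) {
  return [...cands]
    .sort((a, b) => scores[b.id] - scores[a.id])
    .slice(0, topN);
}

const top3 = rerank(candidates, rerankScores);
console.log(top3.map((c) => c.id)); // ["doc-1", "doc-4", "doc-3"]
```

In the real server both stages happen inside one Pinecone `searchRecords` call (see Configuration below); this sketch only makes the reordering visible.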
| Category | Technology | Purpose |
|---|---|---|
| Runtime | Node.js 18+ | JavaScript runtime |
| Server | Express 5.x | HTTP server & routing |
| Vector DB | Pinecone | Vector storage & search |
| Embeddings | llama-text-embed-v2 | Text to vectors (integrated) |
| Reranker | bge-reranker-v2-m3 | Result reranking |
| LLM | Groq (Llama 3.3 70B) | Response generation |
| PDF Parsing | LangChain PDFLoader | Document extraction |
| Chunking | RecursiveCharacterTextSplitter | Text segmentation |
| File Upload | Multer | Multipart form handling |
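The "Tool calling" row implies the server registers a `rag_search` tool with Groq. The exact definition lives in `server.js`; a plausible shape, following the OpenAI-compatible tool format that Groq's chat API accepts (the parameter names are assumptions for illustration):

```json
{
  "type": "function",
  "function": {
    "name": "rag_search",
    "description": "Search the knowledge base for passages relevant to the user's question",
    "parameters": {
      "type": "object",
      "properties": {
        "query": { "type": "string", "description": "The search query" }
      },
      "required": ["query"]
    }
  }
}
```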
| Metric | Value | Notes |
|---|---|---|
| Embedding Dimension | 1024 | llama-text-embed-v2 |
| Chunk Size | 500 chars | With 100 char overlap |
| Search + Rerank | ~200ms | Pinecone serverless |
| LLM Response | ~1-3s | Groq inference |
| Max Upload | ~10MB | PDF/TXT files |
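The chunk size and overlap above can be illustrated with a plain sliding-window splitter — a simplified stand-in for the `RecursiveCharacterTextSplitter` the project actually uses (which additionally prefers to break on paragraph and sentence boundaries):

```javascript
// Naive fixed-window chunker: each window advances by chunkSize - chunkOverlap,
// so consecutive chunks share chunkOverlap characters of context.
function chunkText(text, chunkSize = 500, chunkOverlap = 100) {
  const step = chunkSize - chunkOverlap;
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

const doc = "x".repeat(1200);
const chunks = chunkText(doc);
console.log(chunks.length);               // 3 windows: 0-500, 400-900, 800-1200
console.log(chunks.map((c) => c.length)); // [ 500, 500, 400 ]
```

The overlap is what keeps a sentence that straddles a boundary fully visible in at least one chunk, at the cost of storing some text twice.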
Chunking parameters, in `server.js`:

```javascript
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,    // Characters per chunk
  chunkOverlap: 100  // Overlap between chunks
});
```

Search and rerank settings:

```javascript
const results = await index.namespace(NAMESPACE).searchRecords({
  query: {
    topK: 6,                // Initial candidates
    inputs: { text: query }
  },
  rerank: {
    model: "bge-reranker-v2-m3",
    topN: 3,                // Final results after reranking
    rankFields: ["content"]
  }
});
```

- Multi-file batch upload - Upload multiple documents at once
- Document management - Delete/update specific documents
- Namespace support - Separate knowledge bases per user/topic
- Streaming responses - Real-time token streaming
- Authentication - User login and access control
- Analytics dashboard - Query logs and usage metrics
- Hybrid search - Combine semantic + keyword search
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing`)
- Commit changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing`)
- Open a Pull Request
MIT License - feel free to use this project for learning or production!
- Pinecone for vector database infrastructure
- Groq for blazing-fast LLM inference
- LangChain for document processing utilities
Built with ❤️ for the AI Engineering community

⭐ Star this repo • 🐛 Report Bug • ✨ Request Feature
