A production-oriented Retrieval-Augmented Generation (RAG) system that ingests YouTube videos, stores semantically searchable transcript chunks in a FAISS vector database, and answers user questions using an LLM with optional video-level filtering.
This project focuses on correct RAG architecture, including lazy lifecycle handling, metadata-aware retrieval, and clean separation of ingestion, retrieval, and generation layers.
Features:

- 📥 Ingest YouTube transcripts via API
- ✂️ Robust text chunking with overlap
- 🧠 Semantic embeddings using Sentence Transformers
- 🗂️ Persistent FAISS vector storage
- 🏷️ Chunk-level metadata (video_id, timestamps)
- 🔍 Filtered retrieval per video or across all videos
- 🤖 LLM-powered answers (Groq / LLaMA)
- ⚙️ Lazy loading (safe startup with empty index)
- 🔎 Vector database inspection & debugging utilities
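The overlapping chunking idea can be sketched in plain Python. This is a minimal illustration of the technique, not the project's actual `splitter.py` (which may use a LangChain text splitter with different defaults):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks that overlap, so content spanning
    a chunk boundary is fully contained in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap  # advance less than a full chunk each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk already covers the end of the text
    return chunks
```

The overlap trades a little storage for retrieval quality: a sentence cut in half at a boundary still appears intact in the neighboring chunk.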
Ingestion pipeline:

```
YouTube Video ID
        ↓
YouTube Transcript API (text + start + duration)
        ↓
Transcript Segments
        ↓
Text Chunking
        ↓
LangChain Documents (page_content + metadata)
        ↓
Embedding Model (Sentence Transformers)
        ↓
FAISS Vector Store (vectors + metadata)
```
Query pipeline:

```
User Question (+ optional video_id)
        ↓
Query Embedding
        ↓
FAISS Similarity Search (global or filtered)
        ↓
Relevant Chunks
        ↓
RAG Chain (context + question)
        ↓
LLM
        ↓
Final Answer
```
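The "RAG Chain (context + question)" step can be illustrated with a minimal prompt-assembly function. This is a sketch only; the real `chains/rag_chain.py` and `prompts.py` may format the context and instructions differently:

```python
def build_rag_prompt(question: str, chunks: list[dict]) -> str:
    """Join retrieved chunks into a context block and wrap it, together
    with the user question, in an instruction-style prompt for the LLM.
    Each chunk is a dict with 'video_id', 'start', and 'text' keys,
    mirroring the stored chunk metadata."""
    context = "\n\n".join(
        f"[{c['video_id']} @ {c['start']:.1f}s] {c['text']}" for c in chunks
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Prefixing each chunk with its `video_id` and timestamp is what makes future source citations cheap: the attribution data is already in the prompt.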
Each stored chunk includes structured metadata:
```json
{
  "video_id": "abc123",
  "start": 120.5,
  "end": 134.8
}
```

This enables:
- Video-specific querying
- Source attribution (future-ready)
- Timestamp-based answers
- Clean deletion or re-indexing per video
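FAISS itself has no native metadata filter, so video-specific querying is typically done by filtering on the stored metadata around the similarity search. A dependency-free sketch of that idea (the project presumably uses LangChain's FAISS wrapper rather than anything like this directly):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search(query_vec, store, k=2, video_id=None):
    """store: list of (vector, metadata) pairs. Return the metadata of the
    top-k most similar chunks, optionally restricted to one video via the
    video_id metadata field."""
    candidates = [
        (vec, meta) for vec, meta in store
        if video_id is None or meta["video_id"] == video_id
    ]
    ranked = sorted(candidates, key=lambda p: cosine(query_vec, p[0]), reverse=True)
    return [meta for _, meta in ranked[:k]]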
Project structure:

```
youtube-rag/
├── app/
│   ├── ingestion/
│   │   ├── youtube_loader.py
│   │   ├── splitter.py
│   │   └── embed_store.py
│   ├── retrieval/
│   │   └── langchain_retriever.py
│   ├── chains/
│   │   ├── rag_chain.py
│   │   └── prompts.py
│   ├── schemas/
│   │   ├── ingest.py
│   │   └── query.py
│   └── main.py
├── vectorstore/
│   └── faiss_index/
│       ├── index.faiss
│       └── index.pkl
├── inspect_faiss.py
├── config.py
├── requirements.txt
└── README.md
```
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
export GROQ_API_KEY=your_groq_api_key
uvicorn app.main:app --reload
```

API documentation is available at:

http://127.0.0.1:8000/docs
```
POST /ingest
{
  "video_id": "aMARZGTbULc"
}
```

Ingestion steps:
- Fetch transcript
- Chunk text
- Generate embeddings
- Persist vectors with metadata in FAISS
Ask across all videos:

```
POST /ask
{
  "question": "How does HTTPS work?"
}
```

Ask within a single video by adding the optional `video_id`:

```
POST /ask
{
  "question": "Explain the TLS handshake",
  "video_id": "aMARZGTbULc"
}
```

Use the inspection utility:
```bash
python inspect_faiss.py
```

This allows you to:
- Verify stored chunks
- Inspect metadata
- Debug retrieval quality
- Understand what context the LLM receives
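The spirit of such an inspection tool can be shown with a small dump function over an in-memory docstore. This is a hypothetical stand-in; the actual output format of `inspect_faiss.py` is not shown here:

```python
def dump_index(docstore: dict, limit: int = 5) -> list[str]:
    """Render the first few stored chunks as one-line summaries:
    doc id, metadata, and a text preview. docstore maps doc id to a
    dict with 'metadata' and 'text' keys."""
    lines = []
    for i, (doc_id, doc) in enumerate(docstore.items()):
        if i >= limit:
            break
        lines.append(f"{doc_id}: {doc['metadata']} | {doc['text'][:60]}")
    return lines
```

Eyeballing a dump like this before debugging retrieval quality often catches the real problem early: wrong chunk sizes, missing metadata, or an empty index.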
Design highlights:

- Lazy initialization of vector store and RAG chain
- Stateless application startup
- Clear separation of concerns
- Metadata-first retrieval design
- Production-safe lifecycle handling
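The lazy-initialization pattern above can be sketched with a small handle class. A minimal sketch, assuming a `_load` step that stands in for loading the FAISS index from disk (the real app would call the vector store's load routine there):

```python
class VectorStoreHandle:
    """Defer loading the index until first use, so the app can start
    cleanly even when vectorstore/faiss_index/ does not exist yet."""

    def __init__(self, path: str):
        self.path = path
        self._store = None  # nothing loaded at startup

    @property
    def store(self):
        if self._store is None:
            self._store = self._load()  # expensive; runs at most once
        return self._store

    def _load(self):
        # Placeholder for the real index-loading call; returns a stub
        # so the pattern itself is testable without FAISS installed.
        return {"loaded_from": self.path}
```

Because nothing touches disk until the first query, startup is stateless: a fresh deployment with an empty `vectorstore/` directory boots without error.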
Current limitations:

- No source citations in responses
- No conversational memory
- No per-video deletion endpoint
- API-only (no frontend)
Roadmap:

- 📌 Source citations with timestamps
- 🧹 Delete or reindex individual videos
- 💬 Conversational RAG
- 📊 Video-level relevance ranking
- 🖥️ Frontend interface
A metadata-aware YouTube RAG system that ingests transcripts, stores semantically searchable chunks in FAISS, and answers questions using filtered retrieval and an LLM.