This project demonstrates a Bangla-language FAQ assistant built using Retrieval-Augmented Generation (RAG) inside a Google Colab notebook.
The system retrieves relevant Bangla FAQ content using semantic embeddings and FAISS, applies metadata-based filtering, and generates grounded responses using an LLM, which limits hallucinations and improves answer relevance for a low-resource language.
🎯 Primary Goal: Build a proof-of-concept RAG system that supports Bangla queries with topic-aware routing and safe fallback behavior.
- Most RAG examples focus on English
- Bangla is a low-resource language in AI
- Hallucination is risky in FAQ and helpdesk systems
- This project shows how to ground LLMs using retrieval + metadata

- Bangla customer support bots
- Educational assistants
- Government or NGO information systems
- Domain-specific knowledge assistants
✅ Bangla input & output support
🧠 LLM-based topic routing
🗂️ Metadata filtering before vector search
🔍 FAISS-powered semantic similarity search
🧱 Retrieval-Augmented Generation (RAG)
🚫 Safe fallback when no relevant context is found
💻 Fully implemented inside a single Colab notebook
User Question (Bangla)
|
v
[ LLM Category Router ]
|
v
[ Metadata-Based Filtering ]
|
v
[ FAISS Similarity Search ]
|
v
[ RAG Answer Generation ]
Design Goal: Ensure the LLM never answers beyond retrieved Bangla knowledge.
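The flow above can be sketched in plain Python. This is only an illustration under stated assumptions, not the notebook's code: the keyword-based `route` stands in for the LLM category router, the token-overlap `similarity` stands in for Bengali SBERT embeddings with FAISS, the final step echoes the retrieved text instead of calling an LLM, and the third FAQ entry and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Fallback sentence returned when nothing relevant is retrieved.
FALLBACK = "দুঃখিত, এই বিষয়ে আমার উত্তরটি জানা নেই।"


@dataclass
class Doc:
    text: str
    category: str  # metadata used for filtering


DOCS = [
    Doc("পরিবারের সাথে ভ্রমণ আনন্দ বাড়ায়।", "ভ্রমণ"),
    Doc("ভ্রমণ মানুষকে নতুন অভিজ্ঞতা দেয়।", "ভ্রমণ"),
    Doc("নিয়মিত ব্যায়াম শরীর সুস্থ রাখে।", "স্বাস্থ্য"),  # hypothetical entry
]


def route(question: str, categories: List[str]) -> Optional[str]:
    """Stand-in for the LLM router: first category whose name occurs in the question."""
    for category in categories:
        if category in question:
            return category
    return None


def similarity(a: str, b: str) -> float:
    """Toy Jaccard overlap on whitespace tokens (stand-in for embedding similarity)."""
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def retrieve(question: str, docs: List[Doc], category: str,
             k: int = 2, min_score: float = 0.1) -> List[Doc]:
    """Filter by metadata first, then rank the remaining documents by similarity."""
    pool = [d for d in docs if d.category == category]
    scored = sorted(((similarity(question, d.text), d) for d in pool),
                    key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:k] if score >= min_score]


def answer(question: str, docs: List[Doc]) -> str:
    """Route, filter, search, then answer only from what was retrieved."""
    category = route(question, sorted({d.category for d in docs}))
    hits = retrieve(question, docs, category) if category else []
    if not hits:
        return FALLBACK
    # A real pipeline would pass `hits` to the LLM as context; here they are echoed.
    return " ".join(d.text for d in hits)
```

Calling `answer("পরিবারের সাথে ভ্রমণের সুবিধা কী?", DOCS)` searches only the travel-tagged entries, while a question that matches no category returns the fallback sentence without touching the search step.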
| Component | Technology |
|---|---|
| Language | Python |
| Notebook | Google Colab |
| Embeddings | Bengali SBERT (l3cube-pune) |
| Vector Store | FAISS |
| RAG Framework | LangChain |
| LLM | OpenAI GPT-4.1-Nano (GitHub Inference API) |
- Open the notebook: `notebooks/bangla_faq_rag.ipynb`
- Set the API token in Colab (after `import os`): `os.environ["GITHUB_TOKEN"] = "your_api_key"`
- Run the cells sequentially:
  - Install dependencies
  - Load and embed Bangla FAQ data
  - Build FAISS vector store
  - Ask questions interactively
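If the token is missing or still set to the placeholder, the LLM cells will only fail later with an authentication error. A small guard like the following can surface the problem earlier. It is a sketch, not part of the notebook; `require_token` is a hypothetical helper name.

```python
import os


def require_token(env=None, name="GITHUB_TOKEN"):
    """Return the token stored under `name`, or raise a clear error if it is unset."""
    env = os.environ if env is None else env
    token = env.get(name, "").strip()
    # Treat the README placeholder as "not set".
    if not token or token == "your_api_key":
        raise RuntimeError(f"{name} is not set; assign it before running the LLM cells.")
    return token
```

Calling `require_token()` right after the token cell makes a missing key fail immediately with a readable message.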
👩🏻‍💻 আপনার প্রশ্নটি বলুন: পরিবারের সাথে ভ্রমণের সুবিধা কী? (Enter your question: What are the benefits of traveling with family?)
[LLM Router] Category: ভ্রমণ (Travel)
[User Question] পরিবারের সাথে ভ্রমণের সুবিধা কী?
[Metadata Filtering] Category: ভ্রমণ
🕵🏻 [Similarity Search Results for Query] পরিবারের সাথে ভ্রমণের সুবিধা কী?
>> পরিবারের সাথে ভ্রমণ আনন্দ বাড়ায়। (Traveling with family increases joy.)
>> ভ্রমণ মানুষকে নতুন অভিজ্ঞতা দেয়। (Travel gives people new experiences.)
>> ভ্রমণ জীবনের একঘেয়েমি দূর করে। (Travel relieves the monotony of life.)
💡প্রত্যাশিত উত্তর: পরিবারের সাথে ভ্রমণের সুবিধা হলো এটি আনন্দ বাড়ায়, নতুন অভিজ্ঞতা দেয় এবং জীবনের একঘেয়েমি দূর করে। (Expected answer: The benefit of traveling with family is that it increases joy, gives new experiences, and relieves the monotony of life.)
If no relevant context exists:
দুঃখিত, এই বিষয়ে আমার উত্তরটি জানা নেই। (Sorry, I don't know the answer on this topic.)
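One way to get this behavior is to skip the LLM call entirely when retrieval returns nothing, and otherwise to instruct the model to answer only from the retrieved context. The sketch below assumes that approach; the prompt wording and the `build_prompt` helper are illustrative, not taken from the notebook.

```python
# Fallback sentence from the sample above.
FALLBACK = "দুঃখিত, এই বিষয়ে আমার উত্তরটি জানা নেই।"


def build_prompt(question, contexts):
    """Return a grounded prompt, or None when there is no context to ground on."""
    if not contexts:
        return None  # caller should reply with FALLBACK without calling the LLM
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer in Bangla using ONLY the context below.\n"
        f"If the context does not contain the answer, reply exactly: {FALLBACK}\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Returning `None` for an empty context keeps the fallback decision in ordinary code, so it does not depend on the model following instructions.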
https://drive.google.com/file/d/1esH30Cl80EC0Y_Rs6qH8HvHFda2DvjhN/view
- Grounded answers
- Topic-aware retrieval
- No hallucinated responses

- Implemented Retrieval-Augmented Generation from scratch
- Used metadata-aware vector filtering
- Applied LLM-based reasoning for intent classification
- Worked with low-resource Bangla embeddings
- Designed a safe, explainable AI assistant

- Convert notebook into FastAPI service
- Add automatic evaluation metrics (Recall@K)
- Persist FAISS index to disk
- Add document ingestion from PDFs / web
- Deploy as a lightweight web app
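For the Recall@K item in the list above, the metric itself is small enough to sketch. The code assumes each test question comes with a hand-labeled set of relevant FAQ ids, which the project does not yet provide; the function names are hypothetical.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant ids that appear among the top-k retrieved ids."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs, k):
    """Average Recall@K over (retrieved, relevant) pairs, one pair per test question."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `recall_at_k(["a", "b", "c"], {"a", "c"}, 2)` counts one of the two relevant ids in the top two results and returns `0.5`.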