
Designed a Bangla-language FAQ assistant using Retrieval-Augmented Generation (RAG) with metadata-aware filtering to reduce hallucination and improve answer relevance for low-resource languages.


Methila-Meem/AI_Bangla_FAQ_Bot_RAG


🇧🇩 Bangla FAQ Bot using RAG (Retrieval-Augmented Generation)


🚀 Project Overview

This project demonstrates a Bangla-language FAQ assistant built using Retrieval-Augmented Generation (RAG) inside a Google Colab notebook.

The system routes each question to a topic, filters the Bangla FAQ entries by that topic's metadata, retrieves the closest entries using semantic embeddings and FAISS, and generates a response with an LLM that is grounded in the retrieved text. Grounding reduces hallucination and improves answer relevance for a low-resource language.

🎯 Primary Goal: Build a proof-of-concept RAG system that supports Bangla queries with topic-aware routing and safe fallback behavior.


🌍 Why This Project Matters

  • Most RAG examples focus on English

  • Bangla is a low-resource language in AI

  • Hallucination is risky in FAQ and helpdesk systems

  • This project shows how to ground LLMs using retrieval + metadata

Applicable Use Cases

  • Bangla customer support bots

  • Educational assistants

  • Government or NGO information systems

  • Domain-specific knowledge assistants


🧩 Key Features

✅ Bangla input & output support

🧠 LLM-based topic routing

🗂️ Metadata filtering before vector search

🔎 FAISS-powered semantic similarity search

🧱 Retrieval-Augmented Generation (RAG)

🚫 Safe fallback when no relevant context is found

💻 Fully implemented inside a single Colab notebook
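
The "metadata filtering before vector search" feature can be illustrated with a small sketch. This is not the notebook's code: it is a pure-Python stand-in for FAISS that uses made-up 3-dimensional vectors, English placeholder texts, and a hypothetical `filtered_search` helper. The point is only the order of operations: discard entries from other categories first, then rank what is left by similarity.

```python
import math

# Hypothetical toy corpus: each entry has text, a category tag, and a
# made-up 3-dimensional embedding (a real system would use SBERT vectors).
DOCS = [
    {"text": "travel doc A", "category": "travel", "vec": [0.9, 0.1, 0.0]},
    {"text": "travel doc B", "category": "travel", "vec": [0.7, 0.3, 0.1]},
    {"text": "health doc A", "category": "health", "vec": [0.9, 0.1, 0.1]},
]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(query_vec, category, docs=DOCS, k=2):
    """Keep only docs in `category`, then rank the survivors by similarity."""
    candidates = [d for d in docs if d["category"] == category]
    ranked = sorted(
        candidates, key=lambda d: cosine(query_vec, d["vec"]), reverse=True
    )
    return [d["text"] for d in ranked[:k]]
```

In this toy data the health entry is nearly as close to the query `[1.0, 0.0, 0.0]` as the best travel entry, yet it never appears in a travel search because the filter removes it before ranking.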


🏗️ System Architecture

User Question (Bangla)
        |
        v
[ LLM Category Router ]
        |
        v
[ Metadata-Based Filtering ]
        |
        v
[ FAISS Similarity Search ]
        |
        v
[ RAG Answer Generation ]

Design Goal: Ensure the LLM never answers beyond retrieved Bangla knowledge.
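
The first stage of the diagram, the LLM category router, only helps if its output is one of the categories stored in the metadata. The sketch below shows one way to enforce that, under assumptions: `route_category` and `ask_llm` are hypothetical names, the prompt wording is invented for illustration, and the LLM call is passed in as a plain function so the example runs without an API key.

```python
from typing import Callable, Optional, Sequence


def route_category(
    question: str,
    categories: Sequence[str],
    ask_llm: Callable[[str], str],
) -> Optional[str]:
    """Ask the model for one category; accept it only if it is a known one."""
    prompt = (
        "Classify the question into exactly one of these categories: "
        + ", ".join(categories)
        + ".\nReply with the category name only.\nQuestion: "
        + question
    )
    # Models often add whitespace, quotes, or a trailing period; strip them.
    reply = ask_llm(prompt).strip().strip(".:\"'").strip()
    for category in categories:
        if reply.casefold() == category.casefold():
            return category  # return the canonical spelling from the list
    return None  # unknown label: the caller can skip filtering or fall back
```

Returning `None` for an unrecognized label keeps a malformed router reply from silently filtering out every document.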


🛠️ Tech Stack

| Component | Technology |
| --- | --- |
| Language | Python |
| Notebook | Google Colab |
| Embeddings | Bengali SBERT (l3cube-pune) |
| Vector Store | FAISS |
| RAG Framework | LangChain |
| LLM | OpenAI GPT-4.1-Nano (GitHub Inference API) |

⚙️ How to Run (Google Colab)

  1. Open the notebook:
     notebooks/bangla_faq_rag.ipynb
  2. Set the API token in Colab:
     import os
     os.environ["GITHUB_TOKEN"] = "your_api_key"
  3. Run the cells sequentially:
  • Install dependencies

  • Load and embed Bangla FAQ data

  • Build FAISS vector store

  • Ask questions interactively
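
Pasting a real token into a notebook cell makes it easy to leak when the notebook is shared. As an alternative to step 2, the following sketch reads `GITHUB_TOKEN` from the environment and prompts for it only when it is missing. The helper name `ensure_token` is hypothetical and not part of the notebook.

```python
import os
from getpass import getpass


def ensure_token(name: str = "GITHUB_TOKEN") -> str:
    """Return the token from the environment, prompting once if it is missing."""
    token = os.environ.get(name, "").strip()
    if not token:
        # getpass hides the input so the token is not echoed into cell output.
        token = getpass(f"Enter {name}: ").strip()
        if not token:
            raise RuntimeError(f"{name} is required to call the LLM")
        os.environ[name] = token
    return token
```

Calling `ensure_token()` once near the top of the notebook leaves `os.environ["GITHUB_TOKEN"]` set for the later cells.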


▶️ Example Interaction

👩🏻‍💻 আপনার প্রশ্নটি বলুন: পরিবারের সাথে ভ্রমণের সুবিধা কী?
[LLM Router] Category: ভ্রমণ

[User Question] পরিবারের সাথে ভ্রমণের সুবিধা কী?

[Metadata Filtering] Category: ভ্রমণ

🕵🏻 [Similarity Search Results for Query] পরিবারের সাথে ভ্রমণের সুবিধা কী?
 >> পরিবারের সাথে ভ্রমণ আনন্দ বাড়ায়।
 >> ভ্রমণ মানুষকে নতুন অভিজ্ঞতা দেয়।
 >> ভ্রমণ জীবনের একঘেয়েমি দূর করে।

💡প্রত্যাশিত উত্তর: পরিবারের সাথে ভ্রমণের সুবিধা হলো এটি আনন্দ বাড়ায়, নতুন অভিজ্ঞতা দেয় এবং জীবনের একঘেয়েমি দূর করে।

English translation of the session above:

  • Prompt and question: "Please state your question: What are the benefits of traveling with family?"

  • Router and filter category: ভ্রমণ ("travel")

  • Retrieved entries: "Traveling with family increases joy." / "Travel gives people new experiences." / "Travel relieves the monotony of life."

  • Expected answer: "The benefits of traveling with family are that it increases joy, gives new experiences, and relieves the monotony of life."

If no relevant context exists:

দুঃখিত, এই বিষয়ে আমার উত্তরটি জানা নেই।

(English: "Sorry, I do not know the answer on this topic.")
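
The fallback behavior can be sketched as a guard in front of the generation step. This is an illustration under assumptions, not the notebook's implementation: "no relevant context" is taken to mean an empty retrieval result, the prompt wording is invented, and `build_prompt`, `answer`, and `generate` are hypothetical names. The fallback string is the one quoted in this README.

```python
from typing import Callable, List

# Fallback reply quoted in this README
# ("Sorry, I do not know the answer on this topic.").
FALLBACK = "দুঃখিত, এই বিষয়ে আমার উত্তরটি জানা নেই।"


def build_prompt(question: str, contexts: List[str]) -> str:
    """Hypothetical prompt wording that restricts the model to the context."""
    numbered = "\n".join(f"{i}. {c}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer in Bangla using only the context below. "
        f"If the context does not contain the answer, reply exactly: {FALLBACK}\n\n"
        f"Context:\n{numbered}\n\nQuestion: {question}"
    )


def answer(question: str, contexts: List[str], generate: Callable[[str], str]) -> str:
    """Skip the LLM call entirely when retrieval returned nothing."""
    if not contexts:
        return FALLBACK
    return generate(build_prompt(question, contexts))
```

Returning the fixed message without calling the model means an empty retrieval can never produce a free-form, ungrounded answer.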

📊 Demo & Output

🎥 Demo Video:

👉 https://drive.google.com/file/d/1esH30Cl80EC0Y_Rs6qH8HvHFda2DvjhN/view

📌 Output Characteristics:

  • Grounded answers

  • Topic-aware retrieval

  • A fixed fallback reply instead of a guessed answer when no relevant context is found


🧠 Learning & Engineering Highlights

  • Implemented Retrieval-Augmented Generation from scratch

  • Used metadata-aware vector filtering

  • Applied LLM-based reasoning for intent classification

  • Worked with low-resource Bangla embeddings

  • Designed a safe, explainable AI assistant


🔮 Future Improvements

  • Convert notebook into FastAPI service

  • Add automatic evaluation metrics (Recall@K)

  • Persist FAISS index to disk

  • Add document ingestion from PDFs / web

  • Deploy as a lightweight web app
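
For the planned Recall@K metric, a minimal sketch is shown below. Recall@K is the fraction of the relevant entries for a query that appear among the top K retrieved results. The function name `recall_at_k` and the choice to return 0.0 when a query has no labeled relevant entries are assumptions made for this example.

```python
from typing import Iterable, Sequence


def recall_at_k(retrieved: Sequence[str], relevant: Iterable[str], k: int) -> float:
    """Share of `relevant` items found in the first `k` entries of `retrieved`."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0  # assumed convention for queries without labeled answers
    hits = set(retrieved[:k]) & relevant_set
    return len(hits) / len(relevant_set)
```

Averaging this value over a labeled set of Bangla questions would give a single retrieval score that can be compared with and without metadata filtering.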

