
RAG Telegram Assistant


An intelligent Telegram bot powered by a Retrieval-Augmented Generation (RAG) pipeline, built from scratch to answer questions based on a custom knowledge base. It handles both text and voice messages, maintains conversation history, and cites its sources.

This project serves as a clear, practical demonstration of how to build a modern LLM assistant without high-level frameworks like LangChain, offering a deep dive into the mechanics of a RAG pipeline.

➡️ For a detailed component breakdown and logic, see ARCHITECTURE.md.

🚀 Key Features

  • Data-Grounded Responses (RAG): The bot treats documents from a knowledge base as its primary source of truth, grounding answers and reducing hallucination.
  • Voice Support: Integrated speech recognition via OpenAI allows users to ask questions using voice messages.
  • Conversation Memory: Remembers recent messages to maintain a coherent dialogue.
  • Source Citations: Cites the source document for answers drawn from the knowledge base.
  • Flexible Configuration: Key parameters (GPT model, relevance threshold, chunk size) are managed in a central config file.
  • Usage Limiting: A built-in system to control the number of requests per user.
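The usage-limiting idea above can be sketched as a simple per-user counter. This is an illustrative sketch, not the project's actual implementation; the class name `UsageLimiter` and the in-memory counting strategy are assumptions (a real bot might persist counts or reset them daily).

```python
from collections import defaultdict

class UsageLimiter:
    """Per-user request counter (illustrative; the bot's actual limiter may differ)."""

    def __init__(self, max_requests: int = 20):
        self.max_requests = max_requests
        self.counts: dict[int, int] = defaultdict(int)

    def allow(self, user_id: int) -> bool:
        """Record one request; return False once the user's quota is spent."""
        if self.counts[user_id] >= self.max_requests:
            return False
        self.counts[user_id] += 1
        return True
```

A handler would call `allow(message.from_user.id)` before invoking the RAG pipeline and reply with a polite refusal when it returns False.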

🛠️ How It Works

  1. Indexing: Local .txt files from the data/vectordb directory are loaded, split into smaller chunks, and vectorized using OpenAI's models. These vectors are stored in a local ChromaDB instance.
  2. Retrieval: When a user asks a question, it's also vectorized. The system then searches the database for the most semantically similar text chunks.
  3. Generation: The retrieved chunks, conversation history, and the user's original question are combined into a comprehensive prompt, which is sent to the GPT model to generate a final, context-aware answer.
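The indexing and retrieval steps above can be sketched in plain Python. This toy version stands in a hand-rolled cosine similarity for ChromaDB and assumes vectors are produced elsewhere (by OpenAI's embedding API in the real pipeline); the function names `split_into_chunks` and `retrieve` are illustrative, not the project's actual code.

```python
import math

def split_into_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Step 1 (simplified): split a document into overlapping character chunks."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec: list[float], index: list[tuple[str, list[float]]],
             top_k: int = 2, threshold: float = 0.0) -> list[str]:
    """Step 2 (simplified): return the top_k chunks most similar to the query.

    `index` is a list of (chunk_text, embedding_vector) pairs; chunks scoring
    below `threshold` are dropped, mirroring the relevance threshold in config.
    """
    scored = [(cosine_similarity(query_vec, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in scored[:top_k] if score >= threshold]
```

Step 3 would then concatenate the retrieved chunks, the conversation history, and the user's question into one prompt for the GPT model.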

⚙️ Getting Started

Prerequisites

  • Python 3
  • A Telegram bot token from @BotFather
  • An OpenAI API key

Installation & Setup

1. Clone the repository:

git clone https://github.com/your-username/rag-telegram-assistant.git
cd rag-telegram-assistant

2. Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate

3. Install dependencies:

pip install -r requirements.txt

4. Configure environment variables: Create a .env file in the project root (copy .env.example if present, or start from scratch) and fill in your API keys:

TELEGRAM_BOT_TOKEN="Your token from @BotFather"
OPENAI_API_KEY="Your key from OpenAI"
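For illustration, here is one way the bot might read these variables at startup, failing fast if a key is missing. This is a sketch using only the standard library; the function name `load_settings` is hypothetical, and the actual project may load the .env file via python-dotenv instead.

```python
import os

def load_settings() -> dict[str, str]:
    """Read required keys from the environment; raise early if any is missing.

    (Illustrative sketch; the real bot may use python-dotenv to load .env first.)
    """
    required = ("TELEGRAM_BOT_TOKEN", "OPENAI_API_KEY")
    missing = [key for key in required if not os.environ.get(key)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: os.environ[key] for key in required}
```

Failing at startup with a clear message is friendlier than a cryptic authentication error on the first user request.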

5. Prepare your knowledge base: Place your custom .txt files into the data/vectordb directory.

6. Create the vector index: Run this script once to index your documents. Re-run it whenever you update the knowledge base.

python src/run_indexer.py

7. Run the bot:

python src/main.py

Your assistant is now live and ready to chat in Telegram!