An intelligent Telegram bot powered by a Retrieval-Augmented Generation (RAG) pipeline, built from scratch to answer questions based on a custom knowledge base. It handles both text and voice messages, maintains conversation history, and cites its sources.
This project serves as a clear, practical demonstration of how to build a modern LLM assistant without high-level frameworks like LangChain, offering a deep dive into the mechanics of a RAG pipeline.
➡️ For a detailed component breakdown and logic, see ARCHITECTURE.md.
- Data-Grounded Responses (RAG): The bot uses documents from a knowledge base as its primary source of truth, which grounds answers and reduces hallucinations.
- Voice Support: Integrated speech recognition via OpenAI allows users to ask questions using voice messages.
- Conversation Memory: Remembers recent messages to maintain a coherent dialogue.
- Source Citations: Cites the source document for answers drawn from the knowledge base.
- Flexible Configuration: Key parameters (GPT model, relevance threshold, chunk size) are managed in a central config file.
- Usage Limiting: A built-in system to control the number of requests per user.
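The central config file mentioned above might look something like the sketch below. All names and values here are illustrative assumptions, not the project's actual identifiers:

```python
# config.py — illustrative sketch of a central configuration module.
# Every name and value below is an assumption, not the project's real settings.

GPT_MODEL = "gpt-4o-mini"                   # OpenAI chat model used for generation
EMBEDDING_MODEL = "text-embedding-3-small"  # model used to vectorize chunks
RELEVANCE_THRESHOLD = 0.75                  # minimum similarity score for a chunk to be used
CHUNK_SIZE = 500                            # characters per chunk when indexing
CHUNK_OVERLAP = 50                          # character overlap between adjacent chunks
HISTORY_LENGTH = 6                          # recent messages kept as conversation memory
MAX_REQUESTS_PER_USER = 50                  # usage limit per Telegram user
```

Keeping these values in one module means tuning retrieval quality (threshold, chunk size) never requires touching the bot or pipeline code.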
- Indexing: Local `.txt` files from the `data/vectordb` directory are loaded, split into smaller chunks, and vectorized using OpenAI's embedding models. These vectors are stored in a local ChromaDB instance.
- Retrieval: When a user asks a question, it is also vectorized. The system then searches the database for the most semantically similar text chunks.
- Generation: The retrieved chunks, conversation history, and the user's original question are combined into a comprehensive prompt, which is sent to the GPT model to generate a final, context-aware answer.
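The chunking and prompt-assembly steps above can be sketched in plain Python. The function names and parameters are illustrative assumptions, not the project's actual API:

```python
# Sketch of two pipeline steps: splitting documents into overlapping chunks
# (Indexing) and combining retrieved context with history and the question
# (Generation). Names and defaults are assumptions, not the project's code.

def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character chunks for indexing."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

def build_prompt(question: str, retrieved: list[str], history: list[str]) -> str:
    """Combine retrieved chunks, conversation history, and the user's question."""
    context = "\n---\n".join(retrieved)
    dialogue = "\n".join(history)
    return (
        "Answer using only the context below. Cite the source document if used.\n\n"
        f"Context:\n{context}\n\n"
        f"Conversation so far:\n{dialogue}\n\n"
        f"Question: {question}"
    )
```

The overlap between adjacent chunks helps ensure that a sentence straddling a chunk boundary still appears intact in at least one chunk, which improves retrieval recall.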
- Python 3.10+
- An OpenAI API Key
- A Telegram Bot Token
1. Clone the repository:
```bash
git clone https://github.com/your-username/rag-telegram-assistant.git
cd rag-telegram-assistant
```
2. Create and activate a virtual environment:
```bash
python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Configure environment variables:
Create a .env file in the project root by copying .env.example or creating it from scratch. Fill in your API keys:
```
TELEGRAM_BOT_TOKEN="Your token from @BotFather"
OPENAI_API_KEY="Your key from OpenAI"
```
5. Prepare your knowledge base:
Place your custom .txt files into the data/vectordb directory.
6. Create the vector index: Run this script once to index your documents. Re-run it whenever you update the knowledge base.
```bash
python src/run_indexer.py
```
7. Run the bot:
```bash
python src/main.py
```
Your assistant is now live and ready to chat in Telegram!