This project implements a multi-course RAG (Retrieval-Augmented Generation) assistant with a Telegram interface.
The system ingests course materials, builds a lightweight RAPTOR-style index, retrieves relevant knowledge, and answers questions strictly based on course context.
The system supports:
- multiple courses (`os-2023`, `ir-2024`, etc.)
- PDF ingestion with token-based chunking
- RAPTOR-lite index: Level-0 chunks + Level-1 summaries
- embeddings for both levels
- structured retrieval based on Level-1 similarity
- context construction with token-budget enforcement
- English answers generated by an LLM
- Telegram bot interface
- fully containerized deployment (Docker + uv)
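The token-based chunking mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual `tokenizer.py`: a whitespace split stands in for the real model tokenizer, and the `max_tokens`/`overlap` defaults are assumptions.

```python
def chunk_by_tokens(text: str, max_tokens: int = 400, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens tokens.

    A whitespace split stands in for a real model tokenizer here; the
    project presumably counts tokens with the embedding model's tokenizer.
    """
    tokens = text.split()
    if not tokens:
        return []
    chunks = []
    step = max_tokens - overlap  # consecutive chunks share `overlap` tokens
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
    return chunks
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.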
```
project/
  data/
    <course_id>/
      raw/            # original PDFs and materials
      index/          # chunks, summaries, embeddings
  src/
    ingest.py         # PDF ingestion and RAPTOR-lite index builder
    tokenizer.py      # model-based token counter and chunk splitter
    raptor_index.py   # index structures and disk I/O
    rag_pipeline.py   # retrieval + context building + LLM answering
    bot.py            # Telegram bot entry point
    router.py         # aiogram routing (commands, states)
    bot_state.py      # FSM definitions
    config.py         # .env configuration
  Dockerfile
  docker-compose.yml
  pyproject.toml
  README.md
```
Requirements:
- Python 3.11+
- uv (dependency manager)
- Docker (optional, recommended for deployment)
- a Telegram Bot API token
- an OpenAI-compatible API key (for embeddings and the LLM)
Install dependencies:

```
uv sync
```

Build the index for a course:

```
uv run python -m ingest <course_id>
```

Example:

```
uv run python -m ingest os-2023
```

Start the bot:

```
uv run python -m bot
```
Create a .env file in the project root:

```
OPENAI_API_KEY=your_key
OPENAI_MODEL=gpt-4o-mini
EMBEDDING_MODEL=text-embedding-3-large
TELEGRAM_BOT_TOKEN=your_telegram_token
```
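A minimal sketch of what `config.py` might look like, assuming it reads these variables from the environment (a loader such as python-dotenv would populate them from `.env` first). The variable names mirror the `.env` example; the defaults and the `Settings` class are illustrative assumptions.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    openai_model: str
    embedding_model: str
    telegram_bot_token: str

def load_settings() -> Settings:
    """Build Settings from environment variables, failing fast on missing keys."""
    def require(name: str) -> str:
        value = os.environ.get(name)
        if not value:
            raise RuntimeError(f"Missing required environment variable: {name}")
        return value

    return Settings(
        openai_api_key=require("OPENAI_API_KEY"),
        openai_model=os.environ.get("OPENAI_MODEL", "gpt-4o-mini"),
        embedding_model=os.environ.get("EMBEDDING_MODEL", "text-embedding-3-large"),
        telegram_bot_token=require("TELEGRAM_BOT_TOKEN"),
    )
```

Failing fast on missing secrets surfaces configuration mistakes at startup rather than on the first API call.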
Build and run with Docker:

```
docker compose build
docker compose up -d
```
Course data lives under data/:

```
data/
  os-2023/
    raw/      # upload PDFs here
    index/    # generated automatically by ingest
```

Run ingestion inside the container:

```
docker compose run --rm course-navigator-rag \
  uv run python -m course_navigator_rag.ingest os-2023
```
- Level-1 summaries represent clusters of bottom-level chunks.
- A user question is embedded via `text-embedding-3-large`.
- Level-1 summaries are ranked by cosine similarity.
- The corresponding Level-0 chunks are collected under a token-budget constraint.
- The final context is sent to the LLM (`gpt-4o-mini`).
- The model answers strictly based on the retrieved context.
This keeps responses grounded in the course material and minimizes hallucination.
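The retrieval steps above can be sketched as follows. The data shapes, the token counter, and the budget default are illustrative assumptions rather than the project's actual API.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def build_context(question_emb, summaries, chunks_by_summary, count_tokens, budget=3000):
    """Rank Level-1 summaries by similarity, then collect their Level-0 chunks.

    summaries: list of (summary_id, embedding) pairs.
    chunks_by_summary: summary_id -> list of chunk texts.
    count_tokens: callable estimating a chunk's token cost.
    """
    ranked = sorted(summaries, key=lambda s: cosine(question_emb, s[1]), reverse=True)
    context, used = [], 0
    for summary_id, _ in ranked:
        for chunk in chunks_by_summary[summary_id]:
            cost = count_tokens(chunk)
            if used + cost > budget:  # token-budget enforcement
                return "\n\n".join(context)
            context.append(chunk)
            used += cost
    return "\n\n".join(context)
```

Ranking at Level 1 but assembling the prompt from Level-0 chunks is what makes the index "RAPTOR-lite": the summaries act as a cheap routing layer over the raw text.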
- `/start`: choose a course
- After choosing, every message is interpreted as a question
- The bot retrieves context and replies in English
- The interface remains in Russian for a comfortable UX
To add a new course:
- Create the folders `data/<new_course>/raw` and `data/<new_course>/index`
- Upload PDFs into `raw/`
- Run ingestion: `uv run python -m course_navigator_rag.ingest <new_course>`
- Add the course to `AVAILABLE_COURSES` in `router.py`
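A hypothetical sketch of that registry — the actual structure in `router.py` may differ. Course ids are taken from this README; the human-readable titles are invented for illustration.

```python
# Course ids map to display titles shown in the course-selection keyboard.
AVAILABLE_COURSES = {
    "os-2023": "Operating Systems (2023)",  # title is a placeholder
    "ir-2024": "Information Retrieval (2024)",  # title is a placeholder
}

def resolve_course(course_id: str) -> str:
    """Validate a user-selected course id before switching the FSM state."""
    if course_id not in AVAILABLE_COURSES:
        raise KeyError(f"Unknown course: {course_id}")
    return AVAILABLE_COURSES[course_id]
```

Keeping the registry as a plain dict means adding a course is a one-line change once its index has been built.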