ubc/tlef-biocbot

BiocBot - AI-Powered Study Assistant

BiocBot is an AI-powered study assistant that lets students interact with course material through a chat interface. Instructors upload documents (PDF, DOCX, or TXT), which are automatically parsed, chunked, and embedded into a vector database (Qdrant) for semantic search. When a student asks a question, the system retrieves the most relevant chunks and generates a response grounded in course content.
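
To make the chunking step concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not BiocBot's actual implementation (chunking is delegated to the UBC GenAI Toolkit modules); the function name chunkText and the chunkSize and overlap defaults are assumptions chosen for illustration.

```javascript
// Hypothetical helper: split text into overlapping character windows.
// chunkSize and overlap are illustrative defaults, not BiocBot's settings.
function chunkText(text, chunkSize = 500, overlap = 50) {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  if (overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be in the range [0, chunkSize)");
  }
  const chunks = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once the current window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

Overlapping windows mean a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval.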

🚀 Features

  • Document Management: Upload and organize course materials
  • Vector Search: Semantic search across documents using Qdrant
  • AI Chat Interface: Student interaction with course content
  • Assessment Questions: Create and manage course assessments
  • Course Structure: Organize content by units/lectures
  • User Management: Separate interfaces for instructors and students

🏗️ Architecture

BiocBot follows a split architecture with a public frontend and a private backend, adhering to clear separation of concerns for maintainability and security.

Tech Stack

  • Frontend: HTML + Vanilla JS (no frameworks), styled via separate CSS files
  • Backend: Node.js (Express), built with modular architecture
  • Database: MongoDB (for documents, user sessions, analytics)
  • Vector Database: Qdrant for semantic search and similarity retrieval
  • Embeddings: Ollama with nomic-embed-text model
  • Document Processing: UBC GenAI Toolkit modules

🛠️ Setup & Installation

Prerequisites

  • Node.js v18.x or higher
  • MongoDB instance
  • Qdrant vector database (Docker recommended)
  • Ollama with nomic-embed-text model

1. Clone and Install

git clone <repository-url>
cd tlef-biocbot
npm install

2. Environment Configuration

Create a .env file in the root directory with the following variables:

# MongoDB Connection
MONGO_URI=mongodb://localhost:27017/biocbot

# Server Port
TLEF_BIOCBOT_PORT=8080

# Qdrant Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=super-secret-dev-key

# Embeddings Provider Configuration
EMBEDDING_PROVIDER=ubc-genai-toolkit-llm

# LLM Provider Settings (for Embeddings)
LLM_PROVIDER=ollama
LLM_API_KEY=nokey
LLM_ENDPOINT=http://localhost:11434
LLM_EMBEDDING_MODEL=nomic-embed-text
LLM_DEFAULT_MODEL=llama3.1
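
A server can fail fast on a misconfigured .env by validating these variables at startup. The sketch below is one possible way to do that, not the code BiocBot ships: the loadConfig name, the choice of which variables are required, and the 8080 default are assumptions. The variable names themselves come from the example above.

```javascript
// Hypothetical config loader. Variable names match the .env example;
// the required list and the default port are assumptions.
function loadConfig(env = process.env) {
  const required = ["MONGO_URI", "QDRANT_URL", "LLM_ENDPOINT", "LLM_EMBEDDING_MODEL"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const port = Number.parseInt(env.TLEF_BIOCBOT_PORT ?? "8080", 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid TLEF_BIOCBOT_PORT: ${env.TLEF_BIOCBOT_PORT}`);
  }

  return {
    port,
    mongoUri: env.MONGO_URI,
    qdrant: { url: env.QDRANT_URL, apiKey: env.QDRANT_API_KEY },
    llm: {
      provider: env.LLM_PROVIDER,
      endpoint: env.LLM_ENDPOINT,
      embeddingModel: env.LLM_EMBEDDING_MODEL,
      defaultModel: env.LLM_DEFAULT_MODEL,
    },
  };
}
```

Passing env as a parameter keeps the function testable without touching the real process environment.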

3. Start Services

Start Qdrant (Docker)

docker run -p 6333:6333 qdrant/qdrant

Start Ollama with nomic-embed-text

ollama pull nomic-embed-text
ollama serve
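
Once Ollama is running, a quick smoke test can confirm that it returns embeddings. The sketch below calls Ollama's REST API directly (POST /api/embed), which is not necessarily how BiocBot talks to Ollama, since the application goes through the UBC GenAI Toolkit. The helper names are hypothetical, and the endpoint URL is assumed to match LLM_ENDPOINT.

```javascript
// Hypothetical smoke test against Ollama's REST API (not BiocBot code).
function buildEmbedRequest(endpoint, model, text) {
  return {
    url: new URL("/api/embed", endpoint).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, input: text }),
    },
  };
}

// Requires Node.js 18+ for the global fetch and a running Ollama server.
async function embedText(endpoint, model, text) {
  const { url, options } = buildEmbedRequest(endpoint, model, text);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`Ollama returned HTTP ${response.status}`);
  }
  const data = await response.json();
  return data.embeddings[0]; // one vector per input string
}
```

For example, embedText("http://localhost:11434", "nomic-embed-text", "What is ATP?") should resolve to an array of numbers if the model has been pulled.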

Start BiocBot

npm run dev

🔍 Qdrant Integration

BiocBot uses Qdrant to store document embeddings and run vector search over them:

Features

  • Automatic Document Processing: Documents are automatically chunked, embedded, and stored
  • Semantic Search: Find relevant content using natural language queries
  • Course-Aware Search: Filter results by course and lecture
  • Real-time Indexing: New documents are immediately searchable
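
Course-aware search can be expressed as a Qdrant payload filter. The sketch below builds a filter in Qdrant's "must" / "match" format. The payload field names courseId and lectureName are assumptions, since this README does not state how BiocBot labels its chunks.

```javascript
// Hypothetical filter builder. The payload keys "courseId" and
// "lectureName" are assumed, not taken from BiocBot's schema.
function buildCourseFilter({ courseId, lectureName } = {}) {
  const must = [];
  if (courseId) {
    must.push({ key: "courseId", match: { value: courseId } });
  }
  if (lectureName) {
    must.push({ key: "lectureName", match: { value: lectureName } });
  }
  // Returning undefined means "no filter": search the whole collection.
  return must.length > 0 ? { must } : undefined;
}
```

Every condition in "must" has to hold, so supplying both values restricts results to a single lecture within a single course.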

API Endpoints

  • GET /api/qdrant/status - Check Qdrant service status
  • POST /api/qdrant/process-document - Process and store document
  • POST /api/qdrant/search - Semantic search across documents
  • DELETE /api/qdrant/document/:id - Delete document chunks
  • GET /api/qdrant/collection-stats - Get collection statistics
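
As an illustration, a client could call the search endpoint as sketched below. Only the path /api/qdrant/search comes from the list above. The request body fields (query, courseId, limit) and the helper names are assumptions, so check the route handler for the actual contract.

```javascript
// Hypothetical client for POST /api/qdrant/search.
// The body shape { query, courseId, limit } is an assumption.
function buildSearchRequest(baseUrl, query, { courseId, limit = 5 } = {}) {
  return {
    url: new URL("/api/qdrant/search", baseUrl).toString(),
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // JSON.stringify drops courseId when it is undefined.
      body: JSON.stringify({ query, courseId, limit }),
    },
  };
}

// Requires Node.js 18+ (global fetch) and a running BiocBot server.
async function searchCourseContent(baseUrl, query, options) {
  const request = buildSearchRequest(baseUrl, query, options);
  const response = await fetch(request.url, request.options);
  if (!response.ok) {
    throw new Error(`Search failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

With the default port from the environment example, the base URL would be http://localhost:8080.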

Testing the Integration

Visit /qdrant-test to test the Qdrant functionality:

  • Process test documents
  • Perform semantic searches
  • View collection statistics

📚 Usage

For Instructors

  1. Access: Navigate to /instructor
  2. Onboarding: Complete course setup
  3. Upload Documents: Add course materials to units
  4. Create Questions: Build assessments for students
  5. Publish Units: Make content available to students

For Students

  1. Access: Navigate to /student
  2. Course Selection: Choose your course
  3. Assessment: Complete calibration questions
  4. Chat Interface: Ask questions about course material
  5. Semantic Search: Find relevant content using natural language

🔧 Development

Project Structure

tlef-biocbot/
├── public/                 # Frontend assets
│   ├── instructor/        # Instructor interface
│   ├── student/          # Student interface
│   └── qdrant-test.html  # Qdrant testing page
├── src/                   # Backend source
│   ├── models/           # Data models
│   ├── routes/           # API routes
│   ├── services/         # Business logic
│   └── server.js         # Main server file
└── documents/            # Course documentation

Key Components

  • QdrantService: Handles vector database operations
  • Document Processing: Automatic chunking and embedding
  • Semantic Search: Vector similarity search
  • Course Management: Structured content organization
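
Vector similarity search ranks stored chunks by how close their embeddings are to the query embedding. In BiocBot this work is done by Qdrant. The in-memory sketch below only illustrates the idea using cosine similarity, one of the distance metrics Qdrant supports. Which metric BiocBot's collection uses is not stated here, and the function names are hypothetical.

```javascript
// Illustrative only: Qdrant performs this ranking in the real system.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new RangeError("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK chunks most similar to the query vector.
function rankChunks(queryVector, chunks, topK = 3) {
  return chunks
    .map((chunk) => ({ ...chunk, score: cosineSimilarity(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

A brute-force scan like this is linear in the number of chunks; a vector database such as Qdrant uses an index to avoid comparing the query against every stored vector.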

🚧 Current Status

  • ✅ Phase 1: Backend pipeline with Qdrant integration
  • ✅ Document Upload: File and text document support
  • ✅ Vector Search: Semantic document retrieval
  • 🔄 Assessment System: Question creation and management
  • 🔄 Student Interface: Chat-based learning experience

🤝 Contributing

This project follows clean architecture principles, prioritizing clarity, maintainability, and readability for junior developers. All code should be:

  • Modular: Single responsibility functions and classes
  • Documented: Comprehensive docblocks and inline comments
  • Accessible: Clear variable names and logical flow
  • Secure: Input validation and error handling

📄 License

ISC License
