BiocBot is an AI-powered study assistant platform that lets students interact with course material in a chat-based format. Instructors upload documents (PDF, DOCX, or TXT), which are automatically parsed, chunked, and embedded into a vector database (Qdrant) for semantic search. When a student asks a question, the system retrieves the most relevant chunks and generates a response grounded in that course content.
- Document Management: Upload and organize course materials
- Vector Search: Semantic search across documents using Qdrant
- AI Chat Interface: Student interaction with course content
- Assessment Questions: Create and manage course assessments
- Course Structure: Organize content by units/lectures
- User Management: Separate interfaces for instructors and students
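The pipeline in the opening paragraph splits each parsed document into chunks before embedding. Below is a minimal sketch of one common strategy, fixed-size character windows with overlap, written in JavaScript to match the Node.js backend. The function name `chunkText` and the default sizes are illustrative assumptions; the chunking performed by the UBC GenAI Toolkit modules may differ.

```javascript
/**
 * Split text into overlapping fixed-size character chunks.
 * Hypothetical helper for illustration; default sizes are assumptions.
 *
 * @param {string} text - Extracted document text.
 * @param {number} chunkSize - Maximum characters per chunk.
 * @param {number} overlap - Characters shared by consecutive chunks.
 * @returns {string[]} Chunks in document order.
 */
function chunkText(text, chunkSize = 1000, overlap = 200) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  if (!Number.isInteger(overlap) || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("overlap must be an integer in [0, chunkSize)");
  }

  const chunks = [];
  const step = chunkSize - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window reaches the end so the tail is not repeated.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Usage: 25 characters, windows of 10 that overlap by 3.
const sample = "abcdefghijklmnopqrstuvwxy";
console.log(chunkText(sample, 10, 3));
// → [ 'abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxy' ]
```

Overlap keeps text that straddles a boundary intact in at least one chunk, at the cost of storing some characters twice.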
BiocBot follows a split architecture with a public frontend and a private backend, adhering to clear separation of concerns for maintainability and security.
- Frontend: HTML + Vanilla JS (no frameworks), styled via separate CSS files
- Backend: Node.js (Express), built with modular architecture
- Database: MongoDB (for documents, user sessions, analytics)
- Vector Database: Qdrant for semantic search and similarity retrieval
- Embeddings: Ollama with nomic-embed-text model
- Document Processing: UBC GenAI Toolkit modules
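To make the embedding step concrete, here is a sketch that calls a local Ollama server's `/api/embed` endpoint directly with `nomic-embed-text`. It is an illustration under assumptions: BiocBot routes embeddings through the UBC GenAI Toolkit (`EMBEDDING_PROVIDER=ubc-genai-toolkit-llm`), and the helper name `embedTexts` is hypothetical. The `fetchImpl` parameter exists only so the function can be exercised without a live server; Node.js 18+ provides a global `fetch`.

```javascript
/**
 * Request embeddings for one or more texts from a local Ollama server.
 * Hypothetical sketch that bypasses the UBC GenAI Toolkit.
 *
 * @param {string[]} texts - Chunks or queries to embed.
 * @param {object} [options]
 * @returns {Promise<number[][]>} One vector per input text, in order.
 */
async function embedTexts(texts, {
  endpoint = "http://localhost:11434",
  model = "nomic-embed-text",
  fetchImpl = globalThis.fetch, // injectable for testing
} = {}) {
  const res = await fetchImpl(`${endpoint}/api/embed`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, input: texts }),
  });
  if (!res.ok) {
    throw new Error(`Ollama embed request failed with HTTP ${res.status}`);
  }

  const data = await res.json();
  // Ollama returns { embeddings: [[...], ...] }, one array per input.
  if (!Array.isArray(data.embeddings) || data.embeddings.length !== texts.length) {
    throw new Error("Unexpected embedding response shape");
  }
  return data.embeddings;
}

// Usage (requires a running Ollama server with nomic-embed-text pulled):
// embedTexts(["What is a ribosome?"]).then((vecs) => console.log(vecs[0].length));
```

Whatever produces the vectors, the Qdrant collection's vector size must equal the model's output dimension (768 for nomic-embed-text), and queries must be embedded with the same model as the stored chunks.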
Prerequisites:
- Node.js v18.x or higher
- A running MongoDB instance
- Qdrant vector database (Docker recommended)
- Ollama with the nomic-embed-text model
Clone the repository and install dependencies:

```bash
git clone <repository-url>
cd tlef-biocbot
npm install
```
Create a `.env` file in the root directory with the following variables:

```
# MongoDB Connection
MONGO_URI=mongodb://localhost:27017/biocbot

# Server Port
TLEF_BIOCBOT_PORT=8080

# Qdrant Configuration
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=super-secret-dev-key

# Embeddings Provider Configuration
EMBEDDING_PROVIDER=ubc-genai-toolkit-llm

# LLM Provider Settings (for Embeddings)
LLM_PROVIDER=ollama
LLM_API_KEY=nokey
LLM_ENDPOINT=http://localhost:11434
LLM_EMBEDDING_MODEL=nomic-embed-text
LLM_DEFAULT_MODEL=llama3.1
```
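One way the server could read these variables at startup is sketched below: required values fail fast, the port is validated, and the remaining settings fall back to the example values above. The helper `loadConfig` and the choice of which variables count as required are assumptions for illustration; a loader such as the `dotenv` package would normally copy `.env` into `process.env` first.

```javascript
/**
 * Build a config object from environment variables, failing fast on
 * missing or malformed values. Hypothetical helper for illustration.
 *
 * @param {Record<string, string | undefined>} env - Usually process.env.
 * @returns {object} Normalized configuration.
 */
function loadConfig(env) {
  // Assumption: the app cannot start without a database and a vector store.
  const required = ["MONGO_URI", "QDRANT_URL"];
  const missing = required.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }

  // Number() rejects values like "80abc" that parseInt would accept.
  const rawPort = env.TLEF_BIOCBOT_PORT ?? "8080";
  const port = Number(rawPort);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid TLEF_BIOCBOT_PORT: ${rawPort}`);
  }

  return {
    mongoUri: env.MONGO_URI,
    port,
    qdrant: { url: env.QDRANT_URL, apiKey: env.QDRANT_API_KEY ?? null },
    embeddingProvider: env.EMBEDDING_PROVIDER ?? "ubc-genai-toolkit-llm",
    llm: {
      provider: env.LLM_PROVIDER ?? "ollama",
      apiKey: env.LLM_API_KEY ?? "nokey",
      endpoint: env.LLM_ENDPOINT ?? "http://localhost:11434",
      embeddingModel: env.LLM_EMBEDDING_MODEL ?? "nomic-embed-text",
      defaultModel: env.LLM_DEFAULT_MODEL ?? "llama3.1",
    },
  };
}

// Usage at startup:
// require("dotenv").config();
// const config = loadConfig(process.env);
```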
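Start Qdrant with Docker (the container runs in the foreground, so use a separate terminal or add `-d` to detach):

```bash
docker run -p 6333:6333 qdrant/qdrant
```

This starts Qdrant without authentication, so `QDRANT_API_KEY` is not enforced. To require the key, pass the same value to the container with `-e QDRANT__SERVICE__API_KEY=super-secret-dev-key`.

Start Ollama, then pull the embedding model. `ollama pull` talks to a running Ollama server, so the server must be up first:

```bash
# Terminal 1 (skip if Ollama already runs as a background service)
ollama serve

# Terminal 2
ollama pull nomic-embed-text
```

Finally, start the development server, which listens on the port set by `TLEF_BIOCBOT_PORT`:

```bash
npm run dev
```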
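Before running `npm run dev`, it can help to confirm that both services are up and that the embedding model has been pulled. The hypothetical `preflight` helper below does this with two real endpoints: Qdrant's `GET /collections` and Ollama's `GET /api/tags`, which lists locally available models as `name:tag`. It is a sketch, not a script shipped with BiocBot.

```javascript
/**
 * Check that Qdrant and Ollama respond and that the embedding model is pulled.
 * Hypothetical preflight helper; returns a list of human-readable problems.
 */
async function preflight({
  qdrantUrl = "http://localhost:6333",
  qdrantApiKey,
  ollamaUrl = "http://localhost:11434",
  embeddingModel = "nomic-embed-text",
  fetchImpl = globalThis.fetch, // injectable for testing
} = {}) {
  const problems = [];

  // Qdrant: listing collections succeeds on a healthy server.
  try {
    const headers = qdrantApiKey ? { "api-key": qdrantApiKey } : {};
    const res = await fetchImpl(`${qdrantUrl}/collections`, { headers });
    if (!res.ok) problems.push(`Qdrant responded with HTTP ${res.status}`);
  } catch (err) {
    problems.push(`Qdrant check failed at ${qdrantUrl}: ${err.message}`);
  }

  // Ollama: /api/tags lists pulled models, e.g. "nomic-embed-text:latest".
  try {
    const res = await fetchImpl(`${ollamaUrl}/api/tags`);
    if (!res.ok) {
      problems.push(`Ollama responded with HTTP ${res.status}`);
    } else {
      const { models = [] } = await res.json();
      const pulled = models.some(
        (m) => m.name === embeddingModel || String(m.name).split(":")[0] === embeddingModel,
      );
      if (!pulled) {
        problems.push(`Model ${embeddingModel} not found; run: ollama pull ${embeddingModel}`);
      }
    }
  } catch (err) {
    problems.push(`Ollama check failed at ${ollamaUrl}: ${err.message}`);
  }

  return problems; // an empty array means everything looks ready
}

// Usage:
// preflight().then((p) => console.log(p.length ? p.join("\n") : "All services ready"));
```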
BiocBot's Qdrant integration provides the following vector search capabilities:
- Automatic Document Processing: Documents are automatically chunked, embedded, and stored
- Semantic Search: Find relevant content using natural language queries
- Course-Aware Search: Filter results by course and lecture
- Real-time Indexing: New documents are immediately searchable
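Course-aware search maps naturally onto a Qdrant payload filter. The sketch below sends a query vector to Qdrant's REST search endpoint (`POST /collections/{name}/points/search`) with a `must` filter on course and, optionally, lecture. The collection name `biocbot_documents`, the payload keys `courseId` and `lectureName`, and the function names are assumptions; BiocBot's QdrantService may store and name these differently.

```javascript
/**
 * Build a Qdrant filter that restricts matches to one course and,
 * optionally, one lecture. Payload keys are hypothetical.
 */
function buildCourseFilter(courseId, lectureName) {
  const must = [{ key: "courseId", match: { value: courseId } }];
  if (lectureName) {
    must.push({ key: "lectureName", match: { value: lectureName } });
  }
  return { must };
}

/**
 * Run a filtered similarity search against Qdrant's REST API.
 * @returns {Promise<Array<{ id: string|number, score: number, payload: object }>>}
 */
async function searchCourseChunks(queryVector, { courseId, lectureName, limit = 5 }, {
  qdrantUrl = "http://localhost:6333",
  apiKey,
  collection = "biocbot_documents", // hypothetical collection name
  fetchImpl = globalThis.fetch, // injectable for testing
} = {}) {
  const res = await fetchImpl(`${qdrantUrl}/collections/${collection}/points/search`, {
    method: "POST",
    headers: { "Content-Type": "application/json", ...(apiKey ? { "api-key": apiKey } : {}) },
    body: JSON.stringify({
      vector: queryVector,
      limit,
      filter: buildCourseFilter(courseId, lectureName),
      with_payload: true, // return stored chunk text and metadata with each hit
    }),
  });
  if (!res.ok) throw new Error(`Qdrant search failed with HTTP ${res.status}`);

  const { result } = await res.json();
  return result.map((hit) => ({ id: hit.id, score: hit.score, payload: hit.payload }));
}
```

The query vector must come from the same embedding model used at indexing time; `score` is the similarity Qdrant reports for the collection's configured distance metric.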
Qdrant API endpoints:
- `GET /api/qdrant/status`: Check Qdrant service status
- `POST /api/qdrant/process-document`: Process and store a document
- `POST /api/qdrant/search`: Semantic search across documents
- `DELETE /api/qdrant/document/:id`: Delete a document's chunks
- `GET /api/qdrant/collection-stats`: Get collection statistics
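A frontend page could call these endpoints with plain `fetch`, in keeping with the vanilla JS frontend. The paths and HTTP methods below come from the endpoint list, but the request body fields (`query`, `courseId`, `lectureName`, `limit`) and the response shapes are assumptions, so check the handlers in `src/routes/` before relying on them. The `process-document` endpoint is omitted because its payload is not documented here, and `createQdrantApiClient` is a hypothetical name.

```javascript
/**
 * Minimal client for four of BiocBot's Qdrant endpoints.
 * Paths and methods follow the README; body fields are assumptions.
 *
 * @param {string} baseUrl - "" for same-origin calls from a BiocBot page.
 * @param {Function} fetchImpl - fetch implementation (injectable for tests).
 */
function createQdrantApiClient(baseUrl = "", fetchImpl = globalThis.fetch) {
  async function request(method, path, body) {
    const res = await fetchImpl(`${baseUrl}${path}`, {
      method,
      headers: body ? { "Content-Type": "application/json" } : {},
      body: body ? JSON.stringify(body) : undefined,
    });
    if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
    return res.json();
  }

  return {
    status: () => request("GET", "/api/qdrant/status"),
    // Hypothetical body shape; undefined fields are dropped by JSON.stringify.
    search: (query, courseId, lectureName, limit = 5) =>
      request("POST", "/api/qdrant/search", { query, courseId, lectureName, limit }),
    // Encode the id so characters like spaces or slashes cannot alter the path.
    deleteDocument: (documentId) =>
      request("DELETE", `/api/qdrant/document/${encodeURIComponent(documentId)}`),
    collectionStats: () => request("GET", "/api/qdrant/collection-stats"),
  };
}

// Usage from a page served by BiocBot:
// const api = createQdrantApiClient();
// api.search("How does glycolysis start?", "BIOC202").then(console.log);
```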
Visit `/qdrant-test` to test the Qdrant functionality:
- Process test documents
- Perform semantic searches
- View collection statistics
Instructor workflow:
- Access: Navigate to `/instructor`
- Onboarding: Complete course setup
- Upload Documents: Add course materials to units
- Create Questions: Build assessments for students
- Publish Units: Make content available to students
Student workflow:
- Access: Navigate to `/student`
- Course Selection: Choose your course
- Assessment: Complete calibration questions
- Chat Interface: Ask questions about course material
- Semantic Search: Find relevant content using natural language
Project structure:

```
tlef-biocbot/
├── public/                # Frontend assets
│   ├── instructor/        # Instructor interface
│   ├── student/           # Student interface
│   └── qdrant-test.html   # Qdrant testing page
├── src/                   # Backend source
│   ├── models/            # Data models
│   ├── routes/            # API routes
│   ├── services/          # Business logic
│   └── server.js          # Main server file
└── documents/             # Course documentation
```
Key components:
- QdrantService: Handles vector database operations
- Document Processing: Automatic chunking and embedding
- Semantic Search: Vector similarity search
- Course Management: Structured content organization
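The storage side of a service like QdrantService comes down to two Qdrant REST calls: upserting one point per chunk (`PUT /collections/{name}/points`) and deleting points by payload filter (`POST /collections/{name}/points/delete`). The hypothetical `ChunkStore` class below sketches both. It is not the project's actual QdrantService, and the collection name and payload fields are assumptions.

```javascript
const { randomUUID } = require("node:crypto");

/**
 * Hypothetical, stripped-down store showing the upsert and delete calls
 * a service like QdrantService makes against Qdrant's REST API.
 */
class ChunkStore {
  constructor({ url = "http://localhost:6333", apiKey, collection = "biocbot_documents", fetchImpl = globalThis.fetch } = {}) {
    this.url = url;
    this.collection = collection; // hypothetical collection name
    this.fetchImpl = fetchImpl; // injectable for testing
    this.headers = { "Content-Type": "application/json", ...(apiKey ? { "api-key": apiKey } : {}) };
  }

  /** Send one JSON request to a path under this collection. */
  async #call(method, path, body) {
    const res = await this.fetchImpl(`${this.url}/collections/${this.collection}${path}`, {
      method,
      headers: this.headers,
      body: JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`Qdrant ${method} ${path} failed with HTTP ${res.status}`);
    return res.json();
  }

  /** Store one point per chunk; the payload links each chunk to its source. */
  async storeChunks(documentId, courseId, chunks, vectors) {
    if (chunks.length !== vectors.length) {
      throw new Error("chunks and vectors must have the same length");
    }
    const points = chunks.map((text, i) => ({
      id: randomUUID(), // Qdrant point IDs must be unsigned integers or UUIDs
      vector: vectors[i],
      payload: { documentId, courseId, chunkIndex: i, text },
    }));
    // wait=true makes Qdrant respond only after the points are applied.
    return this.#call("PUT", "/points?wait=true", { points });
  }

  /** Remove every chunk belonging to one document via a payload filter. */
  async deleteDocument(documentId) {
    return this.#call("POST", "/points/delete?wait=true", {
      filter: { must: [{ key: "documentId", match: { value: documentId } }] },
    });
  }
}
```

Deleting by a `documentId` filter, rather than tracking individual point IDs, is one way to back an endpoint like `DELETE /api/qdrant/document/:id` without a separate lookup table.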
Development status (✅ complete, 🔄 in progress):
- ✅ Phase 1: Backend pipeline with Qdrant integration
- ✅ Document Upload: File and text document support
- ✅ Vector Search: Semantic document retrieval
- 🔄 Assessment System: Question creation and management
- 🔄 Student Interface: Chat-based learning experience
This project follows clean architecture principles, prioritizing clarity, maintainability, and readability for junior developers. All code should be:
- Modular: Single responsibility functions and classes
- Documented: Comprehensive docblocks and inline comments
- Accessible: Clear variable names and logical flow
- Secure: Input validation and error handling
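As an illustration of these guidelines, the hypothetical validator below is a single-purpose function with a docblock, descriptive names, and explicit input checks. The field names and limits are assumptions chosen for the example, not BiocBot's actual rules.

```javascript
/**
 * Validate and normalize the body of a search request.
 * Hypothetical example; field names and limits are assumptions.
 *
 * @param {unknown} body - Parsed JSON from the request.
 * @returns {{ value?: { query: string, courseId: string, limit: number }, error?: string }}
 */
function validateSearchRequest(body) {
  if (typeof body !== "object" || body === null) {
    return { error: "Request body must be a JSON object" };
  }

  // Trim whitespace and cap length so oversized queries are rejected early.
  const query = typeof body.query === "string" ? body.query.trim() : "";
  if (query.length === 0 || query.length > 1000) {
    return { error: "query must be a non-empty string of at most 1000 characters" };
  }

  if (typeof body.courseId !== "string" || body.courseId.trim() === "") {
    return { error: "courseId is required" };
  }

  // Default to 5 results; reject anything outside a small, safe range.
  const limit = body.limit === undefined ? 5 : body.limit;
  if (!Number.isInteger(limit) || limit < 1 || limit > 50) {
    return { error: "limit must be an integer between 1 and 50" };
  }

  return { value: { query, courseId: body.courseId.trim(), limit } };
}
```

In an Express route, a handler could respond with `res.status(400).json({ error })` when `error` is set and otherwise pass `value` on to the search logic.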
Licensed under the ISC License.