SIT Chatbot RAG System

A comprehensive Retrieval-Augmented Generation (RAG) chatbot system for Singapore Institute of Technology (SIT), featuring hybrid search capabilities with FAISS vector similarity and BM25 keyword matching, powered by OpenAI GPT and ElevenLabs voice integration.

Features

Intelligent RAG System: Hybrid search combining FAISS vector similarity and BM25 keyword matching
Voice Integration: Speech-to-Text and Text-to-Speech using ElevenLabs APIs
Interactive Frontend: React-based chat interface with animated otter mascot
Dual Input Modes: Support for both text and voice interactions
Real-time Processing: Fast query processing with optimized vector database
SIT-Specific Knowledge: Trained on SIT course information and institutional data

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Frontend      │    │   Backend        │    │   RAG System   │
│   (React/JS)    │◄──►│   (Node.js)      │◄──►│   (Python)      │
│   Port 3000     │    │   Port 3000      │    │   Port 8000     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                       │
         │                       │                       │
         ▼                       ▼                       ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Voice APIs    │    │   File Upload    │    │   Vector DB     │
│   (ElevenLabs)  │    │   (Multer)       │    │   (LanceDB)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘

Prerequisites

Before running the system, ensure you have the following installed:

Node.js (v16 or higher)
Python (v3.8 or higher)
FFmpeg (for audio processing)

API Keys Required

OpenAI API Key: For GPT-based response generation
ElevenLabs API Key: For speech-to-text and text-to-speech functionality

Installation & Setup

1. Clone the Repository

git clone https://github.com/Finance-LLMs/SIT-Chatbot-RAG.git
cd SIT-Chatbot-RAG

2. Download Vector Database

⚠️ Important: The LanceDB database is not included in the repository due to Git LFS limits.

Download the database from Google Drive: LanceDB Database
Extract the zip file
Copy the data folder from the extracted files
Place it in the root directory

Your structure should look like:

SIT-Chatbot-RAG/
├── data/
│   └── vector-index-lancedb/
│       ├── bm25_index.lance/
│       └── faiss_index.lance/
├── SIT-chatbot-main/
│   ├── backend/
│   ├── src/
│   └── ...
├── SITCHATBOTLLM/
│   ├── server.py
│   ├── requirements.txt
│   └── ...
└── ...

3. Backend Setup (RAG System)

cd SITCHATBOTLLM

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Set up environment variables
cp .env.example .env  # Create if doesn't exist
# Edit .env and add your OpenAI API key

4. Frontend/API Server Setup

cd SIT-chatbot-main

# Install Node.js dependencies
npm install

# Set up environment variables
cd backend
cp .env.example .env  # Create if doesn't exist
# Edit .env and add your API keys

5. Environment Configuration

Create .env files with the following structure:

Backend (.env in `SIT-chatbot-main/backend/`)

OPENAI_API_KEY=your_openai_api_key_here
ELEVENLABS_API_KEY=your_elevenlabs_api_key_here

RAG System (.env in `SITCHATBOTLLM/`)

OPENAI_API_KEY=your_openai_api_key_here

Running the System

The system requires three components to run simultaneously:

1. Start the RAG Backend (Python)

cd SITCHATBOTLLM
python server.py

Runs on: http://localhost:8000
Provides: Chat completions endpoint with RAG functionality

2. Start the API Server (Node.js)

cd SIT-chatbot-main/backend
node server.js

Runs on: http://localhost:3000
Provides: Frontend serving, API proxy, voice processing

3. Build Frontend (if needed)

cd SIT-chatbot-main
npm run build  # Build the frontend

4. Access the Application

Open your browser and navigate to: http://localhost:3000

Usage

Text Mode

Type your question in the chat input
Click "Send" or press Enter
Receive AI-powered responses based on SIT knowledge base

Voice Mode

Click the microphone icon to start recording
Speak your question clearly
The system will transcribe, process, and respond with both text and voice

Example Queries

"What courses does SIT offer in cloud computing?"
"Tell me about the admission requirements"
"What are the fees for the DevOps program?"
"How long is the Cloud Computing course?"

Technical Details

Vector Database

Storage: LanceDB with 4,076+ indexed documents
Search: Hybrid FAISS + BM25 for optimal retrieval
Content: SIT course catalogs, program information, policies

Voice Processing

STT Model: ElevenLabs Scribe v1 with English language optimization
TTS Model: ElevenLabs Turbo v2.5 for natural speech synthesis
Audio Optimization: 16kHz, mono, PCM format using FFmpeg

API Endpoints

POST /api/chat - Chat completions (proxy to RAG system)
POST /api/speech-to-text - Convert audio to text
POST /api/text-to-speech - Convert text to audio

Project Structure

SIT-Chatbot-RAG/
├── README.md                          # This file
├── SIT-chatbot-main/                  # Frontend & API Server
│   ├── backend/
│   │   ├── server.js                  # Node.js Express server
│   │   ├── .env                       # API keys
│   │   └── uploads/                   # Temporary audio files
│   ├── src/
│   │   ├── app.js                     # React frontend logic
│   │   ├── index.html                 # Main HTML file
│   │   └── styles.css                 # Styling
│   ├── dist/                          # Built frontend files
│   ├── package.json                   # Node.js dependencies
│   └── webpack.config.js              # Build configuration
├── SITCHATBOTLLM/                     # RAG Backend
│   ├── server.py                      # FastAPI RAG server
│   ├── bm25_chunk_search.py           # Search implementation
│   ├── requirements.txt               # Python dependencies
│   └── .env                           # OpenAI API key
└── data/                              # Additional data files
    └── vector-index-lancedb/          # LanceDB storage

📄 License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 68 Commits
SIT-chatbot-main		SIT-chatbot-main
SITCHATBOTLLM		SITCHATBOTLLM
nginx-config		nginx-config
systemd-configs		systemd-configs
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SIT_RAG_Chatbot_Documentation.docx		SIT_RAG_Chatbot_Documentation.docx
debug-chat-issues.sh		debug-chat-issues.sh
fix-frontend-service.sh		fix-frontend-service.sh
lancedb_bm25_migration_updated.py		lancedb_bm25_migration_updated.py
service-management-guide.md		service-management-guide.md
setup-pm2-services.sh		setup-pm2-services.sh
setup-systemd-services.sh		setup-systemd-services.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SIT Chatbot RAG System

Features

Architecture

Prerequisites

API Keys Required

Installation & Setup

1. Clone the Repository

2. Download Vector Database

3. Backend Setup (RAG System)

4. Frontend/API Server Setup

5. Environment Configuration

Backend (.env in `SIT-chatbot-main/backend/`)

RAG System (.env in `SITCHATBOTLLM/`)

Running the System

1. Start the RAG Backend (Python)

2. Start the API Server (Node.js)

3. Build Frontend (if needed)

4. Access the Application

Usage

Text Mode

Voice Mode

Example Queries

Technical Details

Vector Database

Voice Processing

API Endpoints

Project Structure

📄 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Languages

License

Finance-LLMs/SIT-Chatbot-RAG

Folders and files

Latest commit

History

Repository files navigation

SIT Chatbot RAG System

Features

Architecture

Prerequisites

API Keys Required

Installation & Setup

1. Clone the Repository

2. Download Vector Database

3. Backend Setup (RAG System)

4. Frontend/API Server Setup

5. Environment Configuration

Backend (.env in SIT-chatbot-main/backend/)

RAG System (.env in SITCHATBOTLLM/)

Running the System

1. Start the RAG Backend (Python)

2. Start the API Server (Node.js)

3. Build Frontend (if needed)

4. Access the Application

Usage

Text Mode

Voice Mode

Example Queries

Technical Details

Vector Database

Voice Processing

API Endpoints

Project Structure

📄 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Languages

Backend (.env in `SIT-chatbot-main/backend/`)

RAG System (.env in `SITCHATBOTLLM/`)

Packages