lugnitdgp/TDOC-Campus-Companion

🎓 Campus Companion

AI-Powered Campus Information Assistant for NIT Durgapur


πŸ› οΈ Tech Stack

Component Technology
Backend API FastAPI + Uvicorn
Database SQLite3
Vector Storage ChromaDB (internally using SQLite3)
Embeddings Sentence Transformers β†’ 384-dim vectors
Frontend Streamlit
PDF Loading PyPDF Loader
Classification Keyword β†’ Logistic Regression (ML) β†’ LLM
LLM Open Source Model from HuggingFace: Mistral-7B-Instruct

📊 System Architecture

┌──────────────────────────────────────────────────────────────────┐
│                     CAMPUS COMPANION SYSTEM                      │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  USER QUERY → FastAPI Backend → 3-Level Classification           │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  INTENT CLASSIFIER (core/classifier.py)                    │  │
│  │                                                            │  │
│  │  Level 1: Keyword Matching (⚡ 0.001s) - 70% queries        │  │
│  │  Level 2: ML Classifier (⚡⚡ 0.01s) - 25% queries           │  │
│  │  Level 3: LLM (Mistral-7B) (⚡⚡⚡ 1-2s) - 5% queries         │  │
│  └──────────────────────┬─────────────────────────────────────┘  │
│                         │                                        │
│         ┌───────────────┼───────────────┐                        │
│         ▼               ▼               ▼                        │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐                  │
│  │ DATABASE   │  │ RAG SYSTEM │  │ AI FALLBACK│                  │
│  │            │  │            │  │            │                  │
│  │ • Canteen  │  │ • ChromaDB │  │ Mistral-7B │                  │
│  │ • Faculty  │  │ • 384-dim  │  │ Generates  │                  │
│  │ • Rooms    │  │   Vectors  │  │ Responses  │                  │
│  │ • Wardens  │  │ • Cosine   │  │            │                  │
│  │ (SQLite)   │  │   Search   │  │            │                  │
│  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘                  │
│        │               │               │                         │
│        └───────────────┼───────────────┘                         │
│                        ▼                                         │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  AI RESPONSE FORMATTER (core/response.py)                  │  │
│  │  Raw Data → Natural Language (Mistral-7B, Temp: 0.5)       │  │
│  └────────────────────┬───────────────────────────────────────┘  │
│                       ▼                                          │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │  JSON RESPONSE                                             │  │
│  │  {"answer": "...", "intent": "...", "confidence": 0.85}    │  │
│  └────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────┘

πŸ“ Project Structure

CAMPUS_COMPANION/
β”œβ”€β”€ api/
β”‚   β”œβ”€β”€ main.py                    # FastAPI app initialization
β”‚   └── routers/
β”‚       └── chat.py                # ⭐ Main chat endpoint (600+ lines)
β”œβ”€β”€ core/
β”‚   β”œβ”€β”€ classifier.py              # ⭐ 3-level intent classification (800+ lines)
β”‚   β”œβ”€β”€ rag.py                     # ⭐ RAG system with ChromaDB (190+ lines)
β”‚   β”œβ”€β”€ response.py                # πŸ€– AI response formatter
β”‚   β”œβ”€β”€ fallback_message.py        # πŸ›‘οΈ AI fallback handler
β”‚   └── embeddings.py              # Document chunking & embeddings
β”œβ”€β”€ db/
β”‚   β”œβ”€β”€ models.py                  # ⭐ Database schema (10 tables)
β”‚   └── session.py                 # DB connection
β”œβ”€β”€ scripts/
β”‚   β”œβ”€β”€ ingest_pdfs.py             # PDF β†’ ChromaDB pipeline
β”‚   β”œβ”€β”€ pdf_processor.py           # Text extraction (PyPDF2 + Tesseract)
β”‚   └── chunking.py                # Text chunking logic
β”œβ”€β”€ data/
β”‚   β”œβ”€β”€ pdfs/                      # Source PDF documents
β”‚   └── rag_docs/                  # ChromaDB storage
β”œβ”€β”€ frontend.py                    # Streamlit chat UI
β”œβ”€β”€ app.py                         # Database initializer
β”œβ”€β”€ testdb.py                      # Sample data loader
β”œβ”€β”€ requirements.txt               # Python dependencies
β”œβ”€β”€ .env                           # Environment variables
└── campus_companion.db            # SQLite database

🚀 Quick Start Guide

📋 Prerequisites

Verify the following are installed:

python3 --version    
pip --version
git --version

🔧 Installation

1. Clone and Navigate

git clone <your-repo-url>
cd CAMPUS_COMPANION

2. Create Virtual Environment

# Create virtual environment
python3 -m venv .venv

# Activate (Linux/Mac)
source .venv/bin/activate

# Activate (Windows)
.venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

4. Set Up HuggingFace Token

  1. Visit: https://huggingface.co/settings/tokens
  2. Create token (Read access)
  3. Copy token
  4. Create .env file:
echo "HUGGINGFACEHUB_API_TOKEN=hf_paste_your_token_here" > .env

5. Initialize Database

python3 app.py
python3 testdb.py

6. Set Up PDF Documents

mkdir -p data/pdfs
# Add your PDF documents to data/pdfs/
# Then run:
python3 scripts/ingest_pdfs.py

7. Start Backend

uvicorn api.main:app --reload

8. Test API (in new terminal)

curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"text":"hello"}'

9. Start Frontend (in another terminal with .venv activated)

streamlit run frontend.py

📚 Implementation Guide

📅 DAY 1: Database Setup & AI Fallback

🎯 Learning Objectives:

  • Understand the 3-level intent classification system
  • Learn database structure and queries
  • Implement AI fallback for graceful error handling

πŸ” The Problem

How does the system know what type of question was asked?

Solution: Progressive complexity with fallback

πŸ† Three-Level Classification System

Level 1: Keyword Matching ⚑ (Fast - 0.001s)
  • Handles: 70% of queries
  • Method: Simple word detection
  • Function: classify_keyword() in classifier.py

Examples:

  • βœ… "Roy canteen phone" β†’ Keywords found β†’ db_contact
  • βœ… "Where is AB-301?" β†’ Keywords found β†’ db_location
  • ❌ "I need to contact the mess" β†’ No exact keywords
Level 2: Machine Learning ⚑⚑ (Medium - 0.01s)
  • Handles: 25% of queries
  • Method: TF-IDF + Logistic Regression
  • Training: Pre-trained on 200+ example queries
  • Function: ml_classify() in classifier.py

How it works:

  • Converts text to numerical features (word importance)
  • Trained model predicts intent
  • Example: "mess contact" β†’ ML recognizes as contact query

When it works:

  • ✅ Variations of known patterns
  • ✅ Synonyms and paraphrases

When it fails:

  • ❌ Completely novel phrasing
  • ❌ Ambiguous questions

Level 3: LLM Classification ⚡⚡⚡ (Slow ~1-2s) [HW]

Hints:

  • Sends query to Mistral-7B with instructions
  • "Classify this as: contact/location/rag/small_talk/fallback"
  • Returns intent with reasoning

Example:

  • "Can you help me reach the person in charge of food services?" β†’ LLM understands context β†’ db_contact

πŸ—„οΈ Database Structure

How to create the DB:

  1. Create table models in models.py
  2. Use session.py to connect
  3. Populate the database by running testdb.py

What's in the Database?

  • 10 tables in SQLite: Faculty, Canteen, Warden, Room, Building, etc.
  • Fixed schema (columns known in advance)
  • Fast exact matches

Key Functions:

  • try_get_contact(text, session) - Search people/places
  • try_get_location(text, session) - Search rooms/buildings
  • extract_entity_names() - Parse query for names
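The lookup functions above can be sketched roughly as below. This is an illustrative version only, using the stdlib sqlite3 module so it is self-contained (the real project goes through SQLAlchemy sessions, and the table/column names here are assumptions):

```python
# Hypothetical sketch of a contact lookup in the spirit of try_get_contact();
# schema and matching logic are simplified guesses, not the repo's code.
import sqlite3

def try_get_contact(text: str, conn: sqlite3.Connection):
    """Return the first canteen whose name matches a word in the query, else None."""
    rows = conn.execute("SELECT name, phone FROM canteen").fetchall()
    lowered = text.lower()
    for name, phone in rows:
        # crude fuzzy match: any word of the stored name appears in the query
        if any(word in lowered for word in name.lower().split()):
            return {"name": name, "phone": phone}
    return None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE canteen (name TEXT, phone TEXT)")
conn.execute("INSERT INTO canteen VALUES ('Roy Canteen', '+91-8012345678')")
print(try_get_contact("Roy canteen phone", conn))
```

A real implementation would also use extract_entity_names() to narrow the query before matching.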

πŸ›‘οΈ AI Fallback System

Concept: Graceful handling of out-of-scope queries

When Fallback Triggers:

  • Intent classified as "ai_fallback"
  • Database search returns nothing
  • RAG search finds no relevant documents
  • Confidence too low (< 0.3)

Response Generation:

  • Send query + system prompt to Mistral-7B
  • Temperature: 0.7 (more creative for deflection)
  • AI generates polite refusal + redirection + organized response

Key Function: fallback_ai_response(query) in fallback_message.py
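The trigger conditions listed above can be expressed as one small decision function. A minimal sketch, assuming the 0.3 threshold from the text (the function shape and argument names are ours, not the repo's):

```python
# Illustrative-only fallback decision; mirrors the trigger list above.
def needs_fallback(intent: str, confidence: float,
                   db_hit: bool, rag_hit: bool) -> bool:
    if intent == "ai_fallback":
        return True                 # classifier routed straight to fallback
    if confidence < 0.3:            # classifier too unsure
        return True
    if intent in ("db_contact", "db_location") and not db_hit:
        return True                 # database search returned nothing
    if intent == "rag" and not rag_hit:
        return True                 # no relevant documents found
    return False
```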

🧪 Hands-on Exercise

Edit testdb.py and add/populate data of your choice

Functions Involved:

  • session.add() - Add to database
  • session.commit() - Save changes
  • try_get_contact() - Search function
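Putting those functions together, an addition to testdb.py might look like the sketch below. It assumes a SQLAlchemy Canteen model similar in shape to the ones in db/models.py (the column names here are illustrative):

```python
# Minimal sketch of populating sample data, as testdb.py would;
# the Canteen model below is an assumed stand-in for db/models.py.
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Canteen(Base):
    __tablename__ = "canteen"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    phone = Column(String)

engine = create_engine("sqlite:///:memory:")   # real code uses campus_companion.db
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Canteen(name="Roy Canteen", phone="+91-8012345678"))  # add to database
    session.commit()                                                   # save changes
    row = session.query(Canteen).filter(Canteen.name.ilike("%roy%")).first()
    print(row.name, row.phone)
```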

📅 DAY 2: RAG (Retrieval Augmented Generation)

🎯 Learning Objectives:

  • Understand RAG (Retrieval Augmented Generation) concept
  • Learn PDF processing pipeline (extraction → chunking → embedding)
  • Understand semantic search and cosine similarity

🤔 The Problem

Student asks: "How to calculate CGPA?"

Why Database Won't Work:

  • ❌ CGPA calculation is a multi-step explanation (not a single data point)
  • ❌ Rules are in PDF documents (Academic Handbook, 50+ pages)
  • ❌ Manual data entry = tedious + error-prone
  • ❌ Rules change → Need to update database every time

Why not raw AI?

  • ❌ LLMs hallucinate
  • ❌ No existing knowledge of the campus rules
  • ❌ CGPA rules vary from campus to campus

🔄 RAG Pipeline

Document Processing:

PDFs → Extract Text → Split into Chunks → Convert to Vectors → Store in Database

Query Processing:

User Question → Convert to Vector → Find Similar Vectors → Get Text Chunks
Question + Context Chunks → LLM → Natural Answer

📄 PDF Processing Pipeline

1. PDF Text Extraction

Process:

  1. Open PDF file
  2. Iterate through each page
  3. Extract text layer (embedded text data)
  4. Concatenate all pages

Key Function: extract_text_pypdf2() in pdf_processor.py

2. Text Cleaning

Removes Noise:

  • Page numbers
  • URLs
  • Headers/footers
  • Extra whitespaces

Key Function: clean_text() in pdf_processor.py

3. Quality Check

Checks the length of the extracted text and whether it is in a readable format

Key Function: validate_extracted_text() in pdf_processor.py

βœ‚οΈ Text Chunking

Why Chunking?

  • Embedding models have token limits per input
  • Semantic search gets less accurate as inputs grow longer
  • Larger inputs mean higher API cost

Solution: Split into smaller, semantically meaningful pieces

Chunking Parameters:

  • chunk_size
  • chunk_overlap
  • min_chunk_size

Key Function: chunk_text() in chunking.py
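A simple sliding-window chunker shows how the three parameters interact. This is a sketch in the spirit of chunk_text(), not the repo's implementation (which may split on sentence boundaries instead of raw characters):

```python
# Illustrative character-based chunker; parameter names match the list above.
def chunk_text(text: str, chunk_size: int = 500,
               chunk_overlap: int = 50, min_chunk_size: int = 20):
    chunks, start = [], 0
    step = chunk_size - chunk_overlap        # overlap keeps context across cuts
    while start < len(text):
        piece = text[start:start + chunk_size]
        if len(piece) >= min_chunk_size:     # drop tiny trailing fragments
            chunks.append(piece)
        start += step
    return chunks

print(len(chunk_text("x" * 1200, chunk_size=500, chunk_overlap=50)))  # 3
```

With 1200 characters, size 500, and overlap 50, the windows start at 0, 450, and 900, giving three chunks.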

🔢 Embeddings

Model Used: all-MiniLM-L6-v2 (Sentence Transformers)

Concept: Convert text into numbers that capture meaning

Example:

Text: "How to calculate CGPA?"
Embedding: [0.234, -0.112, 0.567, ..., 0.891]  (384 numbers)

"CGPA calculation" β†’ [0.12, 0.45, -0.23, ...]
"Grade point average" β†’ [0.15, 0.43, -0.20, ...]  (CLOSE! βœ…)
"Pizza recipe" β†’ [0.87, -0.32, 0.61, ...]  (FAR! ❌)

Key Functions:

  • get_embeddings() in embeddings.py
  • generate_embeddings() in embeddings.py
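The CLOSE/FAR intuition above is just cosine similarity between embedding vectors. Real vectors come from all-MiniLM-L6-v2 (384 dimensions); the tiny 3-dim vectors below are made up purely to demonstrate the math:

```python
# Cosine similarity from first principles; the toy vectors reuse the
# illustrative numbers from the example above and are not real embeddings.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cgpa  = [0.12, 0.45, -0.23]     # "CGPA calculation" (illustrative)
gpa   = [0.15, 0.43, -0.20]     # "Grade point average"
pizza = [0.87, -0.32, 0.61]     # "Pizza recipe"

print(cosine_similarity(cgpa, gpa) > cosine_similarity(cgpa, pizza))  # True
```

In production, get_embeddings() would call the Sentence Transformers model to produce the vectors, and ChromaDB performs this same comparison at scale.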

πŸ—ƒοΈ Vector Database - ChromaDB

Stores all the embeddings for semantic search

πŸ§ͺ Hands-on Exercise

# 1. Add PDF to data folder
cp ~/hostel_rules.pdf data/pdfs/

# 2. Run ingestion
python3 scripts/ingest_pdfs.py

# 3. Check ChromaDB
python3 -c "from core.rag import collection; print(f'{collection.count()} documents')"

# 4. Test query
curl -X POST http://localhost:8000/api/chat \
  -H "Content-Type: application/json" \
  -d '{"text":"What are hostel visiting hours?"}'

📅 DAY 3: Intent Classifier

🎯 Learning Objectives:

  • Master the unified classification system
  • Understand the priority-based keyword matching
  • Learn ML-based intent prediction
  • Explore result aggregation strategies

🤔 The Problem

When a user asks "Roy canteen phone", how does the system know they want contact information and not location or rules?

Solution: Intent Classification - categorizing user queries into predefined intents

🎯 Intent Types in Campus Companion

Intent        Description
db_contact    Contact information (phone, email)
db_location   Location queries (rooms, buildings)
rag           Document-based questions (CGPA rules, policies)
ai_fallback   General questions / greetings
small_talk    [HW] Conversational queries

🔄 Three-Level Classification Pipeline

Keyword Matching (Fast) → Machine Learning (Accurate) → LLM (Slow but most Accurate) [HW]

🔑 Keyword Classification

Function: classify_keywords(text: str) -> IntentResult

Purpose: Fast rule-based classification using keyword matching

Priority Order (Matters!):

  1. ✅ Check for RAG keywords → "CGPA", "rules", "policy"
  2. ✅ Check for contact keywords → "phone", "email", "canteen"
  3. ✅ Check for location keywords → "where", "room", "building"
  4. ✅ Default → ai_fallback

Why this order?

  • RAG first because academic queries are most specific
  • Contact/Location second because they have clear entities
  • Fallback last as catch-all
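The priority order above translates directly into a chain of checks. A minimal sketch of classify_keywords(); the keyword lists are abbreviated guesses, not the repo's actual lists, and the real function returns an IntentResult with a confidence rather than a bare string:

```python
# Illustrative keyword classifier following the priority order above.
def classify_keywords(text: str) -> str:
    t = text.lower()
    if any(k in t for k in ("cgpa", "rule", "policy")):     # 1. RAG first
        return "rag"
    if any(k in t for k in ("phone", "email", "canteen")):  # 2. contact
        return "db_contact"
    if any(k in t for k in ("where", "room", "building")):  # 3. location
        return "db_location"
    return "ai_fallback"                                    # 4. catch-all

print(classify_keywords("Roy canteen phone"))  # db_contact
```

Because the checks run top-down, a query containing both "policy" and "phone" is routed to rag, which is exactly the "RAG first" rule.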

🤖 Machine Learning Classifier

Key Class: MLClassifier

Purpose: Learn patterns from training examples using Machine Learning

Components:

  1. TF-IDF Vectorizer - Converts text to numerical features
  2. Logistic Regression - Predicts intent based on learned patterns
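The two components can be wired together in a few lines with scikit-learn. The real MLClassifier is trained on 200+ queries; the toy training set below exists only to make the sketch run:

```python
# Minimal TF-IDF + Logistic Regression pipeline in the spirit of MLClassifier;
# training data here is a made-up toy set, not the project's.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X = ["mess contact number", "canteen phone", "email of warden",
     "where is room 301", "location of main building", "which floor is the lab"]
y = ["db_contact", "db_contact", "db_contact",
     "db_location", "db_location", "db_location"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(X, y)                       # learn word-importance patterns per intent
print(clf.predict(["canteen contact"])[0])
```

This is why Level 2 handles paraphrases: "canteen contact" shares weighted words with the contact training examples even though it never appeared verbatim.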

🎼 The Orchestrator

Function: UnifiedClassifier.classify()

Purpose: Combine all three classifiers and make final decision

Classification Pipeline:

Step 1: Run keyword classifier (always)
  ↓
Step 2: Run ML classifier (if trained)
  ↓
Step 3: Run LLM classifier (if requested AND confidence < 0.7)
  ↓
Step 4: Aggregate results by taking MAX confidence per intent
  ↓
Step 5: Detect multi-intent queries
  ↓
Step 6: Determine if AI fallback needed
  ↓
Return ClassificationResult

📊 Result Aggregation Strategy

Why MAX (not AVG)?

  • If one classifier is very confident, it likely found a strong signal
  • Average would dilute strong predictions
  • Example: Keyword (0.90) + ML (0.60) → MAX = 0.90 (better than AVG = 0.75)
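The MAX-per-intent rule from the example above can be sketched directly (the real logic lives inside UnifiedClassifier.classify(); this standalone function is ours):

```python
# Illustrative aggregation: keep the highest confidence seen per intent.
def aggregate(results):
    """results: list of {intent: confidence} dicts, one per classifier."""
    merged = {}
    for scores in results:
        for intent, conf in scores.items():
            merged[intent] = max(merged.get(intent, 0.0), conf)
    best = max(merged, key=merged.get)
    return best, merged[best]

print(aggregate([{"db_contact": 0.90}, {"db_contact": 0.60, "rag": 0.30}]))
# ('db_contact', 0.9)
```

An AVG strategy would have returned 0.75 for db_contact here, diluting the keyword classifier's strong signal.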

[HW] Multi-intent Discussion


📅 DAY 4: Response Generation + Frontend

🎯 Learning Objectives:

  • Understand how raw data is converted to natural language responses
  • Learn the role of AI in response formatting
  • Understand the frontend-backend connection

🤔 The Problem

Database returns raw data for "Roy canteen phone":

Raw Output:
name: Roy Canteen
phone: +91-8012345678
email: roy@campus.edu
location: Ground Floor

  • User Experience: ❌ Boring, mechanical, not conversational
  • What Users Expect: ✅ Natural, helpful, human-like response

💡 The Solution: AI Response Formatter

Output:

🍽️ Roy Canteen

You can reach Roy Canteen at +91-8012345678 or email them at 
roy@campus.edu. They're located on the Ground Floor!

🔄 Response Flow

RAW DATA (from DB/RAG)
    ↓
AI FORMATTER (response.py)
    ↓
NATURAL LANGUAGE RESPONSE
    ↓
FRONTEND (frontend.py)
    ↓
USER SEES POLISHED ANSWER

🤖 AI Response Formatter Architecture

Key Class: ResponseGenerator in response.py

Main Methods:

1. __init__() - Initialization

  • Purpose: Set up LLM (Mistral-7B) and RAG system

2. refine_query(query: str) -> str

  • Purpose: Improve search queries before RAG lookup

3. format_response(query: str, data: str) -> str

  • Purpose: Convert raw data to natural language

4. generate_rag_response(query: str) -> Dict

  • Purpose: Complete RAG pipeline - search docs + generate answer

Helper Functions:

_build_context(documents, max_length)

  • Combines document chunks into one string
  • Stops at 2000 chars (LLM context limit)
  • Labels each source: [Source 1], [Source 2], etc.

_generate_llm_answer(query, context)

  • Sends context + query to Mistral-7B
  • Prompt engineering: "Answer using ONLY context"
  • Prevents hallucination (AI making up facts)

_calculate_confidence(documents)

  • Average relevance score of top 3 chunks
  • Example: (0.92 + 0.87 + 0.81) / 3 = 0.87

_format_sources(documents)

  • Extract metadata: filename, relevance score
  • Show users where answer came from (transparency)

5. _generate_contact_response(query: str) -> Dict

  • Purpose: Format database contact results

6. _generate_location_response(query: str) -> Dict

  • Purpose: Format database location results

7. _generate_ai_fallback_response(query: str) -> Dict

  • Purpose: Handle out-of-scope queries gracefully

8. generate_response(query: str, intent: str) -> Dict

  • Purpose: Main entry point - routes to correct handler

🖥️ Frontend - Streamlit

What is Streamlit?

  • Streamlit = Python web framework for data apps

Why Streamlit?

  • ✅ Write web UI in pure Python (no HTML/CSS/JavaScript)
  • ✅ Auto-refreshes on code changes
  • ✅ Built-in chat components (st.chat_message, st.chat_input)
  • ✅ Fast prototyping (build UI in 50 lines!)

Key Components:
  1. Page Configuration: st.set_page_config
  2. Sidebar: st.sidebar
  3. Session State: Conversation memory and Chat History [HW]
  4. Chat Input & API Call: st.chat_input
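The frontend-to-backend hop is an ordinary HTTP POST. A sketch of the client-side helpers frontend.py might use (function names are ours; only the /api/chat endpoint and the {"text": ...} payload shape come from the Quick Start section):

```python
# Illustrative client helpers; in frontend.py, ask_backend() would be
# called from inside the st.chat_input / st.chat_message loop.
import requests

API_URL = "http://localhost:8000/api/chat"

def build_chat_payload(text: str) -> dict:
    """Shape the request body the way ChatRequest expects."""
    return {"text": text}

def extract_answer(response_json: dict) -> str:
    """Pull the displayable field out of the ChatResponse JSON."""
    return response_json.get("answer", "Sorry, something went wrong.")

def ask_backend(text: str) -> str:
    resp = requests.post(API_URL, json=build_chat_payload(text), timeout=30)
    return extract_answer(resp.json())
```

Keeping payload building and answer extraction as pure functions makes the UI loop trivial to test without a running backend.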

🚀 Running the Application

# Start backend
uvicorn api.main:app --reload

# Start frontend (on another terminal with .venv activated)
streamlit run frontend.py

❓ Common Questions

Q: Why separate frontend and backend?

  • A: Scalability. Backend can serve multiple frontends (web, mobile, API users).

Q: Can we use React instead of Streamlit?

  • A: Yes! Backend API is framework-agnostic. Just POST to /api/chat.

Q: Why not format in chat.py directly?

  • A: Separation of concerns. response.py is reusable across different endpoints.

Q: How to deploy to production?

  • A: Backend β†’ Railway/Render. Frontend β†’ Streamlit Cloud (free tier).

📅 DAY 5+6: Chat System + FastAPI

🎯 Learning Objectives:

  • Understand FastAPI application structure
  • Learn request/response flow
  • Master the chat endpoint orchestration

📄 main.py - The Entry Point

Purpose: Entry point of the backend

Key Idea:

  • Nothing intelligent happens here
  • It does not answer questions
  • It sets up everything needed so other files can work

Key Components:

1. app = FastAPI(...)

  • App is the control center of the backend
  • Every endpoint, rule, and config is attached to it

2. CORS Middleware Block

  • Frontend and backend usually run on different ports
  • Browsers block such requests by default

CORS Configuration:

  • allow_origins → Who can access the backend
  • allow_methods → What HTTP actions are allowed
  • allow_headers → What headers are accepted
  • allow_credentials → Whether cookies/auth can pass

3. init_db()

  • Database tables exist before any request
  • Backend never crashes due to missing tables
  • Reads database models
  • Creates tables if missing
  • Skips if already present

4. app.include_router(...)

  • A router is a group of related endpoints
  • Example: chat routes live in chat.py
  • Connects /api/chat → logic in chat.py
  • Adds structure and modularity

5. Root Endpoint /

  • Helpful for debugging, deployment checks, dev sanity checks

6. Health Check /health

  • Every production backend has a health endpoint
  • It answers only one thing: "Am I alive?"

📄 chat.py - The Orchestrator

Purpose: Where user input becomes an intelligent response

Responsibilities:

  • Receiving user queries
  • Validating input
  • Classifying intent
  • Fetching data (DB / RAG)
  • Using AI when needed
  • Returning a structured response

Chat Endpoint: /api/chat

Role:

  • Single entry point for all user queries

Handles:

  • Simple greetings
  • Database lookups
  • Document-based questions
  • AI fallback responses

Why one endpoint?

  • Simplifies frontend
  • Centralizes logic
  • Easier to debug and extend

Key Function: chat(request: ChatRequest)

Explanation:

  • This function is the orchestrator: it doesn't do everything itself, but it controls everything

Request & Response Models

ChatRequest

Purpose:

  • Guarantees valid input
  • Prevents malformed data
  • Makes API predictable

ChatResponse

Purpose:

  • Standardizes backend output
  • Makes frontend rendering easy

Fields:

  • answer → Final message
  • intent → What the system understood
  • confidence → How sure the system is
  • used_fallback → Whether AI was used
  • is_multi_intent → Multiple meanings detected
  • all_intents → Ranked intent candidates

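Those two models are plain Pydantic classes. A sketch of how they might be declared in chat.py, with fields taken from the lists above (the defaults and Optional choices are our assumptions):

```python
# Illustrative Pydantic models matching the field lists above.
from typing import List, Optional
from pydantic import BaseModel

class ChatRequest(BaseModel):
    text: str                        # malformed bodies are rejected automatically

class ChatResponse(BaseModel):
    answer: str
    intent: str
    confidence: float
    used_fallback: bool = False
    is_multi_intent: bool = False
    all_intents: Optional[List[str]] = None

resp = ChatResponse(answer="...", intent="db_contact", confidence=0.85)
print(resp.intent, resp.confidence)
```

FastAPI uses these models both to validate incoming JSON and to serialize the outgoing response, which is what makes the API "predictable".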
Intent Classification Pipeline

Key Function: classify_detailed

Purpose:

  • The system decides what the user wants, not how to answer yet

Types of intents:

  • db_contact
  • db_location
  • faculty_info
  • rag
  • small_talk [HW][greetings]
  • ai_fallback

Why classification first?

  • Avoids unnecessary DB calls
  • Prevents wrong answers

Important classification outputs:

  • primary_intent
  • confidence
  • needs_fallback
  • is_multi_intent
  • all_intents

Handlers & Data Retrieval

Main routing decision: Based on primary_intent

Important Handler Functions:

  • try_get_contact() - Search for contact information
  • try_get_location() - Search for locations
  • try_get_faculty() - Search for faculty information
  • try_get_rag() - Retrieve from RAG system
  • fallback_ai_response() - Handle unknown queries

Response Formatting

The final step: once the query has been classified and the data retrieved, the raw result is formatted into a user-friendly natural-language answer and returned as JSON.
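The "routing based on primary_intent" decision is essentially a dispatch table. A sketch with stub handlers standing in for the real functions named above (only the handler names come from the text; the bodies are placeholders):

```python
# Illustrative dispatch; the real handlers query the DB, RAG, or LLM.
def try_get_contact(q):      return f"contact result for {q!r}"
def try_get_location(q):     return f"location result for {q!r}"
def try_get_rag(q):          return f"rag result for {q!r}"
def fallback_ai_response(q): return f"fallback for {q!r}"

HANDLERS = {
    "db_contact": try_get_contact,
    "db_location": try_get_location,
    "rag": try_get_rag,
}

def route(primary_intent: str, query: str) -> str:
    # unknown intents fall through to the AI fallback, per the design above
    handler = HANDLERS.get(primary_intent, fallback_ai_response)
    return handler(query)

print(route("db_contact", "Roy canteen phone"))
```

Adding a new intent then means adding one handler and one dictionary entry, which is what makes the single-endpoint design easy to extend.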


🎓 Course Summary

Dear Students,

Over the past 6 days, we built Campus Companion, an AI-powered chatbot that helps students find contact information, locations, and academic policies through a beautiful Streamlit interface.

πŸ—οΈ System Architecture

The system uses a 3-layer architecture:

  1. Frontend (Streamlit for UI)
  2. Backend (FastAPI for API server)
  3. Core Intelligence (classification, database handlers, RAG, and AI formatting)

🔄 Request Flow

When a user asks "Roy canteen phone", the request flows through:

  1. Pydantic validation
  2. 3-level intent classification (keywords/ML/LLM)
  3. Routing to appropriate handler (try_get_contact searches the database with fuzzy matching)
  4. AI formatting (Mistral-7B converts raw data to natural language)
  5. Structured JSON response displayed in the frontend

πŸ› οΈ Technologies Used

Modern Stack:

  • FastAPI (REST API)
  • SQLAlchemy (database ORM)
  • Scikit-learn (ML classification)
  • ChromaDB (vector database for RAG)
  • HuggingFace (embeddings and LLM)

Production-Grade Principles:

  • ✅ Separation of concerns
  • ✅ Graceful degradation (fallback mechanisms)
  • ✅ Comprehensive error handling
  • ✅ Type safety with Pydantic
  • ✅ Extensive logging

💡 Key Innovation

Our hybrid approach combines:

  • Structured database queries for contacts/locations
  • RAG (Retrieval-Augmented Generation) for document-based questions like "How to calculate CGPA?"
    • Uses semantic search to find relevant PDF chunks
    • Generates contextual answers

🎯 What You've Learned

  1. Full-stack development (frontend + backend + database)
  2. AI/ML integration (classification, embeddings, LLMs)
  3. Software engineering (clean architecture, error handling, API design)
  4. Real-world application that solves actual campus problems

🚀 Real-World Applications

This same architecture can be adapted for:

  • πŸ₯ Hospital assistants
  • 🏒 Corporate helpdesks
  • πŸ›’ E-commerce support
  • πŸ“š Any domain requiring intelligent information retrieval

🔧 Next Steps

You're now ready to:

  • Extend this system (add new intents, multilingual support)
  • Improve accuracy (fine-tune classifiers, better RAG strategies)
  • Deploy to production (Railway/Render/AWS)
  • Add advanced features (voice input, analytics dashboards)

🎉 Congratulations!

You didn't just learn to code; you learned to think like a software engineer, understanding:

  • Why each component exists
  • How they communicate
  • When to use different approaches

These are skills that companies actively seek in full-stack AI developers.

Now go build something amazing! 🚀


πŸ† Skills Mastered

FastAPI + Streamlit + SQLAlchemy + ChromaDB + HuggingFace + RAG + Clean Architecture

Keep coding, keep learning, keep building! 💙


📞 Support

For questions or issues, please refer to the implementation guide above or contact the development team.


Made with ❀️ for NIT Durgapur
