A production-ready RAG (Retrieval-Augmented Generation) chatbot system designed specifically for Vietnamese public administrative services. This system provides accurate, context-aware responses about government procedures, FAQs, and public service guidelines using advanced NLP and vector search technologies.
- Features
- Architecture
- Tech Stack
- Prerequisites
- Installation
- Configuration
- Usage
- API Documentation
- Deployment
- Testing
- Performance Optimization
- Monitoring & Logging
- Contributing
- Troubleshooting
- License
This chatbot brings powerful AI capabilities to help citizens access public services more easily.
- Intelligent RAG System: Combines vector search (FAISS) with LLM for accurate responses
- Multilingual Support: Optimized for Vietnamese with multilingual embedding models
- High Performance: Sub-second response time with intelligent caching
- Production-Ready: Security-hardened with rate limiting, CORS, and API key management
- Context Analysis: Advanced context relevance scoring and filtering
- Smart Caching: LRU cache with configurable TTL for repeated queries
- Docker Support: Fully containerized for easy deployment
- Monitoring: Comprehensive logging with trace IDs and performance metrics
- Hot Reload: Dynamic data updates without system restart
- Hybrid Search: Combines FAISS semantic search with BM25 keyword search using RRF fusion
- Re-ranking: CrossEncoder model (`ms-marco-MiniLM-L-6-v2`) re-scores retrieved documents for better relevance
- Semantic Search: L2-normalized embeddings for accurate similarity matching
- Streaming Responses: Real-time token streaming via Server-Sent Events (SSE)
- Threshold-based Filtering: Intelligent fallback for low-confidence results
- Source Attribution: Every response includes verifiable source references
- Batch Processing: Optimized embedding generation for large datasets
- Chat History Context: Maintains conversation context for follow-up questions
- Sub-path Deployment: Configurable BASE_PATH for flexible deployment scenarios
- Health Checks: Liveness and readiness probes for orchestration
- Error Recovery: Graceful degradation with retry mechanisms
- Rich Markdown Support: Full GFM (GitHub Flavored Markdown) rendering with tables, code blocks, and more
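The smart-caching feature above pairs LRU eviction with a TTL. A minimal sketch of that idea (illustrative only; the class and parameter names are hypothetical, not the project's actual implementation):

```python
import time
from collections import OrderedDict

class TTLCache:
    """LRU cache whose entries also expire after `ttl` seconds (illustrative sketch)."""

    def __init__(self, max_size=500, ttl=3600):
        self.max_size = max_size
        self.ttl = ttl
        self._store = OrderedDict()  # key -> (value, inserted_at)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, inserted_at = item
        if time.time() - inserted_at > self.ttl:
            del self._store[key]      # entry expired: drop it
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value):
        if key in self._store:
            self._store.move_to_end(key)
        self._store[key] = (value, time.time())
        if len(self._store) > self.max_size:
            self._store.popitem(last=False)  # evict least recently used

cache = TTLCache(max_size=2, ttl=60)
cache.set("q1", "answer 1")
cache.set("q2", "answer 2")
cache.get("q1")              # touching "q1" makes "q2" the LRU entry
cache.set("q3", "answer 3")  # evicts "q2"
```

The `CACHE_MAX_SIZE` and `CACHE_TTL` settings shown later in this README map onto the two knobs of this sketch.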
- Query Processing: User query → Embedding generation → Vector normalization
- Hybrid Retrieval:
- FAISS similarity search (semantic)
- BM25 keyword search
- Reciprocal Rank Fusion (RRF) to combine results
- Re-ranking: CrossEncoder re-scores documents for relevance
- Generation: Context assembly → Prompt construction → LLM streaming inference
- Response: Streaming answer with real-time tokens → Source attribution → Cache storage
- Delivery: Server-Sent Events (SSE) stream with contexts and metadata
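The Reciprocal Rank Fusion step in the pipeline above can be sketched in a few lines. This is an illustrative implementation of the standard RRF formula, not the project's actual code; the document ids and `k=60` are assumptions:

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank_d).

    `rankings` is a list of ranked lists of document ids (best first);
    k=60 is the constant commonly used in the RRF literature.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

faiss_hits = ["d1", "d2", "d3"]  # semantic ranking (best first)
bm25_hits = ["d3", "d1", "d4"]   # keyword ranking (best first)
fused = rrf_fuse([faiss_hits, bm25_hits])
```

Documents that appear in both lists (`d1`, `d3`) accumulate score from each ranking and rise to the top, which is exactly why RRF is a robust way to merge semantic and keyword results without score calibration.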
- FastAPI: Modern, high-performance web framework
- Uvicorn/Gunicorn: ASGI server with worker management
- Pydantic: Data validation and settings management
- Sentence Transformers: Multilingual embedding generation
  - Model: `paraphrase-multilingual-MiniLM-L12-v2`
- FAISS: Efficient vector similarity search (Facebook AI)
- BM25 (Okapi): Keyword-based search using rank-bm25
- CrossEncoder: Document re-ranking for relevance
  - Model: `cross-encoder/ms-marco-MiniLM-L-6-v2`
- Groq: High-performance LLM API
  - Default Model: `openai/gpt-oss-120b`
- NumPy: Numerical operations and vector manipulation
- scikit-learn: Normalization and preprocessing utilities
- rank-bm25: BM25 keyword search implementation
- Python JSON Logger: Structured logging for production
- Docker: Containerization and deployment
- Docker Compose: Multi-service orchestration
- Python dotenv: Environment variable management
- Python: 3.10 or higher (3.12.3+ recommended)
- RAM: Minimum 1.5GB (2GB+ recommended for production)
- Disk Space: ~3GB for models and dependencies
- OS: Linux, macOS, or Windows
- GPU: CUDA-compatible GPU for faster embedding generation
- Docker: Version 20.10+ with Docker Compose
You'll need a Groq API Key for LLM functionality:
- Get your free API key at: https://console.groq.com
Choose the installation method that best fits your needs:
```shell
# Clone the repository
git clone https://github.com/PhucHuwu/ChatBot_Dich_vu_cong.git
cd ChatBot_Dich_vu_cong
```

```shell
# Using venv
python -m venv venv

# Activate on Linux/macOS
source venv/bin/activate

# Activate on Windows
venv\Scripts\activate
```

```shell
# Install production dependencies
pip install -r requirements.txt

# Or for development (includes testing tools)
pip install -r requirements-dev.txt
```

```shell
# Copy example environment file (if available)
cp .env.example .env

# Edit .env with your configuration
# REQUIRED: Set your GROQ_API_KEY
```

```shell
# Build FAISS index from data sources
python -c "from rag import build_index; build_index()"
```

```shell
# Development mode
uvicorn app:app --reload --host 0.0.0.0 --port 8000

# Production mode
gunicorn app:app \
    --workers 4 \
    --worker-class uvicorn.workers.UvicornWorker \
    --bind 0.0.0.0:8000 \
    --timeout 120 \
    --log-level info
```

Recommended for production environments and quick setup:

```shell
# Create .env file with required variables
echo "GROQ_API_KEY=your_api_key_here" > .env

# Build and start services
docker-compose up -d

# View logs
docker-compose logs -f

# Stop services
docker-compose down
```

```shell
# Build image
docker build -t chatbot-dichvucong:latest .

# Run container
docker run -d \
    --name chatbot \
    -p 8000:8000 \
    -e GROQ_API_KEY=your_api_key \
    -v $(pwd)/embeddings:/app/embeddings \
    -v $(pwd)/data:/app/data:ro \
    chatbot-dichvucong:latest
```

```shell
# Install development dependencies
pip install -r requirements-dev.txt

# Run with auto-reload
uvicorn app:app --reload --host 0.0.0.0 --port 8000
```

Create a `.env` file in the project root. See `.env.example` for a complete list of configuration options.
Required configuration:
```shell
# Required - Get your API key from https://console.groq.com
GROQ_API_KEY=your_groq_api_key_here
```

Important optional configurations:
```shell
# Deployment path - Important for sub-path deployments
# Leave empty "" for root deployment: https://domain.com/
# Set to sub-path for nested deployment: https://domain.com/chatbot
# Must be synchronized with frontend/config.js BASE_PATH
BASE_PATH=/chatbot

# CORS - Customize for your domain
ALLOWED_ORIGINS=https://yourdomain.gov.vn,https://api.yourdomain.gov.vn

# Re-ranking - Improve result relevance (default: enabled)
ENABLE_RERANKING=True
RERANKING_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2
RERANKING_TOP_K=5

# Hybrid Search - Combine semantic + keyword search (default: enabled)
ENABLE_HYBRID_SEARCH=True
HYBRID_FUSION_METHOD=rrf
BM25_WEIGHT=0.5
VECTOR_WEIGHT=0.5
```

Other optional configurations (with sensible defaults):
- Application settings (APP_ENV, DEBUG, HOST, PORT, WORKERS)
- LLM configuration (model, temperature, max tokens, timeout, reasoning effort)
- Embedding settings (model, batch size, device)
- Vector search parameters (similarity threshold, top K results)
- Caching, logging, rate limiting, security options
For the complete list of configuration options with detailed explanations, see .env.example.
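As a rough illustration of what `BM25_WEIGHT` and `VECTOR_WEIGHT` control, here is one common way to blend two retrievers' scores: min-max-normalize each score set, then take a weighted sum. This is a sketch of the general technique, not the project's actual fusion code (which defaults to RRF); the score values are made up:

```python
def weighted_fuse(vector_scores, bm25_scores, vector_weight=0.5, bm25_weight=0.5):
    """Blend two {doc_id: score} dicts via min-max normalization + weighted sum (sketch)."""
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        return {d: (s - lo) / span for d, s in scores.items()}

    v, b = normalize(vector_scores), normalize(bm25_scores)
    fused = {d: vector_weight * v.get(d, 0.0) + bm25_weight * b.get(d, 0.0)
             for d in set(v) | set(b)}
    return sorted(fused, key=fused.get, reverse=True)

ranking = weighted_fuse({"d1": 0.9, "d2": 0.7, "d3": 0.1}, {"d2": 3.0, "d1": 1.0})
```

Normalizing first matters because cosine similarities and raw BM25 scores live on very different scales; without it, one retriever's weight would dominate regardless of the configured values.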
The system automatically validates critical configurations on startup (see config.py):
- `GROQ_API_KEY` is set
- Data directory exists
- Debug mode disabled in production
- CORS origins properly configured
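A fail-fast startup check along those lines might look like the following sketch. The exact checks in the real `config.py` may differ; the `DATA_DIR` variable name and the `env` parameter are assumptions for illustration:

```python
import os

def validate_config(env=os.environ):
    """Collect configuration problems and fail fast on startup (illustrative sketch)."""
    errors = []
    if not env.get("GROQ_API_KEY"):
        errors.append("GROQ_API_KEY is required")
    if not os.path.isdir(env.get("DATA_DIR", "data")):
        errors.append("data directory does not exist")
    if env.get("APP_ENV") == "production" and env.get("DEBUG", "False").lower() == "true":
        errors.append("DEBUG must be False in production")
    if env.get("APP_ENV") == "production" and env.get("ALLOWED_ORIGINS", "*") == "*":
        errors.append("ALLOWED_ORIGINS should not be a wildcard in production")
    if errors:
        # Raising here stops the server before it can serve misconfigured traffic
        raise ValueError("Invalid configuration: " + "; ".join(errors))
```

Collecting all errors before raising (rather than failing on the first one) lets operators fix every problem in a single pass.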
- Access the Frontend: `http://localhost:8000/frontend/index.html`
- Interact with the Chatbot:
- Type your question in Vietnamese
- Press Enter or click Send
- View responses with source attributions
- Supports markdown formatting in responses
The API uses Server-Sent Events (SSE) for real-time streaming responses:
```shell
curl -N -X POST "http://localhost:8000/api/chat/stream" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Làm thế nào để đăng ký thường trú?",
    "chat_history": [],
    "conversation_id": "conv-123"
  }'
```

The response streams multiple events via SSE:
1. Metadata Event (sent first):
```json
{
  "type": "metadata",
  "query": "Làm thế nào để đăng ký thường trú?",
  "contexts": [
    {
      "text": "Thủ tục đăng ký thường trú...",
      "type": "guide",
      "category": "Đăng ký cư trú",
      "href": "https://dichvucong.gov.vn/...",
      "title": "Đăng ký thường trú"
    }
  ],
  "sources": [
    {
      "source": "Nguồn 1",
      "type": "guide",
      "title": "Đăng ký thường trú",
      "href": "https://dichvucong.gov.vn/..."
    }
  ]
}
```

2. Content Events (streamed incrementally):
```json
{
  "type": "content",
  "content": "Để đăng ký thường trú, bạn cần..."
}
```

3. Done Event (sent last):
```json
{
  "type": "done",
  "process_time": 1.234,
  "trace_id": "abc123-def456-ghi789",
  "success": true
}
```

4. Error Event (if an error occurs):
```json
{
  "type": "error",
  "error": "Error message",
  "trace_id": "abc123-def456-ghi789",
  "success": false
}
```

When you update data files in the data/ directory (such as faq.json and guide.json):
Note: Data JSON files are excluded from version control but are required for the application to function. Make sure they exist locally.
```shell
# Linux/macOS
./scripts/rebuild_index.sh
```

```shell
# Windows (PowerShell)
.\scripts\rebuild_index.ps1
```

Or rebuild directly from Python:

```python
from rag import build_index
build_index(batch_size=32)
```

| Endpoint | Method | Description | Auth Required |
|---|---|---|---|
| `/` | GET | Frontend homepage | No |
| `/health` | GET | Basic health check | No |
| `/api/status` | GET | Detailed system status | No |
| `/api/chat/stream` | POST | Chat with streaming response (SSE) | No |
| `/api/build` | POST | Rebuild vector index | No |
| `/api/cache/stats` | GET | Get cache statistics | No |
| `/api/cache/clear` | POST | Clear cache | No |
| `/api/suggestions` | GET | Get suggested questions | No |
| `/api/docs` | GET | Interactive API docs (Swagger) | No |
| `/api/redoc` | GET | API documentation (ReDoc) | No |
Most important endpoints:
- `POST /api/chat/stream` - Main chat endpoint with streaming response (SSE)
- `GET /health` - Health check
- `GET /api/status` - System status with cache, device info, re-ranker and hybrid search status
Example Chat Request:
```shell
curl -N -X POST "http://localhost:8000/api/chat/stream" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Thủ tục cấp CMND mất cần gì?",
    "chat_history": [],
    "conversation_id": "conv-123"
  }'
```

The streaming response includes:
- Multiple SSE events with a `data:` prefix
- `metadata` event - Retrieved contexts and sources
- `content` events - Streamed answer tokens
- `done` event - Completion with process time and trace_id
- `error` event - Error details if something fails
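The event stream above can be consumed with a small client-side parser. A sketch using only the standard library; the raw stream here is a hand-written example, not captured server output:

```python
import json

def parse_sse_events(raw_stream):
    """Extract JSON payloads from `data:`-prefixed SSE lines (illustrative sketch)."""
    events = []
    for line in raw_stream.splitlines():
        line = line.strip()
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

raw = (
    'data: {"type": "metadata", "sources": []}\n\n'
    'data: {"type": "content", "content": "Xin "}\n\n'
    'data: {"type": "content", "content": "chào"}\n\n'
    'data: {"type": "done", "success": true}\n\n'
)
events = parse_sse_events(raw)
# Concatenate the content events to recover the full answer
answer = "".join(e["content"] for e in events if e["type"] == "content")
```

A real client would read the HTTP response incrementally rather than splitting a complete string, but the per-event handling is the same.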
When `EXPOSE_DOCS=True`, access:
- Swagger UI: `http://localhost:8000/api/docs`
- ReDoc: `http://localhost:8000/api/redoc`
- Set `APP_ENV=production` in environment variables
- Set `DEBUG=False` in environment variables
- Configure `ALLOWED_ORIGINS` with your actual domain(s)
- Configure `BASE_PATH` if deploying to a sub-path (must sync with frontend/config.js)
- Set `EXPOSE_DOCS=False` for production security
- Use a valid `GROQ_API_KEY` from the Groq console
- Configure re-ranking: `ENABLE_RERANKING=True` (recommended)
- Configure hybrid search: `ENABLE_HYBRID_SEARCH=True` (recommended)
- Set up HTTPS/TLS termination (Nginx/Traefik/Caddy)
- Configure reverse proxy with proper timeouts (at least 60s for streaming)
- Set up log aggregation if needed
- Set up automated backups for the `embeddings/` directory (includes FAISS and BM25 indexes)
- Enable rate limiting with `ENABLE_RATE_LIMIT=True` if needed
- Configure firewall rules for your infrastructure
- Set appropriate resource limits (CPU/Memory) based on load
- Build indexes before deploying: `python -c "from rag import build_index; build_index()"`
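The liveness probe from the checklist can be exercised with a few lines of stdlib Python. The URL and the assumption that `/health` answers with HTTP 200 when healthy come from this README; everything else is an illustrative sketch:

```python
import urllib.request

def check_health(base_url="http://localhost:8000", timeout=5):
    """Return True if GET {base_url}/health answers with HTTP 200 (illustrative sketch)."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused, DNS failure, timeout, or an HTTP error status
        return False
```

The same check works as a Docker `HEALTHCHECK` command or a Kubernetes probe wrapper, since it reduces the endpoint to a boolean.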
```shell
# 1. Build production image
docker build -t chatbot-dichvucong:v1.0.0 .

# 2. Run with production settings
docker run -d \
  --name chatbot-production \
  --restart unless-stopped \
  -p 8000:8000 \
  -e APP_ENV=production \
  -e DEBUG=False \
  -e GROQ_API_KEY=${GROQ_API_KEY} \
  -e WORKERS=2 \
  -e LOG_LEVEL=INFO \
  -e ENABLE_CACHE=True \
  -e CACHE_MAX_SIZE=500 \
  -e CACHE_TTL=3600 \
  -v $(pwd)/embeddings:/app/embeddings \
  -v $(pwd)/data:/app/data:ro \
  --memory="1536M" \
  --cpus="1.0" \
  chatbot-dichvucong:v1.0.0
```

We welcome contributions from the community! Whether you're fixing bugs, adding features, improving documentation, or reporting issues, your help is appreciated.
- Fork and clone the repository
- Create a feature branch: `git checkout -b feature/your-feature-name`
- Make your changes following our code style guidelines
- Write/update tests for your changes
- Commit using Conventional Commits format (e.g., `feat:`, `fix:`, `docs:`)
- Submit a pull request with a clear description
- Style: Follow PEP 8, use Black for formatting
- Testing: Maintain 70%+ test coverage
- Documentation: Add docstrings for public functions
- Type hints: Required for new code
```shell
# Run these checks
pytest    # All tests must pass
black .   # Format code
flake8    # Lint code
```

For comprehensive information on:
- Development setup and environment configuration
- Detailed coding standards and best practices
- Testing guidelines and coverage requirements
- Pull request process and review criteria
- Issue reporting templates
- Community guidelines and communication
Please read our CONTRIBUTING.md guide.
This project adheres to a Code of Conduct. By participating, you are expected to uphold this code. Please report unacceptable behavior through GitHub issues or contact maintainers directly.
Problem:

```
FileNotFoundError: embeddings/faiss_index.bin not found
```

Solution:

```shell
# Rebuild the index
python -c "from rag import build_index; build_index()"
```

Problem:
```
ValueError: GROQ_API_KEY is required
```

Solution:

```shell
# Set API key in .env file
echo "GROQ_API_KEY=your_key_here" >> .env
```

Problem:
```
WARNING: CUDA not available, using CPU
```

Solution:

```shell
# Install CUDA-enabled PyTorch
pip install torch --index-url https://download.pytorch.org/whl/cu118

# Or force CPU mode
echo "EMBEDDING_DEVICE=cpu" >> .env
```

Problem:
```
OSError: [Errno 48] Address already in use
```

Solution:

```shell
# Find and kill process using port 8000
# Linux/macOS:
lsof -ti:8000 | xargs kill -9

# Windows:
netstat -ano | findstr :8000
taskkill /PID <PID> /F

# Or use a different port
uvicorn app:app --port 8001
```

Problem:
```
RuntimeError: CUDA out of memory
```

Solution:

```shell
# Reduce batch size
echo "EMBEDDING_BATCH_SIZE=16" >> .env

# Or use CPU
echo "EMBEDDING_DEVICE=cpu" >> .env
```

Diagnosis:
```shell
# Check logs for timing breakdown
docker logs chatbot | grep "duration_ms"
```

Solutions:
- Enable caching: `ENABLE_CACHE=True`
- Reduce `TOP_K_DEFAULT` to 5-7
- Increase `SIMILARITY_THRESHOLD` to 1.0
- Use GPU: `EMBEDDING_DEVICE=cuda`
- Disable re-ranking if not needed: `ENABLE_RERANKING=False`
- Disable hybrid search if not needed: `ENABLE_HYBRID_SEARCH=False`
- Reduce `INITIAL_RETRIEVAL_MULTIPLIER` to 2 if re-ranking is enabled
Problem:

```
FileNotFoundError: embeddings/bm25_index.pkl not found
```

Solution:

```shell
# Rebuild both FAISS and BM25 indexes
python -c "from rag import build_index; build_index()"

# Or disable hybrid search if not needed
echo "ENABLE_HYBRID_SEARCH=False" >> .env
```

Problem:
```
ERROR: failed to solve: process "/bin/sh -c pip install..." did not complete
```

Solution:

```shell
# Clear Docker cache
docker builder prune -a

# Build with no cache
docker build --no-cache -t chatbot-dichvucong .
```

Enable detailed logging:
```shell
# In .env
DEBUG=True
LOG_LEVEL=DEBUG
```

Run the health-check scripts:

```shell
# Linux/macOS
./scripts/health_check.sh
```

```shell
# Windows (PowerShell)
.\scripts\health_check.ps1
```

See the scripts/ directory for all available scripts.
If you encounter any issues, we're here to help:
- Check Documentation: Review this README and inline code comments
- Search Issues: Check existing GitHub Issues for similar problems
- Enable Debug Logging: Set `LOG_LEVEL=DEBUG` for detailed diagnostics
- Create Issue: If the problem persists, please create a new issue with logs, configuration, and steps to reproduce
This project is licensed under the MIT License - see the LICENSE file for details.
- PhucHuwu - Project Creator and Maintainer
- FastAPI - Web framework
- Sentence Transformers - Embedding models
- FAISS - Vector search by Meta AI
- Groq - LLM inference platform
- Hugging Face - Model hosting
This project was created to improve access to Vietnamese public administrative services through AI-powered assistance.
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- FastAPI Documentation
- FAISS Wiki
- Sentence Transformers Documentation
- Groq API Documentation
- Docker Documentation
Made by Phuc Tran Huu and his friends - ITPTIT