A microservices-based RAG chatbot using LangGraph for workflow orchestration, FAISS for retrieval, and FastAPI for all service endpoints. Features include iterative LLM-driven retrieval, centralized logging with SSE streaming, and text-to-speech capabilities.
Tech Stack: Python 3.13 | FastAPI | LangGraph | FAISS | HuggingFace Transformers | OpenAI | Docker Compose
Services:
- Workflow (port 8000): LangGraph-based orchestration with `/run` and `/tts` endpoints
- Retriever (port 8001): FAISS vector search with multi-store support, document management, and configurable chunking
- LLM (port 8002): ChatOpenAI wrapper for answer generation and retrieval decisions
- Frontend (port 8003): React SPA with TTS, vector store management, and document upload
- Logger (port 8004): Centralized log collection with SSE streaming
- TTS (port 8005): External Kokoro TTS service (optional; `TTS_kokoro`)
```powershell
# Set required environment variables
$env:OPENAI_API_KEY = "your-key-here"
$env:PATH_TO_FAISS_INDEX = "./faiss_Hugging_index"

# Build and run all services
docker compose build --no-cache --pull
docker compose up -d

# Verify health
curl http://localhost:8000/health   # workflow
curl http://localhost:8003          # frontend
```

Access the UI at http://localhost:8003
```
app/
├── langgraph_code/        # Workflow orchestration
│   └── src/
│       ├── workflow.py    # LangGraph state machine
│       ├── nodes.py       # Node implementations
│       ├── wf_api.py      # Main FastAPI app
│       └── tts_api.py     # TTS proxy endpoint
├── llm/                   # LLM service
├── retriever/src/         # Retriever service
│   ├── retriever.py       # FastAPI endpoints
│   ├── crud.py            # Database CRUD operations
│   ├── database.py        # SQLAlchemy models & templates
│   └── faiss_utils.py     # FAISS operations
├── frontend/              # React SPA with TypeScript
└── logger_service/        # Centralized logging
tests/                     # Unit & integration tests
docker-compose.yml         # Service orchestration
```
- User submits question → Workflow invokes LLM to decide next action (retrieve/answer/clarify)
- LLM requests retrieval → Workflow calls FAISS retriever with targeted query
- Documents returned → LLM evaluates if sufficient to answer (iterates up to 5x)
- Final answer generated → Response includes answer, context, and retrieval count
- TTS playback → Optional audio synthesis via speaker button in UI
The LLM maintains a context summary across iterations to track gathered information and avoid redundant retrievals.
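The shape of that loop can be sketched directly with LangGraph. The node names, state fields, and stubbed service calls below are illustrative, not the repo's actual `workflow.py`:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class RAGState(TypedDict):
    question: str
    context_summary: str
    documents: list[str]
    answer: str
    next_action: str
    iterations: int

def decide(state: RAGState) -> dict:
    # In the real service this calls the LLM's /retrieve_or_respond endpoint;
    # here we simply stop once any documents have been gathered.
    return {"next_action": "answer" if state["documents"] else "retrieve"}

def retrieve(state: RAGState) -> dict:
    # The real node POSTs a targeted query to the retriever service and
    # appends the hits; a placeholder chunk stands in for that call.
    docs = state["documents"] + ["<retrieved chunk>"]
    return {"documents": docs, "iterations": state["iterations"] + 1}

def answer(state: RAGState) -> dict:
    # The real node calls the LLM's /generate_answer with the gathered context.
    return {"answer": f"Answer based on {len(state['documents'])} chunk(s)."}

def route(state: RAGState) -> str:
    # Cap retrieval at 5 iterations, mirroring the workflow's limit.
    if state["iterations"] >= 5:
        return "answer"
    return state["next_action"]

graph = StateGraph(RAGState)
graph.add_node("decide", decide)
graph.add_node("retrieve", retrieve)
graph.add_node("answer", answer)
graph.set_entry_point("decide")
graph.add_conditional_edges("decide", route, {"retrieve": "retrieve", "answer": "answer"})
graph.add_edge("retrieve", "decide")
graph.add_edge("answer", END)
app = graph.compile()

result = app.invoke({"question": "What is FAISS?", "context_summary": "",
                     "documents": [], "answer": "", "next_action": "", "iterations": 0})
print(result["answer"])
```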
Workflow Service (port 8000)
- `POST /run` - Execute RAG workflow: `{"question": "...", "k": 3}`
- `POST /tts` - Synthesize speech: `{"text": "...", "voice": "am_onyx", "speed": 1.0}`
- `GET /health`, `GET /ready` - Health checks
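Once the stack is up, these endpoints can be exercised from a script. A sketch using `requests`; the request payloads match the docs above, but the response handling and audio format are assumptions:

```python
import requests

# Execute the RAG workflow with the documented payload.
resp = requests.post(
    "http://localhost:8000/run",
    json={"question": "What does the retriever service do?", "k": 3},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())  # answer, context, and retrieval count per the flow above

# Synthesize speech (requires the optional Kokoro TTS service).
audio = requests.post(
    "http://localhost:8000/tts",
    json={"text": "Hello from the RAG chatbot", "voice": "am_onyx", "speed": 1.0},
    timeout=120,
)
with open("answer.wav", "wb") as f:  # assuming WAV output; depends on the TTS service
    f.write(audio.content)
```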
LLM Service (port 8002)
- `POST /retrieve_or_respond` - Decide next action
- `POST /generate_answer` - Generate final answer
Retriever Service (port 8001)
- `POST /stores/{store_id}/retrieve` - FAISS semantic search
- `GET/POST/PATCH/DELETE /stores` - Vector store CRUD
- `POST /stores/{store_id}/upload` - Upload documents (`.txt`, `.md`)
- `GET/PATCH/DELETE /stores/{store_id}/documents/{doc_id}` - Document management
- `GET/POST/PATCH/DELETE /templates` - Prompt template management
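A typical end-to-end sequence against the retriever might look like the following sketch. The JSON and form field names (`name`, `id`, `file`, `query`, `k`) are assumptions about the schemas, not documented contracts:

```python
import requests

BASE = "http://localhost:8001"

# Create a vector store (field name is illustrative).
store = requests.post(f"{BASE}/stores", json={"name": "docs"}).json()
store_id = store["id"]  # assumes the API returns the new store's id

# Upload a markdown document to the store.
with open("notes.md", "rb") as f:
    requests.post(f"{BASE}/stores/{store_id}/upload", files={"file": f}).raise_for_status()

# Run a semantic search against the store.
hits = requests.post(
    f"{BASE}/stores/{store_id}/retrieve",
    json={"query": "how is chunking configured?", "k": 3},
).json()
print(hits)
```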
Logger Service (port 8004)
- `POST /logs` - Submit logs
- `GET /stream` - SSE log stream
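Logs can be tailed from a script as well as the UI. A minimal SSE consumer using `httpx`; the exact event payload format is whatever the logger emits:

```python
import httpx

# Stream server-sent events from the logger service and print each data line.
with httpx.stream("GET", "http://localhost:8004/stream", timeout=None) as response:
    for line in response.iter_lines():
        if line.startswith("data:"):
            print(line.removeprefix("data:").strip())
```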
Environment Variables:
- `OPENAI_API_KEY` - Required for LLM service
- `PATH_TO_FAISS_INDEX` - Default FAISS index directory (store 1)
- `CHUNK_SIZE`, `CHUNK_OVERLAP` - Document chunking config (defaults: 4000, 800)
- `MODEL_NAME_EMBEDDING` - HuggingFace embedding model (default: sentence-transformers/all-MiniLM-L6-v2)
- `LANGGRAPH_LLM_API_URL` - LLM service URL (default: http://localhost:8002)
- `LANGGRAPH_RETRIEVER_API_URL` - Retriever URL (default: http://localhost:8001)
- `TTS_SERVICE_URL` - TTS service URL (default: http://tts_service:8005)
- `MODEL_NAME_LLM`, `TEMPERATURE_LLM`, `MAX_TOKENS` - LLM configuration
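Inside a service, these variables would typically be read with defaults matching the list above. A minimal sketch; the actual config handling in the repo may differ:

```python
import os

# Defaults mirror the documented values; override via the environment.
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "4000"))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "800"))
EMBEDDING_MODEL = os.getenv("MODEL_NAME_EMBEDDING", "sentence-transformers/all-MiniLM-L6-v2")
LLM_API_URL = os.getenv("LANGGRAPH_LLM_API_URL", "http://localhost:8002")
RETRIEVER_API_URL = os.getenv("LANGGRAPH_RETRIEVER_API_URL", "http://localhost:8001")

# OPENAI_API_KEY has no default: fail fast if it is missing.
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
```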
Run tests:

```
pytest                     # All tests
pytest tests/test_llm.py   # Specific module
pytest --cov=.             # With coverage
```

Local development without Docker:
```
# Install dependencies
pip install -r requirements.txt

# Run services individually
uvicorn app.retriever.src.retriever:app --port 8001
uvicorn app.llm.src.llm_api:app --port 8002
uvicorn app.langgraph_code.src.wf_api:app --port 8000
```

CI/CD:
- GitHub Actions: Automated testing with ruff linting and pytest
- SonarQube Cloud: Code quality and security analysis
- Snyk: Dependency vulnerability scanning
All Docker containers run as a non-root `appuser` for security.
To enable TTS (optional):
- Run the external Kokoro TTS service on port 8005
- Configure Docker networks: add `tts_kokoro_network` to the langgraph service
- Set the `TTS_SERVICE_URL` environment variable
- Click the speaker icons in the UI to play audio
Note: HTTP is used for internal Docker communication as services are network-isolated.
The React frontend was developed with GitHub Copilot as an experiment in AI-accelerated learning. See AI_DEVELOPMENT.md for details on this approach and insights gained.
