A microservices-based Retrieval-Augmented Generation (RAG) system for lecture content, designed to transcribe, embed, and enable semantic search across lecture recordings. Originally optimized for the NVIDIA Jetson Orin Nano edge device.
This system lets users upload lecture recordings, which are automatically transcribed with speech-to-text, embedded as vectors, and made searchable through RAG-based semantic queries. The architecture follows a microservices pattern with gRPC communication between services.
- UI - Next.js frontend for uploading lectures and querying content
- BFF - Spring Boot backend-for-frontend service that orchestrates communication between microservices
- Lecture Search Service - Rust service that coordinates transcription, embedding, and storage operations
- Qdrant Client Service - Rust service managing vector database operations
- Service Embedding - Python service generating vector embeddings using sentence transformers
- Service Speech-to-Text - Python/Flask service for transcribing audio using Whisper
- Qdrant - Vector database for storing and querying embeddings
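Concretely, the ingestion path these services implement can be sketched as plain Python stubs (every function body here is a hypothetical stand-in for the real gRPC calls to Whisper, the embedding service, and Qdrant, kept stdlib-only to illustrate the data flow):

```python
# Hypothetical in-process sketch of the ingestion pipeline the
# Lecture Search Service coordinates over gRPC. The real system uses
# Whisper, sentence-transformers, and Qdrant; these stubs only show
# how data moves between the services.

def transcribe(audio: bytes) -> str:
    # Stand-in for the speech-to-text service (Whisper).
    return "intro to vectors. dot products. cosine similarity."

def chunk(text: str) -> list[str]:
    # Naive sentence chunking; the real service may chunk differently.
    return [s.strip() for s in text.split(".") if s.strip()]

def embed(chunks: list[str]) -> list[list[float]]:
    # Stand-in for the embedding service; returns toy 2-d vectors.
    return [[float(len(c)), float(c.count(" "))] for c in chunks]

def store(chunks: list[str], vectors: list[list[float]]) -> int:
    # Stand-in for the Qdrant client service; returns points stored.
    return len(list(zip(chunks, vectors)))

def ingest(audio: bytes) -> int:
    # Orchestration as performed by the Lecture Search Service.
    text = transcribe(audio)
    chunks = chunk(text)
    return store(chunks, embed(chunks))
```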
Frontend
- Next.js 16 with React 19
- TypeScript
- Tailwind CSS
- Radix UI components
Backend Services
- BFF: Java 17 with Spring Boot 3.5 and Spring gRPC
- Lecture Search Service: Rust with Tonic (gRPC)
- Qdrant Client Service: Rust with Tonic (gRPC)
ML Services
- Speech-to-Text: primeline/whisper-large-v3-turbo-german
- Embedding: google/embeddinggemma-300m via sentence-transformers
Infrastructure
- Vector Store: Qdrant
- Communication: gRPC with Protocol Buffers
- Containerization: Docker & Docker Compose
- CI/CD: Jenkins
- Docker and Docker Compose
- Hugging Face token (for accessing the embedding model)
- Add your Hugging Face token to the Docker Compose file for the embedding service
- Start all services:

  ```sh
  docker compose up
  ```
The services will be available at:
- UI: Check docker-compose.yml for port mappings
- BFF: Port 40999
- Lecture Search Service: Port 40998
- Qdrant Dashboard: Port 6333
- Qdrant gRPC: Port 6334
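After `docker compose up`, the fixed ports above can be probed with a small stdlib-only helper (the mapping covers only the ports stated in this README; the UI port still has to be read from docker-compose.yml):

```python
import socket

# Host ports listed above; the UI port varies, see docker-compose.yml.
SERVICE_PORTS = {
    "bff": 40999,
    "lecture_search": 40998,
    "qdrant_dashboard": 6333,
    "qdrant_grpc": 6334,
}

def endpoint(service: str, host: str = "localhost") -> str:
    """Build a host:port string for one of the known services."""
    return f"{host}:{SERVICE_PORTS[service]}"

def is_listening(port: int, host: str = "localhost", timeout: float = 1.0) -> bool:
    """Return True if a TCP listener accepts connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `is_listening(SERVICE_PORTS["qdrant_dashboard"])` should return True once the stack is up.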
The repository follows a monorepo structure. Each service has its own build configuration:
UI
```sh
cd ui
npm install
npm run dev
```

BFF
```sh
cd bff
./mvnw spring-boot:run
```

Rust Services
```sh
cd lecture_search_service  # or qdrant-client-service
cargo run
```

Python Services
```sh
cd service_embedding  # or service_speechtotext
pip install -r requirements.txt
python main.py
```

The ML inference services (Whisper speech-to-text and embedding generation) were specifically optimized to run on the NVIDIA Jetson Orin Nano with 8GB RAM. The other services (UI, BFF, Rust services, Qdrant) can run on standard hardware without special considerations.
Running the embedding and Whisper models simultaneously on the Jetson Orin Nano can cause memory pressure, because the 8GB of RAM is shared between the CPU and the integrated GPU.
When running the ML services on the Jetson Orin Nano:
- Monitor GPU/CPU memory usage:

  ```sh
  tegrastats
  ```

- Monitor system memory:

  ```sh
  watch -n 0.5 free -m
  ```

- Check for OOM errors:

  ```sh
  sudo dmesg | tail -20
  ```
If you experience out-of-memory issues, consider increasing swap space. Because the Jetson Orin Nano's integrated GPU shares memory with the CPU, GPU and system memory must be monitored together.
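tegrastats emits one dense status line per sample; pulling out the RAM figure makes it easier to log or alert on. A minimal parser, assuming the usual `RAM used/totalMB` field (the sample line below is illustrative, not captured from a real device):

```python
import re

def parse_ram(tegrastats_line: str) -> tuple[int, int]:
    """Extract (used_mb, total_mb) from a tegrastats field like 'RAM 6500/7620MB'."""
    m = re.search(r"RAM (\d+)/(\d+)MB", tegrastats_line)
    if m is None:
        raise ValueError("no RAM field found in tegrastats output")
    return int(m.group(1)), int(m.group(2))

# Illustrative sample of the tegrastats line format (not real device output).
sample = "RAM 6500/7620MB (lfb 4x2MB) SWAP 512/4096MB CPU [12%@1510]"
used, total = parse_ram(sample)
print(f"RAM: {used}/{total} MB ({used / total:.0%} used)")
```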
- Upload lecture recording through the UI
- Lecture gets transcribed using Whisper
- Transcription is chunked and embedded using the embedding service
- Vector embeddings are stored in Qdrant
- Users can query lectures using semantic search via RAG
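On the query side, semantic search reduces to ranking stored chunk vectors by cosine similarity against the query embedding. Qdrant does this at scale; the stdlib-only sketch below just illustrates the ranking step with toy 3-dimensional vectors (real embeddings come from the embedding service):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "stored" chunk embeddings standing in for Qdrant points.
index = {
    "derivatives measure rates of change": [0.9, 0.1, 0.0],
    "matrices represent linear maps":      [0.1, 0.9, 0.1],
    "the lecture hall is on floor two":    [0.0, 0.1, 0.9],
}

def search(query_vec: list[float], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query vector."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [text for text, _ in ranked[:k]]

print(search([0.8, 0.2, 0.0]))  # a calculus-flavored query vector
```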
The project uses Jenkins for continuous integration with:
- Automated Docker image builds
- Version management
- Multi-service orchestration
- Container registry publishing (GHCR)
lecture-RAG/
├── ui/ # Next.js frontend
├── bff/ # Spring Boot BFF service
├── lecture_search_service/ # Rust lecture coordination service
├── qdrant-client-service/ # Rust Qdrant client wrapper
├── service_embedding/ # Python embedding service
├── service_speechtotext/ # Python Whisper service
├── openai_speechtotext/ # OpenAI-based transcription (legacy)
├── openai_summarize/ # OpenAI-based summarization (legacy)
├── proto/ # Protocol Buffer definitions
├── jenkins/ # Jenkins pipeline scripts
├── docker-compose.yml # Main orchestration
└── docker-compose.jetson.yml # Jetson-specific config
See individual service directories for license information.
Yanik Luis Recke - recke.yanik@outlook.de