A production-ready Retrieval-Augmented Generation (RAG) system built with Java 17 and Spring Boot 3.
Note: This is the Java implementation. For the Python version, see legacy Python README.
- Vector Search: In-memory vector store with cosine similarity search
- Document Processing: PDF, TXT, Markdown support with Apache PDFBox
- Multiple LLM Providers: OpenAI GPT-4, Anthropic Claude
- Conversation Memory: Context-aware multi-turn conversations
- Streaming Responses: Server-Sent Events (SSE) for real-time streaming
- Metrics & Monitoring: Prometheus/Grafana integration with Micrometer
- Security: Rate limiting, content moderation, audit logging
- Performance: Response caching, batch processing, performance profiling
- Analytics: Query tracking, usage statistics, popular queries
- Docker Support: Full containerization with Docker Compose
- API Documentation: Interactive OpenAPI/Swagger UI
- Testing: JUnit 5, Mockito, comprehensive test coverage
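
To make the "in-memory vector store with cosine similarity search" feature concrete, here is a minimal brute-force sketch. The class and record names (`CosineSearchSketch`, `Entry`, `Hit`) are hypothetical and are not taken from the project's `InMemoryVectorStore`, which may be organised differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of an in-memory store; not the project's InMemoryVectorStore. */
final class CosineSearchSketch {

    /** A stored chunk ID with its embedding vector. */
    record Entry(String id, double[] vector) {}

    /** A search hit: chunk ID plus its cosine similarity to the query. */
    record Hit(String id, double score) {}

    private final List<Entry> entries = new ArrayList<>();

    void add(String id, double[] vector) {
        entries.add(new Entry(id, vector.clone()));
    }

    /** Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 if either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vector dimensions differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Scores every stored entry against the query and returns the topK highest. */
    List<Hit> search(double[] query, int topK) {
        return entries.stream()
                .map(e -> new Hit(e.id(), cosine(query, e.vector())))
                .sorted(Comparator.comparingDouble(Hit::score).reversed())
                .limit(topK)
                .toList();
    }
}
```

A linear scan like this costs one similarity computation per stored vector for each query, which is usually acceptable for small document sets held in memory.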
- Java 17 or higher
- Maven 3.6+
- OpenAI or Anthropic API key
# 1. Set your API key
export OPENAI_API_KEY="your-api-key-here"
# 2. Build the project
mvn clean install
# 3. Run the application
mvn spring-boot:run
# 1. Add documents to the documents/ folder
cp your-document.pdf documents/
# 2. Index documents
curl -X POST http://localhost:8000/api/v1/index
# 3. Make your first query
curl -X POST http://localhost:8000/api/v1/query \
-H "Content-Type: application/json" \
-d '{"query": "What is retrieval augmented generation?", "topK": 5}'

- API Docs: http://localhost:8000/swagger-ui.html
- Quick Start: guide/java-quick-start.md
- Migration Guide: guide/python-to-java-migration.md
- Full README: README-JAVA.md
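
The curl query from the quick start can also be issued from Java. The sketch below only builds the request with the JDK's `java.net.http` API, assuming the same `/api/v1/query` path and JSON fields (`query`, `topK`) as the curl example. `QueryRequestSketch` is a hypothetical helper, not part of the project.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

/** Hypothetical helper that mirrors the curl query example; it does not send the request. */
final class QueryRequestSketch {

    static HttpRequest build(String baseUrl, String question, int topK) {
        // Escapes only backslashes and double quotes; a JSON library is the safer
        // choice if the question may contain control characters.
        String escaped = question.replace("\\", "\\\\").replace("\"", "\\\"");
        String body = "{\"query\": \"" + escaped + "\", \"topK\": " + topK + "}";
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/api/v1/query"))
                .timeout(Duration.ofSeconds(30)) // assumed timeout, not a project default
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

To send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` while the application is running.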
Edit src/main/resources/application.properties:
# Server
server.port=8000
# LLM Settings
llm.provider=openai
llm.openai.api-key=${OPENAI_API_KEY}
llm.openai.model=gpt-4
llm.temperature=0.7
llm.max-tokens=2000
# Embedding
embedding.model=sentence-transformers/all-mpnet-base-v2
embedding.dimension=768
# Retrieval
retrieval.top-k=10
retrieval.mode=hybrid
# Chunking
chunking.size=800
chunking.overlap=200
# Security
security.rate-limit.enabled=true
security.rate-limit.requests=100

- GET /api/v1/health - Health check
- GET /api/v1/health/detailed - Detailed health with component status
- GET /api/v1/ready - Kubernetes readiness probe
- POST /query - Query the knowledge base
- POST /query/multi-document - Query across multiple specific documents
- POST /stream - Streaming query (Server-Sent Events)
- GET /docs - Interactive API documentation (Swagger UI)

- POST /documents/upload - Upload and index a new document
- POST /index - Index all documents from documents folder

- POST /process/audio - Transcribe audio files
- POST /process/image - Extract text from images using OCR

- GET /conversation/{session_id}/history - Get conversation history
- DELETE /conversation/{session_id} - Clear conversation history

- POST /feedback - Submit user feedback
- GET /feedback/stats - Get feedback statistics

- GET /metrics - Get system metrics (JSON)
- GET /metrics/prometheus - Prometheus metrics endpoint
- WebSocket /ws/{client_id} - WebSocket for real-time streaming
See guide/quick-start.md for full API reference.
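
The `chunking.size=800` and `chunking.overlap=200` settings in `application.properties` describe a sliding window over each document. The sketch below shows one way such a window could work, assuming both values are measured in characters (the README does not state the unit). `ChunkingSketch` is a hypothetical name, and the project's `DocumentChunker` may split on sentence or token boundaries instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-size chunker with overlap; sizes are assumed to be in characters. */
final class ChunkingSketch {

    /** Each chunk starts (size - overlap) characters after the previous one. */
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("Require 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window already reached the end of the text
            }
        }
        return chunks;
    }
}
```

With a size of 800 and an overlap of 200, consecutive chunks share 200 characters, so a sentence that straddles a boundary still appears whole in at least one chunk in most cases.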
The system includes comprehensive monitoring capabilities:
- Prometheus Metrics: Export metrics at /metrics/prometheus
- Health Checks: Basic (/health) and detailed (/health/detailed) health endpoints
- Performance Profiling: Built-in performance profiler for optimization
- Tracing: Distributed tracing support
- Query Analytics: Track query patterns, performance, and usage statistics
With Docker Compose, Prometheus is automatically configured:
# Access Prometheus UI
# http://localhost:9090

For production deployments, see the k8s/ directory for Kubernetes manifests with monitoring configured.
Production-ready Docker Compose configuration is available:
# Use production configuration
docker-compose -f docker/docker-compose.prod.yml up -d

Kubernetes deployment manifests are available in the k8s/ directory:
- Deployment with horizontal pod autoscaling
- Service and Ingress configuration
- ConfigMap and Secrets management
- Persistent volume claims for data storage
- Windows Setup: guide/windows-setup.md (No Make required - Windows-friendly commands)
- Quick Start: guide/quick-start.md
- Docker Setup: guide/docker-setup-instructions.md
- Docker Quick Fix: guide/docker-quick-fix.md (solves installation errors)
- Troubleshooting: guide/troubleshooting.md
RAGh-Tutor/
├── pom.xml                                  # Maven dependencies
├── Dockerfile                               # Docker configuration
├── docker-compose-java.yml                  # Docker Compose setup
├── build.sh / build.bat                     # Build scripts
├── prometheus.yml                           # Prometheus config
│
├── src/
│   ├── main/
│   │   ├── java/com/ragtutor/
│   │   │   ├── RagTutorApplication.java     # Spring Boot entry point
│   │   │   │
│   │   │   ├── config/                      # Configuration classes
│   │   │   │   ├── AppConfig.java
│   │   │   │   ├── LLMConfig.java
│   │   │   │   ├── EmbeddingConfig.java
│   │   │   │   ├── RetrievalConfig.java
│   │   │   │   ├── ChunkingConfig.java
│   │   │   │   ├── MemoryConfig.java
│   │   │   │   ├── AgentConfig.java
│   │   │   │   ├── SecurityConfig.java
│   │   │   │   └── WebConfig.java
│   │   │   │
│   │   │   ├── controller/                  # REST API
│   │   │   │   └── RagController.java
│   │   │   │
│   │   │   ├── service/                     # Business logic
│   │   │   │   ├── QueryService.java
│   │   │   │   ├── DocumentService.java
│   │   │   │   ├── ConversationService.java
│   │   │   │   ├── FeedbackService.java
│   │   │   │   ├── HealthService.java
│   │   │   │   ├── MetricsService.java
│   │   │   │   └── InitializationService.java
│   │   │   │
│   │   │   ├── schemas/                     # DTOs
│   │   │   │   ├── QueryRequest.java
│   │   │   │   ├── QueryResponse.java
│   │   │   │   ├── ChatRequest.java
│   │   │   │   ├── Document.java
│   │   │   │   └── ...
│   │   │   │
│   │   │   ├── retrieval/                   # Vector store
│   │   │   │   └── InMemoryVectorStore.java
│   │   │   │
│   │   │   ├── embedding/                   # Embedding generation
│   │   │   │   └── EmbeddingModelService.java
│   │   │   │
│   │   │   ├── generation/                  # LLM integration
│   │   │   │   └── LLMClient.java
│   │   │   │
│   │   │   ├── chunking/                    # Document chunking
│   │   │   │   └── DocumentChunker.java
│   │   │   │
│   │   │   ├── processing/                  # Document processing
│   │   │   │   └── DocumentLoader.java
│   │   │   │
│   │   │   ├── memory/                      # Conversation management
│   │   │   │   └── ConversationManager.java
│   │   │   │
│   │   │   ├── agents/                      # RAG agent
│   │   │   │   └── RAGAgent.java
│   │   │   │
│   │   │   ├── security/                    # Security components
│   │   │   │   ├── ContentModerator.java
│   │   │   │   ├── AuditLogger.java
│   │   │   │   └── ActionBudgetGuard.java
│   │   │   │
│   │   │   ├── middleware/                  # Middleware
│   │   │   │   └── RateLimiterFilter.java
│   │   │   │
│   │   │   ├── monitoring/                  # Observability
│   │   │   │   ├── PerformanceProfiler.java
│   │   │   │   └── TracingService.java
│   │   │   │
│   │   │   ├── performance/                 # Performance optimization
│   │   │   │   └── ResponseCache.java
│   │   │   │
│   │   │   ├── features/                    # Advanced features
│   │   │   │   └── QueryAnalytics.java
│   │   │   │
│   │   │   ├── utils/                       # Utilities
│   │   │   │   ├── TextUtils.java
│   │   │   │   └── FileUtils.java
│   │   │   │
│   │   │   ├── exception/                   # Exception handling
│   │   │   │   └── GlobalExceptionHandler.java
│   │   │   │
│   │   │   └── listener/                    # Event listeners
│   │   │       └── ApplicationStartupListener.java
│   │   │
│   │   └── resources/
│   │       └── application.properties       # Spring configuration
│   │
│   └── test/
│       └── java/com/ragtutor/               # JUnit tests
│           ├── RagTutorApplicationTests.java
│           ├── HealthServiceTest.java
│           └── DocumentChunkerTest.java
│
├── docker/                                  # Docker configurations
├── k8s/                                     # Kubernetes manifests
├── guide/                                   # Documentation
├── data/                                    # Data storage
│   ├── embeddings/
│   ├── cache/
│   └── feedback/
├── documents/                               # Document upload directory
└── logs/                                    # Application logs
## Testing
```bash
# Run all tests
mvn test
# Run with coverage
mvn clean test jacoco:report
# Run integration tests
mvn verify
# View coverage report
open target/site/jacoco/index.html
```
# Build production image
docker build -t rag-tutor:latest .
# Run with environment variables
docker run -d -p 8000:8000 \
-e OPENAI_API_KEY="$OPENAI_API_KEY" \
-v "$(pwd)/documents:/app/documents" \
-v "$(pwd)/data:/app/data" \
rag-tutor:latest

Kubernetes manifests are available in the k8s/ directory:
- Deployment with horizontal pod autoscaling
- Service and Ingress configuration
- ConfigMap and Secrets management
- Persistent volume claims
kubectl apply -f k8s/

# Start with monitoring stack
docker-compose -f docker-compose-java.yml up -d
# Access dashboards
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)
- Metrics: http://localhost:8000/actuator/prometheus

- Query latency and throughput
- Retrieval performance
- LLM generation time
- Cache hit rates
- Error rates by type
- JVM metrics (heap, GC, threads)
- Rate Limiting: Token bucket algorithm (100 req/min default)
- Content Moderation: Filters inappropriate content
- Audit Logging: Complete audit trail of operations
- Action Budget: Prevents abuse with session limits
- Input Validation: Bean Validation on all inputs
- CORS Configuration: Configurable cross-origin policies
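
As an illustration of the token bucket algorithm named in the rate-limiting item, here is a minimal sketch using the 100 requests per minute default. The class name `TokenBucketSketch` and the caller-supplied clock are assumptions made for the example. The project's `RateLimiterFilter` may track buckets per client and differ in other details.

```java
/** Hypothetical token bucket; the caller supplies the clock so behaviour is deterministic. */
final class TokenBucketSketch {
    private static final double NANOS_PER_MINUTE = 60_000_000_000.0;

    private final double capacity;
    private final double tokensPerNano;
    private double tokens;
    private long lastNanos;

    TokenBucketSketch(int requestsPerMinute, long nowNanos) {
        this.capacity = requestsPerMinute;
        this.tokensPerNano = requestsPerMinute / NANOS_PER_MINUTE;
        this.tokens = requestsPerMinute; // start full so an initial burst is allowed
        this.lastNanos = nowNanos;
    }

    /** Refills according to elapsed time, then spends one token if one is available. */
    synchronized boolean tryAcquire(long nowNanos) {
        long elapsed = Math.max(0, nowNanos - lastNanos);
        tokens = Math.min(capacity, tokens + elapsed * tokensPerNano);
        lastNanos = nowNanos;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

In a servlet filter, a call such as `bucket.tryAcquire(System.nanoTime())` would gate each request, with a rejected call typically mapped to HTTP 429.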
- Response Caching: Caffeine cache for frequent queries
- Batch Processing: Efficient batch embedding generation
- Connection Pooling: HTTP client connection reuse
- Async Operations: CompletableFuture for parallel processing
- Performance Profiling: Detailed timing metrics
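
The "CompletableFuture for parallel processing" item can be sketched as a small fan-out/fan-in helper. `ParallelSketch.mapInParallel` is a hypothetical name, and the task passed in stands for whatever per-item work (for example, embedding one chunk) the application performs.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

/** Hypothetical helper: runs a task per input on a thread pool and collects results in order. */
final class ParallelSketch {

    static <T, R> List<R> mapInParallel(List<T> inputs, Function<T, R> task, ExecutorService pool) {
        // Submit every task first so they can run concurrently.
        List<CompletableFuture<R>> futures = inputs.stream()
                .map(in -> CompletableFuture.supplyAsync(() -> task.apply(in), pool))
                .toList();
        // join() blocks until each future completes; a failed task surfaces as CompletionException.
        return futures.stream().map(CompletableFuture::join).toList();
    }
}
```

Collecting the futures into a list before joining matters: joining inside the first stream would wait for each task before submitting the next, which serialises the work.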
Migrating from the Python version? See the Python to Java Migration Guide (guide/python-to-java-migration.md).
Key Differences:
- FastAPI → Spring Boot
- asyncio → CompletableFuture
- Pydantic → Lombok + Bean Validation
- FAISS → In-memory vector store
- Port: Same (8000)
- API: Compatible endpoints
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
MIT License - See LICENSE file for details
- Documentation
- Issue Tracker
- Discussions
- Original Python implementation
- Spring Boot framework
- LangChain4j library
- Apache PDFBox
- OpenAI & Anthropic
Built with Java 17
