DocMan is an enterprise-grade Retrieval-Augmented Generation (RAG) system built with .NET 9 and React 18 for document management and semantic search.
Users upload documents (PDF, DOCX, TXT, MD); the system extracts their content, splits it into semantic chunks, generates embeddings, and answers search queries using hybrid retrieval (dense vector search plus BM25 keyword search) combined with LLM-generated answers.
- DocMan.API - ASP.NET Core Minimal APIs with JWT authentication
- DocMan.Core - Business logic, CQRS with MediatR, RAG services
- DocMan.Data - Entity Framework Core 9.0 with Azure SQL Server
- DocMan.Model - Entity models and DTOs
- DocMan.UI.React - React with Vite, Chakra UI, responsive design
- Dark/Light theme support
- Mobile-friendly interface
- Azure OpenAI - Embeddings (text-embedding-3-small) & Chat (GPT-4o)
- Semantic Kernel - LLM orchestration framework
- Tiktoken - Token counting for efficient context management
- Upload documents (PDF, DOCX, TXT, Markdown)
- Automatic content extraction and chunking
- Vector embeddings generation (1536 dimensions)
- Document categorization and organization
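The chunking step can be illustrated with a small sketch. This is not DocMan's actual chunker: it uses fixed-size word windows with overlap rather than semantic boundaries, and it counts words as a stand-in for Tiktoken tokens. The function name `chunkText` and its default sizes are assumptions made for illustration.

```typescript
// Hypothetical helper: split text into overlapping fixed-size word windows.
// A real pipeline would count tokens and respect sentence or section boundaries.
function chunkText(text: string, chunkSize = 200, overlap = 40): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  const step = chunkSize - overlap; // how far each window advances

  for (let start = 0; start < words.length; start += step) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= words.length) break;
  }
  return chunks;
}
```

The overlap means a sentence that straddles a window boundary still appears intact in at least one chunk, at the cost of embedding some text twice.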
- Dense Only - Pure vector similarity search
- Sparse Only - BM25 keyword-based search
- Hybrid - Dense + Sparse with RRF fusion
- Hybrid + HyDE - Hypothetical document generation
- Full Pipeline - HyDE + Cross-encoder reranking
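The Hybrid mode's fusion step can be sketched as follows. This is a minimal illustration of Reciprocal Rank Fusion, not DocMan's own code: the function name is hypothetical, and the constant `k = 60` is the value commonly used in the literature, assumed here rather than taken from the project.

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank) for every
// document it contains, and documents are ordered by the summed score.
function reciprocalRankFusion(
  rankings: string[][],
  k = 60 // assumed smoothing constant, not confirmed by the project
): Array<{ id: string; score: number }> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Because RRF uses only ranks, the dense and sparse result lists can be combined without normalizing their raw scores, which live on different scales.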
- BM25 - Probabilistic ranking for keyword search
- RRF - Reciprocal Rank Fusion for result combination
- HyDE - Hypothetical Document Embeddings via LLM
- Cross-Encoder Reranking - Semantic similarity-based reranking
- Token Management - Efficient context window handling
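To make the BM25 entry above concrete, here is a small scoring sketch over pre-tokenized documents. It follows the textbook BM25 formula with a non-negative IDF; Lucene.Net's scorer may differ in details, and the function name and the defaults `k1 = 1.2`, `b = 0.75` are assumptions for illustration.

```typescript
// Textbook BM25 over tokenized documents; returns one score per document.
function bm25Scores(query: string[], docs: string[][], k1 = 1.2, b = 0.75): number[] {
  const n = docs.length;
  const totalLen = docs.reduce((sum, d) => sum + d.length, 0);
  const avgLen = n > 0 && totalLen > 0 ? totalLen / n : 1; // avoid division by zero
  const terms = Array.from(new Set(query)); // score each distinct query term once

  // Document frequency: number of documents containing each term.
  const df = new Map<string, number>();
  for (const term of terms) {
    df.set(term, docs.filter((d) => d.indexOf(term) !== -1).length);
  }

  return docs.map((doc) => {
    let score = 0;
    for (const term of terms) {
      const tf = doc.filter((w) => w === term).length;
      if (tf === 0) continue; // term absent: contributes nothing
      const dfT = df.get(term) ?? 0;
      const idf = Math.log(1 + (n - dfT + 0.5) / (dfT + 0.5));
      const lengthNorm = 1 - b + b * (doc.length / avgLen);
      score += idf * ((tf * (k1 + 1)) / (tf + k1 * lengthNorm));
    }
    return score;
  });
}
```

Rare terms get a larger IDF weight, and the length normalization keeps long documents from winning purely because they contain more words.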
- Real-time search with metrics
- LLM-generated answers with source attribution
- Evaluation mode for comparing retrieval strategies
- Execution time tracking
- JWT authentication with role-based access
- User-scoped document access
- Secure API endpoints
Core Entities:
- Users - Authentication & authorization
- Documents - Document metadata
- DocumentChunks - Semantic chunks with vector embeddings
- Categories - Document organization
Vector Search:
- SQL Server vector columns (float[1536])
- Cosine similarity distance function
- Efficient indexing for fast retrieval
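For readers unfamiliar with the metric, the sketch below shows what cosine similarity and cosine distance compute for two embedding vectors. In DocMan this comparison runs inside the database; the TypeScript functions here are purely illustrative and their names are assumptions.

```typescript
// Cosine similarity: dot product divided by the product of vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same dimension");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vector: define similarity as 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Cosine distance is 1 - similarity, so smaller means more similar.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}
```

With a distance function, nearest-neighbor search orders results ascending, so the closest chunks come first.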
| Layer | Technology |
|---|---|
| Backend | .NET 9, ASP.NET Core, EF Core 9.0 |
| Database | Azure SQL Server with Vector Search |
| Frontend | React 18, Vite, Chakra UI |
| AI/ML | Azure OpenAI, Semantic Kernel |
| Search | Lucene.Net (BM25), Vector DB |
| Architecture | CQRS, Repository Pattern, Unit of Work |
cd DocMan.API
dotnet run
# API runs on http://localhost:5021

cd DocMan.UI.React
npm install
npm run dev
# UI runs on http://localhost:5174

Login credentials:
- Username: john_doe
- Password: Password123!
- User Login → JWT token issued
- Document Upload → Content extracted, chunked, embedded
- BM25 Indexing → Sparse index built automatically
- Search Query → Hybrid retrieval executed
- LLM Generation → Answer synthesized from context
- Response → Answer + sources + metrics returned
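The context assembly behind the "LLM Generation" step can be sketched as a token-budgeted selection of the top-ranked chunks. This is an assumption about how limits such as `MaxInputTokens` and `MaxRelevantChunks` might be applied, not DocMan's actual logic. The helper names are hypothetical, and the 4-characters-per-token estimate is a rough stand-in for a real tokenizer such as Tiktoken.

```typescript
interface RankedChunk {
  text: string;
  score: number;
}

// Rough stand-in for a tokenizer: assume about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Pick the highest-scoring chunks that fit within the token budget,
// up to maxRelevantChunks. reservedTokens covers the prompt and question.
function selectContext(
  chunks: RankedChunk[],
  maxInputTokens: number,
  maxRelevantChunks: number,
  reservedTokens = 0,
  countTokens: (text: string) => number = estimateTokens
): RankedChunk[] {
  const budget = maxInputTokens - reservedTokens;
  const byScore = chunks.slice().sort((a, b) => b.score - a.score);
  const selected: RankedChunk[] = [];
  let used = 0;

  for (const chunk of byScore) {
    if (selected.length >= maxRelevantChunks) break;
    const cost = countTokens(chunk.text);
    if (used + cost > budget) continue; // too large: try the next, smaller chunk
    selected.push(chunk);
    used += cost;
  }
  return selected;
}
```

Skipping an oversized chunk instead of stopping lets smaller, lower-ranked chunks still fill the remaining budget; stopping at the first overflow would be an equally valid policy.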
appsettings.json:
{
  "AzureOpenAI": {
    "Embedding": { "Endpoint": "<endpoint>", "Deployment": "<deployment>", "ModelId": "<model-id>", "ApiKey": "<api-key>" },
    "ChatCompletion": { "Endpoint": "<endpoint>", "Deployment": "<deployment>", "ModelId": "<model-id>", "ApiKey": "<api-key>" }
  },
  "AppSettings": {
    "MaxInputTokens": 16385,
    "MaxOutputTokens": 800,
    "MaxRelevantChunks": 5
  }
}

✅ Complete Implementation:
- Full RAG pipeline with hybrid search
- Advanced retrieval techniques (BM25, RRF, HyDE, Cross-Encoder)
- React UI with search mode selection
- Token-efficient context management
- Evaluation metrics system
🚀 Production Ready - All core features implemented and tested