A local, privacy-focused document Q&A system built with Semantic Kernel and Ollama that uses Retrieval-Augmented Generation (RAG) to answer questions based on your documents.
- Local & Private: Everything runs locally using Ollama - no data sent to the cloud
- PDF Support: Load and query PDF documents
- Code File Support: Query code files (.cs, .js, .ts, .py, and more)
- Interactive Chat: Simple console-based Q&A interface
- Source Citations: Answers include references to source documents
- In-Memory Vector Store: Fast semantic search using embeddings
- .NET 10.0 SDK
- Docker and Docker Compose (for Ollama)
```bash
# Start Ollama
docker-compose up -d

# Pull embedding model
docker exec -it ollama ollama pull nomic-embed-text

# Pull chat model
docker exec -it ollama ollama pull llama3.2
```

```bash
cd SemanticKernelDemo
dotnet run
```

The application supports the following commands:

- `/load <path>` - Load documents from a file or folder
- `/ask <question>` - Ask a question (or just type your question directly)
- `/clear` - Clear all loaded documents from memory
- `/stats` - Show statistics about loaded documents
- `/help` - Show help message
- `/exit` - Exit the application
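The command routing can be sketched as a small dispatch table: slash-prefixed input maps to a handler, and bare text falls through to `/ask`. This Python sketch is purely illustrative (the project implements this in C# in `ConsoleInterface.cs`; the handler names here are hypothetical):

```python
def dispatch(line, handlers):
    """Route a console line: '/cmd args' goes to a handler, bare text becomes a question."""
    if line.startswith("/"):
        cmd, _, arg = line[1:].partition(" ")
        handler = handlers.get(cmd)
        if handler is None:
            return f"Unknown command: /{cmd}"
        return handler(arg.strip())
    # No slash prefix: treat the whole line as a question
    return handlers["ask"](line)

# Hypothetical handlers standing in for the real load/ask/exit logic
handlers = {
    "load": lambda path: f"loading {path}",
    "ask": lambda q: f"answering: {q}",
    "exit": lambda _: "bye",
}
print(dispatch("/load ./Documents", handlers))
print(dispatch("How does authentication work?", handlers))
```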
```
> /load ./Documents
📁 Loading documents from ./Documents...
✓ Loaded research-paper.pdf (5 chunks)
✓ Loaded api-service.cs (3 chunks)
✓ Loaded notes.txt (2 chunks)
✅ Ingestion complete! Total chunks: 10

> How does authentication work?
🔍 Searching... Found 5 relevant chunks
💭 Generating answer...

According to api-service.cs, authentication uses JWT tokens...

Sources:
- api-service.cs (chunk 2, relevance: 0.89)
- research-paper.pdf (page 3, relevance: 0.75)

> /stats
=== Statistics ===
Total chunks in memory: 10
Ollama service: Connected
Available models: nomic-embed-text, llama3.2

> /exit
ℹ Goodbye!
```
```
SemanticKernelDemo/
├── Models/                          # Data models
│   ├── DocumentChunk.cs             # Chunk with metadata
│   └── SearchResult.cs              # Search result with score
├── Services/                        # Core services
│   ├── OllamaService.cs             # Ollama API integration
│   ├── ChunkingService.cs           # Text chunking
│   └── DocumentLoaders/             # Document loading strategies
│       ├── IDocumentLoader.cs       # Interface
│       ├── PdfDocumentLoader.cs     # PDF parsing
│       └── CodeDocumentLoader.cs    # Code file loading
├── RAG/                             # RAG components
│   ├── VectorStoreManager.cs        # Vector storage
│   ├── DocumentIngestionPipeline.cs # Ingestion pipeline
│   └── RagOrchestrator.cs           # RAG orchestration
├── UI/                              # User interface
│   └── ConsoleInterface.cs          # Interactive console
├── Program.cs                       # Entry point
├── appsettings.json                 # Configuration
└── docker-compose.yml               # Ollama service
```
Edit `appsettings.json` to customize settings:

```json
{
  "Ollama": {
    "Endpoint": "http://localhost:11434",
    "EmbeddingModel": "nomic-embed-text",
    "ChatModel": "llama3.2"
  },
  "RAG": {
    "ChunkSize": 800,
    "ChunkOverlap": 100,
    "TopKResults": 5,
    "MinRelevanceScore": 0.7
  }
}
```
1. Document Ingestion:
   - Load documents (PDF or code files)
   - Split into chunks (800 tokens with 100-token overlap)
   - Generate embeddings using `nomic-embed-text`
   - Store in the in-memory vector store
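The chunking step is a sliding window that advances by `ChunkSize - ChunkOverlap`, so each chunk repeats the tail of the previous one. A simplified Python sketch (the real `ChunkingService` works on tokens; here whitespace-separated words stand in for tokens):

```python
def chunk(words, size=800, overlap=100):
    """Split a word list into overlapping chunks; each window advances by size - overlap."""
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]

words = [f"w{i}" for i in range(2000)]
chunks = chunk(words)
# The last 100 words of chunk 0 are also the first 100 words of chunk 1,
# so a sentence cut at a boundary still appears whole in one of the chunks.
print(len(chunks), chunks[0][-1], chunks[1][0])
```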
2. Query Processing (RAG):
   - Retrieve: Generate an embedding for the user question
   - Retrieve: Search the top-5 most similar chunks using semantic similarity
   - Augment: Build context from the retrieved chunks
   - Generate: Send the augmented prompt to the `llama3.2` LLM
   - Return the answer with source citations
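The augment step amounts to stitching the retrieved chunks into a single grounded prompt before handing it to the LLM. A Python sketch of the idea (the prompt wording and field names are illustrative, not the project's actual template):

```python
def build_prompt(question, results):
    """Assemble a grounded prompt: a context block from retrieved chunks, then the question."""
    context = "\n\n".join(
        f"[{r['source']} (relevance: {r['score']:.2f})]\n{r['text']}" for r in results
    )
    return (
        "Answer the question using only the context below. "
        "Cite the sources you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical retrieval results shaped like the session output above
results = [
    {"source": "api-service.cs", "score": 0.89, "text": "JWT tokens are validated..."},
    {"source": "research-paper.pdf", "score": 0.75, "text": "Authentication flows..."},
]
print(build_prompt("How does authentication work?", results))
```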
PDFs:
- `.pdf` - Parsed using the PdfPig library

Code Files:
- C#: `.cs`, `.csx`, `.cshtml`, `.razor`
- JavaScript/TypeScript: `.js`, `.jsx`, `.ts`, `.tsx`
- Python: `.py`, `.pyw`, `.pyx`
- Java: `.java`
- C/C++: `.c`, `.cpp`, `.h`, `.hpp`
- Go: `.go`
- Rust: `.rs`
- Ruby: `.rb`
- PHP: `.php`
- SQL: `.sql`
- Shell: `.sh`, `.bash`, `.zsh`
- Config/Markup: `.json`, `.xml`, `.yaml`, `.yml`, `.toml`, `.ini`, `.md`, `.txt`, `.csv`
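Loader selection follows the strategy pattern suggested by `DocumentLoaders/`: pick a loader by file extension. A minimal Python sketch of that routing (the extension set is abbreviated from the lists above; returning loader names as strings is purely illustrative):

```python
from pathlib import Path

# Abbreviated extension table; the full project supports the lists above
CODE_EXTS = {".cs", ".csx", ".js", ".jsx", ".ts", ".tsx", ".py", ".java",
             ".c", ".cpp", ".h", ".hpp", ".go", ".rs", ".rb", ".php",
             ".sql", ".sh", ".json", ".yaml", ".yml", ".md", ".txt"}

def pick_loader(path):
    """Return the loader name for a file, or None if the type is unsupported."""
    ext = Path(path).suffix.lower()
    if ext == ".pdf":
        return "PdfDocumentLoader"
    if ext in CODE_EXTS:
        return "CodeDocumentLoader"
    return None

print(pick_loader("Documents/research-paper.pdf"))
print(pick_loader("src/api-service.cs"))
```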
- In-Memory Storage: Semantic Kernel's `VolatileMemoryStore` for simplicity. Data is lost on restart but easy to reload.
- Chunking: 800 tokens with 100-token overlap balances context preservation and retrieval precision.
- Retrieval: Top-5 semantic search provides diverse context without overwhelming the LLM.
- Models:
  - `nomic-embed-text`: Optimized for semantic search with 768-dimensional embeddings
  - `llama3.2`: Good balance of quality and speed for chat
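Relevance scores like the 0.89 and 0.75 in the sample session come from cosine similarity between embedding vectors; anything below `MinRelevanceScore` is discarded. A Python sketch of the scoring and filtering (using tiny 3-dimensional vectors in place of the real 768-dimensional `nomic-embed-text` embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query = [1.0, 0.0, 1.0]
chunks = {"chunk-a": [1.0, 0.1, 0.9], "chunk-b": [0.0, 1.0, 0.0]}

# Score every chunk, sort best-first, then drop results below the threshold
# (mirrors the TopKResults + MinRelevanceScore settings)
hits = sorted(
    ((name, cosine(query, vec)) for name, vec in chunks.items()),
    key=lambda kv: kv[1], reverse=True,
)
hits = [(name, score) for name, score in hits if score >= 0.7]
print(hits)
```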
If you see "Cannot connect to Ollama":

```bash
# Check if Ollama is running
docker ps | grep ollama

# Start Ollama
docker-compose up -d

# Check logs
docker logs ollama
```

If models are not available:

```bash
# Pull models manually
docker exec -it ollama ollama pull nomic-embed-text
docker exec -it ollama ollama pull llama3.2

# List available models
docker exec -it ollama ollama list
```

If you encounter build errors:

```bash
# Clean and rebuild
dotnet clean
dotnet restore
dotnet build
```

- Persistent vector storage (Qdrant, Chroma)
- Web UI (Blazor/ASP.NET Core)
- Conversation history
- Multi-query retrieval
- Hybrid search (keyword + semantic)
- Document metadata filtering
- Export/import vector store
- Streaming responses
This project is for educational and personal use.
Built with:
- Semantic Kernel - AI orchestration framework
- Ollama - Local LLM runtime
- OllamaSharp - .NET client for Ollama
- PdfPig - PDF parsing library