A local, privacy-focused document Q&A system built with Semantic Kernel and Ollama that uses Retrieval-Augmented Generation (RAG) to answer questions based on your documents.
- Local & Private: Everything runs locally using Ollama - no data sent to the cloud
- PDF Support: Load and query PDF documents
- Code File Support: Query code files (.cs, .js, .ts, .py, and more)
- Interactive Chat: Simple console-based Q&A interface
- Source Citations: Answers include references to source documents
- In-Memory Vector Store: Fast semantic search using embeddings
- .NET 10.0 SDK
- Docker and Docker Compose (for Ollama)
```bash
# Start Ollama
docker-compose up -d

# Pull embedding model
docker exec -it ollama ollama pull nomic-embed-text

# Pull chat model
docker exec -it ollama ollama pull llama3.2
```

```bash
cd SemanticKernelDemo
dotnet run
```

The application supports the following commands:

- `/load <path>` - Load documents from a file or folder
- `/ask <question>` - Ask a question (or just type your question directly)
- `/clear` - Clear all loaded documents from memory
- `/stats` - Show statistics about loaded documents
- `/help` - Show help message
- `/exit` - Exit the application
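The command routing can be sketched as a small dispatch table: slash-prefixed input maps to a handler, and bare text falls through to `/ask`. This Python sketch is purely illustrative (the project implements this in C# in `ConsoleInterface.cs`; the handler names here are hypothetical):

```python
def dispatch(line, handlers):
    """Route a console line: '/cmd args' goes to a handler, bare text becomes a question."""
    if line.startswith("/"):
        cmd, _, arg = line[1:].partition(" ")
        handler = handlers.get(cmd)
        if handler is None:
            return f"Unknown command: /{cmd}"
        return handler(arg.strip())
    # No slash prefix: treat the whole line as a question
    return handlers["ask"](line)

# Hypothetical handlers standing in for the real load/ask/exit logic
handlers = {
    "load": lambda path: f"loading {path}",
    "ask": lambda q: f"answering: {q}",
    "exit": lambda _: "bye",
}
print(dispatch("/load ./Documents", handlers))
print(dispatch("How does authentication work?", handlers))
```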
```
> /load ./Documents
📁 Loading documents from ./Documents...
✓ Loaded research-paper.pdf (5 chunks)
✓ Loaded api-service.cs (3 chunks)
✓ Loaded notes.txt (2 chunks)
✅ Ingestion complete! Total chunks: 10

> How does authentication work?
🔍 Searching... Found 5 relevant chunks
💭 Generating answer...

According to api-service.cs, authentication uses JWT tokens...

Sources:
- api-service.cs (chunk 2, relevance: 0.89)
- research-paper.pdf (page 3, relevance: 0.75)

> /stats
=== Statistics ===
Total chunks in memory: 10
Ollama service: Connected
Available models: nomic-embed-text, llama3.2

> /exit
ℹ Goodbye!
```
```
SemanticKernelDemo/
├── Models/                          # Data models
│   ├── DocumentChunk.cs             # Chunk with metadata
│   └── SearchResult.cs              # Search result with score
├── Services/                        # Core services
│   ├── OllamaService.cs             # Ollama API integration
│   ├── ChunkingService.cs           # Text chunking
│   └── DocumentLoaders/             # Document loading strategies
│       ├── IDocumentLoader.cs       # Interface
│       ├── PdfDocumentLoader.cs     # PDF parsing
│       └── CodeDocumentLoader.cs    # Code file loading
├── RAG/                             # RAG components
│   ├── VectorStoreManager.cs        # Vector storage
│   ├── DocumentIngestionPipeline.cs # Ingestion pipeline
│   └── RagOrchestrator.cs           # RAG orchestration
├── UI/                              # User interface
│   └── ConsoleInterface.cs          # Interactive console
├── Program.cs                       # Entry point
├── appsettings.json                 # Configuration
└── docker-compose.yml               # Ollama service
```
Edit `appsettings.json` to customize settings:

```json
{
  "Ollama": {
    "Endpoint": "http://localhost:11434",
    "EmbeddingModel": "nomic-embed-text",
    "ChatModel": "llama3.2"
  },
  "RAG": {
    "ChunkSize": 800,
    "ChunkOverlap": 100,
    "TopKResults": 5,
    "MinRelevanceScore": 0.7
  }
}
```
1. Document Ingestion:
   - Load documents (PDF or code files)
   - Split into chunks (800 tokens with 100-token overlap)
   - Generate embeddings using `nomic-embed-text`
   - Store in the in-memory vector store
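The chunking step is a sliding window that advances by `ChunkSize - ChunkOverlap`, so each chunk repeats the tail of the previous one. A simplified Python sketch (the real `ChunkingService` works on tokens; here whitespace-separated words stand in for tokens):

```python
def chunk(words, size=800, overlap=100):
    """Split a word list into overlapping chunks; each window advances by size - overlap."""
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]

words = [f"w{i}" for i in range(2000)]
chunks = chunk(words)
# The last 100 words of chunk 0 are also the first 100 words of chunk 1,
# so a sentence cut at a boundary still appears whole in one of the chunks.
print(len(chunks), chunks[0][-1], chunks[1][0])
```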
2. Query Processing (RAG):
   - Retrieve: Generate an embedding for the user question
   - Retrieve: Search the top-5 most similar chunks using semantic similarity
   - Augment: Build context from the retrieved chunks
   - Generate: Send the augmented prompt to the `llama3.2` LLM
   - Return the answer with source citations
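The augment step amounts to stitching the retrieved chunks into a single grounded prompt before handing it to the LLM. A Python sketch of the idea (the prompt wording and field names are illustrative, not the project's actual template):

```python
def build_prompt(question, results):
    """Assemble a grounded prompt: a context block from retrieved chunks, then the question."""
    context = "\n\n".join(
        f"[{r['source']} (relevance: {r['score']:.2f})]\n{r['text']}" for r in results
    )
    return (
        "Answer the question using only the context below. "
        "Cite the sources you used.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Hypothetical retrieval results shaped like the session output above
results = [
    {"source": "api-service.cs", "score": 0.89, "text": "JWT tokens are validated..."},
    {"source": "research-paper.pdf", "score": 0.75, "text": "Authentication flows..."},
]
print(build_prompt("How does authentication work?", results))
```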
PDFs:
- `.pdf` - Parsed using the PdfPig library

Code Files:
- C#: `.cs`, `.csx`, `.cshtml`, `.razor`
- JavaScript/TypeScript: `.js`, `.jsx`, `.ts`, `.tsx`
- Python: `.py`, `.pyw`, `.pyx`
- Java: `.java`
- C/C++: `.c`, `.cpp`, `.h`, `.hpp`
- Go: `.go`
- Rust: `.rs`
- Ruby: `.rb`
- PHP: `.php`
- SQL: `.sql`
- Shell: `.sh`, `.bash`, `.zsh`
- Config/Markup: `.json`, `.xml`, `.yaml`, `.yml`, `.toml`, `.ini`, `.md`, `.txt`, `.csv`
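Loader selection follows the strategy pattern suggested by `DocumentLoaders/`: pick a loader by file extension. A minimal Python sketch of that routing (the extension set is abbreviated from the lists above; returning loader names as strings is purely illustrative):

```python
from pathlib import Path

# Abbreviated extension table; the full project supports the lists above
CODE_EXTS = {".cs", ".csx", ".js", ".jsx", ".ts", ".tsx", ".py", ".java",
             ".c", ".cpp", ".h", ".hpp", ".go", ".rs", ".rb", ".php",
             ".sql", ".sh", ".json", ".yaml", ".yml", ".md", ".txt"}

def pick_loader(path):
    """Return the loader name for a file, or None if the type is unsupported."""
    ext = Path(path).suffix.lower()
    if ext == ".pdf":
        return "PdfDocumentLoader"
    if ext in CODE_EXTS:
        return "CodeDocumentLoader"
    return None

print(pick_loader("Documents/research-paper.pdf"))
print(pick_loader("src/api-service.cs"))
```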
- In-Memory Storage: Semantic Kernel's `VolatileMemoryStore` for simplicity. Data is lost on restart but easy to reload.
- Chunking: 800 tokens with 100-token overlap balances context preservation and retrieval precision.
- Retrieval: Top-5 semantic search provides diverse context without overwhelming the LLM.
- Models:
  - `nomic-embed-text`: Optimized for semantic search with 768-dimensional embeddings
  - `llama3.2`: Good balance of quality and speed for chat
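Relevance scores like the 0.89 and 0.75 in the sample session come from cosine similarity between embedding vectors; anything below `MinRelevanceScore` is discarded. A Python sketch of the scoring and filtering (using tiny 3-dimensional vectors in place of the real 768-dimensional `nomic-embed-text` embeddings):

```python
import math

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

query = [1.0, 0.0, 1.0]
chunks = {"chunk-a": [1.0, 0.1, 0.9], "chunk-b": [0.0, 1.0, 0.0]}

# Score every chunk, sort best-first, then drop results below the threshold
# (mirrors the TopKResults + MinRelevanceScore settings)
hits = sorted(
    ((name, cosine(query, vec)) for name, vec in chunks.items()),
    key=lambda kv: kv[1], reverse=True,
)
hits = [(name, score) for name, score in hits if score >= 0.7]
print(hits)
```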
If you see "Cannot connect to Ollama":

```bash
# Check if Ollama is running
docker ps | grep ollama

# Start Ollama
docker-compose up -d

# Check logs
docker logs ollama
```

If models are not available:

```bash
# Pull models manually
docker exec -it ollama ollama pull nomic-embed-text
docker exec -it ollama ollama pull llama3.2

# List available models
docker exec -it ollama ollama list
```

If you encounter build errors:

```bash
# Clean and rebuild
dotnet clean
dotnet restore
dotnet build
```

- Persistent vector storage (Qdrant, Chroma)
- Web UI (Blazor/ASP.NET Core)
- Conversation history
- Multi-query retrieval
- Hybrid search (keyword + semantic)
- Document metadata filtering
- Export/import vector store
- Streaming responses
This project is for educational and personal use.
Built with:
- Semantic Kernel - AI orchestration framework
- Ollama - Local LLM runtime
- OllamaSharp - .NET client for Ollama
- PdfPig - PDF parsing library