Personal Document Q&A Assistant with RAG

A local, privacy-focused document Q&A system built with Semantic Kernel and Ollama that uses Retrieval-Augmented Generation (RAG) to answer questions based on your documents.

Features

  • Local & Private: Everything runs locally using Ollama - no data sent to the cloud
  • PDF Support: Load and query PDF documents
  • Code File Support: Query code files (.cs, .js, .ts, .py, and more)
  • Interactive Chat: Simple console-based Q&A interface
  • Source Citations: Answers include references to source documents
  • In-Memory Vector Store: Fast semantic search using embeddings

Prerequisites

  • .NET 10.0 SDK
  • Docker and Docker Compose (for Ollama)

Setup

1. Start Ollama Service

docker-compose up -d

2. Pull Required Models

# Pull embedding model
docker exec -it ollama ollama pull nomic-embed-text

# Pull chat model
docker exec -it ollama ollama pull llama3.2

3. Build and Run

cd SemanticKernelDemo
dotnet run

Usage

Commands

The application supports the following commands:

  • /load <path> - Load documents from a file or folder
  • /ask <question> - Ask a question (or just type your question directly)
  • /clear - Clear all loaded documents from memory
  • /stats - Show statistics about loaded documents
  • /help - Show help message
  • /exit - Exit the application

Example Session

> /load ./Documents
📁 Loading documents from ./Documents...
✓ Loaded research-paper.pdf (5 chunks)
✓ Loaded api-service.cs (3 chunks)
✓ Loaded notes.txt (2 chunks)
✅ Ingestion complete! Total chunks: 10

> How does authentication work?
🔍 Searching... Found 5 relevant chunks
💭 Generating answer...

According to api-service.cs, authentication uses JWT tokens...

Sources:
- api-service.cs (chunk 2, relevance: 0.89)
- research-paper.pdf (page 3, relevance: 0.75)

> /stats
=== Statistics ===
Total chunks in memory: 10
Ollama service: Connected
Available models: nomic-embed-text, llama3.2

> /exit
ℹ Goodbye!

Project Structure

SemanticKernelDemo/
├── Models/                          # Data models
│   ├── DocumentChunk.cs            # Chunk with metadata
│   └── SearchResult.cs             # Search result with score
├── Services/                        # Core services
│   ├── OllamaService.cs            # Ollama API integration
│   ├── ChunkingService.cs          # Text chunking
│   └── DocumentLoaders/            # Document loading strategies
│       ├── IDocumentLoader.cs      # Interface
│       ├── PdfDocumentLoader.cs    # PDF parsing
│       └── CodeDocumentLoader.cs   # Code file loading
├── RAG/                             # RAG components
│   ├── VectorStoreManager.cs       # Vector storage
│   ├── DocumentIngestionPipeline.cs # Ingestion pipeline
│   └── RagOrchestrator.cs          # RAG orchestration
├── UI/                              # User interface
│   └── ConsoleInterface.cs         # Interactive console
├── Program.cs                       # Entry point
├── appsettings.json                # Configuration
└── docker-compose.yml              # Ollama service

Configuration

Edit appsettings.json to customize settings:

{
  "Ollama": {
    "Endpoint": "http://localhost:11434",
    "EmbeddingModel": "nomic-embed-text",
    "ChatModel": "llama3.2"
  },
  "RAG": {
    "ChunkSize": 800,
    "ChunkOverlap": 100,
    "TopKResults": 5,
    "MinRelevanceScore": 0.7
  }
}
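The RAG settings govern how much context is retrieved and how strict the relevance cutoff is. As an illustrative sketch (Python rather than the project's C#; the function name is hypothetical), applying TopKResults and MinRelevanceScore to a list of scored search results might look like:

```python
# Illustrative sketch (not the project's C# code): apply the "RAG" settings
# from appsettings.json to (chunk, score) pairs returned by a vector search.

def filter_results(results, top_k=5, min_score=0.7):
    """Keep at most top_k results whose relevance score meets min_score."""
    kept = [r for r in results if r[1] >= min_score]
    kept.sort(key=lambda r: r[1], reverse=True)
    return kept[:top_k]

results = [("chunk-a", 0.89), ("chunk-b", 0.75), ("chunk-c", 0.55)]
print(filter_results(results))  # chunk-c is dropped: 0.55 < 0.7
```

Raising MinRelevanceScore trims weak matches at the cost of sometimes returning fewer than TopKResults chunks.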

How It Works

RAG Pipeline

  1. Document Ingestion:
    • Load documents (PDF or code files)
    • Split into chunks (800 tokens with 100-token overlap)
    • Generate embeddings using nomic-embed-text
    • Store in the in-memory vector store
  2. Query Processing (RAG):
    • Retrieve: Generate an embedding for the user question
    • Retrieve: Search for the top-5 most similar chunks by semantic similarity
    • Augment: Build context from the retrieved chunks
    • Generate: Send the augmented prompt to the llama3.2 LLM
    • Return the answer with source citations
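The retrieve and augment steps above can be sketched as follows. This is a Python illustration, not the project's C# code: `store` stands in for the in-memory vector store, and the embeddings would come from nomic-embed-text in the real pipeline.

```python
# Illustrative retrieve/augment sketch (not the project's implementation).
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, store, top_k=5):
    """store: list of (chunk_text, embedding). Return top_k most similar chunks."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda s: s[1], reverse=True)
    return scored[:top_k]

def augment(question, chunks):
    """Build the prompt that would be sent to the chat model."""
    context = "\n---\n".join(text for text, _ in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

The generate step then passes the augmented prompt to llama3.2 and attaches the retrieved chunks' filenames and scores as citations.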

Supported File Types

PDFs:

  • .pdf - Parsed using PdfPig library

Code Files:

  • C#: .cs, .csx, .cshtml, .razor
  • JavaScript/TypeScript: .js, .jsx, .ts, .tsx
  • Python: .py, .pyw, .pyx
  • Java: .java
  • C/C++: .c, .cpp, .h, .hpp
  • Go: .go
  • Rust: .rs
  • Ruby: .rb
  • PHP: .php
  • SQL: .sql
  • Shell: .sh, .bash, .zsh
  • Config/Markup: .json, .xml, .yaml, .yml, .toml, .ini, .md, .txt, .csv

Architecture Decisions

  • In-Memory Storage: Semantic Kernel's VolatileMemoryStore for simplicity. Data is lost on restart but easy to reload.
  • Chunking: 800 tokens with 100-token overlap balances context preservation and retrieval precision.
  • Retrieval: Top-5 semantic search provides diverse context without overwhelming the LLM.
  • Models:
    • nomic-embed-text: Optimized for semantic search with 768-dimensional embeddings
    • llama3.2: Good balance of quality and speed for chat
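The chunking trade-off above amounts to a sliding window: each chunk starts size − overlap tokens after the previous one, so adjacent chunks share the overlap and no sentence is cut off without context on at least one side. A minimal Python sketch (not the project's ChunkingService):

```python
# Illustrative sliding-window chunker: with size=800 and overlap=100,
# each new chunk starts 700 tokens after the previous one.

def chunk_tokens(tokens, size=800, overlap=100):
    """Split a token list into overlapping chunks."""
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break
    return chunks
```

For a 2,000-token document this yields three chunks starting at tokens 0, 700, and 1,400, with each pair of neighbors sharing 100 tokens.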

Troubleshooting

Ollama Connection Issues

If you see "Cannot connect to Ollama":

# Check if Ollama is running
docker ps | grep ollama

# Start Ollama
docker-compose up -d

# Check logs
docker logs ollama

Missing Models

If models are not available:

# Pull models manually
docker exec -it ollama ollama pull nomic-embed-text
docker exec -it ollama ollama pull llama3.2

# List available models
docker exec -it ollama ollama list

Build Errors

If you encounter build errors:

# Clean and rebuild
dotnet clean
dotnet restore
dotnet build

Future Enhancements

  • Persistent vector storage (Qdrant, Chroma)
  • Web UI (Blazor/ASP.NET Core)
  • Conversation history
  • Multi-query retrieval
  • Hybrid search (keyword + semantic)
  • Document metadata filtering
  • Export/import vector store
  • Streaming responses

License

This project is for educational and personal use.

Credits

Built with:

  • Semantic Kernel - AI orchestration framework
  • Ollama - local model runtime (nomic-embed-text, llama3.2)
  • PdfPig - PDF text extraction
