Commit 6595f87

author
beaglebyte
committed
Readme Update
1 parent 322a293 commit 6595f87

1 file changed

+178
-21
lines changed

README.md

Lines changed: 178 additions & 21 deletions
@@ -1,30 +1,187 @@
11
# Spring AI RAG Multi-Topic System
22

3-
A production-ready multi-topic Retrieval-Augmented Generation (RAG) system built with Spring AI, Ollama, and Qdrant.
3+
## Overview
44

5-
## Features
5+
**Spring-AI-Topic-RAG** is a production-ready **Retrieval-Augmented Generation (RAG) system** built with Spring AI, Ollama, and Qdrant. This project demonstrates an advanced implementation of AI-powered document processing and semantic search.
66

7-
**Multi-Topic RAGs**: Separate isolated RAGs for different domains (Pentesting, IoT, Blockchain, Cloud, etc.)
8-
🔍 **Intelligent Retrieval**: Semantic search using vector embeddings
9-
📄 **Document Support**: PDF and Markdown files with automatic metadata extraction
10-
🤖 **Local LLM**: Runs entirely locally using Ollama
11-
🔒 **No External APIs**: All processing happens on your machine
12-
**Fast Indexing**: Efficient vector storage with Qdrant
13-
🚀 **Easy to Extend**: Add new topics with simple configuration
7+
## What You Built
148

15-
## Quick Start
9+
### Core Features:
1610

17-
### Prerequisites
11+
- **Multi-Topic RAG System**: Separate, isolated RAG instances for different domains (Pentesting, IoT, Blockchain, Cloud, etc.)
12+
- **Semantic Search**: Intelligent document retrieval using vector embeddings
13+
- **Document Processing**: Support for PDF and Markdown files with automatic metadata extraction
14+
- **Local LLM Processing**: Runs entirely locally using Ollama (no external API dependencies)
15+
- **Vector Storage**: Efficient semantic search using Qdrant
16+
- **Easy Extensibility**: Simple configuration to add new topics
1817

19-
- Java 17+
20-
- Docker & Docker Compose
21-
- Ollama installed locally
22-
- qdrant installed locally
23-
- Maven
18+
### Technology Stack:
2419

25-
### Installation
20+
- **Framework**: Spring Boot 3.5.8 with Spring AI 1.1.2
21+
- **Language**: Java 21
22+
- **LLM**: Ollama (local language model)
23+
- **Vector Database**: Qdrant
24+
- **Document Processing**: Apache Tika & PDFBox
25+
- **Communication**: gRPC for Qdrant client
26+
- **Build Tool**: Maven
2627

27-
1. **Clone the repository**
28-
```bash
29-
git clone https://github.com/beaglebyte/spring-ai-rag-multi-topic.git
30-
cd spring-ai-rag-multi-topic
28+
---
29+
30+
## Step-by-Step Explanation for Beginners
31+
32+
### **What is RAG (Retrieval-Augmented Generation)?**
33+
34+
RAG is a technique that combines:
35+
36+
1. **Retrieval**: Finding relevant documents from a knowledge base
37+
2. **Augmentation**: Using those documents to enhance the AI's response
38+
3. **Generation**: Creating an answer based on both the user's question and the retrieved documents
39+
40+
Think of it as giving an AI assistant a library of books before asking questions!
41+
42+
### **How This Project Works - 5 Steps**
43+
44+
#### **Step 1: Document Ingestion**
45+
46+
47+
48+
```
49+
Your PDF/Markdown Files
50+
51+
Tika & PDFBox (read files)
52+
53+
Extract Text & Metadata
54+
```
55+
56+
- The system reads your documents and extracts text content
57+
- Metadata (author, creation date, etc.) is automatically captured
58+
- Documents are split into manageable chunks
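
The chunking step can be illustrated with a minimal sketch. It uses fixed-size character windows with a small overlap; the class name `ChunkingSketch`, the method `chunk`, and the window sizes are assumptions made for illustration, and the project itself may split on tokens or sentences instead.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical splitter, not the project's actual one.
final class ChunkingSketch {

    // Splits text into windows of at most `size` characters. Consecutive
    // windows share `overlap` characters so a sentence cut at a boundary
    // still appears whole in at least one chunk when it is short enough.
    static List<String> chunk(String text, int size, int overlap) {
        if (size <= 0 || overlap < 0 || overlap >= size) {
            throw new IllegalArgumentException("need 0 <= overlap < size");
        }
        List<String> chunks = new ArrayList<>();
        int step = size - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + size, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Tiny sizes so the overlap is visible in the printed chunks.
        System.out.println(chunk("abcdefghij", 4, 1));
    }
}
```

Larger chunks keep more context together, while smaller chunks make retrieval more precise; the overlap is a common compromise between the two.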
59+
60+
#### **Step 2: Vector Embedding**
61+
62+
63+
64+
```
65+
Document Text
66+
67+
Ollama (local embedding model)
68+
69+
Convert to Vectors (numerical representation)
70+
```
71+
72+
- Each document chunk is converted into a vector (a list of numbers)
73+
- These vectors capture the semantic meaning of the text
74+
- Similar documents have similar vectors
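
What "similar vectors" means can be made concrete with cosine similarity, one common way to compare embeddings. The sketch below is plain Java with a hypothetical class name `CosineSketch`; which distance metric this project configures in Qdrant is not stated here, so treat cosine as an assumption.

```java
// Illustrative only: cosine similarity between two embedding vectors.
final class CosineSketch {

    // Returns a value in [-1, 1]; values near 1 mean the vectors point in
    // nearly the same direction, which is read as "similar meaning".
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have equal length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // a zero vector has no direction; treat as unrelated
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] x = {1.0, 2.0};
        double[] y = {2.0, 4.0}; // same direction as x, different length
        System.out.println(cosine(x, y));
    }
}
```

Because cosine similarity ignores vector length, two chunks of very different size can still score as close matches if they discuss the same subject.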
75+
76+
#### **Step 3: Vector Storage (Indexing)**
77+
78+
79+
80+
```
81+
Document Vectors
82+
83+
Qdrant Vector Database
84+
85+
Organized & Searchable Index
86+
```
87+
88+
- Vectors are stored in Qdrant for fast retrieval
89+
- Organized by topic to keep domains separate
90+
- Enables quick semantic search
91+
92+
#### **Step 4: Query Processing**
93+
94+
95+
96+
```
97+
User Question
98+
99+
Convert to Vector (same way as documents)
100+
101+
Search Qdrant for Similar Vectors
102+
103+
Retrieve Top Matching Documents
104+
```
105+
106+
- When a user asks a question, it's converted to a vector
107+
- The system finds documents with similar vectors
108+
- Only the top-matching document chunks are retrieved
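
The search step can be sketched as a brute-force top-k lookup over an in-memory index. In the real system Qdrant performs this search with its own index structures; the class `TopKSketch`, the record `Scored`, and the assumption that all vectors are already normalized to unit length (so the dot product equals cosine similarity) are illustrative choices only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: a linear scan standing in for the vector database.
final class TopKSketch {

    // A chunk id paired with its similarity to the query.
    record Scored(String id, double score) {}

    // Dot product; equals cosine similarity when both vectors have length 1.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Scores every stored vector against the query and keeps the k best.
    static List<Scored> topK(Map<String, double[]> index, double[] query, int k) {
        List<Scored> scored = new ArrayList<>();
        for (Map.Entry<String, double[]> entry : index.entrySet()) {
            scored.add(new Scored(entry.getKey(), dot(entry.getValue(), query)));
        }
        // Highest score first.
        scored.sort((x, y) -> Double.compare(y.score(), x.score()));
        return List.copyOf(scored.subList(0, Math.min(k, scored.size())));
    }
}
```

A linear scan like this is fine for a few thousand chunks; a dedicated vector database becomes worthwhile once the collection grows, because it avoids comparing the query against every stored vector.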
109+
110+
#### **Step 5: Response Generation**
111+
112+
113+
114+
```
115+
User Question + Retrieved Documents
116+
117+
Ollama (local LLM)
118+
119+
Generate Intelligent Answer
120+
```
121+
122+
- The LLM reads the retrieved documents
123+
- It generates an answer grounded in those specific documents
124+
- The response can be checked against the retrieved chunks, which makes unsupported claims easier to spot
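
How the question and the retrieved chunks reach the model can be sketched as simple prompt assembly. This is plain Java rather than the Spring AI API, and the class `PromptSketch`, the method `buildPrompt`, and the instruction wording are all hypothetical; the project's actual prompt template may differ.

```java
import java.util.List;

// Illustrative only: builds one prompt string from a question and its context.
final class PromptSketch {

    // Numbers each chunk so the model (and the reader) can refer back to
    // the source passage that supports a given statement.
    static String buildPrompt(String question, List<String> chunks) {
        StringBuilder prompt = new StringBuilder();
        prompt.append("Answer the question using only the context below.\n");
        prompt.append("If the context is insufficient, say so.\n\n");
        for (int i = 0; i < chunks.size(); i++) {
            prompt.append("[").append(i + 1).append("] ")
                  .append(chunks.get(i)).append("\n");
        }
        prompt.append("\nQuestion: ").append(question).append("\n");
        return prompt.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildPrompt(
            "What are cloud security best practices?",
            List.of("Enable multi-factor authentication.", "Encrypt data at rest.")));
    }
}
```

Telling the model to rely only on the supplied context, and to admit when that context is insufficient, is what keeps the answer grounded in the retrieved documents.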
125+
126+
### **Multi-Topic Architecture**
127+
128+
Instead of one large database, you have **separate RAG systems for different domains**:
129+
130+
131+
132+
```
133+
Spring AI Application
134+
├── Pentesting RAG ──→ Pentesting Documents → Pentesting Vector Store
135+
├── IoT RAG ──────────→ IoT Documents → IoT Vector Store
136+
```
137+
138+
**Benefits:**
139+
140+
- Better semantic relevance (avoids mixing unrelated domains)
141+
- Faster searches (smaller databases)
142+
- Easy to manage (add/remove topics independently)
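
Topic isolation can be sketched as a lookup from a topic name to its own vector collection, so a query never searches another domain's data. The class `TopicRoutingSketch` and the collection names in the example (`pentesting_docs`, `iot_docs`) are hypothetical; how this project actually names and configures its per-topic stores is not shown here.

```java
import java.util.Locale;
import java.util.Map;

// Illustrative only: maps each topic to a separate collection name.
final class TopicRoutingSketch {

    private final Map<String, String> collectionsByTopic;

    TopicRoutingSketch(Map<String, String> collectionsByTopic) {
        // Defensive copy so later changes to the caller's map have no effect.
        this.collectionsByTopic = Map.copyOf(collectionsByTopic);
    }

    // Resolves the collection for a topic, ignoring letter case.
    // Unknown topics fail fast instead of silently searching the wrong data.
    String collectionFor(String topic) {
        String name = collectionsByTopic.get(topic.toLowerCase(Locale.ROOT));
        if (name == null) {
            throw new IllegalArgumentException("Unknown topic: " + topic);
        }
        return name;
    }

    public static void main(String[] args) {
        TopicRoutingSketch router = new TopicRoutingSketch(Map.of(
            "pentesting", "pentesting_docs",
            "iot", "iot_docs"));
        System.out.println(router.collectionFor("IoT"));
    }
}
```

With this shape, adding a topic means adding one map entry and indexing its documents, which leaves the other topics untouched.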
143+
144+
### **Getting Started - Prerequisites**
145+
146+
Before running this project, you need:
147+
148+
|Requirement|Purpose|
149+
|---|---|
150+
|**Java 17+**|Run the Spring application|
151+
|**Maven**|Build & manage dependencies|
152+
|**Docker & Docker Compose**|Run containerized services|
153+
|**Ollama**|Local language model (runs AI locally)|
154+
|**Qdrant**|Vector database (stores & searches vectors)|
155+
156+
### **Key Technologies Explained**
157+
158+
|Technology|Role|
159+
|---|---|
160+
|**Spring Boot**|Web framework for the application|
161+
|**Spring AI**|Abstraction layer for AI/ML operations|
162+
|**Ollama**|Runs large language models locally (privacy-first)|
163+
|**Qdrant**|Specialized database for vector similarity search|
164+
|**Apache Tika**|Extracts text from various document formats|
165+
|**PDFBox**|Reads PDF files and metadata|
166+
|**gRPC**|Fast communication protocol between services|
167+
168+
### **Workflow Example**
169+
170+
**Scenario**: You have PDFs about cloud security
171+
172+
1. **Upload PDFs** → System reads and chunks them
173+
2. **Index** → Each chunk gets converted to vectors and stored in Qdrant
174+
3. **User Asks** → "What are cloud security best practices?"
175+
4. **Search** → Finds relevant chunks about cloud security
176+
5. **Generate** → Ollama writes an answer based on those chunks
177+
6. **Return** → User gets an accurate, sourced answer
178+
179+
---
180+
181+
## Why This Project is Useful
182+
183+
✅ **Private**: Everything runs locally, no data sent to external APIs
184+
✅ **Accurate**: Answers are grounded in your actual documents
185+
✅ **Customizable**: Add any topic/domain you need
186+
✅ **Scalable**: Separate topics mean independent scaling
187+
✅ **Modern**: Uses the cutting-edge Spring AI framework
