# WEG Motor RAG Assistant

Intelligent system for querying technical manuals using Retrieval-Augmented Generation (RAG).

This project implements a complete RAG solution developed for a Machine Learning Engineering challenge. The system lets users upload technical manuals in PDF format and run contextualized queries against them, returning accurate answers based exclusively on the indexed documents.

The WEG Motor RAG Assistant solves the problem of fast, accurate information retrieval across extensive technical documentation. Instead of manually searching through PDFs, users interact with an assistant that:
- 🔍 Performs semantic search across documents using embeddings.
- 🤖 Generates contextualized answers using state-of-the-art LLMs.
- 📚 Cites sources (file and page number) for auditability.
- 🛡️ Prevents hallucinations by rejecting out-of-scope questions.
## Features

- ✅ Document Upload: Index multiple PDF files simultaneously.
- ✅ Smart Processing: Automatic text splitting into chunks with overlap.
- ✅ Vector Search: ChromaDB powered by multilingual embeddings (HuggingFace).
- ✅ Contextual Answers: LLMs with anti-hallucination prompt engineering.
- ✅ Source Citation: Automatic references (file name + page).
- ✅ Resilient Architecture: Primary Gemini → Ollama/Mistral fallback system.
- 🌐 Multilingual Support: Responds in the same language as the query.
- 🔄 Hot-Reload: Update the vector index without restarting the system.
- 📊 Structured Logs: Full request and execution tracing.
- 🐳 Simplified Deployment: Docker Compose setup with a single command.
## Tech Stack

| Layer | Technology | Rationale |
|---|---|---|
| API | FastAPI | High performance and automatic validation (Pydantic). |
| Orchestration | LangChain | Abstraction for multiple LLMs and integrations. |
| Vector Store | ChromaDB | Simplicity combined with local persistence. |
| Embeddings | HuggingFace MiniLM | Efficient and lightweight multilingual model. |
| Primary LLM | Google Gemini 2.5 Flash | Low latency and cost-effectiveness. |
| Fallback LLM | Mistral (Ollama) | Local execution, eliminating external dependencies. |
| Frontend | Streamlit | Rapid chat interface prototyping. |
| Containerization | Docker Compose | Environment isolation and reproducibility. |
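
The resilient primary → fallback design from the table can be wired with LangChain's runnable fallbacks. Below is a minimal sketch of the idea, not the project's actual code; the variable names are assumptions, and the environment variables match the .env example further down:

```python
import os

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_community.llms import Ollama

# Primary: hosted Gemini (low latency, requires GEMINI_API_KEY)
primary = ChatGoogleGenerativeAI(
    model=os.getenv("PRIMARY_MODEL", "gemini-2.5-flash"),
    google_api_key=os.getenv("GEMINI_API_KEY"),
)

# Fallback: local Mistral served by the Ollama container
fallback = Ollama(
    model=os.getenv("FALLBACK_MODEL", "mistral"),
    base_url=os.getenv("OLLAMA_URL", "http://ollama:11434"),
)

# If the Gemini call raises (quota, network, empty key), the same
# input is retried against the local model.
llm = primary.with_fallbacks([fallback])
```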
## Prerequisites

- Docker >= 20.10
- Docker Compose >= 2.0
- Google Gemini API Key (get it here)
## Installation

Clone the repository and set up the environment variables:

```bash
git clone https://github.com/karineyasmin/weg_rag_project
cd weg_rag_project
```

Edit the .env file in the project root:

```env
GEMINI_API_KEY=your_api_key_here
PRIMARY_MODEL=gemini-2.5-flash
FALLBACK_MODEL=mistral
OLLAMA_URL=http://ollama:11434
```

Run all services with a single command:

```bash
docker-compose up --build
```

What happens:
- Builds custom Python images.
- Initializes the Ollama service.
- Automatically downloads the Mistral model.
- Starts the API (port 8000) and Frontend (port 8501).
Once running, the services are available at:

- Frontend: http://localhost:8501
- API Docs: http://localhost:8000/docs
- Ollama API: http://localhost:11434
## API Endpoints

### POST /documents

Description: Indexes technical manuals into the system.

Request:

```bash
curl -X POST "http://localhost:8000/documents" \
  -F "files=@motor_manual.pdf" \
  -F "files=@gearbox_manual.pdf"
```

Response:

```json
{
  "message": "Documents processed successfully",
  "documents_indexed": 2,
  "total_chunks": 347
}
```

### POST /question

Description: Asks questions about the indexed documents.
Request:

```bash
curl -X POST "http://localhost:8000/question" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the nominal power of the W22 motor?"}'
```

Response:

```json
{
  "answer": "The nominal power of the W22 motor ranges from 0.12 to 355 kW, depending on the model.",
  "references": [
    "Source: manual_w22.pdf (Page 12)",
    "Source: manual_w22.pdf (Page 34)"
  ]
}
```

Example questions to try:

- "What is the absorbed power (Pa) of a motor?"
- "What is the formula for calculating torque mentioned in the manual?"
- "What are the requirements for installation in explosive environments?"
- "What is the motor's power consumption?"
- "How to verify insulation resistance?"
- Question: "What is the weather forecast for tomorrow?"
- Answer: "Information not found." (The system rejects questions outside the context of the uploaded documents)
## Architecture

- Ingestion: PDF → PyPDF → RecursiveCharacterTextSplitter → Embeddings → ChromaDB.
- Query: Question → Semantic Search (top-k=3) → Prompt Engineering → LLM → Answer.
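
Condensed into code, the two pipelines look roughly like this sketch. The LangChain component names are real, but the file paths and the multilingual MiniLM checkpoint are assumptions, not necessarily what the project ships:

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma

# Ingestion: PDF -> pages -> overlapping chunks -> embeddings -> ChromaDB
pages = PyPDFLoader("data/temp_uploads/motor_manual.pdf").load()  # one Document per page
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1500, chunk_overlap=300
).split_documents(pages)  # page metadata survives, which enables the citations

store = Chroma(
    embedding_function=HuggingFaceEmbeddings(
        model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
    ),
    persist_directory="data/vector_store",
)
store.add_documents(chunks)

# Query: question -> top-k=3 semantic search -> prompt -> LLM
hits = store.similarity_search("What is the nominal power of the W22 motor?", k=3)
context = "\n\n".join(doc.page_content for doc in hits)
```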
## Quick Test

Index a document:

```bash
curl -X POST "http://localhost:8000/documents" \
  -F "files=@data/test_manual.pdf"
```

Ask a question:

```bash
curl -X POST "http://localhost:8000/question" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the nominal voltage?"}'
```
Follow the API logs in real time:

```bash
docker-compose logs -f api
```

## Configuration

### Adjust chunking

Edit app/services/ingestion.py:
```python
self.splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,   # Increase for larger chunks
    chunk_overlap=300  # Increase overlap
)
```

### Swap the embedding model

Edit app/providers/vector_store.py:
```python
self.embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"  # Alternative model
)
```

### Force the fallback LLM

In the .env file:

```env
GEMINI_API_KEY=""  # Leaving this empty forces the fallback to Mistral
```

## Project Structure

```
rag_project/
├── app/
│ ├── api/ # FastAPI routes
│ ├── config/ # Environment variables
│ ├── models/ # Pydantic schemas
│ ├── providers/ # Integrations (LLM, Vector Store)
│ ├── services/ # Business logic
│ └── utils/ # Logging
├── data/
│ ├── vector_store/ # Persisted vector database
│ └── temp_uploads/ # Temporary PDF uploads
├── app_frontend.py # Streamlit interface
├── docker-compose.yml # Container orchestration
├── Dockerfile # Custom Python image
└── pyproject.toml # Project dependencies
```
## Troubleshooting

**Fallback fails on first run:** Wait approximately 30 seconds for the Mistral model to finish downloading, and check the progress with:

```bash
docker-compose logs ollama-pull-model
```

**Gemini API errors:** Double-check the key in the .env file and restart the containers:

```bash
docker-compose down
docker-compose up --build
```

**First query is slow:** The HuggingFace embedding model is downloaded during first use (~400MB). Please wait for the download to complete.
## License

This project was developed as part of a technical challenge and is available under the MIT License.

## Author

Karine

- 📧 Email: karine.y.ribeiro@gmail.com
- 🔗 LinkedIn: Karine Yasmin Ribeiro

Built with ❤️ using Python and LangChain