This repository showcases a Retrieval-Augmented Generation (RAG) system for interacting with documentation: natural language queries are used to retrieve and summarize the relevant information.
(See `interactive-demo.webm` in the repository for a demo of the interactive query interface.)
- Creates a Qdrant vector database for embeddings from the given CSV file(s)
- The vector database is used for fast similarity search to find relevant documentation
- We use a CSV based on Hugging Face documentation as an example
- Uses OpenAI's embeddings for similarity search and GPT models for high-quality responses
- Provides an interactive interface for querying the documentation using natural language
- Each query retrieves the most relevant documentation snippets for context (see the sketch after this list)
- Answers include source links for reference
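Conceptually, each query is embedded, the closest documentation chunks are fetched from Qdrant, and the chunks plus their source links are handed to the chat model. The sketch below illustrates that flow under stated assumptions: the collection name, payload fields, and prompt wording are illustrative rather than the repository's exact implementation, which lives in `src/rag_doctor/query.py`.

```python
# Illustrative retrieve-then-generate flow; collection name and payload fields are assumptions.
from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()                   # reads OPENAI_API_KEY from the environment
qdrant = QdrantClient(path="qdrant_data")  # assumed location of the local vector database

def answer(question: str, top_k: int = 5) -> str:
    # 1) Embed the question with the same model used to build the database
    embedding = openai_client.embeddings.create(
        model="text-embedding-ada-002", input=question
    ).data[0].embedding

    # 2) Similarity search for the most relevant documentation chunks
    hits = qdrant.search(collection_name="documentation", query_vector=embedding, limit=top_k)

    # 3) Ask the chat model to answer using the chunks and cite their source links
    context = "\n\n".join(f"{hit.payload['content']}\nSource: {hit.payload['link']}" for hit in hits)
    chat = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided documentation and cite the sources."},
            {"role": "user", "content": f"{context}\n\nQuestion: {question}"},
        ],
    )
    return chat.choices[0].message.content
```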
- Valohai account to run the pipelines
- OpenAI account to use their APIs
- Less than $5 in OpenAI credits
If you can't find this project in your Valohai Templates, you can set it up manually:
- Create a new project on Valohai
- Set the project repository to: https://github.com/valohai/rag-doc-example
- Save the settings and click "Fetch Repository"
- Create an OpenAI API key for this project
  - We will need the API key in the next step, so take note of it
- Assign the API key to this project as the `OPENAI_API_KEY` environment variable

You will see "✅ Changes to OPENAI_API_KEY saved" if everything went correctly.
And now you are ready to run the pipelines!
- Navigate to the "Pipelines" tab
- Click the "Create Pipeline" button
- Select the "assistant-pipeline" pipeline template
- Click the "Create pipeline from template" button
- Feel free to look around and finally click the "Create pipeline" button
This will start the pipeline:

Feel free to explore while it runs.
When it finishes, the last step will contain qualitative results to review:

This manual evaluation is a simplified way of validating the quality of the generated
responses. "LLM evals" is a large topic that falls outside the scope of this particular example.
Now you have a mini-pipeline that maintains a RAG vector database and allows you to ask questions about the documentation. You can ask your own questions by creating new executions based on the "do-query" step.
The repository also contains a pipeline, "assistant-pipeline-with-deployment", which deploys the RAG system to an HTTP endpoint after human validation in the "manual-evaluation" pipeline step.
🤩 Show Me!
- Create a Valohai Deployment to specify where the HTTP endpoint should be hosted
  - You can use Valohai Public Cloud and valohai.cloud as the target when trying this out. Make sure to name the deployment `public`.
- Create a pipeline as we did before, but use the "assistant-pipeline-with-deployment" template
- The pipeline will halt in a "⏳️ Pending Approval" state, where you can click the "Approve" button to proceed
- After approval, the pipeline will build and deploy the endpoint
- You can use the "Test Deployment" button to run test queries against the endpoint
This example uses OpenAI for both the embedding and query models.
Either could be changed to a different provider or a local model.
🤩 Show Me!
Changing models inside the OpenAI ecosystem is a matter of changing constants in
`src/rag_doctor/consts.py`:

```python
EMBEDDING_MODEL = "text-embedding-ada-002"
EMBEDDING_LENGTH = 1_536  # the dimensions of a "text-embedding-ada-002" embedding vector

PROMPT_MODEL = "gpt-4o-mini"
PROMPT_MAX_TOKENS = 128_000  # model "context window" from https://platform.openai.com/docs/models
```

Further modifying the chat model involves reimplementing the query logic in
`src/rag_doctor/query.py`.

Similarly, modifying the embedding model is a matter of reimplementing the embedding logic in both
`src/rag_doctor/database.py` and `src/rag_doctor/query.py`.
If you decide to change the embedding model, remember to recreate the vector database.
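As an illustration of what reimplementing the embedding logic could look like, here is a hedged sketch that swaps the OpenAI embeddings for a local sentence-transformers model. The function name and wiring are assumptions rather than the repository's API; the real integration points are `src/rag_doctor/database.py` and `src/rag_doctor/query.py`, and `EMBEDDING_LENGTH` has to match the new model's output dimension.

```python
# Sketch only: replacing the OpenAI embedding calls with a local sentence-transformers model.
# The function name is an assumption; wire it into database.py and query.py as appropriate.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dimensional vectors

EMBEDDING_LENGTH = 384  # must match the model above when recreating the Qdrant collection

def embed_texts(texts: list[str]) -> list[list[float]]:
    # One embedding vector per input text
    return model.encode(texts, normalize_embeddings=True).tolist()
```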
This repository includes a comprehensive evaluation system that measures RAG performance across three key dimensions: retrieval quality, generation accuracy, and operational efficiency.
🤩 Show Me!
Retrieval Metrics:
- Context Coverage: Uses LLM-as-a-judge to assess whether retrieved documents contain the information needed to answer the question correctly
- Response Rate: Percentage of questions that receive valid responses
Generation Metrics:
- Factuality Score: LLM-based evaluation of answer accuracy (1-5 scale)
- Response Quality: Average length and substantive response rate
Operational Metrics:
- Latency: Estimated response time per query
- Cost: Token-based cost estimation for embeddings and LLM calls (see the sketch after this list)
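For a sense of how a token-based cost estimate works, the sketch below multiplies token counts by per-token prices. The prices and the breakdown are illustrative assumptions, not the values used by the evaluation step; check current provider pricing before trusting the output.

```python
# Back-of-the-envelope cost estimate; the prices (USD per 1M tokens) are illustrative assumptions.
PRICE_PER_1M_TOKENS = {
    "embedding_input": 0.10,  # e.g. an OpenAI embedding model
    "prompt_input": 0.15,     # e.g. gpt-4o-mini input tokens
    "prompt_output": 0.60,    # e.g. gpt-4o-mini output tokens
}

def estimate_cost_usd(embedding_tokens: int, input_tokens: int, output_tokens: int) -> float:
    return (
        embedding_tokens / 1_000_000 * PRICE_PER_1M_TOKENS["embedding_input"]
        + input_tokens / 1_000_000 * PRICE_PER_1M_TOKENS["prompt_input"]
        + output_tokens / 1_000_000 * PRICE_PER_1M_TOKENS["prompt_output"]
    )

print(f"Estimated cost: ${estimate_cost_usd(10_000, 50_000, 5_000):.4f}")
```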
Depending on the provider you evaluate, set the corresponding API key for the project:
- For OpenAI models: `OPENAI_API_KEY` (already set up during the initial setup)
- For Anthropic models: add `ANTHROPIC_API_KEY` if using `provider: anthropic`
- For other providers: add the corresponding API key as needed
- Navigate to the "Pipelines" tab and create a new pipeline
- Select the "rag-evaluation-pipeline" template
- Select which model provider you would like to evaluate (default: OpenAI) and the questions to test the knowledge base on.

- The pipeline will generate responses to the evaluation questions and score them against the gold standard answers
The evaluation step produces detailed metrics logged to Valohai's metadata system:
```json
{
    "response_rate": 1.0,
    "context_coverage": 0.85,
    "factuality_score": 4.2,
    "avg_response_length": 841.25,
    "substantive_rate": 0.9,
    "estimated_latency_seconds": 2.041,
    "estimated_cost_usd": 0.0021
}
```

These metrics help you:
- Monitor system performance over time
- Compare different models or configurations
- Validate changes before deploying to production
- Understand cost implications of your RAG system
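Valohai picks these up from JSON printed to standard output during an execution, so extending the evaluation with metrics of your own boils down to printing another JSON object; a minimal sketch:

```python
# Valohai collects JSON printed to stdout as execution metadata.
import json

metrics = {
    "response_rate": 1.0,
    "context_coverage": 0.85,
    "factuality_score": 4.2,
    "my_custom_metric": 0.5,  # any additional metric you want to track
}
print(json.dumps(metrics))
```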
The pipeline includes gold standard questions with ground truth answers for evaluation. You can customize these by:
- Creating your own evaluation dataset with columns `question` and `ground_truth_answer` (see the sketch after this list)
- Updating the `gold_standards` input in the `evaluate-rag` step in `valohai.yaml`
- Modifying the questions in the pipeline configuration
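For example, a minimal custom gold-standard CSV with the expected `question` and `ground_truth_answer` columns could be produced like this (the questions and answers below are placeholders):

```python
# Write a small gold-standard CSV with the columns the evaluate-rag step expects.
import csv

rows = [
    {"question": "How do I load a dataset?",
     "ground_truth_answer": "Use load_dataset() from the datasets library."},
    {"question": "How do I push a model to the Hub?",
     "ground_truth_answer": "Call push_to_hub() on the model or use the huggingface-cli."},
]

with open("gold_standards.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "ground_truth_answer"])
    writer.writeheader()
    writer.writerows(rows)
```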
This evaluation framework follows MLOps best practices, providing the metrics needed to maintain and improve your RAG system in production.
By leveraging the Task feature in Valohai, you can compare different LLM providers (OpenAI vs. Anthropic) side by side to understand their performance characteristics and make an informed decision about which model works best for your use case.
- Navigate to the "Pipelines" tab and create a new pipeline
- Select the "rag-evaluation-pipeline" template
- Select the `generate-responses` node and convert it to a Task

This will automatically create executions for the model providers available in the `provider` parameter.
You can take a look at the input file given to the "embedding" node, create a similar CSV from your own documentation, and replace the input with that CSV.
You can also run the individual pieces locally by following the instructions in the DEVELOPMENT file.