14 changes: 14 additions & 0 deletions pyproject.toml
@@ -21,6 +21,11 @@ dependencies = [
"langfuse==3.10.6",
"instructor>=1.13.0",
"jsonschema>=4.17.3",
"ragas>=0.4.3",
"litellm>=1.81.12",
Comment on lines +24 to +25

⚠️ Potential issue | 🟠 Major

Evaluation-only dependencies added to the main project's runtime dependencies.

ragas and litellm are only used by the rag_eval/ evaluation package, not by the MCP server itself. Adding them as top-level runtime dependencies unnecessarily increases the install footprint for all deployments (including production Lambda/container).

Move these to an optional dependency group:

Suggested fix
     "instructor>=1.13.0",
     "jsonschema>=4.17.3",
-    "ragas>=0.4.3",
-    "litellm>=1.81.12",
 ]

 [project.optional-dependencies]
+eval = [
+    "ragas>=0.4.3",
+    "litellm>=1.81.12",
+]
 dev = [
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyproject.toml` around lines 24 - 25, The dependencies "ragas" and "litellm"
are evaluation-only and should be removed from the top-level runtime
dependencies in pyproject.toml; instead add them under an optional dependency
group (e.g., project.optional-dependencies or tool.poetry.extras) as a
"rag_eval" or "evaluation" extra so only users who need the rag_eval/ package
install them; update references to the dependency names "ragas" and "litellm"
accordingly and ensure the main MCP runtime dependency list no longer contains
these two entries.
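With the dependencies moved into an extra as suggested, consumers would opt in at install time. A sketch of the resulting install commands (assuming the extra is named `eval` as in the suggested fix):

```shell
# Runtime dependencies only (no ragas/litellm)
uv sync

# Include the evaluation extra
uv sync --extra eval

# Equivalent with pip
pip install -e ".[eval]"
```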

"nest-asyncio>=1.6.0",
"boto3>=1.28.0",
"mlflow>=3.1.4",
]

[project.optional-dependencies]
@@ -128,3 +133,12 @@ quote-style = "double"
indent-style = "space"
skip-magic-trailing-comma = false
line-ending = "auto"

[dependency-groups]
dev = [
"pytest>=8.4.2",
"pytest-asyncio>=1.2.0",
"pytest-cov>=7.0.0",
"pytest-mock>=3.14.0",
"responses>=0.25.3",
]
94 changes: 94 additions & 0 deletions rag_eval/README.md
@@ -0,0 +1,94 @@
# RAG Evaluation

Evaluate a RAG (Retrieval-Augmented Generation) system with custom metrics.

## Quick Start

### 1. Set Your API Key

Choose your LLM provider:

```bash
# OpenAI (default)
export OPENAI_API_KEY="your-openai-key"

# Or use Anthropic Claude
export ANTHROPIC_API_KEY="your-anthropic-key"

# Or use Google Gemini
export GOOGLE_API_KEY="your-google-key"
```
Comment on lines +7 to +20

⚠️ Potential issue | 🟡 Minor

Quick Start API key instructions are misleading — code uses AWS Bedrock, not OpenAI/Anthropic/Google.

The actual implementation in ragas_utils.py uses bedrock/amazon.nova-pro-v1:0 via litellm, not OpenAI or Anthropic directly. This section should document AWS credential setup (e.g., AWS_DEFAULT_REGION, AWS_ACCESS_KEY_ID, or AWS profile configuration) instead.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rag_eval/README.md` around lines 7 - 20, The quick-start API key instructions
are incorrect because the code (ragas_utils.py) uses AWS Bedrock via litellm
with model "bedrock/amazon.nova-pro-v1:0" rather than OpenAI/Anthropic/Google;
update the README section to instruct users to configure AWS credentials (e.g.,
AWS_DEFAULT_REGION, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY or use an AWS CLI
profile) and mention any litellm-specific env vars or credential providers
required for Bedrock access, and replace the OpenAI/ANTHROPIC/GOOGLE examples
with Bedrock-specific setup and a short note referencing ragas_utils.py and the
Bedrock model string so readers know which provider is actually used.
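As a sketch of what the corrected Quick Start could show instead (key values and region are placeholders; any standard AWS credential source, such as a configured CLI profile, would work equally well):

```shell
# Credentials for Bedrock, picked up by litellm/boto3
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"  # a region where Bedrock access is enabled
```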


### 2. Install Dependencies

Using `uv` (recommended):

```bash
uv sync
```

Or using `pip`:

```bash
pip install -e .
```

### 3. Run the Evaluation

Using `uv`:

```bash
uv run python evals.py
```

Or using `pip`:

```bash
python evals.py
```

## Project Structure

```
rag_eval/
├── README.md           # This file
├── pyproject.toml      # Project configuration
├── rag.py              # Your RAG application code
├── evals.py            # Evaluation workflow
├── __init__.py         # Makes this a Python package
└── evals/              # Evaluation-related data
    ├── datasets/       # Test datasets
    ├── experiments/    # Experiment results
    └── logs/           # Evaluation logs and traces
```

Contributor comment on the directory listing:

> These directory listings are nice but sort of hard to maintain. It looks like these are already out of date 😅
Comment on lines +52 to +63

⚠️ Potential issue | 🟡 Minor

Add a language specifier to the fenced code block and correct the project structure.

The code block at Line 52 is missing a language specifier (flagged by markdownlint MD040). Also, rag.py is listed in the structure but does not exist in this PR — should this be mcp_client.py or another file?

Suggested fix

````diff
-```
+```text
 rag_eval/
 ├── README.md           # This file
 ├── pyproject.toml      # Project configuration
-├── rag.py              # Your RAG application code
+├── mcp_client.py       # MCP client for RAG queries
+├── models.py           # Pydantic models for evaluation
+├── ragas_utils.py      # Ragas metrics utilities
 ├── evals.py            # Evaluation workflow
 ├── __init__.py         # Makes this a Python package
 └── evals/              # Evaluation-related data
     ├── datasets/       # Test datasets
     ├── experiments/    # Experiment results
     └── logs/           # Evaluation logs and traces
-```
+```
````
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 52-52: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rag_eval/README.md` around lines 52 - 63, Update the README's directory tree
fenced code block to include a language specifier (e.g., ```text) and correct
the file list: replace the non-existent rag.py entry with the actual files
present in the PR (mcp_client.py, models.py, ragas_utils.py) so the tree
accurately reflects the package; ensure the closing fence remains and the
comments (e.g., "# MCP client for RAG queries") match the new filenames to help
readers locate mcp_client.py, models.py, and ragas_utils.py.


## Customization

### Modify the LLM Provider

In `evals.py`, update the LLM configuration:

```python
from ragas.llms import llm_factory

# Use Anthropic Claude
llm = llm_factory("claude-3-5-sonnet-20241022", provider="anthropic")

# Use Google Gemini
llm = llm_factory("gemini-1.5-pro", provider="google")

# Use local Ollama
llm = llm_factory("mistral", provider="ollama", base_url="http://localhost:11434")
```

### Customize Test Cases

Edit the `load_dataset()` function in `evals.py` to add or modify test cases.
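For illustration only — a minimal sketch of what a `load_dataset()` entry might look like (the field names below are assumptions, not the actual schema used in `evals.py`):

```python
def load_dataset() -> list[dict]:
    # Each test case pairs a user question with a reference answer
    # that the RAG system's output is graded against.
    return [
        {
            "question": "Which datasets cover global sea surface temperature?",
            "reference": "Collections such as MODIS and VIIRS SST products.",
        },
        {
            "question": "How do I find granules for a given bounding box?",
            "reference": "Use the granule search with a spatial bounding-box filter.",
        },
    ]
```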

### Change Evaluation Metrics

Update the `my_metric` definition in `evals.py` to use different grading criteria.

## Documentation

Visit https://docs.ragas.io for more information.
3 changes: 3 additions & 0 deletions rag_eval/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
"""RAG evaluation package for Earthdata MCP server."""

__version__ = "1.0.0"
Comment on lines +1 to +3

⚠️ Potential issue | 🟡 Minor

Version mismatch: __version__ is "1.0.0" but pyproject.toml declares "0.1.0".

These should be consistent. Since this is a new package, 0.1.0 is the more appropriate value.

Suggested fix
-__version__ = "1.0.0"
+__version__ = "0.1.0"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@rag_eval/__init__.py` around lines 1 - 3, The package version constant is
inconsistent: update the __version__ variable in rag_eval.__init__ to match
pyproject.toml by changing __version__ = "1.0.0" to __version__ = "0.1.0";
verify the value matches pyproject.toml and keep them synchronized going forward
(update whichever file should be authoritative if different).
