
Commit 4870313

Eval updates (#796)

Authored by Mihai Criveti <[email protected]>

* Improve evals

Signed-off-by: Mihai Criveti <[email protected]>
1 parent 0a049a7 commit 4870313

27 files changed: +6290 −82 lines

mcp-servers/python/mcp_eval_server/.env.example

Lines changed: 106 additions & 7 deletions
@@ -1,21 +1,61 @@
 # MCP Evaluation Server Environment Configuration
 # Copy this file to .env and configure your settings
 
+# ═══════════════════════════════════════════════════════════════════════════════
+# LLM Provider Configuration
+# ═══════════════════════════════════════════════════════════════════════════════
+
 # OpenAI Configuration
 OPENAI_API_KEY=sk-your-openai-api-key-here
-# OPENAI_ORG_ID=org-your-organization-id # Optional
+# OPENAI_ORGANIZATION=org-your-organization-id # Optional
+# OPENAI_BASE_URL=https://api.openai.com/v1 # Optional custom endpoint
 
 # Azure OpenAI Configuration
+# AZURE_OPENAI_API_KEY=your-azure-openai-key
 # AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
-# AZURE_OPENAI_KEY=your-azure-openai-key
-# AZURE_OPENAI_API_VERSION=2024-02-01
+# AZURE_OPENAI_API_VERSION=2024-02-15-preview
+# AZURE_DEPLOYMENT_NAME=your-gpt-4-deployment
+
+# Anthropic Configuration
+# ANTHROPIC_API_KEY=sk-ant-your-anthropic-api-key
+
+# AWS Bedrock Configuration
+# AWS_ACCESS_KEY_ID=AKIA...
+# AWS_SECRET_ACCESS_KEY=...
+# AWS_REGION=us-east-1
+
+# OLLAMA Configuration
+# OLLAMA_BASE_URL=http://localhost:11434
 
-# Anthropic Configuration (for future support)
-# ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
+# Google Gemini Configuration
+# GOOGLE_API_KEY=your-google-api-key
+
+# IBM Watsonx.ai Configuration
+# WATSONX_API_KEY=your-watsonx-api-key
+# WATSONX_PROJECT_ID=your-project-id
+# WATSONX_URL=https://us-south.ml.cloud.ibm.com
 
 # Default Judge Model Selection
-DEFAULT_JUDGE_MODEL=gpt-4
-# Alternative options: gpt-3.5-turbo, gpt-4-turbo, gpt-4-azure, rule-based
+DEFAULT_JUDGE_MODEL=gpt-4o-mini
+# Alternative options: claude-4-1-bedrock, gemini-1-5-pro, gemini-1-5-flash, gpt-4, gpt-3.5-turbo,
+# gpt-4-turbo, claude-3-sonnet, claude-3-haiku, claude-3-opus, gpt-4-azure,
+# claude-3-sonnet-bedrock, llama-3-1-70b-watsonx, granite-3-0-8b-watsonx,
+# mixtral-8x7b-watsonx, llama3-8b, mistral-7b, rule-based
+
+# ═══════════════════════════════════════════════════════════════════════════════
+# Custom Configuration Paths
+# ═══════════════════════════════════════════════════════════════════════════════
+
+# Custom model configuration (defaults to built-in config/models.yaml)
+# MCP_EVAL_MODELS_CONFIG=/path/to/custom/models.yaml
+
+# Custom configuration directory (for all config files)
+# MCP_EVAL_CONFIG_DIR=/path/to/custom/config/
+
+# Custom rubrics, benchmarks, prompts (future enhancement)
+# MCP_EVAL_RUBRICS_CONFIG=/path/to/custom/rubrics.yaml
+# MCP_EVAL_BENCHMARKS_CONFIG=/path/to/custom/benchmarks.yaml
+# MCP_EVAL_PROMPTS_CONFIG=/path/to/custom/prompts.yaml
 
 # Cache Configuration
 MCP_EVAL_CACHE_DIR=/app/data/cache
@@ -51,3 +91,62 @@ DEFAULT_CONFIDENCE_THRESHOLD=0.8
 # Security settings
 RATE_LIMIT_REQUESTS=100 # per hour
 RATE_LIMIT_TOKENS=50000 # per hour
+
+# ═══════════════════════════════════════════════════════════════════════════════
+# Provider Setup Examples & Installation Notes
+# ═══════════════════════════════════════════════════════════════════════════════
+
+# 🔧 Installation for Additional Providers:
+#
+# For Anthropic support:
+#   pip install anthropic
+#
+# For AWS Bedrock support:
+#   pip install boto3 botocore
+#
+# For OLLAMA support:
+#   pip install aiohttp
+#   # Also ensure OLLAMA is running: ollama serve
+#
+# For Google Gemini support:
+#   pip install google-generativeai
+#
+# For IBM Watsonx.ai support:
+#   pip install ibm-watsonx-ai
+#
+# For all providers:
+#   pip install -e ".[all]"
+
+# 📝 Provider Configuration Examples:
+#
+# OpenAI (Default - always available):
+#   OPENAI_API_KEY=sk-proj-...
+#
+# Azure OpenAI (Enterprise):
+#   AZURE_OPENAI_API_KEY=...
+#   AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
+#   AZURE_DEPLOYMENT_NAME=gpt-4-deployment
+#
+# Anthropic (Claude models):
+#   ANTHROPIC_API_KEY=sk-ant-api...
+#
+# AWS Bedrock (Claude via AWS):
+#   AWS_ACCESS_KEY_ID=AKIA...
+#   AWS_SECRET_ACCESS_KEY=...
+#   AWS_REGION=us-east-1
+#
+# Google Gemini (Google AI Studio):
+#   GOOGLE_API_KEY=your-google-api-key
+#
+# IBM Watsonx.ai (Enterprise AI):
+#   WATSONX_API_KEY=your-watsonx-api-key
+#   WATSONX_PROJECT_ID=your-project-id
+#   WATSONX_URL=https://us-south.ml.cloud.ibm.com
+#
+# OLLAMA (Local/Self-hosted):
+#   OLLAMA_BASE_URL=http://localhost:11434
+#   # Ensure models are pulled: ollama pull llama3:8b
+
+# 🧪 Validation:
+#   Run: make validate-models
+#   This will test connectivity and show which judges are available
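The updated `.env.example` gates each judge provider behind its own set of environment variables, and `make validate-models` reports which judges are usable. As a rough sketch of that pattern (this is not code from `mcp_eval_server`; the `PROVIDER_ENV_KEYS` mapping and helper names are assumptions inferred from the variables in this file), a provider is considered available only when every variable it requires is set and non-empty:

```python
import os

# Hypothetical mapping (assumption, inferred from .env.example): each
# provider and the environment variables it minimally requires.
PROVIDER_ENV_KEYS = {
    "openai": ["OPENAI_API_KEY"],
    "azure": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "bedrock": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "gemini": ["GOOGLE_API_KEY"],
    "watsonx": ["WATSONX_API_KEY", "WATSONX_PROJECT_ID"],
    "ollama": ["OLLAMA_BASE_URL"],
}

def available_providers(env: dict) -> list:
    """Providers whose required variables are all set and non-empty."""
    return [name for name, keys in PROVIDER_ENV_KEYS.items()
            if all(env.get(k) for k in keys)]

def judge_model(env: dict) -> str:
    """Fall back to the file's default judge when the variable is unset."""
    return env.get("DEFAULT_JUDGE_MODEL") or "gpt-4o-mini"

if __name__ == "__main__":
    # Inspect the current shell environment the same way a validation
    # target like `make validate-models` might.
    print(available_providers(dict(os.environ)))
    print(judge_model(dict(os.environ)))
```

Note that a commented-out line in `.env` leaves the variable unset, so the corresponding judge simply drops out of the available list; the actual server may apply additional connectivity checks beyond this presence test.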
