Commit 83bc689

752 mcp eval server (#788)
* MCP Eval server

Signed-off-by: Mihai Criveti <[email protected]>
1 parent 6070767 commit 83bc689

42 files changed: +9556 -16 lines changed

Makefile

Lines changed: 1 addition & 1 deletion
@@ -527,7 +527,7 @@ images:
 # help: vulture - Dead code detection

 # Allow specific file/directory targeting
-DEFAULT_TARGETS := mcpgateway
+DEFAULT_TARGETS := mcpgateway mcp-servers/python
 TARGET ?= $(DEFAULT_TARGETS)

 # Add dummy targets for file arguments passed to lint commands only

docs/docs/using/tool-annotations.md

Lines changed: 1 addition & 1 deletion
@@ -254,4 +254,4 @@ Many MCP clients use annotations to:
 - **Cache results** from read-only tools
 - **Adjust UI presentation** based on safety hints

-Properly annotated tools provide better user experiences and safer AI agent interactions.
+Properly annotated tools provide better user experiences and safer AI agent interactions.
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+# MCP Evaluation Server Environment Configuration
+# Copy this file to .env and configure your settings
+
+# OpenAI Configuration
+OPENAI_API_KEY=sk-your-openai-api-key-here
+# OPENAI_ORG_ID=org-your-organization-id # Optional
+
+# Azure OpenAI Configuration
+# AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com/
+# AZURE_OPENAI_KEY=your-azure-openai-key
+# AZURE_OPENAI_API_VERSION=2024-02-01
+
+# Anthropic Configuration (for future support)
+# ANTHROPIC_API_KEY=sk-ant-your-anthropic-key
+
+# Default Judge Model Selection
+DEFAULT_JUDGE_MODEL=gpt-4
+# Alternative options: gpt-3.5-turbo, gpt-4-turbo, gpt-4-azure, rule-based
+
+# Cache Configuration
+MCP_EVAL_CACHE_DIR=/app/data/cache
+MCP_EVAL_CACHE_TTL=3600 # Cache TTL in seconds (1 hour)
+MCP_EVAL_CACHE_SIZE=1000 # Maximum cached items
+
+# Database Configuration
+MCP_EVAL_RESULTS_DB=/app/data/results/evaluation_results.db
+
+# Logging Configuration
+LOG_LEVEL=INFO
+PYTHONUNBUFFERED=1
+
+# Performance Configuration
+MAX_CONCURRENT_EVALUATIONS=3
+EVALUATION_TIMEOUT=300 # seconds
+
+# Development Configuration
+# DEVELOPMENT_MODE=true # Enable for development features
+
+# Model-specific settings
+GPT4_TEMPERATURE=0.3
+GPT4_MAX_TOKENS=2000
+GPT35_TEMPERATURE=0.2
+GPT35_MAX_TOKENS=2000
+
+# Evaluation defaults
+DEFAULT_CONSISTENCY_RUNS=3
+DEFAULT_TEMPERATURE_RANGE=0.1,0.5,0.9
+DEFAULT_RELEVANCE_THRESHOLD=0.7
+DEFAULT_CONFIDENCE_THRESHOLD=0.8
+
+# Security settings
+RATE_LIMIT_REQUESTS=100 # per hour
+RATE_LIMIT_TOKENS=50000 # per hour
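
The sample environment file mixes plain values, comma-separated lists, and trailing `# ...` comments on value lines. How the server itself reads these variables is not shown in this part of the commit, so the following is only a minimal sketch of one way to parse and type-check them. The names `parse_env_text`, `EvalSettings`, and `load_settings` are hypothetical; the variable names and fallback values are taken from the sample file above. Some env-file loaders may pass everything after `=` through verbatim, which is why the sketch strips trailing comments explicitly.

```python
from dataclasses import dataclass
from typing import Mapping


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and full-line comments.

    Trailing " # ..." comments on unquoted values are stripped.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


@dataclass(frozen=True)
class EvalSettings:
    """Hypothetical typed view over a few of the sample variables."""

    default_judge_model: str
    cache_ttl_seconds: int
    max_concurrent_evaluations: int
    temperature_range: tuple[float, ...]
    relevance_threshold: float


def load_settings(env: Mapping[str, str]) -> EvalSettings:
    """Convert string values to typed settings, using the sample defaults."""
    temps = env.get("DEFAULT_TEMPERATURE_RANGE", "0.1,0.5,0.9")
    return EvalSettings(
        default_judge_model=env.get("DEFAULT_JUDGE_MODEL", "gpt-4"),
        cache_ttl_seconds=int(env.get("MCP_EVAL_CACHE_TTL", "3600")),
        max_concurrent_evaluations=int(env.get("MAX_CONCURRENT_EVALUATIONS", "3")),
        temperature_range=tuple(float(t) for t in temps.split(",")),
        relevance_threshold=float(env.get("DEFAULT_RELEVANCE_THRESHOLD", "0.7")),
    )
```

Passing `os.environ` to `load_settings` would read the same variables from the process environment; a malformed number raises `ValueError` at load time rather than later during an evaluation.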
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+*.db
Lines changed: 298 additions & 0 deletions
@@ -0,0 +1,298 @@
+# syntax=docker/dockerfile:1.7
+
+###############################################################################
+# MCP Evaluation Server - OCI-compliant container build
+#
+# This multi-stage Dockerfile produces an ultra-slim, scratch-based runtime
+# image that automatically tracks the latest Python 3.11.x patch release
+# from the RHEL 9 repositories and is fully patched on each rebuild.
+#
+# Key design points:
+#   - Builder stage has full DNF + devel headers for wheel compilation
+#   - Runtime stage is scratch: only the Python runtime and app
+#   - Both builder and runtime rootfs receive `dnf upgrade -y`
+#   - Development headers are dropped from the final image
+#   - Hadolint DL3041 is suppressed to allow "latest patch" RPM usage
+#   - Includes ML/AI dependencies for sentence transformers and embeddings
+###############################################################################
+
+###########################
+# Build-time arguments
+###########################
+# Temporary dir for assembling the scratch rootfs
+ARG ROOTFS_PATH=/tmp/rootfs
+# Python major.minor series to track
+ARG PYTHON_VERSION=3.11
+
+###########################
+# Builder stage
+###########################
+FROM registry.access.redhat.com/ubi9/ubi:9.6-1753978585 AS builder
+SHELL ["/bin/bash", "-euo", "pipefail", "-c"]
+
+ARG PYTHON_VERSION
+ARG ROOTFS_PATH
+
+# ----------------------------------------------------------------------------
+# 1) Patch the OS
+# 2) Install Python + headers for building wheels (needed for ML dependencies)
+# 3) Install build tools for compiling scientific libraries
+# 4) Install binutils for strip command
+# 5) Register python3 alternative
+# 6) Clean caches to reduce layer size
+# ----------------------------------------------------------------------------
+# hadolint ignore=DL3041
+RUN set -euo pipefail \
+    && dnf upgrade -y \
+    && dnf install -y \
+       python${PYTHON_VERSION} \
+       python${PYTHON_VERSION}-devel \
+       python${PYTHON_VERSION}-pip \
+       gcc \
+       binutils \
+    && update-alternatives --install /usr/bin/python3 python3 /usr/bin/python${PYTHON_VERSION} 1 \
+    && dnf clean all
+
+WORKDIR /app
+
+# ----------------------------------------------------------------------------
+# Copy only the files needed for dependency installation first
+# This maximizes Docker layer caching - dependencies change less often
+# ----------------------------------------------------------------------------
+COPY pyproject.toml /app/
+COPY README.md /app/
+
+# ----------------------------------------------------------------------------
+# Create and populate virtual environment
+#   - Upgrade pip, setuptools, wheel
+#   - Install project dependencies and package
+#   - Include all evaluation dependencies (ML/AI packages)
+#   - Remove build tools but keep runtime dist-info
+#   - Remove build caches and build artifacts
+#   - The `|| true` is grouped with `find` only, so a failed pip step
+#     still fails the build
+# ----------------------------------------------------------------------------
+RUN set -euo pipefail \
+    && python3 -m venv /app/.venv \
+    && /app/.venv/bin/pip install --no-cache-dir --upgrade pip setuptools wheel \
+    && /app/.venv/bin/pip install --no-cache-dir -e ".[dev]" \
+    && /app/.venv/bin/pip uninstall --yes pip setuptools wheel \
+    && rm -rf /root/.cache /var/cache/dnf \
+    && { find /app/.venv -name "*.dist-info" -type d \
+           \( -name "pip-*" -o -name "setuptools-*" -o -name "wheel-*" \) \
+           -exec rm -rf {} + 2>/dev/null || true; } \
+    && rm -rf /app/.venv/share/python-wheels \
+    && rm -rf /app/*.egg-info /app/build /app/dist /app/.eggs
+
+# ----------------------------------------------------------------------------
+# Now copy the application files needed for runtime
+# This ensures code changes don't invalidate the dependency layer
+# ----------------------------------------------------------------------------
+COPY mcp_eval_server/ /app/mcp_eval_server/
+COPY config/ /app/config/
+COPY __init__.py /app/
+
+# ----------------------------------------------------------------------------
+# Create runtime script for MCP server
+#   - Startup messages go to stderr so that stdout stays reserved for
+#     MCP protocol messages when the server runs over stdio
+# ----------------------------------------------------------------------------
+RUN cat > /app/run-server.sh << 'EOF'
+#!/bin/bash
+set -euo pipefail
+
+# Set default values
+export OPENAI_API_KEY="${OPENAI_API_KEY:-}"
+export AZURE_OPENAI_ENDPOINT="${AZURE_OPENAI_ENDPOINT:-}"
+export AZURE_OPENAI_KEY="${AZURE_OPENAI_KEY:-}"
+
+# Log startup information
+echo "Starting MCP Evaluation Server..." >&2
+echo "Available judges: $(python3 -c 'from mcp_eval_server.tools.judge_tools import JudgeTools; jt=JudgeTools(); print(jt.get_available_judges())')" >&2
+
+# Run the MCP server
+exec python3 -m mcp_eval_server.server
+EOF
+
+# ----------------------------------------------------------------------------
+# Ensure executable permissions for scripts
+# ----------------------------------------------------------------------------
+RUN chmod +x /app/run-server.sh
+
+# ----------------------------------------------------------------------------
+# Pre-compile Python bytecode with -OO optimization
+#   - Strips docstrings and assertions
+#   - Improves startup performance
+#   - Must be done before copying to rootfs
+#   - Remove __pycache__ directories after compilation
+# ----------------------------------------------------------------------------
+RUN python3 -OO -m compileall -q /app/.venv /app/mcp_eval_server /app/config \
+    && find /app -type d -name "__pycache__" -exec rm -rf {} + 2>/dev/null || true
+
+# ----------------------------------------------------------------------------
+# Build a minimal, fully-patched rootfs containing only the runtime Python
+# Include ca-certificates for HTTPS connections to OpenAI/Azure APIs
+# ----------------------------------------------------------------------------
+# hadolint ignore=DL3041
+RUN set -euo pipefail \
+    && mkdir -p "${ROOTFS_PATH:?}" \
+    && dnf --installroot="${ROOTFS_PATH:?}" --releasever=9 upgrade -y \
+    && dnf --installroot="${ROOTFS_PATH:?}" --releasever=9 install -y \
+       --setopt=install_weak_deps=0 \
+       --setopt=tsflags=nodocs \
+       python${PYTHON_VERSION} \
+       ca-certificates \
+    && dnf clean all --installroot="${ROOTFS_PATH:?}"
+
+# ----------------------------------------------------------------------------
+# Create `python3` symlink in the rootfs for compatibility
+# ----------------------------------------------------------------------------
+RUN ln -s /usr/bin/python${PYTHON_VERSION} ${ROOTFS_PATH:?}/usr/bin/python3
+
+# ----------------------------------------------------------------------------
+# Clean up unnecessary files from rootfs (if they exist)
+#   - Remove development headers, documentation
+#   - Use ${var:?} to prevent accidental deletion of host directories
+# ----------------------------------------------------------------------------
+RUN set -euo pipefail \
+    && rm -rf ${ROOTFS_PATH:?}/usr/include/* \
+              ${ROOTFS_PATH:?}/usr/share/man/* \
+              ${ROOTFS_PATH:?}/usr/share/doc/* \
+              ${ROOTFS_PATH:?}/usr/share/info/* \
+              ${ROOTFS_PATH:?}/usr/share/locale/* \
+              ${ROOTFS_PATH:?}/var/log/* \
+              ${ROOTFS_PATH:?}/boot \
+              ${ROOTFS_PATH:?}/media \
+              ${ROOTFS_PATH:?}/srv \
+              ${ROOTFS_PATH:?}/usr/games \
+    && find ${ROOTFS_PATH:?}/usr/lib*/python*/ -type d -name "test" -exec rm -rf {} + 2>/dev/null || true \
+    && find ${ROOTFS_PATH:?}/usr/lib*/python*/ -type d -name "tests" -exec rm -rf {} + 2>/dev/null || true \
+    && find ${ROOTFS_PATH:?}/usr/lib*/python*/ -type d -name "idle_test" -exec rm -rf {} + 2>/dev/null || true \
+    && find ${ROOTFS_PATH:?}/usr/lib*/python*/ -name "*.mo" -delete 2>/dev/null || true \
+    && rm -rf ${ROOTFS_PATH:?}/usr/lib*/python*/ensurepip \
+              ${ROOTFS_PATH:?}/usr/lib*/python*/idlelib \
+              ${ROOTFS_PATH:?}/usr/lib*/python*/tkinter \
+              ${ROOTFS_PATH:?}/usr/lib*/python*/turtle* \
+              ${ROOTFS_PATH:?}/usr/lib*/python*/distutils/command/*.exe
+
+# ----------------------------------------------------------------------------
+# Remove package managers and unnecessary system tools from rootfs
+#   - Keep essential tools for MCP server functionality
+#   - Remove security-sensitive tools to minimize attack surface
+# ----------------------------------------------------------------------------
+RUN rm -rf ${ROOTFS_PATH:?}/usr/bin/dnf* \
+           ${ROOTFS_PATH:?}/usr/bin/yum* \
+           ${ROOTFS_PATH:?}/usr/bin/rpm* \
+           ${ROOTFS_PATH:?}/usr/bin/microdnf \
+           ${ROOTFS_PATH:?}/usr/lib/rpm \
+           ${ROOTFS_PATH:?}/usr/lib/dnf \
+           ${ROOTFS_PATH:?}/usr/lib/yum* \
+           ${ROOTFS_PATH:?}/etc/dnf \
+           ${ROOTFS_PATH:?}/etc/yum*
+
+# ----------------------------------------------------------------------------
+# Strip unneeded symbols from shared libraries and remove build tools
+#   - This reduces the final image size and removes build tools in one step
+# ----------------------------------------------------------------------------
+RUN find "${ROOTFS_PATH:?}/usr/lib64" -name '*.so*' -exec strip --strip-unneeded {} + 2>/dev/null || true \
+    && dnf remove -y gcc binutils \
+    && dnf clean all
+
+# ----------------------------------------------------------------------------
+# Remove setuid/setgid binaries for security
+#   - The -perm tests are grouped so that `-type f -delete` applies to both
+# ----------------------------------------------------------------------------
+RUN find ${ROOTFS_PATH:?} \( -perm /4000 -o -perm /2000 \) -type f -delete 2>/dev/null || true
+
+# ----------------------------------------------------------------------------
+# Create minimal passwd/group files for user 1001
+#   - Using GID 1001 to match UID for consistency
+#   - OpenShift compatible (accepts any UID in group 1001)
+# ----------------------------------------------------------------------------
+RUN printf 'mcp-eval:x:1001:1001:mcp-eval:/app:/sbin/nologin\n' > "${ROOTFS_PATH:?}/etc/passwd" && \
+    printf 'mcp-eval:x:1001:\n' > "${ROOTFS_PATH:?}/etc/group"
+
+# ----------------------------------------------------------------------------
+# Create necessary directories in the rootfs
+#   - /tmp and /var/tmp with sticky bit for security
+#   - Create directories for evaluation data and cache
+# ----------------------------------------------------------------------------
+RUN chmod 1777 ${ROOTFS_PATH:?}/tmp ${ROOTFS_PATH:?}/var/tmp 2>/dev/null || true \
+    && chown 1001:1001 ${ROOTFS_PATH:?}/tmp ${ROOTFS_PATH:?}/var/tmp \
+    && mkdir -p ${ROOTFS_PATH:?}/app/data/cache \
+    && mkdir -p ${ROOTFS_PATH:?}/app/data/results \
+    && chown -R 1001:1001 ${ROOTFS_PATH:?}/app/data
+
+# ----------------------------------------------------------------------------
+# Copy application directory into the rootfs and fix permissions for non-root
+#   - Copy the contents of /app (note the trailing `/.`): the target directory
+#     already exists, so `cp -r /app <target>` would create <target>/app
+#   - Set ownership to 1001:1001 (matching passwd/group)
+#   - Allow group write permissions for OpenShift compatibility
+# ----------------------------------------------------------------------------
+RUN cp -r /app/. ${ROOTFS_PATH:?}/app/ \
+    && chown -R 1001:1001 ${ROOTFS_PATH:?}/app \
+    && chmod -R g=u ${ROOTFS_PATH:?}/app
+
+###########################
+# Final runtime (squashed)
+###########################
+FROM scratch AS runtime
+
+ARG PYTHON_VERSION
+ARG ROOTFS_PATH
+
+# ----------------------------------------------------------------------------
+# OCI image metadata
+# ----------------------------------------------------------------------------
+LABEL maintainer="Mihai Criveti" \
+      org.opencontainers.image.title="mcp/mcp-eval-server" \
+      org.opencontainers.image.description="MCP Evaluation Server: Comprehensive agent and prompt evaluation using LLM-as-a-judge" \
+      org.opencontainers.image.licenses="Apache-2.0" \
+      org.opencontainers.image.version="0.1.0" \
+      org.opencontainers.image.source="https://github.com/contextforge/mcp-context-forge" \
+      org.opencontainers.image.documentation="https://github.com/contextforge/mcp-context-forge/mcp-servers/python/mcp-eval-server" \
+      org.opencontainers.image.vendor="MCP Context Forge"
+
+# ----------------------------------------------------------------------------
+# Copy the entire prepared root filesystem from the builder stage
+# ----------------------------------------------------------------------------
+COPY --from=builder ${ROOTFS_PATH}/ /
+
+# ----------------------------------------------------------------------------
+# Ensure our virtual environment binaries have priority in PATH
+#   - Don't write bytecode files (we pre-compiled with -OO)
+#   - Unbuffered output for better logging
+#   - Random hash seed for security
+#   - Disable pip cache to save space
+#   - Set evaluation server specific environment variables
+# ----------------------------------------------------------------------------
+ENV PATH="/app/.venv/bin:${PATH}" \
+    PYTHONDONTWRITEBYTECODE=1 \
+    PYTHONUNBUFFERED=1 \
+    PYTHONHASHSEED=random \
+    PIP_NO_CACHE_DIR=1 \
+    PIP_DISABLE_PIP_VERSION_CHECK=1 \
+    MCP_EVAL_CACHE_DIR="/app/data/cache" \
+    MCP_EVAL_RESULTS_DB="/app/data/results/evaluation_results.db" \
+    DEFAULT_JUDGE_MODEL="gpt-4"
+
+# ----------------------------------------------------------------------------
+# Application working directory
+# ----------------------------------------------------------------------------
+WORKDIR /app
+
+# ----------------------------------------------------------------------------
+# Expose MCP server port (stdio by default, but useful for HTTP wrapper)
+# ----------------------------------------------------------------------------
+EXPOSE 8080
+
+# ----------------------------------------------------------------------------
+# Run as non-root user (1001)
+# ----------------------------------------------------------------------------
+USER 1001
+
+# ----------------------------------------------------------------------------
+# Health check for MCP server functionality
+#   - Test that the server can import all modules and list tools
+# ----------------------------------------------------------------------------
+HEALTHCHECK --interval=60s --timeout=15s --start-period=120s --retries=3 \
+    CMD ["python3", "-c", "from mcp_eval_server.server import judge_tools; print('MCP Eval Server healthy:', len(judge_tools.get_available_judges()), 'judges available')"]
+
+# ----------------------------------------------------------------------------
+# Default command - Run the MCP Evaluation Server
+# ----------------------------------------------------------------------------
+CMD ["./run-server.sh"]
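
The `HEALTHCHECK` instruction in this Dockerfile imports `judge_tools` from `mcp_eval_server.server` and counts the result of `get_available_judges()`; Docker treats exit status 0 as healthy and 1 as unhealthy. The same probe could be kept in a small script instead of an inline `-c` string. The following is a sketch under that assumption, not part of the commit: `check_health` and `main` are hypothetical names, and only the module, attribute, and method names come from the Dockerfile.

```python
import importlib


def check_health(
    module_name: str = "mcp_eval_server.server",
    attr_name: str = "judge_tools",
    method_name: str = "get_available_judges",
) -> tuple[bool, str]:
    """Import the module, call the named method, and report the outcome.

    Any exception (missing module, missing attribute, failing call) is
    reported as unhealthy instead of propagating.
    """
    try:
        module = importlib.import_module(module_name)
        target = getattr(module, attr_name)
        judges = getattr(target, method_name)()
        count = len(judges)
    except Exception as exc:
        return False, f"unhealthy: {type(exc).__name__}: {exc}"
    return True, f"healthy: {count} judges available"


def main() -> int:
    """Return 0 when healthy and 1 otherwise, matching HEALTHCHECK semantics."""
    ok, message = check_health()
    print(message)
    return 0 if ok else 1
```

A script entry point would call `sys.exit(main())`. Catching the exception keeps the probe output to a single line, which is easier to read in `docker inspect` health logs than a full traceback.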

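
The Dockerfile removes setuid/setgid binaries from the assembled rootfs with `find`. In `find`, `-o` binds more loosely than the implicit AND, so an expression of the form `-perm /4000 -o -perm /2000 -type f -delete` applies `-type f -delete` only to the second test unless the two `-perm` tests are grouped with `\( ... \)`. A post-build check along the following lines could confirm that nothing slipped through. This is an illustrative sketch; `is_setid` and `find_setid_files` are hypothetical helpers, not part of the commit.

```python
import os
import stat


def is_setid(mode: int) -> bool:
    """True for a regular file whose mode has the setuid or setgid bit set."""
    return stat.S_ISREG(mode) and bool(mode & (stat.S_ISUID | stat.S_ISGID))


def find_setid_files(root: str) -> list[str]:
    """Walk `root` and return the sorted paths of setuid/setgid regular files."""
    hits: list[str] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symlinks are judged by themselves, not their target
            if is_setid(os.lstat(path).st_mode):
                hits.append(path)
    return sorted(hits)
```

Running `find_setid_files` against the rootfs directory in the builder stage and failing the build on a non-empty result would turn the silent `|| true` cleanup into a verified one.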