
Commit 8b4da37

Author: Zvi Fried
feat/general-refinement - Fix deterministic JSON parsing and remove exception swallowing

- Add ResearchValidationResponse Pydantic model for proper validation
- Create robust _extract_json_from_response() function to handle:
  * Markdown code blocks
  * Plain JSON objects
  * JSON embedded in explanatory text
  * Proper error handling for malformed responses
- Replace manual json.loads() + dict.get() with Pydantic model_validate_json()
- Remove exception swallowing that masked real parsing errors
- Remove inappropriate raise_obstacle suggestions from parsing errors
- Apply consistent parsing pattern to all LLM sampling functions:
  * _validate_research_quality
  * _evaluate_workflow_guidance
  * _evaluate_coding_plan
  * judge_code_change
- Add comprehensive test suite (tests/test_json_extraction.py) with 8 test cases
- Fix context injection issues by using proper Context type annotations
- All 37 tests passing, mypy clean

Resolves the Invalid JSON expected value at line 1 column 1 error caused by LLMs returning JSON wrapped in markdown code blocks.
1 parent 3fe3e40 commit 8b4da37
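
The extraction strategy named in the commit message (try a fenced markdown block first, then plain or embedded JSON, and raise on anything malformed) could look roughly like the sketch below. This is not the commit's `_extract_json_from_response()`, whose body does not appear in this section; the function name `extract_json_from_response`, the `FENCE` constant, and the regular expression are all assumptions made for illustration.

```python
import json
import re

# Three backticks, built programmatically so the example can sit inside a fenced block.
FENCE = "`" * 3

# A JSON object wrapped in a markdown code fence, with an optional "json" language tag.
_FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, re.DOTALL)


def extract_json_from_response(text: str) -> str:
    """Return the JSON object contained in an LLM response as a string.

    Hypothetical helper: raises ValueError when no valid JSON object is found.
    """
    match = _FENCED_JSON.search(text)
    if match:
        # Case 1: JSON inside a markdown code block.
        candidate = match.group(1)
    else:
        # Cases 2 and 3: plain JSON, or JSON embedded in explanatory text.
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end < start:
            raise ValueError("No JSON object found in response")
        candidate = text[start : end + 1]

    # Parse once so malformed JSON surfaces here instead of being swallowed.
    # json.JSONDecodeError is a subclass of ValueError.
    json.loads(candidate)
    return candidate
```

The returned string can then be handed to a Pydantic model's `model_validate_json()`, so extraction and schema validation stay separate steps and each fails with its own error.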


8 files changed: +507 -313 lines changed


README.md

Lines changed: 15 additions & 87 deletions
@@ -5,12 +5,11 @@
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Python 3.12+](https://img.shields.io/badge/python-3.12+-blue.svg)](https://www.python.org/downloads/)
 [![MCP Compatible](https://img.shields.io/badge/MCP-Compatible-green.svg)](https://modelcontextprotocol.io/)
-[![Docker](https://img.shields.io/badge/docker-supported-blue.svg)](https://www.docker.com/)

 [![CI](https://github.com/hepivax/mcp-as-a-judge/workflows/CI/badge.svg)](https://github.com/hepivax/mcp-as-a-judge/actions/workflows/ci.yml)
 [![Release](https://github.com/hepivax/mcp-as-a-judge/workflows/Release/badge.svg)](https://github.com/hepivax/mcp-as-a-judge/actions/workflows/release.yml)
 [![PyPI version](https://badge.fury.io/py/mcp-as-a-judge.svg)](https://badge.fury.io/py/mcp-as-a-judge)
-[![Docker Image](https://img.shields.io/badge/docker-ghcr.io-blue?logo=docker)](https://github.com/hepivax/mcp-as-a-judge/pkgs/container/mcp-as-a-judge)
+
 [![codecov](https://codecov.io/gh/hepivax/mcp-as-a-judge/branch/main/graph/badge.svg)](https://codecov.io/gh/hepivax/mcp-as-a-judge)

 **MCP as a Judge** is a revolutionary Model Context Protocol (MCP) server that **transforms the developer-AI collaboration experience**. It acts as an intelligent gatekeeper for software development, preventing bad coding practices by using AI-powered evaluation and involving users in critical decisions when requirements are unclear or obstacles arise.
@@ -62,7 +61,7 @@

 ### **⚖️ Five Powerful Judge Tools**

-1. **`check_swe_compliance`** - Workflow guidance and best practices
+1. **`get_workflow_guidance`** - Smart workflow analysis and tool recommendation
 2. **`judge_coding_plan`** - Comprehensive plan evaluation with requirements alignment
 3. **`judge_code_change`** - Code review with security and quality checks
 4. **`raise_obstacle`** - User involvement when blockers arise
@@ -110,46 +109,7 @@ uv add mcp-as-a-judge
 mcp-as-a-judge
 ```

-#### **Method 2: Using Docker (Recommended for Production)**
-
-**Quick Start with Docker:**
-
-```bash
-# Pull and run the latest image
-docker run -it --name mcp-as-a-judge ghcr.io/hepivax/mcp-as-a-judge:latest
-```
-
-**Build from Source:**
-
-```bash
-# Clone the repository
-git clone https://github.com/hepivax/mcp-as-a-judge.git
-cd mcp-as-a-judge
-
-# Build the Docker image
-docker build -t mcp-as-a-judge:latest .
-
-# Run with custom configuration
-docker run -it \
---name mcp-as-a-judge \
--e LOG_LEVEL=INFO \
---restart unless-stopped \
-mcp-as-a-judge:latest
-```
-
-**Using Docker Compose:**
-
-```bash
-# For production (uses pre-built image from GitHub Container Registry)
-docker-compose --profile production up -d
-
-# For development (builds from source)
-git clone https://github.com/hepivax/mcp-as-a-judge.git
-cd mcp-as-a-judge
-docker-compose --profile development up
-```
-
-#### **Method 3: Using pip (Alternative)**
+#### **Method 2: Using pip (Alternative)**

 ```bash
 # Install from PyPI
@@ -159,7 +119,7 @@ pip install mcp-as-a-judge
 mcp-as-a-judge
 ```

-#### **Method 4: From Source (Development)**
+#### **Method 3: From Source (Development)**

 ```bash
 # Clone the repository for development
@@ -175,9 +135,7 @@ uv run mcp-as-a-judge

 ## 🔧 **VS Code Configuration**

-Configure MCP as a Judge in VS Code with GitHub Copilot using one of these methods:
-
-### **Option 1: Using uv (Recommended)**
+Configure MCP as a Judge in VS Code with GitHub Copilot:

 1. **Install the package:**

@@ -200,29 +158,6 @@ Configure MCP as a Judge in VS Code with GitHub Copilot using one of these metho
 }
 ```

-### **Option 2: Using Docker**
-
-1. **Pull the Docker image:**
-
-```bash
-docker pull ghcr.io/hepivax/mcp-as-a-judge:latest
-```
-
-2. **Configure VS Code MCP settings:**
-
-Add this to your VS Code MCP configuration file:
-
-```json
-{
-"servers": {
-"mcp-as-a-judge": {
-"command": "docker",
-"args": ["run", "--rm", "-i", "ghcr.io/hepivax/mcp-as-a-judge:latest"]
-}
-}
-}
-```
-
 ### **📍 VS Code MCP Configuration Location**

 The MCP configuration file is typically located at:
@@ -262,31 +197,21 @@ CORS_ENABLED=false # Enable CORS (production: false)
 CORS_ORIGINS=* # CORS allowed origins
 ```

-**Docker Environment File (.env):**
-
-```bash
-# Copy .env.example to .env and customize
-cp .env.example .env
-
-# Example .env file:
-TRANSPORT=sse
-PORT=8050
-LOG_LEVEL=INFO
-DEBUG=false
-```
-
 ## 📖 **How It Works**

 Once MCP as a Judge is configured in VS Code with GitHub Copilot, it automatically guides your AI assistant through a structured software engineering workflow. The system operates transparently in the background, ensuring every development task follows best practices.

 ### **🔄 Automatic Workflow Enforcement**

-**1. Initial Task Analysis**
-- When you make any development request, the AI assistant automatically calls `check_swe_compliance`
-- This tool analyzes your request and provides specific guidance on which validation steps are required
-- No manual intervention needed - the workflow starts automatically
+**1. Intelligent Workflow Guidance**
+
+- When you make any development request, the AI assistant automatically calls `get_workflow_guidance`
+- This tool uses AI analysis to determine which validation steps are required for your specific task
+- Provides smart recommendations on which tools to use next and in what order
+- No manual intervention needed - the workflow starts automatically with intelligent guidance

 **2. Planning & Design Phase**
+
 - For any implementation task, the AI assistant must first help you create:
 - **Detailed coding plan** - Step-by-step implementation approach
 - **System design** - Architecture, components, and technical decisions
@@ -295,6 +220,7 @@ Once MCP as a Judge is configured in VS Code with GitHub Copilot, it automatical
 - **AI-powered evaluation** checks for design quality, security, research thoroughness, and requirements alignment

 **3. Code Implementation Review**
+
 - After any code is written or modified, `judge_code_change` is automatically triggered
 - **Mandatory code review** happens immediately after file creation/modification
 - Uses MCP Sampling to evaluate code quality, security vulnerabilities, and best practices
@@ -303,11 +229,13 @@ Once MCP as a Judge is configured in VS Code with GitHub Copilot, it automatical
 ### **🤝 User Involvement When Needed**

 **Obstacle Resolution**
+
 - When the AI assistant encounters blockers or conflicting requirements, `raise_obstacle` automatically engages you
 - Uses MCP Elicitation to present options and get your decision
 - No hidden fallbacks - you're always involved in critical decisions

 **Requirements Clarification**
+
 - If your request lacks sufficient detail, `elicit_missing_requirements` automatically asks for clarification
 - Uses MCP Elicitation to gather specific missing information
 - Ensures implementation matches your actual needs

src/mcp_as_a_judge/models.py

Lines changed: 34 additions & 12 deletions
@@ -5,7 +5,7 @@
 serialization, and API contracts.
 """

-from pydantic import BaseModel, Field
+from pydantic import BaseModel, Field, ValidationError


 class JudgeResponse(BaseModel):
@@ -61,25 +61,47 @@ class RequirementsClarification(BaseModel):
     )


-class ComplianceCheckResult(BaseModel):
-    """Result model for SWE compliance checks.
+class WorkflowGuidance(BaseModel):
+    """Schema for workflow guidance responses.

-    Used by the check_swe_compliance tool to provide
-    structured guidance on software engineering best practices.
+    Used by the get_workflow_guidance tool to provide
+    structured guidance on which tools to use next.
     """

-    compliance_status: str = Field(
-        description="Overall compliance status: 'compliant', 'needs_improvement', 'non_compliant'"
+    next_tool: str = Field(
+        description="The specific MCP tool that should be called next: 'judge_coding_plan', 'judge_code_change', 'raise_obstacle', or 'elicit_missing_requirements'"
     )
-    recommendations: list[str] = Field(
-        default_factory=list, description="Specific recommendations for improvement"
+    reasoning: str = Field(
+        description="Clear explanation of why this tool should be used next"
     )
-    next_steps: list[str] = Field(
+    preparation_needed: list[str] = Field(
         default_factory=list,
-        description="Recommended next steps in the development workflow",
+        description="List of things that need to be prepared before calling the recommended tool",
     )
     guidance: str = Field(
-        description="Detailed guidance on software engineering best practices"
+        description="Detailed step-by-step guidance for the AI assistant"
+    )
+
+
+class ResearchValidationResponse(BaseModel):
+    """Schema for research validation responses.
+
+    Used by the _validate_research_quality function to parse
+    LLM responses about research quality and design alignment.
+    """
+
+    research_adequate: bool = Field(
+        description="Whether the research is comprehensive enough"
+    )
+    design_based_on_research: bool = Field(
+        description="Whether the design is properly based on research"
+    )
+    issues: list[str] = Field(
+        default_factory=list,
+        description="List of specific issues if any"
+    )
+    feedback: str = Field(
+        description="Detailed feedback on research quality and design alignment"
     )


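
To illustrate what replacing `json.loads()` + `dict.get()` with `model_validate_json()` buys, here is a minimal sketch that redeclares the `ResearchValidationResponse` fields from the diff above and validates two payloads. It assumes Pydantic v2; the wrapper `parse_research_validation` and both sample payloads are hypothetical and not part of the commit.

```python
from pydantic import BaseModel, Field, ValidationError


class ResearchValidationResponse(BaseModel):
    """Same fields as the model added in models.py."""

    research_adequate: bool = Field(
        description="Whether the research is comprehensive enough"
    )
    design_based_on_research: bool = Field(
        description="Whether the design is properly based on research"
    )
    issues: list[str] = Field(
        default_factory=list, description="List of specific issues if any"
    )
    feedback: str = Field(
        description="Detailed feedback on research quality and design alignment"
    )


def parse_research_validation(raw_json: str) -> ResearchValidationResponse:
    """Hypothetical wrapper: parse and validate in one step, letting errors propagate."""
    return ResearchValidationResponse.model_validate_json(raw_json)


# A complete payload validates; the optional `issues` list falls back to [].
valid = parse_research_validation(
    '{"research_adequate": true, "design_based_on_research": false, '
    '"feedback": "Cite primary sources."}'
)

# An incomplete payload raises ValidationError naming every missing field,
# where dict.get() would have silently returned None.
missing: list[str] = []
try:
    parse_research_validation('{"research_adequate": true}')
except ValidationError as exc:
    missing = sorted(str(err["loc"][0]) for err in exc.errors())
```

Letting `ValidationError` propagate, rather than catching it and substituting defaults, matches the commit's stated goal of no longer swallowing parsing errors.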