feat/general-refinement - Fix deterministic JSON parsing and remove exception swallowing
- Add ResearchValidationResponse Pydantic model for proper validation
- Create robust _extract_json_from_response() function to handle:
* Markdown code blocks
* Plain JSON objects
* JSON embedded in explanatory text
* Proper error handling for malformed responses
- Replace manual json.loads() + dict.get() with Pydantic model_validate_json()
- Remove exception swallowing that masked real parsing errors
- Remove inappropriate raise_obstacle suggestions from parsing errors
- Apply consistent parsing pattern to all LLM sampling functions:
* _validate_research_quality
* _evaluate_workflow_guidance
* _evaluate_coding_plan
* judge_code_change
- Add comprehensive test suite (tests/test_json_extraction.py) with 8 test cases
- Fix context injection issues by using proper Context type annotations
- All 37 tests passing, mypy clean
Resolves the "Invalid JSON: expected value at line 1 column 1" error
caused by LLMs returning JSON wrapped in markdown code blocks.
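The extraction strategy listed above can be sketched as follows. This is an illustrative stand-in, not the repo's actual `_extract_json_from_response` implementation; the function name and error messages here are assumptions.

```python
import json
import re

def extract_json_from_response(text: str) -> str:
    """Return the JSON payload embedded in an LLM response.

    Handles three shapes: a fenced markdown code block, a bare JSON
    object, and a JSON object surrounded by explanatory prose.
    """
    # 1. Markdown code block, with or without a "json" language tag
    fence = re.search(r"```(?:json)?\s*(.*?)```", text, re.DOTALL)
    if fence:
        candidate = fence.group(1).strip()
    else:
        # 2./3. Take the outermost {...} span in the raw text
        start = text.find("{")
        end = text.rfind("}")
        if start == -1 or end == -1 or end < start:
            raise ValueError(f"No JSON object found in response: {text[:80]!r}")
        candidate = text[start : end + 1]
    # Fail loudly on malformed JSON instead of swallowing the error
    json.loads(candidate)
    return candidate
```

The key design point from the commit is the last step: validating the candidate and raising, rather than returning a default dict that masks the real parsing failure.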
**MCP as a Judge** is a revolutionary Model Context Protocol (MCP) server that **transforms the developer-AI collaboration experience**. It acts as an intelligent gatekeeper for software development, preventing bad coding practices by using AI-powered evaluation and involving users in critical decisions when requirements are unclear or obstacles arise.
```diff
@@ -62,7 +61,7 @@
 ### **⚖️ Five Powerful Judge Tools**
 
-1. **`check_swe_compliance`** - Workflow guidance and best practices
+1. **`get_workflow_guidance`** - Smart workflow analysis and tool recommendation
 2. **`judge_coding_plan`** - Comprehensive plan evaluation with requirements alignment
 3. **`judge_code_change`** - Code review with security and quality checks
 4. **`raise_obstacle`** - User involvement when blockers arise
```
````diff
@@ -110,46 +109,7 @@ uv add mcp-as-a-judge
 mcp-as-a-judge
 ```
 
-#### **Method 2: Using Docker (Recommended for Production)**
-
-**Quick Start with Docker:**
-
-```bash
-# Pull and run the latest image
-docker run -it --name mcp-as-a-judge ghcr.io/hepivax/mcp-as-a-judge:latest
````
The MCP configuration file is typically located at:
````diff
@@ -262,31 +197,21 @@ CORS_ENABLED=false # Enable CORS (production: false)
 CORS_ORIGINS=*  # CORS allowed origins
 ```
 
-**Docker Environment File (.env):**
-
-```bash
-# Copy .env.example to .env and customize
-cp .env.example .env
-
-# Example .env file:
-TRANSPORT=sse
-PORT=8050
-LOG_LEVEL=INFO
-DEBUG=false
-```
-
 ## 📖 **How It Works**
 
 Once MCP as a Judge is configured in VS Code with GitHub Copilot, it automatically guides your AI assistant through a structured software engineering workflow. The system operates transparently in the background, ensuring every development task follows best practices.
 
 ### **🔄 Automatic Workflow Enforcement**
 
-**1. Initial Task Analysis**
-
-- When you make any development request, the AI assistant automatically calls `check_swe_compliance`
-- This tool analyzes your request and provides specific guidance on which validation steps are required
-- No manual intervention needed - the workflow starts automatically
+**1. Intelligent Workflow Guidance**
+
+- When you make any development request, the AI assistant automatically calls `get_workflow_guidance`
+- This tool uses AI analysis to determine which validation steps are required for your specific task
+- Provides smart recommendations on which tools to use next and in what order
+- No manual intervention needed - the workflow starts automatically with intelligent guidance
 
 **2. Planning & Design Phase**
 
 - For any implementation task, the AI assistant must first help you create:
````
```diff
         description="The specific MCP tool that should be called next: 'judge_coding_plan', 'judge_code_change', 'raise_obstacle', or 'elicit_missing_requirements'"
     )
-    recommendations: list[str] = Field(
-        default_factory=list, description="Specific recommendations for improvement"
+    reasoning: str = Field(
+        description="Clear explanation of why this tool should be used next"
     )
-    next_steps: list[str] = Field(
+    preparation_needed: list[str] = Field(
         default_factory=list,
-        description="Recommended next steps in the development workflow",
+        description="List of things that need to be prepared before calling the recommended tool",
     )
     guidance: str = Field(
-        description="Detailed guidance on software engineering best practices"
+        description="Detailed step-by-step guidance for the AI assistant"
     )
+
+
+class ResearchValidationResponse(BaseModel):
+    """Schema for research validation responses.
+
+    Used by the _validate_research_quality function to parse
+    LLM responses about research quality and design alignment.
+    """
+
+    research_adequate: bool = Field(
+        description="Whether the research is comprehensive enough"
+    )
+    design_based_on_research: bool = Field(
+        description="Whether the design is properly based on research"
+    )
+    issues: list[str] = Field(
+        default_factory=list,
+        description="List of specific issues if any"
+    )
+    feedback: str = Field(
+        description="Detailed feedback on research quality and design alignment"
+    )
```