
Commit b056d7d

context fixes

Author: Zvi Fried
1 parent: cd972ba

17 files changed (+76, -53 lines)

src/mcp_as_a_judge/prompts/shared/critical_tool_warnings.md

Lines changed: 1 addition & 0 deletions
@@ -2,3 +2,4 @@
 
 - Skipping this tool causes severe token inefficiency and wasted iterations.
 - Always invoke this tool at the appropriate stage to avoid extreme token loss and redundant processing.
+- Do not rely on assistant memory for identifiers. Always pass the exact `task_id` and recover it via `get_current_coding_task` if missing.
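The rule added to `critical_tool_warnings.md` (never trust remembered identifiers, recover `task_id` via `get_current_coding_task`) can be sketched on the client side as follows. This is a hypothetical illustration, not the project's code: `call_tool` stands in for whatever MCP client call the assistant uses, and the assumption that `get_current_coding_task` returns a dict with a `"task_id"` key is ours, not taken from the diff.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical client callable: (tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def resolve_task_id(cached_task_id: Optional[str], call_tool: ToolCaller) -> str:
    """Return the exact task_id, recovering it from the server when missing.

    Assumes get_current_coding_task returns a dict with a "task_id" key.
    """
    if cached_task_id:
        return cached_task_id
    result = call_tool("get_current_coding_task", {})
    task_id = result.get("task_id")
    if not task_id:
        raise RuntimeError("No active coding task found; cannot call judge tools.")
    return task_id


def call_with_task_id(
    tool: str,
    args: Dict[str, Any],
    cached_task_id: Optional[str],
    call_tool: ToolCaller,
) -> Dict[str, Any]:
    """Call a judge tool, always passing an explicit, recovered task_id."""
    task_id = resolve_task_id(cached_task_id, call_tool)
    return call_tool(tool, {**args, "task_id": task_id})
```

Routing every judge call through one helper keeps the "always pass the exact `task_id`" rule in a single place instead of relying on each call site to remember it.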

src/mcp_as_a_judge/prompts/system/judge_coding_plan.md

Lines changed: 2 additions & 1 deletion
@@ -42,6 +42,7 @@ Evaluate submissions against the following comprehensive SWE best practices:
 - Is there evidence of understanding industry best practices?
 - Are trade-offs between different approaches analyzed?
 - Does the research demonstrate avoiding reinventing the wheel?
+- Does research explicitly cover all major aspects implied by the user requirements, not just a subset (e.g., cover each system, protocol, framework, or integration mentioned)?
 
 **🏗️ Internal Codebase Analysis (ONLY evaluate if Status: REQUIRED):**
 - Validate that existing codebase patterns are properly considered
@@ -156,7 +157,7 @@ IMPORTANT applicability rule:
 ### 1. User Requirements Alignment
 
 - Does the plan directly address the user's stated requirements?
-- Are all user requirements covered in the implementation plan?
+- Are all user requirements decomposed into explicit sub-aspects (components, integrations, protocols, patterns) and covered in the implementation plan and research?
 - Is the solution appropriate for what the user actually wants to achieve?
 - Flag any misalignment between user needs and proposed solution
 
src/mcp_as_a_judge/prompts/system/research_requirements_analysis.md

Lines changed: 2 additions & 1 deletion
@@ -86,6 +86,7 @@ Always emphasize research quality over pure quantity:
 - Recency and relevance to current technology versions
 - Practical applicability to the specific task context
 - Coverage of implementation details and edge cases
+- Multi-aspect coverage: Ensure the research plan explicitly maps to ALL major aspects implied by the user requirements (each referenced system, framework, protocol, integration), rather than focusing on a single subset.
 
 ## Analysis Output Requirements
 
@@ -107,4 +108,4 @@ You must respond with a JSON object that matches this schema:
 - **Context Sensitivity**: Consider the specific repository and project needs
 - **Practical Balance**: Don't over-research simple tasks or under-research complex ones
 - **Clear Reasoning**: Always explain why a specific count is recommended
-- **Adaptive Approach**: Different tasks need different research strategies
+- **Adaptive Approach**: Different tasks need different research strategies

src/mcp_as_a_judge/prompts/system/research_validation.md

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@ Evaluate if the research is comprehensive enough and if the design is properly b
 - **REJECT IF MISSING**: No URLs provided means no online research was performed - REJECT immediately
 - **ONLINE RESEARCH EVIDENCE**: Do URLs demonstrate actual online research into implementation approaches and existing libraries?
 - **EXISTING SOLUTIONS FOCUS**: Do URLs show research into current repo capabilities, well-known libraries, and best practices?
+- **FULL REQUIREMENTS COVERAGE**: Do the provided URLs collectively cover ALL major aspects implied by the user requirements (each named system, framework, protocol, integration), rather than focusing on a single subset?
 - **REJECT IMMEDIATELY**: Missing URLs, insufficient online research, or failure to investigate existing solutions first
 
 ## Response Requirements
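The multi-aspect coverage rule that this commit adds to the research prompts reduces to a set check: every aspect implied by the requirements must be addressed by at least one research URL. The sketch below is hypothetical. In the project this judgment is made by the LLM judge from prompt text; the `url_coverage` mapping (URL to the aspects it addresses) is an assumed input, and the example aspect names and `example.com` URLs are placeholders.

```python
from typing import Dict, List, Set


def uncovered_aspects(aspects: List[str], url_coverage: Dict[str, Set[str]]) -> List[str]:
    """Return requirement aspects that no research URL covers, in input order.

    url_coverage maps each research URL to the aspects it addresses
    (a hypothetical structure used only for illustration).
    """
    covered: Set[str] = set()
    for topics in url_coverage.values():
        covered |= topics
    return [aspect for aspect in aspects if aspect not in covered]


def passes_coverage_check(aspects: List[str], url_coverage: Dict[str, Set[str]]) -> bool:
    # No URLs at all means no online research: reject outright.
    if not url_coverage:
        return False
    # Otherwise every aspect must be covered by at least one URL.
    return not uncovered_aspects(aspects, url_coverage)


aspects = ["MCP protocol", "FastAPI", "PostgreSQL"]
urls = {
    "https://example.com/mcp": {"MCP protocol"},
    "https://example.com/fastapi": {"FastAPI"},
}
print(uncovered_aspects(aspects, urls))  # ['PostgreSQL']
```

Reporting the uncovered aspects, rather than only a pass/fail flag, is what lets a rejection say which part of the requirements still needs research.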

src/mcp_as_a_judge/prompts/system/workflow_guidance.md

Lines changed: 9 additions & 9 deletions
@@ -62,9 +62,9 @@ CREATED → PLANNING → PLAN_APPROVED → IMPLEMENTING → REVIEW_READY → TES
 - For XS/S tasks: Skip planning, proceed to implementation (next_tool: null, but guidance must explain: implement → judge_code_change → judge_testing_implementation → judge_coding_task_completion)
 - For M/L/XL tasks: Recommend planning tools (judge_coding_plan)
 - **PLANNING** → Validate plan or gather more requirements
-- **PLAN_APPROVED** → Start implementation (implement ALL code AND tests, ensure tests pass)
-- **IMPLEMENTING** → Continue implementation until ALL code AND tests are complete and passing, then call judge_code_change
-- **REVIEW_READY** → Validate implementation code (judge_code_change for code review ONLY, not tests)
+- **PLAN_APPROVED** → Start implementation (begin coding; tests may be written before or after review)
+- **IMPLEMENTING** → After code changes are ready, call judge_code_change to review implementation; then proceed to testing
+- **REVIEW_READY** → Optional state if used by client; otherwise proceed directly from IMPLEMENTING to judge_code_change
 - **TESTING** → Validate test results and coverage (judge_testing_implementation ONLY)
 - **COMPLETED** → Workflow finished (next_tool: null)
 - **BLOCKED** → Resolve obstacles (raise_obstacle)
@@ -86,6 +86,7 @@ When recommending judge_coding_plan, the preparation_needed MUST include ALL ele
 - Detailed implementation plan with code examples
 - System design with architecture and data flow
 - List of files to be modified or created
+- Research coverage plan that maps to ALL major aspects in the user requirements (each referenced system, framework, protocol, integration). Avoid focusing on a single subset; ensure multi-aspect coverage.
 
 **Conditionally Required (check task metadata):**
 - **If research_required = true**: Gather research URLs (minimum based on research_scope)
@@ -130,14 +131,13 @@ preparation_needed: [
 - judge_code_change has been approved
 - Code review is complete and implementation approved
 - Ready for test results and coverage validation
-- The task is transitioning from REVIEW_READY to TESTING state
+- The task is in or transitioning to TESTING state
 
 **DO NOT call judge_code_change for:**
-- Individual file changes during implementation
-- Partial implementations
-- Work-in-progress code
-- Single file modifications
-- Before testing validation is complete
+- Clearly incomplete, non-compilable, or placeholder code
+- Changes unrelated to the approved plan
+
+Note: You may call judge_code_change for a logical code change even if tests are not yet written or are failing. Tests are validated separately after code review.
 
 ### Task Completion Logic
 
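The revised state guidance in `workflow_guidance.md` can be read as a small lookup from task state to the recommended next tool. The function below is a hypothetical sketch of that reading, not the server's implementation. The `TaskState` enum, the `size` and `code_ready` parameters, and mapping PLANNING to `judge_coding_plan` are illustrative assumptions. The key change it encodes is that code review no longer waits for passing tests.

```python
from enum import Enum, auto
from typing import Optional


class TaskState(Enum):
    # State names follow workflow_guidance.md; the enum itself is hypothetical.
    CREATED = auto()
    PLANNING = auto()
    PLAN_APPROVED = auto()
    IMPLEMENTING = auto()
    REVIEW_READY = auto()
    TESTING = auto()
    COMPLETED = auto()
    BLOCKED = auto()


# Order the guidance spells out for XS/S tasks after implementation.
SMALL_TASK_SEQUENCE = (
    "judge_code_change",
    "judge_testing_implementation",
    "judge_coding_task_completion",
)


def next_tool(state: TaskState, size: str, code_ready: bool = False) -> Optional[str]:
    """Suggest the next tool for a state (None means keep working / no tool)."""
    if state is TaskState.CREATED:
        # XS/S tasks skip planning; larger tasks go through plan review.
        return None if size in ("XS", "S") else "judge_coding_plan"
    if state is TaskState.PLANNING:
        return "judge_coding_plan"
    if state in (TaskState.PLAN_APPROVED, TaskState.IMPLEMENTING, TaskState.REVIEW_READY):
        # Review needs a logical, reviewable change, not passing tests.
        return "judge_code_change" if code_ready else None
    if state is TaskState.TESTING:
        return "judge_testing_implementation"
    if state is TaskState.BLOCKED:
        return "raise_obstacle"
    return None  # COMPLETED
```

Treating REVIEW_READY the same as IMPLEMENTING mirrors the note that the state is optional and clients may go straight from implementation to `judge_code_change`.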
src/mcp_as_a_judge/prompts/tool_descriptions/get_current_coding_task.md

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@
 ## Description
 Retrieve the most recently active coding task UUID (task_id) and metadata from conversation history. Use when the task_id is missing from context.
 
+{% include 'shared/critical_tool_warnings.md' %}
+
 ## When to use
 - Need the task_id for follow-up tool calls
 - Want to resume the last active coding task

src/mcp_as_a_judge/prompts/tool_descriptions/judge_code_change.md

Lines changed: 4 additions & 4 deletions
@@ -1,12 +1,12 @@
 # Judge Code Change
 
 ## Description
-Review implementation code (not tests) once all implementation work is complete and tests are passing. Called when `workflow_guidance.next_tool == "judge_code_change"`.
+Review implementation code (not tests) when implementation changes are ready for review. Tests are validated separately by `judge_testing_implementation`. Called when `workflow_guidance.next_tool == "judge_code_change"`.
 
 {% include 'shared/critical_tool_warnings.md' %}
 
 ## When to use
-- All implementation files written/modified, tests exist and pass; ready for review
+- After creating or modifying implementation code and a review is needed. Tests may be written before or after review; they are validated via `judge_testing_implementation`.
 
 ## Human-in-the-Loop (HITL) checks
 - If foundational choices are unclear or need confirmation (e.g., framework/library, UI vs CLI, web vs desktop, API style, auth, hosting), first call `raise_missing_requirements` to elicit the user’s intent
@@ -25,6 +25,6 @@ Review implementation code (not tests) once all implementation work is complete
 {{ JUDGE_RESPONSE_SCHEMA }}
 ```
 
-## Notes
-- Review only implementation code here; tests are validated via `judge_testing_implementation`. Always use the exact `task_id`.
+- Review only implementation code here; tests are validated via `judge_testing_implementation`.
+- Always use the exact `task_id`; recover it via `get_current_coding_task` if missing.
 - If HITL was performed, update the task description/requirements via `set_coding_task` if text needs to be clarified for future steps

src/mcp_as_a_judge/prompts/tool_descriptions/judge_coding_plan.md

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,8 @@
 ## Description
 Validate a proposed plan and design against requirements, research needs, and risks. Called when `workflow_guidance.next_tool == "judge_coding_plan"`.
 
+{% include 'shared/critical_tool_warnings.md' %}
+
 ## Prerequisites
 - Thoroughly analyze requirements, propose a concrete plan, and produce a system design
 
src/mcp_as_a_judge/prompts/tool_descriptions/judge_coding_task_completion.md

Lines changed: 2 additions & 1 deletion
@@ -24,4 +24,5 @@ Final validation gate before declaring a task complete. Called when `workflow_gu
 ```
 
 ## Notes
-- Do not present completion summaries to the user without calling this tool. Always use the exact `task_id`.
+- The AI coding assistant MUST NOT present or claim task completion, or provide a final completion summary to the user, without successfully calling this tool and receiving approval.
+- Always use the exact `task_id`; if missing due to memory limits, recover it via `get_current_coding_task`.
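The tightened note in `judge_coding_task_completion.md` makes completion a hard gate: no summary without a successful, approving call. A minimal client-side guard might look like the sketch below. It is hypothetical. The diff does not show the judge response schema, so the `"approved"` key is an assumption.

```python
from typing import Any, Dict, Optional


class CompletionNotApproved(Exception):
    """Raised when a completion summary is requested without judge approval."""


def final_summary(summary: str, completion_result: Optional[Dict[str, Any]]) -> str:
    """Release the completion summary only after an approving completion call.

    Assumes the judge_coding_task_completion result carries a boolean
    "approved" field; the real schema is not part of this diff.
    """
    # A missing call, an empty result, or a non-approval all block the summary.
    if not completion_result or completion_result.get("approved") is not True:
        raise CompletionNotApproved(
            "Call judge_coding_task_completion and obtain approval first."
        )
    return summary
```

Raising rather than returning a fallback string makes a skipped gate fail loudly, so the assistant cannot quietly claim completion.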

src/mcp_as_a_judge/prompts/tool_descriptions/judge_testing_implementation.md

Lines changed: 4 additions & 1 deletion
@@ -3,6 +3,8 @@
 ## Description
 Validate test quality, coverage, and execution results after code review is approved. Called when `workflow_guidance.next_tool == "judge_testing_implementation"`.
 
+{% include 'shared/critical_tool_warnings.md' %}
+
 ## Args
 - `task_id`: string — Task UUID (required)
 - `test_summary`: string — Summary of the implemented tests (required)
@@ -21,4 +23,5 @@ Validate test quality, coverage, and execution results after code review is appr
 ```
 
 ## Notes
-- Use after `judge_code_change` is approved. Follow `workflow_guidance.next_tool` for the next step. Always use the exact `task_id`.
+- Use after `judge_code_change` is approved. Follow `workflow_guidance.next_tool` for the next step.
+- Always use the exact `task_id`; recover it via `get_current_coding_task` if missing.
