
Commit f5127ea

realmarcin and claude committed
Add final rubric10 evaluation prompt (ready to use)

Created concise, actionable rubric10 evaluation prompt with:

✅ Schema compliance requirements
✅ Post-processing workflow (fix_evaluation_scores.py)
✅ Schema validation step
✅ Correct field names (summary_scores, element_scores, sub_elements)
✅ Quality checklist
✅ Step-by-step workflow
✅ Critical field name warnings

Key differences from original prompt:
- OLD: No schema → inconsistent JSON structures
- NEW: Strict schema compliance (rubric10_semantic_schema.json)
- OLD: No post-processing → math errors
- NEW: Mandatory fix_evaluation_scores.py step
- OLD: Variable field names → HTML renderer fails
- NEW: Exact field names specified (summary_scores)

This prompt ensures all 4 evaluations will have:
1. Consistent structure
2. Correct scores
3. All required metadata
4. Compatibility with the HTML renderer

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

1 parent 8324c26 commit f5127ea

1 file changed · 238 additions, 0 deletions

# Rubric10-Semantic Evaluation: Schema-Compliant Workflow

Using the d4d-rubric10-semantic agent, evaluate the 4 claudecode_agent concatenated D4D files with **strict schema compliance** to ensure consistent JSON structure and accurate scoring.

## ⚠️ Critical Requirements

**All evaluations MUST conform to the JSON schema:**

- **Schema**: `src/download/prompts/rubric10_semantic_schema.json`
- **Post-processing**: Score corrections via `scripts/fix_evaluation_scores.py` (REQUIRED)
- **Validation**: All outputs validated against the schema before HTML generation

## Scope

**Concatenated D4D files (claudecode_agent method only):**

- `data/d4d_concatenated/claudecode_agent/AI_READI_d4d.yaml`
- `data/d4d_concatenated/claudecode_agent/CHORUS_d4d.yaml`
- `data/d4d_concatenated/claudecode_agent/CM4AI_d4d.yaml`
- `data/d4d_concatenated/claudecode_agent/VOICE_d4d.yaml`

## For Each File

### 1. Read the D4D YAML File

```python
import yaml

project = "AI_READI"  # one of: AI_READI, CHORUS, CM4AI, VOICE
with open(f"data/d4d_concatenated/claudecode_agent/{project}_d4d.yaml") as f:
    d4d_data = yaml.safe_load(f)
```

### 2. Apply Rubric10-Semantic Quality Assessment

**Standard**: 10 elements, 50 sub-elements (binary 0/1 scoring, max 50 points)

**Semantic Enhancements**:
- Correctness validation (DOI/grant/RRID formats; see the sketch after this list)
- Consistency checking (dates, affiliations, funding)
- Semantic understanding (content accuracy)
- URL validity checking

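A minimal sketch of what the format checks might look like; the patterns and function names below are illustrative assumptions, not part of the repository:

```python
import re

# Illustrative patterns only -- the agent's actual validation rules may differ.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
RRID_PATTERN = re.compile(r"^RRID:[A-Za-z]+_\S+$")


def check_doi(value):
    """Classify a DOI field as 'valid', 'invalid', or 'missing'."""
    if not value:
        return "missing"
    # Accept either a bare DOI or the https://doi.org/ URL form.
    doi = value.removeprefix("https://doi.org/").removeprefix("http://doi.org/")
    return "valid" if DOI_PATTERN.match(doi) else "invalid"


def check_rrid(value):
    """Classify an RRID field the same way."""
    if not value:
        return "missing"
    return "valid" if RRID_PATTERN.match(value) else "invalid"
```
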
### 3. Generate Schema-Compliant JSON Evaluation

**REQUIRED Top-Level Fields:**

```json
{
  "rubric": "rubric10-semantic",
  "version": "1.0",
  "d4d_file": "data/d4d_concatenated/claudecode_agent/{PROJECT}_d4d.yaml",
  "project": "{PROJECT}",
  "method": "claudecode_agent",
  "evaluation_timestamp": "2025-12-23T21:45:00Z",
  "model": {
    "name": "claude-sonnet-4-5-20250929",
    "temperature": 0.0,
    "evaluation_type": "semantic_llm_judge"
  },
  "summary_scores": {
    "total_score": 0,
    "total_max_score": 50,
    "overall_percentage": 0.0,
    "grade": "A+"
  },
  "element_scores": [...],
  "semantic_analysis": {...},
  "recommendations": [...]
}
```

**Element Scores Structure** (10 elements, each with 5 sub-elements):

```json
"element_scores": [
  {
    "id": 1,
    "name": "Dataset Discovery and Identification",
    "description": "Can a user or system discover and uniquely identify this dataset?",
    "sub_elements": [
      {
        "name": "Persistent Identifier (DOI, RRID, or URI)",
        "score": 1,
        "evidence": "doi: https://doi.org/...",
        "quality_note": "DOI present and properly formatted",
        "semantic_validation": "DOI format validated"
      }
      // ... 4 more sub-elements
    ],
    "element_score": 5,
    "element_max": 5
  }
  // ... 9 more elements
]
```

**Semantic Analysis Structure:**

```json
"semantic_analysis": {
  "issues_detected": [
    {
      "type": "format_error",
      "severity": "warning",
      "description": "...",
      "fields_involved": ["doi"],
      "recommendation": "..."
    }
  ],
  "consistency_checks": {
    "passed": ["dates_chronological"],
    "failed": ["funding_grant_mismatch"],
    "warnings": ["missing_orcid"]
  },
  "correctness_validations": {
    "doi_format": "valid",
    "grant_format": "valid",
    "rrid_format": "missing",
    "url_validity": "all_valid"
  },
  "semantic_insights": "..."
}
```

### 4. Save Evaluation JSON

**Output**: `data/evaluation_llm/rubric10_semantic/concatenated/{PROJECT}_claudecode_agent_evaluation.json`

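A minimal sketch of the save step, assuming `project` and the schema-compliant `evaluation` dict from step 3 are already in scope:

```python
import json
from pathlib import Path

out_dir = Path("data/evaluation_llm/rubric10_semantic/concatenated")
out_dir.mkdir(parents=True, exist_ok=True)

out_path = out_dir / f"{project}_claudecode_agent_evaluation.json"
with out_path.open("w") as f:
    json.dump(evaluation, f, indent=2)  # `evaluation` is the schema-compliant dict built in step 3
```
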
## After Evaluating All 4 Files

### 5. Run Score Fixing Script (REQUIRED)

```bash
poetry run python scripts/fix_evaluation_scores.py \
  --input-dir data/evaluation_llm/rubric10_semantic/concatenated
```

**What This Does:**
- Recalculates `summary_scores.total_score` by summing all sub-element scores (see the sketch after this list)
- Fixes `summary_scores.overall_percentage`
- Corrects any LLM calculation errors
- Adds any missing required fields

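For context, the recalculation amounts to something like the sketch below; the actual logic lives in `scripts/fix_evaluation_scores.py` and may differ in detail:

```python
def recalculate_summary(evaluation):
    """Re-derive element and summary totals from the sub-element scores."""
    total = 0
    for element in evaluation["element_scores"]:
        element_total = sum(sub["score"] for sub in element["sub_elements"])
        element["element_score"] = element_total  # keep per-element totals consistent
        total += element_total
    summary = evaluation["summary_scores"]
    summary["total_score"] = total
    summary["overall_percentage"] = round(100.0 * total / summary["total_max_score"], 1)
    return evaluation
```
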
### 6. Validate Against Schema

```bash
poetry run python scripts/validate_evaluation_schema.py \
  data/evaluation_llm/rubric10_semantic/concatenated/*_claudecode_agent_evaluation.json \
  --schema src/download/prompts/rubric10_semantic_schema.json
```

**All 4 files MUST validate successfully before proceeding.**

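If you want to spot-check validation outside the project script, a minimal sketch using the `jsonschema` package (assuming it is available in the environment):

```python
import json
from pathlib import Path

from jsonschema import ValidationError, validate

schema = json.loads(Path("src/download/prompts/rubric10_semantic_schema.json").read_text())
eval_dir = Path("data/evaluation_llm/rubric10_semantic/concatenated")

for path in sorted(eval_dir.glob("*_claudecode_agent_evaluation.json")):
    try:
        validate(instance=json.loads(path.read_text()), schema=schema)
        print(f"OK    {path.name}")
    except ValidationError as err:
        print(f"FAIL  {path.name}: {err.message}")
```
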
### 7. Generate HTML for Each Evaluation

```bash
poetry run python scripts/render_evaluation_html_rubric10_semantic.py \
  data/evaluation_llm/rubric10_semantic/concatenated/ \
  data/d4d_html/concatenated/claudecode_agent/
```

**Output Files:**
- `data/d4d_html/concatenated/claudecode_agent/AI_READI_evaluation.html`
- `data/d4d_html/concatenated/claudecode_agent/CHORUS_evaluation.html`
- `data/d4d_html/concatenated/claudecode_agent/CM4AI_evaluation.html`
- `data/d4d_html/concatenated/claudecode_agent/VOICE_evaluation.html`

### 8. Generate Summary Report

**Create**: `data/evaluation_llm/rubric10_semantic/concatenated/summary_report.md`

**Contents:**
- Comparison table showing all 4 projects (a sketch for building it follows this list)
- Element-level performance breakdown
- Semantic analysis highlights (issues by type)
- Common strengths and weaknesses
- Key insights and recommendations

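A minimal sketch of building the comparison table from the corrected JSON files; the column choice is illustrative, not prescribed by the repository:

```python
import json
from pathlib import Path

eval_dir = Path("data/evaluation_llm/rubric10_semantic/concatenated")
rows = ["| Project | Total Score | Percentage | Grade |", "| --- | --- | --- | --- |"]

for path in sorted(eval_dir.glob("*_claudecode_agent_evaluation.json")):
    ev = json.loads(path.read_text())
    s = ev["summary_scores"]
    rows.append(
        f"| {ev['project']} | {s['total_score']}/{s['total_max_score']} "
        f"| {s['overall_percentage']}% | {s['grade']} |"
    )

print("\n".join(rows))  # paste the resulting table into summary_report.md
```
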
## Settings

- **Temperature**: 0.0 (fully deterministic)
- **Model**: claude-sonnet-4-5-20250929
- **Agent**: d4d-rubric10-semantic
- **Rubric**: `data/rubric/rubric10.txt` + semantic enhancements
- **Schema**: `src/download/prompts/rubric10_semantic_schema.json`

## Expected Output Structure

```
data/evaluation_llm/rubric10_semantic/concatenated/
├── AI_READI_claudecode_agent_evaluation.json   ✅ Schema-compliant, scores corrected
├── CHORUS_claudecode_agent_evaluation.json     ✅ Schema-compliant, scores corrected
├── CM4AI_claudecode_agent_evaluation.json      ✅ Schema-compliant, scores corrected
├── VOICE_claudecode_agent_evaluation.json      ✅ Schema-compliant, scores corrected
└── summary_report.md

data/d4d_html/concatenated/claudecode_agent/
├── AI_READI_evaluation.html   (updated with corrected scores)
├── CHORUS_evaluation.html     (updated with corrected scores)
├── CM4AI_evaluation.html      (updated with corrected scores)
└── VOICE_evaluation.html      (updated with corrected scores)
```

## Quality Checklist

Before considering evaluation complete, verify:

- [ ] All 4 JSON files conform to `rubric10_semantic_schema.json`
- [ ] `fix_evaluation_scores.py` completed successfully
- [ ] All 4 files pass schema validation
- [ ] HTML files generated without errors
- [ ] HTML displays correct scores (matching `summary_scores.total_score`)
- [ ] `summary_report.md` created with all sections
- [ ] No `null` values in required fields
- [ ] All sub-element scores are 0 or 1 (binary)
- [ ] `element_score` equals the sum of its sub-element scores
- [ ] `summary_scores.total_score` equals the sum of all element scores (see the spot-check sketch below)

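The last three checklist items can be spot-checked mechanically; a minimal sketch (the function name is illustrative and not part of the repository):

```python
def verify_scoring(evaluation):
    """Return a list of scoring problems; an empty list means the arithmetic checks pass."""
    problems = []
    element_total = 0
    for element in evaluation["element_scores"]:
        sub_sum = sum(sub["score"] for sub in element["sub_elements"])
        if any(sub["score"] not in (0, 1) for sub in element["sub_elements"]):
            problems.append(f"element {element['id']}: non-binary sub-element score")
        if element["element_score"] != sub_sum:
            problems.append(f"element {element['id']}: element_score != sum of sub_elements")
        element_total += element["element_score"]
    if evaluation["summary_scores"]["total_score"] != element_total:
        problems.append("summary_scores.total_score != sum of element scores")
    return problems
```
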
## Key Field Names (CRITICAL)

**Use EXACTLY these field names:**

**CORRECT**: `summary_scores` (with `total_score`, `total_max_score`, `overall_percentage`)
**WRONG**: `overall_scores`, `overall_score`, `overall_summary`, `overall_assessment`

**CORRECT**: `element_scores` (array of 10 elements)
**WRONG**: `elements`, `element_evaluations`

**CORRECT**: `sub_elements` (array of 5 per element)
**WRONG**: `sub_element_scores`, `subelements`

**Note**: Using incorrect field names will cause HTML generation to fail.

## References

- **Schema**: `src/download/prompts/rubric10_semantic_schema.json`
- **Output Example**: `src/download/prompts/rubric10_output_format.json`
- **Rubric**: `data/rubric/rubric10.txt`
- **HTML Renderer**: `scripts/render_evaluation_html_rubric10_semantic.py`
- **Fix Script**: `scripts/fix_evaluation_scores.py`
- **Issue Report**: `RUBRIC10_ISSUES_REPORT.md` (why regeneration was needed)
- **Full Documentation**: `RUBRIC10_UPDATED_PROMPT.md`

---

Process all 4 files systematically and generate structured, schema-compliant evaluation outputs with semantic analysis for the latest improved D4D datasheets.
