Commit 4e70f41
committed
Add evaluation score fix script and correct all rubric20-semantic scores
PROBLEM: LLM evaluator generated JSON files with math errors in overall_score.
The overall_score.total_points was systematically lower than the sum of all
question scores. Additionally, some files had missing required fields (rubric,
version, d4d_file, project, method, evaluation_timestamp, model).
SOLUTION: Created scripts/fix_evaluation_scores.py to:
1. Recalculate overall_score by summing all question scores
2. Fix null/missing required schema fields
3. Infer project/method from filename if missing
4. Add placeholder timestamps and model info
CORRECTED SCORES (all increased):
- AI_READI: 79/84 (94.0%) → 82/84 (97.6%) [+3 points]
- CHORUS: 71/84 (84.5%) → 78/84 (92.9%) [+7 points]
- CM4AI: 77/84 (91.7%) → 82/84 (97.6%) [+5 points]
- VOICE: 81/84 (96.4%) → 84/84 (100.0%) [+3 points] 🎉 PERFECT SCORE!
SCHEMA COMPLIANCE:
- All 4 evaluations now pass rubric20-semantic schema validation
- Fixed missing fields: rubric, version, d4d_file, project, method,
evaluation_timestamp, model
REGENERATED HTML:
- Updated all v5 evaluation HTML files with corrected scores
- HTML now displays accurate scores matching question-level assessments
This script should be run after any LLM evaluation generation to ensure
score accuracy and schema compliance.1 parent 69a08df commit 4e70f41
File tree
17 files changed
+5380
-72
lines changed- data
- d4d_html/concatenated/claudecode_agent
- evaluation_llm/rubric20_semantic/concatenated
- docs/html_output
- scripts
17 files changed
+5380
-72
lines changedLines changed: 3 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
377 | 377 | | |
378 | 378 | | |
379 | 379 | | |
380 | | - | |
381 | | - | |
| 380 | + | |
| 381 | + | |
382 | 382 | | |
383 | 383 | | |
384 | 384 | | |
| |||
949 | 949 | | |
950 | 950 | | |
951 | 951 | | |
952 | | - | |
| 952 | + | |
953 | 953 | | |
954 | 954 | | |
955 | 955 | | |
| |||
Lines changed: 9 additions & 9 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
6 | | - | |
| 6 | + | |
7 | 7 | | |
8 | 8 | | |
9 | 9 | | |
| |||
351 | 351 | | |
352 | 352 | | |
353 | 353 | | |
354 | | - | |
| 354 | + | |
355 | 355 | | |
356 | 356 | | |
357 | 357 | | |
358 | | - | |
| 358 | + | |
359 | 359 | | |
360 | 360 | | |
361 | 361 | | |
362 | | - | |
| 362 | + | |
363 | 363 | | |
364 | 364 | | |
365 | 365 | | |
366 | 366 | | |
367 | 367 | | |
368 | 368 | | |
369 | 369 | | |
370 | | - | |
| 370 | + | |
371 | 371 | | |
372 | 372 | | |
373 | 373 | | |
374 | | - | |
| 374 | + | |
375 | 375 | | |
376 | 376 | | |
377 | 377 | | |
378 | 378 | | |
379 | 379 | | |
380 | | - | |
381 | | - | |
| 380 | + | |
| 381 | + | |
382 | 382 | | |
383 | 383 | | |
384 | 384 | | |
| |||
958 | 958 | | |
959 | 959 | | |
960 | 960 | | |
961 | | - | |
| 961 | + | |
962 | 962 | | |
963 | 963 | | |
964 | 964 | | |
| |||
Lines changed: 3 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
377 | 377 | | |
378 | 378 | | |
379 | 379 | | |
380 | | - | |
381 | | - | |
| 380 | + | |
| 381 | + | |
382 | 382 | | |
383 | 383 | | |
384 | 384 | | |
| |||
1058 | 1058 | | |
1059 | 1059 | | |
1060 | 1060 | | |
1061 | | - | |
| 1061 | + | |
1062 | 1062 | | |
1063 | 1063 | | |
1064 | 1064 | | |
| |||
Lines changed: 9 additions & 9 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
3 | 3 | | |
4 | 4 | | |
5 | 5 | | |
6 | | - | |
| 6 | + | |
7 | 7 | | |
8 | 8 | | |
9 | 9 | | |
| |||
351 | 351 | | |
352 | 352 | | |
353 | 353 | | |
354 | | - | |
| 354 | + | |
355 | 355 | | |
356 | 356 | | |
357 | 357 | | |
358 | | - | |
| 358 | + | |
359 | 359 | | |
360 | 360 | | |
361 | 361 | | |
362 | | - | |
| 362 | + | |
363 | 363 | | |
364 | 364 | | |
365 | 365 | | |
366 | 366 | | |
367 | 367 | | |
368 | 368 | | |
369 | 369 | | |
370 | | - | |
| 370 | + | |
371 | 371 | | |
372 | 372 | | |
373 | 373 | | |
374 | | - | |
| 374 | + | |
375 | 375 | | |
376 | 376 | | |
377 | 377 | | |
378 | 378 | | |
379 | 379 | | |
380 | | - | |
381 | | - | |
| 380 | + | |
| 381 | + | |
382 | 382 | | |
383 | 383 | | |
384 | 384 | | |
| |||
1058 | 1058 | | |
1059 | 1059 | | |
1060 | 1060 | | |
1061 | | - | |
| 1061 | + | |
1062 | 1062 | | |
1063 | 1063 | | |
1064 | 1064 | | |
| |||
Lines changed: 15 additions & 6 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
19 | | - | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
20 | 23 | | |
21 | 24 | | |
22 | 25 | | |
23 | 26 | | |
24 | 27 | | |
25 | 28 | | |
26 | | - | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
27 | 33 | | |
28 | 34 | | |
29 | 35 | | |
30 | 36 | | |
31 | 37 | | |
32 | 38 | | |
33 | | - | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
34 | 43 | | |
35 | 44 | | |
36 | 45 | | |
| |||
65 | 74 | | |
66 | 75 | | |
67 | 76 | | |
68 | | - | |
69 | | - | |
70 | | - | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
71 | 80 | | |
72 | 81 | | |
73 | 82 | | |
| |||
Lines changed: 26 additions & 7 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
19 | | - | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
20 | 22 | | |
21 | 23 | | |
22 | 24 | | |
23 | 25 | | |
24 | 26 | | |
25 | 27 | | |
26 | | - | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
27 | 31 | | |
28 | 32 | | |
29 | 33 | | |
30 | 34 | | |
31 | 35 | | |
32 | 36 | | |
33 | | - | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
34 | 40 | | |
35 | 41 | | |
36 | 42 | | |
37 | 43 | | |
38 | 44 | | |
39 | 45 | | |
40 | | - | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
41 | 49 | | |
42 | 50 | | |
43 | 51 | | |
| |||
66 | 74 | | |
67 | 75 | | |
68 | 76 | | |
69 | | - | |
70 | | - | |
71 | | - | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
72 | 80 | | |
73 | 81 | | |
74 | 82 | | |
| |||
369 | 377 | | |
370 | 378 | | |
371 | 379 | | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
| 383 | + | |
| 384 | + | |
| 385 | + | |
| 386 | + | |
| 387 | + | |
| 388 | + | |
| 389 | + | |
| 390 | + | |
372 | 391 | | |
373 | 392 | | |
Lines changed: 17 additions & 7 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
16 | 16 | | |
17 | 17 | | |
18 | 18 | | |
19 | | - | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
20 | 23 | | |
21 | 24 | | |
22 | 25 | | |
23 | 26 | | |
24 | 27 | | |
25 | 28 | | |
26 | | - | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
27 | 33 | | |
28 | 34 | | |
29 | 35 | | |
30 | 36 | | |
31 | 37 | | |
32 | 38 | | |
33 | | - | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
34 | 42 | | |
35 | 43 | | |
36 | 44 | | |
37 | 45 | | |
38 | 46 | | |
39 | 47 | | |
40 | | - | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
41 | 51 | | |
42 | 52 | | |
43 | 53 | | |
| |||
66 | 76 | | |
67 | 77 | | |
68 | 78 | | |
69 | | - | |
70 | | - | |
71 | | - | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
72 | 82 | | |
73 | 83 | | |
74 | 84 | | |
| |||
0 commit comments