
Red / green marker for evaluation #424

Merged

christianabbet merged 8 commits into main from feat/issue-398/red-green-markers-evaluation on Mar 30, 2026
Conversation

@christianabbet (Collaborator) commented Mar 18, 2026

Closes #398

Description

Introduces evaluate_single_prediction to run per-file evaluation on the fly as predictions are produced, enabling red/green markers in drawn output without waiting for the full pipeline to complete.

Also refactors how GroundTruth is constructed and passed through the pipeline. It is now built once and passed as an object, rather than passing the path repeatedly.
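A minimal sketch of the "build once, pass the object" idea. The file format (JSON keyed by file name) and the method name `for_file` are assumptions for illustration; the repository's actual GroundTruth class may differ:

```python
import json
from pathlib import Path


class GroundTruth:
    """Sketch: ground truth loaded once from a path, then passed around as an object."""

    def __init__(self, path: Path):
        # Load once; assumes a JSON mapping of file name -> expected values.
        self.entries = json.loads(Path(path).read_text())

    def for_file(self, filename: str):
        # Expected values for one file, or None when no ground truth exists.
        return self.entries.get(filename)
```

Callers then receive the constructed object instead of re-reading the path at every step.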

Design rationale

Per-file evaluation is needed only for drawing (red/green markers). At the end of the script, evaluate_all_predictions re-runs over all files to compute overall metrics. This avoids changing the return type of the extraction pipeline to an already-evaluated object, which would require modifying the core logic shared by the metadata and geology pipelines, resulting in heavy changes to both and a risk of introducing bugs.
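The split between per-file evaluation (for markers) and the final aggregation might look like the following sketch. Only the two function names come from this PR; the signatures, the dict-based comparison, and the treatment of missing ground truth are assumptions for illustration:

```python
def evaluate_single_prediction(ground_truth: dict, filename: str, prediction) -> bool:
    """Sketch of per-file evaluation: True means the marker is drawn green."""
    expected = ground_truth.get(filename)
    if expected is None:
        # Assumption: a file with no ground truth entry is also drawn red,
        # a caveat worth keeping in mind when reading the output.
        return False
    return prediction == expected


def evaluate_all_predictions(ground_truth: dict, predictions: dict) -> float:
    """Sketch of the end-of-script pass: re-run over all files for overall metrics."""
    results = [
        evaluate_single_prediction(ground_truth, name, pred)
        for name, pred in predictions.items()
    ]
    return sum(results) / len(results) if results else 0.0
```

Keeping the two functions separate means the extraction pipeline's return type stays untouched: marker drawing calls the single-file check as predictions arrive, and the aggregate pass runs independently at the end.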

Output example

Screenshot 2026-03-18 at 17 21 59

@github-actions bot commented Mar 18, 2026

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| src/extraction/runner.py | 212 | 57 | 73% | 44, 85–93, 120–125, 238–267, 286–299, 356–357, 460–472, 489–491, 503, 529, 571–574 |
| src/extraction/evaluation/benchmark/ground_truth.py | 24 | 2 | 92% | 56–57 |
| src/extraction/evaluation/benchmark/score.py | 87 | 38 | 56% | 37, 66–72, 116–128, 140–166, 171–188, 193 |
| src/extraction/features/predictions/overall_file_predictions.py | 35 | 6 | 83% | 74–81 |
| src/extraction/features/predictions/predictions.py | 122 | 36 | 70% | 81, 142–148, 169–189, 218–235, 239–246 |
| TOTAL | 6496 | 1615 | 75% | |

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 339 | 0 💤 | 0 ❌ | 0 🔥 | 2m 58s ⏱️ |

@christianabbet christianabbet marked this pull request as ready for review March 19, 2026 09:35

@AgathaSchmidt AgathaSchmidt left a comment


Looks good to me, works well.

Only one note: when looking at these files and evaluating them, we should keep in mind that a file is also marked red when no ground truth is available (e.g. no material description is available for Thurgau). So when we see that everything is red, we should not misinterpret it.


@stijnvermeeren-swisstopo stijnvermeeren-swisstopo left a comment


The visualizations look good again, but I think that the code does not yet separate the evaluation of individual files and the aggregation of statistics over all files cleanly enough...


@stijnvermeeren-swisstopo stijnvermeeren-swisstopo left a comment


Thanks for creating the follow-up issue. This PR can be merged like this as far as I'm concerned :).

@christianabbet christianabbet merged commit f003761 into main Mar 30, 2026
3 checks passed
@christianabbet christianabbet deleted the feat/issue-398/red-green-markers-evaluation branch March 30, 2026 11:28


Development

Successfully merging this pull request may close these issues.

Restore green/red evaluation markers in PDF output

3 participants