
Red / green marker for evaluation #424

Merged

christianabbet merged 8 commits into main from feat/issue-398/red-green-markers-evaluation on Mar 30, 2026
Conversation

@christianabbet (Collaborator) commented Mar 18, 2026

Closes #398

Description

Introduces evaluate_single_prediction to run per-file evaluation on the fly as predictions are produced, enabling red/green markers in drawn output without waiting for the full pipeline to complete.

Also refactors how GroundTruth is constructed and passed through the pipeline. It is now built once and passed as an object, rather than passing the path repeatedly.
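A minimal sketch of the "build once, pass the object" idea. The file format (JSON keyed by file name) and the method name `for_file` are assumptions for illustration; the repository's actual GroundTruth class may differ:

```python
import json
from pathlib import Path


class GroundTruth:
    """Sketch: ground truth loaded once from a path, then passed around as an object."""

    def __init__(self, path: Path):
        # Load once; assumes a JSON mapping of file name -> expected values.
        self.entries = json.loads(Path(path).read_text())

    def for_file(self, filename: str):
        # Expected values for one file, or None when no ground truth exists.
        return self.entries.get(filename)
```

Callers then receive the constructed object instead of re-reading the path at every step.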

Design rationale

Per-file evaluation is needed only for drawing (red/green markers). At the end of the script, evaluate_all_predictions re-runs over all files to compute overall metrics. This avoids changing the return type of the extraction pipeline to an already-evaluated object, which would require modifying the core logic shared by the metadata and geology pipelines, resulting in heavy changes to both and a risk of introducing bugs.
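The split between per-file evaluation (for markers) and the final aggregation might look like the following sketch. Only the two function names come from this PR; the signatures, the dict-based comparison, and the treatment of missing ground truth are assumptions for illustration:

```python
def evaluate_single_prediction(ground_truth: dict, filename: str, prediction) -> bool:
    """Sketch of per-file evaluation: True means the marker is drawn green."""
    expected = ground_truth.get(filename)
    if expected is None:
        # Assumption: a file with no ground truth entry is also drawn red,
        # a caveat worth keeping in mind when reading the output.
        return False
    return prediction == expected


def evaluate_all_predictions(ground_truth: dict, predictions: dict) -> float:
    """Sketch of the end-of-script pass: re-run over all files for overall metrics."""
    results = [
        evaluate_single_prediction(ground_truth, name, pred)
        for name, pred in predictions.items()
    ]
    return sum(results) / len(results) if results else 0.0
```

Keeping the two functions separate means the extraction pipeline's return type stays untouched: marker drawing calls the single-file check as predictions arrive, and the aggregate pass runs independently at the end.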

Output example

Screenshot 2026-03-18 at 17 21 59

@github-actions bot commented Mar 18, 2026

Coverage Report

| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| src/extraction/runner.py | 212 | 57 | 73% | 44, 85–93, 120–125, 238–267, 286–299, 356–357, 460–472, 489–491, 503, 529, 571–574 |
| src/extraction/evaluation/benchmark/ground_truth.py | 24 | 2 | 92% | 56–57 |
| src/extraction/evaluation/benchmark/score.py | 87 | 38 | 56% | 37, 66–72, 116–128, 140–166, 171–188, 193 |
| src/extraction/features/predictions/overall_file_predictions.py | 35 | 6 | 83% | 74–81 |
| src/extraction/features/predictions/predictions.py | 122 | 36 | 70% | 81, 142–148, 169–189, 218–235, 239–246 |
| TOTAL | 6496 | 1615 | 75% | |

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 339 | 0 💤 | 0 ❌ | 0 🔥 | 2m 58s ⏱️ |

@christianabbet christianabbet marked this pull request as ready for review March 19, 2026 09:35

@AgathaSchmidt AgathaSchmidt left a comment


Looks good to me, works well.

Only one note: when looking at these files and evaluating them, we should keep in mind that a file is also marked red when no ground truth is available (e.g. no material description is available for Thurgau). So when we see that everything is red, we should not misinterpret it.


@stijnvermeeren-swisstopo stijnvermeeren-swisstopo left a comment


The visualizations look good again, but I think that the code does not yet separate the evaluation of individual files and the aggregation of statistics over all files cleanly enough...


@stijnvermeeren-swisstopo stijnvermeeren-swisstopo left a comment


Thanks for creating the follow-up issue. This PR can be merged like this as far as I'm concerned :).

@christianabbet christianabbet merged commit f003761 into main Mar 30, 2026
3 checks passed
@christianabbet christianabbet deleted the feat/issue-398/red-green-markers-evaluation branch March 30, 2026 11:28


Development

Successfully merging this pull request may close these issues.

Restore green/red evaluation markers in PDF output

3 participants