Commit 8c32fcb

feat: Added evaluation and readme
1 parent 3d7452f commit 8c32fcb

3 files changed: +188 -29 lines changed
README.md

Lines changed: 103 additions & 28 deletions
@@ -1,49 +1,124 @@
 # ScribbleBench
 
 [![License Apache Software License 2.0](https://img.shields.io/pypi/l/ScribbleBench.svg?color=green)](https://github.com/Karol-G/ScribbleBench/raw/main/LICENSE)
-[![PyPI](https://img.shields.io/pypi/v/ScribbleBench.svg?color=green)](https://pypi.org/project/ScribbleBench)
 [![Python Version](https://img.shields.io/pypi/pyversions/ScribbleBench.svg?color=green)](https://python.org)
-[![tests](https://github.com/Karol-G/ScribbleBench/workflows/tests/badge.svg)](https://github.com/Karol-G/ScribbleBench/actions)
-![Unit Tests](https://github.com/Karol-G/ScribbleBench/actions/workflows/test_and_deploy.yml/badge.svg?branch=main)
-[![codecov](https://codecov.io/gh/Karol-G/ScribbleBench/branch/main/graph/badge.svg)](https://codecov.io/gh/Karol-G/ScribbleBench)
 
-Revisiting 3D Medical Scribble Supervision: Benchmarking Beyond Cardiac Segmentation
+**ScribbleBench** is a comprehensive benchmark for evaluating the generalization capabilities of 3D scribble-supervised medical image segmentation methods. It spans seven diverse datasets across multiple anatomies and modalities and provides realistic, automatically generated scribble annotations.
 
-----------------------------------
+This repository provides:
+- A guide on how to set up the ScribbleBench benchmark using the original dataset sources and our ScribbleBench scribbles.
+- Our scribble generation code, which creates realistic interior and boundary scribbles.
+- An evaluation script to evaluate your method using ScribbleBench.
+- A reference to our scribble baseline nnUNet+pL.
+- A scribble annotation protocol for domain experts that can be used as guidance to quickly annotate new datasets manually.
 
-Project description...
+ScribbleBench was introduced in our MICCAI 2025 paper:
+**“Revisiting 3D Medical Scribble Supervision: Benchmarking Beyond Cardiac Segmentation”**
+Authors: Karol Gotkowski, Klaus H. Maier-Hein, Fabian Isensee
 
-## Installation
 
-You can install `ScribbleBench` via [pip](https://pypi.org/project/ScribbleBench/):
+## 📦 Benchmark Setup
 
-    pip install ScribbleBench
+ScribbleBench includes scribbles for the following 7 public datasets:
+- ACDC
+- MSCMR
+- WORD
+- AMOS2022 (Task2)
+- KiTS23
+- LiTS
+- BraTS2020
 
+### 📥 Download Datasets
 
+TODO
 
 
-## Contributing
+## 🛠️ Scribble Generation
 
-Contributions are very welcome. Tests can be run with [tox], please ensure
-the coverage at least stays the same before you submit a pull request.
+You can use our script to generate scribbles for your own 3D medical segmentation datasets. The script supports:
+- Interior scribbles using NURBS curves.
+- Boundary scribbles based on partial contours.
+- Foreground/background slice balancing.
+- Multiprocessing for efficient processing of large datasets.
 
-## License
+### 🚀 Run Scribble Generation
 
-Distributed under the terms of the [Apache Software License 2.0] license,
-"ScribbleBench" is free and open source software
+```bash
+python generate_scribbles.py \
+  --input path/to/dense_segmentations \
+  --output path/to/save_scribbles \
+  --num_labels 4 \
+  --conf scribble_conf.yml \
+  --processes 8
+```
 
-## Issues
+**Optional arguments:**
 
-If you encounter any problems, please file an issue along with a detailed description.
+* `--name` → specify one or more file names to process (omit `.nii.gz`)
+* `--disable_ignore` → disables marking unlabeled voxels with an ignore label
 
-[Cookiecutter]: https://github.com/audreyr/cookiecutter
-[MIT]: http://opensource.org/licenses/MIT
-[BSD-3]: http://opensource.org/licenses/BSD-3-Clause
-[GNU GPL v3.0]: http://www.gnu.org/licenses/gpl-3.0.txt
-[GNU LGPL v3.0]: http://www.gnu.org/licenses/lgpl-3.0.txt
-[Apache Software License 2.0]: http://www.apache.org/licenses/LICENSE-2.0
-[Mozilla Public License 2.0]: https://www.mozilla.org/media/MPL/2.0/index.txt
+## 📊 Evaluation
 
-[tox]: https://tox.readthedocs.io/en/latest/
-[pip]: https://pypi.org/project/pip/
-[PyPI]: https://pypi.org/
+You can evaluate your segmentation predictions using the provided script:
+
+```bash
+python evaluation.py \
+  --gt_dir path/to/ground_truth \
+  --pred_dir path/to/predictions \
+  --num_labels 4 \
+  --processes 8
+```
+
+## 🛠️ Scribble Baseline nnUNet+pL
+
+Our scribble baseline nnUNet+pL is implemented in the [nnU-Net](https://github.com/MIC-DKFZ/nnUNet) framework itself, where it is referred to as "ignore label" and described [here](https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/ignore_label.md).
+
+## 📋 Scribble Annotation Protocol
+
+You can also manually create your own scribbles for new datasets by following this lightweight annotation protocol. These human-created scribbles can be used directly to train a model using the same methods as with automatically generated ones.
+
+### ✏️ Instructions
+
+Given a 3D image **I** in your dataset:
+- For each axial slice **S** in **I**:
+  - For each class **C** present in slice **S**:
+    - Select a single **connected component (CC)** of class **C** in **S**
+    - For that component **CC**, draw:
+      - One **interior scribble**
+      - One **boundary scribble**
+
+Note: Do not ignore the background class! Also include a good number of pure background slices.
+
+#### 🟢 Interior Scribble
+- Must be drawn **inside the component CC**.
+- Should be placed roughly **in and around the center area** of the component.
+- Ideal length is **comparable to the diameter or extent** of the component.
+- Can be any arbitrary shape (straight, curved, etc.) as long as it lies **fully within the component**.
+
+#### 🔵 Boundary Scribble
+- Should trace **a portion (15%–100%)** of the **inner boundary** of the component CC.
+- Should ideally follow the actual boundary as closely as possible.
+- A **1–3 voxel inward offset** is acceptable, but **closer to the true boundary is better**.
+- This scribble helps the model capture **boundary details** during learning.
+
+Following this protocol allows quick and efficient labeling of 3D datasets using just a few sparse lines per class and slice, while maintaining strong training performance.
+
+
+
+## 📄 Citation
+
+If you use ScribbleBench or our scribble generation code, please cite:
+
+```bibtex
+@inproceedings{gotkowski2025scribblebench,
+  title     = {Revisiting 3D Medical Scribble Supervision: Benchmarking Beyond Cardiac Segmentation},
+  author    = {Karol Gotkowski and Klaus H. Maier-Hein and Fabian Isensee},
+  booktitle = {International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI)},
+  year      = {2025}
+}
+```
+
+
+## 📬 Contact
+
+For questions, suggestions, or contributions, feel free to open an issue or contact [karol.gotkowski@dkfz.de](mailto:karol.gotkowski@dkfz.de).
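
Review note: the boundary-scribble rule in the annotation protocol above (trace 15%–100% of the inner boundary, stay inside the component) can be checked mechanically. The following is a minimal sketch, not part of this commit. It assumes a single component given as a 2D 0/1 grid, 4-connectivity for the inner boundary, and no tolerance for the 1–3 voxel inward offset; the helper names `inner_boundary`, `boundary_coverage`, and `scribble_inside` are hypothetical.

```python
def inner_boundary(mask):
    """Foreground pixels with a 4-neighbour that is background or off-grid."""
    rows, cols = len(mask), len(mask[0])
    boundary = set()
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not mask[nr][nc]:
                    boundary.add((r, c))
                    break
    return boundary


def boundary_coverage(mask, scribble):
    """Fraction of the inner boundary hit by the scribble (non-empty mask assumed)."""
    boundary = inner_boundary(mask)
    return len(boundary & set(scribble)) / len(boundary)


def scribble_inside(mask, scribble):
    """True if every scribble pixel lies within the component."""
    return all(mask[r][c] for r, c in scribble)


# Toy slice: a 3x3 component centred in a 5x5 grid.
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
boundary_scribble = [(1, 1), (1, 2), (1, 3)]  # traces the top edge
interior_scribble = [(2, 2)]                  # centre pixel

coverage = boundary_coverage(mask, boundary_scribble)
print(coverage, 0.15 <= coverage <= 1.0, scribble_inside(mask, interior_scribble))
```

A real check would work on 3D label volumes slice by slice and would need to tolerate the permitted inward offset; this sketch only illustrates the coverage idea.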

scribblebench/evaluation.py

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
+import numpy as np
+import os
+from tqdmp import tqdmp
+from pathlib import Path
+import argparse
+from medvol import MedVol
+
+
+def evaluate(gt_dir, pred_dir, num_classes, num_processes=None):
+    gt_dir = Path(gt_dir)
+    pred_dir = Path(pred_dir)
+    names_gt = [path.name[:-7] for path in gt_dir.rglob("*.nii.gz")]  # [:-7] strips ".nii.gz"
+    names_pred = [path.name[:-7] for path in pred_dir.rglob("*.nii.gz")]
+
+    if set(names_gt) != set(names_pred):
+        raise RuntimeError("The set of GT segmentations differs from the set of predictions. Do you have missing predictions?")
+
+    if isinstance(num_processes, str):
+        num_processes = int(num_processes)
+
+    dice_scores = tqdmp(evaluate_prediction, names_gt, num_processes, gt_dir=gt_dir, pred_dir=pred_dir, num_classes=num_classes, desc="Evaluating")
+
+    mean_dice_score = float(np.mean(dice_scores))
+
+    print(f"Mean Dice Score: {mean_dice_score}")
+
+
+def evaluate_prediction(name, gt_dir, pred_dir, num_classes, foreground_only=True):
+    gt_filepath = gt_dir / f"{name}.nii.gz"
+    pred_filepath = pred_dir / f"{name}.nii.gz"
+    if not os.path.exists(pred_filepath):
+        raise RuntimeError(f"Prediction ({name}) does not exist.")
+    gt = MedVol(str(gt_filepath)).array
+    pred = MedVol(str(pred_filepath)).array
+    gt = np.rint(np.asarray(gt)).astype(np.uint8)
+    pred = np.rint(np.asarray(pred)).astype(np.uint8)
+    if gt.shape != pred.shape:
+        raise RuntimeError("Prediction and GT do not have the same shape.")
+    gt = gt.flatten()
+    pred = pred.flatten()
+    dice_score = comp_dice(pred, gt, num_classes, foreground_only)
+    return dice_score
+
+
+def comp_dice(pred, gt, num_classes, foreground_only=True, ignore_mask=None):
+    class_labels = list(range(num_classes))
+    if foreground_only:
+        class_labels = class_labels[1:]  # drop background label 0
+
+    dice_score = []
+    for label in class_labels:
+        tp, fp, fn, tn = compute_tp_fp_fn_tn(gt == label, pred == label, ignore_mask)
+        if tp + fp + fn != 0:
+            class_dice_score = float(2 * tp / (2 * tp + fp + fn))
+        else:
+            class_dice_score = np.nan  # label absent from both GT and prediction
+        dice_score.append(class_dice_score)
+
+    dice_score = np.nanmean(dice_score)  # average over classes, skipping absent ones
+    dice_score = float(dice_score)
+    return dice_score
+
+
+def compute_tp_fp_fn_tn(mask_ref: np.ndarray, mask_pred: np.ndarray, ignore_mask: np.ndarray = None):
+    if ignore_mask is None:
+        use_mask = np.ones_like(mask_ref, dtype=bool)
+    else:
+        use_mask = ~ignore_mask
+    tp = np.sum((mask_ref & mask_pred) & use_mask)
+    fp = np.sum(((~mask_ref) & mask_pred) & use_mask)
+    fn = np.sum((mask_ref & (~mask_pred)) & use_mask)
+    tn = np.sum(((~mask_ref) & (~mask_pred)) & use_mask)
+    return tp, fp, fn, tn
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('-gt', "--gt_dir", required=True, help="Path to the dense GT segmentations folder.")
+    parser.add_argument('-pred', "--pred_dir", required=True, help="Path to the dense prediction segmentation folder.")
+    parser.add_argument('-l', "--num_labels", required=True, type=int, help="The number of segmentation labels.")
+    parser.add_argument('-p', "--processes", required=False, default=None, help="Number of multiprocessing processes.")
+    args = parser.parse_args()
+
+    evaluate(args.gt_dir, args.pred_dir, args.num_labels, args.processes)
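
Review note: to sanity-check the per-class Dice logic in `comp_dice`, here is a minimal, dependency-free sketch of the same computation on toy label lists. It is an illustration only, not the repository's code. `dice_per_class` and `mean_foreground_dice` are hypothetical names, plain Python lists stand in for flattened volumes, and labels absent from both inputs are skipped in the mean, which is what `np.nanmean` does in the file above.

```python
import math


def dice_per_class(pred, gt, label):
    """Dice for one label; NaN when the label is absent from both inputs."""
    tp = sum(1 for p, g in zip(pred, gt) if p == label and g == label)
    fp = sum(1 for p, g in zip(pred, gt) if p == label and g != label)
    fn = sum(1 for p, g in zip(pred, gt) if p != label and g == label)
    if tp + fp + fn == 0:
        return math.nan
    return 2 * tp / (2 * tp + fp + fn)


def mean_foreground_dice(pred, gt, num_classes):
    """Mean Dice over labels 1..num_classes-1, ignoring NaN (absent) classes."""
    scores = [dice_per_class(pred, gt, label) for label in range(1, num_classes)]
    valid = [s for s in scores if not math.isnan(s)]
    return sum(valid) / len(valid) if valid else math.nan


gt = [0, 1, 1, 2, 2, 0]
pred = [0, 1, 0, 2, 2, 2]
# Label 1: tp=1, fp=0, fn=1 -> 2/3; label 2: tp=2, fp=1, fn=0 -> 4/5; label 3 absent.
print(mean_foreground_dice(pred, gt, num_classes=4))
```

Because absent classes are skipped rather than scored as 0 or 1, a case that lacks a class does not pull its mean up or down.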

scribblebench/scribble_generation.py

Lines changed: 1 addition & 1 deletion
@@ -33,7 +33,7 @@ def generate_scribble_dataset(load_dir, save_dir, num_labels, conf_filepath, num
         num_processes = int(num_processes)
 
     if names is None:
-        names = [path.name[:-7] for path in Path(load_dir).rglob("*.nii.gz")]
+        names = [path.name[:-7] for path in load_dir.rglob("*.nii.gz")]
     elif isinstance(names, str):
         names = [names]
