The WHO 2021 classification criteria for adult-type diffuse glioma integrate histology with molecular profiling for a conclusive diagnosis. Since molecular profiling can be expensive and time-consuming, often necessitating outsourcing or resulting in a "not otherwise specified" (NOS) label, this study develops an AI-driven WHO 2021 classification of gliomas solely from H&E-stained whole-slide images (WSIs).
- Linux (Tested on Ubuntu 22.04)
- NVIDIA GPU (Tested on NVIDIA A6000/A100)
Next, use the environment configuration file to create a conda environment:

```bash
conda env create -n glioma_subtyping -f glioma_subtyping.yaml
```

Activate the environment:

```bash
conda activate glioma_subtyping
```

Organize your WSIs in the following directory structure:

```
data/wsi/<DATASET>
├── patient_1_slide_a.svs
├── patient_1_slide_b.svs
├── patient_2_slide_a.svs
├── patient_2_slide_b.svs
└── ...
```

The pipeline automatically adjusts the extraction scale and file paths based on the input magnification. This ensures that the physical area covered by a patch remains consistent or follows your specific protocol (a worked example follows the table below).
| Input Argument | Target Data Directory | Output Directory | Patch Size | Downsample Level |
|---|---|---|---|---|
| 40x | `data/slides_40x/` | `data/slides_patches/` | 512 | 2 |
| 20x | `data/slides_20x/` | `data/slides_patches/` | 256 | 1 |
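For intuition, here is a small worked example of the consistent-physical-area idea. The microns-per-pixel (MPP) value is a typical scanner figure and an assumption on our part, not a value taken from this repository:

```python
# Illustrative sketch: a 512 px patch at 40x covers the same tissue
# width as a 256 px patch at 20x. The MPP constant is an assumed
# typical value (~0.25 um/px at 40x), not read from the slides.
MPP_40X = 0.25  # microns per pixel at 40x (assumption)

def physical_width_um(patch_size_px: int, magnification: float) -> float:
    """Tissue width covered by a square patch, in microns."""
    mpp = MPP_40X * (40.0 / magnification)
    return patch_size_px * mpp

print(physical_width_um(512, 40.0))  # 128.0
print(physical_width_um(256, 20.0))  # 128.0
```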
Some datasets provide Whole Slide Images (WSIs) with pyramid downsample levels 1, 2, 4, 8, and 16. We map these levels to effective optical magnifications as follows:
| Pyramid Level | Effective Magnification | OpenSlide Level |
|---|---|---|
| 1 | 40x (native resolution) | 0 |
| 2 | 20x | 1 |
| 4 | 10x | 2 |
| 8 | 5x | 3 |
| 16 | 2.5x | 4 |
Based on this mapping, patches are extracted using the following conventions:
- 20x / 10x / 5x / 2.5x: `patch_size = 256`, `step_size = 256`
- 40x: `patch_size = 512`, `step_size = 512`
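A minimal sketch of how this mapping can be applied when reading a patch with OpenSlide; the level/size table comes from the mapping above, but the repository's actual patching code may differ:

```python
import openslide  # pip install openslide-python

# Magnification -> (OpenSlide level, patch size), per the mapping above.
MAG_TO_LEVEL_AND_SIZE = {
    "40x": (0, 512),
    "20x": (1, 256),
    "10x": (2, 256),
    "5x": (3, 256),
    "2.5x": (4, 256),
}

def read_patch(slide_path: str, x: int, y: int, mag: str):
    """Read one RGB patch; (x, y) is the top-left corner in level-0
    coordinates, as read_region() expects."""
    level, size = MAG_TO_LEVEL_AND_SIZE[mag]
    slide = openslide.OpenSlide(slide_path)
    patch = slide.read_region((x, y), level, (size, size)).convert("RGB")
    slide.close()
    return patch
```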
The `create_patches.sh` script takes the following arguments:

```bash
./create_patches.sh <DATASET> <MAG> <PATCH_SIZE> <PATCH_LEVEL>
```

Examples:

```bash
./script/create_patches.sh tcga 20x 256 0   # 20x  (Pyramid Level = 2)
./script/create_patches.sh tcga 10x 256 1   # 10x  (Pyramid Level = 4)
./script/create_patches.sh tcga 5x 256 2    # 5x   (Pyramid Level = 8)
./script/create_patches.sh tcga 2.5x 256 3  # 2.5x (Pyramid Level = 16)
```

The resulting output is organized as follows:

```
data/patches/<DATASET>/<MAG>/
├── masks/
│   ├── patient_1_slide_a.png
│   ├── patient_1_slide_b.png
│   └── ...
├── patches/
│   ├── patient_1_slide_a.h5
│   ├── patient_1_slide_b.h5
│   └── ...
├── stitches/
│   ├── patient_1_slide_a.png
│   ├── patient_1_slide_b.png
│   └── ...
└── slides_processed.csv
```

After initial patching, the pipeline can run a cleanup script (if needed) to filter out low-quality tiles:
- White Space: Patches with >85% background are removed.
- Stain Detection: Uses HED (Hematoxylin-Eosin-DAB) color deconvolution to ensure tissue is actually present.
- HSV Filtering: Removes blurry or out-of-focus areas based on saturation and value thresholds.
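A minimal sketch of what these filters can look like, assuming scikit-image; all thresholds are illustrative assumptions, not the values used by `step_2_cleanup.py`:

```python
import numpy as np
from skimage.color import rgb2hed, rgb2hsv

# NOTE: all thresholds below are illustrative assumptions.

def is_background(patch: np.ndarray) -> bool:
    """Reject patches where more than 85% of pixels are near-white."""
    gray = patch.mean(axis=2)
    return (gray > 220).mean() > 0.85

def has_stain(patch: np.ndarray) -> bool:
    """HED color deconvolution: require some hematoxylin signal."""
    hed = rgb2hed(patch / 255.0)
    return hed[..., 0].mean() > 0.02  # channel 0 = hematoxylin

def is_sharp_enough(patch: np.ndarray) -> bool:
    """HSV check: very low saturation often indicates blur/background."""
    hsv = rgb2hsv(patch / 255.0)
    return hsv[..., 1].mean() > 0.05

def keep_patch(patch: np.ndarray) -> bool:
    return not is_background(patch) and has_stain(patch) and is_sharp_enough(patch)
```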
After patch extraction, a cleanup step removes invalid or unused patches based on the patching magnification. Run the cleanup script as follows:

```bash
python step_2_cleanup.py \
    --wsi_dir "$DATA_DIR" \
    --h5_dir "$COORD_DIR/patches" \
    --csv_path "$COORD_DIR/slides_processed.csv" \
    --patching "$MAG"
```

Arguments:

- `--wsi_dir` — directory containing the original WSI files
- `--h5_dir` — directory containing extracted patch coordinate `.h5` files
- `--csv_path` — CSV file generated during patch creation (`slides_processed.csv`)
- `--patching` — magnification level used for patch extraction (e.g., `20x`, `10x`, `5x`, `2.5x`)
This script performs patch-level feature extraction using a selected backbone model.
It supports multiple self-supervised and supervised histopathology encoders and automatically selects the appropriate feature-extraction wrapper.
```bash
./script/extract_features.sh <MAG> <BATCH_SIZE> <CSV_FILE> <BACKBONE> <DATASET>
```

Example:

```bash
chmod +x extract_features.sh
./script/extract_features.sh 20x 128 tcga_2021_who_labels.csv uni tcga
```

Arguments:

- `MAG` — magnification level (e.g., `20x`, `10x`, `5x`, `2.5x`)
- `BATCH_SIZE` — batch size for feature extraction (e.g., `128`)
- `CSV_FILE` — dataset CSV file (located in `dataset_csv/`)
- `BACKBONE` — feature extractor backbone
- `DATASET` — dataset name (e.g., `tcga`, `ebrains`, `ipd`)
We support several state-of-the-art self-supervised foundation models for histopathology.
For more details about each model, please refer to the original repositories to request access and follow their specific licensing terms.
- ResNet-50 : ImageNet pretrained
- CTransPath : https://github.com/Xiyue-Wang/TransPath
- Lunit ViT : https://github.com/lunit-io/benchmark-ssl-pathology
- UNI : https://github.com/mahmoodlab/UNI
- Conch : https://github.com/mahmoodlab/CONCH
- Gigapath : https://github.com/prov-gigapath/prov-gigapath
- Hibou : https://github.com/HistAI/hibou
- Optimus : https://github.com/bioptimus/releases/tree/main/models/h-optimus/v0
- Virchow2 : https://huggingface.co/paige-ai/Virchow2
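As an example, the UNI repository documents loading its encoder through `timm` once access has been granted on Hugging Face; the snippet below follows that documented pattern at the time of writing, but check the linked repository for the current instructions:

```python
import timm
from huggingface_hub import login

# UNI weights are gated: request access via the UNI repository first,
# then authenticate with your Hugging Face token.
login()

# Arguments follow the loading snippet documented in the UNI repository.
model = timm.create_model(
    "hf-hub:MahmoodLab/uni",
    pretrained=True,
    init_values=1e-5,
    dynamic_img_size=True,
)
model.eval()
```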
Extracted features are organized as follows:

```
data/features/<BACKBONE>/<DATASET>/<MAG>/
├── h5_files/
│   ├── slide_1.h5
│   ├── slide_2.h5
│   └── ...
└── pt_files/
    ├── slide_1.pt
    ├── slide_2.pt
    └── ...
```

`.h5` files contain patch features with coordinates, while `.pt` files contain serialized tensors for faster downstream training.
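A minimal sketch of reading these outputs; the `features`/`coords` dataset names follow the CLAM convention this codebase builds on, but verify them against the actual files:

```python
import h5py
import torch

# Hypothetical paths; substitute your backbone, dataset, and magnification.
h5_path = "data/features/uni/tcga/20x/h5_files/slide_1.h5"
pt_path = "data/features/uni/tcga/20x/pt_files/slide_1.pt"

# .h5: patch features plus their coordinates (dataset names assumed
# to follow the CLAM "features"/"coords" convention).
with h5py.File(h5_path, "r") as f:
    features = f["features"][:]  # (num_patches, feature_dim)
    coords = f["coords"][:]      # (num_patches, 2) x/y offsets

# .pt: just the feature tensor, loaded faster during training.
bag = torch.load(pt_path, map_location="cpu")
print(bag.shape)  # e.g., torch.Size([num_patches, feature_dim])
```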
While this repository focuses specifically on glioma subtyping, TRIDENT provides several large-scale toolkits designed for high-throughput Whole-Slide Image (WSI) processing and benchmarking. TRIDENT is the next-generation successor to toolkits like CLAM, offering a more robust and scalable pipeline for gigapixel image analysis.
The training script supports the following Multiple Instance Learning (MIL) model architectures.
Use any of these as the `<MODEL>` argument when running `train.sh`.

| Model Name | Description | Original Repository |
|---|---|---|
| `mean_mil` | Mean pooling MIL baseline | https://github.com/jakubmonhart/mil_pytorch |
| `max_mil` | Max pooling MIL baseline | https://github.com/jakubmonhart/mil_pytorch |
| `att_mil` | Attention-based MIL | https://github.com/AMLab-Amsterdam/AttentionDeepMIL |
| `trans_mil` | Transformer-based MIL | https://github.com/szc19990412/TransMIL |
| `clam_sb` | Attention-based MIL with instance clustering | https://github.com/mahmoodlab/CLAM/ |
| `mamba_mil` | Mamba-based state space MIL | https://github.com/isyangshu/MambaMIL |
| `dsmil` | Dual-Stream MIL | https://github.com/binli123/dsmil-wsi |
| `wikgmil` | WiKG graph-based MIL model | https://github.com/WonderLandxD/WiKG/ |
| `rrtmil` | RRT-based MIL architecture | https://github.com/DearCaat/RRT-MIL |
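For intuition, here is a minimal sketch of the attention-based MIL idea behind several of these models, following the AttentionDeepMIL formulation (Ilse et al., 2018); it is an illustration with assumed dimensions, not this repository's implementation:

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Attention pooling over a bag of patch features (Ilse et al., 2018).
    Illustrative sketch; feature/hidden/class dimensions are assumptions."""

    def __init__(self, feat_dim: int = 1024, hidden_dim: int = 256, n_classes: int = 3):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, bag: torch.Tensor) -> torch.Tensor:
        # bag: (num_patches, feat_dim) — one WSI's extracted patch features
        a = torch.softmax(self.attention(bag), dim=0)  # (num_patches, 1)
        slide_embedding = (a * bag).sum(dim=0)         # (feat_dim,)
        return self.classifier(slide_embedding)        # (n_classes,)

# Example: a slide with 5000 patches of 1024-dim features.
logits = AttentionMIL()(torch.randn(5000, 1024))
```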
To run the training script, pass the backbone, model, and magnification level as arguments:

```bash
chmod +x train.sh
./train.sh <BACKBONE> <MODEL> <MAG>
```

Example:

```bash
./train.sh virchow trans_mil 20x
./train.sh uni mamba_mil 10x
./train.sh gigapath wikgmil 5x
```

To iterate over all models, backbones, and magnifications:
```bash
chmod +x train.sh

# The triple loop
for bb in uni imagenet hibou ctranspath lunit conch_v1 gigapath optimus virchow; do
  for model in mean_mil max_mil att_mil trans_mil clam_sb mamba_mil dsmil wikgmil rrtmil; do
    for mag in 20x 10x 5x 2.5x; do
      echo "------------------------------------------------"
      echo "RUNNING: Backbone: $bb | Model: $model | Mag: $mag"
      ./train.sh "$bb" "$model" "$mag"
    done
  done
done
```

To run the evaluation script, pass the model, backbone, and magnification level as arguments:
```bash
BACKBONES=("uni" "imagenet" "hibou" "ctranspath" "lunit" "conch_v1" "gigapath" "optimus" "virchow")
MODELS=("mean_mil" "max_mil" "att_mil" "trans_mil" "clam_sb" "mamba_mil" "dsmil" "wikgmil" "rrtmil")
MAGS=("20x" "10x" "5x" "2.5x")

chmod +x eval.sh

# The master loop
for bb in "${BACKBONES[@]}"; do
  for model in "${MODELS[@]}"; do
    for mag in "${MAGS[@]}"; do
      echo "------------------------------------------------"
      echo "EVALUATING: Backbone: $bb | Model: $model | Mag: $mag"
      ./eval.sh "$model" "$bb" "$mag"
    done
  done
done
```
Example:

```bash
./script/eval.sh mamba_mil uni 20x
./script/eval.sh rrtmil gigapath 10x
./script/eval.sh att_mil virchow 5x
./script/eval.sh wikgmil optimus 2.5x
```

Evaluation results are organized as follows:

```
eval_results/
└── tcga/                    # <--- DATASET
    └── uni/                 # <--- BACKBONE (foundation model)
        └── mamba_mil/       # <--- MODEL (architecture)
            ├── 20x/         # <--- MAG (magnification)
            │   ├── fold_0.csv   # <--- slide-level results for fold 0
            │   ├── fold_1.csv
            │   └── ...
            ├── 10x/
            └── ...
```

Late fusion aggregates predictions from multiple magnifications (e.g., 2.5x, 5x, 10x, 20x) by averaging class probabilities at the slide level, followed by an argmax to obtain final predictions. This is performed after model inference, using precomputed evaluation CSVs.
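A minimal sketch of this averaging; the `slide_id` index and per-class probability column names are assumptions about the evaluation CSV layout, so adjust them to the actual headers:

```python
import pandas as pd

# Hypothetical fold-level CSVs, one per magnification.
csvs = [
    "eval_results/tcga/uni/mamba_mil/5x/fold_0.csv",
    "eval_results/tcga/uni/mamba_mil/10x/fold_0.csv",
    "eval_results/tcga/uni/mamba_mil/20x/fold_0.csv",
]
prob_cols = ["prob_class_0", "prob_class_1", "prob_class_2"]  # assumed names

# Average class probabilities per slide across magnifications...
frames = [pd.read_csv(p).set_index("slide_id")[prob_cols] for p in csvs]
avg_probs = sum(frames) / len(frames)

# ...then take the argmax for the fused slide-level prediction.
fused = avg_probs.idxmax(axis=1).rename("fused_prediction")
print(fused.head())
```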
To run late fusion across all backbones, models, and datasets, use:
```bash
chmod +x script/eval_ensemble.sh
./script/eval_ensemble.sh who2021
```

Ensemble results are written to:

```
<dataset>_who2021/<BACKBONE>/<MODEL>/
└── 5x_10x_20x/
    ├── EVAL_tcga_2021_tcga_3_class_5x_10x_20x_test/
    │   ├── fold_0.csv
    │   └── ...
    └── EVAL_tcga_idh_5x_10x_20x_eval_results_detailed.csv
```

This codebase is heavily based on CLAM and MambaMIL. We are grateful to the authors for their open-source work.
This code is available for research and non-commercial academic purposes only. Please ensure you review the original repository licensing for any foundation models used, as well as the licensing terms for the two repositories mentioned above.
Shubham Innani, W Robert Bell, MacLean P Nasrallah, Bhakti Baheti, Spyridon Bakas, AI-driven WHO 2021 classification of gliomas based only on H&E-stained slides, Neuro-Oncology, 2025, noaf189, https://doi.org/10.1093/neuonc/noaf189
@article{10.1093/neuonc/noaf189,
author = {Innani, Shubham and Bell, W Robert and Nasrallah, MacLean P and Baheti, Bhakti and Bakas, Spyridon},
title = {AI-driven WHO 2021 classification of gliomas based only on H\&E-stained slides},
journal = {Neuro-Oncology},
pages = {noaf189},
year = {2025},
month = {08},
issn = {1522-8517},
doi = {10.1093/neuonc/noaf189},
url = {https://doi.org/10.1093/neuonc/noaf189},
eprint = {https://academic.oup.com/neuro-oncology/advance-article-pdf/doi/10.1093/neuonc/noaf189/64170409/noaf189.pdf},
}
