The WHO 2021 classification criteria for adult-type diffuse glioma integrate histology with molecular profiling for a conclusive diagnosis. Since molecular profiling can be expensive and time-consuming, often necessitating outsourcing or resulting in a "not otherwise specified" (NOS) label, this study develops an AI-driven WHO 2021 classification of gliomas solely from H&E-stained whole-slide images (WSIs).
- Linux (Tested on Ubuntu 22.04)
- NVIDIA GPU (Tested on NVIDIA A6000/A100)
Next, use the environment configuration file to create a conda environment:

```bash
conda env create -n glioma_subtyping -f glioma_subtyping.yaml
```

Activate the environment:

```bash
conda activate glioma_subtyping
```

Organize your WSIs in the following directory structure:

```
data/wsi/<DATASET>
├── patient_1_slide_a.svs
├── patient_1_slide_b.svs
├── patient_2_slide_a.svs
├── patient_2_slide_b.svs
└── ...
```

The pipeline automatically adjusts the extraction scale and file paths based on the input magnification. This ensures that the physical area covered by a patch remains consistent or follows your specific protocol (a worked example follows the table below).
| Input Argument | Target Data Directory | Output Directory | Patch Size | Downsample Level |
|---|---|---|---|---|
| 40x | `data/slides_40x/` | `data/slides_patches/` | 512 | 2 |
| 20x | `data/slides_20x/` | `data/slides_patches/` | 256 | 1 |
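For intuition, here is a small worked example of the consistent-physical-area idea. The microns-per-pixel (MPP) value is a typical scanner figure and an assumption on our part, not a value taken from this repository:

```python
# Illustrative sketch: a 512 px patch at 40x covers the same tissue
# width as a 256 px patch at 20x. The MPP constant is an assumed
# typical value (~0.25 um/px at 40x), not read from the slides.
MPP_40X = 0.25  # microns per pixel at 40x (assumption)

def physical_width_um(patch_size_px: int, magnification: float) -> float:
    """Tissue width covered by a square patch, in microns."""
    mpp = MPP_40X * (40.0 / magnification)
    return patch_size_px * mpp

print(physical_width_um(512, 40.0))  # 128.0
print(physical_width_um(256, 20.0))  # 128.0
```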
Some datasets provide Whole Slide Images (WSIs) with pyramid downsample levels 1, 2, 4, 8, and 16. We map these levels to effective optical magnifications as follows:
| Pyramid Level | Effective Magnification | OpenSlide Level |
|---|---|---|
| 1 | 40x (native resolution) | 0 |
| 2 | 20x | 1 |
| 4 | 10x | 2 |
| 8 | 5x | 3 |
| 16 | 2.5x | 4 |
Based on this mapping, patches are extracted using the following conventions:
- 20x / 10x / 5x / 2.5x: `patch_size = 256`, `step_size = 256`
- 40x: `patch_size = 512`, `step_size = 512`
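A minimal sketch of how this mapping can be applied when reading a patch with OpenSlide; the level/size table comes from the mapping above, but the repository's actual patching code may differ:

```python
import openslide  # pip install openslide-python

# Magnification -> (OpenSlide level, patch size), per the mapping above.
MAG_TO_LEVEL_AND_SIZE = {
    "40x": (0, 512),
    "20x": (1, 256),
    "10x": (2, 256),
    "5x": (3, 256),
    "2.5x": (4, 256),
}

def read_patch(slide_path: str, x: int, y: int, mag: str):
    """Read one RGB patch; (x, y) is the top-left corner in level-0
    coordinates, as read_region() expects."""
    level, size = MAG_TO_LEVEL_AND_SIZE[mag]
    slide = openslide.OpenSlide(slide_path)
    patch = slide.read_region((x, y), level, (size, size)).convert("RGB")
    slide.close()
    return patch
```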
The `create_patches.sh` script takes the following arguments:

```bash
./create_patches.sh <DATASET> <MAG> <PATCH_SIZE> <PATCH_LEVEL>
```

Examples:

```bash
./script/create_patches.sh tcga 20x 256 0   # 20x  (Pyramid Level = 2)
./script/create_patches.sh tcga 10x 256 1   # 10x  (Pyramid Level = 4)
./script/create_patches.sh tcga 5x 256 2    # 5x   (Pyramid Level = 8)
./script/create_patches.sh tcga 2.5x 256 3  # 2.5x (Pyramid Level = 16)
```

The resulting output is organized as follows:

```
data/patches/<DATASET>/<MAG>/
├── masks/
│   ├── patient_1_slide_a.png
│   ├── patient_1_slide_b.png
│   └── ...
├── patches/
│   ├── patient_1_slide_a.h5
│   ├── patient_1_slide_b.h5
│   └── ...
├── stitches/
│   ├── patient_1_slide_a.png
│   ├── patient_1_slide_b.png
│   └── ...
└── slides_processed.csv
```

After initial patching, the pipeline can run a cleanup script (if needed) to filter out low-quality tiles:
- White Space: Patches with >85% background are removed.
- Stain Detection: Uses HED (Hematoxylin-Eosin-DAB) color deconvolution to ensure tissue is actually present.
- HSV Filtering: Removes blurry or out-of-focus areas based on saturation and value thresholds.
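A minimal sketch of what these filters can look like, assuming scikit-image; all thresholds are illustrative assumptions, not the values used by `step_2_cleanup.py`:

```python
import numpy as np
from skimage.color import rgb2hed, rgb2hsv

# NOTE: all thresholds below are illustrative assumptions.

def is_background(patch: np.ndarray) -> bool:
    """Reject patches where more than 85% of pixels are near-white."""
    gray = patch.mean(axis=2)
    return (gray > 220).mean() > 0.85

def has_stain(patch: np.ndarray) -> bool:
    """HED color deconvolution: require some hematoxylin signal."""
    hed = rgb2hed(patch / 255.0)
    return hed[..., 0].mean() > 0.02  # channel 0 = hematoxylin

def is_sharp_enough(patch: np.ndarray) -> bool:
    """HSV check: very low saturation often indicates blur/background."""
    hsv = rgb2hsv(patch / 255.0)
    return hsv[..., 1].mean() > 0.05

def keep_patch(patch: np.ndarray) -> bool:
    return not is_background(patch) and has_stain(patch) and is_sharp_enough(patch)
```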
After patch extraction, a cleanup step removes invalid or unused patches based on the patching magnification. Run the cleanup script as follows:

```bash
python step_2_cleanup.py \
    --wsi_dir "$DATA_DIR" \
    --h5_dir "$COORD_DIR/patches" \
    --csv_path "$COORD_DIR/slides_processed.csv" \
    --patching "$MAG"
```

Arguments:

- `--wsi_dir` — directory containing the original WSI files
- `--h5_dir` — directory containing extracted patch coordinate `.h5` files
- `--csv_path` — CSV file generated during patch creation (`slides_processed.csv`)
- `--patching` — magnification level used for patch extraction (e.g., `20x`, `10x`, `5x`, `2.5x`)
This script performs patch-level feature extraction using a selected backbone model.
It supports multiple self-supervised and supervised histopathology encoders and automatically selects the appropriate feature-extraction wrapper.
```bash
./script/extract_features.sh <MAG> <BATCH_SIZE> <CSV_FILE> <BACKBONE> <DATASET>
```

Example:

```bash
chmod +x extract_features.sh
./script/extract_features.sh 20x 128 tcga_2021_who_labels.csv uni tcga
```

Arguments:

- `MAG` — magnification level (e.g., `20x`, `10x`, `5x`, `2.5x`)
- `BATCH_SIZE` — batch size for feature extraction (e.g., `128`)
- `CSV_FILE` — dataset CSV file (located in `dataset_csv/`)
- `BACKBONE` — feature extractor backbone
- `DATASET` — dataset name (e.g., `tcga`, `ebrains`, `ipd`)
We support several state-of-the-art self-supervised foundation models for histopathology.
For more details about each model, please refer to the original repositories to request access and follow their specific licensing terms.
- ResNet-50 : ImageNet pretrained
- CTransPath : https://github.com/Xiyue-Wang/TransPath
- Lunit ViT : https://github.com/lunit-io/benchmark-ssl-pathology
- UNI : https://github.com/mahmoodlab/UNI
- Conch : https://github.com/mahmoodlab/CONCH
- Gigapath : https://github.com/prov-gigapath/prov-gigapath
- Hibou : https://github.com/HistAI/hibou
- Optimus : https://github.com/bioptimus/releases/tree/main/models/h-optimus/v0
- Virchow2 : https://huggingface.co/paige-ai/Virchow2
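As an example, the UNI repository documents loading its encoder through `timm` once access has been granted on Hugging Face; the snippet below follows that documented pattern at the time of writing, but check the linked repository for the current instructions:

```python
import timm
from huggingface_hub import login

# UNI weights are gated: request access via the UNI repository first,
# then authenticate with your Hugging Face token.
login()

# Arguments follow the loading snippet documented in the UNI repository.
model = timm.create_model(
    "hf-hub:MahmoodLab/uni",
    pretrained=True,
    init_values=1e-5,
    dynamic_img_size=True,
)
model.eval()
```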
Extracted features are organized as follows:

```
data/features/<BACKBONE>/<DATASET>/<MAG>/
├── h5_files/
│   ├── slide_1.h5
│   ├── slide_2.h5
│   └── ...
└── pt_files/
    ├── slide_1.pt
    ├── slide_2.pt
    └── ...
```

`.h5` files contain patch features with coordinates, while `.pt` files contain serialized tensors for faster downstream training.
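A minimal sketch of reading these outputs; the `features`/`coords` dataset names follow the CLAM convention this codebase builds on, but verify them against the actual files:

```python
import h5py
import torch

# Hypothetical paths; substitute your backbone, dataset, and magnification.
h5_path = "data/features/uni/tcga/20x/h5_files/slide_1.h5"
pt_path = "data/features/uni/tcga/20x/pt_files/slide_1.pt"

# .h5: patch features plus their coordinates (dataset names assumed
# to follow the CLAM "features"/"coords" convention).
with h5py.File(h5_path, "r") as f:
    features = f["features"][:]  # (num_patches, feature_dim)
    coords = f["coords"][:]      # (num_patches, 2) x/y offsets

# .pt: just the feature tensor, loaded faster during training.
bag = torch.load(pt_path, map_location="cpu")
print(bag.shape)  # e.g., torch.Size([num_patches, feature_dim])
```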
While this repository focuses specifically on glioma subtyping, TRIDENT provides several large-scale toolkits designed for high-throughput Whole-Slide Image (WSI) processing and benchmarking. TRIDENT is the next-generation successor to toolkits like CLAM, offering a more robust and scalable pipeline for gigapixel image analysis.
The training script supports the following Multiple Instance Learning (MIL) model architectures.
Use any of these as the `<MODEL>` argument when running `train.sh`.

| Model Name | Description | Original Repository |
|---|---|---|
| `mean_mil` | Mean pooling MIL baseline | https://github.com/jakubmonhart/mil_pytorch |
| `max_mil` | Max pooling MIL baseline | https://github.com/jakubmonhart/mil_pytorch |
| `att_mil` | Attention-based MIL | https://github.com/AMLab-Amsterdam/AttentionDeepMIL |
| `trans_mil` | Transformer-based MIL | https://github.com/szc19990412/TransMIL |
| `clam_sb` | Attention-based MIL with instance clustering | https://github.com/mahmoodlab/CLAM/ |
| `mamba_mil` | Mamba-based state space MIL | https://github.com/isyangshu/MambaMIL |
| `dsmil` | Dual-Stream MIL | https://github.com/binli123/dsmil-wsi |
| `wikgmil` | WiKG graph-based MIL model | https://github.com/WonderLandxD/WiKG/ |
| `rrtmil` | RRT-based MIL architecture | https://github.com/DearCaat/RRT-MIL |
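For intuition, here is a minimal sketch of the attention-based MIL idea behind several of these models, following the AttentionDeepMIL formulation (Ilse et al., 2018); it is an illustration with assumed dimensions, not this repository's implementation:

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    """Attention pooling over a bag of patch features (Ilse et al., 2018).
    Illustrative sketch; feature/hidden/class dimensions are assumptions."""

    def __init__(self, feat_dim: int = 1024, hidden_dim: int = 256, n_classes: int = 3):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, bag: torch.Tensor) -> torch.Tensor:
        # bag: (num_patches, feat_dim) — one WSI's extracted patch features
        a = torch.softmax(self.attention(bag), dim=0)  # (num_patches, 1)
        slide_embedding = (a * bag).sum(dim=0)         # (feat_dim,)
        return self.classifier(slide_embedding)        # (n_classes,)

# Example: a slide with 5000 patches of 1024-dim features.
logits = AttentionMIL()(torch.randn(5000, 1024))
```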
To run the training script, pass the backbone, model, and magnification level as arguments:

```bash
chmod +x train.sh
./train.sh <BACKBONE> <MODEL> <MAG>
```

Example:

```bash
./train.sh virchow trans_mil 20x
./train.sh uni mamba_mil 10x
./train.sh gigapath wikgmil 5x
```

To iterate over all models, backbones, and magnifications:
```bash
chmod +x train.sh

# The triple loop
for bb in uni imagenet hibou ctranspath lunit conch_v1 gigapath optimus virchow; do
  for model in mean_mil max_mil att_mil trans_mil clam_sb mamba_mil dsmil wikgmil rrtmil; do
    for mag in 20x 10x 5x 2.5x; do
      echo "------------------------------------------------"
      echo "RUNNING: Backbone: $bb | Model: $model | Mag: $mag"
      ./train.sh "$bb" "$model" "$mag"
    done
  done
done
```

To run the evaluation script, pass the model, backbone, and magnification level as arguments:
```bash
BACKBONES=("uni" "imagenet" "hibou" "ctranspath" "lunit" "conch_v1" "gigapath" "optimus" "virchow")
MODELS=("mean_mil" "max_mil" "att_mil" "trans_mil" "clam_sb" "mamba_mil" "dsmil" "wikgmil" "rrtmil")
MAGS=("20x" "10x" "5x" "2.5x")

chmod +x eval.sh

# The master loop
for bb in "${BACKBONES[@]}"; do
  for model in "${MODELS[@]}"; do
    for mag in "${MAGS[@]}"; do
      echo "------------------------------------------------"
      echo "EVALUATING: Backbone: $bb | Model: $model | Mag: $mag"
      ./eval.sh "$model" "$bb" "$mag"
    done
  done
done
```
Example:

```bash
./script/eval.sh mamba_mil uni 20x
./script/eval.sh rrtmil gigapath 10x
./script/eval.sh att_mil virchow 5x
./script/eval.sh wikgmil optimus 2.5x
```

Evaluation results are organized as follows:

```
eval_results/
└── tcga/                    # <--- DATASET
    └── uni/                 # <--- BACKBONE (foundation model)
        └── mamba_mil/       # <--- MODEL (architecture)
            ├── 20x/         # <--- MAG (magnification)
            │   ├── fold_0.csv   # <--- slide-level results for fold 0
            │   ├── fold_1.csv
            │   └── ...
            ├── 10x/
            └── ...
```

Late fusion aggregates predictions from multiple magnifications (e.g., 2.5x, 5x, 10x, 20x) by averaging class probabilities at the slide level, followed by an argmax to obtain final predictions. This is performed after model inference, using precomputed evaluation CSVs.
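A minimal sketch of this averaging; the `slide_id` index and per-class probability column names are assumptions about the evaluation CSV layout, so adjust them to the actual headers:

```python
import pandas as pd

# Hypothetical fold-level CSVs, one per magnification.
csvs = [
    "eval_results/tcga/uni/mamba_mil/5x/fold_0.csv",
    "eval_results/tcga/uni/mamba_mil/10x/fold_0.csv",
    "eval_results/tcga/uni/mamba_mil/20x/fold_0.csv",
]
prob_cols = ["prob_class_0", "prob_class_1", "prob_class_2"]  # assumed names

# Average class probabilities per slide across magnifications...
frames = [pd.read_csv(p).set_index("slide_id")[prob_cols] for p in csvs]
avg_probs = sum(frames) / len(frames)

# ...then take the argmax for the fused slide-level prediction.
fused = avg_probs.idxmax(axis=1).rename("fused_prediction")
print(fused.head())
```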
To run late fusion across all backbones, models, and datasets, use:
```bash
chmod +x script/eval_ensemble.sh
./script/eval_ensemble.sh who2021
```

Ensemble results are written to:

```
<dataset>_who2021/<BACKBONE>/<MODEL>/
└── 5x_10x_20x/
    ├── EVAL_tcga_2021_tcga_3_class_5x_10x_20x_test/
    │   ├── fold_0.csv
    │   └── ...
    └── EVAL_tcga_idh_5x_10x_20x_eval_results_detailed.csv
```

This codebase is heavily based on CLAM and MambaMIL. We are grateful to the authors for their open-source work.
This code is available for research and non-commercial academic purposes only. Please ensure you review the original repository licensing for any foundation models used, as well as the licensing terms for the two repositories mentioned above.
Shubham Innani, W Robert Bell, MacLean P Nasrallah, Bhakti Baheti, Spyridon Bakas, AI-driven WHO 2021 classification of gliomas based only on H&E-stained slides, Neuro-Oncology, 2025, noaf189, https://doi.org/10.1093/neuonc/noaf189
@article{10.1093/neuonc/noaf189,
author = {Innani, Shubham and Bell, W Robert and Nasrallah, MacLean P and Baheti, Bhakti and Bakas, Spyridon},
title = {AI-driven WHO 2021 classification of gliomas based only on H\&E-stained slides},
journal = {Neuro-Oncology},
pages = {noaf189},
year = {2025},
month = {08},
issn = {1522-8517},
doi = {10.1093/neuonc/noaf189},
url = {https://doi.org/10.1093/neuonc/noaf189},
eprint = {https://academic.oup.com/neuro-oncology/advance-article-pdf/doi/10.1093/neuonc/noaf189/64170409/noaf189.pdf},
}
