Uranik Berisha • Jens Mehnert • Alexandru Paul Condurache
This is the official PyTorch implementation of our paper "Variance-Based Pruning for Accelerating and Compressing Trained Networks".
Variance-Based Pruning is a simple structured one-shot pruning method for already trained vision backbones. The method measures the variance of intermediate MLP activations, removes neurons with low variance, and folds mean activation statistics back into the pruned network to preserve performance immediately after pruning.
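As a rough illustration of the idea (a sketch, not the implementation in src/pruner.py; `variance_prune_mlp` is a hypothetical helper), the variance-based selection and mean-folding steps for a single MLP block could look like this:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def variance_prune_mlp(fc1, fc2, calib_inputs, keep_ratio=0.5):
    """Hypothetical sketch of one-shot variance-based pruning of an MLP block.

    Measures per-neuron activation variance over a calibration batch, keeps
    the highest-variance neurons, and folds the mean activation of the
    removed neurons into fc2's bias so the block's output is preserved in
    expectation.
    """
    hidden = nn.functional.gelu(fc1(calib_inputs))   # (N, hidden_dim)
    var = hidden.var(dim=0)                          # per-neuron variance
    mean = hidden.mean(dim=0)                        # per-neuron mean
    k = max(1, int(keep_ratio * hidden.shape[1]))
    keep = torch.zeros(hidden.shape[1], dtype=torch.bool)
    keep[torch.topk(var, k).indices] = True

    # New fc1 keeps only the high-variance neurons.
    fc1_new = nn.Linear(fc1.in_features, k)
    fc1_new.weight.copy_(fc1.weight[keep])
    fc1_new.bias.copy_(fc1.bias[keep])

    # New fc2 drops the matching input columns and folds the mean activation
    # of the pruned neurons into its bias.
    fc2_new = nn.Linear(k, fc2.out_features)
    fc2_new.weight.copy_(fc2.weight[:, keep])
    fc2_new.bias.copy_(fc2.bias + fc2.weight[:, ~keep] @ mean[~keep])
    return fc1_new, fc2_new
```

The mean-folding step is what lets the pruned network retain accuracy immediately after pruning, before any fine-tuning.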
In practice, this repository provides a compact pruning and fine-tuning pipeline for modern vision backbones such as DeiT, ViT, Swin, and ConvNeXt. The workflow is designed to make post-training compression easy to reproduce: collect activation statistics, rank neurons, prune the selected channels, and recover the lost accuracy with a short knowledge-distillation fine-tuning phase.
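The recovery step uses knowledge distillation from the unpruned teacher. A minimal sketch of a standard soft-target distillation loss (Hinton-style; the temperature and weighting here are illustrative assumptions, not the repository's settings):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target KD loss: a temperature-scaled KL term against the teacher
    plus a standard cross-entropy term against the ground-truth labels.
    T and alpha are illustrative defaults, not the repository's values."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (T * T)  # rescale gradients by T^2, as in the original KD paper
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

During fine-tuning, the pruned student would be trained against the frozen original model's logits on the same training batches.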
This project is intended as:
- A research baseline that can be directly built upon.
- An easy-to-implement benchmark for comparative pruning and compression work.
The core pruning logic is intentionally compact and centralized in src/pruner.py.
Only the dependencies used by this codebase are listed below:
- Python 3.8
- torch==2.1.2
- torchvision==0.16.2
- timm==1.0.8
- numpy==1.24.4
- tqdm==4.65.0
Install with:
pip install -r requirements.txt

The code expects an ImageFolder-style dataset:
<data_path>/
train/
class_1/
class_2/
...
val/
class_1/
class_2/
...
The script supports loading timm models directly via --model_timm with --pretrained, and can also load local checkpoints via --model_path.
Base template:
python src/main.py \
--model <DeiT|Swin|ConvNeXt|ViT> \
--pretrained \
--model_timm <timm_model_name> \
--data_path <path_to_imagenet_like_dataset> \
--batch_size 32 \
--num_workers 4 \
--seed 0 \
--device cuda \
--num_batches 0 \
--percentage <pruning_ratio> \
--knowledge_distillation

To reproduce the results in the paper, use the following commands:
DeiT family commands
# DeiT-T-045-T
python src/main.py --model DeiT --pretrained --model_timm deit_tiny_patch16_224 --data_path ../../datasets/ImageNet --batch_size 32 --num_workers 4 --device cuda --seed 0 --num_batches 0 --percentage 0.45 --knowledge_distillation
# DeiT-S-050-T
python src/main.py --model DeiT --pretrained --model_timm deit_small_patch16_224 --data_path ../../datasets/ImageNet --batch_size 32 --num_workers 4 --device cuda --seed 0 --num_batches 0 --percentage 0.50 --knowledge_distillation
# DeiT-B-055-T
python src/main.py --model DeiT --pretrained --model_timm deit_base_patch16_224 --data_path ../../datasets/ImageNet --batch_size 32 --num_workers 4 --device cuda --seed 0 --num_batches 0 --percentage 0.55 --knowledge_distillation
# DeiT-B-020-T
python src/main.py --model DeiT --pretrained --model_timm deit_base_patch16_224 --data_path ../../datasets/ImageNet --batch_size 32 --num_workers 4 --device cuda --seed 0 --num_batches 0 --percentage 0.20 --knowledge_distillation

Swin family commands
# Swin-T-045-T
python src/main.py --model Swin --pretrained --model_timm swin_tiny_patch4_window7_224 --data_path ../../datasets/ImageNet --batch_size 32 --num_workers 4 --device cuda --seed 0 --num_batches 0 --percentage 0.45 --knowledge_distillation
# Swin-S-050-T
python src/main.py --model Swin --pretrained --model_timm swin_small_patch4_window7_224 --data_path ../../datasets/ImageNet --batch_size 32 --num_workers 4 --device cuda --seed 0 --num_batches 0 --percentage 0.50 --knowledge_distillation
# Swin-B-055-T
python src/main.py --model Swin --pretrained --model_timm swin_base_patch4_window7_224 --data_path ../../datasets/ImageNet --batch_size 32 --num_workers 4 --device cuda --seed 0 --num_batches 0 --percentage 0.55 --knowledge_distillation
# Swin-B-020-T
python src/main.py --model Swin --pretrained --model_timm swin_base_patch4_window7_224 --data_path ../../datasets/ImageNet --batch_size 32 --num_workers 4 --device cuda --seed 0 --num_batches 0 --percentage 0.20 --knowledge_distillation

ConvNeXt family commands
# ConvNeXt-T-045-T
python src/main.py --model ConvNeXt --pretrained --model_timm convnext_tiny --data_path ../../datasets/ImageNet --batch_size 32 --num_workers 4 --device cuda --seed 0 --num_batches 0 --percentage 0.45 --knowledge_distillation
# ConvNeXt-S-050-T
python src/main.py --model ConvNeXt --pretrained --model_timm convnext_small --data_path ../../datasets/ImageNet --batch_size 32 --num_workers 4 --device cuda --seed 0 --num_batches 0 --percentage 0.50 --knowledge_distillation
# ConvNeXt-B-055-T
python src/main.py --model ConvNeXt --pretrained --model_timm convnext_base --data_path ../../datasets/ImageNet --batch_size 32 --num_workers 4 --device cuda --seed 0 --num_batches 0 --percentage 0.55 --knowledge_distillation

This software is a research prototype, developed solely for and published as part of the Variance-Based Pruning publication. It will neither be maintained nor monitored in any way. For questions, please contact us via email.
This project is licensed under the GNU Affero General Public License v3.0. See LICENCE for full terms.
If you find this work useful, please cite:
@inproceedings{berisha2025variance,
title = {Variance-Based Pruning for Accelerating and Compressing Trained Networks},
author = {Berisha, Uranik and Mehnert, Jens and Condurache, Alexandru Paul},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year = {2025}
}