This repository contains the experimental code and reproducibility artifacts for the paper "Fuzzy PyTorch: Rapid Numerical Variability Evaluation for Deep Learning Models".
Fuzzy PyTorch provides tools and methodologies for evaluating numerical variability and floating-point precision effects in deep learning models. This repository contains three main experimental evaluations comparing various floating-point error analysis tools:
- Deep Learning Benchmarks: Performance evaluation on real-world ML models (MNIST, FastSurfer, WavLM)
- Harmonic Series Analysis: Numerical accuracy assessment using mathematical series computations
- NAS Parallel Benchmarks: Performance overhead analysis using standard HPC benchmarks
## Repository Structure

```
├── containers/                          # Container definitions for reproducible experiments
│   ├── Dockerfile-NPB                   # NPB benchmarks environment
│   ├── Dockerfile-harmonic              # Harmonic series environment
│   ├── Dockerfile-tools                 # Deep learning environment
│   ├── Dockerfile-fastsurfer            # FastSurfer with fuzzy PyTorch (rounding mode needs to be set)
│   ├── Dockerfile-freesurfer            # FreeSurfer container definition
│   ├── Dockerfile-pytorch-ud            # MNIST/base fuzzy PyTorch UD container definition
│   ├── Dockerfile-pytorch-sr            # MNIST/base fuzzy PyTorch SR container definition
│   ├── Dockerfile-wavlm                 # WavLM with fuzzy PyTorch container definition
│   ├── NPB/                             # NPB source and build scripts
│   ├── harmonic/                        # Harmonic source and build scripts
│   └── tools/                           # Additional scripts
├── experiments/                         # Experimental evaluations
│   ├── DL/                              # Deep learning performance evaluation
│   │   ├── Fuzzy_PyTorch.ipynb          # Main notebook with MNIST and FastSurfer results
│   │   ├── FastSurfer_Use_Case/         # Experiments with FastSurfer models
│   │   │   ├── allsub_fast.txt          # Per-subject run commands for parallelization
│   │   │   ├── fastsurfer_embeddings.pdf      # Embeddings visualization
│   │   │   ├── ieee_subjects.txt        # IEEE subjects list
│   │   │   ├── run_fuzzy.sh             # Run fuzzy experiments
│   │   │   ├── run_verrou.sh            # Run Verrou experiments
│   │   │   ├── subjects.txt             # All subjects list
│   │   │   ├── verrou_sr_min_dice_scores.csv  # Minimum Dice scores, Verrou SR FastSurfer inference
│   │   │   ├── verrou_ud_min_dice_scores.csv  # Minimum Dice scores, Verrou CESTAC FastSurfer inference
│   │   │   ├── fuzzy_sr_min_dice_scores.csv   # Minimum Dice scores, fuzzy SR FastSurfer inference
│   │   │   ├── fuzzy_ud_min_dice_scores.csv   # Minimum Dice scores, fuzzy UD FastSurfer inference
│   │   │   └── ieee_min_dice_scores.csv       # Minimum Dice scores, IEEE default FastSurfer inference
│   │   ├── MNIST_Use_Case/              # Experiments with the MNIST dataset
│   │   │   ├── mnist_test.py            # MNIST testing script
│   │   │   ├── run_embedding.sh         # Run embedding experiments
│   │   │   └── run_mnist.sh             # Run MNIST experiments
│   │   ├── WavLM_Use_Case/              # Experiments with the WavLM speech model
│   │   │   ├── WavLM.ipynb              # Main WavLM notebook
│   │   │   ├── inference_only.py        # Inference script
│   │   │   ├── run_iter.sh              # Iterative run script
│   │   │   ├── run_model.sh             # Model run script
│   │   │   ├── run_verrou.sh            # Run Verrou instrumentation
│   │   │   └── train.yaml               # Training configuration
│   │   └── figures/                     # Generated figures
│   ├── NPB/                             # NAS Parallel Benchmarks analysis
│   └── harmonics/                       # Harmonic series numerical analysis
└── README.md                            # This file
```
Note: building `Dockerfile-pytorch-ud` or `Dockerfile-pytorch-sr` requires the Fuzzy repository; before building, replace `fuzzy/docker/resources/pytorch/pytorch-vfc-exclude.txt` with the version found in `containers/tools/`.
## Experiments

### 1. Deep Learning Performance (`experiments/DL/`)

Evaluates the runtime overhead of floating-point error analysis tools on deep learning workloads: MNIST classification, FastSurfer brain segmentation, and WavLM speech processing.
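Beyond runtime, these experiments also quantify the numerical variability of model outputs (e.g., the minimum Dice scores listed above) across repeated instrumented runs. As a minimal sketch (not the repository's analysis code), assuming each fuzzy run saves its output tensor to a NumPy file (file names hypothetical), the usual Monte Carlo Arithmetic estimate of significant bits, s = -log2|σ/μ|, can be computed like this:

```python
# Sketch only (not the repository's analysis code): estimate significant
# bits across repeated fuzzy runs via s = -log2(|sigma / mu|).
import numpy as np

# Hypothetical inputs: one saved output tensor per independent fuzzy run.
runs = [np.load(f"outputs/run_{i}.npy") for i in range(10)]
stack = np.stack(runs)                     # shape: (n_runs, *output_shape)

mu = stack.mean(axis=0)
sigma = stack.std(axis=0)

# Relative spread; where the mean is zero, fall back to the absolute spread.
denom = np.where(mu != 0, np.abs(mu), 1.0)
rel = sigma / denom
bits = -np.log2(np.clip(rel, 2.0**-53, None))   # cap at double precision

print(f"median significant bits: {np.median(bits):.1f}")
```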
### 2. Harmonic Series Analysis (`experiments/harmonics/`)

Assesses the numerical accuracy and convergence properties of harmonic series computations across the different floating-point precision analysis methods.
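For intuition, here is a toy example (not the benchmark code itself) of how working precision changes the partial sums of the harmonic series: in float32, accumulated rounding error makes the sequential sum fall visibly short of the float64 value.

```python
# Toy illustration (not the repository's benchmark): sequential harmonic
# partial sums in float32 vs float64. For n = 1_000_000 the float64 sum is
# about 14.3927, while the float32 sum drifts noticeably below it.
import numpy as np

def harmonic(n_terms, dtype):
    total = dtype(0.0)
    for n in range(1, n_terms + 1):
        total += dtype(1.0) / dtype(n)
    return total

n = 1_000_000
print("float32:", harmonic(n, np.float32))
print("float64:", harmonic(n, np.float64))
```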
### 3. NAS Parallel Benchmarks (`experiments/NPB/`)

Measures the performance overhead of the floating-point error analysis tools using the standard NAS Parallel Benchmarks suite (BT, CG, EP, FT, LU, MG, SP).
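A minimal sketch of the overhead metric, assuming it is reported as slowdown relative to the IEEE baseline (the timings below are placeholders, not measured results):

```python
# Sketch of the overhead metric: per-tool slowdown relative to the IEEE
# baseline. All timings are placeholders, not measured results.
timings_s = {"IEEE": 12.3, "Verrou": 410.0, "Verificarlo": 350.0}

baseline = timings_s["IEEE"]
for tool, seconds in sorted(timings_s.items(), key=lambda kv: kv[1]):
    print(f"{tool:12s} {seconds / baseline:6.1f}x")
```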
## Requirements

- Apptainer/Singularity: container runtime for reproducible execution
- SLURM (optional): for automated batch job execution
- Python 3.x: for the analysis code
- Jupyter: for running the analysis notebooks
## Quick Start

1. Build the containers:

   ```bash
   cd containers
   ./build.sh
   ```

2. Run the experiments: navigate to each experiment directory and follow the instructions in its README.

3. Generate the figures: execute the Jupyter notebooks in each experiment directory to reproduce the analysis and figures.
## Tools Compared

The experiments compare the following floating-point error analysis tools:
- IEEE: Standard IEEE 754 floating-point arithmetic
- PRISM: Precision analysis with stochastic rounding variants (SR, UD)
- Verrou: Monte Carlo Arithmetic with CESTAC and SR modes
- CADNA/CESTAC: Control of Accuracy and Debugging for Numerical Applications
- Verificarlo: Monte Carlo Arithmetic with Random Rounding (MCA RR)
- FM SR: Fast Math Stochastic Rounding
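To make the rounding modes concrete, here is a toy model of stochastic rounding on a decimal grid (an illustration of the idea only; the tools above instrument actual IEEE 754 binary arithmetic). A value is rounded up with probability proportional to its distance from the lower neighbour, so the rounding error is zero in expectation; UD-style modes instead flip a fair coin between the two neighbours.

```python
# Toy model of stochastic rounding to a fixed grid (illustration only; the
# tools above operate on real IEEE 754 binary formats, not a decimal grid).
import random

def stochastic_round(x, step=0.1):
    lo = (x // step) * step            # nearest grid point below x
    frac = (x - lo) / step             # position between the two neighbours
    return lo + step if random.random() < frac else lo

random.seed(0)
samples = [stochastic_round(0.234) for _ in range(10_000)]

# Round-to-nearest always yields 0.2 here; stochastic rounding averages to
# ~0.234, i.e., the rounding error vanishes in expectation.
print(sum(samples) / len(samples))
```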
## Citation

If you use this work, please cite:

> Fuzzy PyTorch: Rapid Numerical Variability Evaluation for Deep Learning Models
## License

See the individual component licenses in their respective directories.