Having observed the long runtime of the commonly used AUPRO implementation by Gudovskiy (and having suffered through it enough times), I looked into its source code to find ways to speed it up. The original implementation contains redundant operations and other inefficiencies. By removing the redundant computations and enabling GPU acceleration, I achieved roughly a 3–5× speedup on CPU and 8–38× on GPU. The optimized version is available here: aupro_efficient.py
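To make "redundant computations" concrete, here is a minimal NumPy sketch of the kind of restructuring that helps. It is not the code in aupro_efficient.py. The sketch makes three assumptions. First, ground-truth regions are labeled once, before the threshold loop, for example with `skimage.measure.label` or `scipy.ndimage.label`, with region IDs made unique across the batch. Second, the per-region overlap at each threshold is counted with a single `np.bincount` call instead of a Python loop over regions. Third, the curve is integrated up to an FPR limit of 0.3 with linear interpolation at the limit. The function names `pro_fpr_curve` and `aupro` are hypothetical. Implementations differ in how they handle the integration limit, and that choice alone can shift the reported value.

```python
import numpy as np


def pro_fpr_curve(labels, scores, thresholds):
    """PRO and FPR at each threshold (hypothetical helper).

    labels: int array; 0 = normal pixel, k > 0 = k-th GT region. Region IDs
            are assumed contiguous (1..K) and unique across the whole batch,
            and are computed ONCE, outside the threshold loop.
    scores: float array of anomaly scores, same shape as labels.
    """
    lab = labels.ravel()
    s = scores.ravel()
    n_regions = int(lab.max())
    # Region sizes and the normal-pixel count do not depend on the threshold.
    sizes = np.bincount(lab, minlength=n_regions + 1)[1:]
    normal = lab == 0
    n_normal = np.count_nonzero(normal)

    fprs, pros = [], []
    for t in thresholds:
        pred = s > t
        # Predicted-anomalous pixels per region, counted in one pass
        # instead of looping over regions in Python.
        hits = np.bincount(lab[pred], minlength=n_regions + 1)[1:]
        pros.append(np.mean(hits / sizes))
        fprs.append(np.count_nonzero(pred & normal) / n_normal)
    return np.array(fprs), np.array(pros)


def aupro(fprs, pros, fpr_limit=0.3):
    """Normalized area under the PRO curve for FPR in [0, fpr_limit]."""
    order = np.lexsort((pros, fprs))  # sort by FPR, ties broken by PRO
    x, y = fprs[order], pros[order]
    keep = x <= fpr_limit
    assert keep[0], "need at least one point with FPR <= fpr_limit"
    xk, yk = x[keep], y[keep]
    if not keep.all():
        # Interpolate the curve linearly at exactly fpr_limit.
        i = int(np.argmax(~keep))  # first point beyond the limit
        x0, y0, x1, y1 = x[i - 1], y[i - 1], x[i], y[i]
        y_lim = y0 + (y1 - y0) * (fpr_limit - x0) / (x1 - x0)
        xk, yk = np.append(xk, fpr_limit), np.append(yk, y_lim)
    # Trapezoidal rule, then normalize so a perfect detector scores 1.
    area = np.sum((xk[1:] - xk[:-1]) * (yk[1:] + yk[:-1]) / 2)
    return float(area / fpr_limit)
```

The per-threshold work is now a few vectorized array operations. These have direct PyTorch counterparts (elementwise comparisons, `torch.bincount`), which is one plausible route to GPU acceleration. I have not verified that this sketch matches how aupro_efficient.py does it.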
To ensure the modifications did not compromise the metric’s integrity, I conducted experiments on synthetic masks and predictions with a controlled overlap ratio. The full evaluation is available in aupro_test.ipynb. The table below shows the runtime and corresponding AUPRO values across various input image sizes with a 0.3 overlap ratio:
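The sketch below shows one way to build a mask/prediction pair with a controlled overlap ratio. The generator `make_synthetic_pair` is hypothetical; aupro_test.ipynb may construct its data differently. It places a centered square ground-truth region and gives high scores to a fraction `overlap` of its pixels. The remaining ground-truth pixels receive the lowest score, and background pixels receive low random scores.

```python
import numpy as np


def make_synthetic_pair(size, overlap=0.3, seed=0):
    """Hypothetical generator: a square GT region and a score map in which
    a fraction `overlap` of the GT pixels receive high anomaly scores."""
    rng = np.random.default_rng(seed)
    h, w = size
    mask = np.zeros((h, w), dtype=bool)
    # Centered square covering half the height and half the width.
    mask[h // 4 : h // 4 + h // 2, w // 4 : w // 4 + w // 2] = True
    # Background: low scores in [0, 0.5).
    scores = rng.uniform(0.0, 0.5, size=(h, w))
    gt_idx = np.flatnonzero(mask)
    # GT pixels that are not selected get the lowest score (missed entirely).
    scores.flat[gt_idx] = 0.0
    n_hit = int(round(overlap * gt_idx.size))
    hit = rng.choice(gt_idx, size=n_hit, replace=False)
    # Selected GT pixels: high scores in [0.5, 1.0).
    scores.flat[hit] = rng.uniform(0.5, 1.0, size=n_hit)
    return mask, scores
```

To build a batch, one could stack several such pairs generated with different seeds.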
| Image Size | Implementation | Execution Time (s) | AUPRO Value (%) |
|---|---|---|---|
| (256, 256) | Gudovskiy | 34.74 | 29.99 |
| | Enhanced-CPU | 6.60 | 29.99 |
| | Enhanced-GPU | 4.43 | 29.99 |
| (512, 512) | Gudovskiy | 90.15 | 29.99 |
| | Enhanced-CPU | 31.16 | 29.99 |
| | Enhanced-GPU | 5.41 | 29.99 |
| (1024, 1024) | Gudovskiy | 346.42 | 29.99 |
| | Enhanced-CPU | 121.40 | 29.99 |
| | Enhanced-GPU | 9.02 | 29.99 |
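The speedup ranges quoted at the start are ratios of the baseline time to the enhanced time in the table above. This short script recomputes them from the listed timings:

```python
# Execution times in seconds, copied from the table above.
times = {
    "256x256": {"Gudovskiy": 34.74, "Enhanced-CPU": 6.60, "Enhanced-GPU": 4.43},
    "512x512": {"Gudovskiy": 90.15, "Enhanced-CPU": 31.16, "Enhanced-GPU": 5.41},
    "1024x1024": {"Gudovskiy": 346.42, "Enhanced-CPU": 121.40, "Enhanced-GPU": 9.02},
}

speedups = {}
for size, t in times.items():
    # Speedup = baseline time / enhanced time.
    cpu = t["Gudovskiy"] / t["Enhanced-CPU"]
    gpu = t["Gudovskiy"] / t["Enhanced-GPU"]
    speedups[size] = (cpu, gpu)
    print(f"{size}: CPU {cpu:.1f}x, GPU {gpu:.1f}x")

# 256x256: CPU 5.3x, GPU 7.8x
# 512x512: CPU 2.9x, GPU 16.7x
# 1024x1024: CPU 2.9x, GPU 38.4x
```

The CPU gain is largest at the smallest image size. The GPU gain grows with resolution.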
While searching for existing implementations, I only came across the MVTec-3D evaluation code, which runs significantly faster, particularly for smaller images and batches. However, its AUPRO outputs were inconsistent with the Gudovskiy baseline.
I integrated this implementation into the same evaluation framework and found that, although it is very fast, its AUPRO values deviate from the expected results, especially at lower image resolutions.
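My understanding is that the MVTec-3D code gets its speed by avoiding the threshold loop entirely. It sorts all pixel scores once and accumulates FPR and per-region overlap with cumulative sums. The sketch below illustrates that general technique under my own assumptions: the function name `pro_curve_sorted` is hypothetical, and connected-component labels are assumed to be precomputed, contiguous, and unique across the batch. It is not a copy of their code, and I make no claim that it reproduces their numbers.

```python
import numpy as np


def pro_curve_sorted(labels, scores):
    """PRO/FPR curve over every distinct score, via one global sort.

    labels: int array; 0 = normal pixel, k > 0 = k-th GT region
            (IDs assumed contiguous and unique across the whole batch).
    scores: float array of the same shape.
    """
    lab = labels.ravel()
    s = scores.ravel()
    n_regions = int(lab.max())
    sizes = np.bincount(lab, minlength=n_regions + 1)
    normal = lab == 0
    # Each normal pixel adds 1/n_normal to the FPR once it is flagged; each
    # GT pixel adds 1/(region_size * n_regions) to the mean region overlap.
    fpr_step = np.where(normal, 1.0 / normal.sum(), 0.0)
    pro_step = np.where(normal, 0.0, 1.0 / (sizes[lab] * n_regions))
    order = np.argsort(-s, kind="stable")  # highest scores first
    s_sorted = s[order]
    fprs = np.cumsum(fpr_step[order])
    pros = np.cumsum(pro_step[order])
    # Keep only the last position of each run of equal scores, since a
    # threshold cannot separate pixels that share the same score.
    last = np.append(s_sorted[1:] != s_sorted[:-1], True)
    return np.concatenate(([0.0], fprs[last])), np.concatenate(([0.0], pros[last]))
```

Every distinct score acts as a threshold, so no threshold grid is needed. The resulting points can then be integrated up to the FPR limit with the trapezoidal rule.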
| Image Size | Implementation | Execution Time (s) | AUPRO Value (%) |
|---|---|---|---|
| (256, 256) | MVTec-3D | 0.66 | 20.20 |
| | Enhanced-GPU | 4.46 | 29.99 |
| (512, 512) | MVTec-3D | 1.68 | 27.73 |
| | Enhanced-GPU | 5.41 | 29.99 |
| (1024, 1024) | MVTec-3D | 6.04 | 29.24 |
| | Enhanced-GPU | 9.02 | 29.99 |
The MVTec-3D implementation appears to use clever optimization strategies. Based on my observations, however, it seems to contain subtle flaws in its AUPRO computation, which are most evident at smaller image sizes (20.20% versus 29.99% at 256×256). Unfortunately, I haven’t yet had the time to fully trace and resolve these inconsistencies.
In contrast, the enhanced version of the Gudovskiy implementation, especially when GPU-accelerated, offers a good balance of accuracy and performance.
If you know of better implementations, have optimization ideas, or are interested in further refining the implementation, I would greatly appreciate your collaboration.