Version 2.25.0
- Bug fixes and Improvements
- ONNX
  - Reduced peak CPU memory usage for AdaScale and SeqMSE techniques (28f89a7)
  - Reduced peak CUDA memory usage for AdaScale technique (a29f44f)
  - Added support for Qwen3 VL models in GenAITests (c014961)
  - ONNX-IR based supergroup pattern detection and replacement (9972c1b)
  - Tie concat and interpolation ops by default (a8ac6f4)
- Torch
  - Bug fix for ONNX QDQ export with control flow ops (ae1abd1)
  - Use Triton kernels by default if available (3adcbee)
  - Introduces block_size parameter to EncodingAnalyzer (e250abd)
  - Always export encodings as uint (ae7d5ef)
  - float4/8 QDQ export support (135a0af)
  - Support loading zero_point_shift with sim.load_encodings() (624ba30)
  - Support built-in quantization of SyncBatchNorm (1e8eceb)
-