Releases: quic/aimet
Version 2.26.0
Full Changelog: 2.15.0...2.26.0
Version 2.25.1
Version 2.25.0
Bug fixes and Improvements

ONNX
- Reduced peak CPU memory usage for AdaScale and SeqMSE techniques (28f89a7)
- Reduced peak CUDA memory usage for AdaScale technique (a29f44f)
- Added support for Qwen3 VL models in GenAITests (c014961)
- ONNX-IR based supergroup pattern detection and replacement (9972c1b)
- Tie concat and interpolation ops by default (a8ac6f4)
Torch
- Bug fix for onnx qdq export with control flow ops (ae1abd1)
- Use Triton kernels by default if available (3adcbee)
- Introduce block_size parameter to EncodingAnalyzer (e250abd)
- Always export encodings as uint (ae7d5ef)
- float4/8 QDQ export support (135a0af)
- Support loading zero_point_shift with sim.load_encodings() (624ba30)
- Support built-in quantization of SyncBatchNorm (1e8eceb)
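The new block_size parameter above lends itself to a quick illustration. The sketch below is plain NumPy, not the aimet-torch EncodingAnalyzer API (the function name and shapes here are invented for illustration): block-wise calibration computes one min/max pair per block of elements instead of one pair for the whole tensor.

```python
import numpy as np

def blockwise_min_max(x: np.ndarray, block_size: int):
    """Return (mins, maxs), one entry per block of `block_size` elements."""
    assert x.size % block_size == 0, "tensor must divide evenly into blocks"
    blocks = x.reshape(-1, block_size)
    return blocks.min(axis=1), blocks.max(axis=1)

weights = np.array([-1.0, 2.0, 0.5, 4.0, -3.0, 1.0, 0.0, 8.0])
mins, maxs = blockwise_min_max(weights, block_size=4)
# Two blocks of four elements each, so two (min, max) encodings:
print(mins)  # [-1. -3.]
print(maxs)  # [ 4.  8.]
```

Smaller blocks track local ranges more tightly at the cost of storing more encodings, which is the usual block-size trade-off.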
Version 2.24.0
Bug fixes and Improvements

Torch
- Expose block-level AdaScale API (72246db)
- Improve numerical stability of zero point shifting ([-1.5, -.5, .5, 1.5]) implementation (489f7df)
- Fix replace_lora_layers_with_quantizable_layers to inherit train/eval flag (af5a82d)
- Fix SpinQuant evaluation by untying lm_head and embed_tokens prior to loading the state_dict (47f574d)
- Experimental - Implement Progressive Gradient Scaling (PGS) support for Triton-based quantization kernels (b58b00b)
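The zero-point-shift entry above names a half-integer code grid, [-1.5, -0.5, 0.5, 1.5]. As a hedged, self-contained illustration of what such a grid means (conceptual NumPy math only, not the aimet implementation), a 2-bit shifted grid is the integer grid {-2, -1, 0, 1} offset by half a code unit:

```python
import numpy as np

def qdq_shifted(x, scale):
    """Quantize-dequantize onto the shifted 2-bit grid {-1.5,-0.5,0.5,1.5}*scale."""
    codes = np.clip(np.round(x / scale - 0.5), -2, 1)  # integer codes in [-2, 1]
    return (codes + 0.5) * scale                       # dequantize on shifted grid

x = np.array([-0.8, -0.2, 0.2, 0.9])
print(qdq_shifted(x, scale=0.5))  # [-0.75 -0.25  0.25  0.75]
```

Note the shifted grid has no exactly representable zero; every code sits half a step away from it, which is what makes the numerics of such an implementation delicate.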
Version 2.23.0
Bug fixes and Improvements

Torch
- Enable Torch Dynamo ONNX export (59e0125)
Version 2.22.0
Version 2.21.0
Bug fixes and Improvements

ONNX
- Fix IndexError when Conv or Linear layers are reused in the model (65c4b3b)
- Add optional argument export_int32_bias to aimet-onnx export (3b8e0f0)
- Unpin PyTorch version in aimet-onnx (d99b6c4)
- Align NaN handling with ORT CPU Execution Provider (e4c49eb)
- Fix quantization axis handling for transposed MatMul operations (6ca06d6)
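The transposed-MatMul fix above is about which tensor axis per-channel scales follow. A minimal sketch of the underlying issue, in plain NumPy (illustrative code, not aimet-onnx): per-channel scales must track the output-channel axis, and that axis moves when the weight is stored transposed.

```python
import numpy as np

def per_channel_scales(w: np.ndarray, out_axis: int, qmax: int = 127):
    """One symmetric scale per slice along `out_axis` (max-abs / qmax)."""
    reduce_axes = tuple(a for a in range(w.ndim) if a != out_axis)
    return np.abs(w).max(axis=reduce_axes) / qmax

w = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 200.0]])            # shape (in=2, out=3) for x @ w
print(per_channel_scales(w, out_axis=1))      # 3 scales, one per output column
print(per_channel_scales(w.T, out_axis=0))    # transposed weight: same 3 scales
```

Quantizing the transposed weight along the wrong axis would instead yield 2 scales over the wrong dimension, which is the class of bug the fix addresses.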
PyTorch
- Fix quantization logic to enable input quantizers for layers following ignored layers (80fb4fe)
Version 2.20.0
Bug fixes and Improvements

ONNX
- Export data movement op output encoding in sim.export by default (550c029)
- Assign generic node names if node name is missing or duplicate (273dd82)
- Add PyTorch Pad modules to nn.Module -> onnx op mapping (7e5342b)
- Add LSTM cell state int32 quantization mechanism for LPAI (3a8659b)
- Support stacked RNN/GRU/LSTM (552ad83)
- Make exclude/include node argument naming consistent (ec22d86)
- Implement LPBQ support in aimet-onnx SeqMSE (495567f)
- Add support for dilation, grouping, stride to Quantized Conv (f94f3e2)
- Remove block type from adascale config (b55b058)
- Skip tying concat encoding if input has multiple consumers (3136828)
- Tie quantizers upstream first and downstream later (59aac3e)
- Fix ValidationError in LazyExtractor when external files are missing or inconsistent (a8f32fc)
- Align torch and onnx GenAI recipes (7d4659d)
Torch
- Use separate input quantizer for each concat input (755c54a)
- Add predict-and-fallback-later approach for batched MatMul in aimet-torch SeqMSE (8874173)
- Refactored MMP to not use rounding mode (fd7e40d)
- Use tuple for strided slice indexing (4ddbd66)
- Fix symmetry bug in _from_qnn_encoding_dict (35602ea)
- Align onnx 1.0.0 BQ encoding export ordering with QAIRT expectation (0182b7a)
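The symmetry fix above concerns how a symmetric encoding is reconstructed from a stored min/max range. As a hedged, self-contained sketch of the math involved (not the _from_qnn_encoding_dict code; names here are illustrative): a symmetric encoding pins the zero point at 0 and derives the scale from the larger magnitude, while an asymmetric one uses the full span.

```python
def encoding_from_min_max(vmin, vmax, bits=8, symmetric=True):
    """Return (scale, zero_point) for an integer grid covering [vmin, vmax]."""
    if symmetric:
        # Symmetric: zero point pinned at 0, range forced to +/- max magnitude.
        scale = max(abs(vmin), abs(vmax)) / (2 ** (bits - 1) - 1)
        zero_point = 0
    else:
        # Asymmetric: scale spans the full range, zero point offsets vmin.
        scale = (vmax - vmin) / (2 ** bits - 1)
        zero_point = round(-vmin / scale)
    return scale, zero_point

print(encoding_from_min_max(-0.5, 1.0, symmetric=True))   # (1.0/127, 0)
print(encoding_from_min_max(-0.5, 1.0, symmetric=False))  # (~0.00588, 85)
```

Treating a symmetric encoding as asymmetric (or vice versa) yields a nonzero zero point or a wrong scale, which is the kind of mismatch such a bug produces.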
Version 2.19.0
Bug fixes and Improvements

ONNX
- Make LiteMP API percentage float (69f96ff)
- Set layernorm int16 weight to symmetric by default (8560e13)
- Automatically insert data movement op output qdq during to_onnx_qdq (15c8b9b)
- Create LazyExtractor to handle external data for onnx Extractor utils (104e7e8)
- Tie input/output encodings across maximum Concat subgraph (832ea91)
- Tie hidden state quantizers of RNN/GRU/LSTM (c18fd05)
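The Concat-tying entries above share one idea: every input of a Concat and its output should use a single quantization range, so no requantization is needed at the concat boundary. A minimal sketch of that unioning step (illustrative Python, not the aimet-onnx tying logic):

```python
def tie_concat_encodings(ranges):
    """Union the (min, max) ranges of all concat inputs into one shared range."""
    mins, maxs = zip(*ranges)
    tied = (min(mins), max(maxs))
    return [tied] * len(ranges)  # every branch adopts the unioned range

print(tie_concat_encodings([(-1.0, 2.0), (-3.0, 0.5)]))
# [(-3.0, 2.0), (-3.0, 2.0)]
```

Tying across the *maximum* Concat subgraph extends this union transitively through chained concats rather than one node at a time.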
Common
- Disable per-channel quantization of RNN/GRU/LSTM for all HTP backends (df8b875)
Version 2.18.0
New Features

Bug fixes and Improvements