Releases: quic/aimet
Version 2.26.0
Full Changelog: 2.15.0...2.26.0
Version 2.25.1
Version 2.25.0
Bug fixes and Improvements

ONNX
- Reduced peak CPU memory usage for AdaScale and SeqMSE techniques (28f89a7)
- Reduced peak CUDA memory usage for AdaScale technique (a29f44f)
- Added support for Qwen3 VL models in GenAITests (c014961)
- ONNX-IR based supergroup pattern detection and replacement (9972c1b)
- Tie concat and interpolation ops by default (a8ac6f4)
Torch
- Bug fix for onnx qdq export with control flow ops (ae1abd1)
- Use Triton kernels by default if available (3adcbee)
- Introduce block_size parameter to EncodingAnalyzer (e250abd)
- Always export encodings as uint (ae7d5ef)
- float4/8 QDQ export support (135a0af)
- Support loading zero_point_shift with sim.load_encodings() (624ba30)
- Support built-in quantization of SyncBatchNorm (1e8eceb)
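The new block_size parameter above lends itself to a quick illustration. The sketch below is plain NumPy, not the aimet-torch EncodingAnalyzer API (the function name and shapes here are invented for illustration): block-wise calibration computes one min/max pair per block of elements instead of one pair for the whole tensor.

```python
import numpy as np

def blockwise_min_max(x: np.ndarray, block_size: int):
    """Return (mins, maxs), one entry per block of `block_size` elements."""
    assert x.size % block_size == 0, "tensor must divide evenly into blocks"
    blocks = x.reshape(-1, block_size)
    return blocks.min(axis=1), blocks.max(axis=1)

weights = np.array([-1.0, 2.0, 0.5, 4.0, -3.0, 1.0, 0.0, 8.0])
mins, maxs = blockwise_min_max(weights, block_size=4)
# Two blocks of four elements each, so two (min, max) encodings:
print(mins)  # [-1. -3.]
print(maxs)  # [ 4.  8.]
```

Smaller blocks track local ranges more tightly at the cost of storing more encodings, which is the usual block-size trade-off.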
Version 2.24.0
Bug fixes and Improvements

Torch
- Expose block-level AdaScale API (72246db)
- Improve numerical stability of zero point shifting ([-1.5, -.5, .5, 1.5]) implementation (489f7df)
- Fix replace_lora_layers_with_quantizable_layers to inherit train/eval flag (af5a82d)
- Fix SpinQuant evaluation by untying lm_head and embed_tokens prior to loading the state_dict (47f574d)
- Experimental - Implement Progressive Gradient Scaling (PGS) support for Triton-based quantization kernels (b58b00b)
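The zero-point-shift entry above names a half-integer code grid, [-1.5, -0.5, 0.5, 1.5]. As a hedged, self-contained illustration of what such a grid means (conceptual NumPy math only, not the aimet implementation), a 2-bit shifted grid is the integer grid {-2, -1, 0, 1} offset by half a code unit:

```python
import numpy as np

def qdq_shifted(x, scale):
    """Quantize-dequantize onto the shifted 2-bit grid {-1.5,-0.5,0.5,1.5}*scale."""
    codes = np.clip(np.round(x / scale - 0.5), -2, 1)  # integer codes in [-2, 1]
    return (codes + 0.5) * scale                       # dequantize on shifted grid

x = np.array([-0.8, -0.2, 0.2, 0.9])
print(qdq_shifted(x, scale=0.5))  # [-0.75 -0.25  0.25  0.75]
```

Note the shifted grid has no exactly representable zero; every code sits half a step away from it, which is what makes the numerics of such an implementation delicate.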
Version 2.23.0
Bug fixes and Improvements

Torch
- Enable Torch Dynamo ONNX export (59e0125)
Version 2.22.0
Version 2.21.0
Bug fixes and Improvements

ONNX
- Fix IndexError when Conv or Linear layers are reused in the model (65c4b3b)
- Add optional argument export_int32_bias to aimet-onnx export (3b8e0f0)
- Unpin PyTorch version in aimet-onnx (d99b6c4)
- Align NaN handling with ORT CPU Execution Provider (e4c49eb)
- Fix quantization axis handling for transposed MatMul operations (6ca06d6)
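The transposed-MatMul fix above is about which tensor axis per-channel scales follow. A minimal sketch of the underlying issue, in plain NumPy (illustrative code, not aimet-onnx): per-channel scales must track the output-channel axis, and that axis moves when the weight is stored transposed.

```python
import numpy as np

def per_channel_scales(w: np.ndarray, out_axis: int, qmax: int = 127):
    """One symmetric scale per slice along `out_axis` (max-abs / qmax)."""
    reduce_axes = tuple(a for a in range(w.ndim) if a != out_axis)
    return np.abs(w).max(axis=reduce_axes) / qmax

w = np.array([[1.0, 10.0, 100.0],
              [2.0, 20.0, 200.0]])            # shape (in=2, out=3) for x @ w
print(per_channel_scales(w, out_axis=1))      # 3 scales, one per output column
print(per_channel_scales(w.T, out_axis=0))    # transposed weight: same 3 scales
```

Quantizing the transposed weight along the wrong axis would instead yield 2 scales over the wrong dimension, which is the class of bug the fix addresses.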
PyTorch
- Fix quantization logic to enable input quantizers for layers following ignored layers (80fb4fe)
Version 2.20.0
Bug fixes and Improvements

ONNX
- Export data movement op output encoding in sim.export by default (550c029)
- Assign generic node names if node name is missing or duplicate (273dd82)
- Add PyTorch Pad modules to nn.Module -> onnx op mapping (7e5342b)
- Add LSTM cell state int32 quantization mechanism for LPAI (3a8659b)
- Support stacked RNN/GRU/LSTM (552ad83)
- Make exclude/include node argument naming consistent (ec22d86)
- Implement LPBQ support in aimet-onnx SeqMSE (495567f)
- Add support for dilation, grouping, stride to Quantized Conv (f94f3e2)
- Remove block type from adascale config (b55b058)
- Skip tying concat encoding if input has multiple consumers (3136828)
- Tie quantizers upstream first and downstream later (59aac3e)
- Fix ValidationError in LazyExtractor when external files are missing or inconsistent (a8f32fc)
- Align torch and onnx GenAI recipes (7d4659d)
Torch
- Use separate input quantizer for each concat input (755c54a)
- Add predict-and-fallback-later approach for batched MatMul in aimet-torch SeqMSE (8874173)
- Refactored MMP to not use rounding mode (fd7e40d)
- Use tuple for strided slice indexing (4ddbd66)
- Fix symmetry bug in _from_qnn_encoding_dict (35602ea)
- Align onnx 1.0.0 BQ encoding export ordering with QAIRT expectation (0182b7a)
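The symmetry fix above concerns how a symmetric encoding is reconstructed from a stored min/max range. As a hedged, self-contained sketch of the math involved (not the _from_qnn_encoding_dict code; names here are illustrative): a symmetric encoding pins the zero point at 0 and derives the scale from the larger magnitude, while an asymmetric one uses the full span.

```python
def encoding_from_min_max(vmin, vmax, bits=8, symmetric=True):
    """Return (scale, zero_point) for an integer grid covering [vmin, vmax]."""
    if symmetric:
        # Symmetric: zero point pinned at 0, range forced to +/- max magnitude.
        scale = max(abs(vmin), abs(vmax)) / (2 ** (bits - 1) - 1)
        zero_point = 0
    else:
        # Asymmetric: scale spans the full range, zero point offsets vmin.
        scale = (vmax - vmin) / (2 ** bits - 1)
        zero_point = round(-vmin / scale)
    return scale, zero_point

print(encoding_from_min_max(-0.5, 1.0, symmetric=True))   # (1.0/127, 0)
print(encoding_from_min_max(-0.5, 1.0, symmetric=False))  # (~0.00588, 85)
```

Treating a symmetric encoding as asymmetric (or vice versa) yields a nonzero zero point or a wrong scale, which is the kind of mismatch such a bug produces.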
Version 2.19.0
Bug fixes and Improvements

ONNX
- Make LiteMP API percentage float (69f96ff)
- Set layernorm int16 weight to symmetric by default (8560e13)
- Automatically insert data movement op output qdq during to_onnx_qdq (15c8b9b)
- Create LazyExtractor to handle external data for onnx Extractor utils (104e7e8)
- Tie input/output encodings across maximum Concat subgraph (832ea91)
- Tie hidden state quantizers of RNN/GRU/LSTM (c18fd05)
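The Concat-tying entries above share one idea: every input of a Concat and its output should use a single quantization range, so no requantization is needed at the concat boundary. A minimal sketch of that unioning step (illustrative Python, not the aimet-onnx tying logic):

```python
def tie_concat_encodings(ranges):
    """Union the (min, max) ranges of all concat inputs into one shared range."""
    mins, maxs = zip(*ranges)
    tied = (min(mins), max(maxs))
    return [tied] * len(ranges)  # every branch adopts the unioned range

print(tie_concat_encodings([(-1.0, 2.0), (-3.0, 0.5)]))
# [(-3.0, 2.0), (-3.0, 2.0)]
```

Tying across the *maximum* Concat subgraph extends this union transitively through chained concats rather than one node at a time.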
Common
- Disable per-channel quantization of RNN/GRU/LSTM for all HTP backends (df8b875)
Version 2.18.0
New Features

Bug fixes and Improvements