Releases: quic/aimet

Version 2.26.0

09 Mar 18:00

Version 2.25.1

03 Mar 16:21

  • Bug fixes and Improvements
    • ONNX

      • Fix for encoding propagation for concat layers (5084af3)
    • Torch

      • Fix to reduce GPU RAM usage for AdaScale for Qwen 3 VL model (ee3d193)

Version 2.25.0

25 Feb 17:22

  • Bug fixes and Improvements
    • ONNX

      • Reduced peak CPU memory usage for AdaScale and SeqMSE techniques (28f89a7)
      • Reduced peak CUDA memory usage for AdaScale technique (a29f44f)
      • Added support for Qwen3 VL models in GenAITests (c014961)
      • ONNX-IR based supergroup pattern detection and replacement (9972c1b)
      • Tie concat and interpolation ops by default (a8ac6f4)
    • Torch

      • Bug fix for onnx qdq export with control flow ops (ae1abd1)
      • Use Triton kernels by default if available (3adcbee)
      • Introduces block_size parameter to EncodingAnalyzer (e250abd)
      • Always export encodings as uint (ae7d5ef)
      • float4/8 QDQ export support (135a0af)
      • Support loading zero_point_shift with sim.load_encodings() (624ba30)
      • Support built-in quantization of SyncBatchNorm (1e8eceb)
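The new block_size parameter splits a tensor into fixed-size blocks so calibration ranges are computed per block rather than per tensor. A minimal pure-Python sketch of that idea (the function name and layout here are illustrative, not AIMET's actual EncodingAnalyzer API):

```python
# Illustrative sketch of block-wise range calibration, NOT AIMET's API.
# Splits a flat list of values into fixed-size blocks and computes one
# (min, max) encoding range per block instead of one for the whole tensor.

def blockwise_ranges(values, block_size):
    """Return a list of (min, max) tuples, one per block of `block_size`."""
    if block_size <= 0 or len(values) % block_size != 0:
        raise ValueError("len(values) must be a positive multiple of block_size")
    ranges = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        ranges.append((min(block), max(block)))
    return ranges

# A tensor with one large outlier: per-tensor calibration would stretch
# the whole range, while block-wise keeps the other block tight.
data = [0.1, -0.2, 0.3, 0.05, 100.0, -0.1, 0.2, 0.0]
print(blockwise_ranges(data, block_size=4))
# -> [(-0.2, 0.3), (-0.1, 100.0)]
```

The payoff is visible in the example: the outlier only widens the range of its own block.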

Version 2.24.0

10 Feb 04:59

  • Bug fixes and Improvements
    • ONNX

      • Add Windows ARM64 wheel build/test support, distribute Windows ARM64 wheel on GitHub releases (1390b96)
      • Add transpose MatMul support in Sequential MSE (ff7a284)
    • Torch

      • Expose block-level AdaScale API (72246db)
      • Improve numerical stability of zero point shifting ([-1.5, -.5, .5, 1.5]) implementation (489f7df)
      • Fix replace_lora_layers_with_quantizable_layers to inherit train/eval flag (af5a82d)
      • Fix SpinQuant evaluation by untying lm_head and embed_tokens prior to loading the state_dict (47f574d)
      • Experimental - Implement Progressive Gradient Scaling (PGS) support for Triton-based quantization kernels (b58b00b)
    • Common

      • Fix TFEnhanced incorrectly producing negative scales when encountering empty (size‑0) inputs (ea4af6a)
      • Unpin numpy dependency (8a999a1)
      • Add an alias for referencing the eNPU configuration file (b79611c)
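The zero-point-shifting entry above refers to a quantization grid whose levels are offset by half a step, e.g. {-1.5, -0.5, 0.5, 1.5} at 2 bits. A hedged pure-Python sketch of such a shifted grid (this is a conceptual illustration, not AIMET's implementation):

```python
# Illustrative sketch of a half-step zero-point shift, NOT AIMET's
# implementation.  A 2-bit symmetric quantizer with a 0.5 shift maps
# real values onto the grid {-1.5, -0.5, 0.5, 1.5} * scale, which has
# no exact-zero level but places levels symmetrically around zero.

def quantize_shifted(x, scale, shift=0.5, qmin=-2, qmax=1):
    """Quantize-dequantize onto the grid (k + shift)*scale, k in [qmin, qmax]."""
    k = round(x / scale - shift)          # nearest shifted level index
    k = max(qmin, min(qmax, k))           # clamp to the 2-bit index range
    return (k + shift) * scale            # dequantize

scale = 1.0
print([quantize_shifted(v, scale) for v in (-2.0, -0.4, 0.4, 2.0)])
# -> [-1.5, -0.5, 0.5, 1.5]
```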

Version 2.23.0

28 Jan 18:36

  • Bug fixes and Improvements
    • ONNX

      • Disable per-channel quantization for ConvTranspose ops (9395e32)
      • New top level API for configuring parameter quantization type (a1c197d)
    • Torch

      • Enable Torch Dynamo ONNX export (59e0125)
    • Common

      • Enable per-channel matmul quantization in config files (7137849)
      • LLM quantization recipes in docs (6561f0e)
      • Fix CUDA discrepancies against CPU wheel (01e7422)
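Per-channel quantization, enabled above for MatMul via config files, keeps one scale per output channel instead of a single tensor-wide scale. A small standalone sketch of why that matters (plain Python, not AIMET's config mechanism):

```python
# Illustrative sketch of per-channel vs per-tensor scale selection for a
# MatMul weight, NOT AIMET's config mechanism.  Per-channel keeps one
# scale per output column, so a single large channel does not inflate
# the quantization step of every other channel.

def per_tensor_scale(weight, qmax=127):
    """One symmetric int8 scale for the whole weight matrix."""
    flat = [abs(v) for row in weight for v in row]
    return max(flat) / qmax

def per_channel_scales(weight, qmax=127):
    """One symmetric int8 scale per output column (channel)."""
    cols = list(zip(*weight))                      # one channel per column
    return [max(abs(v) for v in col) / qmax for col in cols]

w = [[0.5, 10.0],
     [-0.25, 8.0]]
print(per_tensor_scale(w))     # one coarse scale, dominated by channel 1
print(per_channel_scales(w))   # tight scale for the small channel
```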

Version 2.22.0

13 Jan 18:00

  • Bug fixes and Improvements

    • ONNX

      • Allow loading 2.0.0 encoding format to sim (e8cb098)
      • Fix Cast unpacking error (6761a19)
      • Enable exporting non-LPBQ encodings with zero_point shift (7b3cc4c)
      • Implement aimet-onnx LPBQEncoding (5ad7ea6)
    • Common

      • Support exporting 1x1 Conv LPBQ to ONNX QDQ (58ce71d)
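LPBQ (low-power blockwise quantization), implemented above for aimet-onnx, stores small integer scale multipliers per block on top of one shared channel scale. A hedged sketch of that encoding idea (the function and parameter names are illustrative, not AIMET's LPBQEncoding API):

```python
# Illustrative sketch of the LPBQ idea, NOT AIMET's LPBQEncoding
# implementation: each block gets a small integer multiplier of a shared
# per-channel scale, so only integers plus one float are stored per channel.

def lpbq_scales(block_maxes, qmax=7, scale_bits=4):
    """Return (channel_scale, per-block integer scale multipliers)."""
    levels = 2 ** scale_bits - 1                   # e.g. 15 for 4-bit scales
    channel_scale = max(block_maxes) / (qmax * levels)
    ints = [max(1, round(m / (qmax * channel_scale))) for m in block_maxes]
    return channel_scale, ints

scale, ints = lpbq_scales([0.7, 1.4, 5.25])
print(ints)
# -> [2, 4, 15]  (the largest block gets the maximum multiplier)
```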

Version 2.21.0

15 Dec 21:37

  • Bug fixes and Improvements

    • ONNX

      • Fix IndexError when Conv or Linear layers are reused in the model (65c4b3b)
      • Add optional argument export_int32_bias to aimet-onnx export (3b8e0f0)
      • Unpin PyTorch version in aimet-onnx (d99b6c4)
      • Align NaN handling with ORT CPU Execution Provider (e4c49eb)
      • Fix quantization axis handling for transposed MatMul operations (6ca06d6)
    • Torch

      • Fix quantization logic to enable input quantizers for layers following ignored layers (80fb4fe)
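The export_int32_bias option above relates to the common ONNX QDQ convention of storing bias as int32 with scale equal to input_scale x weight_scale. A minimal sketch of that convention (plain Python, not the aimet-onnx export code):

```python
# Illustrative sketch of int32 bias quantization, NOT the aimet-onnx
# export code.  Following the common ONNX QDQ convention, the bias scale
# is the product of the input and weight scales, and the bias is stored
# as an int32 value on that grid.

def quantize_bias_int32(bias, input_scale, weight_scale):
    """Return (int32 bias value, bias scale) under the QDQ convention."""
    bias_scale = input_scale * weight_scale
    q = round(bias / bias_scale)
    q = max(-2**31, min(2**31 - 1, q))     # clamp to the int32 range
    return q, bias_scale

q, s = quantize_bias_int32(bias=0.02, input_scale=0.1, weight_scale=0.004)
print(q)
# -> 50
```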

Version 2.20.0

02 Dec 21:17

  • Bug fixes and Improvements

    • Common

      • Update supported python version to >=3.10 (2bc8c94)
      • Repackage aimet_common as alias to aimet_onnx.common or aimet_torch.common (074e85f)
      • Remove Pad op from data movement ops (21cddb6)
    • ONNX

      • Export data movement op output encoding in sim.export by default (550c029)
      • Assign generic node names if node name is missing or duplicate (273dd82)
      • Add PyTorch Pad modules to nn.Module -> onnx op mapping (7e5342b)
      • Add LSTM cell state int32 quantization mechanism for LPAI (3a8659b)
      • Support stacked RNN/GRU/LSTM (552ad83)
      • Make exclude/include node argument naming consistent (ec22d86)
      • Implement LPBQ support in aimet-onnx SeqMSE (495567f)
      • Add support for dilation, grouping, stride to Quantized Conv (f94f3e2)
      • Remove block type from adascale config (b55b058)
      • Skip tying concat encoding if input has multiple consumers (3136828)
      • Tie quantizers upstream first and downstream later (59aac3e)
      • Fix ValidationError in LazyExtractor when external files are missing or inconsistent (a8f32fc)
      • Align torch and onnx GenAI recipes (7d4659d)
    • Torch

      • Use separate input quantizer for each concat input (755c54a)
      • Add predict and fallback later approach for batched matmul in aimet-torch seq mse (8874173)
      • Refactored MMP to not use rounding mode (fd7e40d)
      • Use tuple for strided slice indexing (4ddbd66)
      • Fix symmetry bug in _from_qnn_encoding_dict (35602ea)
      • Align onnx 1.0.0 BQ encoding export ordering with QAIRT expectation (0182b7a)
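Several entries above deal with tying quantizer encodings around Concat. Since Concat does not change values, its inputs and output can share one encoding; a minimal sketch of that tying step (conceptual only, not AIMET's graph pass):

```python
# Illustrative sketch of encoding tying across a Concat op, NOT AIMET's
# graph pass.  Concat does not change values, so its inputs and output
# are given one shared (min, max) encoding: the union of all input ranges.

def tie_concat_encodings(input_ranges):
    """input_ranges: list of (min, max) tuples, one per concat input."""
    tied = (min(lo for lo, _ in input_ranges),
            max(hi for _, hi in input_ranges))
    # every input quantizer and the output quantizer reuse the tied range
    return {"inputs": [tied] * len(input_ranges), "output": tied}

enc = tie_concat_encodings([(-1.0, 2.0), (-3.0, 0.5)])
print(enc["output"])
# -> (-3.0, 2.0)
```

As the "skip tying if input has multiple consumers" entry suggests, a real pass must also leave inputs untied when widening their range would affect other ops consuming the same tensor.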

Version 2.19.0

19 Nov 17:33

  • Bug fixes and Improvements

    • ONNX

      • Make LiteMP API percentage float (69f96ff)
      • Set layernorm int16 weight to symmetric by default (8560e13)
      • Automatically insert data movement op output qdq during to_onnx_qdq (15c8b9b)
      • Create LazyExtractor to handle external data for onnx Extractor utils (104e7e8)
      • Tie input/output encodings across maximum Concat subgraph (832ea91)
      • Tie hidden state quantizers of RNN/GRU/LSTM (c18fd05)
    • Torch

      • Fix histogram observer rebinning logic (2c88364)
      • Fix connectedgraph input ordering for non-trivial layer types (2b7b548)
    • Common

      • Disable per-channel quantization of RNN/GRU/LSTM for all HTP backends (df8b875)
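The histogram-observer rebinning fix above concerns what happens when the observed range outgrows the current histogram. One standard approach, sketched here in plain Python (a conceptual illustration, not AIMET's observer code), merges adjacent bins so the bin count stays fixed while the range doubles:

```python
# Illustrative sketch of histogram rebinning during calibration, NOT
# AIMET's observer.  When the observed range [0, r) doubles to [0, 2r),
# pairs of old bins are merged so the histogram keeps a fixed bin count
# while covering the wider range.

def rebin_double(counts):
    """Merge adjacent bin pairs; pad the new upper half with empty bins."""
    merged = [counts[i] + counts[i + 1] for i in range(0, len(counts), 2)]
    return merged + [0] * (len(counts) - len(merged))

print(rebin_double([1, 2, 3, 4]))
# -> [3, 7, 0, 0]
```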

Version 2.18.0

06 Nov 16:32

  • New Features

    • Torch
      • Promoted aimet_torch.onnx.export and QuantizationSimModel.onnx.export as production APIs (99160d2, e026fd1)
      • Added utility functions to exclude some or all unknown nn.Modules from quantization (5a419f3, 501eebd)
  • Bug fixes and Improvements

    • ONNX

      • Fixed supergroup misidentification bug upon MatMul-MatMul-Add sequence (ab63866)
    • Torch

      • Made compatible with PyTorch 1.13 (47fae94)
      • Made compatible with PyTorch 2.9 (283ecc1)
    • Common

      • Set priority among supergroups (6676a6c)