Compressed Tensors v0.12.0
What's Changed
- Refactor module / parameter matching logic by @fynnsu in #406
- Revert "Refactor module / parameter matching logic (#406)" by @kylesayrs in #429
- Refactor module / parameter matching logic by @fynnsu in #431
- Add quality check to CI and fix existing errors by @fynnsu in #408
- Speed up nvfp4 pack/unpack w/ torch.compile by @fynnsu in #400
- Simplify `apply_quantization_config` by @kylesayrs in #433
- Install compilers etc to fix nightly test failure by @dhuangnm in #435
- Fix a minor bug on GH hosted runners by @dhuangnm in #438
- remove references to `llmcompressor.transformers.oneshot` in examples by @brian-dellabetta in #422
- [Tests] Combine quantization and dequantization tests by @kylesayrs in #443
- fix compress on meta device issue by @shanjiaz in #444
- Throw error for unsupported activation strategies by @kylesayrs in #446
- [Transform] Better dispatch support for offloaded and multi-gpu by @kylesayrs in #423
- [Quantization Format] Add functionality to infer format by @dsikka in #441
- Revert "[Quantization Format] Add functionality to infer format (#441)" by @dsikka in #451
- Raise ValueError when nvfp4 pack tensor has odd number of columns by @fynnsu in #402
- [Quantization] Allow dynamic group activation quantization by @kylesayrs in #450
- Fix lint error on main by @fynnsu in #460
- [Accelerate] Remove `is_module_offloaded` and `update_prefix_dict` by @kylesayrs in #366
- [Decompression] Clean-up and some fixes by @dsikka in #461
- [ModelCompressor] Remove missing keys and missing modules by @dsikka in #462
- [Logging] Support use of loguru by @kylesayrs in #454
- [Utils] Deprecate `safe_permute` by @kylesayrs in #464
- [Quantization Format] Add functionality to infer format by @dsikka in #452
- [licensing refactor] remove `frozendict` dependency, use `types.MappingProxyType` instead by @brian-dellabetta in #469
- [Transform] Support loading random hadamards on meta device by @kylesayrs in #445
- [transforms] `TransformScheme.block_size`, deprecate `head_dim` by @brian-dellabetta in #466
- [Multi-Modifier] Scoped apply quantization config by @brian-dellabetta in #432
- [Model Compressor] Move infer call to from_pretrained_model method by @dsikka in #470
- Always save g_idx when initialized in quantization compressor by @rahul-tuli in #467
- Add back get unexpected keys to support transformers lower bound by @dsikka in #475
- Improve Hugging Face API utilization in tests by @dbarbuzzi in #473
- [Transform] Revert deprecation of `TransformScheme.head_dim` for compatibility with vllm by @brian-dellabetta in #472
- [cicd] Include Python version in artifact name by @dbarbuzzi in #477
New Contributors
Full Changelog: 0.11.0...0.12.0