Releases: neuralmagic/compressed-tensors
Compressed Tensors v0.12.2
What's Changed
- remove deprecated safe_permute by @brian-dellabetta in #471
- [Transform] Fix accelerate import to keep it as optional dependency by @tboerstad in #480
- [Bugfix] Fix Per-Token Dynamic Activation Quantization by @max410011 in #393
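The per-token dynamic activation quantization fixed in #393 refers to the general technique of computing one quantization scale per token row at runtime. A minimal numpy sketch of that technique (illustrative only, not the compressed-tensors implementation; function and variable names are hypothetical):

```python
import numpy as np

def per_token_dynamic_quant(x, num_bits=8):
    """Quantize each row (token) of x with its own dynamically computed scale.

    Generic sketch of per-token dynamic int8 quantization; names are
    illustrative and do not mirror the compressed-tensors API.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    # One scale per token, derived at runtime from that token's max magnitude
    scales = np.abs(x).max(axis=-1, keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)  # guard against all-zero rows
    q = np.clip(np.round(x / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

# Round trip: dequantized values stay within half a scale step of the originals
x = np.array([[0.5, -1.0, 2.0], [10.0, -20.0, 5.0]])
q, scales = per_token_dynamic_quant(x)
x_hat = q.astype(np.float32) * scales
```

Because each token gets its own scale, an outlier in one row does not degrade the precision of the others, which is the main motivation for the per-token strategy.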
New Contributors
- @tboerstad made their first contribution in #480
- @max410011 made their first contribution in #393
Full Changelog: 0.12.1...0.12.2
Compressed Tensors v0.12.1
What's Changed
- [Patch Fix] Add `get_missing_module_keys` to support transformers lower bound by @dsikka in #479
- [cicd] Add post-release run to push next nightly by @dbarbuzzi in #478
Full Changelog: 0.12.0...0.12.1
Compressed Tensors v0.12.0
What's Changed
- Refactor module / parameter matching logic by @fynnsu in #406
- Revert "Refactor module / parameter matching logic (#406)" by @kylesayrs in #429
- Refactor module / parameter matching logic by @fynnsu in #431
- Add quality check to CI and fix existing errors by @fynnsu in #408
- Speed up nvfp4 pack/unpack w/ torch.compile by @fynnsu in #400
- Simplify `apply_quantization_config` by @kylesayrs in #433
- Install compilers etc to fix nightly test failure by @dhuangnm in #435
- Fix a minor bug on GH hosted runners by @dhuangnm in #438
- remove references to `llmcompressor.transformers.oneshot` in examples by @brian-dellabetta in #422
- [Tests] Combine quantization and dequantization tests by @kylesayrs in #443
- fix compress on meta device issue by @shanjiaz in #444
- Throw error for unsupported activation strategies by @kylesayrs in #446
- [Transform] Better dispatch support for offloaded and multi-gpu by @kylesayrs in #423
- [Quantization Format] Add functionality to infer format by @dsikka in #441
- Revert "[Quantization Format] Add functionality to infer format (#441)" by @dsikka in #451
- Raise ValueError when nvfp4 pack tensor has odd number of columns by @fynnsu in #402
- [Quantization] Allow dynamic group activation quantization by @kylesayrs in #450
- Fix lint error on main by @fynnsu in #460
- [Accelerate] Remove `is_module_offloaded` and `update_prefix_dict` by @kylesayrs in #366
- [Decompression] Clean-up and some fixes by @dsikka in #461
- [ModelCompressor] Remove missing keys and missing modules by @dsikka in #462
- [Logging] Support use of loguru by @kylesayrs in #454
- [Utils] Deprecate `safe_permute` by @kylesayrs in #464
- [Quantization Format] Add functionality to infer format by @dsikka in #452
- [licensing refactor] remove `frozendict` dependency, use `types.MappingProxyType` instead by @brian-dellabetta in #469
- [Transform] Support loading random hadamards on meta device by @kylesayrs in #445
- [transforms] TransformScheme.block_size, deprecate head_dim by @brian-dellabetta in #466
- [Multi-Modifier] Scoped apply quantization config by @brian-dellabetta in #432
- [Model Compressor] Move infer call to from_pretrained_model method by @dsikka in #470
- Always save g_idx when initialized in quantization compressor by @rahul-tuli in #467
- Add back get unexpected keys to support transformers lower bound by @dsikka in #475
- Improve Hugging Face API utilization in tests by @dbarbuzzi in #473
- [Transform] Revert deprecation of `TransformScheme.head_dim` for compatibility with vllm by @brian-dellabetta in #472
- [cicd] Include Python version in artifact name by @dbarbuzzi in #477
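The licensing refactor in #469 swaps `frozendict` for `types.MappingProxyType`, a read-only dict view that ships with the Python standard library. A minimal sketch of the pattern (the dict contents here are invented for illustration):

```python
from types import MappingProxyType

# Private backing dict; the proxy below is the only object handed out
_defaults = {"num_bits": 8, "symmetric": True}

# A read-only view: lookups work, but any write raises TypeError
DEFAULTS = MappingProxyType(_defaults)

assert DEFAULTS["num_bits"] == 8
try:
    DEFAULTS["num_bits"] = 4
except TypeError:
    pass  # mutation through the proxy is rejected
```

Unlike `frozendict`, a `MappingProxyType` is only a view: mutations to the backing dict still show through it, so the backing dict should be kept private.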
Full Changelog: 0.11.0...0.12.0
Compressed Tensors v0.11.0
What's Changed
- Fix nightly issues with python 3.11 by @dhuangnm in #371
- Clean up disk space and restore ubuntu 22.04 runner by @dhuangnm in #373
- [Transform] Update tests to use conftest file by @kylesayrs in #367
- [Transform] Hadamard Permutations by @kylesayrs in #329
- [Transform] Construct on GPU, cache on CPU by @kylesayrs in #352
- enable code coverage collection and reporting INFERENG-1049 by @derekk-nm in #382
- Deprecate `iter_named_leaf_modules` and `iter_named_quantizable_modules` by @kylesayrs in #381
- Added support for compression on meta device by @shanjiaz in #376
- Add torch.float64 as a viable dtype for scales by @eldarkurtic in #379
- [Transform] `apply_transform_config` by @kylesayrs in #348
- [Compression] Fix compression device movement in cases of indexed devices by @kylesayrs in #384
- Enable code coverage report for nightly tests by @dhuangnm in #388
- [Bugfix] Only quant-compress modules with weight quantization by @kylesayrs in #387
- [Transform] Fix config serialization by @kylesayrs in #396
- [Transform] Do not fuse div operation into hadamard matrices by @kylesayrs in #395
- [Transform] Implement multi-headed transforms by @kylesayrs in #383
- Support DeepSeekV3-style block FP8 quantization by @mgoin in #372
- [Transform] [Utils] Canonical matching utilities by @kylesayrs in #392
- [Bugfix] Safeguard against submodule parameter deletion in decompress_model by @kylesayrs in #347
- fix block quantization initialization by @shanjiaz in #403
- [Utils] Skip internal modules when matching by @kylesayrs in #404
- [Quantization][Decompression] Fix QDQ for dynamic quant; Update NVFP4 Compression Params by @dsikka in #407
- [Utils] Support matching vLLM modules by @kylesayrs in #413
- Fix block size inference logic by @shanjiaz in #411
- [Transform] Serialize with tied weights by @kylesayrs in #370
- [Transform] [Utils] Support precision, add torch dtype validation by @kylesayrs in #414
- [Transform] Serialize transforms config by @kylesayrs in #412
- Error when configs are created with unrecognized fields by @kylesayrs in #386
- revert forbid constraint on QuantizationConfig by @brian-dellabetta in #418
- Revert "[Transform] Serialize transforms config (#412)" by @dsikka in #419
- added wrapper for execution device by @shanjiaz in #417
- [Transform] Serialize config (include format) by @dsikka in #420
- exclude transform_config from quantization_config parse by @brian-dellabetta in #421
- [Quantization] Support more than one quant-compressor by @dsikka in #415
- [QuantizationScheme] Validate format by @dsikka in #424
- [Utils] Expand `is_match` by @kylesayrs in #416
- fix match.py syntax by @shanjiaz in #426
- [Offload] Fully remove dispatch by @kylesayrs in #427
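The DeepSeekV3-style block FP8 support added in #372 refers to block-wise quantization: the weight is split into fixed-size tiles, each with its own scale. A generic numpy sketch of the scaling step (illustrative only; a real FP8 path would also round values onto the e4m3 grid, which is omitted here):

```python
import numpy as np

def block_scale_fp8(w, block=2):
    """Scale a 2-D weight in (block x block) tiles, one scale per tile.

    Generic sketch of block-wise quantization scaling; not the
    compressed-tensors implementation, and rounding to FP8 is omitted.
    """
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0
    qmax = 448.0  # max magnitude representable in float8 e4m3
    # View as (row_blocks, block, col_blocks, block) tiles
    tiles = w.reshape(rows // block, block, cols // block, block)
    scales = np.abs(tiles).max(axis=(1, 3), keepdims=True) / qmax
    scales = np.where(scales == 0, 1.0, scales)  # guard all-zero tiles
    q = tiles / scales  # every value now fits in the FP8 range
    return q.reshape(rows, cols), scales.squeeze((1, 3))
```

Per-block scales bound the dynamic range each tile must cover, which is what lets low-precision formats like FP8 represent weights with large outliers.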
Full Changelog: 0.10.2...0.11.0
Compressed Tensors v0.10.2
What's Changed
- [Hotfix] Implement quantization compressor methods on dense compressor by @kylesayrs in #344
- [Hotfix] Implement method on dense compressor by @kylesayrs in #345
- [Transform] Factory classes with shared memory and offloading by @kylesayrs in #316
- [Transform] [Bugfix] Fix enum value serialization in python>=3.11 by @kylesayrs in #350
- Remove redundant call by @eldarkurtic in #349
- [Accelerate] Rename and simplify `force_cpu_offload` by @kylesayrs in #354
- [Transform] Extend set of known Hadamard matrices by @kylesayrs in #351
- [Accelerate] Fix `offloaded_dispatch`, implement `disable_offloading` by @kylesayrs in #355
- [Accelerate] Extend functionality of `register_offload_parameter` by @kylesayrs in #356
- [Bugfix] Fix saving of models dispatched by `offloaded_dispatch` by @kylesayrs in #357
- [Bugfix] Only update direct params in `disable_offloading` by @kylesayrs in #360
- reference updated reportportal_submit_execution_results action by @derekk-nm in #362
- [Accelerate] Expand `get_execution_device` to support models by @kylesayrs in #363
- [Accelerate] Fix typos in `get_execution_device` by @kylesayrs in #365
New Contributors
- @derekk-nm made their first contribution in #362
Full Changelog: 0.10.1...0.10.2
Compressed Tensors v0.10.1
What's Changed
- [Transform] Hadamard and Matrix Transform Utils by @kylesayrs in #330
- Fix error on import whenever accelerate is absent by @maresb in #342
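The Hadamard utilities landed in #330 work with Hadamard matrices, whose rows are mutually orthogonal with entries in {+1, -1}. A generic sketch of Sylvester's construction for power-of-two sizes (illustrative only, not the library's code):

```python
import numpy as np

def sylvester_hadamard(n):
    """Build an n x n Hadamard matrix for n a power of two.

    Sylvester's construction: H_1 = [1], H_2k = [[H, H], [H, -H]].
    Generic illustration; not the compressed-tensors implementation.
    """
    assert n >= 1 and (n & (n - 1)) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])  # double the size each step
    return H

H = sylvester_hadamard(4)
# Defining property: H @ H.T == n * I, so H / sqrt(n) is orthonormal
```

The orthogonality property is what makes Hadamard-based transforms cheap to invert: applying the scaled matrix twice recovers the input exactly.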
Full Changelog: 0.10.0...0.10.1
Compressed Tensors v0.10.0
What's Changed
- Updates to build system by @dbarbuzzi in #304
- [Utils] add align_modules by @kylesayrs in #282
- Enable module state_dict compression, simplify compression logic by @kylesayrs in #302
- Fix `_initialize_scale_zero_point` initializing on the wrong device by @mgoin in #295
- Revert "Enable module state_dict compression, simplify compression lo… by @kylesayrs in #306
- [Bugfix] Fix shape calculation for group quantization by @kylesayrs in #308
- Enable module state_dict compression, simplify compression logic by @kylesayrs in #307
- Clarify decompression return type by @kylesayrs in #310
- Clarify `match_param_name` return type by @kylesayrs in #312
- [Compressor][NVFP4] Support FP4 Compression by @dsikka in #311
- [NVFP4] Update FloatArgs and NVFP4 by @dsikka in #313
- fix signatures on model_validator functions by @brian-dellabetta in #314
- [Performance] Add memory compression and decompression pathways by @kylesayrs in #301
- Model Compression: Set compression status by @kylesayrs in #318
- [NVFP4] Enable Fp4 Quantization; introduce / apply global_scales by @dsikka in #315
- [NVFP4] Skip fused global scale calculation if already fused by @dsikka in #322
- Update default observer to be `MSE` by @shanjiaz in #300
- [Misc] Generics typehinting for `RegistryMixin` by @kylesayrs in #320
- Revert "Update default observer to be `MSE` (#300)" by @dsikka in #323
- [NVFP4] Add `tensor_group` strategy; enable NVFP4 Activations by @dsikka in #317
- [Transforms] Transform Args, Scheme, and Config by @kylesayrs in #321
- [NVFP4] Expand dynamic types, clean-up conditions by @dsikka in #325
- Use different runner for UPLOAD job by @dbarbuzzi in #327
- [NVFP4] Use torch.compile when rounding to NVFP4 by @dsikka in #331
- [Tests] Update test_fp8_quant.py by @dsikka in #337
- [Tests] Fix test scale init for group quant by @dsikka in #338
- [Quantization] Update group quantization by @dsikka in #336
- [NVFP4] update global scale generation by @dsikka in #339
- [Transform] Accelerate Utilities by @kylesayrs in #328
- Model Compression: Delete offload by @kylesayrs in #319
- [Decompression] Keep unused parameters when decompressing from memory by @kylesayrs in #340
- [NVFP4] Small Nits by @dsikka in #341
Full Changelog: 0.9.4...0.10.0
Compressed Tensors v0.9.4
What's Changed
- Remove compression_ratio calculation by @dsikka in #293
- Build with setuptools scm by @dhuangnm in #292
- fix a few minor issues by @dhuangnm in #294
- Some fixes for AWQ by @rahul-tuli in #269
- Fix upload issue when package already existed on PyPI by @dhuangnm in #297
- Update action tags by @dhuangnm in #298
- Pick up fix from nm-actions by @dhuangnm in #299
- [Compressor] Update packed compressor to support zp packing by @dsikka in #296
- [Decompression] Update Decompression Lifecycle by @dsikka in #285
- [Accelerate] allow get_execution_device to be used when initializing a model by @kylesayrs in #303
Full Changelog: 0.9.3...0.9.4
Compressed Tensors v0.9.3
What's Changed
- remove testmo by @dhuangnm in #258
- update tag for summary-test action by @dhuangnm in #259
- [Bugfix] Support offloaded parameters when initializing KV cache parameters by @kylesayrs in #261
- Update: CompressedLinear to decompress once by @rahul-tuli in #266
- [BugFix]: `AttributeError` in `CompressedLinear` by @rahul-tuli in #273
- Fix case when using weight_packed, not weight by @dsikka in #278
- Report test results to Report Portal by @dhuangnm in #271
- use fine-grained token for workflow by @dhuangnm in #283
- Rectify Asym Compression/Decompression Pathways by @dsikka in #225
- Bump CT Version by @dsikka in #288
Full Changelog: 0.9.2...0.9.3
Compressed Tensors v0.9.2
What's Changed
- ModelCompressor type checking import by @kylesayrs in #220
- Fix warning for dynamic quantization args by @kylesayrs in #227
- Deprecate `get_observer` by @kylesayrs in #214
- Accelerate Utilities: Throw warning when updating with different shapes by @kylesayrs in #231
- Use faster operations on packed-quantized, add tests by @horheynm in #211
- Update build workflow to Python 3.12 by @dbarbuzzi in #248
- Replace `COMPRESSION_PARAM_NAMES` with Abstract Property by @rahul-tuli in #249
- Kylesayrs/update readme by @brian-dellabetta in #252
- Add: missing and unexpected keys in ModelCompressor by @rahul-tuli in #250
- switch runners by @dhuangnm in #254
- Bump version for patch release by @dsikka in #255
Full Changelog: 0.9.1...0.9.2