v0.5.2
What's Changed
- Exclude images from package by @kylesayrs in #1397
- [Tracing] Skip non-ancestors of sequential targets by @kylesayrs in #1389
- Consolidate build config by @dbarbuzzi in #1398
- [Tests] Disable silently failing kv cache test by @kylesayrs in #1371
- Drop `flash_attn` skip for quantizing_moe example tests by @dbarbuzzi in #1396
- [VLM] Fix mllama targets by @kylesayrs in #1402
- [Tests] Use requires_gpu, fix missing gpu test skip, add explicit test for gpu from gha by @kylesayrs in #1264
- Implement `QuantizationMixin` by @kylesayrs in #1351
- Add new-features section by @rahul-tuli in #1408
- [Tracing] Support tracing of Gemma3 [#1248] by @kelkelcheng in #1373
- bugfix kv cache quantization with ignored layers by @brian-dellabetta in #1312
- AWQ sanitize_kwargs minor cleanup by @brian-dellabetta in #1405
- [Tracing][Testing] Add tracing tests by @kylesayrs in #1335
- fix lm eval test reproducibility issues by @brian-dellabetta in #1260
- Pipeline Extraction by @kylesayrs in #1279
- Add `pull_request` trigger to base tests workflow by @dbarbuzzi in #1417
- removing RecipeMetadata and references by @shanjiaz in #1414
- Update examples to only load required number of samples from dataset by @kylesayrs in #1118
- [Tracing] Reinstate ignore functionality by @kylesayrs in #1423
- [Typo] overriden by @kylesayrs in #1420
- Rename SparsityModifierMixin to SparsityModifierBase by @kylesayrs in #1416
- Remove RecipeArgs class & its references by @shanjiaz in #1429
- [Examples] Standardize AWQ example by @kylesayrs in #1412
- [Logging] Support logging once by @kylesayrs in #1431
- Add: deepseekv2 smoothquant mappings by @rahul-tuli in #1433
- AWQ QuantizationMixin + SequentialPipeline by @brian-dellabetta in #1426
- patch awq tests/readme after QuantizationMixin refactor by @brian-dellabetta in #1439
- Added more tests for Quantization24SparseW4A16 by @shanjiaz in #1434
- [GPTQ] Add `actorder` option to modifier by @kylesayrs in #1424
- [Bugfix][Tracing] Fix qwen2_5_vl by @kylesayrs in #1448
- [Tests] Use proper offloading utils in `test_compress_tensor_utils` by @kylesayrs in #1449
- [Tracing] Fix Traceable Imports by @kylesayrs in #1452
- [NVFP4] Enable FP4 Weight-Only Quantization by @dsikka in #1309
- Pin transformers to <4.52.0 by @brian-dellabetta in #1459
- AWQ Apply Scales Bugfix when smooth layer output length doesn't match balance layer input length by @brian-dellabetta in #1451
- Fix #1344 Extend e2e tests to add asym support for W8A8-Int8 by @ved1beta in #1345
- [Tests] Fix activation recipe for w8a8 asym by @dsikka in #1461
- AWQ Qwen and Phi mappings by @brian-dellabetta in #1440
- [Observer] Optimize mse observer by @shanjiaz in #1450
- Fix: Improve `SmoothQuant` Support for Mixture of Experts (MoE) Models by @rahul-tuli in #1455
- [Tests] Add nvfp4a16 e2e test case by @dsikka in #1463
- [Docs] Update README to list fp4 by @dsikka in #1462
- Remove duplicate model id var from awq example recipe by @AndrewMead10 in #1467
- Added observer type for test_min_max by @shanjiaz in #1466
- Disable kernels during calibration (and tracing) by @kylesayrs in #1454
- [GPTQ] Fix actorder resolution, add sentinel by @kylesayrs in #1453
- Set `show_progress` to True by @dsikka in #1471
- Remove `compress` by @dsikka in #1470
- raise error if block quantization is used, as it is not yet supported by @brian-dellabetta in #1476
- [Tests] Increase max seq length for tracing tests by @kylesayrs in #1478
- [Tests] Fix dynamic field to be a bool, not string by @dsikka in #1480
- [Examples] Fix qwen vision examples by @kylesayrs in #1481
- [NVFP4] Update to use `tensor_group` strategy; update observers by @dsikka in #1484
- loosen lmeval assertions to upper or lower bound by @brian-dellabetta in #1477
- Revert "expand observers to calculate gparams, add example for activa… by @dsikka in #1486
- fix rest of the minmax tests by @shanjiaz in #1469
- Add warning for non-divisible group quantization by @kylesayrs in #1401
- [AWQ] Support accumulation for reduced memory usage by @kylesayrs in #1435
- [Tracing] Code AutoWrapper by @kylesayrs in #1411
- Removed RecipeTuple & RecipeContainer class by @shanjiaz in #1460
- Unpin to support `transformers==4.52.3` by @kylesayrs in #1479
- [Tests] GPTQ Actorder Resolution Tests by @kylesayrs in #1468
- [Testing] Skip FP4 Test by @dsikka in #1499
- [Bugfix] Remove tracing imports from tests by @kylesayrs in #1498
- [Testing] Use a slightly larger model that works with group_size 128 by @dsikka in #1502
- skip tracing tests if token unavailable by @brian-dellabetta in #1493
- Fix missing logs when calling oneshot by @kelkelcheng in #1446
- [NVFP4] Expand observers to calculate gparam, support NVFP4 Activations by @dsikka in #1487
- [Tests] Remove duplicate test by @kylesayrs in #1500
- [Model] Mistral3 example and test by @kylesayrs in #1490
- [NVFP4] Use observers to generate global weight scales by @dsikka in #1504
- Revert "[NVFP4] Use observers to generate global weight scales " by @dsikka in #1507
- [NVFP4] Update global scale generation by @dsikka in #1508
- [NVFP4] Fix onloading of fused layers by @dsikka in #1512
- Pin pandas to <2.3 by @dbarbuzzi in #1515
- AWQModifier fast resolve mappings, better logging, MoE support by @brian-dellabetta in #1444
- Update setup.py by @dsikka in #1516
- Use model compression pathways by @kylesayrs in #1419
- [Example] [Bugfix] Fix Gemma3 Generation by @kylesayrs in #1517
- [Docs] Update ReadME details for FP4 by @dsikka in #1519
- [Examples] [Bugfix] Perform sample generation before saving as compressed by @kylesayrs in #1530
- Add citation information both in README as well as native GitHub file support by @markurtz in #1527
- update compressed-tensors version requirement by @dhuangnm in #1534
New Contributors
- @kelkelcheng made their first contribution in #1373
- @AndrewMead10 made their first contribution in #1467
Full Changelog: 0.5.1...0.5.2