Release v0.5.2 · vllm-project/llm-compressor

What's Changed

Exclude images from package by @kylesayrs in #1397
[Tracing] Skip non-ancestors of sequential targets by @kylesayrs in #1389
Consolidate build config by @dbarbuzzi in #1398
[Tests] Disable silently failing kv cache test by @kylesayrs in #1371
Drop flash_attn skip for quantizing_moe example tests by @dbarbuzzi in #1396
[VLM] Fix mllama targets by @kylesayrs in #1402
[Tests] Use requires_gpu, fix missing gpu test skip, add explicit test for gpu from gha by @kylesayrs in #1264
Implement QuantizationMixin by @kylesayrs in #1351
Add new-features section by @rahul-tuli in #1408
[Tracing] Support tracing of Gemma3 [#1248] by @kelkelcheng in #1373
bugfix kv cache quantization with ignored layers by @brian-dellabetta in #1312
AWQ sanitize_kwargs minor cleanup by @brian-dellabetta in #1405
[Tracing][Testing] Add tracing tests by @kylesayrs in #1335
fix lm eval test reproducibility issues by @brian-dellabetta in #1260
Pipeline Extraction by @kylesayrs in #1279
Add pull_request trigger to base tests workflow by @dbarbuzzi in #1417
removing RecipeMetadata and references by @shanjiaz in #1414
Update examples to only load required number of samples from dataset by @kylesayrs in #1118
[Tracing] Reinstate ignore functionality by @kylesayrs in #1423
[Typo] overriden by @kylesayrs in #1420
Rename SparsityModifierMixin to SparsityModifierBase by @kylesayrs in #1416
Remove RecipeArgs class & its references by @shanjiaz in #1429
[Examples] Standardize AWQ example by @kylesayrs in #1412
[Logging] Support logging once by @kylesayrs in #1431
Add: deepseekv2 smoothquant mappings by @rahul-tuli in #1433
AWQ QuantizationMixin + SequentialPipeline by @brian-dellabetta in #1426
patch awq tests/readme after QuantizationMixin refactor by @brian-dellabetta in #1439
Added more tests for Quantization24SparseW4A16 by @shanjiaz in #1434
[GPTQ] Add actorder option to modifier by @kylesayrs in #1424
[Bugfix][Tracing] Fix qwen2_5_vl by @kylesayrs in #1448
[Tests] Use proper offloading utils in test_compress_tensor_utils by @kylesayrs in #1449
[Tracing] Fix Traceable Imports by @kylesayrs in #1452
[NVFP4] Enable FP4 Weight-Only Quantization by @dsikka in #1309
Pin transformers to <4.52.0 by @brian-dellabetta in #1459
AWQ Apply Scales Bugfix when smooth layer output length doesn't match balance layer input length by @brian-dellabetta in #1451
Fix #1344 Extend e2e tests to add asym support for W8A8-Int8 by @ved1beta in #1345
[Tests] Fix activation recipe for w8a8 asym by @dsikka in #1461
AWQ Qwen and Phi mappings by @brian-dellabetta in #1440
[Observer] Optimize mse observer by @shanjiaz in #1450
Fix: Improve SmoothQuant Support for Mixture of Experts (MoE) Models by @rahul-tuli in #1455
[Tests] Add nvfp4a16 e2e test case by @dsikka in #1463
[Docs] Update README to list fp4 by @dsikka in #1462
Remove duplicate model id var from awq example recipe by @AndrewMead10 in #1467
Added observer type for test_min_max by @shanjiaz in #1466
Disable kernels during calibration (and tracing) by @kylesayrs in #1454
[GPTQ] Fix actorder resolution, add sentinel by @kylesayrs in #1453
Set show_progress to True by @dsikka in #1471
Remove compress by @dsikka in #1470
raise error if block quantization is used, as it is not yet supported by @brian-dellabetta in #1476
[Tests] Increase max seq length for tracing tests by @kylesayrs in #1478
[Tests] Fix dynamic field to be a bool, not string by @dsikka in #1480
[Examples] Fix qwen vision examples by @kylesayrs in #1481
[NVFP4] Update to use tensor_group strategy; update observers by @dsikka in #1484
loosen lmeval assertions to upper or lower bound by @brian-dellabetta in #1477
Revert "expand observers to calculate gparams, add example for activa… by @dsikka in #1486
fix rest of the minmax tests by @shanjiaz in #1469
Add warning for non-divisible group quantization by @kylesayrs in #1401
[AWQ] Support accumulation for reduced memory usage by @kylesayrs in #1435
[Tracing] Code AutoWrapper by @kylesayrs in #1411
Removed RecipeTuple & RecipeContainer class by @shanjiaz in #1460
Unpin to support transformers==4.52.3 by @kylesayrs in #1479
[Tests] GPTQ Actorder Resolution Tests by @kylesayrs in #1468
[Testing] Skip FP4 Test by @dsikka in #1499
[Bugfix] Remove tracing imports from tests by @kylesayrs in #1498
[Testing] Use a slightly larger model that works with group_size 128 by @dsikka in #1502
skip tracing tests if token unavailable by @brian-dellabetta in #1493
Fix missing logs when calling oneshot by @kelkelcheng in #1446
[NVFP4] Expand observers to calculate gparam, support NVFP4 Activations by @dsikka in #1487
[Tests] Remove duplicate test by @kylesayrs in #1500
[Model] Mistral3 example and test by @kylesayrs in #1490
[NVFP4] Use observers to generate global weight scales by @dsikka in #1504
Revert "[NVFP4] Use observers to generate global weight scales " by @dsikka in #1507
[NVFP4] Update global scale generation by @dsikka in #1508
[NVFP4] Fix onloading of fused layers by @dsikka in #1512
Pin pandas to <2.3 by @dbarbuzzi in #1515
AWQModifier fast resolve mappings, better logging, MoE support by @brian-dellabetta in #1444
Update setup.py by @dsikka in #1516
Use model compression pathways by @kylesayrs in #1419
[Example] [Bugfix] Fix Gemma3 Generation by @kylesayrs in #1517
[Docs] Update ReadME details for FP4 by @dsikka in #1519
[Examples] [Bugfix] Perform sample generation before saving as compressed by @kylesayrs in #1530
Add citation information both in README as well as native GitHub file support by @markurtz in #1527
update compressed-tensors version requirement by @dhuangnm in #1534

New Contributors

@kelkelcheng made their first contribution in #1373
@AndrewMead10 made their first contribution in #1467

Full Changelog: 0.5.1...0.5.2
