
AutoFP8 to llmcompressor migration for FP8 quantization#2701

Merged
siddvenk merged 1 commit into deepjavalibrary:master from a-ys:llm-compressor-upgrade
Feb 2, 2025

Conversation


@a-ys a-ys commented Feb 1, 2025

Description

Migrates the existing FP8 quantization functionality from AutoFP8 to llmcompressor. The quantization recipe is FP8 weight and activation quantization with static activation scales. Calibration uses the cnn-dailymail dataset, defaulting to 512 samples at a sequence length of 2048.

Pins the previous version of llm-compressor (0.3.1) due to a compatibility issue between llm-compressor 0.4.0 and transformers 2.5.2, which is the current version in djl-serving.
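The recipe above (FP8 weights and activations with static, calibration-derived activation scales) reduces to computing a fixed per-tensor scale that maps the observed dynamic range onto the FP8 E4M3 format. A minimal, illustrative sketch of that scale math, independent of llmcompressor (function names are hypothetical, not the library's API):

```python
FP8_E4M3_MAX = 448.0  # largest finite value in the OCP FP8 E4M3 format


def static_fp8_scale(calibration_max_abs: float) -> float:
    """Per-tensor scale: maps the max absolute value seen during
    calibration onto the FP8 E4M3 representable range."""
    return calibration_max_abs / FP8_E4M3_MAX


def fake_quantize(x: float, scale: float) -> float:
    """Simulated quantize/dequantize: scale down, clamp to the FP8
    range, scale back up. Real kernels additionally round to the
    nearest representable FP8 value and store the FP8 bits."""
    q = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x / scale))
    return q * scale
```

Because the scale is fixed at calibration time ("static"), inference needs no per-batch max reduction, at the cost of clipping activations that exceed the calibrated range.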

TODOs

  • Update CI tests in lmi-distro to verify these changes.
  • Remove AutoFP8 dependency from container.
  • Future feature support (prioritization tbd)
    • MoE model quantization. (May run with existing code, but will not ignore the correct layers.)
    • KV cache quantization with static scales.
    • Calibration dataset selection
    • AWQ migration (not yet supported in llmcompressor)

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • Please add the link of Integration Tests Executor run with related tests.
  • Have you manually built the docker image and verified the change?
  • Have you run related tests? Check how to set up the test environment here; One example would be pytest tests.py -k "TestCorrectnessLmiDist" -m "lmi_dist"
  • Have you added tests that prove your fix is effective or that this feature works?
  • Has code been commented, particularly in hard-to-understand areas?
  • Have you made corresponding changes to the documentation?

Feature/Issue validation/testing

  • Tested quantization of tinyllama with llmcompressor and serving with the v14 v2 preview container through the Neo workflow.

@a-ys a-ys requested review from a team and zachgk as code owners February 1, 2025 05:40
"will not include this field.")

if output_properties.get("option.quantize") == "fp8":
output_properties["option.quantize"] = "compressed-tensors"
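The diff hunk above rewrites the `option.quantize` property so that downstream serving loads the llmcompressor-produced checkpoint through the compressed-tensors path. A standalone sketch of that remapping, with a plain dict standing in for the real properties object (the helper name is hypothetical):

```python
def remap_quantize_property(output_properties: dict) -> dict:
    """FP8 checkpoints produced by llmcompressor are saved in the
    compressed-tensors format, so the quantize property is rewritten
    to match what the serving engine expects to load."""
    if output_properties.get("option.quantize") == "fp8":
        output_properties["option.quantize"] = "compressed-tensors"
    return output_properties
```

Any other value (or an absent key) passes through unchanged, so only the migrated FP8 path is affected.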

does this work for both lmi-dist (vllm 0.6.3.post1) and vanilla vllm (0.7.0)?

@siddvenk siddvenk merged commit d4f5ee7 into deepjavalibrary:master Feb 2, 2025
9 checks passed

siddvenk commented Feb 2, 2025

merging this for now - lgtm. let's validate this again with the vllm update for both lmi-dist/vllm

@a-ys mentioned this pull request Feb 5, 2025