Add e2e tests for non-uniform quantization examples#2321
dsikka merged 3 commits into vllm-project:main
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this label is required to trigger the testing suite; please add it only once the PR is code-complete and local testing has been performed.
Summary of Changes

Hello @saurabhaloneai, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request introduces comprehensive end-to-end tests to validate advanced non-uniform quantization strategies within the vLLM framework. It ensures the correct application and functionality of mixed-precision quantization, specifically for NVFP4+FP8 and GPTQ+AWQ configurations, thereby enhancing the robustness and reliability of the quantization pipeline for large language models.
Code Review
This pull request adds two new end-to-end test cases for non-uniform quantization, which is a great addition for improving test coverage. My review focuses on the new YAML configuration files. I've identified a few areas where the configurations can be improved for robustness and performance. Specifically, some of the regular expressions for targeting layers are a bit too broad and could be made more specific. Additionally, one of the GPTQ configurations could be adjusted for better accuracy.
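The regex-tightening point from this review can be illustrated with plain Python `re`, independent of llm-compressor's actual matching code. The module names below are hypothetical, chosen only to show how an unanchored, unescaped pattern over-matches:

```python
import re

# Hypothetical module names for illustration only (not from the real model).
modules = [
    "model.layers.0.mlp.down_proj",
    "model.layers.0.mlp.down_proj_extra",  # hypothetical look-alike sibling
]

loose = r".*down_proj"       # unanchored, unescaped: over-matches
strict = r".*\.down_proj$"   # escaped dot + $ anchor: exact suffix match

loose_matches = [m for m in modules if re.search(loose, m)]
strict_matches = [m for m in modules if re.search(strict, m)]

print(loose_matches)   # the loose pattern matches both names
print(strict_matches)  # the strict pattern matches only the intended layer
```

This is the same reasoning behind the follow-up commit that escapes literal dots and adds `$` anchors to the recipe targets.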
51a7099 to 4baf690 (compare)
HDCharles
left a comment
Looks good to me. Note: you can request reviews from contributors once your PR is ready.
dsikka
left a comment
Thanks! Do you have any sample checkpoints from the recipes / configs?
Signed-off-by: saurabhaloneai <saurabhaloney85@gmail.com>
- Escape literal dots and add $ anchors to prevent unintended matches
- Consolidate attention/gate/up proj patterns for readability

Co-authored-by: gemini-code-assist[bot]
Signed-off-by: saurabhaloneai <saurabhaloney85@gmail.com>
75fab6f to 7962e7b (compare)
@dsikka here are the checkpoints (I just uploaded them to my personal HF account): NVFP4+FP8: https://huggingface.co/zenzen9/TinyLlama-1.1B-nvfp4-fp8-mixed
dsikka
left a comment
This is perfect - thank you for sharing the checkpoints as well! Great work
SUMMARY:

Adds two new e2e test cases for non-uniform quantization examples.

- NVFP4+FP8 Mixed: Uses NVFP4 for attention/gate/up layers, FP8 for down_proj
- GPTQ+AWQ Mixed: Uses AWQ W4A16 for attention, GPTQ W8A8 for MLP

Closes vllm-project#2315
Depends on vllm-project#2317

Files added:

- tests/e2e/vLLM/configs/nvfp4_fp8_mixed.yaml
- tests/e2e/vLLM/configs/multiple_modifiers_gptq_awq.yaml
- tests/e2e/vLLM/recipes/non_uniform/recipe_nvfp4_fp8_mixed.yaml
- tests/e2e/vLLM/recipes/non_uniform/recipe_gptq_awq.yaml

TEST PLAN:

Tested locally on GPU (RTX 6000 Pro Blackwell):

1. NVFP4+FP8 test:

```bash
CADENCE=nightly TEST_DATA_FILE=tests/e2e/vLLM/configs/nvfp4_fp8_mixed.yaml pytest tests/e2e/vLLM/test_vllm.py -v
```

Result: Compression passed (154 layers)

2. GPTQ+AWQ test:

```bash
CADENCE=nightly TEST_DATA_FILE=tests/e2e/vLLM/configs/multiple_modifiers_gptq_awq.yaml pytest tests/e2e/vLLM/test_vllm.py -v
```

Result: Compression passed for both modifiers

---------

Signed-off-by: saurabhaloneai <saurabhaloney85@gmail.com>
Co-authored-by: Dipika Sikka <dipikasikka1@gmail.com>
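For readers unfamiliar with non-uniform recipes: they combine multiple quantization modifiers, each scoped to a disjoint set of layer patterns. The sketch below is illustrative only — the field values and groupings are assumptions for the GPTQ+AWQ case, not the contents of the committed recipe files:

```yaml
# Illustrative multi-modifier recipe sketch (not the committed file).
# AWQ W4A16 on attention projections, GPTQ W8A8 on MLP projections.
quant_stage:
  quant_modifiers:
    AWQModifier:
      ignore: ["lm_head"]
      config_groups:
        group_0:
          targets: ["re:.*self_attn\\.(q|k|v|o)_proj$"]
          weights: {num_bits: 4, type: int, symmetric: true, strategy: group, group_size: 128}
    GPTQModifier:
      ignore: ["lm_head"]
      config_groups:
        group_0:
          targets: ["re:.*mlp\\.(gate|up|down)_proj$"]
          weights: {num_bits: 8, type: int, symmetric: true, strategy: channel}
```

Note the anchored, dot-escaped `re:` target patterns, matching the regex tightening applied during review.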