
Enable merge queue support in CI workflows#2432

Closed
dsikka wants to merge 5 commits into main from enable-merge-queue

Conversation

@dsikka
Collaborator

@dsikka dsikka commented Mar 3, 2026

Summary

  • Add merge_group trigger to all required status check workflows to enable GitHub merge queue functionality
  • This allows PRs to be automatically merged via the merge queue without the constant rebase treadmill

Changes

  • ✅ Added merge_group trigger to test-check.yaml (base/pytorch tests)
  • ✅ Added merge_group trigger to test-check-transformers.yaml
  • ✅ Added merge_group trigger to quality-check.yaml
  • ✅ Added merge_group trigger to linkcheck.yml
  • ✅ Added merge_group trigger to ready-label-check.yaml with auto-pass logic
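In each workflow, the change amounts to adding `merge_group` to the `on:` block. A minimal sketch, assuming the workflows already trigger on `pull_request` (the actual trigger set in each file may differ):

```yaml
# Illustrative trigger block; surrounding workflow content is assumed.
on:
  pull_request:
    branches: [main]
  merge_group:   # run this required check on merge-queue test branches
```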

Why the changes?

When the merge queue is enabled in branch protection settings, GitHub creates temporary branches with the pattern gh-readonly-queue/{base}/{pr}-{sha} to test queued PRs. These workflows are required status checks, so they must also trigger on merge_group events for the queue to function.

Special note on ready-label-check: Added conditional logic to auto-pass when triggered by merge queue, since merge queue branches don't have PR labels attached.
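A hedged sketch of what that conditional could look like; the job and step names here are illustrative, not the repository's actual workflow contents:

```yaml
# Hypothetical shape of the auto-pass logic for ready-label-check.yaml.
on:
  pull_request:
    types: [labeled, unlabeled, opened, synchronize]
  merge_group:

jobs:
  check-ready-label:
    runs-on: ubuntu-latest
    steps:
      - name: Auto-pass for merge queue
        if: ${{ github.event_name == 'merge_group' }}
        run: echo "Merge queue run - label check not applicable, passing."
      - name: Require ready label
        if: ${{ github.event_name == 'pull_request' && !contains(github.event.pull_request.labels.*.name, 'ready') }}
        run: exit 1
```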

Next steps after merge

  1. Go to Settings → Branches → Branch protection rules for main
  2. Enable "Require merge queue"
  3. PRs can then be merged via "Add to merge queue" instead of being merged directly

🤖 Generated with Claude Code

dsikka and others added 5 commits March 3, 2026 15:33
Add merge_group trigger to all required status check workflows to
support GitHub's merge queue feature. This enables automatic PR
merging without the rebase treadmill.

Changes:
- Add merge_group trigger to test-check.yaml (base/pytorch tests)
- Add merge_group trigger to test-check-transformers.yaml
- Add merge_group trigger to quality-check.yaml
- Add merge_group trigger to linkcheck.yml
- Add merge_group trigger to ready-label-check.yaml with auto-pass
  logic (merge queue branches don't have PR labels)

These workflows are all required status checks and must run on
merge queue branches for the queue to function properly.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@github-actions

github-actions bot commented Mar 3, 2026

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the llmcompressor framework by integrating comprehensive support for the Qwen3.5 Mixture-of-Experts (MoE) model. It introduces specialized calibration mechanisms for MoE sparse blocks, which are essential for achieving optimal quantization results. The changes also include practical example scripts that showcase how to apply both W4A4 FP4 and W8A8 FP8 quantization to the Qwen3.5 MoE model, alongside minor adjustments to internal utility functions to support these new capabilities.

Highlights

  • Qwen3.5 MoE Quantization Support: Added new modules and examples to support quantization for the Qwen3.5 Mixture-of-Experts (MoE) model, including specific calibration logic for its sparse MoE blocks.
  • New Quantization Examples: Introduced two new example scripts demonstrating W4A4 FP4 and W8A8 FP8 quantization schemes for the Qwen3.5 MoE model using the llmcompressor framework.
  • MoE Calibration Logic: Implemented a specialized calibration module, CalibrateQwen3_5MoeTextSparseMoeBlock, which ensures all tokens are sent to all experts during calibration, crucial for effective quantization of MoE architectures.
  • Utility Function Updates: Adjusted internal utility functions related to PyTorch model initialization and the handling of no_split_modules to align with new model structures or library updates.
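The "all tokens to all experts" calibration idea can be sketched in PyTorch. This is a hypothetical illustration of the technique, not the actual CalibrateQwen3_5MoeTextSparseMoeBlock implementation: every expert processes every token (so calibration observers attached to each expert see data), while only the routed top-k contributions are accumulated into the output, keeping the result equivalent to normal routing.

```python
import torch

class CalibrateAllExpertsBlock(torch.nn.Module):
    """Hypothetical sketch of MoE calibration routing: run every token
    through every expert, but weight the output by top-k routing only."""

    def __init__(self, gate: torch.nn.Module, experts: list, top_k: int = 2):
        super().__init__()
        self.gate = gate                      # router producing per-expert logits
        self.experts = torch.nn.ModuleList(experts)
        self.top_k = top_k

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (num_tokens, hidden_dim)
        routing_weights = torch.softmax(self.gate(hidden_states), dim=-1)
        topk_weights, topk_idx = routing_weights.topk(self.top_k, dim=-1)

        output = torch.zeros_like(hidden_states)
        for idx, expert in enumerate(self.experts):
            # Every expert sees every token, so quantization observers on
            # expert weights receive calibration data for all experts.
            expert_out = expert(hidden_states)
            # Only routed tokens contribute to the output, matching the
            # numerics of standard top-k routing.
            weight = (topk_weights * (topk_idx == idx)).sum(-1, keepdim=True)
            output = output + weight * expert_out
        return output
```

The key design point is that the loss of sparsity only matters during the calibration pass; at inference time the original sparse block is restored.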


Changelog
  • examples/quantization_w4a4_fp4/qwen3_5_moe.py
    • Added a new example script for W4A4 FP4 quantization of the Qwen3.5 MoE model.
  • examples/quantization_w8a8_fp8/qwen3_5_moe.py
    • Added a new example script for W8A8 FP8 dynamic quantization of the Qwen3.5 MoE model.
  • src/llmcompressor/modeling/__init__.py
    • Imported the new CalibrateQwen3_5MoeTextSparseMoeBlock for Qwen3.5 MoE model support.
  • src/llmcompressor/modeling/qwen3_5_vl_moe.py
    • Added a new module defining CalibrateQwen3_5MoeTextSparseMoeBlock and SequentialQwen3VLMoeTextExperts for Qwen3.5 MoE calibration.
  • src/llmcompressor/utils/dev.py
    • Removed the import of TORCH_INIT_FUNCTIONS from transformers.modeling_utils.
    • Re-defined TORCH_INIT_FUNCTIONS locally using torch.nn.init functions.
  • src/llmcompressor/utils/pytorch/module.py
    • Updated the method for accessing no_split_modules from model._get_no_split_modules("auto") to model._no_split_modules.
Ignored Files
  • Ignored by pattern: .github/workflows/** (5)
    • .github/workflows/linkcheck.yml
    • .github/workflows/quality-check.yaml
    • .github/workflows/ready-label-check.yaml
    • .github/workflows/test-check-transformers.yaml
    • .github/workflows/test-check.yaml

@mergify mergify bot added the documentation Improvements or additions to documentation label Mar 3, 2026
@dsikka
Collaborator Author

dsikka commented Mar 3, 2026

Closing to recreate with clean branch (contained unrelated commits)

@dsikka dsikka closed this Mar 3, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for quantizing the Qwen3.5-MoE model, including new calibration logic and example scripts, along with changes to handle recent updates in the transformers library. My review focuses on code clarity and correctness in the new example scripts and model implementation. A critical point: the pull request title and description are misleading, as they refer to CI workflow changes rather than the model support actually being added; this should be corrected for repository history clarity.

Comment on lines +37 to +47

    messgages = []
    for message in example["messages"]:
        messgages.append(
            {
                "role": message["role"],
                "content": [{"type": "text", "text": message["content"]}],
            }
        )

    return processor.apply_chat_template(
        messgages,
Contributor


medium

There is a typo in the variable name messgages. It should be messages. This typo appears multiple times within the preprocess_function.

Suggested change

Before:

    messgages = []
    for message in example["messages"]:
        messgages.append(
            {
                "role": message["role"],
                "content": [{"type": "text", "text": message["content"]}],
            }
        )

    return processor.apply_chat_template(
        messgages,

After:

    messages = []
    for message in example["messages"]:
        messages.append(
            {
                "role": message["role"],
                "content": [{"type": "text", "text": message["content"]}],
            }
        )

    return processor.apply_chat_template(
        messages,

        moe_calibrate_all_experts=True)

    # Save to disk in compressed-tensors format.
    SAVE_DIR = "/raid/engine/dsikka/" + "Qwen3.5-397B-A17B" + "-NVFP4"
Contributor


medium

Using + for string concatenation to build a file path can be fragile and less readable. It's better to define the full path as a single string literal for clarity, or use os.path.join for better portability (which would require importing os).

Suggested change
SAVE_DIR = "/raid/engine/dsikka/" + "Qwen3.5-397B-A17B" + "-NVFP4"
SAVE_DIR = "/raid/engine/dsikka/Qwen3.5-397B-A17B-NVFP4"

    oneshot(model=model, recipe=recipe)

    # Save to disk in compressed-tensors format.
    SAVE_DIR = "/raid/engine/dsikka/" + "Qwen3.5-397B-A17B" + "-FP8-Dynamic-NoLinearAttn"
Contributor


medium

Using + for string concatenation to build a file path can be fragile and less readable. It's better to define the full path as a single string literal for clarity, or use os.path.join for better portability (which would require importing os).

Suggested change
SAVE_DIR = "/raid/engine/dsikka/" + "Qwen3.5-397B-A17B" + "-FP8-Dynamic-NoLinearAttn"
SAVE_DIR = "/raid/engine/dsikka/Qwen3.5-397B-A17B-FP8-Dynamic-NoLinearAttn"

return original


class SequentialQwen3VLMoeTextExperts(torch.nn.ModuleList):
Contributor


medium

There's a naming inconsistency. This class name SequentialQwen3VLMoeTextExperts and the filename qwen3_5_vl_moe.py suggest a Vision-Language model. However, the code in this file seems to target the non-VL Qwen3_5Moe model. This is likely a copy-paste artifact. For clarity, this class should be renamed to SequentialQwen3_5MoeTextExperts. You'll also need to update its usage in CalibrateQwen3_5MoeTextSparseMoeBlock.__init__ on line 35.

Suggested change
class SequentialQwen3VLMoeTextExperts(torch.nn.ModuleList):
class SequentialQwen3_5MoeTextExperts(torch.nn.ModuleList):

