
Enable merge queue support in CI workflows#2432

Closed
dsikka wants to merge 5 commits into main from enable-merge-queue

Conversation

@dsikka
Collaborator

@dsikka dsikka commented Mar 3, 2026

Summary

  • Add merge_group trigger to all required status check workflows to enable GitHub merge queue functionality
  • This allows PRs to be automatically merged via the merge queue without the constant rebase treadmill

Changes

  • ✅ Added merge_group trigger to test-check.yaml (base/pytorch tests)
  • ✅ Added merge_group trigger to test-check-transformers.yaml
  • ✅ Added merge_group trigger to quality-check.yaml
  • ✅ Added merge_group trigger to linkcheck.yml
  • ✅ Added merge_group trigger to ready-label-check.yaml with auto-pass logic
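In each workflow, the change amounts to adding `merge_group` to the `on:` block. A minimal sketch, assuming the workflows already trigger on `pull_request` (the actual trigger set in each file may differ):

```yaml
# Illustrative trigger block; surrounding workflow content is assumed.
on:
  pull_request:
    branches: [main]
  merge_group:   # run this required check on merge-queue test branches
```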

Why the changes?

When the merge queue is enabled in branch protection settings, GitHub creates temporary branches with the pattern gh-readonly-queue/{base}/{pr}-{sha} to test queued PRs. These workflows are required status checks, so they must also trigger on merge_group events for the queue to function.

Special note on ready-label-check: Added conditional logic to auto-pass when triggered by merge queue, since merge queue branches don't have PR labels attached.
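A hedged sketch of what that conditional could look like; the job and step names here are illustrative, not the repository's actual workflow contents:

```yaml
# Hypothetical shape of the auto-pass logic for ready-label-check.yaml.
on:
  pull_request:
    types: [labeled, unlabeled, opened, synchronize]
  merge_group:

jobs:
  check-ready-label:
    runs-on: ubuntu-latest
    steps:
      - name: Auto-pass for merge queue
        if: ${{ github.event_name == 'merge_group' }}
        run: echo "Merge queue run - label check not applicable, passing."
      - name: Require ready label
        if: ${{ github.event_name == 'pull_request' && !contains(github.event.pull_request.labels.*.name, 'ready') }}
        run: exit 1
```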

Next steps after merge

  1. Go to Settings → Branches → Branch protection rules for main
  2. Enable "Require merge queue"
  3. PRs can then be merged via "Add to merge queue" instead of being merged directly

🤖 Generated with Claude Code

dsikka and others added 5 commits March 3, 2026 15:33
Add merge_group trigger to all required status check workflows to
support GitHub's merge queue feature. This enables automatic PR
merging without the rebase treadmill.

Changes:
- Add merge_group trigger to test-check.yaml (base/pytorch tests)
- Add merge_group trigger to test-check-transformers.yaml
- Add merge_group trigger to quality-check.yaml
- Add merge_group trigger to linkcheck.yml
- Add merge_group trigger to ready-label-check.yaml with auto-pass
  logic (merge queue branches don't have PR labels)

These workflows are all required status checks and must run on
merge queue branches for the queue to function properly.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
@github-actions

github-actions bot commented Mar 3, 2026

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the llmcompressor framework by integrating comprehensive support for the Qwen3.5 Mixture-of-Experts (MoE) model. It introduces specialized calibration mechanisms for MoE sparse blocks, which are essential for achieving optimal quantization results. The changes also include practical example scripts that showcase how to apply both W4A4 FP4 and W8A8 FP8 quantization to the Qwen3.5 MoE model, alongside minor adjustments to internal utility functions to support these new capabilities.

Highlights

  • Qwen3.5 MoE Quantization Support: Added new modules and examples to support quantization for the Qwen3.5 Mixture-of-Experts (MoE) model, including specific calibration logic for its sparse MoE blocks.
  • New Quantization Examples: Introduced two new example scripts demonstrating W4A4 FP4 and W8A8 FP8 quantization schemes for the Qwen3.5 MoE model using the llmcompressor framework.
  • MoE Calibration Logic: Implemented a specialized calibration module, CalibrateQwen3_5MoeTextSparseMoeBlock, which ensures all tokens are sent to all experts during calibration, crucial for effective quantization of MoE architectures.
  • Utility Function Updates: Adjusted internal utility functions related to PyTorch model initialization and the handling of no_split_modules to align with new model structures or library updates.
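The "all tokens to all experts" calibration idea can be sketched in PyTorch. This is a hypothetical illustration of the technique, not the actual CalibrateQwen3_5MoeTextSparseMoeBlock implementation: every expert processes every token (so calibration observers attached to each expert see data), while only the routed top-k contributions are accumulated into the output, keeping the result equivalent to normal routing.

```python
import torch

class CalibrateAllExpertsBlock(torch.nn.Module):
    """Hypothetical sketch of MoE calibration routing: run every token
    through every expert, but weight the output by top-k routing only."""

    def __init__(self, gate: torch.nn.Module, experts: list, top_k: int = 2):
        super().__init__()
        self.gate = gate                      # router producing per-expert logits
        self.experts = torch.nn.ModuleList(experts)
        self.top_k = top_k

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (num_tokens, hidden_dim)
        routing_weights = torch.softmax(self.gate(hidden_states), dim=-1)
        topk_weights, topk_idx = routing_weights.topk(self.top_k, dim=-1)

        output = torch.zeros_like(hidden_states)
        for idx, expert in enumerate(self.experts):
            # Every expert sees every token, so quantization observers on
            # expert weights receive calibration data for all experts.
            expert_out = expert(hidden_states)
            # Only routed tokens contribute to the output, matching the
            # numerics of standard top-k routing.
            weight = (topk_weights * (topk_idx == idx)).sum(-1, keepdim=True)
            output = output + weight * expert_out
        return output
```

The key design point is that the loss of sparsity only matters during the calibration pass; at inference time the original sparse block is restored.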


Changelog
  • examples/quantization_w4a4_fp4/qwen3_5_moe.py
    • Added a new example script for W4A4 FP4 quantization of the Qwen3.5 MoE model.
  • examples/quantization_w8a8_fp8/qwen3_5_moe.py
    • Added a new example script for W8A8 FP8 dynamic quantization of the Qwen3.5 MoE model.
  • src/llmcompressor/modeling/__init__.py
    • Imported the new CalibrateQwen3_5MoeTextSparseMoeBlock for Qwen3.5 MoE model support.
  • src/llmcompressor/modeling/qwen3_5_vl_moe.py
    • Added a new module defining CalibrateQwen3_5MoeTextSparseMoeBlock and SequentialQwen3VLMoeTextExperts for Qwen3.5 MoE calibration.
  • src/llmcompressor/utils/dev.py
    • Removed the import of TORCH_INIT_FUNCTIONS from transformers.modeling_utils.
    • Re-defined TORCH_INIT_FUNCTIONS locally using torch.nn.init functions.
  • src/llmcompressor/utils/pytorch/module.py
    • Updated the method for accessing no_split_modules from model._get_no_split_modules("auto") to model._no_split_modules.
Ignored Files
  • Ignored by pattern: .github/workflows/** (5)
    • .github/workflows/linkcheck.yml
    • .github/workflows/quality-check.yaml
    • .github/workflows/ready-label-check.yaml
    • .github/workflows/test-check-transformers.yaml
    • .github/workflows/test-check.yaml

@mergify mergify bot added the documentation Improvements or additions to documentation label Mar 3, 2026
@dsikka
Collaborator Author

dsikka commented Mar 3, 2026

Closing to recreate with clean branch (contained unrelated commits)

@dsikka dsikka closed this Mar 3, 2026

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces support for quantizing the Qwen3.5-MoE model, including new calibration logic and example scripts, along with changes to handle recent updates in the transformers library. My review focuses on code clarity and correctness in the new example scripts and model implementation. A critical point: the pull request title and description are misleading, as they refer to CI workflow changes rather than the model support actually being added; this should be corrected for repository history clarity.

Comment on lines +37 to +47

    messgages = []
    for message in example["messages"]:
        messgages.append(
            {
                "role": message["role"],
                "content": [{"type": "text", "text": message["content"]}],
            }
        )

    return processor.apply_chat_template(
        messgages,
Contributor


medium

There is a typo in the variable name messgages. It should be messages. This typo appears multiple times within the preprocess_function.

Suggested change

Before:

    messgages = []
    for message in example["messages"]:
        messgages.append(
            {
                "role": message["role"],
                "content": [{"type": "text", "text": message["content"]}],
            }
        )

    return processor.apply_chat_template(
        messgages,

After:

    messages = []
    for message in example["messages"]:
        messages.append(
            {
                "role": message["role"],
                "content": [{"type": "text", "text": message["content"]}],
            }
        )

    return processor.apply_chat_template(
        messages,

        moe_calibrate_all_experts=True)

    # Save to disk in compressed-tensors format.
    SAVE_DIR = "/raid/engine/dsikka/" + "Qwen3.5-397B-A17B" + "-NVFP4"
Contributor


medium

Using + for string concatenation to build a file path can be fragile and less readable. It's better to define the full path as a single string literal for clarity, or use os.path.join for better portability (which would require importing os).

Suggested change
SAVE_DIR = "/raid/engine/dsikka/" + "Qwen3.5-397B-A17B" + "-NVFP4"
SAVE_DIR = "/raid/engine/dsikka/Qwen3.5-397B-A17B-NVFP4"

    oneshot(model=model, recipe=recipe)

    # Save to disk in compressed-tensors format.
    SAVE_DIR = "/raid/engine/dsikka/" + "Qwen3.5-397B-A17B" + "-FP8-Dynamic-NoLinearAttn"
Contributor


medium

Using + for string concatenation to build a file path can be fragile and less readable. It's better to define the full path as a single string literal for clarity, or use os.path.join for better portability (which would require importing os).

Suggested change
SAVE_DIR = "/raid/engine/dsikka/" + "Qwen3.5-397B-A17B" + "-FP8-Dynamic-NoLinearAttn"
SAVE_DIR = "/raid/engine/dsikka/Qwen3.5-397B-A17B-FP8-Dynamic-NoLinearAttn"

return original


class SequentialQwen3VLMoeTextExperts(torch.nn.ModuleList):
Contributor


medium

There's a naming inconsistency. This class name SequentialQwen3VLMoeTextExperts and the filename qwen3_5_vl_moe.py suggest a Vision-Language model. However, the code in this file seems to target the non-VL Qwen3_5Moe model. This is likely a copy-paste artifact. For clarity, this class should be renamed to SequentialQwen3_5MoeTextExperts. You'll also need to update its usage in CalibrateQwen3_5MoeTextSparseMoeBlock.__init__ on line 35.

Suggested change
class SequentialQwen3VLMoeTextExperts(torch.nn.ModuleList):
class SequentialQwen3_5MoeTextExperts(torch.nn.ModuleList):

