
[Transforms] Update examples for R4 and transform_block_size option#1870

Merged
brian-dellabetta merged 5 commits into main from bdellabe/transforms-updates
Sep 30, 2025

Conversation

@brian-dellabetta
Collaborator

@brian-dellabetta brian-dellabetta commented Sep 26, 2025

SUMMARY:
Prerequisites:

  • vllm-project/compressed-tensors#472

This PR updates the SpinQuant and QuIP examples to include `transform_block_size` and the latest R4 feature in SpinQuant. It also reverts the `TransformScheme.block_size` changes previously introduced into CT and updated in the PR linked above. While `block_size` is a more appropriate name, `head_dim` has already landed in vLLM, and it would be too much of a pain to change. Users will rarely create their own `TransformScheme` anyway.

TEST PLAN:

  • Both examples run and the saved model can be run in vllm, output is meaningful.
  • With debug prints, confirmed the hadacore kernel is used for `QuIPModifier(rotations=["v", "u"], transform_block_size=64, transform_type="hadamard")`
  • and the dense GEMM path is used for `QuIPModifier(rotations=["v", "u"], transform_block_size=64, transform_type="random-hadamard")`
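To make the `transform_block_size` option concrete, here is a minimal, hypothetical sketch (not llm-compressor code) of what a blockwise Hadamard rotation with `transform_block_size=64` does to a weight matrix: a normalized 64x64 Hadamard matrix is applied as a block-diagonal orthogonal transform along the input dimension, so each 64-wide chunk is rotated independently.

```python
# Hypothetical illustration of a blockwise Hadamard rotation; the function
# names are invented for this sketch and are not llm-compressor APIs.
import numpy as np

def hadamard(n: int) -> np.ndarray:
    """Sylvester construction; n must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)  # normalize so the matrix is orthogonal

def apply_block_rotation(weight: np.ndarray, block_size: int) -> np.ndarray:
    """Rotate the last dim of `weight` in independent `block_size` chunks."""
    h = hadamard(block_size)
    out_dim, in_dim = weight.shape
    assert in_dim % block_size == 0, "in_dim must be divisible by block_size"
    blocks = weight.reshape(out_dim, in_dim // block_size, block_size)
    return (blocks @ h).reshape(out_dim, in_dim)

w = np.random.default_rng(0).standard_normal((8, 128))
w_rot = apply_block_rotation(w, block_size=64)
# The per-block rotation is orthogonal, so the overall norm is preserved.
assert np.allclose(np.linalg.norm(w_rot), np.linalg.norm(w))
```

Because the rotation is orthogonal, it changes the weight distribution (smoothing outliers ahead of quantization) without changing the function the layer computes once the inverse is fused elsewhere.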

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@gemini-code-assist
Contributor

Summary of Changes

Hello @brian-dellabetta, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request focuses on enhancing the usability and consistency of the transformation examples and underlying API. It updates the QuIP and SpinQuant examples to demonstrate new features like transform_block_size and the R4 rotation, while also ensuring that the internal API aligns with external dependencies like vLLM by standardizing parameter names.

Highlights

  • Example Updates: The QuIP and SpinQuant examples have been updated to incorporate the transform_block_size parameter and the latest R4 rotation feature in SpinQuant.
  • API Consistency: The TransformScheme.block_size parameter has been reverted to head_dim across relevant modules to maintain consistency with vLLM's existing terminology, despite block_size being a more semantically appropriate name.
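The naming split described in the highlights can be sketched as follows. This is a hypothetical illustration (the real classes live in compressed-tensors and llm-compressor; only the names `transform_block_size` and `head_dim` come from this PR): the user-facing modifier keeps `transform_block_size`, and the value is stored internally under vLLM's established name, `head_dim`.

```python
# Hypothetical sketch of mapping the user-facing parameter name to the
# internal, vLLM-aligned one; class and function shapes are illustrative.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TransformScheme:
    type: str
    head_dim: Optional[int] = None  # vLLM's established name

def build_scheme(transform_type: str,
                 transform_block_size: Optional[int]) -> TransformScheme:
    # user-facing name -> internal name
    return TransformScheme(type=transform_type, head_dim=transform_block_size)

scheme = build_scheme("hadamard", transform_block_size=64)
assert scheme.head_dim == 64
```

Keeping the rename at the boundary means users never see `head_dim`, while serialized schemes stay compatible with what vLLM already loads.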
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request updates the SpinQuant and QuIP examples to use the transform_block_size option and adds support for R4 rotation in the SpinQuant example. The core change is the renaming of the block_size parameter to head_dim in the internal TransformScheme class to maintain compatibility with vllm, as explained in the description. The user-facing API in the modifiers correctly retains the transform_block_size parameter. The changes are consistent, and the examples are updated accordingly. I've identified a couple of areas where validation could be improved to prevent potential runtime errors.

Collaborator

@shanjiaz shanjiaz left a comment


Looks good to me!

Collaborator

@fynnsu fynnsu left a comment


Looks good! Added a question about a comment below.

@brian-dellabetta brian-dellabetta changed the title [Transforms] Update examples for R3 and transform_block_size option [Transforms] Update examples for R4 and transform_block_size option Sep 29, 2025
@brian-dellabetta brian-dellabetta added the ready When a PR is ready for review label Sep 30, 2025
@brian-dellabetta brian-dellabetta enabled auto-merge (squash) September 30, 2025 15:26
@brian-dellabetta brian-dellabetta merged commit 4c95fd2 into main Sep 30, 2025
9 of 10 checks passed
@brian-dellabetta brian-dellabetta deleted the bdellabe/transforms-updates branch September 30, 2025 15:39
brian-dellabetta added a commit that referenced this pull request Oct 1, 2025
…1883)

SUMMARY:
Quick follow-up to recently merged
* #1870 

Updates our `examples/transform` scripts to 
- [x] default to `transform_type="hadamard"`, which is preferred so that
vllm hadacore kernel is used
- [x] default to `transform_block_size=128`, which is preferred for
group-size 128 schemes like W4A16


TEST PLAN:
Previously confirmed that hadacore kernel was being invoked for
`transform_type="hadamard"`

---------

Signed-off-by: Brian Dellabetta <bdellabe@redhat.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Rahul Tuli <rtuli@redhat.com>
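The block-size/group-size pairing mentioned in the follow-up commit above can be illustrated with a small, hypothetical check (the values 128 and W4A16 come from the commit; the variable names are invented): when `transform_block_size` equals the quantization group size, each rotated block feeds exactly one quantization group, so the rotation never mixes values across group boundaries.

```python
# Hypothetical illustration: with transform_block_size == group_size (128,
# as for W4A16-style schemes), rotation blocks and quantization groups
# partition the input dimension identically.
group_size = 128
transform_block_size = 128
in_dim = 512  # example layer input width, divisible by both

blocks = [(i, i + transform_block_size)
          for i in range(0, in_dim, transform_block_size)]
groups = [(i, i + group_size)
          for i in range(0, in_dim, group_size)]

# Every rotation block lines up with exactly one quantization group.
assert blocks == groups
```

If the block size were smaller than the group size (say 64 vs. 128), each group would span two independently rotated blocks, which still works but weakens the outlier-smoothing effect within a group; matching them keeps the two partitions aligned.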
cajeonrh pushed a commit to cajeonrh/llm-compressor that referenced this pull request Oct 2, 2025
cajeonrh pushed a commit to cajeonrh/llm-compressor that referenced this pull request Oct 2, 2025

Labels

ready When a PR is ready for review


4 participants