
[Transform] QuIP Modifier #1648


Open · wants to merge 58 commits into main
Conversation

@kylesayrs (Collaborator) commented on Jul 15, 2025:

Purpose

  • Enable QuIP-style transforms

Prerequisites

Changes

  • Added quip_example.py to the examples folder
    • As noted in the example's disclaimer, running it requires minimum versions of compressed-tensors and transformers
  • Added QuIPModifier, which handles construction of a QuIP-style transform config (see the usage sketch below)
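
For illustration, a recipe using the new modifier might look like the following. This is a minimal sketch, assuming llmcompressor's oneshot API and the Llama 3.2 1B model evaluated below; the Instruct variant, the transform_type value, and the output directory are illustrative assumptions, not taken verbatim from this PR.

```python
# Minimal sketch, assuming llmcompressor's oneshot API. The model variant,
# `transform_type` value, and output directory are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.modifiers.transform import QuIPModifier  # added by this PR

MODEL_ID = "meta-llama/Llama-3.2-1B-Instruct"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Apply QuIP-style rotations first, then weight-only W4A16 quantization,
# mirroring the QuIP+W4A16 row in the evaluation below.
recipe = [
    QuIPModifier(transform_type="random-hadamard"),
    QuantizationModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
]

oneshot(model=model, recipe=recipe)

SAVE_DIR = "Llama-3.2-1B-Instruct-quip-w4a16"  # hypothetical output path
model.save_pretrained(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```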

Testing

  • Added modifier serialization and correctness tests

Evaluation

Evaluation performed by @brian-dellabetta

Evals on Llama 3.2 1B with QuIP (num_fewshot=8, limit=1000, to be comparable with the results here):

| Strategy   | gsm8k, strict | gsm8k_llama, strict |
|------------|---------------|---------------------|
| FP16       | .352          | .323                |
| QuIP       | .348          | .322                |
| W4A16      | .180          | .017                |
| QuIP+W4A16 | .213          | .141                |
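
For reference, settings like these could plausibly be reproduced with lm-evaluation-harness; the checkpoint path and batch size below are placeholder assumptions, not values from this PR.

```python
# Hedged sketch: approximating the eval settings above with lm-evaluation-harness.
# The checkpoint path and batch size are placeholder assumptions.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./Llama-3.2-1B-Instruct-quip-w4a16",
    tasks=["gsm8k", "gsm8k_llama"],
    num_fewshot=8,  # matches the reported settings
    limit=1000,     # matches the reported settings
    batch_size=8,
)
print(results["results"])
```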

Follow Ups

  • Infer the data-free pipeline automatically, even when a transform modifier is included
  • Modify the example to use GPTQ once basic evaluation has been performed

kylesayrs and others added 30 commits June 23, 2025 19:34
brian-dellabetta and others added 6 commits July 17, 2025 14:16
@brian-dellabetta force-pushed the kylesayrs/transform-quip-modifier branch from 7805321 to ac7dbcd on July 25, 2025
@brian-dellabetta (Collaborator) commented:
Evals on Llama 3.2 1B with QuIP (num_fewshot=8, limit=1000, to be comparable with the results here):

| Strategy   | gsm8k, strict | gsm8k_llama, strict |
|------------|---------------|---------------------|
| FP16       | .352          | .323                |
| QuIP       | .348          | .322                |
| W4A16      | .180          | .017                |
| QuIP+W4A16 | .213          | .141                |

@kylesayrs changed the base branch from bdellabe/transform-modifier to main on August 4, 2025
@kylesayrs marked this pull request as ready for review on August 5, 2025
@brian-dellabetta (Collaborator) left a comment:


beautiful

@dsikka (Collaborator) left a comment:


very nice - can we fix the example?

```diff
@@ -27,5 +28,9 @@ def __call__(
         :param dataloader: loads data for calibration
         :param dataset_args: dataset arguments relevant to pipelines
         """
+        # some ops are still performed on the model by modifiers
+        # we want those ops to occur on the GPU
+        dispatch_for_generation(model)
```
Collaborator commented:

That means we can leverage more than one GPU for data-free cases, including weight-only RTN schemes?

@kylesayrs (Author) replied:

Technically yes, although the weights are still calibrated in a synchronous for loop, so there's no speedup gained from the extra GPUs.
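
A rough schematic of that serial loop, with `is_target` and `calibrate_weights` as hypothetical placeholders rather than llm-compressor APIs:

```python
# Schematic only: weight calibration visits target modules one at a time, so
# dispatching the model across several GPUs adds memory capacity, not parallelism.
# `is_target` and `calibrate_weights` are hypothetical placeholders.
for name, module in model.named_modules():
    if is_target(module):
        calibrate_weights(module)  # serial; runs on whichever GPU holds `module`
```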
