Enable custom dataset for AutoRound #13

Open

yiliu30 wants to merge 4 commits into main from support-custom-dataset

Conversation


@yiliu30 yiliu30 commented Dec 17, 2025

  • UT
  • figure out why attention_mask failed to parse
  • Double check the attention_mask order

Signed-off-by: yiliu30 <yi4.liu@intel.com>
@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.

@gemini-code-assist

Summary of Changes

Hello @yiliu30, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly improves the flexibility of the AutoRound quantization modifier by enabling it to work effectively with custom calibration datasets. The core enhancement involves modifying the AutoRoundModifier to correctly process and incorporate attention masks from the input data during the quantization procedure. This ensures that the algorithm can accurately handle padded sequences, which is crucial for diverse datasets. To illustrate this new capability, a comprehensive example script has been added, showcasing the use of AutoRound with a Llama 3 model and a Hugging Face dataset.

Highlights

  • Custom Dataset Support for AutoRound: The AutoRoundModifier has been enhanced to properly handle and utilize custom calibration datasets, specifically by integrating attention masks during the quantization process.
  • New Example Script: A new example script, llama3_example_custom_dataset.py, has been added to demonstrate how to apply AutoRound quantization to a Llama 3 model using a custom Hugging Face dataset.
  • Attention Mask Integration: The AutoRoundModifier now collects attention_mask tensors from the calibration dataset and passes them to the quantization logic, ensuring accurate processing of padded inputs.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables the use of custom datasets with attention masks for AutoRound quantization by collecting attention masks from the calibration data. It also adds a new example demonstrating this functionality. The core logic change in AutoRoundModifier has a critical flaw where it consumes the calibration dataloader prematurely, which will cause the calibration process to fail. I've provided a detailed explanation of the issue and a recommended fix. Additionally, I've included some suggestions to improve the clarity and robustness of the new example file.

Comment on lines +138 to +140:

```python
for batch in state.data.calib:
    if "attention_mask" in batch:
        self._attention_mask_list.append(batch["attention_mask"])
```


critical

This loop over state.data.calib will consume the DataLoader, leaving it empty for the actual calibration process that follows. This will likely cause calibration to fail or produce incorrect results as it will run on no data.

A more robust approach would be to collect the attention_mask during the calibration forward passes using a hook. This would avoid consuming the dataloader prematurely. You could add a forward pre-hook to state.model in the on_start method to capture the attention_mask from the keyword arguments of each batch as it's processed. This is the recommended fix.

Example of a hook-based implementation in `on_start`:

```python
def attention_mask_capture_hook(module, *args, **kwargs):
    if "attention_mask" in kwargs and kwargs["attention_mask"] is not None:
        self._attention_mask_list.append(kwargs["attention_mask"])

self.register_hook(
    state.model, attention_mask_capture_hook, "forward_pre", with_kwargs=True
)
```

This would require removing the added loop from on_initialize.
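For illustration, here is a minimal, self-contained sketch of the capture pattern the review describes, using a toy `nn.Module` rather than the actual llm-compressor model and hook-registration helper (`ToyModel` and the `captured` list are assumptions made for this demo). With `with_kwargs=True`, PyTorch passes the hook `(module, args, kwargs)`, so the `attention_mask` keyword can be recorded as each batch flows through the model, without iterating the dataloader separately:

```python
import torch
from torch import nn


class ToyModel(nn.Module):
    """Stand-in for the real model; accepts attention_mask as a kwarg."""

    def forward(self, input_ids, attention_mask=None):
        return input_ids.float().mean()


captured = []


def attention_mask_capture_hook(module, args, kwargs):
    # With with_kwargs=True, the hook signature is (module, args, kwargs).
    if kwargs.get("attention_mask") is not None:
        captured.append(kwargs["attention_mask"])


model = ToyModel()
handle = model.register_forward_pre_hook(
    attention_mask_capture_hook, with_kwargs=True
)

mask = torch.tensor([[1, 1, 0]])
model(torch.tensor([[5, 6, 7]]), attention_mask=mask)
handle.remove()
```

After the forward pass, `captured` holds the mask tensor for that batch; in the real modifier the hook would append to `self._attention_mask_list` instead.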

```python
ds = ds.map(tokenize, remove_columns=ds.column_names)

# Configure the quantization algorithm to run.
# * quantize the weights to 4 bit with GPTQ with a group size 128
```


medium

The comment is misleading as it refers to GPTQ, but the AutoRoundModifier uses the AutoRound algorithm. This could cause confusion for users of this example.

Suggested change:

```diff
-# * quantize the weights to 4 bit with GPTQ with a group size 128
+# * quantize the weights to 4 bit with AutoRound with a group size 128
```

```python
print("==========================================\n\n")

# Save to disk compressed.
SAVE_DIR = model_id.rstrip("/").split("/")[-1] + "-W4A16-G128"
```


medium

The save directory name is constructed with a hardcoded "-W4A16-G128". The G128 part, representing a group size of 128, is an implicit default of the W4A16 scheme and is not explicitly defined in the recipe. This makes the code brittle. If the default for the scheme changes in the future, this directory name will become misleading. For an example this might be acceptable, but it's worth noting.

Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>
Signed-off-by: yiliu30 <yi4.liu@intel.com>

yiliu30 commented Dec 18, 2025

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables the use of custom datasets for AutoRound quantization, adding a new example script and modifying the core AutoRoundModifier. The changes correctly gather attention masks from calibration data. My review includes suggestions to improve the new example's clarity and robustness, such as removing redundant code and making file paths programmatic. I've also pointed out development artifacts like FIXME comments and breakpoint() calls in the core logic that should be cleaned up before merging.



```python
# Use batched=True for efficiency
ds = ds.map(fix_batch_if_needed)
```


high

The comment on line 57 suggests using batched=True for efficiency, but it's missing in this .map() call. The fix_batch_if_needed function is designed to work with batches. Please add batched=True for correctness and performance.

Suggested change:

```diff
-ds = ds.map(fix_batch_if_needed)
+ds = ds.map(fix_batch_if_needed, batched=True)
```

Comment on lines +10 to +11:

```python
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model_id = "/models/Qwen3-0.6B-Base"
```


medium

The variable model_id is assigned on line 10 and then immediately reassigned on line 11. The first assignment is redundant and can be removed to avoid confusion.

Suggested change:

```diff
-model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
-model_id = "/models/Qwen3-0.6B-Base"
+model_id = "/models/Qwen3-0.6B-Base"
```

```python
print("==========================================\n\n")

# Save to disk compressed.
SAVE_DIR = model_id.rstrip("/").split("/")[-1] + "-W4A16-G128-AutoRound-Untrachat200k"
```


medium

The SAVE_DIR is constructed with several hardcoded strings (-W4A16-G128-AutoRound-Untrachat200k). This makes the example brittle if parameters like the quantization scheme or dataset change. Consider constructing this string programmatically from the relevant variables (recipe, DATASET_ID, etc.) to improve maintainability and make the example more robust.
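As a sketch of what such programmatic construction could look like (the values of `model_id`, `DATASET_ID`, `scheme`, and `group_size` below are assumptions standing in for the example script's actual variables, not taken from the diff):

```python
# Hypothetical stand-ins for the example script's configuration.
model_id = "/models/Qwen3-0.6B-Base"
DATASET_ID = "HuggingFaceH4/ultrachat_200k"
scheme = "W4A16"
group_size = 128

# Derive each name component from its source variable instead of hardcoding.
model_name = model_id.rstrip("/").split("/")[-1]
dataset_name = DATASET_ID.split("/")[-1]
SAVE_DIR = f"{model_name}-{scheme}-G{group_size}-AutoRound-{dataset_name}"
print(SAVE_DIR)  # Qwen3-0.6B-Base-W4A16-G128-AutoRound-ultrachat_200k
```

If the scheme or dataset in the recipe changes, the directory name then stays in sync automatically.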

```python
    return ar_quant_scheme


# FIXME: refine code
```


medium

This line contains a FIXME comment. It's good practice to address such comments before merging, either by refining the code as suggested or by adding a more detailed comment explaining the issue and why it's being deferred (e.g., by creating a ticket and referencing it).

```python
if m.ndim == 2:
    all_ones_rows = torch.all(m == 1, dim=1)  # [batch]
    if torch.any(all_ones_rows):
        # breakpoint()
```


medium

A commented-out breakpoint() call is present here. Such debugging artifacts should be removed from the code before merging to maintain code cleanliness.

```python
    batch['attention_mask'] is List[List[int]] when batched=True.
    Convert to tensor, fix, then convert back to list to stay compatible with datasets.
    """
    # breakpoint()
```


medium

A commented-out breakpoint() call is present here. Such debugging artifacts should be removed from the code before merging to maintain code cleanliness.
