Conversation
Signed-off-by: yiliu30 <yi4.liu@intel.com>
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
Summary of Changes

Hello @yiliu30, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request significantly improves the flexibility of the AutoRound quantization modifier by enabling it to work effectively with custom calibration datasets. The core enhancement involves modifying the AutoRoundModifier to correctly process and incorporate attention masks from the input data during the quantization procedure. This ensures that the algorithm can accurately handle padded sequences, which is crucial for diverse datasets. To illustrate this new capability, a comprehensive example script has been added, showcasing the use of AutoRound with a Llama 3 model and a Hugging Face dataset.
Code Review
This pull request enables the use of custom datasets with attention masks for AutoRound quantization by collecting attention masks from the calibration data. It also adds a new example demonstrating this functionality. The core logic change in AutoRoundModifier has a critical flaw where it consumes the calibration dataloader prematurely, which will cause the calibration process to fail. I've provided a detailed explanation of the issue and a recommended fix. Additionally, I've included some suggestions to improve the clarity and robustness of the new example file.
```python
for batch in state.data.calib:
    if "attention_mask" in batch:
        self._attention_mask_list.append(batch["attention_mask"])
```
This loop over state.data.calib will consume the DataLoader, leaving it empty for the actual calibration process that follows. This will likely cause calibration to fail or produce incorrect results as it will run on no data.
A more robust approach would be to collect the attention_mask during the calibration forward passes using a hook. This would avoid consuming the dataloader prematurely. You could add a forward pre-hook to state.model in the on_start method to capture the attention_mask from the keyword arguments of each batch as it's processed. This is the recommended fix.
Example of a hook-based implementation in on_start:
```python
def attention_mask_capture_hook(module, *args, **kwargs):
    if "attention_mask" in kwargs and kwargs["attention_mask"] is not None:
        self._attention_mask_list.append(kwargs["attention_mask"])

self.register_hook(
    state.model, attention_mask_capture_hook, "forward_pre", with_kwargs=True
)
```

This would require removing the added loop from `on_initialize`.
```python
ds = ds.map(tokenize, remove_columns=ds.column_names)


# Configure the quantization algorithm to run.
# * quantize the weights to 4 bit with GPTQ with a group size 128
```
The comment is misleading as it refers to GPTQ, but the AutoRoundModifier uses the AutoRound algorithm. This could cause confusion for users of this example.
```diff
- # * quantize the weights to 4 bit with GPTQ with a group size 128
+ # * quantize the weights to 4 bit with AutoRound with a group size 128
```
```python
print("==========================================\n\n")


# Save to disk compressed.
SAVE_DIR = model_id.rstrip("/").split("/")[-1] + "-W4A16-G128"
```
The save directory name is constructed with a hardcoded "-W4A16-G128". The G128 part, representing a group size of 128, is an implicit default of the W4A16 scheme and is not explicitly defined in the recipe. This makes the code brittle. If the default for the scheme changes in the future, this directory name will become misleading. For an example this might be acceptable, but it's worth noting.
/gemini review
Code Review
This pull request enables the use of custom datasets for AutoRound quantization, adding a new example script and modifying the core AutoRoundModifier. The changes correctly gather attention masks from calibration data. My review includes suggestions to improve the new example's clarity and robustness, such as removing redundant code and making file paths programmatic. I've also pointed out development artifacts like FIXME comments and breakpoint() calls in the core logic that should be cleaned up before merging.
```python
# Use batched=True for efficiency
ds = ds.map(fix_batch_if_needed)
```
The comment on line 57 suggests using batched=True for efficiency, but it's missing in this .map() call. The fix_batch_if_needed function is designed to work with batches. Please add batched=True for correctness and performance.
```diff
- ds = ds.map(fix_batch_if_needed)
+ ds = ds.map(fix_batch_if_needed, batched=True)
```
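For readers unfamiliar with the `datasets` batched contract, the following dependency-free sketch shows why `batched=True` matters: `.map()` then passes a dict of column lists rather than a single example, so the function must handle `List[List[int]]` masks. The body of `fix_batch_if_needed` below is a hypothetical stand-in, not the PR's actual implementation.

```python
# Hypothetical stand-in for the example's fix_batch_if_needed: under
# batched=True, batch["attention_mask"] is a List[List[int]] (one inner
# list per example), so the fix must iterate over rows.
def fix_batch_if_needed(batch):
    # clamp any stray mask values into {0, 1}, row by row
    return {
        "attention_mask": [
            [int(bool(v)) for v in row] for row in batch["attention_mask"]
        ]
    }

# Batched shape: a dict of column lists, two examples at once.
batch = {"attention_mask": [[1, 1, 0], [1, 1, 1]]}
print(fix_batch_if_needed(batch)["attention_mask"])  # [[1, 1, 0], [1, 1, 1]]
```

Without `batched=True`, the same function would receive one example at a time (e.g. `{"attention_mask": [1, 1, 0]}`), and the row iteration above would silently operate on individual ints.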
```python
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model_id = "/models/Qwen3-0.6B-Base"
```
The variable model_id is assigned on line 10 and then immediately reassigned on line 11. The first assignment is redundant and can be removed to avoid confusion.
```diff
- model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
  model_id = "/models/Qwen3-0.6B-Base"
```
```python
print("==========================================\n\n")


# Save to disk compressed.
SAVE_DIR = model_id.rstrip("/").split("/")[-1] + "-W4A16-G128-AutoRound-Untrachat200k"
```
The SAVE_DIR is constructed with several hardcoded strings (-W4A16-G128-AutoRound-Untrachat200k). This makes the example brittle if parameters like the quantization scheme or dataset change. Consider constructing this string programmatically from the relevant variables (recipe, DATASET_ID, etc.) to improve maintainability and make the example more robust.
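One way the programmatic construction could look, as a sketch; the variable names `scheme`, `group_size`, and the `DATASET_ID` value below are illustrative assumptions, not taken from the PR:

```python
# Derive the save directory from configuration variables instead of a
# hardcoded suffix; all names below are illustrative assumptions.
model_id = "/models/Qwen3-0.6B-Base"
DATASET_ID = "HuggingFaceH4/ultrachat_200k"
scheme = "W4A16"
group_size = 128

model_name = model_id.rstrip("/").split("/")[-1]
dataset_name = DATASET_ID.split("/")[-1]
SAVE_DIR = f"{model_name}-{scheme}-G{group_size}-AutoRound-{dataset_name}"
print(SAVE_DIR)  # Qwen3-0.6B-Base-W4A16-G128-AutoRound-ultrachat_200k
```

If the quantization scheme or dataset is later changed in one place, the directory name stays in sync automatically.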
```python
return ar_quant_scheme


# FIXME: refine code
```
```python
if m.ndim == 2:
    all_ones_rows = torch.all(m == 1, dim=1)  # [batch]
    if torch.any(all_ones_rows):
        # breakpoint()
```
```python
batch['attention_mask'] is List[List[int]] when batched=True.
Convert to tensor, fix, then convert back to list to stay compatible with datasets.
"""
# breakpoint()
```
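For reference, a dependency-free equivalent of the diff's `torch.all(m == 1, dim=1)` detection, operating directly on the `List[List[int]]` form the docstring describes (the helper name is a hypothetical illustration, not from the PR):

```python
def find_all_ones_rows(mask_rows):
    """Return indices of attention-mask rows consisting entirely of ones."""
    return [
        i for i, row in enumerate(mask_rows)
        if row and all(v == 1 for v in row)
    ]

masks = [[1, 1, 1], [1, 1, 0], [1, 1, 1]]
print(find_all_ones_rows(masks))  # [0, 2]
```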