[Self-Review] Enhance AutoRound to support multi-card tuning #12
The first new example script begins (excerpt):

```diff
@@ -0,0 +1,60 @@
+from auto_round.calib_dataset import get_dataset
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+from llmcompressor import oneshot
+from llmcompressor.modifiers.autoround import AutoRoundModifier
+from llmcompressor.utils import dispatch_for_generation
+
+# Select model and load it.
+model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
+model_id = "/storage/yiliu7/unsloth/DeepSeek-R1-BF16"
+# model_id = "/storage/yiliu7/deepseek-ai/DeepSeek-V2-Lite-Chat/"
```

Suggested change:

```diff
-model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
-model_id = "/storage/yiliu7/unsloth/DeepSeek-R1-BF16"
-# model_id = "/storage/yiliu7/deepseek-ai/DeepSeek-V2-Lite-Chat/"
+model_id = "deepseek-ai/DeepSeek-V2-Lite-Chat"
```
Using `trust_remote_code=True` can introduce a security vulnerability if the model repository contains malicious code. This is especially important in an example script that users may copy and run. Please add a comment explaining that users should only enable this flag if they trust the source of the model.
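One way an example could act on this advice is to gate the flag behind an explicit allowlist. The helper below is a hypothetical sketch, not part of the PR; the function name and org names are placeholders:

```python
# Hypothetical allowlist of Hugging Face orgs the user has personally vetted.
TRUSTED_ORGS = {"meta-llama", "Qwen", "deepseek-ai"}


def may_trust_remote_code(model_id: str) -> bool:
    """Return True only when the model comes from a vetted org.

    WARNING: trust_remote_code=True executes Python code shipped inside
    the model repository; enable it only for sources you trust.
    """
    org = model_id.split("/")[0]
    return org in TRUSTED_ORGS
```

The result could then be passed as `trust_remote_code=may_trust_remote_code(model_id)` when calling `from_pretrained`, so the risk is visible at the call site.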
The second new example script begins (excerpt):

```diff
@@ -0,0 +1,71 @@
+from auto_round.calib_dataset import get_dataset
+from transformers import AutoModelForCausalLM, AutoTokenizer
+
+from llmcompressor import oneshot
+from llmcompressor.modifiers.autoround import AutoRoundModifier
+from llmcompressor.utils import dispatch_for_generation
+
+# Select model and load it.
+model_id = "Qwen/Qwen3-30B-A3B"
+# model_id = "/storage/yiliu7/Qwen/Qwen3-30B-A3B"
+# model_id = "/storage/yiliu7/Qwen/Qwen2.5-0.5B/"
+model_id = "/storage/yiliu7/Qwen/Qwen3-235B-A22B/"
```

Suggested change:

```diff
-model_id = "Qwen/Qwen3-30B-A3B"
-# model_id = "/storage/yiliu7/Qwen/Qwen3-30B-A3B"
-# model_id = "/storage/yiliu7/Qwen/Qwen2.5-0.5B/"
-model_id = "/storage/yiliu7/Qwen/Qwen3-235B-A22B/"
+model_id = "Qwen/Qwen1.5-0.5B-Chat"
```
The `model_id` is hardcoded to a local path, which makes this example not runnable for other users. It's better to default to a model from the Hugging Face Hub and provide the local path as a commented-out alternative.
Suggested change:

```diff
 model_id = "Qwen/Qwen3-30B-A3B"
 # model_id = "/storage/yiliu7/Qwen/Qwen3-30B-A3B"
 # model_id = "/storage/yiliu7/Qwen/Qwen2.5-0.5B/"
-model_id = "/storage/yiliu7/Qwen/Qwen3-235B-A22B/"
+# model_id = "/storage/yiliu7/Qwen/Qwen3-235B-A22B/"
```
The `SAVE_DIR` is constructed using a hardcoded absolute path. This will cause the script to fail for any user who does not have the `/storage/yiliu7/` directory. The output directory should be a relative path to make the example portable.

Suggested change:

```python
SAVE_DIR = model_id.rstrip("/").split("/")[-1] + "-W4A16-G128-AutoRound"
```
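The portable variant can be sketched as a small helper; the function name and default suffix below are illustrative, derived from the suggested line above rather than taken from the PR:

```python
def make_save_dir(model_id: str, suffix: str = "W4A16-G128-AutoRound") -> str:
    # Keep only the final path component so the output directory is
    # relative to the working directory, whether model_id is a Hub id
    # (e.g. "Qwen/Qwen3-30B-A3B") or an absolute local path.
    return model_id.rstrip("/").split("/")[-1] + "-" + suffix
```

The `rstrip("/")` handles the trailing slash on local paths like `/storage/yiliu7/Qwen/Qwen3-235B-A22B/`, which would otherwise make the final component empty.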
Changes to the AutoRound modifier:

```diff
@@ -1,6 +1,8 @@
 from contextlib import contextmanager
 from typing import Dict, List, Optional, Tuple, Union

 import torch
+from accelerate.hooks import add_hook_to_module, remove_hook_from_submodules
 from auto_round import AutoRound
+from auto_round.schemes import QuantizationScheme as ARQuantizationScheme
 from compressed_tensors.quantization import (
@@ -54,6 +56,36 @@ def _wrap_decoding_layer(layer: torch.nn.Module) -> _PretrainModelWrapper:
     return wrapped_model


+import torch.nn as nn
+
+
+@contextmanager
+def suspend_accelerate_hooks(model: nn.Module):
+    """
+    Context manager to temporarily detach Accelerate hooks (e.g., offloading,
+    casting) and automatically restore them upon exit.
+    """
+    saved_hooks = {}
+
+    # 1. Capture existing hooks
+    for _, module in model.named_modules():
+        if hasattr(module, "_hf_hook"):
+            saved_hooks[module] = module._hf_hook
+
+    # 2. Detach hooks for the duration of the context
+    remove_hook_from_submodules(model)
+
+    try:
+        yield
+    finally:
+        # 3. Ensure a clean slate (remove any hooks added inside the block)
+        remove_hook_from_submodules(model)
+
+        # 4. Re-attach the original hooks
+        for module, hook in saved_hooks.items():
+            add_hook_to_module(module, hook, append=True)
+
+
 class AutoRoundModifier(Modifier, QuantizationMixin):
     """
     Implements the AutoRound algorithm from https://aclanthology.org/2024.findings-emnlp.662.pdf.
@@ -110,6 +142,7 @@ class AutoRoundModifier(Modifier, QuantizationMixin):
     iters: int = 200
     enable_torch_compile: bool = True
     batch_size: int = 8
+    device_map: str = "0"
```
Suggested change:

```diff
-    device_map: str = "0"
+    device_map: str = "auto"
```
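The save/detach/restore pattern used by `suspend_accelerate_hooks` can be illustrated without torch or accelerate. The dummy `Module` class and hook strings below are stand-ins for real modules and Accelerate hooks, not code from the PR:

```python
from contextlib import contextmanager


class Module:
    """Minimal stand-in for a torch module tree (illustrative only)."""

    def __init__(self, children=()):
        self.children = list(children)

    def named_modules(self):
        yield "", self
        for i, child in enumerate(self.children):
            for name, mod in child.named_modules():
                yield f"{i}.{name}", mod


@contextmanager
def suspend_hooks(model):
    # Mirror the structure of suspend_accelerate_hooks: capture existing
    # hooks, detach them for the duration of the block, restore on exit.
    saved = {}
    for _, mod in model.named_modules():
        if hasattr(mod, "_hf_hook"):
            saved[mod] = mod._hf_hook
            del mod._hf_hook
    try:
        yield
    finally:
        for mod, hook in saved.items():
            mod._hf_hook = hook
```

Inside the `with suspend_hooks(model):` block the modules carry no `_hf_hook` attribute, so auto_round's own device placement can run unimpeded; the original hooks reappear as soon as the block exits, even if tuning raises.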
The comment `# Leave offload for LLMC` is now misleading since `auto_offload` is set to `True`. With the addition of `suspend_accelerate_hooks`, it seems the intention is now to use auto_round's internal offloading. The comment should be updated to reflect this change in behavior.

Suggested change:

```diff
-# Leave offload for LLMC
+# Use auto_round's internal offloading
```
The `model_id` is hardcoded to a local path, which makes this example not portable or runnable for other users. It's better to default to a model identifier from the Hugging Face Hub and provide the local path as a commented-out alternative for users who wish to use a local model.