[Bug]: peak_ram is high during first-time quantization #1619
Status: Open
Labels: bug (Something isn't working)
Description
Problem Description
With a successfully hashed (cached) dataset, peak_ram stays below 3 GB; when hashing fails, peak_ram is about 8 GB.
- Hashing currently fails on A100; this needs to be fixed.
- Reduce the first-time peak_ram.
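For context on why hashing fails: the warning in the log names `get_tokenizer_function.<locals>.default_tokenizer_function`, i.e. a closure returned from a factory. Closures cannot be pickled, so `datasets` fingerprinting falls back to a random hash and the cache is never reused. A minimal sketch of the failure and one common fix pattern (the factory name mirrors the warning; `tokenize_batch`, `fake_tokenizer`, and the `functools.partial` fix are illustrative assumptions, not the actual auto-round code):

```python
import pickle
from functools import partial

def fake_tokenizer(text):
    """Stand-in for a real tokenizer; module-level, so it pickles by reference."""
    return text.split()

def get_tokenizer_function(tokenizer):
    """Mirrors the pattern the warning points at: a closure over `tokenizer`."""
    def default_tokenizer_function(examples):
        return [tokenizer(e) for e in examples]
    return default_tokenizer_function

closure_fn = get_tokenizer_function(fake_tokenizer)
try:
    pickle.dumps(closure_fn)  # raises: "Can't pickle local object ..."
    closure_picklable = True
except (pickle.PicklingError, AttributeError):
    closure_picklable = False

# One common fix: a module-level function plus functools.partial,
# which pickles as long as its bound arguments do, so `datasets`
# can compute a stable fingerprint and reuse the cache.
def tokenize_batch(examples, tokenizer):
    return [tokenizer(e) for e in examples]

partial_fn = partial(tokenize_batch, tokenizer=fake_tokenizer)
blob = pickle.dumps(partial_fn)  # succeeds
```

If the map function pickles deterministically, the second run should hit the dataset cache and skip the expensive Map/Filter/Cast passes shown in the first-time log.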
Reproduction Steps
auto_round Qwen/Qwen3-0.6B
Environment Information
A100
Error Logs
################################# Hash failure #############################
2026-03-26 13:07:47 INFO base.py L1800: start to cache block inputs
Parameter 'function'=<function get_tokenizer_function.<locals>.default_tokenizer_function at 0x7ae2a39919e0> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only shown once. Subsequent hashing failures won't be shown.
################################# First time #############################
2026-03-26 13:20:10 INFO base.py L1800: start to cache block inputs
README.md: 100%|██████████| 373/373 [00:00<00:00, 1.85MB/s]
dataset_infos.json: 100%|██████████| 921/921 [00:00<00:00, 3.68MB/s]
data/train-00000-of-00001-4746b8785c874c(??): 100%|██████████| 33.3M/33.3M [00:03<00:00, 11.0MB/s]
Generating train split: 100%|██████████| 10000/10000 [00:00<00:00, 27910.62 examples/s]
Map: 100%|██████████| 10000/10000 [00:37<00:00, 264.02 examples/s]
Filter: 100%|██████████| 10000/10000 [00:05<00:00, 1687.67 examples/s]
Casting the dataset: 100%|██████████| 1216/1216 [00:04<00:00, 299.25 examples/s]
2026-03-26 13:21:16 INFO base.py L1817: caching done
Quantizing model.layers.0:   0%|          | 0/28 [00:00<?, ?it/s]
/home/xinhe/auto-round/.venv/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: Flash Attention defaults to a non-deterministic algorithm. To explicitly enable determinism call torch.use_deterministic_algorithms(True, warn_only=False). (Triggered internally at /pytorch/aten/src/ATen/native/transformers/cuda/attention_backward.cu:114.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
# first time
quantized 7/7 layers in the block, loss iter 0: 0.001116 -> iter 175: 0.000317, 'peak_ram': 8.85GB, 'peak_vram': 3.51GB
################################# Second time #############################
2026-03-26 13:22:16 INFO base.py L1800: start to cache block inputs
2026-03-26 13:22:24 INFO base.py L1817: caching done
Quantizing model.layers.0:   0%|          | 0/28 [00:00<?, ?it/s]
/home/xinhe/auto-round/.venv/lib/python3.12/site-packages/torch/autograd/graph.py:865: UserWarning: Flash Attention defaults to a non-deterministic algorithm. To explicitly enable determinism call torch.use_deterministic_algorithms(True, warn_only=False). (Triggered internally at /pytorch/aten/src/ATen/native/transformers/cuda/attention_backward.cu:114.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
quantized 7/7 layers in the block, loss iter 0: 0.001116 -> iter 175: 0.000315, 'peak_ram': 2.01GB, 'peak_vram': 3.51GB
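Not part of the report, but one way to sanity-check the logged peak_ram values independently of auto-round's own accounting is the stdlib `resource` module, which exposes the process's high-water resident set size. A minimal sketch (the `peak_rss_gb` helper is hypothetical; note `ru_maxrss` is in kilobytes on Linux but bytes on macOS):

```python
import resource
import sys

def peak_rss_gb():
    """High-water resident set size of the current process, in GB."""
    ru_maxrss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux, bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return ru_maxrss * scale / (1024 ** 3)

# Allocate and touch ~200 MB so the high-water mark moves visibly.
blob = bytearray(200 * 1024 * 1024)
print(f"peak RSS: {peak_rss_gb():.2f} GB")
```

Calling this before and after `quantize_block` (or whichever step differs between runs) would localize where the extra ~6 GB of first-time RAM is held.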
Additional Context
No response