Closed as duplicate of #1650
Labels
bug (Something isn't working)
Description
Describe the bug
Hello. When I tried AWQ quantization of DeepSeek-V3, the process was killed during smoothing of the 48th layer due to a host-memory OOM.
(1x H100, 72 CPU cores, 1 TiB host memory)
Is there any way to reduce host memory usage during the AWQ quantization process?
I performed quantization based on the following PR:
#1619
@cjackal (I apologize if this mention surprised you.)
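To narrow down where host memory grows before the OOM kill, a minimal logging helper using only the standard library might look like this (a sketch for diagnosis; the `log_host_memory` name and call sites are my own, not part of any llmcompressor API):

```python
import resource

def log_host_memory(tag: str = "") -> float:
    """Print and return this process's peak RSS in GiB.

    On Linux, ru_maxrss is reported in KiB, so divide by 2**20 to get GiB.
    """
    peak_gib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 2**20
    print(f"[mem] {tag} peak_rss={peak_gib:.2f} GiB")
    return peak_gib

# Example: call before/after each layer's smoothing step to see which
# layer drives the growth toward the 1 TiB limit.
log_host_memory("before smoothing")
```

Note this records the peak RSS, which only increases; for instantaneous RSS, `psutil.Process().memory_info().rss` (psutil is already in the environment list above) would serve the same purpose.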
Environment
Include all relevant environment information:
- OS: Ubuntu 22.04
- Python version: 3.12
- LLM Compressor version or commit hash: 0.6.1.dev42+g92cdf630
- ML framework version(s): torch 2.7.1
- Other Python package versions:
accelerate 1.9.0
aiohappyeyeballs 2.6.1
aiohttp 3.12.14
aiosignal 1.4.0
annotated-types 0.7.0
attrs 25.3.0
certifi 2025.7.14
charset-normalizer 3.4.2
compressed-tensors 0.10.3a20250724
datasets 4.0.0
dill 0.3.8
filelock 3.18.0
frozendict 2.4.6
frozenlist 1.7.0
fsspec 2025.3.0
hf-xet 1.1.5
huggingface-hub 0.34.1
idna 3.10
Jinja2 3.1.6
llmcompressor 0.6.1.dev42+g92cdf630
loguru 0.7.3
MarkupSafe 3.0.2
mpmath 1.3.0
multidict 6.6.3
multiprocess 0.70.16
networkx 3.5
numpy 2.2.6
nvidia-cublas-cu12 12.6.4.1
nvidia-cuda-cupti-cu12 12.6.80
nvidia-cuda-nvrtc-cu12 12.6.77
nvidia-cuda-runtime-cu12 12.6.77
nvidia-cudnn-cu12 9.5.1.17
nvidia-cufft-cu12 11.3.0.4
nvidia-cufile-cu12 1.11.1.6
nvidia-curand-cu12 10.3.7.77
nvidia-cusolver-cu12 11.7.1.2
nvidia-cusparse-cu12 12.5.4.2
nvidia-cusparselt-cu12 0.6.3
nvidia-ml-py 12.575.51
nvidia-nccl-cu12 2.26.2
nvidia-nvjitlink-cu12 12.6.85
nvidia-nvtx-cu12 12.6.77
nvitop 1.5.2
packaging 25.0
pandas 2.3.1
pillow 11.3.0
pip 24.3.1
propcache 0.3.2
psutil 7.0.0
pyarrow 21.0.0
pydantic 2.11.7
pydantic_core 2.33.2
pynvml 12.0.0
python-dateutil 2.9.0.post0
pytz 2025.2
PyYAML 6.0.2
regex 2024.11.6
requests 2.32.4
safetensors 0.5.3
setuptools 80.9.0
six 1.17.0
sympy 1.14.0
tokenizers 0.21.2
torch 2.7.1
tqdm 4.67.1
transformers 4.54.0
triton 3.3.1
typing_extensions 4.14.1
typing-inspection 0.4.1
tzdata 2025.2
urllib3 2.5.0
xxhash 3.5.0
yarl 1.20.1
- Other relevant environment information (hardware, CUDA version):
NVIDIA-SMI 535.183.06
Driver Version: 535.183.06
CUDA Version: 12.4
To Reproduce
Exact steps to reproduce the behavior: run AWQ quantization on DeepSeek-V3.
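For context, AWQ quantization with llmcompressor is typically driven by a one-shot recipe. A minimal sketch follows; the stage name and the `scheme`/`targets`/`ignore` values are illustrative assumptions, since this report does not include the actual recipe used:

```yaml
# Hypothetical AWQ recipe sketch for llmcompressor's oneshot flow.
# Field values are illustrative, not taken from this report.
quant_stage:
  quant_modifiers:
    AWQModifier:
      ignore: ["lm_head"]
      scheme: W4A16
      targets: ["Linear"]
```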
Errors
The process was OOM-killed by the host without any additional error messages.
Additional context
