
Host memory OOM Killed when quantizing DeepSeek-V3 AWQ #1684

@ashgold

Description


Describe the bug
Hello. When I ran AWQ quantization of DeepSeek-V3, the process was killed by a host-memory OOM during smoothing of the 48th layer.
(1x H100, 72 CPU cores, 1 TiB host memory)
Is there any way to reduce host-memory usage during the AWQ quantization process?

I performed quantization based on the following PR.
#1619
@cjackal (I apologize if this mention surprised you.)

Expected behavior
AWQ quantization completes all layers without the process being killed by the host OOM killer.

Environment
Include all relevant environment information:

  1. OS [e.g. Ubuntu 20.04]: Ubuntu 22.04
  2. Python version [e.g. 3.7]: 3.12
  3. LLM Compressor version or commit hash [e.g. 0.1.0, f7245c8]: 0.6.1.dev42+g92cdf630
  4. ML framework version(s) [e.g. torch 2.3.1]: torch 2.7.1
  5. Other Python package versions [e.g. vLLM, compressed-tensors, numpy, ONNX]:
    accelerate 1.9.0
    aiohappyeyeballs 2.6.1
    aiohttp 3.12.14
    aiosignal 1.4.0
    annotated-types 0.7.0
    attrs 25.3.0
    certifi 2025.7.14
    charset-normalizer 3.4.2
    compressed-tensors 0.10.3a20250724
    datasets 4.0.0
    dill 0.3.8
    filelock 3.18.0
    frozendict 2.4.6
    frozenlist 1.7.0
    fsspec 2025.3.0
    hf-xet 1.1.5
    huggingface-hub 0.34.1
    idna 3.10
    Jinja2 3.1.6
    llmcompressor 0.6.1.dev42+g92cdf630
    loguru 0.7.3
    MarkupSafe 3.0.2
    mpmath 1.3.0
    multidict 6.6.3
    multiprocess 0.70.16
    networkx 3.5
    numpy 2.2.6
    nvidia-cublas-cu12 12.6.4.1
    nvidia-cuda-cupti-cu12 12.6.80
    nvidia-cuda-nvrtc-cu12 12.6.77
    nvidia-cuda-runtime-cu12 12.6.77
    nvidia-cudnn-cu12 9.5.1.17
    nvidia-cufft-cu12 11.3.0.4
    nvidia-cufile-cu12 1.11.1.6
    nvidia-curand-cu12 10.3.7.77
    nvidia-cusolver-cu12 11.7.1.2
    nvidia-cusparse-cu12 12.5.4.2
    nvidia-cusparselt-cu12 0.6.3
    nvidia-ml-py 12.575.51
    nvidia-nccl-cu12 2.26.2
    nvidia-nvjitlink-cu12 12.6.85
    nvidia-nvtx-cu12 12.6.77
    nvitop 1.5.2
    packaging 25.0
    pandas 2.3.1
    pillow 11.3.0
    pip 24.3.1
    propcache 0.3.2
    psutil 7.0.0
    pyarrow 21.0.0
    pydantic 2.11.7
    pydantic_core 2.33.2
    pynvml 12.0.0
    python-dateutil 2.9.0.post0
    pytz 2025.2
    PyYAML 6.0.2
    regex 2024.11.6
    requests 2.32.4
    safetensors 0.5.3
    setuptools 80.9.0
    six 1.17.0
    sympy 1.14.0
    tokenizers 0.21.2
    torch 2.7.1
    tqdm 4.67.1
    transformers 4.54.0
    triton 3.3.1
    typing_extensions 4.14.1
    typing-inspection 0.4.1
    tzdata 2025.2
    urllib3 2.5.0
    xxhash 3.5.0
    yarl 1.20.1
  6. Other relevant environment information [e.g. hardware, CUDA version]:
    NVIDIA-SMI 535.183.06
    Driver Version: 535.183.06
    CUDA Version: 12.4

To Reproduce
Exact steps to reproduce the behavior: run AWQ quantization of DeepSeek-V3 (following PR #1619).

Errors
The process is OOM-killed by the host, with no additional error messages.
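Since the kill leaves no traceback, one way to narrow down where host memory is exhausted is to log the process's resident set size (RSS) from a background thread and correlate the last printed value with the per-layer progress logs. This is a hedged sketch, not part of llm-compressor; it only assumes psutil, which is already in the environment list above, and the logging interval is an arbitrary choice.

```python
# Sketch: periodically log this process's host RSS so the last line
# printed before an OOM kill shows roughly how far quantization got
# and how much memory it was using. psutil is listed in the environment
# above; the 5-second interval is illustrative.
import os
import threading

import psutil


def start_rss_logger(interval_s: float = 5.0) -> threading.Event:
    """Start a daemon thread that prints this process's RSS in GiB.

    Returns an Event; call .set() on it to stop logging.
    """
    proc = psutil.Process(os.getpid())
    stop = threading.Event()

    def _log() -> None:
        while not stop.is_set():
            rss_gib = proc.memory_info().rss / 2**30
            print(f"[rss-logger] host RSS: {rss_gib:.2f} GiB", flush=True)
            stop.wait(interval_s)

    threading.Thread(target=_log, daemon=True).start()
    return stop
```

Usage: call `start_rss_logger()` once before invoking the quantization run, and read the interleaved RSS lines alongside llm-compressor's layer-by-layer output to see whether memory grows steadily per layer or spikes at a particular stage.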


Labels: bug (Something isn't working)
