
Commit 0873f28

[GPTQ] Add inversion fallback (#1283)
## Purpose ##

* Given the increasing size of large language models (for example, DeepSeek-R1 contains 45,034 linear layers), the likelihood that at least one Hessian inversion will spontaneously fail is significant.
* These changes cause the GPTQ algorithm to fall back to RTN (round-to-nearest) for any layer whose Hessian inversion fails.

## Changes ##

* Implement the fallback by setting the Hessian to the identity matrix when inversion fails.

---------

Signed-off-by: Kyle Sayers <[email protected]>
1 parent 81271b5 commit 0873f28

File tree

1 file changed: +5 −2 lines changed

src/llmcompressor/modifiers/quantization/gptq/gptq_quantize.py

Lines changed: 5 additions & 2 deletions
@@ -10,6 +10,7 @@
     QuantizationStrategy,
     fake_quantize,
 )
+from loguru import logger
 
 from llmcompressor.modifiers.utils import SPARSITY_THRESHOLD
 from llmcompressor.observers.base import Observer
@@ -161,11 +162,13 @@ def quantize_weight(
         H = torch.linalg.cholesky(H, upper=True)
         Hinv = H
     except torch._C._LinAlgError:
-        raise torch._C._LinAlgError(
+        logger.warning(
             "Failed to invert hessian due to numerical instability. Consider "
             "increasing GPTQModifier.dampening_frac, increasing the number "
-            "of calibration samples, or shuffling the calibration dataset"
+            "of calibration samples, or shuffling the calibration dataset. "
+            "Falling back to round-to-nearest for this module."
         )
+        Hinv = H = torch.eye(num_columns, dtype=H.dtype, device=H.device)
 
     # See section 3.4 of https://arxiv.org/abs/2203.07259
     for i1 in range(0, num_columns, blocksize):
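For intuition, here is a minimal sketch of why substituting the identity matrix for the inverse Hessian reduces the GPTQ column loop to plain round-to-nearest. This is toy code with assumed names (`gptq_columns`, a simplified `fake_quantize`), not the repository's implementation: the error-compensation term is scaled by the off-diagonal entries of `Hinv`, which are all zero for the identity, so each column is simply rounded independently.

```python
import torch


def fake_quantize(w: torch.Tensor, scale: float = 0.1) -> torch.Tensor:
    # Toy symmetric quantizer, for illustration only.
    return torch.round(w / scale) * scale


def gptq_columns(W: torch.Tensor, Hinv: torch.Tensor) -> torch.Tensor:
    # Simplified per-column GPTQ loop: quantize a column, then propagate the
    # quantization error to the remaining columns via the inverse Hessian.
    W = W.clone()
    Q = torch.zeros_like(W)
    for i in range(W.shape[1]):
        q = fake_quantize(W[:, i])
        Q[:, i] = q
        err = (W[:, i] - q) / Hinv[i, i]
        # With Hinv = identity, Hinv[i, i + 1:] is all zeros, so no error is
        # propagated and every column is just rounded (i.e. plain RTN).
        W[:, i + 1:] -= err.unsqueeze(1) * Hinv[i, i + 1:].unsqueeze(0)
    return Q


W = torch.randn(4, 8)
# The identity "inverse Hessian" fallback matches round-to-nearest exactly.
assert torch.allclose(fake_quantize(W), gptq_columns(W, torch.eye(W.shape[1])))
```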
