Labels: compressed-tensors, enhancement, gptq, wNa16
Description
When you apply W4A16 quantization to a model, all layers get quantized. If you then try to save the compressed model, you'll run into errors when saving layers whose column count is not evenly divisible by the group size.
This has tripped up users before:
e.g. vllm-project/compressed-tensors#447
It feels like there should be an option to automatically ignore or pad layers whose dimensions aren't divisible by the group size, so you don't have to manually add every layer with that shape to the ignore list.
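For illustration, here is a minimal sketch (not existing compressed-tensors / llmcompressor API) of the kind of helper this request implies: it walks the model and collects the names of `Linear` layers whose `in_features` is not divisible by the group size, so they can be passed to the recipe's ignore list instead of being discovered one save error at a time. The `group_size` value and the way the result is wired into a recipe are assumptions.

```python
# Sketch only: find Linear layers that a group-size quantization scheme
# cannot cleanly cover, so they can be ignored (or padded) up front.
import torch.nn as nn

def find_indivisible_layers(model: nn.Module, group_size: int = 128) -> list[str]:
    """Return names of Linear layers where group_size does not evenly divide in_features."""
    bad = []
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear) and module.in_features % group_size != 0:
            bad.append(name)
    return bad

# Example usage (hypothetical wiring; the exact recipe field depends on the library version):
# ignore = ["lm_head"] + find_indivisible_layers(model, group_size=128)
```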