
W4A16 quantization option to handle/ignore layers with columns%groupsize != 0 #1983

@HDCharles

Description

When you apply W4A16 quantization to a model, all layers get quantized. If you then try to save the compressed model, you run into errors for any layer whose column count is not evenly divisible by the group size:

https://github.com/vllm-project/compressed-tensors/blob/main/src/compressed_tensors/quantization/lifecycle/forward.py#L278

This has tripped up users before, e.g. vllm-project/compressed-tensors#447.
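
As a stopgap, the ignore list can be built programmatically by scanning the model for Linear layers whose column count (in_features) isn't a multiple of the group size. A minimal sketch, assuming a plain PyTorch model and a group size of 128; the helper name `find_misaligned_layers` is made up for illustration:

```python
import torch.nn as nn

def find_misaligned_layers(model: nn.Module, group_size: int = 128) -> list[str]:
    """Names of Linear layers whose columns (in_features) aren't a multiple of group_size."""
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear) and module.in_features % group_size != 0
    ]

# e.g. feed the result into the quantization recipe's ignore list
# (llm-compressor's GPTQModifier accepts an `ignore` argument):
# ignore = ["lm_head"] + find_misaligned_layers(model, group_size=128)
```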

It feels like there should be an option to ignore or pad layers that don't have the required dimensions, so you don't have to manually add every layer with that shape to the ignore list.
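
For the "pad" half of the request, a rough illustration would be to zero-pad the weight's column dimension up to the next multiple of the group size. This is a hypothetical sketch, not existing compressed-tensors behavior, and the padded columns would need to be tracked so they can be stripped again at load/inference time:

```python
import torch
import torch.nn.functional as F

def pad_weight_columns(weight: torch.Tensor, group_size: int) -> torch.Tensor:
    """Zero-pad the column (in_features) dimension of a [out, in] weight
    so its width becomes a multiple of `group_size` (hypothetical helper)."""
    _, in_features = weight.shape
    remainder = in_features % group_size
    if remainder == 0:
        return weight
    pad = group_size - remainder
    return F.pad(weight, (0, pad))  # pad the last (column) dim on the right

# e.g. a 4096 x 1000 weight with group_size=128 becomes 4096 x 1024
```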


Labels: compressed-tensors (Relates to compressed-tensors), enhancement (New feature or request), gptq (For any PR / issue related to GPTQ support), wNa16 (Anything related to weight-only int-quantized support)
