
W4A16 quantization option to handle/ignore layers with columns%groupsize != 0 #1983

@HDCharles

Description

When you apply W4A16 quantization to a model, all layers get quantized. If you then try to save the compressed model, you run into errors for any layer whose column count is not evenly divisible by the group size:

https://github.com/vllm-project/compressed-tensors/blob/main/src/compressed_tensors/quantization/lifecycle/forward.py#L278

This has tripped up users before, e.g. vllm-project/compressed-tensors#447.
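
As a stopgap, the ignore list can be built programmatically by scanning the model for Linear layers whose column count (in_features) isn't a multiple of the group size. A minimal sketch, assuming a plain PyTorch model and a group size of 128; the helper name `find_misaligned_layers` is made up for illustration:

```python
import torch.nn as nn

def find_misaligned_layers(model: nn.Module, group_size: int = 128) -> list[str]:
    """Names of Linear layers whose columns (in_features) aren't a multiple of group_size."""
    return [
        name
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear) and module.in_features % group_size != 0
    ]

# e.g. feed the result into the quantization recipe's ignore list
# (llm-compressor's GPTQModifier accepts an `ignore` argument):
# ignore = ["lm_head"] + find_misaligned_layers(model, group_size=128)
```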

It feels like there should be an option to ignore or pad layers that don't have the required dimensions, so you don't have to manually add every layer with that shape to the ignore list.
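
For the "pad" half of the request, a rough illustration would be to zero-pad the weight's column dimension up to the next multiple of the group size. This is a hypothetical sketch, not existing compressed-tensors behavior, and the padded columns would need to be tracked so they can be stripped again at load/inference time:

```python
import torch
import torch.nn.functional as F

def pad_weight_columns(weight: torch.Tensor, group_size: int) -> torch.Tensor:
    """Zero-pad the column (in_features) dimension of a [out, in] weight
    so its width becomes a multiple of `group_size` (hypothetical helper)."""
    _, in_features = weight.shape
    remainder = in_features % group_size
    if remainder == 0:
        return weight
    pad = group_size - remainder
    return F.pad(weight, (0, pad))  # pad the last (column) dim on the right

# e.g. a 4096 x 1000 weight with group_size=128 becomes 4096 x 1024
```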


Labels: compressed-tensors (Relates to compressed-tensors), enhancement (New feature or request), gptq (For any PR / issue related to GPTQ support), wNa16 (Anything related to weight-only int-quantized support)
