Description
Normally, AWQ works by picking a layer that is going to be quantized, trying a range of scale factors to find the one that minimizes quantization error when that layer is quantized, and then applying the inverse rescale to the preceding layer, which is normally not quantized. However, a problem arises for the up_proj -> down_proj mapping,
because both the smooth and balance layers are targeted for quantization. Since our current AWQ implementation only accounts for the quantization of the balance layers, our choice of scale factor for the balance layer may make the smooth layer harder to quantize: the smooth layer is effectively ignored during the quantization error calculation for the balance layer.
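To make the issue concrete, here is a minimal, self-contained sketch of an AWQ-style grid search over scale factors. All names here (`awq_scale_search`, `pseudo_quantize`, the grid construction) are hypothetical stand-ins, not the actual implementation; the point is only to show where an `include_smooth` term would enter the error calculation:

```python
import torch


def pseudo_quantize(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Round-trip a weight through symmetric per-row quantization."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / qmax
    return (w / scale).round().clamp(-qmax - 1, qmax) * scale


def awq_scale_search(smooth_w, balance_w, x, n_grid=20, include_smooth=False):
    """Grid-search per-channel scales s. The balance weight's input columns
    are multiplied by s and the smooth weight's output rows are divided by
    s, so the composed function is unchanged. Error is measured on the
    balance layer's quantized output; with include_smooth=True, the smooth
    layer's own quantization error is added as well (the proposed change).
    """
    x_mean = x.abs().mean(dim=0)          # per-channel activation magnitude
    ref_out = x @ balance_w.t()           # unquantized reference output
    best_err, best_scales = float("inf"), None
    for i in range(n_grid):
        ratio = i / n_grid
        s = x_mean.pow(ratio).clamp(min=1e-4)
        s = s / (s.max() * s.min()).sqrt()
        # quantize the scaled balance weight, then undo the scale
        q_bal = pseudo_quantize(balance_w * s) / s
        err = (x @ q_bal.t() - ref_out).pow(2).mean()
        if include_smooth:
            # smooth layer rows carry the inverse scale; it is quantized too
            q_sm = pseudo_quantize(smooth_w / s.unsqueeze(1)) * s.unsqueeze(1)
            err = err + (q_sm - smooth_w).pow(2).mean()
        if err < best_err:
            best_err, best_scales = err.item(), s
        best_scales = best_scales if best_scales is not None else s
    return best_scales, best_err
```

Because the combined objective adds a non-negative smooth-layer term, a scale that is optimal for the balance layer alone can be suboptimal once the smooth layer's quantization is counted, which is exactly the effect worth measuring.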
We should
- test whether this has a significant impact
- add an option to enable this feature if it is beneficial
STEPS
A) add a check here for whether smooth_name is in targeted_names and, if so, change the get_lowest_common...etc search to include the smooth layer (this is how we determine which module is run to compute the quantization error, so the smooth layer needs to be run if we are taking its quantization into account)
B) add a flag to compute_best_scale indicating whether the smooth layer is targeted
C) if necessary add the smooth layer to this dict
D) move the rescale-weight code into a function that is called for each balance layer
E) if necessary, call the rescale-weight code with the inverse scales (the 1/_scales view) on the smooth_layer
F) check whether this has an impact on lm_eval performance for a small set of models
G) check how this affects the runtime of AWQ for those models
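Steps D and E above amount to factoring the scale application into one helper used in both directions. A rough sketch (the name `rescale_weight` and its signature are hypothetical, not the existing code):

```python
import torch


def rescale_weight(weight: torch.Tensor, scales: torch.Tensor,
                   *, inverse: bool = False) -> None:
    """Apply AWQ scales to a linear weight in place.

    Balance layers multiply their input columns by the scales; with
    inverse=True the smooth layer divides its output rows by the scales,
    so the composition smooth -> balance is numerically unchanged.
    """
    if inverse:
        weight.div_(scales.view(-1, 1))   # smooth layer: rows / scales
    else:
        weight.mul_(scales.view(1, -1))   # balance layer: cols * scales
```

Called once per balance layer and, when the smooth layer is targeted, once with `inverse=True` on the smooth layer, this keeps the fused mapping invariant while letting both sides be quantized.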
If it is beneficial, put up a PR with those changes, demonstrating what was tested and how it affects results.