Fix target matching for fused layers with compressed-tensors (#12617)
Without this PR
---------------
Quantizing a model with llm-compressor using a recipe that explicitly
lists layer names produces a model that vLLM cannot load:
`vllm serve <model>` fails with `raise ValueError(f"Unable to find
matching target for {module} in the ...`.
Example recipe:
```
recipe = """
quantization_stage:
  run_type: oneshot
  quantization_modifiers:
    GPTQModifier:
      ignore: ["lm_head"]
      config_groups:
        group_0:
          weights:
            num_bits: 4
            type: "int"
            symmetric: true
            strategy: "group"
            group_size: 128
          targets: [
            "model.layers.0.mlp.down_proj",
            "model.layers.2.mlp.down_proj",
            "model.layers.3.mlp.down_proj",
            "model.layers.4.mlp.down_proj",
            "model.layers.5.mlp.down_proj",
            "model.layers.6.mlp.down_proj",
            "model.layers.7.mlp.down_proj",
            "model.layers.8.mlp.down_proj",
            "model.layers.9.mlp.down_proj",
            "model.layers.10.mlp.down_proj",
            "model.layers.11.mlp.down_proj",
            "model.layers.12.mlp.down_proj",
            "model.layers.13.mlp.down_proj",
            "model.layers.14.mlp.down_proj",
            "model.layers.15.mlp.down_proj",
            "model.layers.16.mlp.down_proj",
            "model.layers.17.mlp.down_proj",
            "model.layers.19.mlp.down_proj",
            "model.layers.21.mlp.down_proj",
            "model.layers.22.mlp.down_proj",
            .
            .
            .
          ]
"""
```
To reproduce the vLLM error:
```bash
vllm serve nm-testing/eldar-test
```
With this PR
------------
Models are loaded correctly without any errors.
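
For readers unfamiliar with the problem, the sketch below illustrates one
way target matching for fused layers could work. It is not the code from
this PR: `FUSED_MAPPING` and `match_fused_layer` are hypothetical names,
and the mapping assumes the usual vLLM fusion of `q_proj`/`k_proj`/`v_proj`
into `qkv_proj` and of `gate_proj`/`up_proj` into `gate_up_proj`. The idea
is that a fused module name never appears in a recipe that lists unfused
layer names, so the lookup has to expand it back into its parts first.

```python
from typing import Dict, Iterable, List, Optional

# Assumed mapping from a fused module name to the unfused projections it
# combines (hypothetical constant, for illustration only).
FUSED_MAPPING: Dict[str, List[str]] = {
    "qkv_proj": ["q_proj", "k_proj", "v_proj"],
    "gate_up_proj": ["gate_proj", "up_proj"],
}


def match_fused_layer(layer_name: str, targets: Iterable[str]) -> Optional[str]:
    """Return a target corresponding to a fused layer, or None.

    A fused layer matches only if every unfused part is listed in targets.
    Non-fused layers are not handled here and also return None.
    """
    target_set = set(targets)
    # Split "model.layers.0.self_attn.qkv_proj" into prefix and leaf name.
    prefix, _, leaf = layer_name.rpartition(".")
    parts = FUSED_MAPPING.get(leaf)
    if parts is None:
        return None
    unfused = [f"{prefix}.{part}" if prefix else part for part in parts]
    if all(name in target_set for name in unfused):
        # Any part can stand in for the fused layer; use the first one.
        return unfused[0]
    return None
```

In this sketch a partially listed fused layer (for example only
`gate_proj` without `up_proj`) is treated as unmatched, since the fused
module cannot be quantized with a scheme that covers only some of its parts.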