What is the meaning of weight_quantizer in W4A8_AWQ_BETA_CFG?

The config for W4A8_AWQ is:
```
W4A8_AWQ_BETA_CFG = {
    "quant_cfg": {
        "*weight_quantizer": [
            {"num_bits": 4, "block_sizes": {-1: 128, "type": "static"}, "enable": True},
            {"num_bits": (4, 3), "axis": None, "enable": True},
        ],
        "*input_quantizer": {"num_bits": (4, 3), "axis": None, "enable": True},
        **_default_disabled_quantizer_cfg,
    },
    "algorithm": "awq_lite",
}
```
Why does the weight quantizer have two configs? Does this mean that the bf16 weight is first per-tensor quantized to FP8, and then the FP8 weight is per-group quantized to INT4?
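To make the question concrete, here is a minimal NumPy sketch of one possible reading of the two-entry list. It assumes the entries are applied in list order, which is the reverse of the order guessed above: first INT4 with one scale per block of 128 along the last axis, then per-tensor FP8 (E4M3). The helper names (`fake_quant_int4_blockwise`, `fake_quant_fp8_e4m3`, `apply_in_list_order`) are hypothetical, and the scale conventions (amax mapped to 7 for INT4, amax mapped to 448 for E4M3) are assumptions. This is an illustration of the config's shape, not the library's actual implementation.

```python
import numpy as np

# The two "*weight_quantizer" entries, copied from the config above.
WEIGHT_QUANTIZER_CFG = [
    {"num_bits": 4, "block_sizes": {-1: 128, "type": "static"}, "enable": True},
    {"num_bits": (4, 3), "axis": None, "enable": True},
]


def fake_quant_int4_blockwise(w, block_size):
    """Symmetric INT4 quantize-dequantize, one scale per block of the last axis."""
    rows, cols = w.shape
    blocks = w.reshape(rows, cols // block_size, block_size)
    amax = np.abs(blocks).max(axis=-1, keepdims=True)
    scale = np.where(amax > 0, amax / 7.0, 1.0)  # assumption: amax maps to 7
    q = np.clip(np.round(blocks / scale), -8, 7)  # signed 4-bit integer grid
    return (q * scale).reshape(rows, cols)


def fake_quant_fp8_e4m3(x):
    """Per-tensor FP8 E4M3 quantize-dequantize (simulated in float64)."""
    amax = np.abs(x).max()
    scale = 448.0 / amax if amax > 0 else 1.0  # assumption: amax maps to 448
    y = x * scale
    # E4M3 has 3 mantissa bits: spacing is 2**(e - 3) inside [2**e, 2**(e + 1)).
    # Exponents below -6 fall in the subnormal range and share spacing 2**-9.
    mag = np.maximum(np.abs(y), 2.0 ** -9)
    exp = np.clip(np.floor(np.log2(mag)), -6, 8)
    step = 2.0 ** (exp - 3)
    q = np.clip(np.round(y / step) * step, -448.0, 448.0)
    return q / scale


def apply_in_list_order(w, cfgs):
    """Apply each enabled entry in list order (assumed, not confirmed)."""
    out = w
    for cfg in cfgs:
        if not cfg["enable"]:
            continue
        if cfg["num_bits"] == 4:
            out = fake_quant_int4_blockwise(out, cfg["block_sizes"][-1])
        elif cfg["num_bits"] == (4, 3):
            out = fake_quant_fp8_e4m3(out)
    return out


rng = np.random.default_rng(0)
w = rng.standard_normal((4, 256))  # stand-in for a bf16 weight matrix
w_q = apply_in_list_order(w, WEIGHT_QUANTIZER_CFG)
print("max abs error after both stages:", np.abs(w - w_q).max())
```

Reversing the list passed to `apply_in_list_order` gives the FP8-then-INT4 order described in the question, so the two readings can be compared on the same tensor.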

What is the meaning of weight_quantizer in W4A8_AWQ_BETA_CFG? #268
