
input_quantizer with MX format (e.g. mxint8) is disabled when using awq_lite #543

@bestzsq

Description


I want to test awq_lite with an MX-format input_quantizer, but the input_quantizer is disabled after calling the awq_lite function. My quant_cfg and quant summary:

quant_cfg = {
    "quant_cfg": {
        "*weight_quantizer": {
            "num_bits": (2, 1),
            "block_sizes": {-1: 16, "type": "dynamic", "scale_bits": (4, 3)},
            "axis": None,
            "enable": True,
        },
        "*input_quantizer": {
            "num_bits": 8,
            "block_sizes": {-1: 32, "type": "dynamic", "scale_bits": (8, 0)},
            "enable": True,
        },
        **_default_disabled_quantizer_cfg,
    },
    "algorithm": "awq_lite",
}

and the resulting quant summary (awq_lite):

model.layers.23.mlp.up_proj.input_quantizer                                      TensorQuantizer(disabled)
model.layers.23.mlp.up_proj.output_quantizer                                     TensorQuantizer(disabled)
model.layers.23.mlp.up_proj.weight_quantizer                                     TensorQuantizer((2, 1) bit fake block_sizes:{-1: 16, 'type': 'dynamic', 'scale_bits': (4, 3)}, amax=2.4219 calibrator=MaxCalibrator quant)
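As a possible interim workaround, one could walk the model after awq_lite calibration and re-enable the input quantizers it disabled. This is only a sketch under the assumption that modelopt's TensorQuantizer exposes enable()/disable(); the FakeQuantizer class below is a hypothetical stand-in so the pattern is runnable here, not modelopt's actual class.

```python
class FakeQuantizer:
    """Stand-in for modelopt's TensorQuantizer (illustration only)."""

    def __init__(self, enabled=True):
        self._enabled = enabled

    def disable(self):
        self._enabled = False

    def enable(self):
        self._enabled = True

    @property
    def is_enabled(self):
        return self._enabled


def reenable_input_quantizers(named_modules):
    """Re-enable input quantizers left disabled after calibration.

    `named_modules` is an iterable of (name, module) pairs, shaped like
    what torch.nn.Module.named_modules() returns.
    """
    for name, module in named_modules:
        if name.endswith("input_quantizer") and not module.is_enabled:
            module.enable()


# Minimal usage: one input quantizer left disabled (as observed with awq_lite).
modules = [
    ("model.layers.23.mlp.up_proj.input_quantizer", FakeQuantizer(enabled=False)),
    ("model.layers.23.mlp.up_proj.weight_quantizer", FakeQuantizer(enabled=True)),
]
reenable_input_quantizers(modules)
print(modules[0][1].is_enabled)  # → True
```

On a real model the same loop would run over model.named_modules() instead of the hand-built list; whether re-enabling after calibration produces a correctly calibrated input quantizer is exactly what this issue calls into question.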

When I try awq_clip with the same MX-format input_quantizer, the input_quantizer stays enabled as expected:

{
    "quant_cfg": {
        "*weight_quantizer": {
            "num_bits": (2, 1),
            "block_sizes": {-1: 16, "type": "dynamic", "scale_bits": (4, 3)},
            "axis": None,
            "enable": True,
        },
        "*input_quantizer": {
            "num_bits": 8,
            "block_sizes": {-1: 32, "type": "dynamic", "scale_bits": (8, 0)},
            "enable": True,
        },
        **_default_disabled_quantizer_cfg,
    },
    "algorithm": {"method": "awq_clip"},
}

and the resulting quant summary (awq_clip):

model.layers.23.mlp.down_proj.input_quantizer                                    TensorQuantizer(8 bit fake block_sizes={-1: 32, 'type': 'dynamic', 'scale_bits': (8, 0)}, amax=None calibrator=MaxCalibrator quant)
model.layers.23.mlp.down_proj.output_quantizer                                   TensorQuantizer(disabled)
model.layers.23.mlp.down_proj.weight_quantizer                                   TensorQuantizer((2, 1) bit fake block_sizes={-1: 16, 'type': 'dynamic', 'scale_bits': (4, 3)}, amax=0.5000 calibrator=MaxCalibrator quant)

Labels: bug