
[SUPPORT] Can't load quantised Gpt-oss-20b. (MxFP4)  #2146

@nandhakumarsuriya

Description


gpt-oss fine-tuning:
While loading the MXFP4-quantized gpt-oss-20b model on an H200 GPU with the config below:

from transformers import Mxfp4Config

# Keep the weights in MXFP4 instead of dequantizing them on load.
quantization_config = Mxfp4Config(dequantize=False)
model_kwargs = dict(
    attn_implementation="eager",
    torch_dtype="auto",
    quantization_config=quantization_config,
    use_cache=False,  # disable the KV cache for training
    device_map="auto",
)
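
For context, a minimal sketch of how these kwargs would be used; "openai/gpt-oss-20b" is an assumed checkpoint id, since the report does not name one:

from transformers import AutoModelForCausalLM

# Hypothetical load call using the kwargs above; the warning below is
# emitted during this step when the MXFP4 requirements are not met.
model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", **model_kwargs)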

Warning (the load falls back to bf16):
MXFP4 quantization requires triton >= 3.4.0 and kernels installed, we will default to dequantizing the model to bf16

Note:
I have also installed triton.

Why, then, can I still not load the model in its MXFP4-quantized form?
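
For anyone hitting the same warning, a minimal environment check, assuming only the two requirements the warning itself names (triton >= 3.4.0 and the kernels package):

import importlib.metadata

# The warning names two requirements: triton >= 3.4.0 and the "kernels" package.
for pkg, minimum in [("triton", "3.4.0"), ("kernels", None)]:
    try:
        version = importlib.metadata.version(pkg)
        suffix = f" (need >= {minimum})" if minimum else ""
        print(f"{pkg} {version} is installed{suffix}")
    except importlib.metadata.PackageNotFoundError:
        print(f"{pkg} is NOT installed")

If either check fails, pip install -U triton kernels should satisfy the requirement named in the warning; note that the kernels package is needed in addition to triton, which may explain why installing triton alone still triggers the bf16 fallback.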
