GPT-OSS fine-tuning:
While loading the gpt-oss-20b model quantized on an H200 GPU, with the config below:
```python
from transformers import Mxfp4Config

# Keep the weights in MXFP4 rather than dequantizing them
quantization_config = Mxfp4Config(dequantize=False)

model_kwargs = dict(
    attn_implementation="eager",
    torch_dtype="auto",
    quantization_config=quantization_config,
    use_cache=False,
    device_map="auto",
)
```
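For reference, a minimal loading call using these kwargs might look like the sketch below; the `openai/gpt-oss-20b` repo id is an assumption based on the model name mentioned above.

```python
from transformers import AutoModelForCausalLM

# Hypothetical loading call with the kwargs above; adjust the repo id if needed.
model = AutoModelForCausalLM.from_pretrained("openai/gpt-oss-20b", **model_kwargs)
```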
Error:
```
MXFP4 quantization requires triton >= 3.4.0 and kernels installed, we will default to dequantizing the model to bf16
```
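One way to see whether the fallback actually happened is to inspect the loaded weights; a rough sketch, assuming `model` was loaded as above (the exact attributes may vary by transformers release):

```python
# If the fallback kicked in, the weights will have been dequantized to bf16.
print(model.config.quantization_config)  # active quantization settings, if any
print(next(model.parameters()).dtype)    # torch.bfloat16 suggests the bf16 fallback
```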
Note:
I have also installed triton.
Why can't I still load the MXFP4-quantized version of the model?
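Since the warning names both a triton minimum and the `kernels` package, it may help to verify what the running environment actually sees; a minimal check, using only the package names taken from the warning text:

```python
import importlib.metadata

# The warning requires triton >= 3.4.0 plus the `kernels` package;
# print what the current environment actually provides.
for pkg in ("triton", "kernels"):
    try:
        print(f"{pkg}: {importlib.metadata.version(pkg)}")
    except importlib.metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```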