[Experimental] Mistral-format FP8 quantization #1359
Signed-off-by: mgoin <michael@neuralmagic.com>
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: This is required to complete the testing suite, please only add the label once the PR is code complete and local testing has been performed.
This would actually be pretty useful as a reference example! Thanks @mgoin
kylesayrs
left a comment
Great enablement, looking forward to supporting this officially in the future
rahul-tuli
left a comment
LGTM! This is a good idea, thank you!
Hello @mgoin, thanks for the example on quantizing the latest Magistral model here: https://github.com/vllm-project/llm-compressor/tree/main/experimental/mistral
By any chance, do you already have a quantized and properly calibrated FP8 version on the HF Hub? That would be extremely useful. Thanks for your work on this library, it is very useful.
https://huggingface.co/nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8

```
vllm serve nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8 --tokenizer_mode mistral --config_format mistral --load_format mistral --quantization fp8

lm_eval --model local-completions --model_args model=nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8,tokenizer=mistralai/Mistral-Small-3.1-24B-Instruct-2503,base_url=http://0.0.0.0:8000/v1/completions,num_concurrent=500,tokenized_requests=False --tasks gsm8k --num_fewshot 5

local-completions (model=nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8,tokenizer=mistralai/Mistral-Small-3.1-24B-Instruct-2503,base_url=http://0.0.0.0:8000/v1/completions,num_concurrent=500,tokenized_requests=False), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8886|±  |0.0087|
|     |       |strict-match    |     5|exact_match|↑  |0.8848|±  |0.0088|
```

---------

Signed-off-by: mgoin <michael@neuralmagic.com>
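For a quick sanity check outside of `lm_eval`, the server started by the `vllm serve` command above can also be queried directly. The following is a minimal sketch, assuming the server is reachable at the same `base_url` used in the evaluation command and exposes the usual OpenAI-style completions endpoint. The helper names `build_completion_request` and `complete` are hypothetical and not part of this PR.

```python
import json
import urllib.request

# Both values are taken from the commands above; adjust them for your setup.
BASE_URL = "http://0.0.0.0:8000/v1/completions"
MODEL = "nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8"


def build_completion_request(prompt, max_tokens=64, temperature=0.0):
    """Build (but do not send) a POST request for the completions endpoint.

    Hypothetical helper for illustration only.
    """
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the text of the first completion.

    Requires the vLLM server to be running; assumes an OpenAI-style
    response body with a "choices" list.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, a call such as `complete("Q: What is 12 * 7?\nA:", max_tokens=16)` should return the generated continuation as a string. Building the request separately from sending it keeps the payload easy to inspect without a live server.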