
[Experimental] Mistral-format FP8 quantization#1359

Merged
mgoin merged 7 commits into main from mistral-format-fp8
Jun 10, 2025

Conversation

@mgoin (Member) commented Apr 16, 2025

https://huggingface.co/nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8

```
vllm serve nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8 --tokenizer_mode mistral --config_format mistral --load_format mistral --quantization fp8

lm_eval --model local-completions --model_args model=nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8,tokenizer=mistralai/Mistral-Small-3.1-24B-Instruct-2503,base_url=http://0.0.0.0:8000/v1/completions,num_concurrent=500,tokenized_requests=False --tasks gsm8k --num_fewshot 5
local-completions (model=nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8,tokenizer=mistralai/Mistral-Small-3.1-24B-Instruct-2503,base_url=http://0.0.0.0:8000/v1/completions,num_concurrent=500,tokenized_requests=False), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8886|±  |0.0087|
|     |       |strict-match    |     5|exact_match|↑  |0.8848|±  |0.0088|
```
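For context, the core idea behind per-tensor FP8 quantization (as applied by `--quantization fp8`) can be sketched in a few lines: pick one scale per tensor so the largest-magnitude weight maps to the E4M3 maximum of 448.0, then divide and clamp. The sketch below is a minimal, pure-Python illustration under that assumption; it is not vLLM's or llm-compressor's implementation, it omits the actual rounding to the 8-bit value grid, and the function names are hypothetical.

```python
# Minimal sketch of per-tensor FP8 (E4M3) scaling. Illustrative only:
# real FP8 quantization also rounds each value to the nearest
# representable 8-bit float, which this sketch skips.
E4M3_MAX = 448.0  # largest finite value representable in float8 E4M3


def quantize_fp8(weights):
    """Scale weights into the E4M3 range and clamp; return (q, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / E4M3_MAX if max_abs > 0 else 1.0
    q = [max(-E4M3_MAX, min(E4M3_MAX, w / scale)) for w in weights]
    return q, scale


def dequantize_fp8(q, scale):
    """Recover approximate original values from the scaled tensor."""
    return [v * scale for v in q]


weights = [0.5, -1.25, 3.0]
q, scale = quantize_fp8(weights)
recovered = dequantize_fp8(q, scale)
```

With this scheme, the scale is stored alongside each quantized tensor and applied back at matmul time; the largest-magnitude weight always lands exactly at ±448.0.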

Signed-off-by: mgoin <michael@neuralmagic.com>
@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: this label is required to run the full testing suite; please add it only once the PR is code complete and local testing has been performed.

@dsikka dsikka marked this pull request as draft April 17, 2025 03:20
@fpaupier

This would actually be pretty useful as a reference example! Thanks @mgoin
👍
Could this be merged, @kylesayrs?

@mgoin mgoin marked this pull request as ready for review June 10, 2025 15:50
@kylesayrs kylesayrs added and then removed the ready (When a PR is ready for review) label, Jun 10, 2025
Signed-off-by: mgoin <michael@neuralmagic.com>
@kylesayrs (Collaborator) left a comment


Great enablement, looking forward to supporting this officially in the future

@rahul-tuli (Collaborator) left a comment


LGTM! This is a good idea, thank you!

@brian-dellabetta (Collaborator) left a comment


🚀

@mgoin mgoin merged commit d947364 into main Jun 10, 2025
11 checks passed
@mgoin mgoin deleted the mistral-format-fp8 branch June 10, 2025 18:27
@fpaupier

Hello @mgoin, thanks for the example on quantizing the latest Magistral model here: https://github.com/vllm-project/llm-compressor/tree/main/experimental/mistral

By any chance, do you already have a quantized and properly calibrated FP8 version on the HF Hub? It would be extremely useful. Thanks for your work on this library, it's very useful.

aireilly pushed a commit to aireilly/llm-compressor that referenced this pull request Jul 30, 2025