
[Experimental] Mistral-format FP8 quantization#1359

Merged
mgoin merged 7 commits into main from mistral-format-fp8
Jun 10, 2025

Conversation

@mgoin (Member) commented Apr 16, 2025

https://huggingface.co/nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8

```
vllm serve nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8 --tokenizer_mode mistral --config_format mistral --load_format mistral --quantization fp8

lm_eval --model local-completions --model_args model=nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8,tokenizer=mistralai/Mistral-Small-3.1-24B-Instruct-2503,base_url=http://0.0.0.0:8000/v1/completions,num_concurrent=500,tokenized_requests=False --tasks gsm8k --num_fewshot 5
local-completions (model=nm-testing/Mistral-Small-3.1-24B-Instruct-2503-FP8,tokenizer=mistralai/Mistral-Small-3.1-24B-Instruct-2503,base_url=http://0.0.0.0:8000/v1/completions,num_concurrent=500,tokenized_requests=False), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 1
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8886|±  |0.0087|
|     |       |strict-match    |     5|exact_match|↑  |0.8848|±  |0.0088|
```
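For context, the core idea behind per-tensor FP8 quantization (as applied by `--quantization fp8`) can be sketched in a few lines: pick one scale per tensor so the largest-magnitude weight maps to the E4M3 maximum of 448.0, then divide and clamp. The sketch below is a minimal, pure-Python illustration under that assumption; it is not vLLM's or llm-compressor's implementation, it omits the actual rounding to the 8-bit value grid, and the function names are hypothetical.

```python
# Minimal sketch of per-tensor FP8 (E4M3) scaling. Illustrative only:
# real FP8 quantization also rounds each value to the nearest
# representable 8-bit float, which this sketch skips.
E4M3_MAX = 448.0  # largest finite value representable in float8 E4M3


def quantize_fp8(weights):
    """Scale weights into the E4M3 range and clamp; return (q, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / E4M3_MAX if max_abs > 0 else 1.0
    q = [max(-E4M3_MAX, min(E4M3_MAX, w / scale)) for w in weights]
    return q, scale


def dequantize_fp8(q, scale):
    """Recover approximate original values from the scaled tensor."""
    return [v * scale for v in q]


weights = [0.5, -1.25, 3.0]
q, scale = quantize_fp8(weights)
recovered = dequantize_fp8(q, scale)
```

With this scheme, the scale is stored alongside each quantized tensor and applied back at matmul time; the largest-magnitude weight always lands exactly at ±448.0.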

Signed-off-by: mgoin <michael@neuralmagic.com>
@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: this label is required to run the full testing suite; please add it only once the PR is code complete and local testing has been performed.

@dsikka dsikka marked this pull request as draft April 17, 2025 03:20
@fpaupier

This would actually be pretty useful as a reference example! Thanks @mgoin
👍
Could this be merged, @kylesayrs?

@mgoin mgoin marked this pull request as ready for review June 10, 2025 15:50
@kylesayrs kylesayrs added and then removed the ready (When a PR is ready for review) label, Jun 10, 2025
Signed-off-by: mgoin <michael@neuralmagic.com>
@kylesayrs (Collaborator) left a comment


Great enablement, looking forward to supporting this officially in the future

@rahul-tuli (Collaborator) left a comment


LGTM! This is a good idea, thank you!

@brian-dellabetta (Collaborator) left a comment


🚀

@mgoin mgoin merged commit d947364 into main Jun 10, 2025
11 checks passed
@mgoin mgoin deleted the mistral-format-fp8 branch June 10, 2025 18:27
@fpaupier

Hello @mgoin, thanks for the example on quantizing the latest Magistral model here: https://github.com/vllm-project/llm-compressor/tree/main/experimental/mistral

By any chance, do you already have a quantized and properly calibrated FP8 version on the HF Hub? It would be extremely useful. Thanks for your work on this library, it's very useful.

aireilly pushed a commit to aireilly/llm-compressor that referenced this pull request Jul 30, 2025