@kliuae-amd kliuae-amd commented Jan 31, 2026

Purpose

Sync changes from upstream.

Upstream vLLM cf1167e (v0.15.0 release)
aiter: 6af8b6874

Test Plan

Evaluate the models of interest on the following benchmarks.
LLM: gsm8k
VLM: ChartQA
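
The exact test command is not given in this PR. As a rough sketch, assuming the gsm8k numbers were produced with lm-evaluation-harness and its vLLM backend (the table layout below matches that tool's output), a 5-shot run could be assembled as follows. The helper name `build_lm_eval_command` is hypothetical, and options for expert parallelism or shared expert fusion are deliberately left out because the PR does not say how they were enabled.

```python
import shlex


def build_lm_eval_command(model, tensor_parallel_size, num_fewshot=5):
    """Build an lm_eval argv list for a gsm8k run (hypothetical helper).

    Assumption: lm-evaluation-harness with the vLLM backend. The PR does
    not state the command that was actually used.
    """
    model_args = f"pretrained={model},tensor_parallel_size={tensor_parallel_size}"
    return [
        "lm_eval",
        "--model", "vllm",
        "--model_args", model_args,
        "--tasks", "gsm8k",
        "--num_fewshot", str(num_fewshot),
        "--batch_size", "auto",
    ]


# Example: the first configuration listed under Test Result (TP8).
cmd = build_lm_eval_command("deepseek-ai/DeepSeek-R1", 8)
print(shlex.join(cmd))
```

The command is only printed here, not executed, so the sketch can be inspected without a GPU.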

Test Result

LLM

deepseek-ai/DeepSeek-R1, TP8

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9553|±  |0.0057|
|     |       |strict-match    |     5|exact_match|↑  |0.9553|±  |0.0057|

deepseek-ai/DeepSeek-R1, TP8 + shared expert fusion

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9553|±  |0.0057|
|     |       |strict-match    |     5|exact_match|↑  |0.9545|±  |0.0057|

deepseek-ai/DeepSeek-R1, TP8 + EP

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9553|±  |0.0057|
|     |       |strict-match    |     5|exact_match|↑  |0.9545|±  |0.0057|

deepseek-ai/DeepSeek-R1, TP8 + EP + shared expert fusion

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9507|±  | 0.006|
|     |       |strict-match    |     5|exact_match|↑  |0.9492|±  | 0.006|

EmbeddedLLM/deepseek-r1-FP8-Dynamic PTPC FP8, TP8

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9378|±  |0.0067|
|     |       |strict-match    |     5|exact_match|↑  |0.9378|±  |0.0067|

EmbeddedLLM/deepseek-r1-FP8-Dynamic PTPC FP8, TP8 + EP

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9553|±  |0.0057|
|     |       |strict-match    |     5|exact_match|↑  |0.9560|±  |0.0056|

EmbeddedLLM/Qwen3-Coder-480B-A35B-Instruct-FP8-Dynamic, TP4

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8779|±  |0.0090|
|     |       |strict-match    |     5|exact_match|↑  |0.8491|±  |0.0099|

EmbeddedLLM/Qwen3-Coder-480B-A35B-Instruct-FP8-Dynamic, TP4 + EP

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8840|±  |0.0088|
|     |       |strict-match    |     5|exact_match|↑  |0.8529|±  |0.0098|

Qwen/Qwen3-Next-80B-A3B, TP4

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8552|±  |0.0097|
|     |       |strict-match    |     5|exact_match|↑  |0.8089|±  |0.0108|

Qwen/Qwen3-Omni-30B-A3B-Instruct, TP2

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8597|±  |0.0096|
|     |       |strict-match    |     5|exact_match|↑  |0.8476|±  |0.0099|
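
To help read the Stderr column, the following sketch shows one way to sanity-check the numbers above. It assumes the full gsm8k test split of 1319 questions was used and that the stderr is the sample standard error of a mean of 0/1 scores; both are assumptions, not statements from this PR. It also computes a rough z-score between two configurations, treating the runs as independent, which is a simplification since they share the same questions.

```python
import math

GSM8K_TEST_SIZE = 1319  # assumption: full gsm8k test split


def binomial_stderr(acc, n):
    """Sample standard error of the mean of n 0/1 scores with mean acc."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def z_score(acc_a, se_a, acc_b, se_b):
    """Accuracy difference in units of the combined standard error."""
    return (acc_a - acc_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Values copied from the DeepSeek-R1 tables (flexible-extract rows).
baseline = (0.9553, 0.0057)   # TP8
ep_fused = (0.9507, 0.006)    # TP8 + EP + shared expert fusion

print(round(binomial_stderr(baseline[0], GSM8K_TEST_SIZE), 4))  # 0.0057
print(round(z_score(*baseline, *ep_fused), 2))                  # 0.56
```

Under these assumptions, the reported stderr of 0.0057 is reproduced, and the gap between the TP8 baseline and the TP8 + EP + shared expert fusion run is well under one combined standard error.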

VLM

Qwen/Qwen3-VL-235B-A22B-Instruct TP4

{
    "explicit_prompt_relaxed_correctness": 0.8652,
    "anywhere_in_answer_relaxed_correctness": 0.8672
}

RedHatAI/Qwen3-VL-235B-A22B-Instruct-FP8-dynamic TP4

{
    "explicit_prompt_relaxed_correctness": 0.868,
    "anywhere_in_answer_relaxed_correctness": 0.8688
}

Qwen/Qwen2.5-VL-72B-Instruct

{
    "explicit_prompt_relaxed_correctness": 0.8648,
    "anywhere_in_answer_relaxed_correctness": 0.8864
}

RedHatAI/Qwen2.5-VL-72B-Instruct-FP8-dynamic TP2

{
    "explicit_prompt_relaxed_correctness": 0.8744,
    "anywhere_in_answer_relaxed_correctness": 0.8888
}

Qwen/Qwen3-Omni-30B-A3B-Instruct TP2

{
    "explicit_prompt_relaxed_correctness": 0.8712,
    "anywhere_in_answer_relaxed_correctness": 0.872
}
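
For context on the metric names, ChartQA scores are conventionally reported as relaxed correctness, where a numeric answer counts as correct if it is within a relative tolerance of the target (commonly 5%) and other answers require an exact match. The sketch below illustrates that comparison only. The function name, the 5% default, and the percent handling are assumptions for illustration; the answer extraction that distinguishes the "explicit_prompt" and "anywhere_in_answer" variants is not shown, and the evaluation script used for this PR may differ.

```python
def relaxed_correctness(prediction, target, max_relative_change=0.05):
    """Return True if prediction matches target under a relaxed rule.

    Assumption: numeric answers may deviate by up to max_relative_change
    relative to the target; non-numeric answers need a case-insensitive
    exact match.
    """

    def to_float(text):
        text = text.strip()
        try:
            if text.endswith("%"):
                return float(text[:-1]) / 100.0
            return float(text)
        except ValueError:
            return None

    pred_value = to_float(prediction)
    target_value = to_float(target)
    # A zero target cannot be used as a denominator, so it falls through
    # to the string comparison below.
    if pred_value is not None and target_value:
        return abs(pred_value - target_value) / abs(target_value) <= max_relative_change
    return prediction.strip().lower() == target.strip().lower()


print(relaxed_correctness("104", "100"))  # True, 4% off
print(relaxed_correctness("106", "100"))  # False, 6% off
print(relaxed_correctness("Yes", "yes"))  # True, exact match ignoring case
```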


noooop and others added 30 commits January 16, 2026 06:17
LucasWilkinson and others added 27 commits January 24, 2026 16:03
…ct#28973)

… is None in ViT attention backend) (vllm-project#33033)

(cherry picked from commit 43a013c)