Commit c783a5e

Merge pull request #774 from ROCm/dev/perf_add_accuracy_baselines
Add accuracy baselines
2 parents 5769cfc + 3d821f5 commit c783a5e

evaluation/ACCURACY.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
# Baseline accuracies for models of interest

## LLMs (lm_eval on gsm8k)

### DeepSeek-R1 Block-Scale FP8

deepseek-ai/DeepSeek-R1
```shell
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|   |0.9492|±  |0.0060|
|     |       |strict-match    |     5|exact_match|   |0.9484|±  |0.0061|
```

### DeepSeek-R1 PTPC FP8

EmbeddedLLM/deepseek-r1-FP8-Dynamic
```shell
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|   |0.9477|±  |0.0061|
|     |       |strict-match    |     5|exact_match|   |0.9469|±  |0.0062|
```
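
The two DeepSeek-R1 flexible-extract scores above differ by less than either reported standard error. A minimal sketch of that arithmetic follows. It assumes the two runs can be treated as independent and that a normal approximation is adequate; both runs actually score the same gsm8k test questions, so this is only a rough check. `z_score` is a hypothetical helper, not part of lm_eval.

```python
import math


def z_score(value_a, stderr_a, value_b, stderr_b):
    """Difference between two scores in units of the combined standard error."""
    combined = math.sqrt(stderr_a ** 2 + stderr_b ** 2)
    return (value_a - value_b) / combined


# (value, stderr) for flexible-extract exact_match, copied from the tables above.
block_scale = (0.9492, 0.0060)
ptpc = (0.9477, 0.0061)

z = z_score(*block_scale, *ptpc)
print(f"difference = {block_scale[0] - ptpc[0]:.4f}, z = {z:.2f}")
```

A `z` well below 1 in magnitude suggests the gap between the Block-Scale and PTPC checkpoints is within run-to-run noise on this benchmark.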

### Qwen3-Coder PTPC Quark FP8

EmbeddedLLM/Qwen3-Coder-480B-A35B-Instruct-FP8-Dynamic
```shell
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8848|±  |0.0088|
|     |       |strict-match    |     5|exact_match|↑  |0.8590|±  |0.0096|
```

### Qwen3-Next

Qwen/Qwen3-Next-80B-A3B-Instruct
```shell
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8537|±  |0.0097|
|     |       |strict-match    |     5|exact_match|↑  |0.8135|±  |0.0107|
```
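
To check a new run against the gsm8k baselines above, the result tables can be parsed programmatically. The sketch below assumes the nine-column layout shown in this file; `parse_lm_eval_table` and `within_baseline` are hypothetical helpers, and the two-standard-error threshold is an assumption rather than a project policy.

```python
def parse_lm_eval_table(text):
    """Return {(task, filter): (value, stderr)} from a table in the layout above."""
    results = {}
    task = None
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        # Skip the header row, the |---| separator row, and malformed rows.
        if len(cells) != 9 or cells[0] == "Tasks" or cells[0].startswith("-"):
            continue
        task = cells[0] or task  # continuation rows leave the task cell blank
        results[(task, cells[2])] = (float(cells[6]), float(cells[8]))
    return results


def within_baseline(new_value, baseline, n_sigma=2.0):
    """True if new_value is no more than n_sigma standard errors below baseline."""
    value, stderr = baseline
    return new_value >= value - n_sigma * stderr


# Qwen3-Next rows in the format used in this file.
table = """
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.8537|±  |0.0097|
|     |       |strict-match    |     5|exact_match|↑  |0.8135|±  |0.0107|
"""

baselines = parse_lm_eval_table(table)
# 0.84 is a made-up score for illustration, not a measured result.
print(within_baseline(0.84, baselines[("gsm8k", "flexible-extract")]))
```

Only drops are flagged here; a score above the baseline always passes.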

## VLMs and Omni Models (mistral-eval on chartqa)

### Qwen2.5-VL-72B

Qwen/Qwen2.5-VL-72B-Instruct
```shell
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.8652,
    "anywhere_in_answer_relaxed_correctness": 0.8828
}
```

### Qwen2.5-VL-72B PTPC FP8

RedHatAI/Qwen2.5-VL-72B-Instruct-FP8-dynamic
```shell
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.8792,
    "anywhere_in_answer_relaxed_correctness": 0.8888
}
```

### Qwen3-VL-235B

Qwen/Qwen3-VL-235B-A22B-Instruct
```shell
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.8736,
    "anywhere_in_answer_relaxed_correctness": 0.8752
}
```

### Qwen3-VL-235B PTPC FP8

RedHatAI/Qwen3-VL-235B-A22B-Instruct-FP8-dynamic
```shell
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.8724,
    "anywhere_in_answer_relaxed_correctness": 0.874
}
```

### Qwen3-Omni

Qwen/Qwen3-Omni-30B-A3B-Instruct
```shell
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.8736,
    "anywhere_in_answer_relaxed_correctness": 0.8768
}
```
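
The chartqa results above carry no standard error, so a comparison against them needs an explicit tolerance. The sketch below parses a `Metrics:` block in the format shown in this file and reports metrics that fall below a baseline. `parse_metrics` and `regressions` are hypothetical helpers, the 0.01 absolute tolerance is an assumption, and the "new run" numbers are made up for illustration.

```python
import json


def parse_metrics(output):
    """Extract the JSON object that follows a 'Metrics:' line."""
    _, _, tail = output.partition("Metrics:")
    obj, _ = json.JSONDecoder().raw_decode(tail.lstrip())
    return obj


def regressions(new, baseline, tolerance=0.01):
    """Return {metric: (baseline, new)} for metrics that dropped by more than tolerance."""
    return {
        name: (expected, new.get(name))
        for name, expected in baseline.items()
        if new.get(name, 0.0) < expected - tolerance
    }


# Qwen3-Omni baseline, in the format used in this file.
baseline_output = """
Metrics:
{
    "explicit_prompt_relaxed_correctness": 0.8736,
    "anywhere_in_answer_relaxed_correctness": 0.8768
}
"""

# Hypothetical new run; these values are not measured results.
new_run = {
    "explicit_prompt_relaxed_correctness": 0.8700,
    "anywhere_in_answer_relaxed_correctness": 0.8600,
}

baseline = parse_metrics(baseline_output)
print(regressions(new_run, baseline))
```

A metric missing from the new run is treated as 0.0 and therefore reported as a regression, which seems the safer default for a baseline check.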
