Commit bea2c7b

Updated benchmarks format
1 parent 2c70ab0 commit bea2c7b

1 file changed: +89 -9 lines changed

benchmarks.md

Lines changed: 89 additions & 9 deletions
@@ -22,15 +22,95 @@ measuring:
 
 ### Results
 
-| Platform | GPU | Model | Flow | Source material | In tok/s | Out tok/s | Total tok/s |
-| -------- | --- | ----- | ---- | --------------- | ---------- | ----------- | ---------- |
-| vLLM on Intel Gaudi 2 | Gaudi 2, 8 cards | meta-llama/Llama-3.3-70B-Instruct | document-rag+graph-rag | NASA Challenger Report Volume 1 | 1493.6 | 1545.8 | 3039.5 |
-| vllm-server | H100-SXM5-80GB (Tensordock) | TheBloke/Mistral-7B-v0.1-AWQ | document-rag+graph-rag | NASA Challenger Report Volume 1 | 304.3 | 1845.6 | 2150.0 |
-| VertexAI | n/a | Gemini 2.0 Flash | document-rag+graph-rag | NASA Challenger Report Volume 1 | 216.2 | 155.8 | 372.0 |
-| LMStudio | Radeon RX 7900 XTX | Gemma3 4B QAT | document-rag+graph-rag | NASA Challenger Report Volume 1 | 116.2 | 133.9 | 250.1 |
-| Granite Ridge | 128 Xeon Gen 6 CPU | mistralai/Mistral-7B-Instruct-v0.3 | document-rag+graph-rag | NASA Challenger Report Volume 1 | 117.7 | 90.0 | 207.8 |
-| LMStudio | Radeon RX 7900 XTX | Gemma2 9B | document-rag+graph-rag | NASA Challenger Report Volume 1 | 119.6 | 73.0 | 192.6 |
-| Granite Ridge | 128 Xeon Gen 6 CPU | meta-llama/Llama-3.3-70B-Instruct | document-rag+graph-rag | NASA Challenger Report Volume 1 | 67.0 | 22.4 | 89.3 |
+<table>
+<thead>
+<tr>
+<th>Platform</th>
+<th>GPU</th>
+<th>Model</th>
+<th>Config</th>
+<th>Token Rate</th>
+<th>Time to Process</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>vLLM on Intel Gaudi 2 🏆</td>
+<td>Gaudi 2, 8 cards</td>
+<td>meta-llama/Llama-3.3-70B-Instruct</td>
+<td>TC1</td>
+<td>In: 1493.6<br/>Out: 1545.8<br/>Total: 3039.5</td>
+<td>8.5 min</td>
+</tr>
+<tr>
+<td>vllm-server on NVidia</td>
+<td>H100-SXM5-80GB (Tensordock)</td>
+<td>TheBloke/Mistral-7B-v0.1-AWQ</td>
+<td>TC1</td>
+<td>In: 304.3<br/>Out: 1845.6<br/>Total: 2150.0</td>
+<td>12.0 min</td>
+</tr>
+<tr>
+<td>VertexAI</td>
+<td>n/a</td>
+<td>Gemini 2.0 Flash</td>
+<td>TC1</td>
+<td>In: 216.2<br/>Out: 155.8<br/>Total: 372.0</td>
+<td>69.4 min</td>
+</tr>
+<tr>
+<td>LMStudio</td>
+<td>Radeon RX 7900 XTX</td>
+<td>Gemma3 4B QAT</td>
+<td>TC1</td>
+<td>In: 116.2<br/>Out: 133.9<br/>Total: 250.1</td>
+<td>103.3 min</td>
+</tr>
+<tr>
+<td>Granite Ridge</td>
+<td>128 Xeon Gen 6 CPU</td>
+<td>mistralai/Mistral-7B-Instruct-v0.3</td>
+<td>TC1</td>
+<td>In: 117.7<br/>Out: 90.0<br/>Total: 207.8</td>
+<td>124.3 min</td>
+</tr>
+<tr>
+<td>LMStudio</td>
+<td>Radeon RX 7900 XTX</td>
+<td>Gemma2 9B</td>
+<td>TC1</td>
+<td>In: 119.6<br/>Out: 73.0<br/>Total: 192.6</td>
+<td>134.1 min</td>
+</tr>
+<tr>
+<td>Granite Ridge</td>
+<td>128 Xeon Gen 6 CPU</td>
+<td>meta-llama/Llama-3.3-70B-Instruct</td>
+<td>TC1</td>
+<td>In: 67.0<br/>Out: 22.4<br/>Total: 89.3</td>
+<td>289.3 min</td>
+</tr>
+</tbody>
+</table>
+
+### Test Configurations
+
+<table>
+<thead>
+<tr>
+<th>Config ID</th>
+<th>Flow</th>
+<th>Source Material</th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td>TC1</td>
+<td>document-rag+graph-rag</td>
+<td>NASA Challenger Report Volume 1 (1,549,890 tokens)</td>
+</tr>
+</tbody>
+</table>
 
 ## Procedure
 
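The new Time to Process values are consistent with dividing the TC1 token count (1,549,890) by each row's total token rate. The commit does not state the formula, so the Python sketch below is an assumption that reproduces the seven table values to one decimal place. The names `minutes_to_process` and `total_rates` are illustrative and not part of the repository.

```python
# Token count for TC1, taken from the Test Configurations table.
TC1_TOKENS = 1_549_890


def minutes_to_process(total_tokens: int, total_tok_per_s: float) -> float:
    """Minutes needed to process total_tokens at a steady total token rate.

    Assumed formula (not stated in the commit): tokens / (tok/s) / 60.
    """
    return total_tokens / total_tok_per_s / 60


# "Total" tok/s for each row, copied from the Results table.
total_rates = {
    "vLLM on Intel Gaudi 2 / Llama-3.3-70B-Instruct": 3039.5,
    "vllm-server on NVidia / Mistral-7B-v0.1-AWQ": 2150.0,
    "VertexAI / Gemini 2.0 Flash": 372.0,
    "LMStudio / Gemma3 4B QAT": 250.1,
    "Granite Ridge / Mistral-7B-Instruct-v0.3": 207.8,
    "LMStudio / Gemma2 9B": 192.6,
    "Granite Ridge / Llama-3.3-70B-Instruct": 89.3,
}

for name, rate in total_rates.items():
    print(f"{name}: {minutes_to_process(TC1_TOKENS, rate):.1f} min")
```

Because every row uses the same configuration (TC1), the Time to Process column is inversely proportional to the total token rate, so the ordering of the table is the same under either metric.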
