@@ -22,15 +22,95 @@ measuring:
2222
2323### Results
2424
25- | Platform | GPU | Model | Flow | Source material | In tok/s | Out tok/s | Total tok/s |
26- | -------- | --- | ----- | ---- | --------------- | ---------- | ----------- | ---------- |
27- | vLLM on Intel Gaudi 2 | Gaudi 2, 8 cards | meta-llama/Llama-3.3-70B-Instruct | document-rag+graph-rag | NASA Challenger Report Volume 1 | 1493.6 | 1545.8 | 3039.5 |
28- | vllm-server | H100-SXM5-80GB (Tensordock) | TheBloke/Mistral-7B-v0.1-AWQ | document-rag+graph-rag | NASA Challenger Report Volume 1 | 304.3 | 1845.6 | 2150.0 |
29- | VertexAI | n/a | Gemini 2.0 Flash | document-rag+graph-rag | NASA Challenger Report Volume 1 | 216.2 | 155.8 | 372.0 |
30- | LMStudio | Radeon RX 7900 XTX | Gemma3 4B QAT | document-rag+graph-rag | NASA Challenger Report Volume 1 | 116.2 | 133.9 | 250.1 |
31- | Granite Ridge | 128 Xeon Gen 6 CPU | mistralai/Mistral-7B-Instruct-v0.3 | document-rag+graph-rag | NASA Challenger Report Volume 1 | 117.7 | 90.0 | 207.8 |
32- | LMStudio | Radeon RX 7900 XTX | Gemma2 9B | document-rag+graph-rag | NASA Challenger Report Volume 1 | 119.6 | 73.0 | 192.6 |
33- | Granite Ridge | 128 Xeon Gen 6 CPU | meta-llama/Llama-3.3-70B-Instruct | document-rag+graph-rag | NASA Challenger Report Volume 1 | 67.0 | 22.4 | 89.3 |
25+ <table >
26+ <thead >
27+ <tr>
28+ <th>Platform</th>
29+ <th>Hardware</th>
30+ <th>Model</th>
31+ <th>Config</th>
32+ <th>Token Rate (tok/s)</th>
33+ <th>Time to Process</th>
34+ </tr>
35+ </thead >
36+ <tbody >
37+ <tr>
38+ <td>vLLM on Intel Gaudi 2 🏆</td>
39+ <td>Gaudi 2, 8 cards</td>
40+ <td>meta-llama/Llama-3.3-70B-Instruct</td>
41+ <td>TC1</td>
42+ <td>In: 1493.6<br/>Out: 1545.8<br/>Total: 3039.5</td>
43+ <td>8.5 min</td>
44+ </tr>
45+ <tr>
46+ <td>vllm-server on NVIDIA</td>
47+ <td>H100-SXM5-80GB (Tensordock)</td>
48+ <td>TheBloke/Mistral-7B-v0.1-AWQ</td>
49+ <td>TC1</td>
50+ <td>In: 304.3<br/>Out: 1845.6<br/>Total: 2150.0</td>
51+ <td>12.0 min</td>
52+ </tr>
53+ <tr>
54+ <td>VertexAI</td>
55+ <td>n/a</td>
56+ <td>Gemini 2.0 Flash</td>
57+ <td>TC1</td>
58+ <td>In: 216.2<br/>Out: 155.8<br/>Total: 372.0</td>
59+ <td>69.4 min</td>
60+ </tr>
61+ <tr>
62+ <td>LMStudio</td>
63+ <td>Radeon RX 7900 XTX</td>
64+ <td>Gemma3 4B QAT</td>
65+ <td>TC1</td>
66+ <td>In: 116.2<br/>Out: 133.9<br/>Total: 250.1</td>
67+ <td>103.3 min</td>
68+ </tr>
69+ <tr>
70+ <td>Granite Ridge</td>
71+ <td>128 Xeon Gen 6 CPU</td>
72+ <td>mistralai/Mistral-7B-Instruct-v0.3</td>
73+ <td>TC1</td>
74+ <td>In: 117.7<br/>Out: 90.0<br/>Total: 207.8</td>
75+ <td>124.3 min</td>
76+ </tr>
77+ <tr>
78+ <td>LMStudio</td>
79+ <td>Radeon RX 7900 XTX</td>
80+ <td>Gemma2 9B</td>
81+ <td>TC1</td>
82+ <td>In: 119.6<br/>Out: 73.0<br/>Total: 192.6</td>
83+ <td>134.1 min</td>
84+ </tr>
85+ <tr>
86+ <td>Granite Ridge</td>
87+ <td>128 Xeon Gen 6 CPU</td>
88+ <td>meta-llama/Llama-3.3-70B-Instruct</td>
89+ <td>TC1</td>
90+ <td>In: 67.0<br/>Out: 22.4<br/>Total: 89.3</td>
91+ <td>289.3 min</td>
92+ </tr>
93+ </tbody >
94+ </table >
95+
96+ ### Test Configurations
97+
98+ <table >
99+ <thead >
100+ <tr>
101+ <th>Config ID</th>
102+ <th>Flow</th>
103+ <th>Source Material</th>
104+ </tr>
105+ </thead >
106+ <tbody >
107+ <tr>
108+ <td>TC1</td>
109+ <td>document-rag+graph-rag</td>
110+ <td>NASA Challenger Report Volume 1 (1,549,890 tokens)</td>
111+ </tr>
112+ </tbody >
113+ </table >
34114
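The "Time to Process" figures appear consistent with dividing the TC1 source size (1,549,890 tokens) by each platform's total token rate. That relationship is inferred from the numbers in the tables, not stated in the document, so treat the following as a minimal sketch under that assumption. The function name `minutes_to_process` and the dictionary layout are hypothetical and used only for illustration.

```python
# Sketch: reproduce the "Time to Process" column from the total token rate.
# Assumption (inferred from the table, not confirmed by the source):
#   time = source tokens / total tok/s

SOURCE_TOKENS = 1_549_890  # NASA Challenger Report Volume 1, per config TC1


def minutes_to_process(total_tok_per_s: float, tokens: int = SOURCE_TOKENS) -> float:
    """Return the minutes needed to process `tokens` at the given total rate."""
    return tokens / total_tok_per_s / 60.0


# Total tok/s values copied from the results table.
total_rates = {
    "vLLM on Intel Gaudi 2 / Llama-3.3-70B-Instruct": 3039.5,
    "vllm-server H100 / Mistral-7B-v0.1-AWQ": 2150.0,
    "VertexAI / Gemini 2.0 Flash": 372.0,
    "LMStudio RX 7900 XTX / Gemma3 4B QAT": 250.1,
    "Granite Ridge / Mistral-7B-Instruct-v0.3": 207.8,
    "LMStudio RX 7900 XTX / Gemma2 9B": 192.6,
    "Granite Ridge / Llama-3.3-70B-Instruct": 89.3,
}

for name, rate in total_rates.items():
    print(f"{name}: {minutes_to_process(rate):.1f} min")
```

Under this assumption, a new result row only needs its measured total token rate; the processing time follows from the source token count listed for its config.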
35115## Procedure
36116