
Commit d44ab3b

Author: Harsh Shah (committed)

Add a column for tensor parallel

1 parent 00c8229

File tree

1 file changed: +6 additions, −5 deletions

inference/trillium/vLLM/Llama3.x/README.md

Lines changed: 6 additions & 5 deletions

````diff
@@ -135,11 +135,12 @@ vllm serve meta-llama/Llama-3.3-70B-Instruct \
   --max-model-len $MAX_MODEL_LEN
 ```

-| Model | Input/Output Scenario | max-num-batched-tokens | max-num-seqs |
-| :--- | :--- | :--- | :--- |
-| Llama-3.x-70B-Instruct | Prefill Heavy | 2048 | 256 |
-| Llama-3.x-70B-Instruct | Decode Heavy/ Balanced | 512 | 256 |
-| Llama3.1-8B-Instruct | Prefill Heavy | 1024 | 128 |
+| Model | Input/Output Scenario | max-num-batched-tokens | max-num-seqs | tensor-parallel-size |
+| :--- | :--- | :--- | :--- | :--- |
+| Llama-3.x-70B-Instruct | Prefill Heavy | 2048 | 256 | 8 |
+| Llama-3.x-70B-Instruct | Decode Heavy/ Balanced | 512 | 256 | 8 |
+| Llama3.1-8B-Instruct | Prefill Heavy | 1024 | 128 | 1 |


 It takes a few minutes depending on the model size to prepare the server.
````
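For context, the new `tensor-parallel-size` column in the table corresponds to vLLM's `--tensor-parallel-size` flag. A minimal sketch of how the 70B prefill-heavy row from the table would be passed to `vllm serve` (the flag values are taken from the diff; `MAX_MODEL_LEN` is assumed to be exported earlier in the README, as in the surrounding snippet):

```shell
# Sketch: serve Llama-3.3-70B with the prefill-heavy settings from the table.
# --tensor-parallel-size 8 shards the model across 8 accelerator chips, per the
# new column. MAX_MODEL_LEN is assumed to be set earlier in the README.
vllm serve meta-llama/Llama-3.3-70B-Instruct \
  --max-model-len $MAX_MODEL_LEN \
  --max-num-batched-tokens 2048 \
  --max-num-seqs 256 \
  --tensor-parallel-size 8
```

The 8B row would instead use `--tensor-parallel-size 1`, since the smaller model fits on a single chip.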
