inference/trillium/vLLM/Llama3.x — 1 file changed, +6 −5 lines changed

@@ -135,11 +135,12 @@ vllm serve meta-llama/Llama-3.3-70B-Instruct \
     --max-model-len $MAX_MODEL_LEN
 ```

-| Model | Input/Output Scenario | max-num-batched-tokens | max-num-seqs |
-| :--- | :--- | :--- | :--- |
-| Llama-3.x-70B-Instruct | Prefill Heavy | 2048 | 256 |
-| Llama-3.x-70B-Instruct | Decode Heavy / Balanced | 512 | 256 |
-| Llama3.1-8B-Instruct | Prefill Heavy | 1024 | 128 |
+| Model | Input/Output Scenario | max-num-batched-tokens | max-num-seqs | tensor-parallel-size |
+| :--- | :--- | :--- | :--- | :--- |
+| Llama-3.x-70B-Instruct | Prefill Heavy | 2048 | 256 | 8 |
+| Llama-3.x-70B-Instruct | Decode Heavy / Balanced | 512 | 256 | 8 |
+| Llama3.1-8B-Instruct | Prefill Heavy | 1024 | 128 | 1 |
+
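The new tensor-parallel-size column completes the launch command: each table row maps directly onto `vllm serve` flags. As a sketch, the 70B decode-heavy/balanced row might be launched like this (the `MAX_MODEL_LEN` value is illustrative, not taken from the table):

```shell
# Illustrative launch for the Llama-3.x-70B-Instruct decode-heavy/balanced row.
# MAX_MODEL_LEN is assumed to be set as in the earlier snippet; 4096 here is
# only a placeholder value.
MAX_MODEL_LEN=4096

vllm serve meta-llama/Llama-3.3-70B-Instruct \
    --max-model-len $MAX_MODEL_LEN \
    --max-num-batched-tokens 512 \
    --max-num-seqs 256 \
    --tensor-parallel-size 8
```

For the 8B model, the same command shape applies with `--max-num-batched-tokens 1024`, `--max-num-seqs 128`, and `--tensor-parallel-size 1`.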
 It takes a few minutes, depending on the model size, to prepare the server.