Commit b0bd47d

update server configs for large models (#36)
* update server configs for large models
* use 8 for tensor-parallel-size
1 parent 902ae7e commit b0bd47d

File tree

5 files changed: +12 -2 lines changed
  • RedHatAI
    • Llama-4-Scout-17B-16E-Instruct-FP8-dynamic/accuracy
    • Llama-4-Scout-17B-16E-Instruct-quantized.w4a16/accuracy
  • meta-llama
    • Llama-4-Maverick-17B-128E-Instruct-FP8/accuracy
    • Llama-4-Maverick-17B-128E-Instruct/accuracy
    • Llama-4-Scout-17B-16E-Instruct/accuracy

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+trust-remote-code: true
+tensor-parallel-size: 2
+max-model-len: 16384
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+trust-remote-code: true
+tensor-parallel-size: 2
+max-model-len: 16384
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+trust-remote-code: true
+tensor-parallel-size: 8
+max-model-len: 16384
Lines changed: 0 additions & 2 deletions
@@ -1,5 +1,3 @@
-# server configs for https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Instruct
-model: "meta-llama/Llama-4-Maverick-17B-128E-Instruct"
 trust-remote-code: true
 tensor-parallel-size: 8
 max-model-len: 16384
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+trust-remote-code: true
+tensor-parallel-size: 4
+max-model-len: 16384
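
The commit does not show how these server config files are consumed. As a rough sketch only, assuming each file is a flat list of `key: value` options that map one-to-one onto command-line flags of the same name (the key names match flags accepted by inference servers such as vLLM, but that mapping is an assumption here), a loader could look like the following. `config_to_flags` and `CONFIG` are hypothetical names used for illustration and are not part of this repository.

```python
def config_to_flags(text: str) -> list[str]:
    """Turn flat `key: value` lines into `--key value` style arguments.

    Assumption: the config is a flat mapping with no nesting or lists.
    """
    flags: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        key, _, value = line.partition(":")
        key = key.strip()
        value = value.strip().strip('"')
        if value.lower() == "true":
            flags.append(f"--{key}")  # boolean option: bare flag
        elif value.lower() == "false":
            continue  # disabled boolean: emit nothing
        else:
            flags.extend([f"--{key}", value])
    return flags


# Contents taken from one of the configs in this commit.
CONFIG = """\
trust-remote-code: true
tensor-parallel-size: 8
max-model-len: 16384
"""

print(config_to_flags(CONFIG))
# ['--trust-remote-code', '--tensor-parallel-size', '8', '--max-model-len', '16384']
```

Only `tensor-parallel-size` differs between the configs in this commit (2, 2, 8, 8, and 4), so a helper like this would produce otherwise identical argument lists for all five models.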
