
Commit d4170e4

clean up server.yml (#31)
1 parent 505d071 commit d4170e4

File tree

33 files changed: +12 / -188 lines changed
  • Qwen/Qwen2.5-7B-Instruct/accuracy
  • RedHatAI
    • Llama-3.3-70B-Instruct-FP8-dynamic/accuracy
    • Llama-3.3-70B-Instruct-quantized.w4a16/accuracy
    • Llama-3.3-70B-Instruct-quantized.w8a8/accuracy
    • Llama-4-Scout-17B-16E-Instruct-FP8-dynamic/accuracy
    • Meta-Llama-3.1-8B-Instruct-FP8-dynamic/accuracy
    • Meta-Llama-3.1-8B-Instruct-quantized.w4a16/accuracy
    • Meta-Llama-3.1-8B-Instruct-quantized.w8a8/accuracy
    • Mistral-Small-24B-Instruct-2501-FP8-Dynamic/accuracy
    • Mistral-Small-24B-Instruct-2501-quantized.w4a16/accuracy
    • Mistral-Small-24B-Instruct-2501-quantized.w8a8/accuracy
    • Mistral-Small-3.1-24B-Instruct-2503-FP8-dynamic/accuracy
    • Mistral-Small-3.1-24B-Instruct-2503-quantized.w4a16/accuracy
    • Mistral-Small-3.1-24B-Instruct-2503-quantized.w8a8/accuracy
    • Qwen2.5-7B-Instruct-FP8-dynamic/accuracy
    • Qwen2.5-7B-Instruct-quantized.w4a16/accuracy
    • Qwen2.5-7B-Instruct-quantized.w8a8/accuracy
    • Qwen2.5-7B-quantized.w4a16/accuracy
    • granite-3.1-8b-instruct-FP8-dynamic/accuracy
    • granite-3.1-8b-instruct-quantized.w4a16/accuracy
    • granite-3.1-8b-instruct-quantized.w8a8/accuracy
    • phi-4-FP8-dynamic/accuracy
    • phi-4-quantized.w4a16/accuracy
    • phi-4-quantized.w8a8/accuracy
  • ibm-granite/granite-3.1-8b-instruct/accuracy
  • meta-llama
  • mistralai


Qwen/Qwen2.5-7B-Instruct/accuracy/server.yml

Lines changed: 0 additions & 5 deletions
This file was deleted.

RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic/accuracy/server.yml

Lines changed: 2 additions & 3 deletions

@@ -1,6 +1,5 @@
 # server configs for https://huggingface.co/RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic
 model: "RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic"
 trust-remote-code: true
-enable-chunked-prefill: true
-tensor-parallel-size: 1
-max-model-len: 4096
+tensor-parallel-size: 8
+max-model-len: 16384

RedHatAI/Llama-3.3-70B-Instruct-quantized.w4a16/accuracy/server.yml

Lines changed: 2 additions & 3 deletions

@@ -1,6 +1,5 @@
 # server configs for https://huggingface.co/RedHatAI/Llama-3.3-70B-Instruct-quantized.w4a16
 model: "RedHatAI/Llama-3.3-70B-Instruct-quantized.w4a16"
 trust-remote-code: true
-enable-chunked-prefill: true
-tensor-parallel-size: 1
-max-model-len: 4096
+tensor-parallel-size: 8
+max-model-len: 16384

RedHatAI/Llama-3.3-70B-Instruct-quantized.w8a8/accuracy/server.yml

Lines changed: 2 additions & 3 deletions

@@ -1,6 +1,5 @@
 # server configs for https://huggingface.co/RedHatAI/Llama-3.3-70B-Instruct-quantized.w8a8
 model: "RedHatAI/Llama-3.3-70B-Instruct-quantized.w8a8"
 trust-remote-code: true
-enable-chunked-prefill: true
-tensor-parallel-size: 1
-max-model-len: 8192
+tensor-parallel-size: 8
+max-model-len: 16384
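
After this cleanup, the three Llama-3.3-70B server.yml files share the same five-line shape; the FP8-dynamic variant, for example, now reads as below. The keys correspond to vLLM engine arguments; presumably the file is handed to the server at launch (e.g. via vllm serve --config server.yml), though that invocation is an assumption and is not shown in this commit.

# server configs for https://huggingface.co/RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic
model: "RedHatAI/Llama-3.3-70B-Instruct-FP8-dynamic"
trust-remote-code: true
tensor-parallel-size: 8
max-model-len: 16384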

RedHatAI/Llama-4-Scout-17B-16E-Instruct-FP8-dynamic/accuracy/server.yml

Lines changed: 0 additions & 6 deletions
This file was deleted.

RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic/accuracy/server.yml

Lines changed: 0 additions & 6 deletions
This file was deleted.

RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w4a16/accuracy/server.yml

Lines changed: 0 additions & 6 deletions
This file was deleted.

RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8/accuracy/server.yml

Lines changed: 0 additions & 6 deletions
This file was deleted.

RedHatAI/Mistral-Small-24B-Instruct-2501-FP8-Dynamic/accuracy/server.yml

Lines changed: 0 additions & 6 deletions
This file was deleted.

RedHatAI/Mistral-Small-24B-Instruct-2501-quantized.w4a16/accuracy/server.yml

Lines changed: 0 additions & 6 deletions
This file was deleted.

0 commit comments
