Commit 2828fd0

WuhanMonkey authored and subramen committed
Update on-prem vllm scripts and readme
1 parent 9c8cad2 commit 2828fd0

4 files changed, with 5 additions and 4 deletions

recipes/benchmarks/inference_throughput/on-prem/README.md

Lines changed: 2 additions & 0 deletions
@@ -37,3 +37,5 @@ To run pretrained model benchmark, follow the command below.
 ```
 python pretrained_vllm_benchmark.py
 ```
+
+Refer to more vLLM benchmark details on their official Github repo [here](https://github.com/vllm-project/vllm/tree/main/benchmarks).

recipes/benchmarks/inference_throughput/on-prem/vllm/chat_vllm_benchmark.py

Lines changed: 1 addition & 2 deletions
@@ -4,7 +4,6 @@
 import csv
 import json
 import time
-import random
 import threading
 import numpy as np
 import requests
@@ -18,7 +17,7 @@
 from azure.ai.contentsafety.models import AnalyzeTextOptions
 
 from concurrent.futures import ThreadPoolExecutor, as_completed
-from typing import Dict, Tuple, List
+from typing import Tuple, List
 
 
 

recipes/benchmarks/inference_throughput/on-prem/vllm/parameters.json

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 {
 "MAX_NEW_TOKENS" : 256,
 "CONCURRENT_LEVELS" : [1, 2, 4, 8, 16, 32, 64, 128, 256],
-"MODEL_PATH" : "meta-llama/Meta-Llama-3-70B-Instruct",
+"MODEL_PATH" : "meta-llama/your-model-path",
 "MODEL_HEADERS" : {"Content-Type": "application/json"},
 "SAFE_CHECK" : true,
 "THRESHOLD_TPS" : 7,
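
Reviewer note: the placeholder `MODEL_PATH` has to be replaced before the benchmark scripts can run. The sketch below shows one way a config of this shape might be consumed. It is not taken from the benchmark scripts: `PARAMS_TEXT`, `fake_request`, `run_level`, and `run_all` are hypothetical names, the concurrency list is shortened to `[1, 2, 4]` for brevity, and the placeholder request only echoes `MAX_NEW_TOKENS` rather than calling a vLLM endpoint.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple

# Hypothetical stand-in for parameters.json; only keys visible in the hunk
# above are used, and CONCURRENT_LEVELS is shortened for illustration.
PARAMS_TEXT = """
{
    "MAX_NEW_TOKENS": 256,
    "CONCURRENT_LEVELS": [1, 2, 4],
    "MODEL_PATH": "meta-llama/your-model-path",
    "MODEL_HEADERS": {"Content-Type": "application/json"},
    "SAFE_CHECK": true,
    "THRESHOLD_TPS": 7
}
"""


def fake_request(model_path: str, max_new_tokens: int) -> int:
    """Hypothetical placeholder for one inference call.

    A real benchmark would send an HTTP request to the served model;
    here we simply pretend that max_new_tokens tokens were generated.
    """
    return max_new_tokens


def run_level(params: dict, level: int) -> Tuple[int, int]:
    """Issue `level` concurrent placeholder requests and sum their token counts."""
    with ThreadPoolExecutor(max_workers=level) as pool:
        futures = [
            pool.submit(fake_request, params["MODEL_PATH"], params["MAX_NEW_TOKENS"])
            for _ in range(level)
        ]
        total_tokens = sum(f.result() for f in as_completed(futures))
    return level, total_tokens


def run_all(params_text: str) -> List[Tuple[int, int]]:
    """Parse the JSON config and run every configured concurrency level."""
    params = json.loads(params_text)
    return [run_level(params, level) for level in params["CONCURRENT_LEVELS"]]


if __name__ == "__main__":
    for level, total_tokens in run_all(PARAMS_TEXT):
        print(f"concurrency={level} total_tokens={total_tokens}")
```

In real use, the JSON text would be read from `parameters.json` on disk, and `fake_request` would be replaced by the actual call to the served model.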

recipes/benchmarks/inference_throughput/on-prem/vllm/pretrained_vllm_benchmark.py

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@
 from azure.ai.contentsafety.models import AnalyzeTextOptions
 
 from concurrent.futures import ThreadPoolExecutor, as_completed
-from typing import Dict, Tuple, List
+from typing import Tuple, List
 
 
 # Predefined inputs
