Commit 4901841

WuhanMonkey authored and subramen committed

Update readme in Inference folder

1 parent 2828fd0 commit 4901841

File tree

1 file changed: +4 -4 lines changed

  • recipes/benchmarks/inference_throughput


recipes/benchmarks/inference_throughput/README.md

Lines changed: 4 additions & 4 deletions
@@ -1,8 +1,8 @@
 # Inference Throughput Benchmarks
-In this folder we provide a series of benchmark scripts that apply a throughput analysis for Llama 2 models inference on various backends:
+In this folder we provide a series of benchmark scripts that apply a throughput analysis for Llama models inference on various backends:
 * On-prem - Popular serving frameworks and containers (i.e. vLLM)
-* [**WIP**]Cloud API - Popular API services (i.e. Azure Model-as-a-Service)
-* [**WIP**]On-device - Popular on-device inference solutions on Android and iOS (i.e. mlc-llm, QNN)
+* Cloud API - Popular API services (i.e. Azure Model-as-a-Service or Serverless API)
+* [**WIP**]On-device - Popular on-device inference solutions on mobile and desktop (i.e. ExecuTorch, MLC-LLM, Ollama)
 * [**WIP**]Optimization - Popular optimization solutions for faster inference and quantization (i.e. AutoAWQ)

 # Why
@@ -16,7 +16,7 @@ Here are the parameters (if applicable) that you can configure for running the b
 * **PROMPT** - Prompt sent in for inference (configure the length of prompt, choose from 5, 25, 50, 100, 500, 1k and 2k)
 * **MAX_NEW_TOKENS** - Max number of tokens generated
 * **CONCURRENT_LEVELS** - Max number of concurrent requests
-* **MODEL_PATH** - Model source
+* **MODEL_PATH** - Model source from Huggingface
 * **MODEL_HEADERS** - Request headers
 * **SAFE_CHECK** - Content safety check (either Azure service or simulated latency)
 * **THRESHOLD_TPS** - Threshold TPS (threshold for tokens per second below which we deem the query to be slow)
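The parameters in this hunk map naturally onto a small benchmark configuration. Below is a minimal sketch of how they might be grouped and how the THRESHOLD_TPS slow-query check could work; the dict structure, example values, the model id, and the `is_slow` helper are illustrative assumptions, not the repository's actual scripts:

```python
# Hypothetical grouping of the benchmark parameters described in the README.
# Values are examples only; the real scripts may structure these differently.
benchmark_config = {
    "PROMPT": "Summarize the following text: ...",  # prompt sent in for inference
    "MAX_NEW_TOKENS": 256,                # max number of tokens generated
    "CONCURRENT_LEVELS": [1, 2, 4, 8],    # max numbers of concurrent requests to test
    "MODEL_PATH": "meta-llama/Llama-2-7b-chat-hf",  # model source from Huggingface (example id)
    "MODEL_HEADERS": {"Content-Type": "application/json"},  # request headers
    "SAFE_CHECK": True,                   # content safety check (Azure service or simulated latency)
    "THRESHOLD_TPS": 7,                   # tokens/sec below which a query is deemed slow
}

def is_slow(tokens_generated: int, elapsed_seconds: float, threshold_tps: float) -> bool:
    """Flag a query as slow when its tokens-per-second rate falls below the threshold."""
    return (tokens_generated / elapsed_seconds) < threshold_tps
```

For example, a request that produced 100 tokens in 20 seconds runs at 5 tokens/sec, below the example threshold of 7, so it would be counted as slow.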

0 commit comments

Comments
 (0)