Commit 14f45a0

added sample json files
1 parent 8ec78b4 commit 14f45a0

4 files changed (+70, -0 lines)

docs/sample_blueprints/offline-inference-infra/README.md

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ Offline inference is ideal for:

This blueprint supports benchmark execution via a job-mode recipe using a YAML config file. The recipe mounts a model and config file from Object Storage, runs offline inference, and logs metrics.

+ Note: Make sure your output Object Storage bucket is in the same tenancy as your stack.

---

### Sample Recipe (Job Mode for Offline SGLang Inference)
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
```yaml
benchmark_type: offline
offline_backend: sglang

# Model and tokenizer (mounted from Object Storage)
model_path: /models/NousResearch/Meta-Llama-3.1-8B
tokenizer_path: /models/NousResearch/Meta-Llama-3.1-8B
trust_remote_code: true
conv_template: llama-2

# Benchmark shape and sampling
input_len: 128
output_len: 128
num_prompts: 64
max_seq_len: 4096
max_batch_size: 8
dtype: auto
temperature: 0.7
top_p: 0.9

# MLflow logging
mlflow_uri: http://mlflow-benchmarking.corrino-oci.com:5000
experiment_name: "sglang-bench-doc-test-new"
run_name: "llama3-8b-sglang-test"

# Output
save_metrics_path: /mlcommons_output/benchmark_output_llama3_sglang.json
```
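For illustration, here is a minimal sketch (not code from this commit) of how a harness might load and sanity-check this recipe config with PyYAML before launching the run; the filename and the required-key list are assumptions based on the sample above.

```python
# Sketch: load the job-mode recipe config and verify the keys the
# benchmark needs. "offline_sglang.yaml" is a hypothetical filename.
import yaml

REQUIRED = ["benchmark_type", "offline_backend", "model_path",
            "tokenizer_path", "input_len", "output_len", "num_prompts"]

with open("offline_sglang.yaml") as f:
    cfg = yaml.safe_load(f)

missing = [k for k in REQUIRED if k not in cfg]
if missing:
    raise ValueError(f"config is missing keys: {missing}")

assert cfg["benchmark_type"] == "offline"
assert cfg["offline_backend"] == "sglang"
print(f"{cfg['num_prompts']} prompts, {cfg['input_len']} -> {cfg['output_len']} "
      f"tokens, model {cfg['model_path']}")
```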
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
```yaml
benchmark_type: offline
model: /models/NousResearch/Meta-Llama-3.1-8B
tokenizer: /models/NousResearch/Meta-Llama-3.1-8B

input_len: 12
output_len: 12
num_prompts: 2
seed: 42
tensor_parallel_size: 8

# vLLM-specific
#quantization: awq
dtype: half
gpu_memory_utilization: 0.99
num_scheduler_steps: 10
device: cuda
enforce_eager: true
kv_cache_dtype: auto
enable_prefix_caching: true
distributed_executor_backend: mp

# Output
#output_json: ./128_128.json

# MLflow
mlflow_uri: http://mlflow-benchmarking.corrino-oci.com:5000
experiment_name: test-bm-suite-doc
run_name: llama3-vllm-test
save_metrics_path: /mlcommons_output/benchmark_output_llama3_vllm.json
```
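Most fields above correspond directly to vLLM's offline `LLM` engine arguments. A minimal sketch of that mapping, assuming the harness drives vLLM's Python API (the config filename and placeholder prompts are stand-ins, not part of this commit):

```python
# Sketch: map the sample config onto vLLM's offline API.
# The constructor and sampling kwargs below are standard vLLM arguments;
# "offline_vllm.yaml" is a hypothetical filename.
import yaml
from vllm import LLM, SamplingParams

with open("offline_vllm.yaml") as f:
    cfg = yaml.safe_load(f)

llm = LLM(
    model=cfg["model"],
    tokenizer=cfg["tokenizer"],
    seed=cfg["seed"],
    tensor_parallel_size=cfg["tensor_parallel_size"],
    dtype=cfg["dtype"],
    gpu_memory_utilization=cfg["gpu_memory_utilization"],
    enforce_eager=cfg["enforce_eager"],
    kv_cache_dtype=cfg["kv_cache_dtype"],
    enable_prefix_caching=cfg["enable_prefix_caching"],
    distributed_executor_backend=cfg["distributed_executor_backend"],
)

# Fixed-length generation for benchmarking: ignore EOS so every request
# produces exactly output_len tokens.
params = SamplingParams(max_tokens=cfg["output_len"], ignore_eos=True)
outputs = llm.generate(["benchmark prompt"] * cfg["num_prompts"], params)
```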
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
```yaml
benchmark_type: online
model: meta/llama3-8b-instruct
input_len: 64
output_len: 32
max_requests: 5
timeout: 300
num_concurrent: 1
results_dir: /workspace/results_on
llm_api: openai
llm_api_key: dummy-key
llm_api_base: http://localhost:8001/v1
experiment_name: local-bench
run_name: llama3-test
mlflow_uri: http://mlflow-benchmarking.corrino-oci.com:5000
llmperf_path: /opt/llmperf-src
metadata: test=localhost
```
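The `llmperf_path` and `llm_api*` fields suggest this online config drives ray-project/llmperf. A minimal sketch of the invocation that mapping implies; the flag and environment-variable names are llmperf's, while the wrapper itself is hypothetical and not part of this commit:

```python
# Sketch: shell out to llmperf's token_benchmark_ray.py using the
# fields from the online config. "online_llmperf.yaml" is a
# hypothetical filename.
import os
import subprocess

import yaml

with open("online_llmperf.yaml") as f:
    cfg = yaml.safe_load(f)

# llmperf's openai client reads the endpoint and key from these env vars.
env = dict(
    os.environ,
    OPENAI_API_BASE=cfg["llm_api_base"],
    OPENAI_API_KEY=cfg["llm_api_key"],
)

subprocess.run(
    [
        "python", os.path.join(cfg["llmperf_path"], "token_benchmark_ray.py"),
        "--model", cfg["model"],
        "--llm-api", cfg["llm_api"],
        "--mean-input-tokens", str(cfg["input_len"]),
        "--mean-output-tokens", str(cfg["output_len"]),
        "--max-num-completed-requests", str(cfg["max_requests"]),
        "--num-concurrent-requests", str(cfg["num_concurrent"]),
        "--timeout", str(cfg["timeout"]),
        "--results-dir", cfg["results_dir"],
        "--metadata", cfg["metadata"],
    ],
    env=env,
    check=True,
)
```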
