
Commit 3d5c701

FMBench readme updates for Llama3 on Inf2 and config.yml cleanup (meta-llama#486)
2 parents: c3b2628 + c1c213f · commit 3d5c701

2 files changed: 35 additions & 17 deletions

recipes/benchmarks/fmbench/README.md

Lines changed: 2 additions & 1 deletion
@@ -99,11 +99,12 @@ Llama3 is now available on Bedrock (read [blog post](https://aws.amazon.com/blog
 
 ## 🚨 Benchmarking Llama3 on Amazon SageMaker 🚨
 
-Llama3 is now available on SageMaker (read [blog post](https://aws.amazon.com/blogs/machine-learning/meta-llama-3-models-are-now-available-in-amazon-sagemaker-jumpstart/)), and you can now benchmark it using `FMBench`. Here are the config files for benchmarking `Llama3-8b-instruct` and `Llama3-70b-instruct` on `ml.p4d.24xlarge` and `ml.g5.12xlarge` instance.
+Llama3 is now available on SageMaker (read [blog post](https://aws.amazon.com/blogs/machine-learning/meta-llama-3-models-are-now-available-in-amazon-sagemaker-jumpstart/)), and you can now benchmark it using `FMBench`. Here are the config files for benchmarking `Llama3-8b-instruct` and `Llama3-70b-instruct` on `ml.p4d.24xlarge`, `ml.inf2.24xlarge` and `ml.g5.12xlarge` instances.
 
 <!-- markdown-link-check-disable -->
 - [Config file](https://github.com/aws-samples/foundation-model-benchmarking-tool/blob/main/src/fmbench/configs/config-llama3-8b-instruct-g5-p4d.yml) for `Llama3-8b-instruct` on `ml.p4d.24xlarge` and `ml.g5.12xlarge`.
 - [Config file](https://github.com/aws-samples/foundation-model-benchmarking-tool/blob/main/src/fmbench/configs/config-llama3-70b-instruct-g5-p4d.yml) for `Llama3-70b-instruct` on `ml.p4d.24xlarge` and `ml.g5.48xlarge`.
+- [Config file](https://github.com/aws-samples/foundation-model-benchmarking-tool/blob/main/src/fmbench/configs/config-llama3-8b-inf2-g5.yml) for `Llama3-8b-instruct` on `ml.inf2.24xlarge` and `ml.g5.12xlarge`.
 <!-- markdown-link-check-enable -->
 
 ## Benchmarking Llama2 on Amazon SageMaker
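The config files linked above all follow the layout of `recipes/benchmarks/fmbench/config.yml`, the second file touched by this commit. As orientation for the diff below, here is a sketch of the top-level sections that appear in it; the section names are taken from the hunks in this commit, their order is approximate, and all values are omitted:

```yaml
# Outline only -- not a runnable config; see the linked config files for real values.
aws: {}                   # SageMaker execution role and the S3 write bucket for metrics, plots, reports
dir_paths: {}             # directory paths inside the write bucket
s3_read_data: {}          # read bucket plus scripts/tokenizer/prompt-template prefixes and dataset files
run_steps: {}             # which FMBench notebooks to run (1_generate_data.ipynb ... 5_cleanup.ipynb)
datasets: {}              # prompt template keys and dataset filters
metrics: {}               # dataset_of_interest used for summarizing results
inference_parameters: {}  # named parameter sets referenced by inference_spec.parameter_set
experiments: []           # one entry per model/endpoint/instance type to benchmark
report: {}                # latency, cost-per-10k-transactions and error-rate budgets
```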

recipes/benchmarks/fmbench/config.yml

Lines changed: 33 additions & 16 deletions
@@ -9,7 +9,7 @@ aws:
 # SageMaker execution role used to run FMBench, this parameter is templatized, no need to change
 sagemaker_execution_role: {role_arn}
 # S3 bucket to which metrics, plots and reports would be written to
-bucket: {write_bucket} ## add the name of your desired bucket
+bucket: {write_bucket}
 
 # directory paths in the write bucket, no need to change these
 dir_paths:
@@ -22,9 +22,10 @@ dir_paths:
 
 # S3 information for reading datasets, scripts and tokenizer
 s3_read_data:
-# read bucket name, templatized, if left unchanged will default to sagemaker-fmbench-read-{region}-{account_id}
+# read bucket name, templatized, if left unchanged will default to sagemaker-fmbench-read-<region>-<account_id>
 read_bucket: {read_bucket}
-
+scripts_prefix: scripts
+
 # S3 prefix in the read bucket where deployment and inference scripts should be placed
 scripts_prefix: scripts
 
@@ -52,13 +53,12 @@ s3_read_data:
 - narrativeqa.jsonl
 - triviaqa_e.jsonl
 - triviaqa.jsonl
-
 # S3 prefix for the tokenizer to be used with the models
 # NOTE 1: the same tokenizer is used with all the models being tested through a config file
 # NOTE 2: place your model specific tokenizers in a prefix named as <model_name>_tokenizer
-# so the mistral tokenizer goes in mistral_tokenizer, Llama2 tokenizer goes in llama2_tokenizer
+# so the mistral tokenizer goes in mistral_tokenizer, Llama2 tokenizer goes in llama2_tokenizer and so on and so forth.
 tokenizer_prefix: tokenizer
-
+
 # S3 prefix for prompt templates
 prompt_template_dir: prompt_template
 
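The comments in this hunk describe how FMBench expects the read bucket to be organized. The sketch below restates that layout; the commented directory view is an assumption pieced together from the prefixes and notes above (the prefix holding the dataset files is not named in this hunk), and `{read_bucket}` is the templatized value from the config:

```yaml
# Assumed layout of the read bucket (default name sagemaker-fmbench-read-<region>-<account_id>):
#
#   scripts/            <- scripts_prefix: deployment and inference scripts
#   tokenizer/          <- tokenizer_prefix: tokenizer used with all models being tested (NOTE 1)
#   llama2_tokenizer/   <- model-specific tokenizers go in <model_name>_tokenizer prefixes (NOTE 2)
#   mistral_tokenizer/
#   prompt_template/    <- prompt_template_dir: prompt templates
#   <dataset prefix>/   <- dataset files such as narrativeqa.jsonl, triviaqa.jsonl
#
s3_read_data:
  read_bucket: {read_bucket}
  scripts_prefix: scripts
  tokenizer_prefix: tokenizer
  prompt_template_dir: prompt_template
```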
@@ -79,7 +79,7 @@ run_steps:
 4_model_metric_analysis.ipynb: yes
 5_cleanup.ipynb: yes
 
-# dataset related configuration
+
 datasets:
 # Refer to the 1_generate_data.ipynb notebook
 # the dataset you use is expected to have the
@@ -89,7 +89,7 @@ datasets:
 prompt_template_keys:
 - input
 - context
-
+
 # if your dataset has multiple languages and it has a language
 # field then you could filter it for a language. Similarly,
 # you can filter your dataset to only keep prompts between
@@ -125,7 +125,7 @@ datasets:
 # dataset which is listed below as the dataset_of_interest
 metrics:
 dataset_of_interest: en_2000-3000
-
+
 # all pricing information is in the pricing.yml file
 # this file is provided in the repo. You can add entries
 # to this file for new instance types and new Bedrock models
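Read together, the datasets and metrics hunks describe prompt-template keys, optional language and prompt-length filters, and the dataset slice the report focuses on. A minimal sketch of that part of the config follows; the `language`, `min_length_in_tokens` and `max_length_in_tokens` key names are assumptions for illustration, since the diff only shows the comment describing the filters:

```yaml
datasets:
  prompt_template_keys:
    - input
    - context
  # Hypothetical filter keys -- the diff only shows the comment that describes
  # language and prompt-length filtering, not the actual key names.
  language: en
  min_length_in_tokens: 2000
  max_length_in_tokens: 3000

metrics:
  # results are summarized for this dataset slice
  dataset_of_interest: en_2000-3000

# pricing for instance types and Bedrock models lives in pricing.yml, shipped with the repo
```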
@@ -156,18 +156,18 @@ experiments:
 # model_id is interpreted in conjunction with the deployment_script, so if you
 # use a JumpStart model id then set the deployment_script to jumpstart.py.
 # if deploying directly from HuggingFace this would be a HuggingFace model id
-# see the DJL serving deployment script in the code repo for reference.
+# see the DJL serving deployment script in the code repo for reference.
 model_id: meta-textgeneration-llama-2-7b-f
 model_version: "3.*"
 model_name: llama2-7b-f
 ep_name: llama-2-7b-g5xlarge
 instance_type: "ml.g5.xlarge"
 image_uri: '763104351884.dkr.ecr.{region}.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04'
-deploy: yes
+deploy: yes
 instance_count: 1
 # FMBench comes packaged with multiple deployment scripts, such as scripts for JumpStart
 # scripts for deploying using DJL DeepSpeed, tensorRT etc. You can also add your own.
-# See repo for details
+# See repo for details
 deployment_script: jumpstart.py
 # FMBench comes packaged with multiple inference scripts, such as scripts for SageMaker
 # and Bedrock. You can also add your own. See repo for details
@@ -181,14 +181,15 @@ experiments:
 - payload_en_500-1000.jsonl
 - payload_en_1000-2000.jsonl
 - payload_en_2000-3000.jsonl
+#- payload_en_3000-3840.jsonl
 # concurrency level refers to number of requests sent in parallel to an endpoint
 # the next set of requests is sent once responses for all concurrent requests have
 # been received.
 concurrency_levels:
 - 1
 - 2
 - 4
-# Added for models that require accepting a EULA
+
 accept_eula: true
 # Environment variables to be passed to the container
 # this is not a fixed list, you can add more parameters as applicable.
@@ -204,30 +205,47 @@ experiments:
 SAGEMAKER_MODEL_SERVER_WORKERS: "1"
 
 - name: llama2-7b-g5.2xlarge-huggingface-pytorch-tgi-inference-2.0.1-tgi1.1.0
+# model_id is interpreted in conjunction with the deployment_script, so if you
+# use a JumpStart model id then set the deployment_script to jumpstart.py.
+# if deploying directly from HuggingFace this would be a HuggingFace model id
+# see the DJL serving deployment script in the code repo for reference.
 model_id: meta-textgeneration-llama-2-7b-f
 model_version: "3.*"
 model_name: llama2-7b-f
 ep_name: llama-2-7b-g5-2xlarge
 instance_type: "ml.g5.2xlarge"
 image_uri: '763104351884.dkr.ecr.{region}.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04'
 deploy: yes
+# FMBench comes packaged with multiple deployment scripts, such as scripts for JumpStart
+# scripts for deploying using DJL DeepSpeed, tensorRT etc. You can also add your own.
+# See repo for details
 instance_count: 1
 deployment_script: jumpstart.py
+# FMBench comes packaged with multiple inference scripts, such as scripts for SageMaker
+# and Bedrock. You can also add your own. See repo for details
 inference_script: sagemaker_predictor.py
 inference_spec:
+# this should match one of the sections in the inference_parameters section above
 parameter_set: sagemaker
+# runs are done for each combination of payload file and concurrency level
 payload_files:
 - payload_en_1-500.jsonl
 - payload_en_500-1000.jsonl
 - payload_en_1000-2000.jsonl
 - payload_en_2000-3000.jsonl
-
+#- payload_en_3000-3840.jsonl
+
+# concurrency level refers to number of requests sent in parallel to an endpoint
+# the next set of requests is sent once responses for all concurrent requests have
+# been received.
 concurrency_levels:
 - 1
 - 2
 - 4
-
+# Added for models that require accepting a EULA
 accept_eula: true
+# Environment variables to be passed to the container
+# this is not a fixed list, you can add more parameters as applicable.
 env:
 SAGEMAKER_PROGRAM: "inference.py"
 ENDPOINT_SERVER_TIMEOUT: "3600"
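Because the diff view flattens indentation, it is easy to lose track of how these keys nest. Reassembling the fields shown in the experiments hunks above, a single `experiments` entry has roughly the following shape; the nesting and indentation are reconstructed for illustration, while the keys and values are the ones visible in the diff:

```yaml
experiments:
  - name: llama2-7b-g5.2xlarge-huggingface-pytorch-tgi-inference-2.0.1-tgi1.1.0
    # JumpStart model id, so deployment_script is set to jumpstart.py
    model_id: meta-textgeneration-llama-2-7b-f
    model_version: "3.*"
    model_name: llama2-7b-f
    ep_name: llama-2-7b-g5-2xlarge
    instance_type: "ml.g5.2xlarge"
    image_uri: '763104351884.dkr.ecr.{region}.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04'
    deploy: yes
    instance_count: 1
    deployment_script: jumpstart.py
    inference_script: sagemaker_predictor.py
    inference_spec:
      # must match one of the sections under inference_parameters
      parameter_set: sagemaker
    # one run per (payload file, concurrency level) combination
    payload_files:
      - payload_en_1-500.jsonl
      - payload_en_500-1000.jsonl
      - payload_en_1000-2000.jsonl
      - payload_en_2000-3000.jsonl
    concurrency_levels:
      - 1
      - 2
      - 4
    accept_eula: true   # required for models gated behind a EULA
    env:
      SAGEMAKER_PROGRAM: "inference.py"
      ENDPOINT_SERVER_TIMEOUT: "3600"
      SAGEMAKER_MODEL_SERVER_WORKERS: "1"
```

Each additional endpoint or instance type to benchmark, such as the `ml.inf2.24xlarge` configurations referenced in the README above, gets its own entry in this `experiments` list.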
@@ -249,7 +267,6 @@ report:
 latency_budget: 2
 cost_per_10k_txn_budget: 20
 error_rate_budget: 0
-
 # other misc reporting parameters, see 4_model_metric_analysis.ipynb
 # for more information
 per_inference_request_file: per_inference_request_results.csv
