
Commit 2933a7d

move config to a separate file
1 parent 2d4996b commit 2933a7d

File tree: 2 files changed (+260, -263 lines)


recipes/benchmarks/fmbench/README.md

Lines changed: 1 addition & 263 deletions
````diff
@@ -85,269 +85,7 @@ The following steps provide a Quick start guide for `FMBench`. For a more detail
 
 Each `FMBench` run works with a configuration file that contains the information about the model, the deployment steps, and the tests to run. A typical `FMBench` workflow involves either directly using an already provided config file from the [`configs`](https://github.com/aws-samples/foundation-model-benchmarking-tool/tree/main/src/fmbench/configs) folder in the `FMBench` GitHub repo or editing an already provided config file as per your own requirements (say you want to try benchmarking on a different instance type, or a different inference container etc.).
 
-A simple config file with some key parameters annotated is presented below. The file below benchmarks performance of Llama2-7b on an `ml.g5.xlarge` instance and an `ml.g5.2xlarge` instance.
-
-```{markdown}
-general:
-  name: "llama2-7b-v1"
-  model_name: "Llama2-7b"
-
-# AWS and SageMaker settings
-aws:
-  # AWS region, this parameter is templatized, no need to change
-  region: {region}
-  # SageMaker execution role used to run FMBench, this parameter is templatized, no need to change
-  sagemaker_execution_role: {role_arn}
-  # S3 bucket to which metrics, plots and reports would be written to
-  bucket: {write_bucket} ## add the name of your desired bucket
-
-# directory paths in the write bucket, no need to change these
-dir_paths:
-  data_prefix: data
-  prompts_prefix: prompts
-  all_prompts_file: all_prompts.csv
-  metrics_dir: metrics
-  models_dir: models
-  metadata_dir: metadata
-
-# S3 information for reading datasets, scripts and tokenizer
-s3_read_data:
-  # read bucket name, templatized, if left unchanged will default to sagemaker-fmbench-read-{region}-{account_id}
-  read_bucket: {read_bucket}
-
-  # S3 prefix in the read bucket where deployment and inference scripts should be placed
-  scripts_prefix: scripts
-
-  # deployment and inference script files to be downloaded are placed in this list
-  # only needed if you are creating a new deployment script or inference script
-  # your HuggingFace token does need to be in this list and should be called "hf_token.txt"
-  script_files:
-  - hf_token.txt
-
-  # configuration files (like this one) are placed in this prefix
-  configs_prefix: configs
-
-  # list of configuration files to download, for now only pricing.yml needs to be downloaded
-  config_files:
-  - pricing.yml
-
-  # S3 prefix for the dataset files
-  source_data_prefix: source_data
-  # list of dataset files, the list below is from the LongBench dataset https://huggingface.co/datasets/THUDM/LongBench
-  source_data_files:
-  - 2wikimqa_e.jsonl
-  - 2wikimqa.jsonl
-  - hotpotqa_e.jsonl
-  - hotpotqa.jsonl
-  - narrativeqa.jsonl
-  - triviaqa_e.jsonl
-  - triviaqa.jsonl
-
-  # S3 prefix for the tokenizer to be used with the models
-  # NOTE 1: the same tokenizer is used with all the models being tested through a config file
-  # NOTE 2: place your model specific tokenizers in a prefix named as <model_name>_tokenizer
-  # so the mistral tokenizer goes in mistral_tokenizer, Llama2 tokenizer goes in llama2_tokenizer
-  tokenizer_prefix: tokenizer
-
-  # S3 prefix for prompt templates
-  prompt_template_dir: prompt_template
-
-  # prompt template to use, NOTE: same prompt template gets used for all models being tested through a config file
-  # the FMBench repo already contains a bunch of prompt templates so review those first before creating a new one
-  prompt_template_file: prompt_template_llama2.txt
-
-# steps to run, usually all of these would be
-# set to yes so nothing needs to change here
-# you could, however, bypass some steps for example
-# set the 2_deploy_model.ipynb to no if you are re-running
-# the same config file and the model is already deployed
-run_steps:
-  0_setup.ipynb: yes
-  1_generate_data.ipynb: yes
-  2_deploy_model.ipynb: yes
-  3_run_inference.ipynb: yes
-  4_model_metric_analysis.ipynb: yes
-  5_cleanup.ipynb: yes
-
-# dataset related configuration
-datasets:
-  # Refer to the 1_generate_data.ipynb notebook
-  # the dataset you use is expected to have the
-  # columns you put in prompt_template_keys list
-  # and your prompt template also needs to have
-  # the same placeholders (refer to the prompt template folder)
-  prompt_template_keys:
-  - input
-  - context
-
-  # if your dataset has multiple languages and it has a language
-  # field then you could filter it for a language. Similarly,
-  # you can filter your dataset to only keep prompts between
-  # a certain token length limit (the token length is determined
-  # using the tokenizer you provide in the tokenizer_prefix prefix in the
-  # read S3 bucket). Each of the array entries below create a payload file
-  # containing prompts matching the language and token length criteria.
-  filters:
-  - language: en
-    min_length_in_tokens: 1
-    max_length_in_tokens: 500
-    payload_file: payload_en_1-500.jsonl
-  - language: en
-    min_length_in_tokens: 500
-    max_length_in_tokens: 1000
-    payload_file: payload_en_500-1000.jsonl
-  - language: en
-    min_length_in_tokens: 1000
-    max_length_in_tokens: 2000
-    payload_file: payload_en_1000-2000.jsonl
-  - language: en
-    min_length_in_tokens: 2000
-    max_length_in_tokens: 3000
-    payload_file: payload_en_2000-3000.jsonl
-  - language: en
-    min_length_in_tokens: 3000
-    max_length_in_tokens: 3840
-    payload_file: payload_en_3000-3840.jsonl
-
-# While the tests would run on all the datasets
-# configured in the experiment entries below but
-# the price:performance analysis is only done for 1
-# dataset which is listed below as the dataset_of_interest
-metrics:
-  dataset_of_interest: en_2000-3000
-
-# all pricing information is in the pricing.yml file
-# this file is provided in the repo. You can add entries
-# to this file for new instance types and new Bedrock models
-pricing: pricing.yml
-
-# inference parameters, these are added to the payload
-# for each inference request. The list here is not static
-# any parameter supported by the inference container can be
-# added to the list. Put the sagemaker parameters in the sagemaker
-# section, bedrock parameters in the bedrock section (not shown here).
-# Use the section name (sagemaker in this example) in the inference_spec.parameter_set
-# section under experiments.
-inference_parameters:
-  sagemaker:
-    do_sample: yes
-    temperature: 0.1
-    top_p: 0.92
-    top_k: 120
-    max_new_tokens: 100
-    return_full_text: False
-
-# Configuration for experiments to be run. The experiments section is an array
-# so more than one experiments can be added, these could belong to the same model
-# but different instance types, or different models, or even different hosting
-# options (such as one experiment is SageMaker and the other is Bedrock).
-experiments:
-  - name: llama2-7b-g5.xlarge-huggingface-pytorch-tgi-inference-2.0.1-tgi1.1.0
-    # model_id is interpreted in conjunction with the deployment_script, so if you
-    # use a JumpStart model id then set the deployment_script to jumpstart.py.
-    # if deploying directly from HuggingFace this would be a HuggingFace model id
-    # see the DJL serving deployment script in the code repo for reference.
-    model_id: meta-textgeneration-llama-2-7b-f
-    model_version: "3.*"
-    model_name: llama2-7b-f
-    ep_name: llama-2-7b-g5xlarge
-    instance_type: "ml.g5.xlarge"
-    image_uri: '763104351884.dkr.ecr.{region}.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04'
-    deploy: yes
-    instance_count: 1
-    # FMBench comes packaged with multiple deployment scripts, such as scripts for JumpStart
-    # scripts for deploying using DJL DeepSpeed, tensorRT etc. You can also add your own.
-    # See repo for details
-    deployment_script: jumpstart.py
-    # FMBench comes packaged with multiple inference scripts, such as scripts for SageMaker
-    # and Bedrock. You can also add your own. See repo for details
-    inference_script: sagemaker_predictor.py
-    inference_spec:
-      # this should match one of the sections in the inference_parameters section above
-      parameter_set: sagemaker
-    # runs are done for each combination of payload file and concurrency level
-    payload_files:
-    - payload_en_1-500.jsonl
-    - payload_en_500-1000.jsonl
-    - payload_en_1000-2000.jsonl
-    - payload_en_2000-3000.jsonl
-    # concurrency level refers to number of requests sent in parallel to an endpoint
-    # the next set of requests is sent once responses for all concurrent requests have
-    # been received.
-    concurrency_levels:
-    - 1
-    - 2
-    - 4
-    # Added for models that require accepting a EULA
-    accept_eula: true
-    # Environment variables to be passed to the container
-    # this is not a fixed list, you can add more parameters as applicable.
-    env:
-      SAGEMAKER_PROGRAM: "inference.py"
-      ENDPOINT_SERVER_TIMEOUT: "3600"
-      MODEL_CACHE_ROOT: "/opt/ml/model"
-      SAGEMAKER_ENV: "1"
-      HF_MODEL_ID: "/opt/ml/model"
-      MAX_INPUT_LENGTH: "4095"
-      MAX_TOTAL_TOKENS: "4096"
-      SM_NUM_GPUS: "1"
-      SAGEMAKER_MODEL_SERVER_WORKERS: "1"
-
-  - name: llama2-7b-g5.2xlarge-huggingface-pytorch-tgi-inference-2.0.1-tgi1.1.0
-    model_id: meta-textgeneration-llama-2-7b-f
-    model_version: "3.*"
-    model_name: llama2-7b-f
-    ep_name: llama-2-7b-g5-2xlarge
-    instance_type: "ml.g5.2xlarge"
-    image_uri: '763104351884.dkr.ecr.{region}.amazonaws.com/huggingface-pytorch-tgi-inference:2.0.1-tgi1.1.0-gpu-py39-cu118-ubuntu20.04'
-    deploy: yes
-    instance_count: 1
-    deployment_script: jumpstart.py
-    inference_script: sagemaker_predictor.py
-    inference_spec:
-      parameter_set: sagemaker
-    payload_files:
-    - payload_en_1-500.jsonl
-    - payload_en_500-1000.jsonl
-    - payload_en_1000-2000.jsonl
-    - payload_en_2000-3000.jsonl
-
-    concurrency_levels:
-    - 1
-    - 2
-    - 4
-
-    accept_eula: true
-    env:
-      SAGEMAKER_PROGRAM: "inference.py"
-      ENDPOINT_SERVER_TIMEOUT: "3600"
-      MODEL_CACHE_ROOT: "/opt/ml/model"
-      SAGEMAKER_ENV: "1"
-      HF_MODEL_ID: "/opt/ml/model"
-      MAX_INPUT_LENGTH: "4095"
-      MAX_TOTAL_TOKENS: "4096"
-      SM_NUM_GPUS: "1"
-      SAGEMAKER_MODEL_SERVER_WORKERS: "1"
-
-# parameters related to how the final report is generated
-report:
-  # constraints for latency, cost and error rate
-  # an experiment is considered successful or eligible for
-  # selection for a use-case if it satisfies all of the following
-  # constraints. Experiments are scored as per this criteria
-  # higher score is better (see 4_model_metric_analysis.ipynb score_run function)
-  latency_budget: 2
-  cost_per_10k_txn_budget: 20
-  error_rate_budget: 0
-
-  # other misc reporting parameters, see 4_model_metric_analysis.ipynb
-  # for more information
-  per_inference_request_file: per_inference_request_results.csv
-  all_metrics_file: all_metrics.csv
-  txn_count_for_showing_cost: 10000
-  v_shift_w_single_instance: 0.025
-  v_shift_w_gt_one_instance: 0.025
-```
+A simple config file with key parameters annotated is included in this repo, see [`config.yml`](./config.yml). This file benchmarks the performance of Llama2-7b on an `ml.g5.xlarge` instance and an `ml.g5.2xlarge` instance.
 
 ## 🚨 Benchmarking Llama3 on Amazon SageMaker 🚨
 
````
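For orientation, the sketch below shows one way to consume a config like the one removed above (now kept in [`config.yml`](./config.yml)). It is not FMBench code: the use of PyYAML, the `load_config` helper, and every substitution value are assumptions made for illustration; FMBench fills in the templatized fields (`{region}`, `{role_arn}`, `{write_bucket}`, `{read_bucket}`) itself when it runs.

```python
# Illustrative sketch only (not part of FMBench): fill in the templatized fields of a
# config like config.yml and print what a run would do. PyYAML and the example
# substitution values below are assumptions made for this sketch.
import yaml  # pip install pyyaml


def load_config(path: str, **substitutions: str) -> dict:
    """Read the raw config text, replace templatized placeholders such as
    {region} or {write_bucket}, then parse the result as YAML."""
    with open(path) as f:
        raw = f.read()
    for key, value in substitutions.items():
        raw = raw.replace("{" + key + "}", value)
    return yaml.safe_load(raw)


if __name__ == "__main__":
    cfg = load_config(
        "config.yml",
        region="us-east-1",  # example values, not FMBench defaults
        role_arn="arn:aws:iam::111122223333:role/example-sagemaker-role",
        write_bucket="example-fmbench-write-bucket",
        read_bucket="example-fmbench-read-bucket",
    )
    # Notebooks the run would execute (run_steps values such as `yes` parse as booleans).
    enabled = [step for step, run in cfg["run_steps"].items() if run]
    print("steps to run:", enabled)
    # Model/instance combinations that would be benchmarked.
    for exp in cfg["experiments"]:
        print(exp["model_id"], "on", exp["instance_type"])
```

Against the config shown in the diff, this would list the six enabled notebooks and the two `ml.g5` experiments.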

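The `concurrency_levels` comments in the config describe a batch-at-a-time protocol: N requests are sent in parallel, and the next batch goes out only after all N responses have been received. The sketch below only illustrates that behavior; `invoke_endpoint` is a hypothetical placeholder rather than FMBench's inference script, and the local payload file path is an assumption.

```python
# Illustration of the concurrency_levels behavior described in the config comments.
# Not FMBench code: invoke_endpoint() is a placeholder for a real endpoint call, and
# the payload file is assumed to be available locally.
import json
import time
from concurrent.futures import ThreadPoolExecutor


def invoke_endpoint(payload: dict) -> dict:
    """Placeholder for invoking the deployed endpoint with one prompt payload."""
    time.sleep(0.1)  # stand-in for network + model latency
    return {"generated_text": "..."}


def run_at_concurrency(prompts: list, concurrency: int) -> list:
    """Send `concurrency` requests in parallel and wait for the whole batch to
    finish before sending the next batch; return per-batch latencies."""
    batch_latencies = []
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for i in range(0, len(prompts), concurrency):
            batch = prompts[i : i + concurrency]
            start = time.perf_counter()
            list(pool.map(invoke_endpoint, batch))  # blocks until every response arrives
            batch_latencies.append(time.perf_counter() - start)
    return batch_latencies


if __name__ == "__main__":
    # Payload files such as payload_en_1-500.jsonl are JSON Lines, one prompt per line.
    with open("payload_en_1-500.jsonl") as f:
        prompts = [json.loads(line) for line in f]
    for level in (1, 2, 4):  # the concurrency_levels listed in the config
        latencies = run_at_concurrency(prompts, level)
        print(f"concurrency={level}: mean batch latency {sum(latencies)/len(latencies):.2f}s")
```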