
Commit 65229d0

Add Quick start guides for Ensemble and BLS (#755)
* Add Ensemble model and BLS model quick start guide
* Update ensemble_quick_start.md
* Add bls_quick_start.md
* Update newly added quick start guides
* Update BLS and Ensemble quick start guides
* new line
* new line
* Pre commit error fixes
* Pre-commit errors fix
* Modifications
1 parent 281f6cd commit 65229d0

File tree

8 files changed: +377 additions, -0 deletions


README.md

Lines changed: 8 additions & 0 deletions
@@ -76,6 +76,14 @@ See the [Single Model Quick Start](docs/quick_start.md) for a guide on how to us
### **Multi Model**

See the [Multi-model Quick Start](docs/mm_quick_start.md) for a guide on how to use Model Analyzer to profile, analyze and report on two models running concurrently on the same GPU.

### **Ensemble Model**

See the [Ensemble Model Quick Start](docs/ensemble_quick_start.md) for a guide on how to use Model Analyzer to profile, analyze and report on a simple Ensemble model.

### **BLS Model**

See the [BLS Model Quick Start](docs/bls_quick_start.md) for a guide on how to use Model Analyzer to profile, analyze and report on a simple BLS model.

<br><br>

# Documentation

docs/bls_quick_start.md

Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
<!--
Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# BLS Model Quick Start

The steps below will guide you through using Model Analyzer in Docker mode to profile and analyze a simple BLS model: `bls`.

## `Step 1:` Download the BLS model `bls` and its composing model `add`

---

**1. Create a new directory and enter it**

```
mkdir <new_dir> && cd <new_dir>
```

**2. Start a git repository**

```
git init && git remote add -f origin https://github.com/triton-inference-server/model_analyzer.git
```

**3. Enable sparse checkout, and download the examples directory, which contains the `bls` and `add` models**

```
git config core.sparseCheckout true && \
echo 'examples' >> .git/info/sparse-checkout && \
git pull origin main
```
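
If the sparse checkout succeeded, the example model repository now exists locally. As an optional sanity check (the exact listing depends on the current contents of the repository's `examples` directory):

```
# The quick-start model repository should now include the BLS model
# and its composing model, among the other example models.
ls examples/quick-start
```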

## `Step 2:` Pull and Run the SDK Container

---

**1. Pull the SDK container:**

```
docker pull nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**2. Run the SDK container**

```
docker run -it --gpus 1 \
  --shm-size 2G \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
  -v <path-to-output-model-repo>:<path-to-output-model-repo> \
  --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**Replace** `<path-to-output-model-repo>` with the **_absolute path_** to the directory where the output model repository will be located. This ensures the Triton SDK container has access to the model config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>
**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you need to increase the shared memory size accordingly<br><br>

## `Step 3:` Profile the `bls` model

---

The [examples/quick-start](../examples/quick-start) directory is an example [Triton Model Repository](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md) that contains the BLS model `bls`, which calculates the sum of two inputs by calling the `add` model.
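
"BLS" (Business Logic Scripting) means the `bls` model's Python backend code issues a nested inference request to a composing model at runtime. The sketch below is illustrative only, not the shipped [`bls`](../examples/quick-start/bls) implementation; the tensor and output names ("INPUT0", "OUTPUT", etc.) are assumptions and may differ from the actual `model.py`:

```
# Sketch of a BLS execute() using the Triton Python backend API.
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            # "MODEL_NAME" selects the composing model to call ("add" here).
            model_name = pb_utils.get_input_tensor_by_name(
                request, "MODEL_NAME").as_numpy()[0].decode()
            input0 = pb_utils.get_input_tensor_by_name(request, "INPUT0")
            input1 = pb_utils.get_input_tensor_by_name(request, "INPUT1")

            # Issue the nested (BLS) inference request to the composing model.
            infer_request = pb_utils.InferenceRequest(
                model_name=model_name,
                requested_output_names=["OUTPUT"],
                inputs=[input0, input1])
            infer_response = infer_request.exec()
            if infer_response.has_error():
                raise pb_utils.TritonModelException(
                    infer_response.error().message())

            # Forward the composing model's output as this model's response.
            output = pb_utils.get_output_tensor_by_name(
                infer_response, "OUTPUT")
            responses.append(
                pb_utils.InferenceResponse(output_tensors=[output]))
        return responses
```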

An example Model Analyzer YAML config that performs a BLS model search:

```
model_repository: <path-to-examples-quick-start>
profile_models:
  - bls
bls_composing_models: add
perf_analyzer_flags:
  input-data: <path-to-examples-bls_input_data.json>
triton_launch_mode: docker
triton_docker_shm_size: 2G
output_model_repository_path: <path-to-output-model-repo>/<output_dir>
export_path: profile_results
```

**Important:** You must specify an `<output_dir>` subdirectory. You cannot have `output_model_repository_path` point directly to `<path-to-output-model-repo>`

**Important:** If you already ran this earlier in the container, you can overwrite the earlier results by adding the `override_output_model_repository: true` field to the YAML file.

**Important**: All models must be in the same repository

**Important:** The [`bls`](../examples/quick-start/bls) model takes "MODEL_NAME" as one of its inputs. The input data JSON file must supply "add" as the value of "MODEL_NAME" for this example to function (see `examples/bls_input_data.json`). Otherwise, Perf Analyzer will produce random data for "MODEL_NAME", resulting in failed inferences.

Run the Model Analyzer `profile` subcommand inside the container with:

```
model-analyzer profile -f /path/to/config.yml
```
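
Most of these YAML options also have CLI-flag counterparts (the Ensemble quick start uses that form). Below is a sketch of the flag spelling, under the assumption that your Model Analyzer version exposes the same option names as flags; the `perf_analyzer_flags` section (here, `input-data`) is set through the YAML config, which is why this guide uses the `-f` form:

```
# Sketch only: flag names assumed to mirror the YAML options above.
model-analyzer profile \
    --model-repository <path-to-examples-quick-start> \
    --profile-models bls \
    --bls-composing-models add \
    --triton-launch-mode=docker --triton-docker-shm-size=2G \
    --output-model-repository-path <path-to-output-model-repo>/<output_dir> \
    --export-path profile_results
```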

---

Model Analyzer uses the [Quick Search](config_search.md#quick-search-mode) algorithm to profile the BLS model. After the quick search completes, Model Analyzer sweeps concurrencies for the top three configurations, then creates a summary report and CSV outputs.

Here is an example result summary, run on a Tesla V100 GPU:

![Result Summary Top](../examples/bls_result_summary_top.jpg)
![Result Summary Table](../examples/bls_result_summary_table.jpg)

You will note that the top model configuration has a higher throughput than the other configurations.

---

The measured data and summary report will be placed inside the
`./profile_results` directory. The directory will be structured as follows.

```
$HOME
|-- model_analyzer
    |-- profile_results
        |-- perf_analyzer_error.log
        |-- plots
        |   |-- detailed
        |   |   |-- bls_config_7
        |   |   |   `-- latency_breakdown.png
        |   |   |-- bls_config_8
        |   |   |   `-- latency_breakdown.png
        |   |   `-- bls_config_9
        |   |       `-- latency_breakdown.png
        |   `-- simple
        |       |-- bls
        |       |   |-- gpu_mem_v_latency.png
        |       |   `-- throughput_v_latency.png
        |       |-- bls_config_7
        |       |   |-- cpu_mem_v_latency.png
        |       |   |-- gpu_mem_v_latency.png
        |       |   |-- gpu_power_v_latency.png
        |       |   `-- gpu_util_v_latency.png
        |       |-- bls_config_8
        |       |   |-- cpu_mem_v_latency.png
        |       |   |-- gpu_mem_v_latency.png
        |       |   |-- gpu_power_v_latency.png
        |       |   `-- gpu_util_v_latency.png
        |       `-- bls_config_9
        |           |-- cpu_mem_v_latency.png
        |           |-- gpu_mem_v_latency.png
        |           |-- gpu_power_v_latency.png
        |           `-- gpu_util_v_latency.png
        |-- reports
        |   |-- detailed
        |   |   |-- bls_config_7
        |   |   |   `-- detailed_report.pdf
        |   |   |-- bls_config_8
        |   |   |   `-- detailed_report.pdf
        |   |   `-- bls_config_9
        |   |       `-- detailed_report.pdf
        |   `-- summaries
        |       `-- bls
        |           `-- result_summary.pdf
        `-- results
            |-- metrics-model-gpu.csv
            |-- metrics-model-inference.csv
            `-- metrics-server-only.csv
```

**Note:** The configurations above (bls_config_7, bls_config_8, and bls_config_9) were generated as the top configurations when profiling on a single Tesla V100 GPU. Running on multiple GPUs or on different GPU models may produce different top configurations.

docs/ensemble_quick_start.md

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
<!--
Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Ensemble Model Quick Start

The steps below will guide you through using Model Analyzer in Docker mode to profile and analyze a simple ensemble model: `ensemble_add_sub`.

## `Step 1:` Download the ensemble model `ensemble_add_sub` and composing models `add`, `sub`

---

**1. Create a new directory and enter it**

```
mkdir <new_dir> && cd <new_dir>
```

**2. Start a git repository**

```
git init && git remote add -f origin https://github.com/triton-inference-server/model_analyzer.git
```

**3. Enable sparse checkout, and download the examples directory, which contains the `ensemble_add_sub`, `add` and `sub` models**

```
git config core.sparseCheckout true && \
echo 'examples' >> .git/info/sparse-checkout && \
git pull origin main
```

**4. Add a version directory to `ensemble_add_sub`**

```
mkdir examples/quick-start/ensemble_add_sub/1
```

## `Step 2:` Pull and Run the SDK Container

---

**1. Pull the SDK container:**

```
docker pull nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**2. Run the SDK container**

```
docker run -it --gpus 1 \
  --shm-size 1G \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
  -v <path-to-output-model-repo>:<path-to-output-model-repo> \
  --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**Replace** `<path-to-output-model-repo>` with the **_absolute path_** to the directory where the output model repository will be located. This ensures the Triton SDK container has access to the model config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>
**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you may need to increase the shared memory size accordingly<br><br>

## `Step 3:` Profile the `ensemble_add_sub` model

---

The [examples/quick-start](../examples/quick-start) directory is an example [Triton Model Repository](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md) that contains the ensemble model `ensemble_add_sub`, which calculates the sum and difference of two inputs using the `add` and `sub` models.
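
For context, an ensemble model in Triton is defined entirely by a `config.pbtxt` whose `ensemble_scheduling` section wires the composing models together; Triton then executes the pipeline server-side. The sketch below is illustrative only; the actual [`ensemble_add_sub`](../examples/quick-start/ensemble_add_sub) config may differ in tensor names, shapes, and data types:

```
# Illustrative ensemble config sketch; names and dims are assumptions.
name: "ensemble_add_sub"
platform: "ensemble"
input [
  { name: "INPUT0" data_type: TYPE_FP32 dims: [ 4 ] },
  { name: "INPUT1" data_type: TYPE_FP32 dims: [ 4 ] }
]
output [
  { name: "OUTPUT0" data_type: TYPE_FP32 dims: [ 4 ] },
  { name: "OUTPUT1" data_type: TYPE_FP32 dims: [ 4 ] }
]
ensemble_scheduling {
  step [
    {
      # First step: "add" produces the sum.
      model_name: "add"
      model_version: -1
      input_map { key: "INPUT0" value: "INPUT0" }
      input_map { key: "INPUT1" value: "INPUT1" }
      output_map { key: "OUTPUT" value: "OUTPUT0" }
    },
    {
      # Second step: "sub" produces the difference.
      model_name: "sub"
      model_version: -1
      input_map { key: "INPUT0" value: "INPUT0" }
      input_map { key: "INPUT1" value: "INPUT1" }
      output_map { key: "OUTPUT" value: "OUTPUT1" }
    }
  ]
}
```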

Run the Model Analyzer `profile` subcommand inside the container with:

```
model-analyzer profile \
    --model-repository <path-to-examples-quick-start> \
    --profile-models ensemble_add_sub \
    --triton-launch-mode=docker --triton-docker-shm-size=1G \
    --output-model-repository-path <path-to-output-model-repo>/<output_dir> \
    --export-path profile_results
```
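
Equivalently, if you prefer a config file (as the BLS quick start uses), the same options can be placed in a YAML file and run with `model-analyzer profile -f config.yml`. A sketch, with option names mirroring the flags above:

```
model_repository: <path-to-examples-quick-start>
profile_models:
  - ensemble_add_sub
triton_launch_mode: docker
triton_docker_shm_size: 1G
output_model_repository_path: <path-to-output-model-repo>/<output_dir>
export_path: profile_results
```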

**Important:** You must specify an `<output_dir>` subdirectory. You cannot have `--output-model-repository-path` point directly to `<path-to-output-model-repo>`

**Important:** If you already ran this earlier in the container, you can use the `--override-output-model-repository` option to overwrite the earlier results.

**Important**: All models must be in the same repository

---

Model Analyzer uses the [Quick Search](config_search.md#quick-search-mode) algorithm to profile the ensemble model. After the quick search completes, Model Analyzer sweeps concurrencies for the top three configurations, then creates a summary report and CSV outputs.

Here is an example result summary, run on a Tesla V100 GPU:

![Result Summary Top](../examples/ensemble_result_summary_top.jpg)
![Result Summary Table](../examples/ensemble_result_summary_table.jpg)

You will note that the top model configuration has a higher throughput than the other configurations.

---

The measured data and summary report will be placed inside the
`./profile_results` directory. The directory will be structured as follows.

```
$HOME
|-- model_analyzer
    |-- profile_results
        |-- plots
        |   |-- detailed
        |   |   |-- ensemble_add_sub_config_5
        |   |   |   `-- latency_breakdown.png
        |   |   |-- ensemble_add_sub_config_6
        |   |   |   `-- latency_breakdown.png
        |   |   `-- ensemble_add_sub_config_7
        |   |       `-- latency_breakdown.png
        |   `-- simple
        |       |-- ensemble_add_sub
        |       |   |-- gpu_mem_v_latency.png
        |       |   `-- throughput_v_latency.png
        |       |-- ensemble_add_sub_config_5
        |       |   |-- cpu_mem_v_latency.png
        |       |   |-- gpu_mem_v_latency.png
        |       |   |-- gpu_power_v_latency.png
        |       |   `-- gpu_util_v_latency.png
        |       |-- ensemble_add_sub_config_6
        |       |   |-- cpu_mem_v_latency.png
        |       |   |-- gpu_mem_v_latency.png
        |       |   |-- gpu_power_v_latency.png
        |       |   `-- gpu_util_v_latency.png
        |       `-- ensemble_add_sub_config_7
        |           |-- cpu_mem_v_latency.png
        |           |-- gpu_mem_v_latency.png
        |           |-- gpu_power_v_latency.png
        |           `-- gpu_util_v_latency.png
        |-- reports
        |   |-- detailed
        |   |   |-- ensemble_add_sub_config_5
        |   |   |   `-- detailed_report.pdf
        |   |   |-- ensemble_add_sub_config_6
        |   |   |   `-- detailed_report.pdf
        |   |   `-- ensemble_add_sub_config_7
        |   |       `-- detailed_report.pdf
        |   `-- summaries
        |       `-- ensemble_add_sub
        |           `-- result_summary.pdf
        `-- results
            |-- metrics-model-gpu.csv
            |-- metrics-model-inference.csv
            `-- metrics-server-only.csv
```

**Note:** The configurations above (ensemble_add_sub_config_5, ensemble_add_sub_config_6, and ensemble_add_sub_config_7) were generated as the top configurations when profiling on a single Tesla V100 GPU. Running on multiple GPUs or on different GPU models may produce different top configurations.

examples/bls_input_data.json

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
{
  "data": [
    {
      "MODEL_NAME": [
        "add"
      ],
      "INPUT0": [
        0.74106514,
        0.7371813,
        0.5274665,
        0.13930903
      ],
      "INPUT1": [
        0.7845891,
        0.88089234,
        0.8466405,
        0.55024815
      ]
    }
  ]
}
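
This is the file the BLS quick start's `perf_analyzer_flags: input-data` setting points at: it pins "MODEL_NAME" to "add" so Perf Analyzer does not substitute random bytes for that input. The same file can also drive a standalone Perf Analyzer run against a Triton server that already has the `bls` model loaded (a sketch, assuming default server endpoints):

```
perf_analyzer -m bls --input-data examples/bls_input_data.json
```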
4 more files changed: 3 additions & 0 deletions each (diffs not loaded)
