<!--
Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Ensemble Model Quick Start

The steps below will guide you through using Model Analyzer in Docker mode to profile and analyze a simple ensemble model: `ensemble_add_sub`.

## `Step 1:` Download the ensemble model `ensemble_add_sub` and its composing models `add` and `sub`

---

**1. Create a new directory and enter it**

```
mkdir <new_dir> && cd <new_dir>
```

**2. Start a git repository**

```
git init && git remote add -f origin https://github.com/triton-inference-server/model_analyzer.git
```

**3. Enable sparse checkout, and download the examples directory, which contains the `ensemble_add_sub`, `add`, and `sub` models**

```
git config core.sparseCheckout true && \
echo 'examples' >> .git/info/sparse-checkout && \
git pull origin main
```
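
To confirm that the sparse checkout pulled the example models, you can list the quick-start directory. This is an optional sanity check; the exact contents may vary between releases:

```
# The quick-start model repository should contain the ensemble_add_sub, add, and sub model directories
ls examples/quick-start
```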

**4. Add a version directory to `ensemble_add_sub`**

```
mkdir examples/quick-start/ensemble_add_sub/1
```
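
Triton requires every model in a repository to contain at least one numeric version subdirectory, and the ensemble model is downloaded with only a `config.pbtxt` (ensemble models have no model file), so the empty `1` directory has to be created by hand. A quick check that the layout is in place, assuming the paths used above:

```
# Should list the new "1" version directory alongside config.pbtxt
ls examples/quick-start/ensemble_add_sub
```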

## `Step 2:` Pull and Run the SDK Container

---

**1. Pull the SDK container:**

```
docker pull nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**2. Run the SDK container**

```
docker run -it --gpus 1 \
  --shm-size 1G \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
  -v <path-to-output-model-repo>:<path-to-output-model-repo> \
  --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**Replace** `<path-to-output-model-repo>` with the **_absolute path_** to the directory where the output model repository will be located. This ensures the Triton SDK container has access to the model config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (otherwise Tritonserver cannot load the models)<br><br>
**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you may need to increase the shared memory size accordingly<br><br>
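
As a concrete, purely illustrative example, if you plan to put the output model repository at `/home/user/output_models`, the run command would mount that path identically on the host and container sides:

```
# /home/user/output_models is a placeholder; substitute your own absolute path
docker run -it --gpus 1 \
  --shm-size 1G \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
  -v /home/user/output_models:/home/user/output_models \
  --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```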

## `Step 3:` Profile the `ensemble_add_sub` model

---

The [examples/quick-start](../examples/quick-start) directory is an example [Triton Model Repository](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md) that contains the ensemble model `ensemble_add_sub`, which calculates the sum and difference of two inputs using the `add` and `sub` models.
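
If you are curious how the ensemble is wired together, you can inspect its configuration. A Triton ensemble declares `platform: "ensemble"` and an `ensemble_scheduling` section that routes the ensemble's inputs through the composing `add` and `sub` models; the exact tensor names are defined in the file itself:

```
cat examples/quick-start/ensemble_add_sub/config.pbtxt
```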

Run the Model Analyzer `profile` subcommand inside the container with:

```
model-analyzer profile \
  --model-repository <path-to-examples-quick-start> \
  --profile-models ensemble_add_sub \
  --triton-launch-mode=docker --triton-docker-shm-size=1G \
  --output-model-repository-path <path-to-output-model-repo>/<output_dir> \
  --export-path profile_results
```

**Important:** You must specify an `<output_dir>` subdirectory. You cannot have `--output-model-repository-path` point directly to `<path-to-output-model-repo>`

**Important:** If you already ran this earlier in the container, you can use the `--override-output-model-repository` option to overwrite the earlier results.

**Important:** All models must be in the same repository
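
As a filled-in (hypothetical) example, if the quick-start repository sits at `/home/user/model_analyzer/examples/quick-start` and the output model repository mount from Step 2 is `/home/user/output_models`, the invocation could look like:

```
# Both paths below are placeholders based on the earlier steps; adjust them to your setup
model-analyzer profile \
  --model-repository /home/user/model_analyzer/examples/quick-start \
  --profile-models ensemble_add_sub \
  --triton-launch-mode=docker --triton-docker-shm-size=1G \
  --output-model-repository-path /home/user/output_models/ensemble_run \
  --export-path profile_results
```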

---

Model Analyzer uses the [Quick Search](config_search.md#quick-search-mode) algorithm to profile the ensemble model. After the quick search completes, Model Analyzer sweeps concurrencies for the top three configurations and then creates a summary report and CSV outputs.

Here is an example result summary, run on a Tesla V100 GPU:

You will note that the top model configuration has a higher throughput than the other configurations.

---

The measured data and summary report will be placed inside the `./profile_results` directory, structured as follows:

```
$HOME
|-- model_analyzer
    |-- profile_results
        |-- plots
        |   |-- detailed
        |   |   |-- ensemble_add_sub_config_5
        |   |   |   `-- latency_breakdown.png
        |   |   |-- ensemble_add_sub_config_6
        |   |   |   `-- latency_breakdown.png
        |   |   `-- ensemble_add_sub_config_7
        |   |       `-- latency_breakdown.png
        |   `-- simple
        |       |-- ensemble_add_sub
        |       |   |-- gpu_mem_v_latency.png
        |       |   `-- throughput_v_latency.png
        |       |-- ensemble_add_sub_config_5
        |       |   |-- cpu_mem_v_latency.png
        |       |   |-- gpu_mem_v_latency.png
        |       |   |-- gpu_power_v_latency.png
        |       |   `-- gpu_util_v_latency.png
        |       |-- ensemble_add_sub_config_6
        |       |   |-- cpu_mem_v_latency.png
        |       |   |-- gpu_mem_v_latency.png
        |       |   |-- gpu_power_v_latency.png
        |       |   `-- gpu_util_v_latency.png
        |       `-- ensemble_add_sub_config_7
        |           |-- cpu_mem_v_latency.png
        |           |-- gpu_mem_v_latency.png
        |           |-- gpu_power_v_latency.png
        |           `-- gpu_util_v_latency.png
        |-- reports
        |   |-- detailed
        |   |   |-- ensemble_add_sub_config_5
        |   |   |   `-- detailed_report.pdf
        |   |   |-- ensemble_add_sub_config_6
        |   |   |   `-- detailed_report.pdf
        |   |   `-- ensemble_add_sub_config_7
        |   |       `-- detailed_report.pdf
        |   `-- summaries
        |       `-- ensemble_add_sub
        |           `-- result_summary.pdf
        `-- results
            |-- metrics-model-gpu.csv
            |-- metrics-model-inference.csv
            `-- metrics-server-only.csv
```
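
The CSV files under `results` can be opened in a spreadsheet or viewed directly in the terminal; a minimal sketch, assuming the default export path and file names shown above:

```
# Pretty-print the per-model inference metrics CSV as an aligned table
column -s, -t < profile_results/results/metrics-model-inference.csv | less -S
```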

**Note:** The configurations above (ensemble_add_sub_config_5, ensemble_add_sub_config_6, and ensemble_add_sub_config_7) were generated as the top configurations when profiling on a single Tesla V100 GPU. Running on multiple GPUs or on different GPU models may result in different top configurations.