<!--
Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Ensemble Model Quick Start

The steps below will guide you through using Model Analyzer in Docker mode to profile and analyze a simple ensemble model: `ensemble_add_sub`.

## `Step 1:` Download the ensemble model `ensemble_add_sub` and its composing models `add` and `sub`

---

**1. Create a new directory and enter it**

```
mkdir <new_dir> && cd <new_dir>
```

**2. Start a git repository**

```
git init && git remote add -f origin https://github.com/triton-inference-server/model_analyzer.git
```

**3. Enable sparse checkout, and download the examples directory, which contains the `ensemble_add_sub`, `add`, and `sub` models**

```
git config core.sparseCheckout true && \
echo 'examples' >> .git/info/sparse-checkout && \
git pull origin main
```
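
To confirm that the sparse checkout pulled the example models, you can list the quick-start directory. This is an optional sanity check; the exact contents may vary between releases:

```
# The quick-start model repository should contain the ensemble_add_sub, add, and sub model directories
ls examples/quick-start
```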

**4. Add a version directory to `ensemble_add_sub`**

```
mkdir examples/quick-start/ensemble_add_sub/1
```
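
Triton requires every model in a repository to contain at least one numeric version subdirectory, and the ensemble model is downloaded with only a `config.pbtxt` (ensemble models have no model file), so the empty `1` directory has to be created by hand. A quick check that the layout is in place, assuming the paths used above:

```
# Should list the new "1" version directory alongside config.pbtxt
ls examples/quick-start/ensemble_add_sub
```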

## `Step 2:` Pull and Run the SDK Container

---

**1. Pull the SDK container:**

```
docker pull nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**2. Run the SDK container**

```
docker run -it --gpus 1 \
  --shm-size 1G \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
  -v <path-to-output-model-repo>:<path-to-output-model-repo> \
  --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```

**Replace** `<path-to-output-model-repo>` with the **_absolute path_** to the directory where the output model repository will be located. This ensures the Triton SDK container has access to the model config variants that Model Analyzer creates.<br><br>
**Important:** You must ensure the absolute paths are identical on both sides of the mounts (otherwise Tritonserver cannot load the models)<br><br>
**Important:** The example above uses a single GPU. If you are running on multiple GPUs, you may need to increase the shared memory size accordingly<br><br>
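
As a concrete, purely illustrative example, if you plan to put the output model repository at `/home/user/output_models`, the run command would mount that path identically on the host and container sides:

```
# /home/user/output_models is a placeholder; substitute your own absolute path
docker run -it --gpus 1 \
  --shm-size 1G \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
  -v /home/user/output_models:/home/user/output_models \
  --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
```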

## `Step 3:` Profile the `ensemble_add_sub` model

---

The [examples/quick-start](../examples/quick-start) directory is an example [Triton Model Repository](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_repository.md) that contains the ensemble model `ensemble_add_sub`, which calculates the sum and difference of two inputs using the `add` and `sub` models.
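
If you are curious how the ensemble is wired together, you can inspect its configuration. A Triton ensemble declares `platform: "ensemble"` and an `ensemble_scheduling` section that routes the ensemble's inputs through the composing `add` and `sub` models; the exact tensor names are defined in the file itself:

```
cat examples/quick-start/ensemble_add_sub/config.pbtxt
```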

Run the Model Analyzer `profile` subcommand inside the container with:

```
model-analyzer profile \
  --model-repository <path-to-examples-quick-start> \
  --profile-models ensemble_add_sub \
  --triton-launch-mode=docker --triton-docker-shm-size=1G \
  --output-model-repository-path <path-to-output-model-repo>/<output_dir> \
  --export-path profile_results
```

**Important:** You must specify an `<output_dir>` subdirectory. You cannot have `--output-model-repository-path` point directly to `<path-to-output-model-repo>`

**Important:** If you already ran this earlier in the container, you can use the `--override-output-model-repository` option to overwrite the earlier results.

**Important:** All models must be in the same repository
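
As a filled-in (hypothetical) example, if the quick-start repository sits at `/home/user/model_analyzer/examples/quick-start` and the output model repository mount from Step 2 is `/home/user/output_models`, the invocation could look like:

```
# Both paths below are placeholders based on the earlier steps; adjust them to your setup
model-analyzer profile \
  --model-repository /home/user/model_analyzer/examples/quick-start \
  --profile-models ensemble_add_sub \
  --triton-launch-mode=docker --triton-docker-shm-size=1G \
  --output-model-repository-path /home/user/output_models/ensemble_run \
  --export-path profile_results
```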

---

Model Analyzer uses the [Quick Search](config_search.md#quick-search-mode) algorithm to profile the ensemble model. After the quick search completes, Model Analyzer sweeps concurrencies for the top three configurations and then creates a summary report and CSV outputs.

Here is an example result summary, run on a Tesla V100 GPU:

You will note that the top model configuration has a higher throughput than the other configurations.

---

The measured data and summary report will be placed inside the `./profile_results` directory, structured as follows:

```
$HOME
|-- model_analyzer
    |-- profile_results
        |-- plots
        |   |-- detailed
        |   |   |-- ensemble_add_sub_config_5
        |   |   |   `-- latency_breakdown.png
        |   |   |-- ensemble_add_sub_config_6
        |   |   |   `-- latency_breakdown.png
        |   |   `-- ensemble_add_sub_config_7
        |   |       `-- latency_breakdown.png
        |   `-- simple
        |       |-- ensemble_add_sub
        |       |   |-- gpu_mem_v_latency.png
        |       |   `-- throughput_v_latency.png
        |       |-- ensemble_add_sub_config_5
        |       |   |-- cpu_mem_v_latency.png
        |       |   |-- gpu_mem_v_latency.png
        |       |   |-- gpu_power_v_latency.png
        |       |   `-- gpu_util_v_latency.png
        |       |-- ensemble_add_sub_config_6
        |       |   |-- cpu_mem_v_latency.png
        |       |   |-- gpu_mem_v_latency.png
        |       |   |-- gpu_power_v_latency.png
        |       |   `-- gpu_util_v_latency.png
        |       `-- ensemble_add_sub_config_7
        |           |-- cpu_mem_v_latency.png
        |           |-- gpu_mem_v_latency.png
        |           |-- gpu_power_v_latency.png
        |           `-- gpu_util_v_latency.png
        |-- reports
        |   |-- detailed
        |   |   |-- ensemble_add_sub_config_5
        |   |   |   `-- detailed_report.pdf
        |   |   |-- ensemble_add_sub_config_6
        |   |   |   `-- detailed_report.pdf
        |   |   `-- ensemble_add_sub_config_7
        |   |       `-- detailed_report.pdf
        |   `-- summaries
        |       `-- ensemble_add_sub
        |           `-- result_summary.pdf
        `-- results
            |-- metrics-model-gpu.csv
            |-- metrics-model-inference.csv
            `-- metrics-server-only.csv
```
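
The CSV files under `results` can be opened in a spreadsheet or viewed directly in the terminal; a minimal sketch, assuming the default export path and file names shown above:

```
# Pretty-print the per-model inference metrics CSV as an aligned table
column -s, -t < profile_results/results/metrics-model-inference.csv | less -S
```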

**Note:** The configurations above (ensemble_add_sub_config_5, ensemble_add_sub_config_6, and ensemble_add_sub_config_7) were generated as the top configurations when profiling on a single Tesla V100 GPU. Running on multiple GPUs or on different GPU models may result in different top configurations.