Commit 7f06f57
Changes to remove remote mode and abs path restrictions (#758)
1 parent 9f029a5 commit 7f06f57

5 files changed (+12 / -46 lines)

5 files changed

+12
-46
lines changed

docs/bls_quick_start.md

Lines changed: 2 additions & 6 deletions
@@ -59,14 +59,10 @@ docker run -it --gpus 1 \
  --shm-size 2G \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
- -v <path-to-output-model-repo>:<path-to-output-model-repo> \
  --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
  ```

- **Replacing** `<path-to-output-model-repo>` with the
- **_absolute_ _path_** to the directory where the output model repository will be located. This ensures the Triton SDK container has access to the model config variants that Model Analyzer creates.<br><br>
- **Important:** You must ensure the absolutes paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>
- **Important:** The example above uses a single GPU. If you are running on multiple GPUs, you need to increase the shared memory size accordingly<br><br>
+ **Important:** The example above uses a single GPU. If you are running on multiple GPUs, you may need to increase the shared memory size accordingly<br><br>

  ## `Step 3:` Profile the `bls` model

@@ -168,4 +164,4 @@ $HOME
  `-- metrics-server-only.csv
  ```

- **Note:** Above configurations, bls_config_7, bls_config_8, and bls_config_9 are generated as the top configurations when running profiling on a single Tesla V100 GPU. However, running on multiple GPUs or different model GPUs may result in different top configurations.
+ **Note:** Above configurations, bls_config_7, bls_config_8, and bls_config_9 are generated as the top configurations when running profiling on a single Tesla V100 GPU. However, running on multiple GPUs or different model GPUs may result in different top configurations.

docs/ensemble_quick_start.md

Lines changed: 0 additions & 5 deletions
@@ -65,13 +65,9 @@ docker run -it --gpus 1 \
  --shm-size 1G \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
- -v <path-to-output-model-repo>:<path-to-output-model-repo> \
  --net=host nvcr.io/nvidia/tritonserver:23.09-py3-sdk
  ```

- **Replacing** `<path-to-output-model-repo>` with the
- **_absolute_ _path_** to the directory where the output model repository will be located. This ensures the Triton SDK container has access to the model config variants that Model Analyzer creates.<br><br>
- **Important:** You must ensure the absolutes paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>
  **Important:** The example above uses a single GPU. If you are running on multiple GPUs, you may need to increase the shared memory size accordingly<br><br>

  ## `Step 3:` Profile the `ensemble_add_sub` model

@@ -101,7 +97,6 @@ model-analyzer profile \

  The Model analyzer uses [Quick Search](config_search.md#quick-search-mode) algorithm for profiling the Ensemble model. After the quick search is completed, Model Analyzer will then sweep concurrencies for the top three configurations and then create a summary report and CSV outputs.

-
  Here is an example result summary, run on a Tesla V100 GPU:

  ![Result Summary Top](../examples/ensemble_result_summary_top.jpg)
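For context on the `model-analyzer profile \` hunk above, here is a minimal sketch of the ensemble profiling step it refers to. Only `model-analyzer profile`, the `ensemble_add_sub` model, and Quick Search mode come from this diff; the flag names and paths below are illustrative assumptions based on Model Analyzer's standard CLI, not part of this commit:

```
# Sketch only: profile the ensemble model from inside the SDK container.
# Quick Search is selected automatically for ensemble models, per the text above.
# --model-repository, --profile-models, and --output-model-repository-path are
# assumed standard Model Analyzer flags; paths are placeholders.
model-analyzer profile \
    --model-repository $(pwd)/examples/quick-start \
    --profile-models ensemble_add_sub \
    --output-model-repository-path <path-to-output-model-repo>/output_dir
```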

docs/launch_modes.md

Lines changed: 10 additions & 19 deletions
@@ -13,6 +13,7 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
  -->
+
  # Launch Modes

  Triton Model Analyzer's `profile` subcommand supports four different launch

@@ -25,7 +26,7 @@ Inference Server.
  ### Docker

  | CLI Option | **`--triton-launch-mode=docker`** |
- | - | - |
+ | ---------- | --------------------------------- |

  Note: A full step by step example of docker mode can be found in the [Quick Start Guide](quick_start.md).

@@ -40,16 +41,8 @@ following flags are mandatory for correct behavior:

  Additionally, Model Analyzer uses the `output_model_repository_path` to
  manipulate and store model config variants. When Model Analyzer launches the
- Triton container, it does so as a *sibling container*. The launched Triton
- container will only have access to the host filesystem. **As a result, in the
- docker launch mode, the output model directory will need to be mounted to the
- Model Analyzer docker container at the same absolute path it has outside the
- container.** So you must add the following when you launch the model analyzer
- container as well.
-
- ```
- -v <path-to-output-model-repository>:<path-to-output-model-repository>
- ```
+ Triton container, it does so as a _sibling container_. The launched Triton
+ container will only have access to the host filesystem.

  Finally, when launching model analyzer, the argument `--output-model-repository`
  must be provided as a directory inside `<path-to-output-model-repository>`. This

@@ -65,7 +58,7 @@ Triton SDK Container. You will need Docker installed, though.
  ### Local

  | CLI Option | **`--triton-launch-mode=local`** |
- | - | - |
+ | ---------- | -------------------------------- |

  Local mode is the default mode if no `triton-launch-mode` is specified.

@@ -80,7 +73,7 @@ have a TritonServer executable
  ### C API

  | CLI Option | **`--triton-launch-mode=c_api`** |
- | - | - |
+ | ---------- | -------------------------------- |

  In this mode, Triton server is launched locally via the
  [C_API](https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/inference_protocols.md#in-process-triton-server-api)

@@ -96,19 +89,17 @@ the Model Analyzer is being used.
  The server metrics that Model Analyzer gathers and reports are not available directly
  from the triton server when running in C-API mode. Instead, Model Analyzer will attempt to
  gather this information itself. This can lead to less precise results, and will generally result
- in GPU utilization and power numbers being underreported.
+ in GPU utilization and power numbers being under-reported.

  ### Remote

  | CLI Option | **`--triton-launch-mode=remote`** |
- | - | - |
+ | ---------- | --------------------------------- |

  This mode is beneficial when you want to use an already running Triton Inference
  Server. You may provide the URLs for the Triton instance's HTTP or GRPC endpoint
  depending on your chosen client protocol using the `--triton-grpc-endpoint`, and
  `--triton-http-endpoint` flags. You should also make sure that same GPUs are
  available to the Inference Server and Model Analyzer and they are on the same
- machine. Model Analyzer does not currently support profiling remote GPUs. Triton
- Server in this mode needs to be launched with `--model-control-mode explicit`
- flag to support loading/unloading of the models. The model parameters cannot be
- changed in remote mode, though.
+ machine. Triton Server in this mode needs to be launched with `--model-control-mode explicit`
+ flag to support loading/unloading of the models.
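As a hedged sketch of the remote launch mode described in the hunk above: `--model-control-mode explicit`, `--triton-launch-mode=remote`, and the `--triton-http-endpoint`/`--triton-grpc-endpoint` flags come from the text; the endpoint value, model repository path, and model name below are illustrative assumptions:

```
# Start Triton yourself, with explicit model control so Model Analyzer can
# load/unload the model config variants it generates.
tritonserver \
    --model-repository=/path/to/model/repository \
    --model-control-mode=explicit

# Point Model Analyzer at the already-running server (remote mode).
# localhost:8000 is Triton's default HTTP port and is an assumption here;
# the model repository path and model name are placeholders.
model-analyzer profile \
    --model-repository /path/to/model/repository \
    --profile-models add_sub \
    --triton-launch-mode=remote \
    --triton-http-endpoint localhost:8000
```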

docs/mm_quick_start.md

Lines changed: 0 additions & 8 deletions
@@ -58,17 +58,9 @@ docker pull nvcr.io/nvidia/tritonserver:23.08-py3-sdk
  docker run -it --gpus all \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
- -v <path-to-output-model-repo>:<path-to-output-model-repo> \
  --net=host nvcr.io/nvidia/tritonserver:23.08-py3-sdk
  ```

- **Replacing** `<path-to-output-model-repo>` with the
- **_absolute_ _path_** to the directory where the output model repository
- will be located.
- This ensures the Triton SDK container has access to the model
- config variants that Model Analyzer creates.<br><br>
- **Important:** You must ensure the absolutes paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>
-
  ## `Step 3:` Profile both models concurrently

  ---

docs/quick_start.md

Lines changed: 0 additions & 8 deletions
@@ -58,17 +58,9 @@ docker pull nvcr.io/nvidia/tritonserver:23.08-py3-sdk
  docker run -it --gpus all \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/examples/quick-start:$(pwd)/examples/quick-start \
- -v <path-to-output-model-repo>:<path-to-output-model-repo> \
  --net=host nvcr.io/nvidia/tritonserver:23.08-py3-sdk
  ```

- **Replacing** `<path-to-output-model-repo>` with the
- **_absolute_ _path_** to the directory where the output model repository
- will be located.
- This ensures the Triton SDK container has access to the model
- config variants that Model Analyzer creates.<br><br>
- **Important:** You must ensure the absolutes paths are identical on both sides of the mounts (or else Tritonserver cannot load the model)<br><br>
-
  ## `Step 3:` Profile the `add_sub` model

  ---
