
Commit 7b2dae6

Add early access documentation for multi-model (#541)
* First pass at adding multi-model documentation
* Further revisions
* Updated based on Tim's review comments
* Adding missing parameters in example
* Changed step 2 to indicate where the tritonserver container actually comes from
* Removing we
1 parent c566d57 commit 7b2dae6

File tree

5 files changed: +84 −56 lines


README.md

Lines changed: 34 additions & 31 deletions
@@ -33,37 +33,40 @@ Triton Inference Server.
 
 ## Features
 
-* [Brute and Quick search](docs/config_search.md): Model Analyzer can
-  help you automatically find the optimal settings for
-  [Max Batch Size](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#maximum-batch-size),
-  [Dynamic Batching](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#dynamic-batcher), and
-  [Instance Group](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups)
-  parameters of your model configuration. Model Analyzer utilizes
-  [Performance Analyzer](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md)
-  to test the model with different concurrency and batch sizes of requests. Using
-  [Manual Config Search](docs/config_search.md#manual-brute-search), you can create manual sweeps for every parameter that can be specified in the model configuration.
-
-* [Detailed and summary reports](docs/report.md): Model Analyzer is able to generate
-  summarized and detailed reports that can help you better understand the trade-offs
-  between different model configurations that can be used for your model.
-
-* [QoS Constraints](docs/config.md#constraint): Constraints can help you
-  filter out the Model Analyzer results based on your QoS requirements. For
-  example, you can specify a latency budget to filter out model configurations
-  that do not satisfy the specified latency threshold.
+- [Brute and Quick search](docs/config_search.md): Model Analyzer can
+  help you automatically find the optimal settings for
+  [Max Batch Size](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#maximum-batch-size),
+  [Dynamic Batching](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#dynamic-batcher), and
+  [Instance Group](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups)
+  parameters of your model configuration. Model Analyzer utilizes
+  [Performance Analyzer](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md)
+  to test the model with different concurrency and batch sizes of requests. Using
+  [Manual Config Search](docs/config_search.md#manual-brute-search), you can create manual sweeps for every parameter that can be specified in the model configuration.
+
+- [Multi-Model Search](docs/config_search.md#multi-model-search-mode): **EARLY ACCESS** - Model Analyzer can help you
+  find the optimal settings when profiling multiple concurrent models, utilizing our Quick Search algorithm
+
+- [Detailed and summary reports](docs/report.md): Model Analyzer is able to generate
+  summarized and detailed reports that can help you better understand the trade-offs
+  between different model configurations that can be used for your model.
+
+- [QoS Constraints](docs/config.md#constraint): Constraints can help you
+  filter out the Model Analyzer results based on your QoS requirements. For
+  example, you can specify a latency budget to filter out model configurations
+  that do not satisfy the specified latency threshold.
 
 ## Documentation
 
-* [Installation](docs/install.md)
-* [Quick Start](docs/quick_start.md)
-* [Model Analyzer CLI](docs/cli.md)
-* [Launch Modes](docs/launch_modes.md)
-* [Configuring Model Analyzer](docs/config.md)
-* [Model Analyzer Metrics](docs/metrics.md)
-* [Model Config Search](docs/config_search.md)
-* [Checkpointing](docs/checkpoints.md)
-* [Model Analyzer Reports](docs/report.md)
-* [Deployment with Kubernetes](docs/kubernetes_deploy.md)
+- [Installation](docs/install.md)
+- [Quick Start](docs/quick_start.md)
+- [Model Analyzer CLI](docs/cli.md)
+- [Launch Modes](docs/launch_modes.md)
+- [Configuring Model Analyzer](docs/config.md)
+- [Model Analyzer Metrics](docs/metrics.md)
+- [Model Config Search](docs/config_search.md)
+- [Checkpointing](docs/checkpoints.md)
+- [Model Analyzer Reports](docs/report.md)
+- [Deployment with Kubernetes](docs/kubernetes_deploy.md)
 
 # Reporting problems, asking questions
 
@@ -72,14 +75,14 @@ project. When help with code is needed, follow the process outlined in
 the Stack Overflow (https://stackoverflow.com/help/mcve)
 document. Ensure posted examples are:
 
-* minimal – use as little code as possible that still produces the
+- minimal – use as little code as possible that still produces the
   same problem
 
-* complete – provide all parts needed to reproduce the problem. Check
+- complete – provide all parts needed to reproduce the problem. Check
   if you can strip external dependency and still show the problem. The
   less time we spend on reproducing problems the more time we have to
   fix it
 
-* verifiable – test the code you're about to provide to make sure it
+- verifiable – test the code you're about to provide to make sure it
   reproduces the problem. Remove all other problems that are not
   related to your request/question.
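The Features bullet above describes brute search as a sweep of the cross product of all possible configurations. That combinatorial idea can be pictured with a small sketch; the parameter values here are invented for illustration and this is not Model Analyzer's internal code:

```python
# Illustrative sweep of a configuration cross product; the values are
# invented, and the real search space in Model Analyzer is larger.
from itertools import product

max_batch_sizes = [1, 4, 8]
instance_counts = [1, 2]
dynamic_batching = [False, True]

# Brute search profiles every combination of the swept parameters.
sweep = list(product(max_batch_sizes, instance_counts, dynamic_batching))
print(len(sweep))  # 3 * 2 * 2 = 12 configurations to profile
```

Each tuple in `sweep` would correspond to one model configuration variant, which Performance Analyzer then measures under different request concurrencies.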

docs/config.md

Lines changed: 11 additions & 8 deletions
@@ -132,16 +132,16 @@ profile_models: <comma-delimited-string-list>
 # Triton Docker image tag used when launching using Docker mode
 [ triton_docker_image: <string> | default: nvcr.io/nvidia/tritonserver:22.09-py3 ]
 
-# Triton Server HTTP endpoint url used by Model Analyzer client. Will be ignored if server-launch-mode is not 'remote'".
+# Triton Server HTTP endpoint url used by Model Analyzer client.
 [ triton_http_endpoint: <string> | default: localhost:8000 ]
 
 # The full path to the parent directory of 'lib/libtritonserver.so. Only required when using triton_launch_mode=c_api.
 [ triton_install_path: <string> | default: /opt/tritonserver ]
 
-# Triton Server GRPC endpoint url used by Model Analyzer client. Will be ignored if server-launch-mode is not 'remote'".
+# Triton Server GRPC endpoint url used by Model Analyzer client.
 [ triton_grpc_endpoint: <string> | default: localhost:8001 ]
 
-# Triton Server metrics endpoint url used by Model Analyzer client. Will be ignored if server-launch-mode is not 'remote'".
+# Triton Server metrics endpoint url used by Model Analyzer client.
 [ triton_metrics_url: <string> | default: http://localhost:8002/metrics ]
 
 # The full path to the tritonserver binary executable
@@ -189,6 +189,9 @@ profile_models: <comma-delimited-string-list>
 # Disables automatic config search
 [ run_config_search_disable: <bool> | default: false ]
 
+# Enables the profiling of all supplied models concurrently
+[ run_config_profile_models_concurrently_enable: <bool> | default: false]
+
 # Skips the generation of analysis summary reports and tables
 [ skip_summary_reports: <bool> | default: false]
@@ -583,10 +586,10 @@ specified on a per model basis and cannot be specified globally (like
 Table below presents the list of common parameters that can be used for manual
 sweeping:
 
-| Option | Description |
-| :---: | :---: |
+| Option | Description |
+| :---: | :---: |
 | [`dynamic_batching`](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#dynamic-batcher) | Dynamic batching is a feature of Triton that allows inference requests to be combined by the server, so that a batch is created dynamically. |
-| [`max_batch_size`](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#maximum-batch-size) | The max_batch_size property indicates the maximum batch size that the model supports for the [types of batching](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/architecture.md#models-and-schedulers) that can be exploited by Triton. |
+| [`max_batch_size`](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#maximum-batch-size) | The max_batch_size property indicates the maximum batch size that the model supports for the [types of batching](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/architecture.md#models-and-schedulers) that can be exploited by Triton. |
 | [`instance_group`](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#instance-groups) | Triton can provide multiple instances of a model so that multiple inference requests for that model can be handled simultaneously. The model configuration ModelInstanceGroup property is used to specify the number of execution instances that should be made available and what compute resource should be used for those instances. |
 
 An example `<model-config-parameters>` look like below:
@@ -700,7 +703,7 @@ perf_analyzer_flags:
 
 #### Model-specific options for Perf Analyzer
 
-In order to set flags only for a specific model, you can specify 
+In order to set flags only for a specific model, you can specify
 the flags in the following way:
 
 ```yaml
@@ -736,8 +739,8 @@ then the `shape` option of the `perf_analyzer_flags` option must be specified.
 More information about this can be found in the
 [Perf Analyzer documentation](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/perf_analyzer.md#input-data).
 
-
 #### SSL Support:
+
 Perf Analyzer supports SSL via GRPC and HTTP. It can be enabled via Model Analyzer configuration file updates.
 
 GRPC example:
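The new `run_config_profile_models_concurrently_enable` option added in this file could be exercised in a profile config like the sketch below. Only the option name and its default come from the diff; the repository path, model names, and the YAML spelling of the quick-search setting are illustrative assumptions:

```yaml
# Hypothetical profile config; paths and model names are placeholders.
model_repository: /path/to/model/repository/

# Assumed YAML form of the --run-config-search-mode CLI flag.
run_config_search_mode: quick
run_config_profile_models_concurrently_enable: true

profile_models:
  - model_1
  - model_2
```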

docs/config_search.md

Lines changed: 29 additions & 5 deletions
@@ -17,12 +17,17 @@ limitations under the License.
 # Model Config Search
 
 Model Analyzer's `profile` subcommand supports multiple modes when searching to find the best model configuration.
-* [Brute](config_search.md#brute-search-mode) is the default, and will do a brute-force sweep of the cross product of all possible configurations
-* [Quick](config_search.md#quick-search-mode) will use heuristics to try to find the optimal configuration much quicker than brute, and can be enabled via `--run-config-search-mode quick`
+
+- [Brute](config_search.md#brute-search-mode) is the default, and will do a brute-force sweep of the cross product of all possible configurations
+- [Quick](config_search.md#quick-search-mode) will use heuristics to try to find the optimal configuration much quicker than brute, and can be enabled via `--run-config-search-mode quick`
+
+_This mode is in **EARLY ACCESS** and is limited in scope:_
+
+- [Multi-model](config_search.md#multi-model-search-mode) will profile multiple models to find the optimal configurations for all models while they are running concurrently. This feature is enabled via `--run-config-profile-models-concurrently-enable`
 
 ## Brute Search Mode
 
-Model Analyzer's brute search mode will do a brute-force sweep of the cross product of all possible configurations. You can [Manually](config_search.md#manual-brute-search) provide `model_config_parameters` to tell Model Analyzer what to sweep over, or you can 
+Model Analyzer's brute search mode will do a brute-force sweep of the cross product of all possible configurations. You can [Manually](config_search.md#manual-brute-search) provide `model_config_parameters` to tell Model Analyzer what to sweep over, or you can
 let it [Automatically](config_search.md#automatic-brute-search) sweep through configurations expected to have the highest impact on performance for Triton models.
 
 ### Automatic Brute Search
@@ -86,7 +91,8 @@ model_repository: /path/to/model/repository/
 
 profile_models:
   model_1:
-    concurrency: 1,2,3,128
+    parameters:
+      concurrency: 1,2,3,128
 ```
 
 The config described below will only sweep through different values for
@@ -123,7 +129,7 @@ the configuration that is generated is loadable by Triton.
 
 You can also specify `concurrency` ranges to sweep through. If unspecified, it will
 automatically sweep concurrency for every model configuration (unless `--run-config-search-disable`
-is set, in which case it will only use the concurrency value of 1) 
+is set, in which case it will only use the concurrency value of 1)
 
 An example Model Analyzer Config that performs manual sweeping looks like below:
 
@@ -174,3 +180,21 @@ the maximal objective value within the specified constraints. In the majority of
 this will find greater than 95% of the maximum objective value (that could be found using a brute force search), while needing to search less than 10% of the configuration space.
 
 After it has found the best config(s), it will then sweep the top-N configurations found (specified by `--num-configs-per-model`) over the default concurrency range before generation of the summary reports.
+
+## Multi-Model Search Mode
+
+_This mode is in EARLY ACCESS and has the following limitations:_
+
+- Can only be run in `quick` search mode
+- Cannot set limitations on min/max batch size, concurrency or instance count
+- Does not support individual model constraints, only global constraints
+- Does not support individual model weighting; all models are treated with equal priority when trying to maximize objective value
+- Does not support detailed reporting, only summary reports
+
+Multi-model concurrent search mode can be enabled by adding the parameter `--run-config-profile-models-concurrently-enable` to the CLI.
+
+It uses Quick Search mode's hill-climbing algorithm to search all models' configuration spaces in parallel, looking for the maximal objective value within the specified constraints. Model Analyzer has observed positive outcomes towards finding the maximum objective value, with runtimes of around 20-30 minutes (compared to the days it would take a brute-force run to complete).
+
+After it has found the best config(s), it will then sweep the top-N configurations found (specified by `--num-configs-per-model`) over the default concurrency range before generating the summary reports.
+
+_Note:_ The algorithm attempts to find the most fair and optimal result for all models by evaluating each model's objective gain/loss. In many cases it will rank a configuration with a lower total combined throughput higher (if throughput was the objective) when that better balances the throughputs of all the models.
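The hill-climbing behavior described in the added section can be pictured with a toy sketch. Everything here is invented for illustration: the two-dimensional space, the concave objective, and the one-step move rule are stand-ins, not Model Analyzer's actual algorithm:

```python
# Toy coordinate hill climb over a two-model configuration space.
# Illustrative only: Model Analyzer's real quick-search algorithm differs.

def throughput(cfg):
    # Hypothetical concave objective: combined throughput of two models,
    # each parameterized by a single integer "configuration step".
    a, b = cfg
    return -(a - 3) ** 2 - (b - 5) ** 2 + 100

def hill_climb(start, max_iters=100):
    cfg, best = start, throughput(start)
    for _ in range(max_iters):
        improved = False
        # Try moving each model's dimension up or down by one step.
        for dim in (0, 1):
            for delta in (-1, 1):
                cand = list(cfg)
                cand[dim] += delta
                cand = tuple(cand)
                if throughput(cand) > best:
                    cfg, best, improved = cand, throughput(cand), True
        if not improved:
            break  # local (here also global) optimum reached
    return cfg, best

print(hill_climb((0, 0)))  # → ((3, 5), 100)
```

Because the climb only evaluates neighbors of the current configuration, it visits a small fraction of the full cross product, which is the intuition behind the "less than 10% of the configuration space" claim above.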

docs/install.md

Lines changed: 1 addition & 1 deletion
@@ -95,7 +95,7 @@ cd ./model_analyzer
 docker build --pull -t model-analyzer .
 ```
 
-Model Analyzer's Dockerfile bases the container on the latest `tritonserver`
+Model Analyzer's Dockerfile bases the container on the corresponding `tritonserver` (from step 1)
 containers from NGC.<br><br>
 
 **3. Run the Container**
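Step 2 above builds a local `model-analyzer` image; a follow-on invocation for step 3 might look like the sketch below. The mount path and flag choices are placeholders, and the exact command documented in docs/install.md should be preferred:

```shell
# Hypothetical run command; the volume path is a placeholder.
docker run -it --rm --gpus all \
    -v /path/to/model/repository:/models \
    model-analyzer
```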

model_analyzer/config/input/config_command_profile.py

Lines changed: 9 additions & 11 deletions
@@ -610,25 +610,23 @@ def _add_triton_configs(self):
                 field_type=ConfigPrimitive(str),
                 default_value=DEFAULT_TRITON_HTTP_ENDPOINT,
                 description=
-                "Triton Server HTTP endpoint url used by Model Analyzer client. "
-                "Will be ignored if server-launch-mode is not 'remote'"))
+                "Triton Server HTTP endpoint url used by Model Analyzer client."
+            ))
         self._add_config(
             ConfigField(
                 'triton_grpc_endpoint',
                 flags=['--triton-grpc-endpoint'],
                 field_type=ConfigPrimitive(str),
                 default_value=DEFAULT_TRITON_GRPC_ENDPOINT,
                 description=
-                "Triton Server HTTP endpoint url used by Model Analyzer client. "
-                "Will be ignored if server-launch-mode is not 'remote'"))
+                "Triton Server GRPC endpoint url used by Model Analyzer client."
+            ))
         self._add_config(
-            ConfigField(
-                'triton_metrics_url',
-                field_type=ConfigPrimitive(str),
-                flags=['--triton-metrics-url'],
-                default_value=DEFAULT_TRITON_METRICS_URL,
-                description="Triton Server Metrics endpoint url. "
-                "Will be ignored if server-launch-mode is not 'remote'"))
+            ConfigField('triton_metrics_url',
+                        field_type=ConfigPrimitive(str),
+                        flags=['--triton-metrics-url'],
+                        default_value=DEFAULT_TRITON_METRICS_URL,
+                        description="Triton Server Metrics endpoint url."))
         self._add_config(
             ConfigField(
                 'triton_server_path',
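The declarative pattern in this file can be mimicked with a simplified stand-in. These are not the real `ConfigField`/`ConfigPrimitive` classes from model_analyzer; the sketch only shows how a field bundles its flag, type, default, and the description text that the diff trims:

```python
# Simplified stand-in for the ConfigField pattern above; not the actual
# model_analyzer classes, just the shape of the data they carry.
class ConfigField:
    def __init__(self, name, flags=None, field_type=str,
                 default_value=None, description=''):
        self.name = name
        self.flags = flags or []
        self.field_type = field_type
        self.default_value = default_value
        self.description = description

class Config:
    def __init__(self):
        self.fields = {}

    def _add_config(self, field):
        # Register a field by name, as _add_triton_configs does.
        self.fields[field.name] = field

config = Config()
config._add_config(
    ConfigField('triton_metrics_url',
                flags=['--triton-metrics-url'],
                field_type=str,
                default_value='http://localhost:8002/metrics',
                description='Triton Server Metrics endpoint url.'))

print(config.fields['triton_metrics_url'].default_value)
```

Keeping each field's flag, default, and help text in one declaration is what lets the commit change only the `description` strings without touching any argument-parsing logic.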
