
Commit 5134ba3

nv-braf and tgerdes authored
Document quick search (#516)
* Initial set of changes to document quick search
* Changes based on review comments
* Fixing formatting
* Create brute and quick sections
* Creating links and descriptions for auto vs manual
* Some more details and cleanup
* fix typo

Co-authored-by: tgerdes <[email protected]>
1 parent 49dbb87 commit 5134ba3

File tree: 4 files changed (+64, −43 lines)

README.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -33,15 +33,15 @@ Triton Inference Server.
 ## Features
 
-* [Automatic and manual configuration search](docs/config_search.md): Model Analyzer can
+* [Brute and Quick search](docs/config_search.md): Model Analyzer can
   help you automatically find the optimal settings for
   [Max Batch Size](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#maximum-batch-size),
   [Dynamic Batching](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#dynamic-batcher), and
   [Instance Group](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#instance-groups)
   parameters of your model configuration. Model Analyzer utilizes
   [Performance Analyzer](https://github.com/triton-inference-server/server/blob/main/docs/perf_analyzer.md)
   to test the model with different concurrency and batch sizes of requests. Using
-  [Manual Config Search](docs/config_search.md#manual-configuration-search), you can create manual sweeps for every parameter that can be specified in the model configuration.
+  [Manual Config Search](docs/config_search.md#manual-brute-search), you can create manual sweeps for every parameter that can be specified in the model configuration.
 
 * [Detailed and summary reports](docs/report.md): Model Analyzer is able to generate
   summarized and detailed reports that can help you better understand the trade-offs
````

docs/cli.md

Lines changed: 13 additions & 7 deletions
````diff
@@ -26,7 +26,7 @@ $ model-analyzer -h
 Options like `-q`, `--quiet` and `-v`, `--verbose` are global and apply to all
 model analyzer subcommands.
 
-## Model Analyze Modes
+## Model Analyzer Modes
 
 The `-m` or `--mode` flag is global and is accessible to all subcommands. It tells the model analyzer the context
 in which it is being run. Currently model analyzer supports 2 modes.
@@ -86,8 +86,8 @@ $ model-analyzer profile -h
 
 Depending on the command line or YAML config options provided, the `profile`
 subcommand will either perform a
-[manual](./config_search.md#manual-configuration-search) or [automatic
-search](./config_search.md#automatic-configuration-search) over perf analyzer
+[manual](./config_search.md#manual-brute-search), [automatic](./config_search.md#automatic-brute-search), or
+[quick](./config_search.md#quick-search-mode) search over perf analyzer
 and model config file parameters. For each combination of [model config
 parameters](./config.md#model-config-parameters) (e.g. _max batch size_, _dynamic batching_, and _instance count_), it will run tritonserver and perf analyzer instances with
 all the specified run parameters (client request concurrency and static batch
@@ -112,19 +112,25 @@ Some example profile commands are shown here. For a full example see the
 $ model-analyzer profile -m /home/model_repo --profile-models resnet50_libtorch
 ```
 
-2. Run auto config search on 2 models called `resnet50_libtorch` and `vgg16_graphdef` located in `/home/model_repo` and save checkpoints to `checkpoints`
+2. Run quick search on a model called `resnet50_libtorch` located in `/home/model_repo`
+
+```
+$ model-analyzer profile -m /home/model_repo --profile-models resnet50_libtorch --run-config-search-mode quick
+```
+
+3. Run auto config search on 2 models called `resnet50_libtorch` and `vgg16_graphdef` located in `/home/model_repo` and save checkpoints to `checkpoints`
 
 ```
 $ model-analyzer profile -m /home/model_repo --profile-models resnet50_libtorch,vgg16_graphdef --checkpoint-directory=checkpoints
 ```
 
-3. Run auto config search on a model called `resnet50_libtorch` located in `/home/model_repo`, but change the repository where model config variants are stored to `/home/output_repo`
+4. Run auto config search on a model called `resnet50_libtorch` located in `/home/model_repo`, but change the repository where model config variants are stored to `/home/output_repo`
 
 ```
 $ model-analyzer profile -m /home/model_repo --output-model-repository-path=/home/output_repo --profile-models resnet50_libtorch
 ```
 
-4. Run profile over manually defined configurations for a models `classification_malaria_v1` and `classification_chestxray_v1` located in `/home/model_repo` using the YAML config file
+5. Run profile over manually defined configurations for models `classification_malaria_v1` and `classification_chestxray_v1` located in `/home/model_repo` using the YAML config file
 
 ```
 $ model-analyzer profile -f /path/to/config.yaml
@@ -157,7 +163,7 @@ profile_models:
     max_queue_delay_microseconds: [100]
 ```
 
-5. Apply objectives and constraints to sort and filter results in summary plots and tables using yaml config file.
+6. Apply objectives and constraints to sort and filter results in summary plots and tables using the YAML config file.
 
 ```
 $ model-analyzer profile -f /path/to/config.yaml
````
docs/config.md

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -165,6 +165,9 @@ profile_models: <comma-delimited-string-list>
165165
# List of GPU UUIDs to be used for the profiling. Use 'all' to profile all the GPUs visible by CUDA.
166166
[ gpus: <string|comma-delimited-list-string> | default: 'all' ]
167167
168+
# Search mode. Options are "brute" and "quick"
169+
[ run_config_search_mode: <string> | default: brute]
170+
168171
# Minimum concurrency used for the automatic config search.
169172
[ run_config_search_min_concurrency: <int> | default: 1 ]
170173

docs/config_search.md

Lines changed: 46 additions & 34 deletions
````diff
@@ -16,20 +16,24 @@ limitations under the License.
 
 # Model Config Search
 
-Model Analyzer's `profile` subcommand supports **automatic** and **manual**
-sweeping through different configurations for Triton models.
+Model Analyzer's `profile` subcommand supports multiple modes when searching for the best model configuration:
+* [Brute](config_search.md#brute-search-mode) is the default, and does a brute-force sweep of the cross product of all possible configurations
+* [Quick](config_search.md#quick-search-mode) uses heuristics to find the optimal configuration much more quickly than brute search, and can be enabled via `--run-config-search-mode quick`
 
-## Automatic Configuration Search
+## Brute Search Mode
+
+Model Analyzer's brute search mode sweeps the cross product of all possible configurations. You can [manually](config_search.md#manual-brute-search) provide `model_config_parameters` to tell Model Analyzer what to sweep over, or you can
+let it [automatically](config_search.md#automatic-brute-search) sweep through the configurations expected to have the highest impact on performance for Triton models.
+
+### Automatic Brute Search
 
 Automatic configuration search is the default behavior when running Model
-Analyzer. This mode is enabled when there is not any parameters specified for the
-`model_config_parameters` section of the Model Analyzer Config. The parameters
+Analyzer without manually specifying what values to search. The parameters
 that are automatically searched are
 [`max_batch_size`](https://github.com/triton-inference-server/server/blob/master/docs/model_configuration.md#maximum-batch-size)
 and
 [`instance_group`](https://github.com/triton-inference-server/server/blob/master/docs/model_configuration.md#instance-groups).
-Additionally, [`dynamic_batching`](https://github.com/triton-inference-server/server/blob/master/docs/model_configuration.md#dynamic-batcher) will be enabled.
-
+Additionally, [`dynamic_batching`](https://github.com/triton-inference-server/server/blob/master/docs/model_configuration.md#dynamic-batcher) will be enabled if it is legal to do so.
 
 An example model analyzer config that performs automatic config search looks
 like below:
@@ -50,7 +54,7 @@ For each `instance_group`, Model Analyzer will sweep values 1 through 128 increa
 [`Dynamic_batching`](https://github.com/triton-inference-server/server/blob/master/docs/model_configuration.md#dynamic-batcher)
 will be enabled for all model configs generated using automatic search.
 
-For each model config that is generated in automatic search, Model Analyzer will gather data for 
+For each model config that is generated in automatic search, Model Analyzer will gather data for
 [`concurrency`](https://github.com/triton-inference-server/server/blob/master/docs/perf_analyzer.md#request-concurrency)
 values 1 through 1024 increased exponentially (i.e. 1, 2, 4, 8, ...). The maximum value can be configured
 using the `run_config_search_max_concurrency` key in the Model Analyzer Config.
````
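The doubling sweeps described in the hunk above (values 1 through 128 for the instance sweep, concurrencies 1 through 1024) can be sketched as follows; the helper name is ours for illustration, not Model Analyzer's API:

```python
def exponential_sweep(max_value):
    """Yield 1, 2, 4, 8, ... up to and including max_value."""
    value = 1
    while value <= max_value:
        yield value
        value *= 2

# Maximums per the docs above; both are configurable
# (e.g. `run_config_search_max_concurrency` for the concurrency sweep).
instance_sweep = list(exponential_sweep(128))    # 1 through 128, doubling
concurrencies = list(exponential_sweep(1024))    # 1 through 1024, doubling
```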
````diff
@@ -68,7 +72,7 @@ profile_models:
 - model_2
 ```
 
-If any `model_config_parameters` are specified for a model, it will disable 
+If any `model_config_parameters` are specified for a model, it will disable
 automatic searching of model configs and will only search within the values specified.
 If `concurrency` is specified then only those values will be tried instead of the default concurrency sweep.
 If both `concurrency` and `model_config_parameters` are specified, automatic
@@ -94,34 +98,33 @@ model_repository: /path/to/model/repository/
 profile_models:
   model_1:
     model_config_parameters:
-      instance_group:
-      -
-        kind: KIND_GPU
-        count: [1, 2]
+      instance_group:
+        - kind: KIND_GPU
+          count: [1, 2]
 ```
 
-### Important Note about Remote Mode
+#### Important Note about Remote Mode
 
 In the remote mode, `model_config_parameters` are always ignored because Model
 Analyzer has no way of accessing the model repository of the remote Triton
 Server. In this mode, only concurrency values can be swept.
 
-## Manual Configuration Search
+### Manual Brute Search
 
 In addition to the automatic config search, Model Analyzer supports a manual
-config search mode. To enable this mode, `--run-config-search-disable` flag
-should be provided in the CLI or `run_config_search_disable: True` in the Model
-Analyzer Config.
-
-In this mode, values for both `concurrency` and `model_config_parameters` needs
-to be specified. If no value for `concurrency` is specified, the default value,
-1, will be used. This mode in comparison to the automatic mode, is not limited
-to `max_batch_size`, `dynamic_batching`, and `instance_count` config parameters. Using manual
+config search mode. To enable this mode, you can specify `model_config_parameters`
+to sweep through, or set `--run-config-search-disable`.
+
+Unlike automatic mode, this mode is not limited to the `max_batch_size`, `dynamic_batching`, and `instance_count` config parameters. Using manual
 config search, you can create custom sweeps for every parameter that can be
 specified in the model configuration. Model Analyzer only checks the syntax
 of the `model_config_parameters` that is specified and cannot guarantee that
 the configuration that is generated is loadable by Triton.
 
+You can also specify `concurrency` ranges to sweep through. If unspecified, Model Analyzer will
+automatically sweep concurrency for every model configuration (unless `--run-config-search-disable`
+is set, in which case it will only use a concurrency value of 1).
+
 An example Model Analyzer Config that performs manual sweeping looks like below:
 
 ```yaml
@@ -131,13 +134,12 @@ run_config_search_disable: True
 profile_models:
   model_1:
     model_config_parameters:
-      max_batch_size: [6, 8]
-      dynamic_batching:
-        max_queue_delay_microseconds: [200, 300]
-      instance_group:
-      -
-        kind: KIND_GPU
-        count: [1, 2]
+      max_batch_size: [6, 8]
+      dynamic_batching:
+        max_queue_delay_microseconds: [200, 300]
+      instance_group:
+        - kind: KIND_GPU
+          count: [1, 2]
 ```
 
 In this mode, Model Analyzer can sweep through every Triton model configuration
@@ -150,7 +152,7 @@ as the range for the `max_batch_size` to `[1]`, it will no longer be a valid
 Triton Model Configuration.
 
 The configuration sweep described above, will sweep through 8 configs = (2
-`max_batch_size`) * (2 `max_queue_delay_microseconds`) * (2 `instance_group`) values.
+`max_batch_size`) \* (2 `max_queue_delay_microseconds`) \* (2 `instance_group`) values.
 
 ### Examples of Additional Model Config Parameters
````
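The 8-config count computed in the hunk above is simply the cross product of the swept values; a minimal sketch using the values from the example YAML (dict keys chosen for illustration):

```python
import itertools

# Cross product of the manually swept values from the example config:
# 2 max_batch_size x 2 max_queue_delay_microseconds x 2 instance counts.
max_batch_sizes = [6, 8]
max_queue_delays_us = [200, 300]
instance_counts = [1, 2]

configs = [
    {"max_batch_size": b, "max_queue_delay_microseconds": d, "count": c}
    for b, d, c in itertools.product(max_batch_sizes, max_queue_delays_us, instance_counts)
]
# len(configs) == 8
```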
````diff
@@ -159,6 +161,16 @@ sweep on every parameter that can be specified in Triton model configuration. In
 this section, we describe some of the parameters that might be of interest for
 manual sweep:
 
-* [Rate limiter](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#rate-limiter-config) setting
-* If the model is using [ONNX](https://github.com/triton-inference-server/onnxruntime_backend) or [Tensorflow backend](https://github.com/triton-inference-server/tensorflow_backend), the "execution_accelerators" parameters. More information about this parameter is
-available in the [Triton Optimization Guide](https://github.com/triton-inference-server/server/blob/main/docs/optimization.md#framework-specific-optimization)
+- [Rate limiter](https://github.com/triton-inference-server/server/blob/main/docs/model_configuration.md#rate-limiter-config) setting
+- If the model is using [ONNX](https://github.com/triton-inference-server/onnxruntime_backend) or [Tensorflow backend](https://github.com/triton-inference-server/tensorflow_backend), the "execution_accelerators" parameters. More information about this parameter is
+  available in the [Triton Optimization Guide](https://github.com/triton-inference-server/server/blob/main/docs/optimization.md#framework-specific-optimization)
+
+## Quick Search Mode
+
+Quick search can be enabled by adding the parameter `--run-config-search-mode quick` to the CLI.
+
+It uses a hill-climbing algorithm to search the configuration space, looking for
+the maximal objective value within the specified constraints. In the majority of cases
+this will find greater than 95% of the maximum objective value (that could be found using a brute-force search), while needing to search less than 10% of the configuration space.
+
+After it has found the best config(s), it will then sweep the top-N configurations found (specified by `--num-configs-per-model`) over the default concurrency range before generating the summary reports.
````
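The hill-climbing idea behind quick search can be illustrated with a generic sketch. This is not Model Analyzer's actual implementation: the two-dimensional config space, neighborhood, and toy objective below are all made up for illustration.

```python
# Generic hill-climbing sketch over a 2-D config space
# (instance count x max batch size). Purely illustrative.
def hill_climb(objective, start, neighbors, max_steps=100):
    """Greedily move to the best neighbor until no neighbor improves."""
    current = start
    for _ in range(max_steps):
        best = max(neighbors(current), key=objective, default=current)
        if objective(best) <= objective(current):
            return current  # local maximum reached
        current = best
    return current

def neighbors(cfg):
    """Step instance count by 1, and max batch size by powers of two."""
    instances, batch = cfg
    out = []
    for step in (-1, 1):
        if 1 <= instances + step <= 5:
            out.append((instances + step, batch))
    for factor in (0.5, 2):
        new_batch = int(batch * factor)
        if 1 <= new_batch <= 128:
            out.append((instances, new_batch))
    return out

# Toy throughput model with a single peak at (3 instances, batch 32).
def objective(cfg):
    instances, batch = cfg
    return -((instances - 3) ** 2) - (abs(batch - 32) / 8) ** 2

best = hill_climb(objective, start=(1, 1), neighbors=neighbors)
```

With a unimodal objective like this toy one, the greedy walk reaches the peak after a handful of evaluations, which is the intuition behind quick search covering far less of the space than a brute-force sweep.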
