Add early access documentation for multi-model (#541)
* First pass at adding multi-model documentation
* Further revisions
* Updated based on Tim's review comments
* Adding missing parameters in example
* Changed step 2 to indicate where the tritonserver container actually comes from
* Removing we
- [Brute and Quick search](docs/config_search.md): Model Analyzer can
help you automatically find the optimal settings for
to test the model with different concurrency and batch sizes of requests. Using
[Manual Config Search](docs/config_search.md#manual-brute-search), you can create manual sweeps for every parameter that can be specified in the model configuration.
- [Multi-Model Search](docs/config_search.md#multi-model-search-mode): **EARLY ACCESS** - Model Analyzer can help you
  find the optimal settings when profiling multiple concurrent models, utilizing our Quick Search algorithm.
- [Detailed and summary reports](docs/report.md): Model Analyzer is able to generate
  summarized and detailed reports that can help you better understand the trade-offs
  between different model configurations that can be used for your model.
- [QoS Constraints](docs/config.md#constraint): Constraints can help you
  filter out the Model Analyzer results based on your QoS requirements. For
  example, you can specify a latency budget to filter out model configurations
  that do not satisfy the specified latency threshold.
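As an illustrative sketch only, a latency budget could be expressed in a Model Analyzer YAML config along these lines (the metric key and the 100 ms value here are assumptions for illustration, not taken from this document):

```yaml
# Illustrative sketch: reject any model configuration whose
# p99 latency exceeds a 100 ms budget.
constraints:
  perf_latency_p99:
    max: 100
```

See [QoS Constraints](docs/config.md#constraint) for the authoritative constraint syntax.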
## Documentation
- [Installation](docs/install.md)
- [Quick Start](docs/quick_start.md)
- [Model Analyzer CLI](docs/cli.md)
- [Launch Modes](docs/launch_modes.md)
- [Configuring Model Analyzer](docs/config.md)
- [Model Analyzer Metrics](docs/metrics.md)
- [Model Config Search](docs/config_search.md)
- [Checkpointing](docs/checkpoints.md)
- [Model Analyzer Reports](docs/report.md)
- [Deployment with Kubernetes](docs/kubernetes_deploy.md)
# Reporting problems, asking questions
project. When help with code is needed, follow the process outlined in
the Stack Overflow (https://stackoverflow.com/help/mcve)
document. Ensure posted examples are:

- minimal – use as little code as possible that still produces the
  same problem
- complete – provide all parts needed to reproduce the problem. Check
  if you can strip external dependency and still show the problem. The
  less time we spend on reproducing problems the more time we have to
  fix it
- verifiable – test the code you're about to provide to make sure it
  reproduces the problem. Remove all other problems that are not
| [`dynamic_batching`](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#dynamic-batcher) | Dynamic batching is a feature of Triton that allows inference requests to be combined by the server, so that a batch is created dynamically. |
| [`max_batch_size`](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#maximum-batch-size) | The max_batch_size property indicates the maximum batch size that the model supports for the [types of batching](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/architecture.md#models-and-schedulers) that can be exploited by Triton. |
| [`instance_group`](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#instance-groups) | Triton can provide multiple instances of a model so that multiple inference requests for that model can be handled simultaneously. The model configuration ModelInstanceGroup property is used to specify the number of execution instances that should be made available and what compute resource should be used for those instances. |
An example `<model-config-parameters>` block looks like this:
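As a sketch of the parameters described in the table above (the specific sweep values here are hypothetical, not taken from this document):

```yaml
# Hypothetical sweep values for the parameters described above;
# each list entry is one candidate value for the sweep.
model_config_parameters:
  max_batch_size: [4, 8]
  dynamic_batching:
    max_queue_delay_microseconds: [100, 200]
  instance_group:
    - kind: KIND_GPU
      count: [1, 2]
```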
perf_analyzer_flags:
#### Model-specific options for Perf Analyzer
In order to set flags only for a specific model, you can specify
the flags in the following way:
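A minimal sketch of per-model Perf Analyzer flags (the model name `model_1` and the `percentile` value are assumptions for illustration):

```yaml
# Hypothetical: flags under a specific model apply only to that model.
profile_models:
  model_1:
    perf_analyzer_flags:
      percentile: 95
```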
then the `shape` option of the `perf_analyzer_flags` option must be specified.
**docs/config_search.md** (+29 −5)
limitations under the License.
# Model Config Search
Model Analyzer's `profile` subcommand supports multiple modes when searching to find the best model configuration.

- [Brute](config_search.md#brute-search-mode) is the default, and will do a brute-force sweep of the cross product of all possible configurations
- [Quick](config_search.md#quick-search-mode) will use heuristics to try to find the optimal configuration much quicker than brute, and can be enabled via `--run-config-search-mode quick`

_This mode is in **EARLY ACCESS** and is limited in scope:_

- [Multi-model](config_search.md#multi-model-search-mode) will profile multiple models to find the optimal configurations for all models while they are running concurrently. This feature is enabled via `--run-config-profile-models-concurrently-enable`
## Brute Search Mode
Model Analyzer's brute search mode will do a brute-force sweep of the cross product of all possible configurations. You can [Manually](config_search.md#manual-brute-search) provide `model_config_parameters` to tell Model Analyzer what to sweep over, or you can
let it [Automatically](config_search.md#automatic-brute-search) sweep through configurations expected to have the highest impact on performance for Triton models.
The config described below will only sweep through different values for
the configuration that is generated is loadable by Triton.
You can also specify `concurrency` ranges to sweep through. If unspecified, it will
automatically sweep concurrency for every model configuration (unless `--run-config-search-disable`
is set, in which case it will only use the concurrency value of 1).
An example Model Analyzer config that performs manual sweeping looks like this:
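As an illustrative sketch of a manual sweep (the model name, repository path, and sweep values are hypothetical; see [Configuring Model Analyzer](config.md) for the authoritative schema):

```yaml
# Hypothetical manual sweep: disable automatic search and
# enumerate the model config values and concurrencies to try.
model_repository: /path/to/model/repository
run_config_search_disable: true

profile_models:
  model_1:
    model_config_parameters:
      instance_group:
        - kind: KIND_GPU
          count: [1, 2]
      dynamic_batching:
        max_queue_delay_microseconds: [100]
    parameters:
      concurrency:
        start: 2
        stop: 10
        step: 2
```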
the maximal objective value within the specified constraints. In the majority of cases
this will find greater than 95% of the maximum objective value (that could be found using a brute force search), while needing to search less than 10% of the configuration space.
After it has found the best config(s), it will then sweep the top-N configurations found (specified by `--num-configs-per-model`) over the default concurrency range before generation of the summary reports.

## Multi-Model Search Mode

_This mode is in EARLY ACCESS and has the following limitations:_

- Can only be run in `quick` search mode
- Cannot set limitations on min/max batch size, concurrency or instance count
- Does not support individual model constraints, only global constraints
- Does not support individual model weighting; all models are treated with equal priority when trying to maximize objective value
- Does not support detailed reporting, only summary reports

Multi-model concurrent search mode can be enabled by adding the parameter `--run-config-profile-models-concurrently-enable` to the CLI.

It uses Quick Search mode's hill climbing algorithm to search all models' configuration spaces in parallel, looking for the maximal objective value within the specified constraints. Model Analyzer has observed positive outcomes toward finding the maximum objective value, with runtimes of around 20-30 minutes (compared to the days it would take a brute force run to complete).

After it has found the best config(s), it will then sweep the top-N configurations found (specified by `--num-configs-per-model`) over the default concurrency range before generating the summary reports.

_Note:_ The algorithm attempts to find the most fair and optimal result for all models by evaluating each model objective's gain/loss. In many cases it will rank a configuration with a lower total combined throughput higher (if throughput was the objective) when that configuration better balances the throughputs of all the models.
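A sketch of enabling the mode from the CLI (the model names and repository path are hypothetical; the search-mode and concurrent-profiling flags are the ones described above):

```shell
# Hypothetical invocation: profile two models concurrently
# using the quick search algorithm.
model-analyzer profile \
    --model-repository /path/to/model/repository \
    --profile-models model_a,model_b \
    --run-config-search-mode quick \
    --run-config-profile-models-concurrently-enable
```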