Commit 4e96d21

Documentation update for Optuna (alpha release) (#895)
* Documentation update for Optuna (alpha release)
* More fixes based on PR
1 parent acf085f commit 4e96d21

4 files changed: +139 -10 lines changed

README.md

Lines changed: 2 additions & 0 deletions
@@ -33,6 +33,8 @@ Triton Model Analyzer is a CLI tool which can help you find a more optimal confi

 ### Search Modes

+- [Optuna Search](docs/config_search.md#optuna-search-mode) **_-ALPHA RELEASE-_** allows you to search for every parameter that can be specified in the model configuration, using a hyperparameter optimization framework. Please see the [Optuna](https://optuna.org/) website if you are interested in specific details on how the algorithm functions.
+
 - [Quick Search](docs/config_search.md#quick-search-mode) will **sparsely** search the [Max Batch Size](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#maximum-batch-size),
 [Dynamic Batching](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#dynamic-batcher), and
 [Instance Group](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups) spaces by utilizing a heuristic hill-climbing algorithm to help you quickly find a more optimal configuration

docs/config.md

Lines changed: 30 additions & 9 deletions
@@ -188,31 +188,31 @@ cpu_only_composing_models: <comma-delimited-string-list>
 # List of GPU UUIDs to be used for the profiling. Use 'all' to profile all the GPUs visible by CUDA
 [ gpus: <string|comma-delimited-list-string> | default: 'all' ]

-# Search mode. Options are "brute" and "quick"
+# Search mode. Options are "brute", "quick", and "optuna"
 [ run_config_search_mode: <string> | default: brute]

-# Minimum concurrency used for the automatic/quick config search
+# Minimum concurrency used for the automatic/quick/optuna config search
 [ run_config_search_min_concurrency: <int> | default: 1 ]

-# Maximum concurrency used for the automatic/quick config search
+# Maximum concurrency used for the automatic/quick/optuna config search
 [ run_config_search_max_concurrency: <int> | default: 1024 ]

-# Minimum max_batch_size used for the automatic/quick config search
+# Minimum max_batch_size used for the automatic/quick/optuna config search
 [ run_config_search_min_model_batch_size: <int> | default: 1 ]

-# Maximum max_batch_size used for the automatic/quick config search
+# Maximum max_batch_size used for the automatic/quick/optuna config search
 [ run_config_search_max_model_batch_size: <int> | default: 128 ]

-# Minimum instance group count used for the automatic/quick config search
+# Minimum instance group count used for the automatic/quick/optuna config search
 [ run_config_search_min_instance_count: <int> | default: 1 ]

-# Maximum instance group count used for the automatic/quick config search
+# Maximum instance group count used for the automatic/quick/optuna config search
 [ run_config_search_max_instance_count: <int> | default: 5 ]

-# Minimum request rate used for the automatic/quick config search
+# Minimum request rate used for the automatic/quick/optuna config search
 [ run_config_search_min_request_rate: <int> | default: 16 ]

-# Maximum request rate used for the automatic/quick config search
+# Maximum request rate used for the automatic/quick/optuna config search
 [ run_config_search_max_request_rate: <int> | default: 8092 ]

 # Maximum number of steps taken during a binary search
@@ -227,6 +227,27 @@ cpu_only_composing_models: <comma-delimited-string-list>
 # Enables the searching of request rate (instead of concurrency)
 [ request_rate_search_enable: <bool> | default: false]

+# Minimum percentage of the search space to profile when using Optuna
+[ min_percentage_of_search_space: <int> | default: 5]
+
+# Maximum percentage of the search space to profile when using Optuna
+[ max_percentage_of_search_space: <int> | default: 10]
+
+# Minimum number of trials to profile when using Optuna
+[ optuna_min_trials: <int> | default: None]
+
+# Maximum number of trials to profile when using Optuna
+[ optuna_max_trials: <int> | default: None]
+
+# Number of trials without improvement before triggering early exit when using Optuna
+[ optuna_early_exit_threshold: <int> | default: 10]
+
+# Use the concurrency formula instead of searching the concurrency space in Optuna search mode
+[ use_concurrency_formula: <bool> | default: false]
+
+# Disables the sweeping of concurrencies for the top-N models after quick/optuna search completion
+[ concurrency_sweep_disable: <bool> | default: false]
+
 # Always report GPU metrics, even if the model(s) is cpu_only
 [ always_report_gpu_metrics: <bool> | default: false]

docs/config_search.md

Lines changed: 106 additions & 0 deletions
@@ -22,6 +22,7 @@ limitations under the License.
 - [Automatic Brute Search](#automatic-brute-search)
 - [Manual Brute Search](#manual-brute-search)
 - [Quick Search Mode](#quick-search-mode)
+- [Optuna Search Mode](#optuna-search-mode)
 - [Ensemble Model Search](#ensemble-model-search)
 - [BLS Model Search](#bls-model-search)
 - [LLM Search](#llm-search)
@@ -48,6 +49,9 @@ Model Analyzer's `profile` subcommand supports multiple modes when searching to
 - Single BLS models
 - Multiple models being profiled concurrently
 - **Command:** `--run-config-search-mode quick`
+- [Optuna Search](config_search.md#optuna-search-mode) **-ALPHA RELEASE-**
+- **Search type:** Heuristic sweep using a hyperparameter optimization framework to find an optimal configuration
+- **Command:** `--run-config-search-mode optuna`

 ---

@@ -276,6 +280,108 @@ profile_models:

 ---

+## Optuna Search Mode
+
+**-ALPHA RELEASE-**
+
+_This mode has the following limitations:_
+
+- **Ensemble, BLS or concurrent multi-model profiling is not supported**
+- **Profiling with request rate is not supported**
+
+This mode uses a hyperparameter optimization framework to search the configuration
+space, looking for the maximal objective value within the specified constraints.
+Please see the [Optuna](https://optuna.org/) website if you are interested in specific details on how the algorithm functions.
+
+Optuna allows you to search for every parameter that can be specified in the model configuration. Parameters can be specified
+with a min/max range (using the run-config-search options) or a list of parameters to test against can be set in the
+parameters/model_config_parameters field.
+
+After optuna search has found the best config(s), it will then sweep the top-N configurations found (specified by `--num-configs-per-model`) over the default concurrency range before generation of the summary reports.
+
+---
+
+_An example model analyzer YAML config that performs an Optuna Search:_
+
+```yaml
+model_repository: /path/to/model/repository/
+
+run_config_search_mode: optuna
+profile_models:
+  - model_A
+```
+
+---
+
+A number of new configuration options were added to support tailoring the Optuna search to your needs:
+
+- `--min/max_percentage_of_search_space`: sets the percentage of the space you want Optuna to search
+- `--optuna-min/max-trials`: sets the number of trials Optuna will attempt
+- `--optuna-early-exit-threshold`: sets the number of trials without improvement before triggering early exit
+- `--use-concurrency-formula`: uses a formula (2 \* batch size \* instance group count), rather than sweeping concurrency
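
The formula named by `--use-concurrency-formula` can be illustrated with a short sketch. Only the expression `2 * batch size * instance group count` comes from the text above; the function name and the clamping of the result to the configured minimum and maximum concurrency are assumptions made for illustration, not Model Analyzer's confirmed behavior.

```python
def concurrency_from_formula(
    max_batch_size: int,
    instance_count: int,
    min_concurrency: int = 1,
    max_concurrency: int = 1024,
) -> int:
    """Hypothetical helper: 2 * batch size * instance group count.

    Clamping to [min_concurrency, max_concurrency] is an assumption; the
    defaults mirror run_config_search_min/max_concurrency.
    """
    concurrency = 2 * max_batch_size * instance_count
    return max(min_concurrency, min(concurrency, max_concurrency))


print(concurrency_from_formula(max_batch_size=8, instance_count=2))  # 32
print(concurrency_from_formula(64, 4, min_concurrency=32, max_concurrency=256))  # 256
```

Under this assumed clamping, a model with `max_batch_size` 64 and 4 instances would be profiled at the configured maximum of 256 rather than at 512.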
+
+---
+
+_An example that performs an Optuna Search using these new configuration options:_
+
+```yaml
+model_repository: /path/to/model/repository/
+
+run_config_search_mode: optuna
+run_config_search_max_instance_count: 8
+run_config_search_min_concurrency: 32
+run_config_search_max_concurrency: 256
+
+use_concurrency_formula: True
+min_percentage_of_search_space: 10
+optuna_max_trials: 200
+optuna_early_exit_threshold: 15
+
+profile_models:
+  model_A:
+    model_config_parameters:
+      max_batch_size: [1, 4, 8, 32, 64, 128]
+      dynamic_batching:
+        max_queue_delay_microseconds: [100, 200, 300]
+    parameters:
+      batch_sizes: 1, 2, 4, 8, 16
+```
+
+_The debug output showing how the space will be searched:_
+
+```yaml
+Number of configs in search space: 720
+batch_sizes: [1, 2, 4, 8, 16] (5)
+max_batch_size: [1, 4, 8, 32, 64, 128] (6)
+instance_group: 1 to 8 (8)
+max_queue_delay_microseconds: [100, 200, 300] (3)
+
+Minimum number of trials: 72 (10% of search space)
+Maximum number of trials: 200 (set by max trials)
+```
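
The numbers in this debug output can be reproduced by hand. The sketch below assumes the search space size is the product of the number of candidate values per parameter and that the minimum trial count is the configured percentage of that size; the variable names are illustrative and not taken from Model Analyzer.

```python
import math

# Number of candidate values per parameter, as listed in the debug output.
choices = {
    "batch_sizes": 5,                   # [1, 2, 4, 8, 16]
    "max_batch_size": 6,                # [1, 4, 8, 32, 64, 128]
    "instance_group": 8,                # 1 to 8
    "max_queue_delay_microseconds": 3,  # [100, 200, 300]
}

search_space_size = math.prod(choices.values())  # 5 * 6 * 8 * 3

min_percentage_of_search_space = 10
optuna_max_trials = 200

# Assumed rule: the lower bound comes from the percentage,
# the upper bound from the explicit trial cap.
min_trials = search_space_size * min_percentage_of_search_space // 100
max_trials = optuna_max_trials

print(search_space_size)  # 720
print(min_trials)         # 72
print(max_trials)         # 200
```

Concurrency does not appear as a dimension of the space, which is consistent with `use_concurrency_formula: True` in the example config.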
+
+---
+
+### Optuna Search in Detail
+
+When performing an Optuna Search, Model Analyzer's goal is to maximize the configuration's `objective score`. First,
+MA profiles the default configuration and assigns it an `objective score` of zero. All future configurations
+are also assigned an `objective score`; with positive values indicating this configuration is better than the default
+configuration and negative values indicating it performs worse.
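
One way to picture a score that is zero for the default configuration, positive for better ones, and negative for worse ones is a percentage gain relative to the default. The sketch below assumes a single throughput objective and a hypothetical `objective_score` function with made-up measurements; Model Analyzer's actual scoring may combine several objectives and constraints differently.

```python
def objective_score(candidate_throughput: float, default_throughput: float) -> float:
    """Hypothetical score: percentage gain over the default configuration."""
    if default_throughput <= 0:
        raise ValueError("default_throughput must be positive")
    return 100.0 * (candidate_throughput - default_throughput) / default_throughput


default = 200.0  # throughput of the default config (made-up number)

print(objective_score(default, default))  # 0.0   (the default scores zero)
print(objective_score(300.0, default))    # 50.0  (better than the default)
print(objective_score(150.0, default))    # -25.0 (worse than the default)
```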
+
+_Here is an example debug output:_
+
+```yaml
+Trial 7 of 200:
+Creating model config: model_A_config_6
+Setting dynamic_batching to {'max_queue_delay_microseconds': 200}
+Setting instance_group to [{'count': 4, 'kind': 'KIND_GPU'}]
+Setting max_batch_size to 64
+
+Profiling model_A_config_6: client batch size=4, concurrency=256
+Objective score for model_A_config_6: 57 --- Best: model_A_config_4 (83)
+```
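
The `Best:` field in this output, together with the `--optuna-early-exit-threshold` option, suggests a loop that tracks the best score so far and stops after a run of trials without improvement. The sketch below is an assumption about how such a loop could look, using a hypothetical `run_trials` function and made-up scores; in particular, applying early exit only after the minimum trial count is reached is a guess, not documented behavior.

```python
from typing import Iterable, Optional, Tuple


def run_trials(
    scores: Iterable[float],
    min_trials: int,
    max_trials: int,
    early_exit_threshold: int,
) -> Tuple[int, Optional[float]]:
    """Return (trials_run, best_score) for a stream of per-trial scores."""
    best: Optional[float] = None
    trials_without_improvement = 0
    trials_run = 0

    for score in scores:
        if trials_run >= max_trials:
            break
        trials_run += 1

        if best is None or score > best:
            best = score
            trials_without_improvement = 0
        else:
            trials_without_improvement += 1

        # Assumed rule: early exit applies only once the minimum is met.
        if (
            trials_run >= min_trials
            and trials_without_improvement >= early_exit_threshold
        ):
            break

    return trials_run, best


scores = [10, 42, 83, 57, 60, 12, 80, 5]
print(run_trials(scores, min_trials=2, max_trials=200, early_exit_threshold=3))
# (6, 83): three trials in a row failed to beat 83, so the loop stops early
```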
+
 ## Ensemble Model Search

 _This mode has the following limitations:_

docs/ensemble_quick_start.md

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ git pull origin main
 **3. Add a version directory to ensemble_add_sub**

 ```
-mkdir examples/quick/ensemble_add_sub/1
+mkdir examples/quick-start/ensemble_add_sub/1
 ```

 ## `Step 2:` Pull and Run the SDK Container
