
Commit 4fb3901 (parent: 3d447b6)

Change the default sort for online mode (#341)

* Sort by throughput for online mode
* Update docs
* Fix unit test

File tree: 3 files changed, 20 additions and 21 deletions


docs/cli.md

Lines changed: 15 additions & 17 deletions

@@ -34,27 +34,25 @@ in which it is being run. Currently model analyzer supports 2 modes.
 ### Online Mode
 
 This is the default mode. When in this mode, Model Analyzer will operate to find
-the optimal model configuration for an online inference scenario. In this
-scenario, Triton server will receive requests on demand with an expectation that
-latency will be minimized.
+the optimal model configuration for an online inference scenario. By default in
+online mode, the best model configuration will be the one that maximizes
+throughput. If a latency budget is specified to the [analyze subcommand](subcommand-analyze) via
+`--latency-budget`, then the best model configuration will be the one with the highest throughput in the given budget.
 
-By default in online mode, the best model configuration will be the one that
-minimizes latency. If a latency budget is specified the best model configuration
-will be the one with the highest throughput in the given budget. The analyze and
-report subcommands also generate summaries specific to online inference. See the
-example [online summary](../examples/online_summary.pdf) and [detailed
-report](../examples/online_summary.pdf).
+In online mode the analyze and report subcommands will generate summaries specific to online inference.
+See the example [online summary](../examples/online_summary.pdf) and [online detailed report](../examples/online_summary.pdf).
 
 ### Offline Mode
 
-The offline mode `--mode=offline` tells Model Analyzer to set its defaults to
-find a model that maximizes throughput. In the offline scenario, Triton
-processes requests offline and therefore inference throughput is the priority. A
-minimum throughput can be specified using `--min-throughput` to ignore any
-configuration that does not exceed a minimum number of inferences per second.
-Both the summary and the detailed report will contain alternative graphs in the
-offline mode. See the [offline summary](../examples/offline_summary.pdf) and
-[detailed report](../examples/offline_detailed_report.pdf) examples.
+The offline mode `--mode=offline` tells Model Analyzer to operate to find the
+optimal model configuration for an offline inference scenario. By default
+in offline mode, the best model configuration will be the one that maximizes throughput.
+A minimum throughput can be specified to the [analyze subcommand](subcommand-analyze)
+via `--min-throughput` to ignore any configuration that does not exceed a minimum number of inferences per second.
+
+In offline mode the analyze and report subcommands will generate reports specific to offline inference.
+See the example [offline summary](../examples/offline_summary.pdf) and
+[offline detailed report](../examples/offline_detailed_report.pdf) examples.
 
 ## Model Analyzer Subcommands
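The selection rule the updated docs describe — highest throughput, optionally constrained by a latency budget — can be sketched as a simple filter-then-maximize step. This is an illustrative sketch, not Model Analyzer's actual code; the `best_config` function and the measurement dicts are hypothetical.

```python
# Sketch (not Model Analyzer's real implementation) of "best config" selection:
# pick the highest-throughput configuration, optionally restricted to those
# whose p99 latency fits inside a latency budget.

def best_config(measurements, latency_budget_ms=None):
    """measurements: list of dicts with 'name', 'perf_throughput', and
    'perf_latency_p99' keys (names are illustrative)."""
    if latency_budget_ms is not None:
        # Drop any configuration that blows the latency budget.
        measurements = [m for m in measurements
                        if m["perf_latency_p99"] <= latency_budget_ms]
    if not measurements:
        return None  # nothing satisfies the budget
    # Among the survivors, maximize throughput.
    return max(measurements, key=lambda m: m["perf_throughput"])

configs = [
    {"name": "cfg_0", "perf_throughput": 400, "perf_latency_p99": 35},
    {"name": "cfg_1", "perf_throughput": 250, "perf_latency_p99": 12},
]
print(best_config(configs)["name"])                        # cfg_0
print(best_config(configs, latency_budget_ms=20)["name"])  # cfg_1
```

With no budget the raw throughput winner is chosen; with a 20 ms budget, only the lower-latency configuration qualifies, matching the `--latency-budget` behavior described above.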

model_analyzer/config/input/config_defaults.py

Lines changed: 1 addition & 1 deletion

@@ -19,7 +19,7 @@
 #
 
 DEFAULT_CHECKPOINT_DIRECTORY = os.path.join(os.getcwd(), 'checkpoints')
-DEFAULT_ONLINE_OBJECTIVES = {'perf_latency_p99': 10}
+DEFAULT_ONLINE_OBJECTIVES = {'perf_throughput': 10}
DEFAULT_OFFLINE_OBJECTIVES = {'perf_throughput': 10}
 
 #
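One hypothetical way to read an objectives dict like `DEFAULT_ONLINE_OBJECTIVES` is as a set of metric weights used to rank configurations. The scoring function below is purely illustrative — `config_defaults.py` only declares the defaults, and Model Analyzer's real ranking logic may differ — but it shows why switching the online default from `perf_latency_p99` to `perf_throughput` changes which configuration wins.

```python
# Hypothetical sketch: treat an objectives dict as metric weights.
# Latency-style metrics count against a config; throughput counts for it.
# This is NOT Model Analyzer's actual scoring code.

DEFAULT_ONLINE_OBJECTIVES = {'perf_throughput': 10}   # new default (this commit)
OLD_ONLINE_OBJECTIVES = {'perf_latency_p99': 10}      # previous default

def score(measurement, objectives):
    total = 0.0
    for metric, weight in objectives.items():
        value = measurement[metric]
        # Lower latency is better, so it subtracts from the score.
        total += -weight * value if "latency" in metric else weight * value
    return total

m_fast = {'perf_throughput': 300, 'perf_latency_p99': 40}  # high throughput
m_slow = {'perf_throughput': 150, 'perf_latency_p99': 15}  # low latency

# Under the new throughput objective the high-throughput config wins;
# under the old latency objective the low-latency config would have won.
print(score(m_fast, DEFAULT_ONLINE_OBJECTIVES) >
      score(m_slow, DEFAULT_ONLINE_OBJECTIVES))  # True
print(score(m_fast, OLD_ONLINE_OBJECTIVES) >
      score(m_slow, OLD_ONLINE_OBJECTIVES))      # False
```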

tests/test_report_manager.py

Lines changed: 4 additions & 3 deletions

@@ -275,7 +275,7 @@ def _test_summary_counts(self, add_table_fn, add_plot_fn,
         '''
         num_plots_in_summary_report = 2
         num_tables_in_summary_report = 1
-        expected_config_count = top_n + 1 if default_within_top else top_n
+        expected_config_count = top_n + 1 if not default_within_top else top_n
         expected_plot_count = num_plots_in_summary_report * expected_config_count
         expected_table_count = num_tables_in_summary_report * expected_config_count
 
@@ -284,9 +284,10 @@ def _test_summary_counts(self, add_table_fn, add_plot_fn,
             metric_objectives={"perf_throughput": 10})
         avg_gpu_metrics = {0: {"gpu_used_memory": 6000, "gpu_utilization": 60}}
         for i in range(10):
-            p99 = 20 - i if default_within_top else 20 + i
+            p99 = 20 + i
+            throughput = 100 - 10 * i if default_within_top else 100 + 10 * i
             avg_non_gpu_metrics = {
-                "perf_throughput": 100 + 10 * i,
+                "perf_throughput": throughput,
                 "perf_latency_p99": p99,
                 "cpu_used_ram": 1000
            }
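The counting rule the fixed unit test now encodes is: the summary covers the top-N configurations, plus one extra entry for the default configuration only when the default did *not* already land in the top N. A minimal sketch of that rule (the function is illustrative; only the expression mirrors the test):

```python
# Sketch of the corrected count logic from _test_summary_counts: the default
# config adds one extra row/plot only when it fell outside the top N.

def expected_config_count(top_n, default_within_top):
    return top_n + 1 if not default_within_top else top_n

# Default already among the top 3: exactly 3 configs are reported.
print(expected_config_count(3, default_within_top=True))   # 3
# Default outside the top 3: it is appended, giving 4.
print(expected_config_count(3, default_within_top=False))  # 4
```

The pre-fix expression had the condition inverted, which only passed while the old latency-based default made the default config sort differently.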

0 commit comments