
Commit 4fb3901 (parent: 3d447b6)

Change the default sort for online mode (#341)

* Sort by throughput for online mode
* Update docs
* Fix unit test

File tree: 3 files changed, 20 additions and 21 deletions


docs/cli.md

Lines changed: 15 additions & 17 deletions

@@ -34,27 +34,25 @@ in which it is being run. Currently model analyzer supports 2 modes.
 ### Online Mode
 
 This is the default mode. When in this mode, Model Analyzer will operate to find
-the optimal model configuration for an online inference scenario. In this
-scenario, Triton server will receive requests on demand with an expectation that
-latency will be minimized.
+the optimal model configuration for an online inference scenario. By default in
+online mode, the best model configuration will be the one that maximizes
+throughput. If a latency budget is specified to the [analyze subcommand](subcommand-analyze) via
+`--latency-budget`, then the best model configuration will be the one with the highest throughput in the given budget.
 
-By default in online mode, the best model configuration will be the one that
-minimizes latency. If a latency budget is specified the best model configuration
-will be the one with the highest throughput in the given budget. The analyze and
-report subcommands also generate summaries specific to online inference. See the
-example [online summary](../examples/online_summary.pdf) and [detailed
-report](../examples/online_summary.pdf).
+In online mode the analyze and report subcommands will generate summaries specific to online inference.
+See the example [online summary](../examples/online_summary.pdf) and [online detailed report](../examples/online_summary.pdf).
 
 ### Offline Mode
 
-The offline mode `--mode=offline` tells Model Analyzer to set its defaults to
-find a model that maximizes throughput. In the offline scenario, Triton
-processes requests offline and therefore inference throughput is the priority. A
-minimum throughput can be specified using `--min-throughput` to ignore any
-configuration that does not exceed a minimum number of inferences per second.
-Both the summary and the detailed report will contain alternative graphs in the
-offline mode. See the [offline summary](../examples/offline_summary.pdf) and
-[detailed report](../examples/offline_detailed_report.pdf) examples.
+The offline mode `--mode=offline` tells Model Analyzer to operate to find the
+optimal model configuration for an offline inference scenario. By default
+in offline mode, the best model configuration will be the one that maximizes throughput.
+A minimum throughput can be specified to the [analyze subcommand](subcommand-analyze)
+via `--min-throughput` to ignore any configuration that does not exceed a minimum number of inferences per second.
+
+In offline mode the analyze and report subcommands will generate reports specific to offline inference.
+See the example [offline summary](../examples/offline_summary.pdf) and
+[offline detailed report](../examples/offline_detailed_report.pdf) examples.
 
 ## Model Analyzer Subcommands
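The selection rule the updated docs describe — highest throughput, optionally constrained by a latency budget — can be sketched as a simple filter-then-maximize step. This is an illustrative sketch, not Model Analyzer's actual code; the `best_config` function and the measurement dicts are hypothetical.

```python
# Sketch (not Model Analyzer's real implementation) of "best config" selection:
# pick the highest-throughput configuration, optionally restricted to those
# whose p99 latency fits inside a latency budget.

def best_config(measurements, latency_budget_ms=None):
    """measurements: list of dicts with 'name', 'perf_throughput', and
    'perf_latency_p99' keys (names are illustrative)."""
    if latency_budget_ms is not None:
        # Drop any configuration that blows the latency budget.
        measurements = [m for m in measurements
                        if m["perf_latency_p99"] <= latency_budget_ms]
    if not measurements:
        return None  # nothing satisfies the budget
    # Among the survivors, maximize throughput.
    return max(measurements, key=lambda m: m["perf_throughput"])

configs = [
    {"name": "cfg_0", "perf_throughput": 400, "perf_latency_p99": 35},
    {"name": "cfg_1", "perf_throughput": 250, "perf_latency_p99": 12},
]
print(best_config(configs)["name"])                        # cfg_0
print(best_config(configs, latency_budget_ms=20)["name"])  # cfg_1
```

With no budget the raw throughput winner is chosen; with a 20 ms budget, only the lower-latency configuration qualifies, matching the `--latency-budget` behavior described above.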

model_analyzer/config/input/config_defaults.py

Lines changed: 1 addition & 1 deletion

@@ -19,7 +19,7 @@
 #
 
 DEFAULT_CHECKPOINT_DIRECTORY = os.path.join(os.getcwd(), 'checkpoints')
-DEFAULT_ONLINE_OBJECTIVES = {'perf_latency_p99': 10}
+DEFAULT_ONLINE_OBJECTIVES = {'perf_throughput': 10}
DEFAULT_OFFLINE_OBJECTIVES = {'perf_throughput': 10}
 
 #
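One hypothetical way to read an objectives dict like `DEFAULT_ONLINE_OBJECTIVES` is as a set of metric weights used to rank configurations. The scoring function below is purely illustrative — `config_defaults.py` only declares the defaults, and Model Analyzer's real ranking logic may differ — but it shows why switching the online default from `perf_latency_p99` to `perf_throughput` changes which configuration wins.

```python
# Hypothetical sketch: treat an objectives dict as metric weights.
# Latency-style metrics count against a config; throughput counts for it.
# This is NOT Model Analyzer's actual scoring code.

DEFAULT_ONLINE_OBJECTIVES = {'perf_throughput': 10}   # new default (this commit)
OLD_ONLINE_OBJECTIVES = {'perf_latency_p99': 10}      # previous default

def score(measurement, objectives):
    total = 0.0
    for metric, weight in objectives.items():
        value = measurement[metric]
        # Lower latency is better, so it subtracts from the score.
        total += -weight * value if "latency" in metric else weight * value
    return total

m_fast = {'perf_throughput': 300, 'perf_latency_p99': 40}  # high throughput
m_slow = {'perf_throughput': 150, 'perf_latency_p99': 15}  # low latency

# Under the new throughput objective the high-throughput config wins;
# under the old latency objective the low-latency config would have won.
print(score(m_fast, DEFAULT_ONLINE_OBJECTIVES) >
      score(m_slow, DEFAULT_ONLINE_OBJECTIVES))  # True
print(score(m_fast, OLD_ONLINE_OBJECTIVES) >
      score(m_slow, OLD_ONLINE_OBJECTIVES))      # False
```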

tests/test_report_manager.py

Lines changed: 4 additions & 3 deletions

@@ -275,7 +275,7 @@ def _test_summary_counts(self, add_table_fn, add_plot_fn,
         '''
         num_plots_in_summary_report = 2
         num_tables_in_summary_report = 1
-        expected_config_count = top_n + 1 if default_within_top else top_n
+        expected_config_count = top_n + 1 if not default_within_top else top_n
         expected_plot_count = num_plots_in_summary_report * expected_config_count
         expected_table_count = num_tables_in_summary_report * expected_config_count
 
@@ -284,9 +284,10 @@ def _test_summary_counts(self, add_table_fn, add_plot_fn,
             metric_objectives={"perf_throughput": 10})
         avg_gpu_metrics = {0: {"gpu_used_memory": 6000, "gpu_utilization": 60}}
         for i in range(10):
-            p99 = 20 - i if default_within_top else 20 + i
+            p99 = 20 + i
+            throughput = 100 - 10 * i if default_within_top else 100 + 10 * i
             avg_non_gpu_metrics = {
-                "perf_throughput": 100 + 10 * i,
+                "perf_throughput": throughput,
                 "perf_latency_p99": p99,
                 "cpu_used_ram": 1000
            }
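The counting rule the fixed unit test now encodes is: the summary covers the top-N configurations, plus one extra entry for the default configuration only when the default did *not* already land in the top N. A minimal sketch of that rule (the function is illustrative; only the expression mirrors the test):

```python
# Sketch of the corrected count logic from _test_summary_counts: the default
# config adds one extra row/plot only when it fell outside the top N.

def expected_config_count(top_n, default_within_top):
    return top_n + 1 if not default_within_top else top_n

# Default already among the top 3: exactly 3 configs are reported.
print(expected_config_count(3, default_within_top=True))   # 3
# Default outside the top 3: it is appended, giving 4.
print(expected_config_count(3, default_within_top=False))  # 4
```

The pre-fix expression had the condition inverted, which only passed while the old latency-based default made the default config sort differently.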

0 commit comments