Commit ac56b06

nv-braftgerdes authored and committed
Add L0 request rate test
1 parent 6de64a3 commit ac56b06

19 files changed: +13,261 −40 lines

docs/config.md

Lines changed: 7 additions & 5 deletions
@@ -513,10 +513,11 @@ cannot be specified globally.
 
 Options available under this parameter are described in table below:
 
-| Option Name   | Description                                              | Supporting Types                                   |
-| :------------ | :------------------------------------------------------ | :------------------------------------------------- |
-| `concurrency` | Request concurrency used for generating the input load. | `<range>`, `<comma-delimited-list>`, or a `<list>` |
-| `batch_sizes` | Static batch size used for generating requests.         | `<range>`, `<comma-delimited-list>`, or a `<list>` |
+| Option Name    | Description                                              | Supporting Types                                   |
+| :------------- | :------------------------------------------------------ | :------------------------------------------------- |
+| `concurrency`  | Request concurrency used for generating the input load. | `<range>`, `<comma-delimited-list>`, or a `<list>` |
+| `request_rate` | Request rate used for generating the input load.        | `<range>`, `<comma-delimited-list>`, or a `<list>` |
+| `batch_sizes`  | Static batch size used for generating requests.         | `<range>`, `<comma-delimited-list>`, or a `<list>` |
 
 An example `<parameter>` looks like below:
 
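For context, a minimal sketch of what a `parameters` block using the new option might look like, mirroring the shape of the existing `concurrency` examples in this doc; the model name `add_sub` and the values are illustrative, not from the commit:

```yaml
profile_models:
  add_sub:
    parameters:
      # Comma-delimited list of request rates to sweep (illustrative values)
      request_rate: 16,32,64
      batch_sizes: 1,2,4
```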
@@ -765,6 +766,7 @@ More information about this can be found in the
 - Model Analyzer also provides certain arguments to the `perf_analyzer`
   instances it launches. They are the following:
   - `concurrency-range`
+  - `request-rate-range`
   - `batch-size`
   - `model-name`
   - `measurement-mode`
@@ -773,7 +775,7 @@ More information about this can be found in the
   - `model-repository`
   - `protocol`
   - `url`
-  If provided under the `perf_analyzer_flags` section, their values will be overriden. Caution should therefore be exercised when overriding these.
+  If provided under the `perf_analyzer_flags` section, their values will be overridden. Caution should therefore be exercised when overriding these.
 <br>
 
 ## `<triton-server-flags>`
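To illustrate that caution, a hedged sketch of a `perf_analyzer_flags` section overriding one of the managed arguments; the values are invented, and the `start:end:step` form follows perf_analyzer's documented `--request-rate-range` syntax:

```yaml
profile_models:
  - add_sub
perf_analyzer_flags:
  # This value would take precedence over the request-rate-range
  # Model Analyzer computes for its sweep (illustrative only)
  request-rate-range: "32:256:32"
```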

docs/config_search.md

Lines changed: 8 additions & 2 deletions
@@ -117,12 +117,18 @@ You can also modify the minimum/maximum values that the automatic search space w
 
 ---
 
-### [Request Concurrency Search Space](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md#request-concurrency)
+### [Request Concurrency Search Space](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/docs/inference_load_modes.md#concurrency-mode)
 
 - `Default:` 1 to 1024 concurrencies, sweeping over powers of 2 (i.e. 1, 2, 4, 8, ...)
 - `--run-config-search-min-concurrency: <val>`: Changes the request concurrency minimum automatic search space value
 - `--run-config-search-max-concurrency: <val>`: Changes the request concurrency maximum automatic search space value
 
+### [Request Rate Search Space](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/docs/inference_load_modes.md#request-rate-mode)
+
+- `Default:` 1 to 1024 request rates, sweeping over powers of 2 (i.e. 1, 2, 4, 8, ...)
+- `--run-config-search-min-request-rate: <val>`: Changes the request rate minimum automatic search space value
+- `--run-config-search-max-request-rate: <val>`: Changes the request rate maximum automatic search space value
+
 ---
 
 _An example YAML config that limits the search space:_
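(The config that sentence introduces lies outside this hunk.) As a sketch of the new request-rate knobs, assuming the CLI flags above map to the usual snake_case YAML config keys the way the concurrency flags do; the model name and bounds are illustrative:

```yaml
# Restrict the automatic request-rate sweep (illustrative bounds)
run_config_search_min_request_rate: 16
run_config_search_max_request_rate: 1024
profile_models:
  - add_sub
```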
@@ -144,7 +150,7 @@ _This will perform an Automatic Brute Search with instance group counts: 3-5, ba
 
 ### **Interaction with Remote Triton Launch Mode**
 
-When the triton launch mode is remote, _\*\*only concurrency values can be swept._\*\*<br>
+When the triton launch mode is remote, _\*\*only concurrency or request rate values can be swept._\*\*<br>
 
 Model Analyzer will ignore any model config parameters because we have no way of accessing and modifying the model repository of the remote Triton Server.
 

model_analyzer/config/input/config_command_profile.py

Lines changed: 35 additions & 1 deletion
@@ -44,7 +44,8 @@
     DEFAULT_TRITON_SERVER_PATH, DEFAULT_PERF_ANALYZER_TIMEOUT, \
     DEFAULT_EXPORT_PATH, DEFAULT_FILENAME_MODEL_INFERENCE, DEFAULT_FILENAME_MODEL_GPU, \
     DEFAULT_FILENAME_SERVER_ONLY, DEFAULT_NUM_CONFIGS_PER_MODEL, DEFAULT_NUM_TOP_MODEL_CONFIGS, \
-    DEFAULT_INFERENCE_OUTPUT_FIELDS, DEFAULT_GPU_OUTPUT_FIELDS, DEFAULT_SERVER_OUTPUT_FIELDS, \
+    DEFAULT_INFERENCE_OUTPUT_FIELDS, DEFAULT_REQUEST_RATE_INFERENCE_OUTPUT_FIELDS, \
+    DEFAULT_GPU_OUTPUT_FIELDS, DEFAULT_REQUEST_RATE_GPU_OUTPUT_FIELDS, DEFAULT_SERVER_OUTPUT_FIELDS, \
     DEFAULT_ONLINE_OBJECTIVES, DEFAULT_ONLINE_PLOTS, DEFAULT_OFFLINE_PLOTS, DEFAULT_MODEL_WEIGHTING
 
 from model_analyzer.constants import LOGGER_NAME
5051
from model_analyzer.constants import LOGGER_NAME
@@ -1074,6 +1075,15 @@ def _autofill_values(self):
                     'min': self.min_throughput
                 }})
 
+        # Switch default output fields if request rate is being used
+        # and the user didn't specify a custom output field
+        if self._using_request_rate():
+            if not self._fields['inference_output_fields'].is_set_by_user():
+                self.inference_output_fields = DEFAULT_REQUEST_RATE_INFERENCE_OUTPUT_FIELDS
+
+            if not self._fields['gpu_output_fields'].is_set_by_user():
+                self.gpu_output_fields = DEFAULT_REQUEST_RATE_GPU_OUTPUT_FIELDS
+
         new_profile_models = {}
         for i, model in enumerate(self.profile_models):
             new_model = {'cpu_only': (model.cpu_only() or cpu_only)}
@@ -1197,3 +1207,27 @@ def _autofill_values(self):
 
             new_profile_models[model.model_name()] = new_model
         self._fields['profile_models'].set_value(new_profile_models)
+
+    def _using_request_rate(self) -> bool:
+        if self.request_rate or self.request_rate_search_enable:
+            return True
+        elif self._fields['run_config_search_max_request_rate'].is_set_by_user() or \
+                self._fields['run_config_search_min_request_rate'].is_set_by_user():
+            return True
+        else:
+            return self._are_models_using_request_rate()
+
+    def _are_models_using_request_rate(self) -> bool:
+        model_using_request_rate = False
+        model_using_concurrency = False
+        for i, model in enumerate(self.profile_models):
+            if model.parameters() and 'request_rate' in model.parameters():
+                model_using_request_rate = True
+            else:
+                model_using_concurrency = True
+
+        if model_using_request_rate and model_using_concurrency:
+            raise TritonModelAnalyzerException("Parameters in all profiled models must use request-rate-range. "\
+                "Model Analyzer does not support mixing concurrency-range and request-rate-range.")
+        else:
+            return model_using_request_rate
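A hedged sketch of the per-model configuration `_are_models_using_request_rate` inspects: every profiled model must opt into request rate, since mixing it with concurrency raises `TritonModelAnalyzerException` as shown above. Model names and values are illustrative:

```yaml
profile_models:
  model_a:
    parameters:
      request_rate: 32,64
  model_b:
    parameters:
      # Using concurrency here instead would trip the exception,
      # because model_a already selects request-rate mode
      request_rate: 32,64
```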

model_analyzer/config/input/config_defaults.py

Lines changed: 10 additions & 0 deletions
@@ -108,11 +108,21 @@
     'instance_group', 'max_batch_size', 'satisfies_constraints',
     'perf_throughput', 'perf_latency_p99'
 ]
+DEFAULT_REQUEST_RATE_INFERENCE_OUTPUT_FIELDS = [
+    'model_name', 'batch_size', 'request_rate', 'model_config_path',
+    'instance_group', 'max_batch_size', 'satisfies_constraints',
+    'perf_throughput', 'perf_latency_p99'
+]
 DEFAULT_GPU_OUTPUT_FIELDS = [
     'model_name', 'gpu_uuid', 'batch_size', 'concurrency', 'model_config_path',
     'instance_group', 'satisfies_constraints', 'gpu_used_memory',
     'gpu_utilization', 'gpu_power_usage'
 ]
+DEFAULT_REQUEST_RATE_GPU_OUTPUT_FIELDS = [
+    'model_name', 'gpu_uuid', 'batch_size', 'request_rate', 'model_config_path',
+    'instance_group', 'satisfies_constraints', 'gpu_used_memory',
+    'gpu_utilization', 'gpu_power_usage'
+]
 DEFAULT_SERVER_OUTPUT_FIELDS = [
     'model_name', 'gpu_uuid', 'gpu_used_memory', 'gpu_utilization',
     'gpu_power_usage'
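These request-rate defaults only apply when the user has not set the fields themselves (see the `_autofill_values` change above). A sketch of pinning the columns explicitly in a profile config, which would suppress the automatic switch; the field list is illustrative:

```yaml
inference_output_fields:
  - model_name
  - batch_size
  - request_rate
  - perf_throughput
  - perf_latency_p99
```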

model_analyzer/perf_analyzer/perf_config.py

Lines changed: 2 additions & 1 deletion
@@ -225,7 +225,8 @@ def extract_model_specific_parameters(self):
 
         return {
             'batch-size': self._options['-b'],
-            'concurrency-range': self._args['concurrency-range']
+            'concurrency-range': self._args['concurrency-range'],
+            'request-rate-range': self._args['request-rate-range']
         }
 
     @classmethod

model_analyzer/plots/detailed_plot.py

Lines changed: 36 additions & 14 deletions
@@ -103,7 +103,7 @@ def data(self):
 
     def add_run_config_measurement(self, run_config_measurement):
         """
-        Adds a measurment to this plot
+        Adds a measurement to this plot
 
         Parameters
         ----------
@@ -113,9 +113,19 @@ def add_run_config_measurement(self, run_config_measurement):
         """
 
         # TODO-TMA-568: This needs to be updated because there will be multiple model configs
-        self._data['concurrency'].append(
-            run_config_measurement.model_specific_pa_params()[0]
-            ['concurrency-range'])
+        if 'concurrency-range' in run_config_measurement.model_specific_pa_params(
+        )[0] and run_config_measurement.model_specific_pa_params(
+        )[0]['concurrency-range']:
+            self._data['concurrency'].append(
+                run_config_measurement.model_specific_pa_params()[0]
+                ['concurrency-range'])
+
+        if 'request-rate-range' in run_config_measurement.model_specific_pa_params(
+        )[0] and run_config_measurement.model_specific_pa_params(
+        )[0]['request-rate-range']:
+            self._data['request_rate'].append(
+                run_config_measurement.model_specific_pa_params()[0]
+                ['request-rate-range'])
 
         self._data['perf_throughput'].append(
             run_config_measurement.get_non_gpu_metric_value(
@@ -135,13 +145,23 @@ def plot_data(self):
         on this plot's Axes object
         """
 
-        # Sort the data by concurrency
-        concurrency_sort_indices = list(
-            zip(*sorted(enumerate(self._data['concurrency']),
-                        key=lambda x: x[1])))[0]
+        # Need to change the default x-axis plot title for request rates
+        if 'request_rate' in self._data and self._data['request_rate'][0]:
+            self._ax_latency.set_xlabel('Client Request Rate')
+
+        # Sort the data by request rate or concurrency
+        if 'request_rate' in self._data and self._data['request_rate'][0]:
+            print(f"\n\nFound request rate: {self._data['request_rate']}\n\n")
+            sort_indices = list(
+                zip(*sorted(enumerate(self._data['request_rate']),
+                            key=lambda x: x[1])))[0]
+        else:
+            sort_indices = list(
+                zip(*sorted(enumerate(self._data['concurrency']),
+                            key=lambda x: x[1])))[0]
 
         sorted_data = {
-            key: [data_list[i] for i in concurrency_sort_indices
+            key: [data_list[i] for i in sort_indices
             ] for key, data_list in self._data.items()
         }
 
@@ -153,11 +173,14 @@ def plot_data(self):
         ]))
         bottoms = None
 
-        sorted_data['concurrency'] = list(map(str, sorted_data['concurrency']))
+        if 'request_rate' in self._data:
+            sorted_data['indices'] = list(map(str, sorted_data['request_rate']))
+        else:
+            sorted_data['indices'] = list(map(str, sorted_data['concurrency']))
 
         # Plot latency breakdown with concurrency casted as string to make uniform x
         for metric, label in labels.items():
-            self._ax_latency.bar(sorted_data['concurrency'],
+            self._ax_latency.bar(sorted_data['indices'],
                                  sorted_data[metric],
                                  width=self._bar_width,
                                  label=label,
@@ -171,7 +194,7 @@ def plot_data(self):
 
         # Plot the inference line
         inference_line = self._ax_throughput.plot(
-            sorted_data['concurrency'],
+            sorted_data['indices'],
             sorted_data['perf_throughput'],
             label='Inferences/second',
             marker='o',
@@ -190,8 +213,7 @@ def plot_data(self):
             bbox_to_anchor=(self._legend_x, self._legend_y),
             prop=dict(size=self._legend_font_size))
         # Annotate inferences
-        for x, y in zip(sorted_data['concurrency'],
-                        sorted_data['perf_throughput']):
+        for x, y in zip(sorted_data['indices'], sorted_data['perf_throughput']):
             self._ax_throughput.annotate(
                 str(round(y, 2)),
                 xy=(x, y),

model_analyzer/plots/plot_manager.py

Lines changed: 4 additions & 3 deletions
@@ -36,7 +36,7 @@ class PlotManager:
     of plots generated by model analyzer
     """
 
-    def __init__(self, config: Union[ConfigCommandProfile, ConfigCommandReport],
+    def __init__(self, config: Union[ConfigCommandProfile, ConfigCommandReport],
                  result_manager: ResultManager,
                  constraint_manager: ConstraintManager):
         """
@@ -63,7 +63,8 @@ def __init__(self, config: Union[ConfigCommandProfile, ConfigCommandReport],
         os.makedirs(self._plot_export_directory, exist_ok=True)
 
         # Dict of list of plots
-        self._simple_plots: DefaultDict[str, Dict[str, SimplePlot]] = defaultdict()
+        self._simple_plots: DefaultDict[str, Dict[str,
+                                                  SimplePlot]] = defaultdict()
         self._detailed_plots: Dict[str, DetailedPlot] = {}
 
     def create_summary_plots(self):
@@ -186,7 +187,7 @@ def export_summary_plots(self):
 
     def export_detailed_plots(self):
         """
-        Write detaild plots to disk
+        Write detailed plots to disk
         """
 
         detailed_plot_dir = os.path.join(self._plot_export_directory,

model_analyzer/reports/report_manager.py

Lines changed: 16 additions & 2 deletions
@@ -917,8 +917,13 @@ def _build_detailed_table(self, model_config_name):
                                   reverse=True)
         cpu_only = model_config.cpu_only()
 
-        first_column_header = 'Request Concurrency' if self._mode == 'online' else 'Client Batch Size'
-        first_column_tag = 'concurrency-range' if self._mode == 'online' else 'batch-size'
+        if self._was_measured_with_request_rate(measurements[0]):
+            first_column_header = 'Request Rate' if self._mode == 'online' else 'Client Batch Size'
+            first_column_tag = 'request-rate-range' if self._mode == 'online' else 'batch-size'
+        else:
+            first_column_header = 'Request Concurrency' if self._mode == 'online' else 'Client Batch Size'
+            first_column_tag = 'concurrency-range' if self._mode == 'online' else 'batch-size'
+
         if not cpu_only:
             headers = [
                 first_column_header, 'p99 Latency (ms)',
@@ -1124,3 +1129,12 @@ def _cpu_metrics_were_gathered(self):
         self._cpu_metrics_gathered_sticky = used_ram != 0
 
         return self._cpu_metrics_gathered_sticky
+
+    def _was_measured_with_request_rate(
+            self, measurement: RunConfigMeasurement) -> bool:
+        if 'request-rate-range' in measurement.model_specific_pa_params(
+        )[0] and measurement.model_specific_pa_params(
+        )[0]['request-rate-range']:
+            return True
+        else:
+            return False
