**`docs/README.md`** (2 additions, 2 deletions)
```diff
@@ -1,5 +1,5 @@
 <!--
-Copyright (c) 2020-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+Copyright (c) 2020-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 
 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
```
```diff
@@ -47,4 +47,4 @@ The User Guide describes how to configure Model Analyzer, choose launch and sear
 
 The following resources are recommended:
 
-- [Perf Analyzer](https://github.com/triton-inference-server/client/blob/main/src/c++/perf_analyzer/README.md): Perf Analyzer is a CLI application built to generate inference requests and measures the latency of those requests and throughput of the model being served.
+- [Perf Analyzer](https://github.com/triton-inference-server/perf_analyzer/blob/main/README.md): Perf Analyzer is a CLI application built to generate inference requests and measures the latency of those requests and throughput of the model being served.
```
**`docs/config_search.md`** (4 additions, 4 deletions)
```diff
@@ -98,7 +98,7 @@ It has two modes:
 
 The parameters that are automatically searched are
 [model maximum batch size](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#maximum-batch-size),
-[model instance groups](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#instance-groups), and [request concurrencies](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/perf_analyzer.md#request-concurrency).
+[model instance groups](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#instance-groups), and [request concurrencies](https://github.com/triton-inference-server/perf_analyzer/blob/main/docs/cli.md#measurement-options).
 Additionally, [dynamic_batching](https://github.com/triton-inference-server/server/blob/master/docs/user_guide/model_configuration.md#dynamic-batcher) will be enabled if it is legal to do so.
 
 _An example model analyzer YAML config that performs an Automatic Brute Search:_
```
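The example config referenced in the last context line is unchanged by this PR and therefore not expanded in the hunk. A minimal sketch of such a brute-search config, assuming a placeholder model name and repository path (neither appears in this diff):

```yaml
# Minimal brute-search sketch; the model name and repository path are
# placeholders, not values taken from this diff.
model_repository: /path/to/model/repository

# brute is already the default search mode; set explicitly here for clarity
run_config_search_mode: brute

profile_models:
  - my_model
```

With only these options set, Model Analyzer sweeps maximum batch size, instance counts, and request concurrency on its own, enabling dynamic batching where legal.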
```diff
@@ -128,13 +128,13 @@ You can also modify the minimum/maximum values that the automatic search space w
 - `Default:` 1 to 1024 concurrencies, sweeping over powers of 2 (i.e. 1, 2, 4, 8, ...)
 - `--run-config-search-min-request-rate: <val>`: Changes the request rate minimum automatic search space value
```
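Only two context lines of this hunk are visible above; they belong to the list describing the `--run-config-search-*` bounds. Assuming the config-file keys mirror the CLI flags with underscores in place of hyphens, narrowing the default 1-to-1024 concurrency sweep might look like:

```yaml
# Sketch: tighten the automatic concurrency sweep; the bounds are arbitrary
# examples, and the key spellings assume the YAML form of the CLI flags.
run_config_search_min_concurrency: 2
run_config_search_max_concurrency: 64
profile_models:
  - my_model
```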
```diff
@@ -422,7 +422,7 @@ _This mode has the following limitations:_
 
 - Summary/Detailed reports do not include the new metrics
 
-In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/genai-perf/README.md#cli) documentation for a list of the flags that can be specified.
+In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/perf_analyzer/blob/main/genai-perf/README.md#command-line-options) documentation for a list of the flags that can be specified.
 
 LLMs can be optimized using either Quick or Brute search mode.
```
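A sketch of a config that applies the options named in this hunk; the model name and the flags under `genai_perf_flags` are assumptions rather than values from the diff, so consult the linked GenAI-Perf CLI documentation for the exact flags your version accepts:

```yaml
# Sketch of an LLM profile; `model_type` is assumed to be the config-file
# spelling of the `--model-type` CLI option, and the two GenAI-Perf flags
# below are illustrative only.
model_repository: /path/to/model/repository
model_type: LLM
profile_models:
  - my_llm
genai_perf_flags:
  synthetic-input-tokens-mean: 200
  output-tokens-mean: 100
```

The same intent can be expressed on the command line with `--model-type LLM`, as the paragraph in the hunk notes.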
**`docs/model_types.md`** (1 addition, 1 deletion)
```diff
@@ -119,7 +119,7 @@ _Profiling this model type has the following limitations:_
 
 - Summary/Detailed reports do not include the new metrics
 
-In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/client/blob/main/src/c%2B%2B/perf_analyzer/genai-perf/README.md#cli) documentation for a list of the flags that can be specified.
+In order to profile LLMs you must tell MA that the model type is LLM by setting `--model-type LLM` in the CLI/config file. You can specify CLI options to the GenAI-Perf tool using `genai_perf_flags`. See the [GenAI-Perf CLI](https://github.com/triton-inference-server/perf_analyzer/blob/main/genai-perf/README.md) documentation for a list of the flags that can be specified.
 
 LLMs can be optimized using either Quick or Brute search mode.
```