
Commit 6f7ea79

pvijayakrishmc-nv and Misha Chornyi authored: Update README for r24.10 (#939)

* Update README for Release 24.10
* Update README.md (Co-authored-by: Misha Chornyi <[email protected]>)
* Revert "Update README.md" (reverts commit 682c999)

1 parent: 5e9c997

File tree: 1 file changed, +107 −3 lines changed


README.md

# Triton Model Analyzer
*(Removed in this commit:)*

> [!Warning]
>
> You are currently on the `r24.10` branch which tracks under-development progress towards the next release.
Triton Model Analyzer is a CLI tool which can help you find a more optimal configuration, on a given piece of hardware, for single, multiple, ensemble, or BLS models running on a [Triton Inference Server](https://github.com/triton-inference-server/server/). Model Analyzer will also generate reports to help you better understand the trade-offs of the different configurations along with their compute and memory requirements.
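The workflow is driven by a YAML config file passed to the `model-analyzer profile` subcommand. A minimal sketch, assuming a hypothetical model named `add_sub` and placeholder paths you would substitute (see [Configuring Model Analyzer](docs/config.md) for the authoritative keys):

```yaml
# config.yaml -- minimal profiling config (model name and paths are hypothetical)
model_repository: /path/to/model_repository
profile_models:
  - add_sub
```

You would then run `model-analyzer profile -f config.yaml`; equivalent command-line flags exist for most options.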
<br><br>
# Features
### Search Modes

- [Optuna Search](docs/config_search.md#optuna-search-mode) **_-ALPHA RELEASE-_** allows you to search every parameter that can be specified in the model configuration, using a hyperparameter optimization framework. See the [Optuna](https://optuna.org/) website for details on how the algorithm works.

- [Quick Search](docs/config_search.md#quick-search-mode) will **sparsely** search the [Max Batch Size](https://github.com/triton-inference-server/server/blob/r24.10/docs/user_guide/model_configuration.md#maximum-batch-size), [Dynamic Batching](https://github.com/triton-inference-server/server/blob/r24.10/docs/user_guide/model_configuration.md#dynamic-batcher), and [Instance Group](https://github.com/triton-inference-server/server/blob/r24.10/docs/user_guide/model_configuration.md#instance-groups) spaces using a heuristic hill-climbing algorithm to help you quickly find a more optimal configuration.

- [Automatic Brute Search](docs/config_search.md#automatic-brute-search) will **exhaustively** search the [Max Batch Size](https://github.com/triton-inference-server/server/blob/r24.10/docs/user_guide/model_configuration.md#maximum-batch-size), [Dynamic Batching](https://github.com/triton-inference-server/server/blob/r24.10/docs/user_guide/model_configuration.md#dynamic-batcher), and [Instance Group](https://github.com/triton-inference-server/server/blob/r24.10/docs/user_guide/model_configuration.md#instance-groups) parameters of your model configuration.

- [Manual Brute Search](docs/config_search.md#manual-brute-search) allows you to create manual sweeps for every parameter that can be specified in the model configuration.
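The search mode can be selected in the same YAML config file. A sketch, assuming the `run_config_search_mode` key described in [Model Config Search](docs/config_search.md) (the value shown is illustrative):

```yaml
# Pick a search strategy: "brute" is exhaustive, "quick" and "optuna" are heuristic
run_config_search_mode: quick
```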
### Model Types

- [Ensemble](docs/model_types.md#ensemble): Model Analyzer can help you find the optimal settings when profiling an ensemble model.

- [BLS](docs/model_types.md#bls): Model Analyzer can help you find the optimal settings when profiling a BLS model.

- [Multi-Model](docs/model_types.md#multi-model): Model Analyzer can help you find the optimal settings when profiling multiple concurrent models.

- [LLM](docs/model_types.md#llm): Model Analyzer can help you find the optimal settings when profiling Large Language Models.
### Other Features

- [Detailed and summary reports](docs/report.md): Model Analyzer can generate summarized and detailed reports that help you better understand the trade-offs between the different model configurations explored for your model.

- [QoS Constraints](docs/config.md#constraint): Constraints let you filter Model Analyzer results based on your QoS requirements. For example, you can specify a latency budget to filter out model configurations that do not satisfy the specified latency threshold.
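A latency budget of this kind can be expressed as a constraints block in the YAML config. A sketch, assuming the metric names from [docs/config.md#constraint](docs/config.md#constraint) (the thresholds are made-up examples):

```yaml
# Keep only configurations meeting these (illustrative) QoS targets
constraints:
  perf_latency_p99:
    max: 100   # p99 latency budget, in milliseconds
  perf_throughput:
    min: 5     # minimum inferences per second
```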
<br><br>
# Examples and Tutorials

### **Single Model**

See the [Single Model Quick Start](docs/quick_start.md) for a guide on how to use Model Analyzer to profile, analyze, and report on a simple PyTorch model.

### **Multi Model**

See the [Multi-model Quick Start](docs/mm_quick_start.md) for a guide on how to use Model Analyzer to profile, analyze, and report on two models running concurrently on the same GPU.

### **Ensemble Model**

See the [Ensemble Model Quick Start](docs/ensemble_quick_start.md) for a guide on how to use Model Analyzer to profile, analyze, and report on a simple Ensemble model.

### **BLS Model**

See the [BLS Model Quick Start](docs/bls_quick_start.md) for a guide on how to use Model Analyzer to profile, analyze, and report on a simple BLS model.

<br><br>
# Documentation

- [Installation](docs/install.md)
- [Model Analyzer CLI](docs/cli.md)
- [Launch Modes](docs/launch_modes.md)
- [Configuring Model Analyzer](docs/config.md)
- [Model Analyzer Metrics](docs/metrics.md)
- [Model Config Search](docs/config_search.md)
- [Model Types](docs/model_types.md)
- [Checkpointing](docs/checkpoints.md)
- [Model Analyzer Reports](docs/report.md)
- [Deployment with Kubernetes](docs/kubernetes_deploy.md)

<br><br>
# Terminology

Below are definitions of some commonly used terms in Model Analyzer:

- **Model Type** - the category of model being profiled, e.g. single, multi, ensemble, or BLS.
- **Search Mode** - how Model Analyzer explores the possible configuration space when profiling. This is either exhaustive (brute) or heuristic (quick/Optuna).
- **Model Config Search** - the cross product of model type and search mode.
- **Launch Mode** - how the Triton Server is deployed and used by Model Analyzer.
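The launch mode is itself selectable in the YAML config. A sketch, assuming the `triton_launch_mode` option covered in [Launch Modes](docs/launch_modes.md) (the value shown is illustrative):

```yaml
# How Model Analyzer starts or connects to Triton; other modes include local, remote, c_api
triton_launch_mode: docker
```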
# Reporting problems, asking questions

We appreciate any feedback, questions, or bug reports regarding this project. When help with code is needed, follow the process outlined in the Stack Overflow [MCVE](https://stackoverflow.com/help/mcve) document. Ensure posted examples are:

- minimal - use as little code as possible that still produces the same problem

- complete - provide all parts needed to reproduce the problem. Check whether you can strip external dependencies and still show the problem. The less time we spend on reproducing problems, the more time we have to fix them.

- verifiable - test the code you're about to provide to make sure it reproduces the problem. Remove all other problems that are not related to your request/question.
