@@ -135,10 +135,12 @@ model_repository/

The `model.pt` is the TorchScript model file.

-### Parameters
+## Configuration

Triton exposes some flags to control the execution mode of the TorchScript models through the `Parameters` section of the model's `config.pbtxt` file.

+### Parameters
+
* `DISABLE_OPTIMIZED_EXECUTION`:
  Boolean flag to disable the optimized execution of TorchScript models.
  By default, the optimized execution is always enabled.
@@ -154,7 +156,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

The section of the model config file specifying this parameter will look like:

-```proto
+```yaml
parameters: {
    key: "DISABLE_OPTIMIZED_EXECUTION"
    value: { string_value: "true" }
@@ -173,7 +175,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

To enable inference mode, use the configuration example below:

-```proto
+```yaml
parameters: {
    key: "INFERENCE_MODE"
    value: { string_value: "true" }
@@ -193,7 +195,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

To disable cuDNN, use the configuration example below:

-```proto
+```yaml
parameters: {
    key: "DISABLE_CUDNN"
    value: { string_value: "true" }
@@ -208,7 +210,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

To enable weight sharing, use the configuration example below:

-```proto
+```yaml
parameters: {
    key: "ENABLE_WEIGHT_SHARING"
    value: { string_value: "true" }
@@ -226,7 +228,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

To enable cleaning of the CUDA cache after every execution, use the configuration example below:

-```proto
+```yaml
parameters: {
    key: "ENABLE_CACHE_CLEANING"
    value: { string_value: "true" }
@@ -249,7 +251,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

To set the inter-op thread count, use the configuration example below:

-```proto
+```yaml
parameters: {
    key: "INTER_OP_THREAD_COUNT"
    value: { string_value: "1" }
@@ -275,7 +277,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

To set the intra-op thread count, use the configuration example below:

-```proto
+```yaml
parameters: {
    key: "INTRA_OP_THREAD_COUNT"
    value: { string_value: "1" }
@@ -291,9 +293,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

`ENABLE_JIT_PROFILING`

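For example, JIT profiling could be turned off with a `parameters` entry that follows the same pattern as the flags above. This is a sketch, not part of this commit; setting the value to `"false"` is assumed here to disable the optimization:

```yaml
parameters: {
    # assumption: "false" disables JIT profiling, mirroring the other boolean flags
    key: "ENABLE_JIT_PROFILING"
    value: { string_value: "false" }
}
```
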
-### Support
-
-#### Model Instance Group Kind
+### Model Instance Group Kind

The PyTorch backend supports the following kinds of
[Model Instance Groups](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups)
@@ -319,6 +319,15 @@ where the input tensors are placed as follows:
> [!IMPORTANT]
> If a device is not specified in the model, the backend uses the first available GPU device.

+To set the model instance group, use the configuration example below:
+
+```yaml
+instance_group {
+  count: 2
+  kind: KIND_GPU
+}
+```
+
### Customization

The following PyTorch settings may be customized by setting parameters on the
@@ -347,7 +356,7 @@ The following PyTorch settings may be customized by setting parameters on the

For example:

-```proto
+```yaml
parameters: {
    key: "NUM_THREADS"
    value: { string_value: "4" }
@@ -358,7 +367,7 @@ parameters: {
}
```

-### Important Notes
+## Important Notes

* The execution of a PyTorch model on GPU is asynchronous in nature.
  See