@@ -135,10 +135,12 @@ model_repository/

The `model.pt` is the TorchScript model file.

-### Parameters
+## Configuration

Triton exposes some flags to control the execution mode of the TorchScript models through the `Parameters` section of the model's `config.pbtxt` file.

+### Parameters
+
* `DISABLE_OPTIMIZED_EXECUTION`:
  Boolean flag to disable the optimized execution of TorchScript models.
  By default, the optimized execution is always enabled.
@@ -154,7 +156,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

  The section of the model config file specifying this parameter will look like:

-  ```proto
+  ```yaml
  parameters: {
    key: "DISABLE_OPTIMIZED_EXECUTION"
    value: { string_value: "true" }
@@ -173,7 +175,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

  To enable inference mode, use the configuration example below:

-  ```proto
+  ```yaml
  parameters: {
    key: "INFERENCE_MODE"
    value: { string_value: "true" }
@@ -193,7 +195,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

  To disable cuDNN, use the configuration example below:

-  ```proto
+  ```yaml
  parameters: {
    key: "DISABLE_CUDNN"
    value: { string_value: "true" }
@@ -208,7 +210,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

  To enable weight sharing, use the configuration example below:

-  ```proto
+  ```yaml
  parameters: {
    key: "ENABLE_WEIGHT_SHARING"
    value: { string_value: "true" }
@@ -226,7 +228,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

  To enable cleaning of the CUDA cache after every execution, use the configuration example below:

-  ```proto
+  ```yaml
  parameters: {
    key: "ENABLE_CACHE_CLEANING"
    value: { string_value: "true" }
@@ -249,7 +251,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

  To set the inter-op thread count, use the configuration example below:

-  ```proto
+  ```yaml
  parameters: {
    key: "INTER_OP_THREAD_COUNT"
    value: { string_value: "1" }
@@ -270,7 +272,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

  To set the intra-op thread count, use the configuration example below:

-  ```proto
+  ```yaml
  parameters: {
    key: "INTRA_OP_THREAD_COUNT"
    value: { string_value: "1" }
@@ -286,9 +288,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

  `ENABLE_JIT_PROFILING`

-### Support
-
-#### Model Instance Group Kind
+### Model Instance Group Kind

The PyTorch backend supports the following kinds of
[Model Instance Groups](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups)
@@ -314,6 +314,15 @@ where the input tensors are placed as follows:
> [!IMPORTANT]
> If a device is not specified in the model, the backend uses the first available GPU device.

+To set the model instance group, use the configuration example below:
+
+```yaml
+instance_group {
+  count: 2
+  kind: KIND_GPU
+}
+```
+
### Customization

The following PyTorch settings may be customized by setting parameters on the
@@ -342,7 +351,7 @@ The following PyTorch settings may be customized by setting parameters on the

For example:

-```proto
+```yaml
parameters: {
  key: "NUM_THREADS"
  value: { string_value: "4" }
@@ -353,7 +362,7 @@ parameters: {
}
```

-### Important Notes
+## Important Notes

* The execution of a PyTorch model on GPU is asynchronous in nature.
  See
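Note (not part of the diff above): one of the hunks references `ENABLE_JIT_PROFILING` without showing its configuration snippet. Assuming the flag follows the same `Parameters` pattern as the other boolean flags documented in this file, a minimal sketch of that stanza would look like:

```yaml
parameters: {
  key: "ENABLE_JIT_PROFILING"
  value: { string_value: "false" }
}
```

The `"false"` value here is an assumption, on the premise that JIT profiling, like the other optimizations, is enabled by default and this parameter is used to switch it off.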