@@ -135,10 +135,12 @@ model_repository/

The `model.pt` is the TorchScript model file.

-### Parameters
+## Configuration

Triton exposes some flags to control the execution mode of the TorchScript models through the `Parameters` section of the model's `config.pbtxt` file.

+### Parameters
+
* `DISABLE_OPTIMIZED_EXECUTION`:
  Boolean flag to disable the optimized execution of TorchScript models.
  By default, the optimized execution is always enabled.
@@ -154,7 +156,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

The section of the model config file specifying this parameter will look like:

-```proto
+```yaml
parameters: {
    key: "DISABLE_OPTIMIZED_EXECUTION"
    value: { string_value: "true" }
@@ -173,7 +175,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

To enable inference mode, use the configuration example below:

-```proto
+```yaml
parameters: {
    key: "INFERENCE_MODE"
    value: { string_value: "true" }
@@ -193,7 +195,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

To disable cuDNN, use the configuration example below:

-```proto
+```yaml
parameters: {
    key: "DISABLE_CUDNN"
    value: { string_value: "true" }
@@ -208,7 +210,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

To enable weight sharing, use the configuration example below:

-```proto
+```yaml
parameters: {
    key: "ENABLE_WEIGHT_SHARING"
    value: { string_value: "true" }
@@ -226,7 +228,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

To enable cleaning of the CUDA cache after every execution, use the configuration example below:

-```proto
+```yaml
parameters: {
    key: "ENABLE_CACHE_CLEANING"
    value: { string_value: "true" }
@@ -249,7 +251,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

To set the inter-op thread count, use the configuration example below:

-```proto
+```yaml
parameters: {
    key: "INTER_OP_THREAD_COUNT"
    value: { string_value: "1" }
@@ -275,7 +277,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

To set the intra-op thread count, use the configuration example below:

-```proto
+```yaml
parameters: {
    key: "INTRA_OP_THREAD_COUNT"
    value: { string_value: "1" }
@@ -291,9 +293,7 @@ Triton exposes some flags to control the execution mode of the TorchScript model

`ENABLE_JIT_PROFILING`

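For example, JIT profiling could be turned off with a `parameters` entry that follows the same pattern as the flags above. This is a sketch, not part of this commit; setting the value to `"false"` is assumed here to disable the optimization:

```yaml
parameters: {
    # assumption: "false" disables JIT profiling, mirroring the other boolean flags
    key: "ENABLE_JIT_PROFILING"
    value: { string_value: "false" }
}
```
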
-### Support
-
-#### Model Instance Group Kind
+### Model Instance Group Kind

The PyTorch backend supports the following kinds of
[Model Instance Groups](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/model_configuration.md#instance-groups)
@@ -319,6 +319,15 @@ where the input tensors are placed as follows:
> [!IMPORTANT]
> If a device is not specified in the model, the backend uses the first available GPU device.

+To set the model instance group, use the configuration example below:
+
+```yaml
+instance_group {
+  count: 2
+  kind: KIND_GPU
+}
+```
+
### Customization

The following PyTorch settings may be customized by setting parameters on the
@@ -347,7 +356,7 @@ The following PyTorch settings may be customized by setting parameters on the

For example:

-```proto
+```yaml
parameters: {
    key: "NUM_THREADS"
    value: { string_value: "4" }
@@ -358,7 +367,7 @@ parameters: {
}
```

-### Important Notes
+## Important Notes

* The execution of a PyTorch model on GPU is asynchronous in nature.
  See