<!--
# Copyright 2020-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions

Currently, Triton requires that a specially patched version of
PyTorch be used with the PyTorch backend. The full source for
these PyTorch versions is available as Docker images from
[NGC](https://ngc.nvidia.com). For example, the PyTorch version
compatible with the 25.09 release of Triton is available as
nvcr.io/nvidia/pytorch:25.09-py3.

Copy over the LibTorch and Torchvision headers and libraries from the
[PyTorch NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch)

to ensure that the model instance and the tensors used for inference are
assigned to the same GPU device as on which the model was traced.

## PyTorch 2.0

### PyTorch 2.0 Models

The model repository should look like:

```bash
model_repository/
`-- model_directory
    |-- 1
    |   |-- model.py
    |   `-- model.pt
    `-- config.pbtxt
```

The `model.py` file contains the class definition of the PyTorch model.
The class should extend
[`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
An optional `model.pt` may also be provided, containing the saved
[`state_dict`](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference)
of the model.

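As a minimal sketch (the class name and the add/sub behavior are
illustrative assumptions, not anything required by the backend), a
`model.py` might look like:

```python
# model.py -- minimal sketch of a PyTorch 2.0 model definition.
# The only requirement stated above is extending torch.nn.Module.
import torch


class AddSubNet(torch.nn.Module):
    """Toy module returning the element-wise sum and difference."""

    def forward(self, input0: torch.Tensor, input1: torch.Tensor):
        return input0 + input1, input0 - input1
```
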
### TorchScript Models

The model repository should look like:

```bash
model_repository/
`-- model_directory
    |-- 1
    |   `-- model.pt
    `-- config.pbtxt
```

The `model.pt` is the TorchScript model file.

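As a brief sketch of producing such a file (the module and the input
shape below are illustrative assumptions), a TorchScript `model.pt` can
be exported with `torch.jit.trace` or `torch.jit.script`:

```python
import torch


class MyModel(torch.nn.Module):
    """Illustrative stand-in; any torch.nn.Module works here."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(x)


# Trace the model with a representative input and save it as model.pt.
model = MyModel().eval()
example_input = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(model, example_input)
traced.save("model_repository/model_directory/1/model.pt")
```
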
### Customization

The following PyTorch settings may be customized by setting parameters in
`config.pbtxt`.

[`torch.set_num_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_threads.html#torch.set_num_threads)

* Key: `NUM_THREADS`
* Value: The number of threads used for intra-op parallelism on CPU.

[`torch.set_num_interop_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_interop_threads.html#torch.set_num_interop_threads)

* Key: `NUM_INTEROP_THREADS`
* Value: The number of threads used for inter-op parallelism (e.g. in the JIT interpreter) on CPU.

[`torch.compile()` parameters](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)

* Key: `TORCH_COMPILE_OPTIONAL_PARAMETERS`
* Value: Any of the following parameters, encoded as a JSON object.
  * `fullgraph` (`bool`): Whether it is OK to break the model into several subgraphs.
  * `dynamic` (`bool`): Use dynamic shape tracing.
  * `backend` (`str`): The backend to be used.
  * `mode` (`str`): Can be either `"default"`, `"reduce-overhead"`, or `"max-autotune"`.
  * `options` (`dict`): A dictionary of options to pass to the backend.
  * `disable` (`bool`): Turn `torch.compile()` into a no-op for testing.

For example:

```proto
parameters: {
  key: "NUM_THREADS"
  value: { string_value: "4" }
}
parameters: {
  key: "TORCH_COMPILE_OPTIONAL_PARAMETERS"
  value: { string_value: "{\"disable\": true}" }
}
```

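Conceptually (this is a sketch of the effect, not the backend's literal
internals), the configuration above applies settings roughly as follows,
where `model` is a placeholder for the served module:

```python
import torch

model = torch.nn.Identity()  # placeholder for the served model (assumption)

# NUM_THREADS: "4" -> intra-op CPU thread count.
torch.set_num_threads(4)

# TORCH_COMPILE_OPTIONAL_PARAMETERS: {"disable": true} -> compile as a no-op.
compiled_model = torch.compile(model, disable=True)
```
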
### Limitations

The following are a few known limitations of this feature:

* Python functions optimizable by `torch.compile` may not be served directly
  in the `model.py` file; they need to be enclosed by a class extending
  [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module),
  as sketched after this list.
* Model weights cannot be shared across multiple instances on the same GPU
  device.
* When using `KIND_MODEL` as the model instance kind, the default device of
  the first parameter of the model is used.

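As a minimal sketch of the first limitation (the function and class names
are illustrative), a free function is made servable by wrapping it in a
`torch.nn.Module` subclass:

```python
import torch


def scale_and_shift(x: torch.Tensor) -> torch.Tensor:
    # A plain function: torch.compile can optimize it, but it cannot be
    # served directly from model.py.
    return 2.0 * x + 1.0


class ScaleAndShift(torch.nn.Module):
    """Wrapping the function in a Module makes it servable."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return scale_and_shift(x)
```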