Commit da24e73

docs: Update README to reflect torch 2 support
Update the README file to remove the "experimental" tag from the documentation. The existence of the tag was an oversight, as Torch 2.x has been supported for 18+ months at this point.

Signed-off-by: J Wyman <[email protected]>
1 parent db70751 commit da24e73

File tree

1 file changed (+39, -78 lines)


README.md

Lines changed: 39 additions & 78 deletions
@@ -1,5 +1,5 @@
 <!--
-# Copyright 2020-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+# Copyright 2020-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 #
 # Redistribution and use in source and binary forms, with or without
 # modification, are permitted provided that the following conditions
@@ -81,8 +81,8 @@ Currently, Triton requires that a specially patched version of
 PyTorch be used with the PyTorch backend. The full source for
 these PyTorch versions are available as Docker images from
 [NGC](https://ngc.nvidia.com). For example, the PyTorch version
-compatible with the 22.12 release of Triton is available as
-nvcr.io/nvidia/pytorch:22.12-py3.
+compatible with the 25.09 release of Triton is available as
+nvcr.io/nvidia/pytorch:25.09-py3.

 Copy over the LibTorch and Torchvision headers and libraries from the
 [PyTorch NGC container](https://ngc.nvidia.com/catalog/containers/nvidia:pytorch)
@@ -306,54 +306,13 @@ instance in the
 to ensure that the model instance and the tensors used for inference are
 assigned to the same GPU device as on which the model was traced.

-# PyTorch 2.0 Backend \[Experimental\]
+## PyTorch 2.0

-> [!WARNING]
-> *This feature is subject to change and removal.*
-
-Starting from 24.01, PyTorch models can be served directly via
-[Python runtime](src/model.py). By default, Triton will use the
-[LibTorch runtime](#pytorch-libtorch-backend) for PyTorch models. To use Python
-runtime, provide the following
-[runtime setting](https://github.com/triton-inference-server/backend/blob/main/README.md#backend-shared-library)
-in the model configuration:
-
-```
-runtime: "model.py"
-```
-
-## Dependencies
-
-### Python backend dependency
-
-This feature depends on
-[Python backend](https://github.com/triton-inference-server/python_backend),
-see
-[Python-based Backends](https://github.com/triton-inference-server/backend/blob/main/docs/python_based_backends.md)
-for more details.
-
-### PyTorch dependency
-
-This feature will take advantage of the
-[`torch.compile`](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
-optimization, make sure the
-[PyTorch 2.0+ pip package](https://pypi.org/project/torch) is available in the
-same Python environment.
-
-Alternatively, a [Python Execution Environment](#using-custom-python-execution-environments)
-with the PyTorch dependency may be used. It can be created with the
-[provided script](tools/gen_pb_exec_env.sh). The resulting
-`pb_exec_env_model.py.tar.gz` file should be placed at the same
-[backend shared library](https://github.com/triton-inference-server/backend/blob/main/README.md#backend-shared-library)
-directory as the [Python runtime](src/model.py).
-
-## Model Layout
-
-### PyTorch 2.0 models
+### PyTorch 2.0 Models

 The model repository should look like:

-```
+```bash
 model_repository/
 `-- model_directory
     |-- 1
@@ -362,18 +321,18 @@ model_repository/
     `-- config.pbtxt
 ```

-The `model.py` contains the class definition of the PyTorch model. The class
-should extend the
+The `model.py` contains the class definition of the PyTorch model.
+The class should extend the
 [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
 The `model.pt` may be optionally provided which contains the saved
 [`state_dict`](https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-model-for-inference)
 of the model.

-### TorchScript models
+### TorchScript Models

 The model repository should look like:

-```
+```bash
 model_repository/
 `-- model_directory
     |-- 1
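
For orientation, the `model.py` described in this hunk is an ordinary `torch.nn.Module` subclass. A minimal sketch, assuming a hypothetical class name and layer (neither appears in the repository):

```python
# Hypothetical model.py for the PyTorch 2.0 model layout above.
# Class name and layer sizes are illustrative only.
import torch


class SimpleModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 2)  # any layers work here

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)
```

If the optional `model.pt` is present, it holds the saved `state_dict`, produced by something like `torch.save(SimpleModel().state_dict(), "model.pt")`.
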
@@ -383,49 +342,51 @@ model_repository/

 The `model.pt` is the TorchScript model file.

-## Customization
+### Customization

 The following PyTorch settings may be customized by setting parameters on the
 `config.pbtxt`.

 [`torch.set_num_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_threads.html#torch.set_num_threads)
-- Key: NUM_THREADS
-- Value: The number of threads used for intraop parallelism on CPU.
+
+* Key: `NUM_THREADS`
+* Value: The number of threads used for intra-op parallelism on CPU.

 [`torch.set_num_interop_threads(int)`](https://pytorch.org/docs/stable/generated/torch.set_num_interop_threads.html#torch.set_num_interop_threads)
-- Key: NUM_INTEROP_THREADS
-- Value: The number of threads used for interop parallelism (e.g. in JIT
-  interpreter) on CPU.
+
+* Key: `NUM_INTEROP_THREADS`
+* Value: The number of threads used for interop parallelism (e.g. in JIT interpreter) on CPU.

 [`torch.compile()` parameters](https://pytorch.org/docs/stable/generated/torch.compile.html#torch-compile)
-- Key: TORCH_COMPILE_OPTIONAL_PARAMETERS
-- Value: Any of following parameter(s) encoded as a JSON object.
-  - fullgraph (*bool*): Whether it is ok to break model into several subgraphs.
-  - dynamic (*bool*): Use dynamic shape tracing.
-  - backend (*str*): The backend to be used.
-  - mode (*str*): Can be either "default", "reduce-overhead" or "max-autotune".
-  - options (*dict*): A dictionary of options to pass to the backend.
-  - disable (*bool*): Turn `torch.compile()` into a no-op for testing.
+
+* Key: `TORCH_COMPILE_OPTIONAL_PARAMETERS`
+* Value: Any of following parameter(s) encoded as a JSON object.
+  * `fullgraph` (`bool`): Whether it is ok to break model into several subgraphs.
+  * `dynamic` (`bool`): Use dynamic shape tracing.
+  * `backend` (`str`): The backend to be used.
+  * `mode` (`str`): Can be either `"default"`, `"reduce-overhead"`, or `"max-autotune"`.
+  * `options` (`dict`): A dictionary of options to pass to the backend.
+  * `disable` (`bool`): Turn `torch.compile()` into a no-op for testing.

 For example:
-```
+
+```proto
 parameters: {
-  key: "NUM_THREADS"
-  value: { string_value: "4" }
+    key: "NUM_THREADS"
+    value: { string_value: "4" }
 }
 parameters: {
-  key: "TORCH_COMPILE_OPTIONAL_PARAMETERS"
-  value: { string_value: "{\"disable\": true}" }
+    key: "TORCH_COMPILE_OPTIONAL_PARAMETERS"
+    value: { string_value: "{\"disable\": true}" }
 }
 ```

-## Limitations
+### Limitations

 Following are few known limitations of this feature:
-- Python functions optimizable by `torch.compile` may not be served directly in
-  the `model.py` file, they need to be enclosed by a class extending the
-  [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
-- Model weights cannot be shared across multiple instances on the same GPU
-  device.
-- When using `KIND_MODEL` as model instance kind, the default device of the
-  first parameter on the model is used.
+
+* Python functions optimizable by `torch.compile` may not be served directly in the `model.py` file, they need to be enclosed by a class extending the
+  [`torch.nn.Module`](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module).
+* Model weights cannot be shared across multiple instances on the same GPU device.
+* When using `KIND_MODEL` as model instance kind, the default device of the first parameter on the model is used.
+
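For orientation, the customization keys in the final hunk map onto standard PyTorch calls. The following is a conceptual sketch, not the backend's actual code, of how the example `config.pbtxt` values would be applied:

```python
import json

import torch


class SimpleModel(torch.nn.Module):  # hypothetical stand-in model
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2


# NUM_THREADS with string_value "4" corresponds to:
torch.set_num_threads(4)

# TORCH_COMPILE_OPTIONAL_PARAMETERS is a JSON object whose fields are
# torch.compile() keyword arguments; the README's example decodes to
# {"disable": True}, which turns torch.compile() into a no-op.
params = json.loads('{"disable": true}')
compiled = torch.compile(SimpleModel(), **params)
print(compiled(torch.ones(3)))  # tensor([2., 2., 2.])
```
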
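Likewise, the first limitation (a bare Python function cannot be served from `model.py`) can be addressed by enclosing the function in a class extending `torch.nn.Module`, as the README itself indicates. A sketch with hypothetical names:

```python
import torch


def scale_and_shift(x: torch.Tensor) -> torch.Tensor:
    # A free function like this cannot be served from model.py directly.
    return 2.0 * x + 1.0


class ScaleAndShift(torch.nn.Module):
    # Enclosing the function in a torch.nn.Module subclass satisfies the
    # model layout described in the README.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return scale_and_shift(x)
```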