docsrc/user_guide/runtime.rst: 130 additions & 6 deletions
@@ -24,7 +24,7 @@ programs just as you would otherwise via PyTorch API.

 .. note:: If you are linking ``libtorchtrt_runtime.so``, the following flags will likely help: ``-Wl,--no-as-needed -ltorchtrt -Wl,--as-needed``, since most Torch-TensorRT runtime applications have no direct symbol dependency on anything in the Torch-TensorRT runtime

-An example of how to use ``libtorchtrt_runtime.so`` can be found here: https://github.com/pytorch/TensorRT/tree/master/examples/torchtrt_runtime_example
+An example of how to use ``libtorchtrt_runtime.so`` can be found here: https://github.com/pytorch/TensorRT/tree/master/examples/torchtrt_aoti_example

 Plugin Library
 ---------------
@@ -87,8 +87,8 @@ Cudagraphs can accelerate certain models by reducing kernel overheads, as docume

     with torch_tensorrt.runtime.enable_cudagraphs(trt_module):
         ...

-In the current implementation, use of a new input shape (for instance in dynamic shape
-cases), will cause the cudagraph to be re-recorded. Cudagraph recording is generally
+In the current implementation, use of a new input shape (for instance in dynamic shape
+cases) will cause the cudagraph to be re-recorded. Cudagraph recording is generally
 not latency intensive, and future improvements include caching cudagraphs for multiple input shapes.

 Dynamic Output Allocation Mode
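As a rough illustration of the cudagraphs behavior described in this hunk, a minimal end-to-end sketch might look like the following. It assumes a hypothetical ``MyModel``, compilation through the dynamo frontend, and that ``enable_cudagraphs`` yields a wrapped module when used with ``as``, as in recent Torch-TensorRT examples; check your installed version's API.

.. code-block:: python

    import torch
    import torch_tensorrt

    model = MyModel().eval().cuda()  # hypothetical user model
    inputs = [torch.randn((1, 3, 224, 224)).cuda()]

    # Compile with the dynamo frontend
    trt_module = torch_tensorrt.compile(model, ir="dynamo", arg_inputs=inputs)

    with torch_tensorrt.runtime.enable_cudagraphs(trt_module) as cudagraphs_module:
        cudagraphs_module(*inputs)   # first call records the CUDA graph
        cudagraphs_module(*inputs)   # subsequent calls with the same shape replay it
        # With a dynamic-shape engine, a call with a new input shape would
        # trigger re-recording of the cudagraph, as noted above.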
@@ -101,11 +101,11 @@ Without dynamic output allocation, the output buffer is allocated based on the i
101
101
102
102
There are two scenarios in which dynamic output allocation is enabled:
103
103
104
-
1. The model has been identified at compile time to require dynamic output allocation for at least one TensorRT subgraph.
105
-
These models will engage the runtime mode automatically (with logging) and are incompatible with other runtime modes
104
+
1. The model has been identified at compile time to require dynamic output allocation for at least one TensorRT subgraph.
105
+
These models will engage the runtime mode automatically (with logging) and are incompatible with other runtime modes
106
106
such as CUDA Graphs.
107
107
108
-
Converters can declare that subgraphs that they produce will require the output allocator using `requires_output_allocator=True`
108
+
Converters can declare that subgraphs that they produce will require the output allocator using `requires_output_allocator=True`
109
109
there by forcing any model which utilizes the converter to automatically use the output allocator runtime mode. e.g.,
110
110
111
111
.. code-block:: python
@@ -131,3 +131,127 @@ thereby forcing any model which utilizes the converter to automatically use the

     # Enables Dynamic Output Allocation Mode, then resets the mode to its prior setting
     with torch_tensorrt.runtime.enable_output_allocator(trt_module):
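The surrounding code block is truncated in this view. As a sketch of the user-facing side of dynamic output allocation, assuming a hypothetical ``MyModel`` (for instance one containing data-dependent operators) compiled with the dynamo frontend, usage might look like:

.. code-block:: python

    import torch
    import torch_tensorrt

    model = MyModel().eval().cuda()  # hypothetical model with data-dependent ops
    inputs = [torch.randn((1, 3, 224, 224)).cuda()]
    trt_module = torch_tensorrt.compile(model, ir="dynamo", arg_inputs=inputs)

    # Enables Dynamic Output Allocation Mode, then resets the mode to its prior setting
    with torch_tensorrt.runtime.enable_output_allocator(trt_module):
        out = trt_module(*inputs)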
@@ -73,7 +74,7 @@ For `ir=ts`, this behavior stays the same in 2.X versions as well.

     model = MyModel().eval().cuda()
     inputs = [torch.randn((1, 3, 224, 224)).cuda()]
-    trt_ts = torch_tensorrt.compile(model, ir="ts", inputs=inputs) # Output is a ScriptModule object
+    trt_ts = torch_tensorrt.compile(model, ir="ts", arg_inputs=inputs) # Output is a ScriptModule object
     torch.jit.save(trt_ts, "trt_model.ts")

     # Later, you can load it and run inference
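The loading half of this example is cut off by the hunk boundary above. A minimal sketch of what follows, assuming the model was saved to ``trt_model.ts`` as shown, is:

.. code-block:: python

    import torch

    inputs = [torch.randn((1, 3, 224, 224)).cuda()]
    trt_ts = torch.jit.load("trt_model.ts").cuda()  # load the serialized ScriptModule
    outputs = trt_ts(*inputs)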
@@ -98,3 +99,26 @@ Here's an example usage

     inputs = [torch.randn((1, 3, 224, 224)).cuda()]
     model = torch_tensorrt.load(<file_path>).module()
     model(*inputs)
+
+b) PT2 Format
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+PT2 is a new format that allows models to be run outside of Python in the future. It utilizes `AOTInductor <https://docs.pytorch.org/docs/main/torch.compiler_aot_inductor.html>`_
+to generate kernels for components that will not be run in TensorRT.
+
+Here's an example of how to save and load a Torch-TensorRT module using AOTInductor in Python
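The example itself is truncated in this hunk. A hedged sketch of what such a save/load flow could look like follows; the ``output_format="aot_inductor"`` flag, the ``retrace`` argument, and the ``torch._inductor.aoti_load_package`` loader are assumptions based on recent PyTorch/Torch-TensorRT AOTInductor workflows, not text taken from this diff.

.. code-block:: python

    import torch
    import torch_tensorrt

    model = MyModel().eval().cuda()  # hypothetical model
    inputs = [torch.randn((1, 3, 224, 224)).cuda()]
    trt_gm = torch_tensorrt.compile(model, ir="dynamo", arg_inputs=inputs)

    # Assumption: package the compiled module as a PT2 archive via AOTInductor
    torch_tensorrt.save(
        trt_gm,
        "trt_model.pt2",
        output_format="aot_inductor",  # assumed flag for the PT2/AOTInductor format
        retrace=True,
        arg_inputs=inputs,
    )

    # Later: load the PT2 package and run inference (assumed loader API)
    loaded = torch._inductor.aoti_load_package("trt_model.pt2")
    loaded(*inputs)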