
Commit 4d0335d

chore: add dynamic shapes section in the resnet tutorial (#2904)
Authored by peri044 and zewenli98
Co-authored-by: Evan Li <[email protected]>
1 parent 8cc57cb commit 4d0335d

3 files changed: +86 additions, -163 deletions

docsrc/user_guide/dynamic_shapes.rst

Lines changed: 40 additions & 145 deletions
@@ -5,6 +5,10 @@ Dynamic shapes with Torch-TensorRT

By default, you can run a PyTorch model with varied input shapes, and the output shapes are determined eagerly.
However, Torch-TensorRT is an AOT compiler that requires some prior information about the input shapes to compile and optimize the model.
+
+Dynamic shapes using torch.export (AOT)
+----------------------------------------
+
In the case of dynamic input shapes, we must provide the ``(min_shape, opt_shape, max_shape)`` arguments so that the model can be optimized for
this range of input shapes. An example usage of static and dynamic shapes is as follows.
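The concrete example is elided from this hunk; a minimal sketch of static vs. dynamic ``torch_tensorrt.Input`` usage, assuming a hypothetical ``MyModel`` module, could look like:

.. code-block:: python

    import torch
    import torch_tensorrt

    model = MyModel().eval().cuda()  # hypothetical module

    # Static shape: the engine is built for exactly one input size
    static_inputs = [torch_tensorrt.Input(shape=(8, 3, 224, 224), dtype=torch.float32)]

    # Dynamic shape: the engine is optimized for opt_shape but accepts
    # any size in the [min_shape, max_shape] range
    dynamic_inputs = [
        torch_tensorrt.Input(
            min_shape=(1, 3, 224, 224),
            opt_shape=(8, 3, 224, 224),
            max_shape=(16, 3, 224, 224),
            dtype=torch.float32,
        )
    ]
    trt_model = torch_tensorrt.compile(model, ir="dynamo", inputs=dynamic_inputs)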

@@ -30,168 +34,57 @@ Under the hood

There are two phases of compilation when we use the ``torch_tensorrt.compile`` API with ``ir=dynamo`` (default).

-- aten_tracer.trace (which uses torch.export to trace the graph with the given inputs)
+- torch_tensorrt.dynamo.trace (which uses torch.export to trace the graph with the given inputs)

-In the tracing phase, we use torch.export along with the constraints. In the case of
-dynamic shaped inputs, the range can be provided to the tracing via constraints. Please
-refer to this `docstring <https://github.com/pytorch/pytorch/blob/5dcee01c2b89f6bedeef9dd043fd8d6728286582/torch/export/__init__.py#L372-L434>`_
-for detailed information on how to set constraints. In short, we create new inputs for
-torch.export tracing and provide constraints on the min and max values(provided by the user), a particular dimension can take.
-Please take a look at ``aten_tracer.py`` file to understand how this works under the hood.
+We use the ``torch.export.export()`` API to trace and export a PyTorch module into a ``torch.export.ExportedProgram``. In the case of
+dynamic shaped inputs, the ``(min_shape, opt_shape, max_shape)`` range provided via the ``torch_tensorrt.Input`` API is used to construct ``torch.export.Dim`` objects,
+which are used in the ``dynamic_shapes`` argument of the export API.
+Please take a look at the ``_tracer.py`` file to understand how this works under the hood.

-- dynamo.compile (which compiles a torch.fx.GraphModule object using TensorRT)
+- torch_tensorrt.dynamo.compile (which compiles a torch.export.ExportedProgram object using TensorRT)

-In the conversion to TensorRT, we use the user provided dynamic shape inputs.
-We perform shape analysis using dummy inputs (across min, opt and max shapes) and store the
-intermediate output shapes which can be used in case the graph has a mix of Pytorch
-and TensorRT submodules.
+In the conversion to TensorRT, the graph already has the dynamic shape information in the nodes' metadata, which is used during the engine building phase.
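As a rough sketch of what the tracing phase does with that range (the helper below is hypothetical, for illustration only, and is not the actual ``_tracer.py`` code):

.. code-block:: python

    import torch

    # Hypothetical helper: turn a (min, opt, max) range into the
    # dynamic_shapes spec that torch.export.export() expects.
    def dynamic_shapes_from_range(min_shape, opt_shape, max_shape):
        spec = {}
        for dim, (lo, hi) in enumerate(zip(min_shape, max_shape)):
            if lo != hi:  # only dimensions that actually vary become Dim objects
                spec[dim] = torch.export.Dim(f"dim{dim}", min=lo, max=hi)
        return spec

    # A batch dimension ranging over [1, 8] maps to a single Dim object
    spec = dynamic_shapes_from_range((1, 3, 224, 224), (4, 3, 224, 224), (8, 3, 224, 224))
    # spec == {0: Dim("dim0", min=1, max=8)}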

-Custom Constraints
-------------------
+Custom Dynamic Shape Constraints
+---------------------------------

Given an input ``x = torch_tensorrt.Input(min_shape, opt_shape, max_shape, dtype)``,
-Torch-TensorRT automatically sets the constraints during ``torch.export`` tracing as follows
-
-.. code-block:: python
-
-    for dim in constraint_dims:
-        if min_shape[dim] > 1:
-            constraints.append(min_shape[dim] <= dynamic_dim(trace_input, dim))
-        if max_shape[dim] > 1:
-            constraints.append(dynamic_dim(trace_input, dim) <= max_shape[dim])
-
-Sometimes, we might need to set additional constraints and Torchdynamo errors out if we don't specify them.
-For example, in the case of BERT model compilation, there are two inputs and a constraint has to be set involving the sequence length size of these two inputs.
-
-.. code-block:: python
-
-    constraints.append(dynamic_dim(trace_inputs[0], 0) == dynamic_dim(trace_inputs[1], 0))
-
-
-If you have to provide any custom constraints to your model, the overall workflow for model compilation using ``ir=dynamo`` would involve a few steps.
-
-.. code-block:: python
-
-    import torch
-    import torch_tensorrt
-    from torch_tensorrt.dynamo.lowering import apply_lowering_passes, get_decompositions
-
-    # Assume the model has two inputs
-    model = MyModel()
-    torch_input_1 = torch.randn((1, 14), dtype=torch.int32).cuda()
-    torch_input_2 = torch.randn((1, 14), dtype=torch.int32).cuda()
-
-    dynamic_inputs = [torch_tensorrt.Input(min_shape=[1, 14],
-                                           opt_shape=[4, 14],
-                                           max_shape=[8, 14],
-                                           dtype=torch.int32),
-                      torch_tensorrt.Input(min_shape=[1, 14],
-                                           opt_shape=[4, 14],
-                                           max_shape=[8, 14],
-                                           dtype=torch.int32)]
-
-    # Export the model with additional constraints
-    constraints = []
-    # The following constraints are automatically added by Torch-TensorRT in the
-    # general case when you call torch_tensorrt.compile directly on MyModel()
-    constraints.append(dynamic_dim(torch_input_1, 0) < 8)
-    constraints.append(dynamic_dim(torch_input_2, 0) < 8)
-    # This is an additional constraint as instructed by Torchdynamo
-    constraints.append(dynamic_dim(torch_input_1, 0) == dynamic_dim(torch_input_2, 0))
-    with unittest.mock.patch(
-        "torch._export.DECOMP_TABLE", get_decompositions(experimental_decompositions)
-    ):
-        graph_module = export(
-            model, (torch_input_1, torch_input_2), constraints=constraints
-        ).module()
-
-    # Use the dynamo.compile API
-    trt_mod = torch_tensorrt.dynamo.compile(graph_module, inputs=dynamic_inputs, **compile_spec)
-
-Limitations
------------
-
-If there are operations in the graph that use the dynamic dimension of the input, Pytorch
-introduces ``torch.ops.aten.sym_size.int`` ops in the graph. Currently, we cannot handle these operators and
-the compilation results in undefined behavior. We plan to add support for these operators and implement
-robust support for shape tensors in the next release. Here is an example of the limitation described above
+Torch-TensorRT attempts to automatically set the constraints during ``torch.export`` tracing by constructing
+``torch.export.Dim`` objects with the provided dynamic dimensions accordingly. Sometimes, we might need to set additional constraints, and Torchdynamo errors out if we don't specify them.
+If you have to set any custom constraints for your model (by using ``torch.export.Dim``), we recommend exporting your program first before compiling with Torch-TensorRT.
+Please refer to this `documentation <https://pytorch.org/tutorials/intermediate/torch_export_tutorial.html#constraints-dynamic-shapes>`_ on exporting a PyTorch module with dynamic shapes.
+Here's a simple example that exports a matmul layer with some restrictions on its dynamic dimensions.

.. code-block:: python

    import torch
    import torch_tensorrt

-    class MyModule(torch.nn.Module):
+    class MatMul(torch.nn.Module):
        def __init__(self):
            super().__init__()
-            self.avgpool = torch.nn.AdaptiveAvgPool2d((1, 1))
-
-        def forward(self, x):
-            x = self.avgpool(x)
-            out = torch.flatten(x, 1)
-            return out

-    model = MyModel().eval().cuda()
-    # Compile with dynamic shapes
-    inputs = torch_tensorrt.Input(min_shape=(1, 512, 1, 1),
-                                  opt_shape=(4, 512, 1, 1),
-                                  max_shape=(8, 512, 1, 1),
-                                  dtype=torch.float32)
-    trt_gm = torch_tensorrt.compile(model, ir="dynamo", inputs)
-
-
-The traced graph of `MyModule()` looks as follows
-
-.. code-block:: python
-
-    Post export graph: graph():
-        %arg0_1 : [num_users=2] = placeholder[target=arg0_1]
-        %mean : [num_users=1] = call_function[target=torch.ops.aten.mean.dim](args = (%arg0_1, [-1, -2], True), kwargs = {})
-        %sym_size : [num_users=1] = call_function[target=torch.ops.aten.sym_size.int](args = (%arg0_1, 0), kwargs = {})
-        %view : [num_users=1] = call_function[target=torch.ops.aten.view.default](args = (%mean, [%sym_size, 512]), kwargs = {})
-        return (view,)
-
-
-Here the ``%sym_size`` node captures the dynamic batch and uses it in the ``aten.view`` layer. This requires shape tensors support
-which would be a part of our next release.
-
-Workaround (BERT static compilation example)
-------------------------------------------
-
-In the case where you encounter the issues mentioned in the **Limitations** section,
-you can compile the model (static mode) with max input size that can be provided. In the cases of smaller inputs,
-we can pad them accordingly. This is only a workaround until we address the limitations.
-
-.. code-block:: python
-
-    import torch
-    import torch_tensorrt
-    from transformers.utils.fx import symbolic_trace as transformers_trace
-
-    model = BertModel.from_pretrained("bert-base-uncased").cuda().eval()
-
-    # Input sequence length is 20.
-    input1 = torch.randint(0, 5, (1, 20), dtype=torch.int32).to("cuda")
-    input2 = torch.randint(0, 5, (1, 20), dtype=torch.int32).to("cuda")
-
-    model = transformers_trace(model, input_names=["input_ids", "attention_mask"]).eval().cuda()
-    trt_mod = torch_tensorrt.compile(model, inputs=[input1, input2], **compile_spec)
-    model_outputs = model(input, input2)
-
-    # If you have a sequence of length 14, pad 6 zero tokens and run inference
-    # or recompile for sequence length of 14.
-    input1 = torch.randint(0, 5, (1, 14), dtype=torch.int32).to("cuda")
-    input2 = torch.randint(0, 5, (1, 14), dtype=torch.int32).to("cuda")
-    trt_mod = torch_tensorrt.compile(model, inputs=[input1, input2], **compile_spec)
-    model_outputs = model(input, input2)
+        def forward(self, query, key):
+            attn_weight = torch.matmul(query, key.transpose(-1, -2))
+            return attn_weight
+
+    model = MatMul().eval().cuda()
+    inputs = [torch.randn(1, 12, 7, 64).cuda(), torch.randn(1, 12, 7, 64).cuda()]
+    seq_len = torch.export.Dim("seq_len", min=1, max=10)
+    dynamic_shapes = ({2: seq_len}, {2: seq_len})
+    # Export the model first with custom dynamic shape constraints
+    exp_program = torch.export.export(model, tuple(inputs), dynamic_shapes=dynamic_shapes)
+    trt_gm = torch_tensorrt.dynamo.compile(exp_program, inputs)
+    # Run inference
+    trt_gm(*inputs)
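Once compiled this way, any sequence length within the declared ``[1, 10]`` range should be served without rebuilding the engine; a minimal follow-up sketch:

.. code-block:: python

    # A different sequence length inside the declared range reuses the same engine
    inputs_seq9 = [torch.randn(1, 12, 9, 64).cuda(), torch.randn(1, 12, 9, 64).cuda()]
    trt_gm(*inputs_seq9)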


-Dynamic shapes with ir=torch_compile
+Dynamic shapes using torch.compile (JIT)
----------------------------------------

``torch_tensorrt.compile(model, inputs, ir="torch_compile")`` returns a torch.compile boxed function with the backend
-configured to Tensorrt. In the case of ``ir=torch_compile``, when the input size changes, Dynamo will trigger a recompilation
-of the TensorRT engine automatically giving dynamic shape behavior similar to native PyTorch eager however with the cost of rebuilding
-TRT engine. This limitation will be addressed in future versions of Torch-TensorRT.
+configured to TensorRT. In the case of ``ir=torch_compile``, users can provide dynamic shape information for the inputs using the ``torch._dynamo.mark_dynamic`` API (https://pytorch.org/docs/stable/torch.compiler_dynamic_shapes.html)
+to avoid recompilation of TensorRT engines.

.. code-block:: python

@@ -200,10 +93,12 @@ TRT engine. This limitation will be addressed in future versions of Torch-Tensor

    model = MyModel().eval().cuda()
    inputs = torch.randn((1, 3, 224, 224), dtype=torch.float32)
-    trt_gm = torch_tensorrt.compile(model, ir="torch_compile", inputs)
+    # This indicates that dimension 0 is dynamic and its range is [1, 8]
+    torch._dynamo.mark_dynamic(inputs, 0, min=1, max=8)
+    trt_gm = torch.compile(model, backend="tensorrt")
    # Compilation happens when you call the model
    trt_gm(inputs)

-    # Recompilation happens with modified batch size
+    # No recompilation of TRT engines with a modified batch size
    inputs_bs2 = torch.randn((2, 3, 224, 224), dtype=torch.float32)
-    trt_gm = torch_tensorrt.compile(model, ir="torch_compile", inputs_bs2)
+    trt_gm(inputs_bs2)
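Since the doc notes that ``torch_tensorrt.compile(..., ir="torch_compile")`` returns a torch.compile boxed function with the TensorRT backend, an equivalent spelling of the example above (a sketch, same ``model`` and ``inputs`` assumptions) is:

.. code-block:: python

    # Equivalent invocation through the torch_tensorrt front end
    trt_gm = torch_tensorrt.compile(model, ir="torch_compile", inputs=[inputs])
    trt_gm(inputs)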

examples/dynamo/torch_compile_resnet_example.py

Lines changed: 41 additions & 15 deletions
@@ -75,19 +75,45 @@
new_batch_size_outputs = optimized_model(*new_batch_size_inputs)

# %%
-# Cleanup
-# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-# Finally, we use Torch utilities to clean up the workspace
-torch._dynamo.reset()
+# Avoid recompilation by specifying dynamic shapes before Torch-TRT compilation
+# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-# %%
-# Cuda Driver Error Note
-# ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-#
-# Occasionally, upon exiting the Python runtime after Dynamo compilation with `torch_tensorrt`,
-# one may encounter a Cuda Driver Error. This issue is related to https://github.com/NVIDIA/TensorRT/issues/2052
-# and can be resolved by wrapping the compilation/inference in a function and using a scoped call, as in::
-#
-#     if __name__ == '__main__':
-#         compile_engine_and_infer()
+# The following code illustrates the workflow using ir=torch_compile (which uses torch.compile under the hood)
+inputs_bs8 = torch.randn((8, 3, 224, 224)).half().to("cuda")
+# This indicates that dimension 0 of inputs_bs8 is dynamic and its range of values is [2, 16]
+torch._dynamo.mark_dynamic(inputs_bs8, 0, min=2, max=16)
+optimized_model = torch_tensorrt.compile(
+    model,
+    ir="torch_compile",
+    inputs=inputs_bs8,
+    enabled_precisions=enabled_precisions,
+    debug=debug,
+    workspace_size=workspace_size,
+    min_block_size=min_block_size,
+    torch_executed_ops=torch_executed_ops,
+)
+outputs_bs8 = optimized_model(inputs_bs8)
+
+# No recompilation happens for batch size = 12
+inputs_bs12 = torch.randn((12, 3, 224, 224)).half().to("cuda")
+outputs_bs12 = optimized_model(inputs_bs12)
+
+# The following code illustrates the workflow using ir=dynamo (which uses torch.export APIs under the hood)
+# Dynamic shapes for the inputs are specified using the torch_tensorrt.Input API
+compile_spec = {
+    "inputs": [
+        torch_tensorrt.Input(
+            min_shape=(1, 3, 224, 224),
+            opt_shape=(8, 3, 224, 224),
+            max_shape=(16, 3, 224, 224),
+            dtype=torch.half,
+        )
+    ],
+    "enabled_precisions": enabled_precisions,
+    "ir": "dynamo",
+}
+trt_model = torch_tensorrt.compile(model, **compile_spec)
+
+# No recompilation happens for batch size = 12
+inputs_bs12 = torch.randn((12, 3, 224, 224)).half().to("cuda")
+outputs_bs12 = trt_model(inputs_bs12)
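# A sanity check (a sketch, under the same compile_spec assumptions as above):
# any batch size within the declared [1, 16] range reuses the engine built above.
inputs_bs16 = torch.randn((16, 3, 224, 224)).half().to("cuda")
outputs_bs16 = trt_model(inputs_bs16)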

examples/dynamo/vgg16_fp8_ptq.py

Lines changed: 5 additions & 3 deletions
@@ -1,10 +1,10 @@
"""
.. _vgg16_fp8_ptq:

-Torch Compile VGG16 with FP8 and PTQ
+Deploy Quantized Models using Torch-TensorRT
======================================================

-This script is intended as a sample of the Torch-TensorRT workflow with `torch.compile` on a VGG16 model with FP8 and PTQ.
+Here we demonstrate how to deploy a model quantized to FP8 using the Dynamo frontend of Torch-TensorRT.
"""

# %%
@@ -100,7 +100,7 @@ def vgg16(num_classes=1000, init_weights=False):


PARSER = argparse.ArgumentParser(
-    description="Load pre-trained VGG model and then tune with FP8 and PTQ"
+    description="Load pre-trained VGG model and then tune with FP8 and PTQ. To obtain a pre-trained VGG model, please refer to https://github.com/pytorch/TensorRT/tree/main/examples/int8/training/vgg16"
)
PARSER.add_argument(
    "--ckpt", type=str, required=True, help="Path to the pre-trained checkpoint"
@@ -226,6 +226,8 @@ def calibrate_loop(model):
    min_block_size=1,
    debug=False,
)
+# You can also use the torch.compile path to compile the model with Torch-TensorRT:
+# trt_model = torch.compile(model, backend="tensorrt")

# Run inference with the compiled Torch-TensorRT model over the testing dataset
total = 0
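# For context, an abridged sketch of the PTQ calibration flow this example follows
# (assuming NVIDIA's modelopt package and a calibration dataloader, both set up in
# the full script; names here may differ slightly from the upstream file):
import modelopt.torch.quantization as mtq

def calibrate_loop(model):
    # Feed calibration batches through the model so the quantizer observers
    # can record activation ranges
    for data, _ in calib_dataloader:  # hypothetical calibration dataloader
        model(data.cuda())

model = mtq.quantize(model, mtq.FP8_DEFAULT_CFG, forward_loop=calibrate_loop)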
