In a previous `tutorial <https://pytorch.org/tutorials/intermediate/torch_export_tutorial.html>`__, we learned how to use `torch.export <https://pytorch.org/docs/stable/export.html>`__.
This tutorial expands on the previous one and explores the process of exporting popular models with code, and addresses common challenges that may arise with ``torch.export``.

In this tutorial, you will learn how to export models for these use cases:

* Video Classifier (MViT)
* Pose Estimation (YOLO11 Pose)
* Image Captioning (BLIP)
* Promptable Image Segmentation (SAM2)

Each of the four models was chosen to demonstrate unique features of ``torch.export``, as well as some practical considerations
and issues faced in the implementation.

Prerequisites
-------------

* Basic understanding of ``torch.export`` and PyTorch Eager inference.

Key requirement for ``torch.export``: No graph break
----------------------------------------------------

`torch.compile <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`__ speeds up PyTorch code by JIT-compiling it into optimized kernels. It optimizes the given model
using ``TorchDynamo`` and creates an optimized graph, which is then lowered into the hardware using the backend specified in the API.

When TorchDynamo encounters unsupported Python features, it breaks the computation graph, lets the default Python interpreter
handle the unsupported code, and then resumes capturing the graph. This break in the computation graph is called a `graph break <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html#torchdynamo-and-fx-graphs>`__.

One of the key differences between ``torch.export`` and ``torch.compile`` is that ``torch.export`` doesn't support graph breaks,
which means that the entire model or the part of the model that you are exporting needs to be a single graph. This is because handling graph breaks
involves interpreting the unsupported operation with default Python evaluation, which is incompatible with what ``torch.export`` is
designed for.

You can identify graph breaks in your program by using the following command:

.. code:: console
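
    TORCH_LOGS="graph_breaks" python <file_name>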

The models in this recipe have no graph break, but fail with ``torch.export``.

Video Classification
--------------------

MViT is a class of models based on `MultiScale Vision Transformers <https://arxiv.org/abs/2104.11227>`__. This model has been trained for video classification using the `Kinetics-400 Dataset <https://arxiv.org/abs/1705.06950>`__.
This model, combined with a relevant dataset, can be used for action recognition in the context of gaming.

The code below exports MViT by tracing with ``batch_size=2`` and then checks whether the ``ExportedProgram`` can run with ``batch_size=4``; the listing is a minimal sketch of this flow.

.. code:: python
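
    # Minimal sketch: the MViT variant and input layout are illustrative.
    import torch
    from torchvision.models.video import MViT_V2_S_Weights, mvit_v2_s

    model = mvit_v2_s(weights=MViT_V2_S_Weights.DEFAULT)
    model.eval()

    # A batch of 2 clips, each with 16 frames of size 224x224 (B, C, T, H, W).
    input_frames = torch.randn(2, 3, 16, 224, 224)

    # Trace the model with batch_size=2.
    exported_program = torch.export.export(model, (input_frames,))

    # Running the ExportedProgram with batch_size=4 fails, because the traced
    # graph specialized on the static batch size seen during export.
    input_frames = torch.randn(4, 3, 16, 224, 224)
    try:
        exported_program.module()(input_frames)
    except Exception:
        import traceback
        traceback.print_exc()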

Error: Static batch size
~~~~~~~~~~~~~~~~~~~~~~~~

By default, the exporting flow will trace the program assuming that all input shapes are static, so if you run the program with
98
-
inputs shapes that are different than the ones you used while tracing, you will run into an error.
98
+
input shapes that are different than the ones you used while tracing, you will run into an error.
99
99
100
100
Solution
~~~~~~~~

To address the error, we specify the first dimension of the input (``batch_size``) to be dynamic and give the expected range of ``batch_size``.
In the corrected example shown below, we specify that the expected ``batch_size`` can range from 1 to 16.
One detail to notice is that ``min=2`` is not a bug and is explained in `The 0/1 Specialization Problem <https://docs.google.com/document/d/16VPOa3d-Liikf48teAOmxLc92rgvJdfosIy-yoT38Io/edit?fbclid=IwAR3HNwmmexcitV0pbZm_x1a4ykdXZ9th_eJWK-3hBtVgKnrkmemz6Pm5jRQ#heading=h.ez923tomjvyk>`__. A detailed description of dynamic shapes
for ``torch.export`` can be found in the export tutorial. The code shown below, a minimal sketch, demonstrates how to export MViT with dynamic batch sizes:

.. code:: python
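
    # Minimal sketch: continues the MViT example above.
    import torch
    from torchvision.models.video import MViT_V2_S_Weights, mvit_v2_s

    model = mvit_v2_s(weights=MViT_V2_S_Weights.DEFAULT)
    model.eval()

    input_frames = torch.randn(2, 3, 16, 224, 224)

    # Mark the batch dimension as dynamic; min=2 because of 0/1 specialization.
    batch = torch.export.Dim("batch", min=2, max=16)
    exported_program = torch.export.export(
        model, (input_frames,), dynamic_shapes=({0: batch},)
    )

    # The exported program now runs with a different batch size.
    input_frames = torch.randn(4, 3, 16, 224, 224)
    exported_program.module()(input_frames)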

Pose Estimation
---------------

**Pose Estimation** is a Computer Vision concept that can be used to identify the location of the joints of a human in a 2D image.
`Ultralytics <https://docs.ultralytics.com/tasks/pose/>`__ has published a Pose Estimation model based on `YOLO11 <https://docs.ultralytics.com/models/yolo11/>`__. This model has been trained on the `COCO Dataset <https://cocodataset.org/#keypoints-2017>`__ and can be used
for analyzing human pose to determine action or intent. The code below, a minimal sketch, tries to export the YOLO11 Pose model with ``batch_size=1``.
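
.. code:: python

    # Minimal sketch: assumes the ultralytics package is installed and the
    # yolo11n-pose checkpoint is available (YOLO() downloads it if needed).
    import torch
    from ultralytics import YOLO

    pose_model = YOLO("yolo11n-pose.pt")  # wrapper around a torch.nn.Module
    pt_model = pose_model.model
    pt_model.eval()

    example_input = torch.randn(1, 3, 640, 640)

    # The default strict mode traces with TorchDynamo and fails on this model.
    exported_program = torch.export.export(pt_model, (example_input,))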

Error: strict tracing with TorchDynamo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. code:: console

    torch._dynamo.exc.InternalTorchDynamoError: PendingUnbackedSymbolNotFound: Pending unbacked symbols {zuf0} not in returned outputs FakeTensor(..., size=(6400, 1)) ((1, 1), 0).

By default, ``torch.export`` traces your code using `TorchDynamo <https://pytorch.org/docs/stable/torch.compiler_dynamo_overview.html>`__, a byte-code analysis engine, which symbolically analyzes your code and builds a graph.
This analysis provides a stronger guarantee about safety, but not all Python code is supported. When we export the ``yolo11n-pose`` model using the
default strict mode, it errors.

Solution
~~~~~~~~

To address the above error, ``torch.export`` supports ``non_strict`` mode, where the program is traced using the Python interpreter, which works similarly to
PyTorch eager execution. The only difference is that all ``Tensor`` objects will be replaced by ``ProxyTensors``, which will record all their operations into
a graph. By using ``strict=False``, we are able to export the program, as in the minimal sketch below.

.. code:: python
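
    # Minimal sketch: same setup as above, now with strict=False.
    import torch
    from ultralytics import YOLO

    pose_model = YOLO("yolo11n-pose.pt")
    pt_model = pose_model.model
    pt_model.eval()

    example_input = torch.randn(1, 3, 640, 640)

    # Non-strict mode traces with the Python interpreter instead of TorchDynamo.
    exported_program = torch.export.export(pt_model, (example_input,), strict=False)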

Image Captioning
----------------

**Image Captioning** is the task of defining the contents of an image in words. In the context of gaming, Image Captioning can be used to enhance the
gameplay experience by dynamically generating text descriptions of the various game objects in the scene, thereby providing the gamer with additional
details. `BLIP <https://arxiv.org/pdf/2201.12086>`__ is a popular model for Image Captioning `released by SalesForce Research <https://github.com/salesforce/BLIP>`__. The code below, a minimal sketch, tries to export BLIP with ``batch_size=1``.

.. code:: python
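
    # Minimal sketch: assumes the SalesForce BLIP repository is on the Python
    # path and a base captioning checkpoint is available; names and the
    # checkpoint path are illustrative.
    import torch
    from models.blip import blip_decoder

    image_size = 384
    image = torch.randn(1, 3, image_size, image_size)
    caption_input = ""

    model = blip_decoder(
        pretrained="model_base_capfilt_large.pth", image_size=image_size, vit="base"
    )
    model.eval()

    # This export fails on Python operations in the model code that
    # torch.export does not yet support.
    exported_program = torch.export.export(
        model, args=(image, caption_input), strict=False
    )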

Error: Unsupported Python Operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While exporting a model, it might fail because the model implementation contains certain Python operations which are not yet supported by ``torch.export``.
Some of these failures may have a workaround. BLIP is an example where the original model errors, which can be resolved by making a small change in the code.
``torch.export`` lists the common cases of supported and unsupported operations in `ExportDB <https://pytorch.org/docs/main/generated/exportdb/index.html>`__ and shows how you can modify your code to make it export compatible.

Solution
~~~~~~~~

Clone the `tensor <https://github.com/salesforce/BLIP/blob/main/models/blip.py>`__ involved in the failing operation to make the model export compatible. A hypothetical sketch of this kind of fix (the names are illustrative, not the actual BLIP code):
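
.. code:: python

    # Hypothetical illustration: cloning a tensor before an in-place write
    # avoids mutating a tensor that tracing has captured.
    import torch

    def set_start_token(ids: torch.Tensor, start_id: int) -> torch.Tensor:
        ids = ids.clone()      # clone first, so the tensor is safe to mutate
        ids[:, 0] = start_id   # the in-place write now traces cleanly
        return ids

    print(set_start_token(torch.zeros(2, 4, dtype=torch.long), 101))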

Promptable Image Segmentation
-----------------------------

**Image segmentation** is a computer vision technique that divides a digital image into distinct groups of pixels, or segments, based on their characteristics.
The `Segment Anything Model (SAM) <https://ai.meta.com/blog/segment-anything-foundation-model-image-segmentation/>`__ introduced promptable image segmentation, which predicts object masks given prompts that indicate the desired object. `SAM 2 <https://ai.meta.com/sam2/>`__ is
the first unified model for segmenting objects across images and videos. The `SAM2ImagePredictor <https://github.com/facebookresearch/sam2/blob/main/sam2/sam2_image_predictor.py#L20>`__ class provides an easy interface for prompting
the model. It can take as input both point and box prompts, as well as masks from the previous iteration of prediction. Since SAM2 provides strong
zero-shot performance for object tracking, it can be used for tracking game objects in a scene. The code below tries to export ``SAM2ImagePredictor`` with ``batch_size=1``.

The tensor operations in the ``predict`` method of `SAM2ImagePredictor <https://github.com/facebookresearch/sam2/blob/main/sam2/sam2_image_predictor.py#L20>`__ happen in the `_predict <https://github.com/facebookresearch/sam2/blob/main/sam2/sam2_image_predictor.py#L291>`__ method, so we try to export that, as in the minimal sketch below.

.. code:: python
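
    # Minimal sketch: the checkpoint name and prompt values are illustrative.
    import torch
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-tiny")

    # One point prompt with batch_size=1:
    # point_coords is (batch, num_points, 2); point_labels is (batch, num_points).
    point_coords = torch.tensor([[[500.0, 375.0]]])
    point_labels = torch.tensor([[1]])

    # _predict is a bound method, not a torch.nn.Module, so this errors.
    exported_program = torch.export.export(
        predictor._predict, args=(point_coords, point_labels), strict=False
    )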

Error: Model is not of type ``torch.nn.Module``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

``torch.export`` expects the module to be of type ``torch.nn.Module``. However, the module we are trying to export is a class method. Hence, it errors.

Solution
~~~~~~~~

We write a helper class that inherits from ``torch.nn.Module`` and calls the ``_predict`` method in the ``forward`` method of the class; a minimal sketch is shown below. The complete code can be found `here <https://github.com/anijain2305/sam2/blob/ued/sam2/sam2_image_predictor.py#L293-L311>`__.

.. code:: python
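
    # Minimal sketch: wraps the predictor in a torch.nn.Module so that
    # torch.export receives a module whose forward calls _predict.
    # Reuses predictor, point_coords, and point_labels from the sketch above.
    import torch

    class ImagePredictor(torch.nn.Module):
        def __init__(self, predictor):
            super().__init__()
            self.predictor = predictor

        def forward(self, point_coords, point_labels):
            return self.predictor._predict(point_coords, point_labels)

    model = ImagePredictor(predictor)
    exported_program = torch.export.export(
        model, args=(point_coords, point_labels), strict=False
    )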

Conclusion
----------

In this tutorial, we have learned how to use ``torch.export`` to export models for popular use cases by addressing challenges through correct configuration and simple code modifications.