
Commit edce4b4

chore: update docs
Signed-off-by: Dheeraj Peri <[email protected]>
1 parent: 3511e54

File tree: 1 file changed, +40 −4 lines


docsrc/user_guide/mixed_precision.rst

Lines changed: 40 additions & 4 deletions
@@ -1,15 +1,18 @@
 .. _mixed_precision:

 Compile Mixed Precision models with Torch-TensorRT
-====================================
+===================================================
 .. currentmodule:: torch_tensorrt.dynamo

 .. automodule:: torch_tensorrt.dynamo
     :members:
     :undoc-members:
     :show-inheritance:

-Consider the following Pytorch model which explicitly casts intermediate layer to run in FP16.
+Explicit Typing
+---------------
+
+Consider the following PyTorch model which explicitly casts an intermediate layer to run in FP16.

 .. code-block:: python

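The body of the model itself is unchanged context, so it does not appear in this diff. For reference, a minimal sketch of what such a model could look like is shown below; the ``MyModule`` definition and the layer sizes are assumptions inferred from the shapes in the debug logs, not lines taken from this commit:

    import torch

    class MyModule(torch.nn.Module):
        # Layer sizes are guesses based on the [1,10] -> [1,30] -> [1,40] shapes in the logs
        def __init__(self):
            super().__init__()
            self.linear1 = torch.nn.Linear(10, 10)
            self.linear2 = torch.nn.Linear(10, 30).to(torch.float16)  # keep linear2 parameters in FP16
            self.linear3 = torch.nn.Linear(30, 40)

        def forward(self, x):
            x = self.linear1(x)
            x = self.linear2(x.to(torch.float16))   # explicit cast: this matmul runs in FP16
            x = self.linear3(x.to(torch.float32))   # cast back to FP32 for the final layer
            return x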
@@ -54,6 +57,7 @@ the compilation setting ``use_explicit_typing=True``. Compiling with this option

 .. note:: If you enable ``use_explicit_typing=True``, only torch.float32 is supported in the enabled_precisions.

+
 .. code-block:: python

     inputs = [torch.randn((1, 10), dtype=torch.float32).cuda()]
@@ -62,7 +66,7 @@ the compilation setting ``use_explicit_typing=True``. Compiling with this option
     with torch_tensorrt.logging.debug():
         trt_gm = torch_tensorrt.dynamo.compile(ep,
             inputs=inputs,
-            use_explicit_typing=True
+            use_explicit_typing=True,
             debug=True)

     # Debug log info
@@ -71,4 +75,36 @@ the compilation setting ``use_explicit_typing=True``. Compiling with this option
     # Name: __myl_ResMulSumAddCas_myl0_1, LayerType: kgen, Inputs: [ { Name: __mye127_dconst, Dimensions: [10,30], Format/Datatype: Half }, { Name: linear2/addmm_1_constant_0 _ linear2/addmm_1_add_broadcast_to_same_shape_lhs_broadcast_constantHalf, Dimensions: [1,30], Format/Datatype: Half }, { Name: __myln_k_arg__bb1_2, Dimensions: [1,10], Format/Datatype: Half }], Outputs: [ { Name: __myln_k_arg__bb1_3, Dimensions: [1,30], Format/Datatype: Float }], TacticName: __myl_ResMulSumAddCas_0x5a3b318b5a1c97b7d5110c0291481337, StreamId: 0, Metadata:
     # Name: __myl_ResMulSumAdd_myl0_2, LayerType: kgen, Inputs: [ { Name: __mye142_dconst, Dimensions: [30,40], Format/Datatype: Float }, { Name: linear3/addmm_2_constant_0 _ linear3/addmm_2_add_broadcast_to_same_shape_lhs_broadcast_constantFloat, Dimensions: [1,40], Format/Datatype: Float }, { Name: __myln_k_arg__bb1_3, Dimensions: [1,30], Format/Datatype: Float }], Outputs: [ { Name: output0, Dimensions: [1,40], Format/Datatype: Float }], TacticName: __myl_ResMulSumAdd_0x3fad91127c640fd6db771aa9cde67db0, StreamId: 0, Metadata:

-Now the ``linear2`` layer runs in FP16 as shown in the above logs.
+Now the ``linear2`` layer runs in FP16 as shown in the above logs.
+
+
+
+FP32 Accumulation
+-----------------
+
+When ``use_fp32_acc=True`` is set, Torch-TensorRT will attempt to use FP32 accumulation for matmul layers, even if the input and output tensors are in FP16. This is particularly useful for models that are sensitive to numerical errors introduced by lower-precision accumulation.
+
+.. important::
+
+    When enabling ``use_fp32_acc=True``, **explicit typing must be enabled** by setting ``use_explicit_typing=True``. Without ``use_explicit_typing=True``, the accumulation type may not be properly respected, and you may not see the intended numerical benefits.
+
+.. code-block:: python
+
+    inputs = [torch.randn((1, 10), dtype=torch.float16).cuda()]
+    mod = MyModule().eval().cuda()
+    ep = torch.export.export(mod, tuple(inputs))
+    with torch_tensorrt.logging.debug():
+        trt_gm = torch_tensorrt.dynamo.compile(
+            ep,
+            inputs=inputs,
+            use_fp32_acc=True,
+            use_explicit_typing=True,  # Explicit typing must be enabled
+            debug=True
+        )
+
+    # Debug log info
+    # Layers:
+    # Name: __myl_MulSumAddCas_myl0_0, LayerType: kgen, Inputs: [ ... ], Outputs: [ ... ], Format/Datatype: Half, Accumulation: Float
+    # ...
+
+For more information on these settings, see the explicit typing examples above.
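To make the motivation for FP32 accumulation concrete, the snippet below is an illustrative sketch in plain PyTorch (it is not part of this commit and does not use the Torch-TensorRT API); it emulates what happens when the running sum of a long FP16 dot product is rounded back to FP16 after every addition, compared with accumulating in FP32:

    import torch

    torch.manual_seed(0)

    # One output element of a matmul is a dot product over many FP16 terms
    a = torch.randn(4096).to(torch.float16)
    b = torch.randn(4096).to(torch.float16)
    prods = a * b  # elementwise products, still FP16

    # FP32 accumulation: upcast once, then reduce in FP32
    fp32_acc = prods.to(torch.float32).sum().item()

    # Emulated FP16 accumulation: the partial sum is rounded back to FP16
    # after every addition, so rounding error can build up over the reduction
    acc = torch.zeros((), dtype=torch.float16)
    for p in prods:
        acc = acc + p
    fp16_acc = acc.item()

    print(f"FP32-accumulated result: {fp32_acc:.4f}")
    print(f"FP16-accumulated result: {fp16_acc:.4f}")
    print(f"absolute difference:     {abs(fp32_acc - fp16_acc):.4f}")

Setting ``use_fp32_acc=True`` asks TensorRT to keep this kind of running sum in FP32 inside its matmul kernels while the surrounding tensors stay in FP16.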
