
Commit 3efb89d

Reduce precision docs (#400)
* reduced precision docs
* support matrix
* docs
1 parent c9dcd5e commit 3efb89d

File tree

5 files changed: +163 -1 lines changed


CHANGELOG.md

Lines changed: 7 additions & 1 deletion
@@ -1 +1,7 @@
-# Changes
+# Changes
+
+## [Master]
+
+### Added
+
+- Added reduced precision documentation page

README.md

Lines changed: 2 additions & 0 deletions
@@ -1,5 +1,7 @@
 # torch2trt
 
+<a href="https://nvidia-ai-iot.github.io/torch2trt"><img src="https://img.shields.io/badge/-Documentation-brightgreen"/></a>
+
 torch2trt is a PyTorch to TensorRT converter which utilizes the
 TensorRT Python API. The converter is
 

docs/images/check.svg

Lines changed: 1 addition & 0 deletions

docs/usage/reduced_precision.md

Lines changed: 152 additions & 0 deletions
# Reduced Precision

For certain platforms, reduced precision can result in substantial improvements in throughput,
often with little impact on model accuracy.

## Support Matrix

Below is a table of layer precision support for various NVIDIA platforms.

| Platform          | FP16                      | INT8                      |
|-------------------|---------------------------|---------------------------|
| Jetson Nano       | ![X](../images/check.svg) |                           |
| Jetson TX2        | ![X](../images/check.svg) | ![X](../images/check.svg) |
| Jetson Xavier NX  | ![X](../images/check.svg) | ![X](../images/check.svg) |
| Jetson AGX Xavier | ![X](../images/check.svg) | ![X](../images/check.svg) |

!!! note

    If the platform you're using is missing from this table, or you spot anything incorrect,
    please [let us know](https://github.com/NVIDIA-AI-IOT/torch2trt).

## FP16 Precision

To enable support for fp16 precision with TensorRT, torch2trt exposes the ``fp16_mode`` parameter.
Converting a model with ``fp16_mode=True`` allows the TensorRT optimizer to select layers with fp16
precision.
```python
model_trt = torch2trt(model, [data], fp16_mode=True)
```

!!! note

    Setting ``fp16_mode=True`` does not necessarily mean that TensorRT will select fp16 layers.
    The optimizer attempts to automatically select tactics which result in the best performance.
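Whether fp16 actually improves throughput depends on the platform and model, so it can be worth measuring directly. Below is a minimal benchmarking sketch, not part of torch2trt; it assumes a torchvision ResNet-18 and arbitrary warmup and iteration counts.

```python
import time

import torch
from torch2trt import torch2trt
from torchvision.models import resnet18

model = resnet18(pretrained=True).cuda().eval()
data = torch.randn(1, 3, 224, 224).cuda()

model_trt_fp32 = torch2trt(model, [data])
model_trt_fp16 = torch2trt(model, [data], fp16_mode=True)


def throughput(module, data, iterations=100):
    # warm up, then time synchronized GPU execution
    for _ in range(10):
        module(data)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(iterations):
        module(data)
    torch.cuda.synchronize()
    return iterations / (time.time() - t0)


print('fp32: %.1f FPS' % throughput(model_trt_fp32, data))
print('fp16: %.1f FPS' % throughput(model_trt_fp16, data))
```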
## INT8 Precision

torch2trt also supports int8 precision with TensorRT through the ``int8_mode`` parameter. Unlike fp16 and fp32 precision, switching
to int8 precision often requires calibration to avoid a significant drop in accuracy.

### Input Data Calibration

By default, torch2trt will calibrate using the input data provided. For example, if you wanted
to calibrate on a set of 64 random normal images, you could do the following.

```python
data = torch.randn(64, 3, 224, 224).cuda()

model_trt = torch2trt(model, [data], int8_mode=True)
```
### Dataset Calibration

In many instances, you may want to calibrate on more data than fits in memory. For this reason,
torch2trt exposes the ``int8_calib_dataset`` parameter. This parameter takes an input
dataset that is used for calibration. If this parameter is specified, the input data is
ignored during calibration. You create an input dataset by defining
a class which implements the ``__len__`` and ``__getitem__`` methods.

* The ``__len__`` method should return the number of calibration samples.
* The ``__getitem__`` method must return a single calibration sample. This is a list of input tensors to the model. Each tensor should match the shape
  you provide to the ``inputs`` parameter when calling ``torch2trt``.

For example, say you trained an image classification network using the PyTorch [``ImageFolder``](https://pytorch.org/docs/stable/torchvision/datasets.html#imagefolder) dataset.
You could wrap this dataset for calibration by defining a new dataset which returns only the images, without labels, in list format.
```python
from torchvision.datasets import ImageFolder
from torchvision.transforms import Compose, Normalize, Resize, ToTensor


class ImageFolderCalibDataset():

    def __init__(self, root):
        self.dataset = ImageFolder(
            root=root,
            transform=Compose([
                Resize((224, 224)),
                ToTensor(),
                Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
            ])
        )

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        image, _ = self.dataset[idx]
        image = image[None, ...]  # add batch dimension
        return [image]
```
You would then provide this calibration dataset to torch2trt as follows.

```python
dataset = ImageFolderCalibDataset('images')

model_trt = torch2trt(model, [data], int8_mode=True, int8_calib_dataset=dataset)
```
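Because int8 calibration can still cost accuracy, it is worth a quick sanity check after conversion. The sketch below is illustrative, not a torch2trt API; it reuses the ``model``, ``model_trt``, and ``dataset`` objects from the examples above and measures how often the int8 model agrees with the original PyTorch model.

```python
import torch

# compare TensorRT int8 predictions against the original PyTorch model
# on a handful of samples from the calibration dataset
num_samples = min(len(dataset), 64)
matches = 0
for i in range(num_samples):
    image, = dataset[i]  # each sample is a list of input tensors
    image = image.cuda()
    with torch.no_grad():
        label = model(image).argmax(dim=1)
        label_trt = model_trt(image).argmax(dim=1)
    matches += int((label == label_trt).sum())

print('top-1 agreement: %.1f%%' % (100.0 * matches / num_samples))
```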
### Calibration Algorithm

To override the default calibration algorithm that torch2trt uses, you can set the ``int8_calib_algorithm``
parameter to the [``tensorrt.CalibrationAlgoType``](https://docs.nvidia.com/deeplearning/tensorrt/api/python_api/infer/Int8/Calibrator.html#iint8calibrator)
that you wish to use. For example, to use the minmax calibration algorithm, you would do the following.

```python
import tensorrt as trt

model_trt = torch2trt(model, [data], int8_mode=True, int8_calib_algorithm=trt.CalibrationAlgoType.MINMAX_CALIBRATION)
```
### Calibration Batch Size

During calibration, torch2trt pulls data in batches for the TensorRT calibrator. In some instances
[developers have found](https://github.com/NVIDIA-AI-IOT/torch2trt/pull/398) that the calibration batch size can impact the calibrated model accuracy. To set the calibration batch size, you can set the ``int8_calib_batch_size``
parameter. For example, to use a calibration batch size of 32 you could do

```python
model_trt = torch2trt(model, [data], int8_mode=True, int8_calib_batch_size=32)
```
## Binding Data Types

The data types of input and output bindings in TensorRT are determined by the original
PyTorch module input and output data types.
This does not directly impact whether the TensorRT optimizer will internally use fp16 or int8 precision.

For example, to create a model with float precision bindings, you would do the following.

```python
model = model.float()
data = data.float()

model_trt = torch2trt(model, [data], fp16_mode=True)
```

In this instance, the optimizer may choose to use fp16 precision layers internally, but the
input and output data types are fp32. To use fp16 precision input and output bindings, you would do the following.

```python
model = model.half()
data = data.half()

model_trt = torch2trt(model, [data], fp16_mode=True)
```

Now, the input and output bindings of the model are half precision, and internally the optimizer may
choose to select fp16 layers as well.
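As a quick sanity check, you can confirm the binding data types by inspecting the tensors themselves; for the half precision example above, both sides should report ``torch.float16``.

```python
output = model_trt(data)

print(data.dtype)    # torch.float16 -- input binding
print(output.dtype)  # torch.float16 -- output binding
```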

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ nav:
     - Getting Started: getting_started.md
     - Usage:
       - Basic Usage: usage/basic_usage.md
+      - Reduced Precision: usage/reduced_precision.md
       - Custom Converter: usage/custom_converter.md
     - Converters: converters.md
     - Benchmarks:
