Commit 66c1cab

Merge branch 'master' of https://github.com/pytorch/TensorRT into trt_8.4ga

2 parents: 369fcd9 + 1625cd3

129 files changed: +6867, -1480 lines

.circleci/config.yml

Lines changed: 639 additions & 53 deletions (large diff not rendered)

.github/code-owners.yml

Lines changed: 1 addition & 1 deletion

@@ -110,7 +110,7 @@
   - "peri044"
   - "bowang007"
 
-"component: docker":
+"channel: docker":
   - "andi4191"
   - "narendasan"
 

.gitignore

Lines changed: 3 additions & 0 deletions

@@ -62,3 +62,6 @@ bazel-Torch-TensorRT-Preview
 docsrc/src/
 bazel-TensorRT
 bazel-tensorrt
+.pytest_cache
+*.cache
+*cifar-10-batches-py*

README.md

Lines changed: 5 additions & 4 deletions

@@ -2,13 +2,14 @@
 
 [![Documentation](https://img.shields.io/badge/docs-master-brightgreen)](https://nvidia.github.io/Torch-TensorRT/)
 
-> Ahead of Time (AOT) compiling for PyTorch JIT
+> Ahead of Time (AOT) compiling for PyTorch JIT and FX
 
-Torch-TensorRT is a compiler for PyTorch/TorchScript, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript program into an module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extention and compiles modules that integrate into the JIT runtime seamlessly. After compilation using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/FP16/INT8) and other settings for your module.
+Torch-TensorRT is a compiler for PyTorch/TorchScript/FX, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript or FX program into an module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extention and compiles modules that integrate into the JIT runtime seamlessly. After compilation using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/FP16/INT8) and other settings for your module.
 
 Resources:
 - [Documentation](https://nvidia.github.io/Torch-TensorRT/)
-- [Torch-TensorRT Explained in 2 minutes!](https://www.youtube.com/watch?v=TU5BMU6iYZ0&ab_channel=NVIDIADeveloper)
+- [FX path Documentation](https://github.com/pytorch/TensorRT/blob/master/docsrc/tutorials/getting_started_with_fx_path.rst)
+- [Torch-TensorRT Explained in 2 minutes!](https://www.youtube.com/watch?v=TU5BMU6iYZ0&ab_channel=NVIDIADeveloper)
 - [Comprehensive Discusion (GTC Event)](https://www.nvidia.com/en-us/on-demand/session/gtcfall21-a31107/)
 - [Pre-built Docker Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). To use this container, make an NGC account and sign in to NVIDIA's registry with an API key. Refer to [this guide](https://docs.nvidia.com/ngc/ngc-catalog-user-guide/index.html#registering-activating-ngc-account) for the same.
 
@@ -213,7 +214,7 @@ bazel build //:libtorchtrt --compilation_mode opt
 ```
 
 ### FX path (Python only) installation
-If the user plan to try FX path (Python only) and would like to avoid bazel build. Please follow the steps below.
+If the user plans to try FX path (Python only) and would like to avoid bazel build. Please follow the steps below.
 ``` shell
 cd py && python3 setup.py install --fx-only
 ```
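
For context on the explicit AOT compile step and compile-time precision settings the README paragraph above describes, here is a minimal sketch using the project's Python API; the `Net` module and input shape are illustrative assumptions, not part of this commit:

```python
import torch
import torch_tensorrt

# Hypothetical stand-in model; any TorchScript-compatible module works.
class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(8, 4)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = Net().eval().cuda()

# Explicit AOT compile step: fix the input shape and allowed precisions.
trt_mod = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 8))],
    enabled_precisions={torch.float, torch.half},  # FP32/FP16 kernels allowed
)

out = trt_mod(torch.randn(1, 8).cuda())  # behaves like a TorchScript module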

core/conversion/converters/converter_util.cpp

Lines changed: 125 additions & 0 deletions

@@ -200,6 +200,131 @@ nvinfer1::ITensor* tensor_to_const(ConversionCtx* ctx, at::Tensor t, const std::
   return out;
 }
 
+// clamp x to [lower_bound, upper_bound]
+nvinfer1::ITensor* clamp(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* x,
+    nvinfer1::ITensor* lower_bound,
+    nvinfer1::ITensor* upper_bound,
+    std::string const& name) {
+
+  auto max_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kMAX, x, lower_bound, "max layer for " + name);
+  TORCHTRT_CHECK(max_layer, "Unable to create max layer for clamp");
+  LOG_DEBUG(ctx->logger, "Create " << max_layer->getName() << " for clamp");
+  auto max_itensor = max_layer->getOutput(0);
+
+  auto min_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kMIN, max_itensor, upper_bound, "min layer for " + name);
+  TORCHTRT_CHECK(min_layer, "Unable to create min layer for clamp");
+  LOG_DEBUG(ctx->logger, "Create " << min_layer->getName() << " for clamp");
+  auto min_itensor = min_layer->getOutput(0);
+  return min_itensor;
+}
+
+// clamp x to [0, input_dim]
+nvinfer1::ITensor* clamp_to_input_dim(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* x,
+    nvinfer1::ITensor* input_dim,
+    int nbdims,
+    std::string const& name) {
+
+  auto zero = torch::zeros({nbdims}).to(torch::kI32);
+  auto zero_itensor = tensor_to_const(ctx, zero);
+  auto one = torch::ones({nbdims}).to(torch::kI32);
+  auto one_itensor = tensor_to_const(ctx, one);
+
+  auto upper_bound_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kSUB, input_dim, one_itensor, "sub layer for " + name);
+  TORCHTRT_CHECK(upper_bound_layer, "Unable to create sub layer for clamp to inputDim");
+  LOG_DEBUG(ctx->logger, "Create " << upper_bound_layer->getName() << " for clamp to inputDim");
+  auto upper_bound = upper_bound_layer->getOutput(0);
+
+  auto max_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kMAX, x, zero_itensor, "max layer for " + name);
+  TORCHTRT_CHECK(max_layer, "Unable to create max_layer for clamp to inputDim");
+  LOG_DEBUG(ctx->logger, "Create " << max_layer->getName() << " for clamp to inputDim");
+  auto max_itensor = max_layer->getOutput(0);
+
+  auto min_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kMIN, max_itensor, upper_bound, "min layer for " + name);
+  TORCHTRT_CHECK(min_layer, "Unable to create min_layer for clamp to inputDim");
+  LOG_DEBUG(ctx->logger, "Create " << min_layer->getName() << " for clamp to inputDim");
+  auto min_itensor = min_layer->getOutput(0);
+  return min_itensor;
+}
+
+// return indices < 0 ? inputDims + indices : indices
+nvinfer1::ITensor* normalize_indices(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* input_dim,
+    nvinfer1::ITensor* indices,
+    int nbdims,
+    std::string const& name) {
+
+  auto zero = torch::zeros({nbdims}).to(torch::kI32);
+  auto neg = -torch::ones({nbdims}).to(torch::kI32);
+  auto zero_itensor = tensor_to_const(ctx, zero);
+  auto neg_itensor = tensor_to_const(ctx, neg);
+  // find the indices that = -1
+  auto signs = clamp(ctx, indices, neg_itensor, zero_itensor, "clamp layer for " + name);
+
+  // get the inputDim value where indices == -1, else 0
+  auto mul = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kPROD, signs, input_dim, "prod layer for " + name);
+  TORCHTRT_CHECK(mul, "Unable to create mul layer in normalize_indices");
+  LOG_DEBUG(ctx->logger, "Create " << mul->getName() << " for normalize_indices");
+  auto mul_itensor = mul->getOutput(0);
+
+  // add the inputDim value to indices where indices == -1
+  auto sub = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kSUB, indices, mul_itensor, "sub layer for " + name);
+  TORCHTRT_CHECK(sub, "Unable to create sub layer in normalize_indices");
+  LOG_DEBUG(ctx->logger, "Create " << sub->getName() << " for normalize_indices");
+  auto sub_itensor = sub->getOutput(0);
+  return sub_itensor;
+}
+
+std::vector<nvinfer1::ITensor*> normalize_start_and_end(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* in_shape,
+    nvinfer1::ITensor* in_start,
+    nvinfer1::ITensor* in_end,
+    int nbdims,
+    std::string const& name) {
+  auto start = normalize_indices(ctx, in_shape, in_start, nbdims, "normalize start of " + name);
+  auto out_start = clamp_to_input_dim(ctx, start, in_shape, nbdims, "clamp start to inputDim for " + name);
+  auto end = normalize_indices(ctx, in_shape, in_end, nbdims, "normalize end of " + name);
+  auto out_end = clamp_to_input_dim(ctx, end, in_shape, nbdims, "clamp end to inputDim for " + name);
+  std::vector<nvinfer1::ITensor*> outputs;
+  outputs.push_back(out_start);
+  outputs.push_back(out_end);
+  return outputs;
+}
+
+// size = (end - start) / stride + 1, where range is [start, end], end is included
+nvinfer1::ITensor* get_slice_size(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* start,
+    nvinfer1::ITensor* end,
+    nvinfer1::ITensor* stride,
+    int nbdims,
+    std::string const& name) {
+  at::Tensor one_tensor = torch::ones({nbdims}).to(torch::kI32);
+  auto one_itensor = tensor_to_const(ctx, one_tensor);
+
+  auto sub_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kSUB, end, start, "get_slice_size sub layer for " + name);
+  TORCHTRT_CHECK(sub_layer, "Unable to create sub layer in calculate_output_size");
+  LOG_DEBUG(ctx->logger, "Create " << sub_layer->getName() << " for calculate_output_size");
+  auto sub_itensor = sub_layer->getOutput(0);
+
+  auto div_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kDIV, sub_itensor, stride, "get_slice_size div layer for " + name);
+  TORCHTRT_CHECK(div_layer, "Unable to create div layer in calculate_output_size");
+  LOG_DEBUG(ctx->logger, "Create " << div_layer->getName() << " for calculate_output_size");
+  auto div_itensor = div_layer->getOutput(0);
+
+  auto add_layer = add_elementwise(ctx, nvinfer1::ElementWiseOperation::kSUM, div_itensor, one_itensor, "get_slice_size sum layer for " + name);
+  TORCHTRT_CHECK(add_layer, "Unable to create add layer in calculate_output_size");
+  LOG_DEBUG(ctx->logger, "Create " << add_layer->getName() << " for calculate_output_size");
+  auto size_itensor = add_layer->getOutput(0);
+
+  return size_itensor;
+}
+
 } // namespace converters
 } // namespace conversion
 } // namespace core
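
The new helpers build this index arithmetic out of TensorRT elementwise layers, which makes the intent hard to read from the layer plumbing alone. A plain-Python sketch of the same per-dimension math (scalar stand-ins; function names mirror the C++ helpers but are illustrative, not part of the commit):

```python
def normalize_index(idx, dim):
    # return indices < 0 ? inputDims + indices : indices
    return dim + idx if idx < 0 else idx

def clamp_to_input_dim(x, dim):
    # clamp x to [0, dim - 1]
    return max(0, min(x, dim - 1))

def slice_size(start, end, stride):
    # size = (end - start) / stride + 1, where end is included in the range
    return (end - start) // stride + 1

# Example: a dimension of size 10 with start index -3 and end index -1.
dim = 10
start = clamp_to_input_dim(normalize_index(-3, dim), dim)  # -> 7
end = clamp_to_input_dim(normalize_index(-1, dim), dim)    # -> 9
print(slice_size(start, end, stride=1))                    # -> 3 (indices 7, 8, 9)
```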

core/conversion/converters/converter_util.h

Lines changed: 30 additions & 0 deletions

@@ -2,6 +2,7 @@
 
 #include <map>
 #include <string>
+#include <limits>
 
 #include "core/conversion/conversionctx/ConversionCtx.h"
 #include "core/conversion/converters/Weights.h"
@@ -50,6 +51,35 @@ nvinfer1::ITensor* castITensor(ConversionCtx* ctx, nvinfer1::ITensor* tensor, nv
 // Freeze an at::Tensor in a IConstant layer
 nvinfer1::ITensor* tensor_to_const(ConversionCtx* ctx, at::Tensor t, const std::string& name = std::string());
 
+nvinfer1::ITensor* clamp(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* x,
+    nvinfer1::ITensor* lower_bound,
+    nvinfer1::ITensor* upper_bound,
+    std::string const& name);
+
+nvinfer1::ITensor* normalize_indices(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* input_dim,
+    nvinfer1::ITensor* indices,
+    std::string const& name);
+
+std::vector<nvinfer1::ITensor*> normalize_start_and_end(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* in_shape,
+    nvinfer1::ITensor* in_start,
+    nvinfer1::ITensor* in_end,
+    int nbdims,
+    std::string const& name);
+
+nvinfer1::ITensor* get_slice_size(
+    ConversionCtx* ctx,
+    nvinfer1::ITensor* start,
+    nvinfer1::ITensor* end,
+    nvinfer1::ITensor* stride,
+    int nbdims,
+    std::string const& name);
+
 } // namespace converters
 } // namespace conversion
 } // namespace core
