Commit 6327eef

peri044, gs-olive, zewenli98, Arktische, and HolyWu authored
chore: cherry pick commits from main into release/2.3 (#2769)
Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: George S <[email protected]>
Co-authored-by: Zewen (Evan) Li <[email protected]>
Co-authored-by: MizuKuma <[email protected]>
Co-authored-by: HolyWu <[email protected]>
Co-authored-by: Hoonkyung Cho <[email protected]>
Co-authored-by: Apurba Bose <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Michael Feliz <[email protected]>
Co-authored-by: Aakash Apoorv <[email protected]>
1 parent 249280d commit 6327eef

27 files changed, +1695 -155 lines changed

README.md

Lines changed: 4 additions & 4 deletions
@@ -5,13 +5,13 @@

 > Ahead of Time (AOT) compiling for PyTorch JIT and FX

-Torch-TensorRT is a compiler for PyTorch/TorchScript/FX, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript or FX program into an module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extention and compiles modules that integrate into the JIT runtime seamlessly. After compilation using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/FP16/INT8) and other settings for your module.
+Torch-TensorRT is a compiler for PyTorch/TorchScript/FX, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript or FX program into an module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extension and compiles modules that integrate into the JIT runtime seamlessly. After compilation using the optimized graph should feel no different than running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/FP16/INT8) and other settings for your module.

 Resources:
 - [Documentation](https://nvidia.github.io/Torch-TensorRT/)
 - [FX path Documentation](https://github.com/pytorch/TensorRT/blob/master/docsrc/tutorials/getting_started_with_fx_path.rst)
 - [Torch-TensorRT Explained in 2 minutes!](https://www.youtube.com/watch?v=TU5BMU6iYZ0&ab_channel=NVIDIADeveloper)
-- [Comprehensive Discusion (GTC Event)](https://www.nvidia.com/en-us/on-demand/session/gtcfall21-a31107/)
+- [Comprehensive Discussion (GTC Event)](https://www.nvidia.com/en-us/on-demand/session/gtcfall21-a31107/)
 - [Pre-built Docker Container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). To use this container, make an NGC account and sign in to NVIDIA's registry with an API key. Refer to [this guide](https://docs.nvidia.com/ngc/ngc-catalog-user-guide/index.html#registering-activating-ngc-account) for the same.

 ## NVIDIA NGC Container
@@ -44,7 +44,7 @@ If you would like to build outside a docker container, please follow the section
 #include "torch_tensorrt/torch_tensorrt.h"

 ...
-// Set input datatypes. Allowerd options torch::{kFloat, kHalf, kChar, kInt32, kBool}
+// Set input datatypes. Allowed options torch::{kFloat, kHalf, kChar, kInt32, kBool}
 // Size of input_dtypes should match number of inputs to the network.
 // If input_dtypes is not set, default precision follows traditional PyT / TRT rules
 auto input = torch_tensorrt::Input(dims, torch::kHalf);
@@ -305,7 +305,7 @@ Supported Python versions:

 ### In Torch-TensorRT?

-Thanks for wanting to contribute! There are two main ways to handle supporting a new op. Either you can write a converter for the op from scratch and register it in the NodeConverterRegistry or if you can map the op to a set of ops that already have converters you can write a graph rewrite pass which will replace your new op with an equivalent subgraph of supported ops. Its preferred to use graph rewriting because then we do not need to maintain a large library of op converters. Also do look at the various op support trackers in the [issues](https://github.com/pytorch/TensorRT/issues) for information on the support status of various operators.
+Thanks for wanting to contribute! There are two main ways to handle supporting a new op. Either you can write a converter for the op from scratch and register it in the NodeConverterRegistry or if you can map the op to a set of ops that already have converters you can write a graph rewrite pass which will replace your new op with an equivalent subgraph of supported ops. It's preferred to use graph rewriting because then we do not need to maintain a large library of op converters. Also do look at the various op support trackers in the [issues](https://github.com/pytorch/TensorRT/issues) for information on the support status of various operators.

 ### In my application?
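The README paragraph edited above describes Torch-TensorRT's explicit AOT compile step and compile-time precision selection. As a concrete illustration of that workflow (not part of this commit; the model and input shapes below are placeholders), a minimal sketch in Python:

```python
import torch
import torch_tensorrt

# Placeholder model; any scriptable module follows the same path
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3, padding=1),
    torch.nn.ReLU(),
).eval().cuda()
scripted = torch.jit.script(model)  # a standard TorchScript program

# Explicit AOT compile step: TorchScript -> module backed by a TensorRT engine
trt_module = torch_tensorrt.compile(
    scripted,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224), dtype=torch.half)],
    enabled_precisions={torch.half},  # operating precision chosen at compile time
)

# After compilation, the module is used like any other TorchScript module
out = trt_module(torch.randn(1, 3, 224, 224, device="cuda").half())
```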

core/conversion/converters/impl/interpolate.cpp

Lines changed: 31 additions & 0 deletions
@@ -523,6 +523,37 @@ auto interpolate_registrations TORCHTRT_UNUSED =
               resize_layer_size(ctx, n, in, out_shape, {}, nvinfer1::InterpolationMode::kLINEAR, align_corners);
             }

+            return true;
+          }})
+        .pattern(
+            {"aten::grid_sampler(Tensor input, Tensor grid, int interpolation_mode, int padding_mode, bool align_corners) -> Tensor",
+             [](ConversionCtx* ctx, const torch::jit::Node* n, args& args) -> bool {
+               auto in = args[0].ITensorOrFreeze(ctx);
+               auto grid = args[1].ITensorOrFreeze(ctx);
+               auto interpolation_mode = args[2].unwrapToInt();
+               auto padding_mode = args[3].unwrapToInt();
+               auto align_corners = args[4].unwrapToBool();
+
+               static const auto sample_map = std::map<int, nvinfer1::SampleMode>{
+                   {0, nvinfer1::SampleMode::kFILL},
+                   {1, nvinfer1::SampleMode::kCLAMP},
+                   {2, nvinfer1::SampleMode::kREFLECT}};
+
+               static const auto interpolation_map = std::map<int, nvinfer1::InterpolationMode>{
+                   {0, nvinfer1::InterpolationMode::kLINEAR},
+                   {1, nvinfer1::InterpolationMode::kNEAREST},
+                   {2, nvinfer1::InterpolationMode::kCUBIC}};
+
+               auto grid_sample_layer = ctx->net->addGridSample(*in, *grid);
+               TORCHTRT_CHECK(
+                   grid_sample_layer, "Unable to create grid_sample layer from node: " << util::node_info(n));
+
+               grid_sample_layer->setAlignCorners(align_corners);
+               grid_sample_layer->setSampleMode(sample_map.at(padding_mode));
+               grid_sample_layer->setInterpolationMode(interpolation_map.at(interpolation_mode));
+
+               auto out_tensor = ctx->AssociateValueAndTensor(n->outputs()[0], grid_sample_layer->getOutput(0));
+               LOG_DEBUG("Output tensor shape: " << out_tensor->getDimensions());
               return true;
             }});
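The new converter maps aten::grid_sampler onto TensorRT's grid-sample layer, translating PyTorch's integer mode arguments through the two lookup tables above (interpolation 0/1/2 to linear/nearest/cubic, padding 0/1/2 to fill/clamp/reflect). A sketch of a module that produces this op when scripted; the module name and shapes are illustrative, not part of the commit:

```python
import torch
import torch.nn.functional as F

class GridSampleModule(torch.nn.Module):
    def forward(self, x: torch.Tensor, grid: torch.Tensor) -> torch.Tensor:
        # Scripting lowers this call to aten::grid_sampler with
        # interpolation_mode=0 (bilinear) and padding_mode=0 (zeros),
        # which the converter above maps to kLINEAR and kFILL.
        return F.grid_sample(
            x, grid, mode="bilinear", padding_mode="zeros", align_corners=False
        )

scripted = torch.jit.script(GridSampleModule())
x = torch.randn(1, 3, 32, 32)            # NCHW input
grid = torch.rand(1, 16, 16, 2) * 2 - 1  # sampling grid in [-1, 1]
print(scripted(x, grid).shape)           # torch.Size([1, 3, 16, 16])
```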

core/lowering/passes/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ target_sources(${lib_name}
     "${CMAKE_CURRENT_SOURCE_DIR}/unpack_rsqrt.cpp"
     "${CMAKE_CURRENT_SOURCE_DIR}/unpack_std.cpp"
     "${CMAKE_CURRENT_SOURCE_DIR}/unpack_var.cpp"
+    "${CMAKE_CURRENT_SOURCE_DIR}/unpack_scaled_dot_product_attention.cpp"
     "${CMAKE_CURRENT_SOURCE_DIR}/view_to_reshape.cpp"
     "${CMAKE_CURRENT_SOURCE_DIR}/rewrite_inputs_with_params.cpp"
 )
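The file added to the build here, unpack_scaled_dot_product_attention.cpp, sits alongside the other unpack_* lowering passes, which suggests (an inference from the filename; this diff only registers the source file) that it rewrites aten::scaled_dot_product_attention into ops with existing converters. A sketch of a module that would exercise such a pass when scripted; names and shapes are illustrative:

```python
import torch
import torch.nn.functional as F

class SDPA(torch.nn.Module):
    def forward(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        # Lowered by scripting to aten::scaled_dot_product_attention,
        # the op the new unpack pass presumably targets.
        return F.scaled_dot_product_attention(q, k, v)

q = k = v = torch.randn(1, 4, 64, 32)  # (batch, heads, seq_len, head_dim)
print(torch.jit.script(SDPA())(q, k, v).shape)  # torch.Size([1, 4, 64, 32])
```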

docker/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/
 RUN add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /"
 RUN apt-get update

-RUN apt-get install -y libnvinfer8=${TENSORRT_VERSION}.* libnvinfer-plugin8=${TENSORRT_VERSION}.* libnvinfer-dev=${TENSORRT_VERSION}.* libnvinfer-plugin-dev=${TENSORRT_VERSION}.* libnvonnxparsers8=${TENSORRT_VERSION}.* libnvonnxparsers-dev=${TENSORRT_VERSION}.* libnvparsers8=${TENSORRT_VERSION}.* libnvparsers-dev=${TENSORRT_VERSION}.*
+RUN apt-get install -y libnvinfer8=${TENSORRT_VERSION}.* libnvinfer-plugin8=${TENSORRT_VERSION}.* libnvinfer-dev=${TENSORRT_VERSION}.* libnvinfer-plugin-dev=${TENSORRT_VERSION}.* libnvonnxparsers8=${TENSORRT_VERSION}.* libnvonnxparsers-dev=${TENSORRT_VERSION}.* libnvparsers8=${TENSORRT_VERSION}.* libnvparsers-dev=${TENSORRT_VERSION}.* libnvinfer-headers-dev=${TENSORRT_VERSION}.* libnvinfer-headers-plugin-dev=${TENSORRT_VERSION}.*

 # Setup Bazel via Bazelisk
 RUN wget -q https://github.com/bazelbuild/bazelisk/releases/download/v1.17.0/bazelisk-linux-amd64 -O /usr/bin/bazel &&\

docsrc/py_api/dynamo.rst

Lines changed: 2 additions & 0 deletions
@@ -22,6 +22,8 @@ Functions

 .. autofunction:: export

+.. autofunction:: convert_module_to_trt_engine
+


 Classes

py/torch_tensorrt/_compile.py

Lines changed: 12 additions & 4 deletions
@@ -1,5 +1,6 @@
 from __future__ import annotations

+import collections.abc
 import logging
 from enum import Enum
 from typing import Any, Callable, List, Optional, Sequence, Set
@@ -237,8 +238,6 @@ def compile(
         return compiled_fx_module
     elif target_ir == _IRType.dynamo:
         # Prepare torch and torchtrt inputs
-        import collections.abc
-
         from torch_tensorrt.dynamo.utils import prepare_inputs

         if not isinstance(input_list, collections.abc.Sequence):
@@ -342,10 +341,19 @@ def convert_method_to_trt_engine(
             "convert_method_to_trt_engine call is not supported for ir=fx"
         )
     elif target_ir == _IRType.dynamo:
+        # Prepare torch and torchtrt inputs
+        from torch_tensorrt.dynamo.utils import prepare_inputs
+
+        if not isinstance(inputs, collections.abc.Sequence):
+            inputs = [inputs]
+
+        # Export the module
+        torchtrt_inputs = prepare_inputs(inputs)
+        exp_program = torch_tensorrt.dynamo.trace(module, torchtrt_inputs, **kwargs)
+
         return dynamo_convert_module_to_trt_engine(  # type: ignore[no-any-return]
-            module,
+            exp_program,
             inputs=inputs,
-            method_name=method_name,
             enabled_precisions=enabled_precisions_set,
             **kwargs,
         )
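With this change, the ir="dynamo" path of convert_method_to_trt_engine normalizes a single input into a sequence, prepares it with prepare_inputs, and traces the module to an ExportedProgram via torch_tensorrt.dynamo.trace before delegating to the dynamo backend. A sketch of the resulting caller-side usage, assuming a CUDA-capable setup; the model and shapes are placeholders:

```python
import torch
import torch_tensorrt

model = torch.nn.Linear(8, 4).eval().cuda()

# The dynamo path now traces the module to an ExportedProgram internally,
# so callers keep passing a plain nn.Module plus example inputs.
engine_bytes = torch_tensorrt.convert_method_to_trt_engine(
    model,
    inputs=[torch.randn(2, 8, device="cuda")],
    ir="dynamo",
    enabled_precisions={torch.float},
)
```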

py/torch_tensorrt/dynamo/_compiler.py

Lines changed: 21 additions & 29 deletions
@@ -422,8 +422,7 @@ def contains_metadata(gm: torch.fx.GraphModule) -> bool:


 def convert_module_to_trt_engine(
-    module: torch.fx.GraphModule,
-    method_name: str = "forward",
+    exported_program: ExportedProgram,
     inputs: Optional[Sequence[Input | torch.Tensor]] = None,
     enabled_precisions: (
         Set[torch.dtype | dtype] | Tuple[torch.dtype | dtype]
@@ -453,15 +452,15 @@ def convert_module_to_trt_engine(
     calibrator: object = None,
     allow_shape_tensors: bool = False,
 ) -> bytes:
-    """Convert a GraphModule module method to a serialized TensorRT engine
+    """Convert an ExportedProgram to a serialized TensorRT engine

-    Converts a specified method of a module to a serialized TensorRT engine given a dictionary of conversion settings
+    Converts an ExportedProgram to a serialized TensorRT engine given a dictionary of conversion settings

     Arguments:
-        module (torch.fx.GraphModule): Source module
+        exported_program (torch.export.ExportedProgram): Source module

     Keyword Args:
-        inputs (List[Union(torch_tensorrt.Input, torch.Tensor)]): **Required** List of specifications of input shape, dtype and memory layout for inputs to the module. This argument is required. Input Sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using
+        inputs (Optional[Sequence[torch_tensorrt.Input | torch.Tensor]]): **Required** List of specifications of input shape, dtype and memory layout for inputs to the module. This argument is required. Input Sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using
             torch datatypes or torch_tensorrt datatypes and you can use either torch devices or the torch_tensorrt device type enum
             to select device type. ::

@@ -476,30 +475,11 @@ def convert_module_to_trt_engine(
                 ), # Dynamic input shape for input #2
                 torch.randn((1, 3, 224, 244)) # Use an example tensor and let torch_tensorrt infer settings
             ]
-
-        method_name (str): Name of method to convert
-        input_signature Union(List, Tuple, torch_tensorrt.Input, torch.Tensor): A formatted collection of input specifications for the module. Input Sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using
-            torch datatypes or torch_tensorrt datatypes and you can use either torch devices or the torch_tensorrt device type enum to select device type. **This API should be considered beta-level stable and may change in the future** ::
-
-            input_signature=([
-                torch_tensorrt.Input((1, 3, 224, 224)), # Static NCHW input shape for input #1
-                torch_tensorrt.Input(
-                    min_shape=(1, 224, 224, 3),
-                    opt_shape=(1, 512, 512, 3),
-                    max_shape=(1, 1024, 1024, 3),
-                    dtype=torch.int32
-                    format=torch.channel_last
-                ), # Dynamic input shape for input #2
-            ], torch.randn((1, 3, 224, 244))) # Use an example tensor and let torch_tensorrt infer settings for input #3
-
-        device (Union(torch_tensorrt.Device, torch.device, dict)): Target device for TensorRT engines to run on ::
-
-            device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
-
+        enabled_precisions (Optional[Set[torch.dtype | _enums.dtype]]): The set of datatypes that TensorRT can use
         debug (bool): Whether to print out verbose debugging information
         workspace_size (int): Workspace TRT is allowed to use for the module (0 is default)
         min_block_size (int): Minimum number of operators per TRT-Engine Block
-        torch_executed_ops (Sequence[str]): Sequence of operations to run in Torch, regardless of converter coverage
+        torch_executed_ops (Set[str]): Set of operations to run in Torch, regardless of converter coverage
         pass_through_build_failures (bool): Whether to fail on TRT engine build errors (True) or not (False)
         max_aux_streams (Optional[int]): Maximum number of allowed auxiliary TRT streams for each engine
         version_compatible (bool): Provide version forward-compatibility for engine plan files
@@ -566,13 +546,25 @@ def convert_module_to_trt_engine(
         "dla_global_dram_size": dla_global_dram_size,
     }

+    # Decompose the exported program
+    exported_program = exported_program.run_decompositions(
+        get_decompositions(enable_experimental_decompositions)
+    )
+    gm = exported_program.module()
+    logger.debug("Input graph: " + str(gm.graph))
+
+    # Apply lowering on the graph module
+    torch_inputs = get_torch_inputs(input_list, device)
+    gm = apply_lowering_passes(gm, torch_inputs)
+    logger.debug("Lowered Input graph: " + str(gm.graph))
+
     settings = CompilationSettings(**compilation_options)
     logger.info("Compilation Settings: %s\n", settings)
     try:
-        interpreter_result = interpret_module_to_result(module, input_list, settings)
+        interpreter_result = interpret_module_to_result(gm, input_list, settings)
     except UnsupportedOperatorException:
         logger.error(
-            f"Conversion of module {module} not currently fully supported or convertible!",
+            f"Conversion of module {gm} not currently fully supported or convertible!",
             exc_info=True,
         )
     except Exception as e:
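The dynamo-level convert_module_to_trt_engine now accepts an ExportedProgram, runs decompositions and lowering passes itself, and returns serialized engine bytes. A sketch of the call pattern under the new signature; the model, shapes, and output filename are illustrative:

```python
import torch
import torch_tensorrt

model = torch.nn.Linear(16, 8).eval().cuda()
example_inputs = (torch.randn(4, 16, device="cuda"),)

# New signature: pass an ExportedProgram instead of a GraphModule,
# with no method_name argument.
exp_program = torch.export.export(model, example_inputs)
serialized_engine = torch_tensorrt.dynamo.convert_module_to_trt_engine(
    exp_program,
    inputs=[torch_tensorrt.Input((4, 16))],
    enabled_precisions={torch.float},
)

# The raw bytes can be written out for deployment with the TensorRT runtime
with open("model.engine", "wb") as f:
    f.write(serialized_engine)
```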

0 commit comments

Comments
 (0)