Description
When exporting a PyTorch model that uses torch.nn.functional.grid_sample:
Opset 18 → the TensorRT output matches the ONNX Runtime output exactly.
Opset 20 → the GridSample node produces large numerical errors.
The TensorRT build logs show no warnings or errors.
Environment
TensorRT Version: 10.13.3
NVIDIA GPU: RTX 5090
NVIDIA Driver Version: 575.64.03
CUDA Version: 12.9
CUDNN Version: 9.1.0
Operating System: Ubuntu 24.04
Python Version (if applicable): 3.10.19
PyTorch Version (if applicable): 2.9.0+cu128
ONNX IR: 0.0.10
Minimal Reproduction Code
import torch
from torch.nn import functional as F

class GridSampleTest(torch.nn.Module):
    def __init__(self, mode="bilinear", padding_mode="zeros", align_corners=False):
        super().__init__()
        self.mode = mode
        self.padding_mode = padding_mode
        self.align_corners = align_corners

    def forward(self, x, grid):
        return F.grid_sample(
            x, grid,
            mode=self.mode,
            padding_mode=self.padding_mode,
            align_corners=self.align_corners
        )

model = GridSampleTest().eval().cuda()
x = torch.randn(45056, 1, 1, 256, device='cuda')
g = torch.randn(45056, 1, 9, 2, device='cuda')

torch.onnx.export(
    model, (x, g), "grid_sample_test.onnx",
    input_names=["x", "grid"], output_names=["y"],
    opset_version=20,  # opset_version=18
    dynamic_axes={"x": {0: "B"}, "grid": {0: "B"}, "y": {0: "B"}}
)

Commands or scripts:
polygraphy run grid_sample_test.onnx --onnxrt --trt --verbose --onnx-outputs mark all --trt-outputs mark all
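Not part of the original report: a minimal sanity check, assuming onnxruntime and numpy are installed and the grid_sample_test.onnx produced by the script above is in the working directory, that compares PyTorch's grid_sample directly against ONNX Runtime. If the two match, the export itself is faithful and the deviation seen below is specific to the TensorRT GridSample implementation. This is only a sketch, not part of the repro script.

    import numpy as np
    import onnxruntime as ort
    import torch
    from torch.nn import functional as F

    # Same fixed inputs for both frameworks; grid values kept in [-1, 1].
    x = torch.randn(4, 1, 1, 256)
    g = torch.rand(4, 1, 9, 2) * 2 - 1

    # PyTorch reference output.
    ref = F.grid_sample(x, g, mode="bilinear", padding_mode="zeros", align_corners=False)

    # ONNX Runtime on the exported model (file name taken from the export script above).
    sess = ort.InferenceSession("grid_sample_test.onnx", providers=["CPUExecutionProvider"])
    (y,) = sess.run(None, {"x": x.numpy(), "grid": g.numpy()})

    print("max abs diff (PyTorch vs ONNX Runtime):", float(np.abs(ref.numpy() - y).max()))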
Have you tried the latest release?: No
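Also not part of the original report: a hypothetical inspection helper that prints the GridSample node attributes of the two exports, to see what changes between opset 18 and opset 20. The file names grid_sample_test_op18.onnx and grid_sample_test_op20.onnx are illustrative; they assume the export script above was run once per opset with the output path adjusted accordingly.

    import onnx
    from onnx import helper

    for path in ["grid_sample_test_op18.onnx", "grid_sample_test_op20.onnx"]:
        model = onnx.load(path)
        # Map each imported domain to its opset version (empty domain is the default ai.onnx domain).
        opsets = {imp.domain or "ai.onnx": imp.version for imp in model.opset_import}
        for node in model.graph.node:
            if node.op_type == "GridSample":
                attrs = {a.name: helper.get_attribute_value(a) for a in node.attribute}
                print(path, "| opsets:", opsets, "| GridSample attrs:", attrs)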
Polygraphy Log
[V] Loaded Module: polygraphy | Version: 0.49.26
[V] Loaded extension modules: []
[I] TF32 is disabled by default. Turn on TF32 for better performance with minor accuracy differences.
[I] onnxrt-runner-N0-11/20/25-10:34:53 | Activating and starting inference
[I] Loading model: grid_sample_test.onnx
[V] Loaded Module: onnx | Version: 1.19.1
[V] Marking all ONNX tensors as outputs
[V] Loaded Module: onnxruntime | Version: 1.23.2
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[V] Loading inputs from data loader
[V] Generating data using numpy seed: 1
[V] Loaded Module: numpy | Version: 2.2.6
[W] Input tensor: x [shape=BoundedShape(['s38', 1, 1, 256], min=None, max=None)] | Will generate data of shape: [1, 1, 1, 256].
If this is incorrect, please provide a custom data loader.
[V] Input tensor: x | Generating input data in range: [0.0, 1.0]
[W] Input tensor: grid [shape=BoundedShape(['s38', 1, 9, 2], min=None, max=None)] | Will generate data of shape: [1, 1, 9, 2].
If this is incorrect, please provide a custom data loader.
[V] Input tensor: grid | Generating input data in range: [0.0, 1.0]
[I] onnxrt-runner-N0-11/20/25-10:34:53
---- Inference Input(s) ----
{x [dtype=float32, shape=(1, 1, 1, 256)],
grid [dtype=float32, shape=(1, 1, 9, 2)]}
[V] onnxrt-runner-N0-11/20/25-10:34:53 | Input metadata is: {x [dtype=float32, shape=('s38', 1, 1, 256)],
grid [dtype=float32, shape=('s38', 1, 9, 2)]}
[V] Loaded Module: torch | Version: 2.9.0+cu128
[I] onnxrt-runner-N0-11/20/25-10:34:53
---- Inference Output(s) ----
{y [dtype=float32, shape=(1, 1, 1, 9)]}
[I] onnxrt-runner-N0-11/20/25-10:34:53 | Completed 1 iteration(s) in 0.1719 ms | Average inference time: 0.1719 ms.
[I] trt-runner-N0-11/20/25-10:34:53 | Activating and starting inference
[V] Loaded Module: tensorrt | Version: 10.13.3.9
[V] [MemUsageChange] Init CUDA: CPU +31, GPU +0, now: CPU 163, GPU 3148 (MiB)
[V] [MemUsageChange] Init builder kernel library: CPU +1552, GPU +4, now: CPU 1917, GPU 3152 (MiB)
[V] ----------------------------------------------------------------
[V] Input filename: grid_sample_test.onnx
[V] ONNX IR version: 0.0.10
[V] Opset version: 20
[V] Producer name: pytorch
[V] Producer version: 2.9.0+cu128
[V] Domain:
[V] Model version: 0
[V] Doc string:
[V] ----------------------------------------------------------------
[V] Executing postprocessing step [ModifyNetworkOutputs]
[V] Marking 1 tensors as outputs
[V] Setting TensorRT Optimization Profiles
[W] Input tensor: x (dtype=DataType.FLOAT, shape=(-1, 1, 1, 256)) | No shapes provided; Will use shape: [1, 1, 1, 256] for min/opt/max in profile.
[W] This will cause the tensor to have a static shape. If this is incorrect, please set the range of shapes for this input tensor.
[W] Input tensor: grid (dtype=DataType.FLOAT, shape=(-1, 1, 9, 2)) | No shapes provided; Will use shape: [1, 1, 9, 2] for min/opt/max in profile.
[V] Input tensor: x (dtype=DataType.FLOAT, shape=(-1, 1, 1, 256)) | Setting input tensor shapes to: (min=[1, 1, 1, 256], opt=[1, 1, 1, 256], max=[1, 1, 1, 256])
[V] Input tensor: grid (dtype=DataType.FLOAT, shape=(-1, 1, 9, 2)) | Setting input tensor shapes to: (min=[1, 1, 9, 2], opt=[1, 1, 9, 2], max=[1, 1, 9, 2])
[I] Configuring with profiles:[
Profile 0:
{x [min=[1, 1, 1, 256], opt=[1, 1, 1, 256], max=[1, 1, 1, 256]],
grid [min=[1, 1, 9, 2], opt=[1, 1, 9, 2], max=[1, 1, 9, 2]]}
]
[W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.
[I] Building engine with configuration:
Flags | []
Engine Capability | EngineCapability.STANDARD
Memory Pools | [WORKSPACE: 32109.50 MiB, TACTIC_DRAM: 32109.50 MiB, TACTIC_SHARED_MEMORY: 1024.00 MiB]
Tactic Sources | [EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
Profiling Verbosity | ProfilingVerbosity.DETAILED
Preview Features | [PROFILE_SHARING_0806]
[V] Global timing cache in use. Profiling results in this builder pass will be stored.
[V] Compiler backend is used during engine build.
[V] Detected 2 inputs and 1 output network tensors.
[V] Total Host Persistent Memory: 80 bytes
[V] Total Device Persistent Memory: 0 bytes
[V] Max Scratch Memory: 0 bytes
[V] Total Activation Memory: 0 bytes
[V] Total Weights Memory: 0 bytes
[V] Compiler backend is used during engine execution.
[V] Engine generation completed in 0.213137 seconds.
[V] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 1 MiB
[I] Finished engine building in 0.236 seconds
[V] Loaded engine size: 0 MiB
[V] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[V] Found candidate CUDA libraries: ['/usr/local/cuda-12.9/lib64/libcudart.so.12.9.79', '/usr/local/cuda-12.9/lib64/libcudart.so', '/usr/local/cuda-12.9/lib64/libcudart.so.12']
[I] trt-runner-N0-11/20/25-10:34:53
---- Inference Input(s) ----
{x [dtype=float32, shape=(1, 1, 1, 256)],
grid [dtype=float32, shape=(1, 1, 9, 2)]}
[V] trt-runner-N0-11/20/25-10:34:53 | Input metadata is: {x [dtype=float32, shape=(1, 1, 1, 256)],
grid [dtype=float32, shape=(1, 1, 9, 2)]}
[I] trt-runner-N0-11/20/25-10:34:53
---- Inference Output(s) ----
{y [dtype=float32, shape=(1, 1, 1, 9)]}
[I] trt-runner-N0-11/20/25-10:34:53 | Completed 1 iteration(s) in 0.6733 ms | Average inference time: 0.6733 ms.
[V] Successfully ran: ['onnxrt-runner-N0-11/20/25-10:34:53', 'trt-runner-N0-11/20/25-10:34:53']
[I] Accuracy Comparison | onnxrt-runner-N0-11/20/25-10:34:53 vs. trt-runner-N0-11/20/25-10:34:53
[I] Comparing Output: 'y' (dtype=float32, shape=(1, 1, 1, 9)) with 'y' (dtype=float32, shape=(1, 1, 1, 9))
[I] Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I] onnxrt-runner-N0-11/20/25-10:34:53: y | Stats: mean=0.30805, std-dev=0.12337, var=0.015221, median=0.3287, min=0.1003 at (0, 0, 0, 5), max=0.46257 at (0, 0, 0, 8), avg-magnitude=0.30805, p90=0.42193, p95=0.44225, p99=0.4585
[I] ---- Values ----
[[[[0.39737493 0.10553478 0.3287046 0.2827478 0.4117708 0.10029647
0.4003233 0.28308854 0.46256799]]]]
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(0.1 , 0.157) | 2 | ##########################
(0.157, 0.214) | 0 |
(0.214, 0.271) | 0 |
(0.271, 0.328) | 2 | ##########################
(0.328, 0.385) | 1 | #############
(0.385, 0.442) | 3 | ########################################
(0.442, 0.499) | 1 | #############
(0.499, 0.555) | 0 |
(0.555, 0.612) | 0 |
(0.612, 0.669) | 0 |
[I] trt-runner-N0-11/20/25-10:34:53: y | Stats: mean=0.39464, std-dev=0.1704, var=0.029035, median=0.37008, min=0.12633 at (0, 0, 0, 5), max=0.66923 at (0, 0, 0, 6), avg-magnitude=0.39464, p90=0.57917, p95=0.6242, p99=0.66023
[I] ---- Values ----
[[[[0.5566532 0.15679139 0.34473667 0.34473667 0.43367636 0.12632953
0.6692329 0.3700842 0.5495479 ]]]]
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(0.1 , 0.157) | 2 | ##########################
(0.157, 0.214) | 0 |
(0.214, 0.271) | 0 |
(0.271, 0.328) | 0 |
(0.328, 0.385) | 3 | ########################################
(0.385, 0.442) | 1 | #############
(0.442, 0.499) | 0 |
(0.499, 0.555) | 1 | #############
(0.555, 0.612) | 1 | #############
(0.612, 0.669) | 1 | #############
[I] Error Metrics: y
[I] Minimum Required Tolerance: elemwise error | [abs=0.26891] OR [rel=0.40182] (requirements may be lower if both abs/rel tolerances are set)
[I] Absolute Difference | Stats: mean=0.086598, std-dev=0.076889, var=0.005912, median=0.061989, min=0.016032 at (0, 0, 0, 2), max=0.26891 at (0, 0, 0, 6), avg-magnitude=0.086598, p90=0.1812, p95=0.22506, p99=0.26014
[I] ---- Values ----
[[[[0.15927827 0.05125661 0.01603207 0.06198886 0.02190557 0.02603306
0.2689096 0.08699566 0.08697993]]]]
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(0.016 , 0.0413) | 3 | ########################################
(0.0413, 0.0666) | 2 | ##########################
(0.0666, 0.0919) | 2 | ##########################
(0.0919, 0.117 ) | 0 |
(0.117 , 0.142 ) | 0 |
(0.142 , 0.168 ) | 1 | #############
(0.168 , 0.193 ) | 0 |
(0.193 , 0.218 ) | 0 |
(0.218 , 0.244 ) | 0 |
(0.244 , 0.269 ) | 1 | #############
[I] Relative Difference | Stats: mean=0.21012, std-dev=0.11188, var=0.012517, median=0.20607, min=0.046505 at (0, 0, 0, 2), max=0.40182 at (0, 0, 0, 6), avg-magnitude=0.21012, p90=0.34189, p95=0.37185, p99=0.39583
[I] ---- Values ----
[[[[0.28613555 0.3269096 0.04650526 0.17981511 0.05051133 0.20607264
0.40181768 0.23506992 0.15827543]]]]
[I] ---- Histogram ----
Bin Range | Num Elems | Visualization
(0.0465, 0.082) | 2 | ########################################
(0.082 , 0.118) | 0 |
(0.118 , 0.153) | 0 |
(0.153 , 0.189) | 2 | ########################################
(0.189 , 0.224) | 1 | ####################
(0.224 , 0.26 ) | 1 | ####################
(0.26 , 0.295) | 1 | ####################
(0.295 , 0.331) | 1 | ####################
(0.331 , 0.366) | 0 |
(0.366 , 0.402) | 1 | ####################
[E] FAILED | Output: 'y' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E] FAILED | Mismatched outputs: ['y']
[E] Accuracy Summary | onnxrt-runner-N0-11/20/25-10:34:53 vs. trt-runner-N0-11/20/25-10:34:53 | Passed: 0/1 iterations | Pass Rate: 0.0%
[E] FAILED | Runtime: 2.230s | Command: polygraphy run grid_sample_test.onnx --onnxrt --trt --verbose --onnx-outputs mark all --trt-outputs mark all