GridSample outputs incorrect results in ONNX opset 20 (opset 18 works correctly) #4646

@Alcoholrithm

Description

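When a PyTorch model that uses torch.nn.functional.grid_sample is exported to ONNX and run through TensorRT, the result depends on the export opset:

Opset 18 → TensorRT output matches ONNX Runtime exactly
Opset 20 → the GridSample node produces large numerical errors in TensorRT (in the log below: max absolute error 0.269, max relative error 0.402)

The TensorRT engine build reports no errors and no GridSample-related warnings.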
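To judge which backend is numerically right without trusting either runtime, a dependency-free reference for the exact configuration used here (mode="bilinear", padding_mode="zeros", align_corners=False) can be sketched in plain Python. It is a minimal sketch that assumes the unnormalization convention documented for PyTorch's grid_sample and the ONNX GridSample operator; the function names are hypothetical, and it handles one (N, C) slice at a time.

```python
import math


def unnormalize(coord, size, align_corners):
    """Map a normalized coordinate in [-1, 1] to a pixel-space position."""
    if align_corners:
        # -1 and 1 refer to the centers of the first and last pixels.
        return (coord + 1) / 2 * (size - 1)
    # -1 and 1 refer to the outer edges of the first and last pixels.
    return ((coord + 1) * size - 1) / 2


def grid_sample_bilinear_zeros(image, points, align_corners=False):
    """Hypothetical reference: bilinear GridSample with zeros padding.

    image:  H_in rows of W_in floats (a single (N, C) slice)
    points: (x, y) pairs in normalized coordinates; x indexes width, y height
    """
    h, w = len(image), len(image[0])

    def pixel(r, c):
        # Taps that fall outside the image contribute 0 ("zeros" padding).
        return image[r][c] if 0 <= r < h and 0 <= c < w else 0.0

    out = []
    for gx, gy in points:
        x = unnormalize(gx, w, align_corners)
        y = unnormalize(gy, h, align_corners)
        x0, y0 = math.floor(x), math.floor(y)
        dx, dy = x - x0, y - y0
        # Weighted sum of the four neighbouring pixels.
        out.append(
            pixel(y0, x0) * (1 - dx) * (1 - dy)
            + pixel(y0, x0 + 1) * dx * (1 - dy)
            + pixel(y0 + 1, x0) * (1 - dx) * dy
            + pixel(y0 + 1, x0 + 1) * dx * dy
        )
    return out
```

Because the reproduction uses H_in = 1, the second image row is always out of bounds, so with align_corners=False each output is the linear interpolation along the width scaled by 1 - |g_y|/2 (for |g_y| ≤ 2). For example, on the row [1, 2, 3, 4], the point (0, 0) gives 2.5 and the point (0, 1) gives 1.25.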

Environment

TensorRT Version: 10.13.3

NVIDIA GPU: RTX 5090

NVIDIA Driver Version: 575.64.03

CUDA Version: 12.9

CUDNN Version: 9.1.0

Operating System: Ubuntu 24.04

Python Version (if applicable): 3.10.19

PyTorch Version (if applicable): 2.9.0+cu128

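ONNX IR version: 10 (shown as 0.0.10 in the TensorRT log)

ONNX Version: 1.19.1

ONNX Runtime Version: 1.23.2

Polygraphy Version: 0.49.26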

Minimal Reproduction Code

import torch
from torch.nn import functional as F


class GridSampleTest(torch.nn.Module):
    """Minimal module that only calls F.grid_sample."""

    def __init__(self, mode="bilinear", padding_mode="zeros", align_corners=False):
        super().__init__()
        self.mode = mode
        self.padding_mode = padding_mode
        self.align_corners = align_corners

    def forward(self, x, grid):
        # x: (N, C, H_in, W_in), grid: (N, H_out, W_out, 2) -> y: (N, C, H_out, W_out)
        return F.grid_sample(
            x, grid,
            mode=self.mode,
            padding_mode=self.padding_mode,
            align_corners=self.align_corners
        )


model = GridSampleTest().eval().cuda()

# N=45056 single-channel inputs of height 1 and width 256 (H_in=1, W_in=256)
x = torch.randn(45056, 1, 1, 256, device='cuda')

# 1x9 sampling locations per input; randn places some of them outside [-1, 1]
g = torch.randn(45056, 1, 9, 2, device='cuda')

torch.onnx.export(
    model, (x, g), "grid_sample_test.onnx",
    input_names=["x", "grid"], output_names=["y"],
    opset_version=20,  # with opset_version=18, TensorRT matches ONNX Runtime
    dynamic_axes={"x": {0: "B"}, "grid": {0: "B"}, "y": {0: "B"}}
)
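One difference between the two exports worth checking: ONNX revised GridSample in opset 20 (GridSample-20 vs. GridSample-16), extending it to N-D inputs and renaming the `mode` values from `bilinear`/`bicubic` to `linear`/`cubic`. The hypothetical helpers below print the default-domain opset and the GridSample attributes of an exported model so the opset-18 and opset-20 files can be compared side by side. They only read raw protobuf fields of an `onnx.ModelProto`, do not visit subgraphs or local functions, and do not by themselves establish the root cause.

```python
def default_opset(model):
    """Return the version of the default ("" / "ai.onnx") operator set, if any."""
    for entry in model.opset_import:
        if entry.domain in ("", "ai.onnx"):
            return entry.version
    return None


def grid_sample_attributes(model):
    """List each top-level GridSample node's serialized attributes as plain values.

    Attributes left at their spec defaults may not be serialized, in which
    case they are simply absent from the returned dict.
    """
    summaries = []
    for node in model.graph.node:
        if node.op_type != "GridSample":
            continue
        attrs = {}
        for attr in node.attribute:
            if attr.name == "align_corners":
                attrs[attr.name] = attr.i  # INT attribute
            else:
                attrs[attr.name] = attr.s.decode()  # STRING: mode, padding_mode
        summaries.append({"name": node.name, **attrs})
    return summaries
```

Usage, once per exported file: `m = onnx.load("grid_sample_test.onnx")`, then `print(default_opset(m), grid_sample_attributes(m))`.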

Commands or scripts:
polygraphy run grid_sample_test.onnx --onnxrt --trt --verbose --onnx-outputs mark all --trt-outputs mark all
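Polygraphy's default generator filled `grid` with values in [0.0, 1.0] (see the log), so only one quadrant of the sampling range was exercised. Polygraphy's `--data-loader-script` option accepts a Python file defining a `load_data` function that yields feed dicts; the sketch below (file name `data_loader.py` is hypothetical) draws the grid from the full [-1, 1] range and runs several iterations.

```python
# data_loader.py (hypothetical name), used as:
#   polygraphy run grid_sample_test.onnx --onnxrt --trt --data-loader-script data_loader.py
import numpy as np


def load_data(num_iterations=3, seed=0):
    """Yield Polygraphy feed dicts with grid samples spanning [-1, 1)."""
    rng = np.random.default_rng(seed)
    for _ in range(num_iterations):
        yield {
            # Batch size 1 matches the static TensorRT profile Polygraphy
            # built in the log, since no shapes were provided.
            "x": rng.standard_normal((1, 1, 1, 256)).astype(np.float32),
            "grid": rng.uniform(-1.0, 1.0, size=(1, 1, 9, 2)).astype(np.float32),
        }
```

Larger batches would additionally need a TensorRT optimization profile, for example via Polygraphy's `--trt-min-shapes`, `--trt-opt-shapes` and `--trt-max-shapes` options.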

Have you tried the latest release?: No

Polygraphy Log (opset 20 model)

[V] Loaded Module: polygraphy | Version: 0.49.26 
[V] Loaded extension modules: []
[I] TF32 is disabled by default. Turn on TF32 for better performance with minor accuracy differences.
[I] onnxrt-runner-N0-11/20/25-10:34:53  | Activating and starting inference
[I] Loading model: grid_sample_test.onnx
[V] Loaded Module: onnx | Version: 1.19.1 
[V] Marking all ONNX tensors as outputs
[V] Loaded Module: onnxruntime | Version: 1.23.2
[I] Creating ONNX-Runtime Inference Session with providers: ['CPUExecutionProvider']
[V] Loading inputs from data loader
[V] Generating data using numpy seed: 1
[V] Loaded Module: numpy | Version: 2.2.6 
[W] Input tensor: x [shape=BoundedShape(['s38', 1, 1, 256], min=None, max=None)] | Will generate data of shape: [1, 1, 1, 256].
    If this is incorrect, please provide a custom data loader.
[V] Input tensor: x | Generating input data in range: [0.0, 1.0]
[W] Input tensor: grid [shape=BoundedShape(['s38', 1, 9, 2], min=None, max=None)] | Will generate data of shape: [1, 1, 9, 2].
    If this is incorrect, please provide a custom data loader.
[V] Input tensor: grid | Generating input data in range: [0.0, 1.0]
[I] onnxrt-runner-N0-11/20/25-10:34:53
    ---- Inference Input(s) ----
    {x [dtype=float32, shape=(1, 1, 1, 256)],
     grid [dtype=float32, shape=(1, 1, 9, 2)]}
[V] onnxrt-runner-N0-11/20/25-10:34:53  | Input metadata is: {x [dtype=float32, shape=('s38', 1, 1, 256)],
     grid [dtype=float32, shape=('s38', 1, 9, 2)]}
[V] Loaded Module: torch | Version: 2.9.0+cu128 
[I] onnxrt-runner-N0-11/20/25-10:34:53
    ---- Inference Output(s) ----
    {y [dtype=float32, shape=(1, 1, 1, 9)]}
[I] onnxrt-runner-N0-11/20/25-10:34:53  | Completed 1 iteration(s) in 0.1719 ms | Average inference time: 0.1719 ms.
[I] trt-runner-N0-11/20/25-10:34:53     | Activating and starting inference
[V] Loaded Module: tensorrt | Version: 10.13.3.9
[V] [MemUsageChange] Init CUDA: CPU +31, GPU +0, now: CPU 163, GPU 3148 (MiB)
[V] [MemUsageChange] Init builder kernel library: CPU +1552, GPU +4, now: CPU 1917, GPU 3152 (MiB)
[V] ----------------------------------------------------------------
[V] Input filename:   grid_sample_test.onnx
[V] ONNX IR version:  0.0.10
[V] Opset version:    20
[V] Producer name:    pytorch
[V] Producer version: 2.9.0+cu128
[V] Domain:
[V] Model version:    0
[V] Doc string:
[V] ----------------------------------------------------------------
[V] Executing postprocessing step [ModifyNetworkOutputs]
[V] Marking 1 tensors as outputs
[V] Setting TensorRT Optimization Profiles
[W] Input tensor: x (dtype=DataType.FLOAT, shape=(-1, 1, 1, 256)) | No shapes provided; Will use shape: [1, 1, 1, 256] for min/opt/max in profile.
[W] This will cause the tensor to have a static shape. If this is incorrect, please set the range of shapes for this input tensor.
[W] Input tensor: grid (dtype=DataType.FLOAT, shape=(-1, 1, 9, 2)) | No shapes provided; Will use shape: [1, 1, 9, 2] for min/opt/max in profile.
[V] Input tensor: x (dtype=DataType.FLOAT, shape=(-1, 1, 1, 256)) | Setting input tensor shapes to: (min=[1, 1, 1, 256], opt=[1, 1, 1, 256], max=[1, 1, 1, 256])
[V] Input tensor: grid (dtype=DataType.FLOAT, shape=(-1, 1, 9, 2)) | Setting input tensor shapes to: (min=[1, 1, 9, 2], opt=[1, 1, 9, 2], max=[1, 1, 9, 2])
[I] Configuring with profiles:[
        Profile 0:
            {x [min=[1, 1, 1, 256], opt=[1, 1, 1, 256], max=[1, 1, 1, 256]],
             grid [min=[1, 1, 9, 2], opt=[1, 1, 9, 2], max=[1, 1, 9, 2]]}
    ]
[W] profileSharing0806 is on by default in TensorRT 10.0. This flag is deprecated and has no effect.
[I] Building engine with configuration:
    Flags                  | []
    Engine Capability      | EngineCapability.STANDARD
    Memory Pools           | [WORKSPACE: 32109.50 MiB, TACTIC_DRAM: 32109.50 MiB, TACTIC_SHARED_MEMORY: 1024.00 MiB]
    Tactic Sources         | [EDGE_MASK_CONVOLUTIONS, JIT_CONVOLUTIONS]
    Profiling Verbosity    | ProfilingVerbosity.DETAILED
    Preview Features       | [PROFILE_SHARING_0806]
[V] Global timing cache in use. Profiling results in this builder pass will be stored.
[V] Compiler backend is used during engine build.
[V] Detected 2 inputs and 1 output network tensors.
[V] Total Host Persistent Memory: 80 bytes
[V] Total Device Persistent Memory: 0 bytes
[V] Max Scratch Memory: 0 bytes
[V] Total Activation Memory: 0 bytes
[V] Total Weights Memory: 0 bytes
[V] Compiler backend is used during engine execution.
[V] Engine generation completed in 0.213137 seconds.
[V] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 0 MiB, GPU 1 MiB
[I] Finished engine building in 0.236 seconds
[V] Loaded engine size: 0 MiB
[V] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)
[V] Found candidate CUDA libraries: ['/usr/local/cuda-12.9/lib64/libcudart.so.12.9.79', '/usr/local/cuda-12.9/lib64/libcudart.so', '/usr/local/cuda-12.9/lib64/libcudart.so.12']
[I] trt-runner-N0-11/20/25-10:34:53
    ---- Inference Input(s) ----
    {x [dtype=float32, shape=(1, 1, 1, 256)],
     grid [dtype=float32, shape=(1, 1, 9, 2)]}
[V] trt-runner-N0-11/20/25-10:34:53     | Input metadata is: {x [dtype=float32, shape=(1, 1, 1, 256)],
     grid [dtype=float32, shape=(1, 1, 9, 2)]}
[I] trt-runner-N0-11/20/25-10:34:53
    ---- Inference Output(s) ----
    {y [dtype=float32, shape=(1, 1, 1, 9)]}
[I] trt-runner-N0-11/20/25-10:34:53     | Completed 1 iteration(s) in 0.6733 ms | Average inference time: 0.6733 ms.
[V] Successfully ran: ['onnxrt-runner-N0-11/20/25-10:34:53', 'trt-runner-N0-11/20/25-10:34:53']
[I] Accuracy Comparison | onnxrt-runner-N0-11/20/25-10:34:53 vs. trt-runner-N0-11/20/25-10:34:53
[I]     Comparing Output: 'y' (dtype=float32, shape=(1, 1, 1, 9)) with 'y' (dtype=float32, shape=(1, 1, 1, 9))
[I]         Tolerance: [abs=1e-05, rel=1e-05] | Checking elemwise error
[I]         onnxrt-runner-N0-11/20/25-10:34:53: y | Stats: mean=0.30805, std-dev=0.12337, var=0.015221, median=0.3287, min=0.1003 at (0, 0, 0, 5), max=0.46257 at (0, 0, 0, 8), avg-magnitude=0.30805, p90=0.42193, p95=0.44225, p99=0.4585
[I]             ---- Values ----
                    [[[[0.39737493 0.10553478 0.3287046  0.2827478  0.4117708  0.10029647
                        0.4003233  0.28308854 0.46256799]]]]
[I]             ---- Histogram ----
                Bin Range      |  Num Elems | Visualization
                (0.1  , 0.157) |          2 | ##########################
                (0.157, 0.214) |          0 |
                (0.214, 0.271) |          0 |
                (0.271, 0.328) |          2 | ##########################
                (0.328, 0.385) |          1 | #############
                (0.385, 0.442) |          3 | ########################################
                (0.442, 0.499) |          1 | #############
                (0.499, 0.555) |          0 |
                (0.555, 0.612) |          0 |
                (0.612, 0.669) |          0 |
[I]         trt-runner-N0-11/20/25-10:34:53: y | Stats: mean=0.39464, std-dev=0.1704, var=0.029035, median=0.37008, min=0.12633 at (0, 0, 0, 5), max=0.66923 at (0, 0, 0, 6), avg-magnitude=0.39464, p90=0.57917, p95=0.6242, p99=0.66023
[I]             ---- Values ----
                    [[[[0.5566532  0.15679139 0.34473667 0.34473667 0.43367636 0.12632953
                        0.6692329  0.3700842  0.5495479 ]]]]
[I]             ---- Histogram ----
                Bin Range      |  Num Elems | Visualization
                (0.1  , 0.157) |          2 | ##########################
                (0.157, 0.214) |          0 |
                (0.214, 0.271) |          0 |
                (0.271, 0.328) |          0 |
                (0.328, 0.385) |          3 | ########################################
                (0.385, 0.442) |          1 | #############
                (0.442, 0.499) |          0 |
                (0.499, 0.555) |          1 | #############
                (0.555, 0.612) |          1 | #############
                (0.612, 0.669) |          1 | #############
[I]         Error Metrics: y
[I]             Minimum Required Tolerance: elemwise error | [abs=0.26891] OR [rel=0.40182] (requirements may be lower if both abs/rel tolerances are set)
[I]             Absolute Difference | Stats: mean=0.086598, std-dev=0.076889, var=0.005912, median=0.061989, min=0.016032 at (0, 0, 0, 2), max=0.26891 at (0, 0, 0, 6), avg-magnitude=0.086598, p90=0.1812, p95=0.22506, p99=0.26014
[I]                 ---- Values ----
                        [[[[0.15927827 0.05125661 0.01603207 0.06198886 0.02190557 0.02603306
                            0.2689096  0.08699566 0.08697993]]]]
[I]                 ---- Histogram ----
                    Bin Range        |  Num Elems | Visualization
                    (0.016 , 0.0413) |          3 | ########################################
                    (0.0413, 0.0666) |          2 | ##########################
                    (0.0666, 0.0919) |          2 | ##########################
                    (0.0919, 0.117 ) |          0 |
                    (0.117 , 0.142 ) |          0 |
                    (0.142 , 0.168 ) |          1 | #############
                    (0.168 , 0.193 ) |          0 |
                    (0.193 , 0.218 ) |          0 |
                    (0.218 , 0.244 ) |          0 |
                    (0.244 , 0.269 ) |          1 | #############
[I]             Relative Difference | Stats: mean=0.21012, std-dev=0.11188, var=0.012517, median=0.20607, min=0.046505 at (0, 0, 0, 2), max=0.40182 at (0, 0, 0, 6), avg-magnitude=0.21012, p90=0.34189, p95=0.37185, p99=0.39583
[I]                 ---- Values ----
                        [[[[0.28613555 0.3269096  0.04650526 0.17981511 0.05051133 0.20607264
                            0.40181768 0.23506992 0.15827543]]]]
[I]                 ---- Histogram ----
                    Bin Range       |  Num Elems | Visualization
                    (0.0465, 0.082) |          2 | ########################################
                    (0.082 , 0.118) |          0 |
                    (0.118 , 0.153) |          0 |
                    (0.153 , 0.189) |          2 | ########################################
                    (0.189 , 0.224) |          1 | ####################
                    (0.224 , 0.26 ) |          1 | ####################
                    (0.26  , 0.295) |          1 | ####################
                    (0.295 , 0.331) |          1 | ####################
                    (0.331 , 0.366) |          0 |
                    (0.366 , 0.402) |          1 | ####################
[E]         FAILED | Output: 'y' | Difference exceeds tolerance (rel=1e-05, abs=1e-05)
[E]     FAILED | Mismatched outputs: ['y']
[E] Accuracy Summary | onnxrt-runner-N0-11/20/25-10:34:53 vs. trt-runner-N0-11/20/25-10:34:53 | Passed: 0/1 iterations | Pass Rate: 0.0%
[E] FAILED | Runtime: 2.230s | Command: polygraphy run grid_sample_test.onnx --onnxrt --trt --verbose --onnx-outputs mark all --trt-outputs mark all
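The error metrics in the log can be re-derived from the printed output values. The plain-Python sketch below recomputes them; the logged relative differences are consistent with dividing the absolute difference by the magnitude of the TensorRT output (the second runner), which the code assumes.

```python
# Output "y" values copied from the Polygraphy log (opset 20 model).
ORT = [0.39737493, 0.10553478, 0.3287046, 0.2827478, 0.4117708,
       0.10029647, 0.4003233, 0.28308854, 0.46256799]
TRT = [0.5566532, 0.15679139, 0.34473667, 0.34473667, 0.43367636,
       0.12632953, 0.6692329, 0.3700842, 0.5495479]


def error_metrics(reference, candidate):
    """Return (max abs diff, max rel diff, mean abs diff), elementwise."""
    abs_diff = [abs(r - c) for r, c in zip(reference, candidate)]
    # Relative difference taken w.r.t. the candidate (TensorRT) magnitude.
    rel_diff = [d / abs(c) for d, c in zip(abs_diff, candidate)]
    return max(abs_diff), max(rel_diff), sum(abs_diff) / len(abs_diff)


max_abs, max_rel, mean_abs = error_metrics(ORT, TRT)
print(f"max abs={max_abs:.5f}  max rel={max_rel:.5f}  mean abs={mean_abs:.6f}")
```

Both maxima occur at index 6, matching the `(0, 0, 0, 6)` location reported in the log.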
