Fix incorrect Conv/ConvTranspose output with kernel_shape=2 in CUDA Provider #26219
Summary
Fixes a bug where Conv and ConvTranspose operations with kernel_shape=2 (or any kernel size) produce incorrect output when using CUDAExecutionProvider with cuDNN 9+. This regression was introduced between ONNX Runtime versions 1.16.0 and 1.22.0.
Problem
Users reported that segmentation models using Conv layers with kernel_shape=2 produced completely wrong output patterns when:
The same models worked correctly with:
kernel_shape=3
Root Cause
The CreateCudnnFeExecutionPlan function (used by the cuDNN 9+ frontend API) receives tensor dimensions that have already been transformed from NHWC to NCHW order. That transformation happens in the UpdateState function (lines 377-387 in conv.cc) before the dimensions are passed in. However, when creating CudnnFeTensor objects, the code incorrectly passed Layout == LAYOUT_NHWC (or w_in_nhwc) as the nhwc parameter, telling the stride generator that the already-transformed NCHW dimensions were in NHWC format.
This caused the generateStrides function to calculate incorrect memory strides. The incorrect strides in turn caused cuDNN to read and write data at the wrong memory locations, producing garbled output.
Changes
1. Fixed CudnnFeTensor calls in conv.cc (5 instances)
Changed all tensor creation calls in CreateCudnnFeExecutionPlan to pass false (NCHW) instead of Layout == LAYOUT_NHWC.
2. Fixed CudnnFeTensor calls in conv_transpose.cc (3 instances)
Applied the same fix to the ConvTranspose operator for consistency.
3. Added safety checks in generateStrides (cudnn_common.cc)
Fixed a potential out-of-bounds array access when nbDims < 4 with channels_last=true by adding proper bounds checking.
4. Added test case (conv_test.cc)
Added a ConvNhwcKernel2x2 test specifically for kernel_shape={2, 2} with padding to prevent regression.
Impact
Testing
Fixes #[issue_number]