Conversation

tianleiwu
Contributor

@tianleiwu tianleiwu commented Oct 3, 2025

Users with RTX 5090 GPUs are experiencing runtime errors when using onnxruntime-gpu:

[ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Slice node. 
Name:'Slice_34' Status Message: CUDA error cudaErrorNoKernelImageForDevice:
no kernel image is available for execution on the device

This occurs because the RTX 5090 uses CUDA compute capability 12.0 (SM 12.0), while onnxruntime-gpu 1.23 was built with 90a-virtual. The 90a architecture is a specialized, non-forward-compatible variant of the Hopper architecture, so code built for it cannot run on later GPU generations such as Blackwell.

This change reverts 90a-virtual back to 90-virtual, as used in 1.22. This should restore compatibility with Blackwell GPUs.
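To illustrate why the suffix matters, here is a minimal sketch of the compatibility rules as I understand them: a `-real` entry (SASS) runs on devices with the same major version and an equal or higher minor version, a `-virtual` entry (PTX) can be JIT-compiled for any device with an equal or higher compute capability, and an architecture-specific `a` entry only runs on that exact compute capability. The entry syntax follows CMake's `CUDA_ARCHITECTURES` convention. The helper names `parse_arch` and `can_run` are hypothetical and are not part of ONNX Runtime or CUDA.

```python
import re


def parse_arch(entry):
    """Parse a CMake-style entry such as '90a-virtual' or '80-real'.

    Returns ((major, minor), is_arch_specific, kind), where kind is
    'real', 'virtual', or None (no suffix: both are generated).
    """
    m = re.fullmatch(r"(\d+)(a?)(?:-(real|virtual))?", entry)
    if m is None:
        raise ValueError(f"unrecognized architecture entry: {entry!r}")
    digits, suffix, kind = m.groups()
    # The last digit is the minor version: '90' -> (9, 0), '120' -> (12, 0).
    cc = (int(digits[:-1]), int(digits[-1]))
    return cc, suffix == "a", kind


def can_run(entry, device_cc):
    """Return True if code built for `entry` can execute on `device_cc`."""
    cc, arch_specific, kind = parse_arch(entry)
    if arch_specific:
        # 'a' targets are not forward compatible: exact match only.
        return device_cc == cc
    has_real = kind in (None, "real")
    has_virtual = kind in (None, "virtual")
    # SASS is binary compatible within the same major version only.
    if has_real and device_cc[0] == cc[0] and device_cc[1] >= cc[1]:
        return True
    # PTX can be JIT-compiled for any equal or newer compute capability.
    return has_virtual and device_cc >= cc


if __name__ == "__main__":
    rtx_5090 = (12, 0)
    for entry in ("90a-virtual", "90-virtual"):
        print(entry, "->", can_run(entry, rtx_5090))
```

Under these assumptions, `90a-virtual` is rejected on SM 12.0 while `90-virtual` is accepted, which matches the failure in 1.23 and the behavior of 1.22.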

FPA_INTB_GEMM is now disabled by default. It needs some extra work to support builds that use 90-virtual without 90a-real.

Related:
#26002
#26226
#26181

@hariharans29
Member

Just curious: what is the binary size impact of this?

@tianleiwu tianleiwu marked this pull request as draft October 3, 2025 19:52
@tianleiwu
Contributor Author

> Just curious - What is the binary size impact for this ?

The binary will be smaller (for example, the Linux wheel is 4 MB smaller).

@tianleiwu tianleiwu marked this pull request as ready for review October 5, 2025 20:18
@tianleiwu tianleiwu closed this Oct 7, 2025
@tianleiwu tianleiwu reopened this Oct 7, 2025
@tianleiwu tianleiwu merged commit 11b23ad into main Oct 7, 2025
154 of 170 checks passed
@tianleiwu tianleiwu deleted the tlwu/90_virtual branch October 7, 2025 22:17
@JulienMaille
Contributor

Does this affect both the cuda backend and the dml backend or only the former?

@snnn
Member

snnn commented Oct 9, 2025

Sorry it did not get into the 1.23.1 patch release.

apsonawane pushed a commit that referenced this pull request Oct 17, 2025