'onnxruntime_providers_cuda.dll' crashes with access violation during TLS callback initialization (v1.22.1, CUDA 12.9, Windows) #25670

### Describe the issue

## Summary
The CUDA execution provider DLL crashes with an access violation (0xC0000005) when attempting to load it on Windows, either via `LoadLibrary()` or through the official ONNX Runtime API. The crash occurs during Thread Local Storage (TLS) callback initialization, specifically when the second TLS callback attempts to read from a null pointer (0x0000000000000000).

## Environment

| Field | Value |
|-------|-------|
| **ONNX Runtime Version** | 1.22.1 (built from source) |
| **Operating System** | Windows 10.0.19045 Build 19045 |
| **CUDA Version** | 12.9 |
| **cuDNN Version** | 9.11.0.98 for CUDA 12 |
| **Visual Studio Version** | Visual Studio 2022 v17.14.6 (June 2025) |
| **CMake Version** | 3.30.0 |
| **Build Configuration** | Release |

## Steps to Reproduce

1. Build ONNX Runtime from source using the following command:
```batch
build.bat --cmake_path "C:\Program Files\CMake\bin\cmake.exe" ^
    --config Release ^
    --use_cuda ^
    --cuda_version 12.9 ^
    --build_shared_lib ^
    --parallel ^
    --compile_no_warning_as_error ^
    --cudnn_home D:\ThirdPartyLib\cudann\cudnn-windows-x86_64-9.11.0.98_cuda12 ^
    --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.9" ^
    --skip_tests ^
    --cmake_generator "Visual Studio 17 2022"
```

2. Copy the generated DLLs to the application directory:
   - `onnxruntime.dll`
   - `onnxruntime_providers_shared.dll`
   - `onnxruntime_providers_cuda.dll`

3. Create a minimal test application:
```cpp
#include <windows.h>
#include <iostream>

int main() {
    // Load the CUDA EP DLL directly. The crash happens inside this call,
    // while the loader is running the DLL's TLS callbacks.
    // LoadLibraryW + L"" avoids the _T() macro, which needs <tchar.h>.
    HMODULE hModule = LoadLibraryW(L"onnxruntime_providers_cuda.dll");
    if (hModule == NULL) {
        DWORD error = GetLastError();
        std::cerr << "LoadLibrary failed with error: 0x" << std::hex << error << std::endl;
        WCHAR errorMsg[256] = {};
        FormatMessageW(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
                       NULL, error, 0, errorMsg, 256, NULL);
        std::wcout << L"Error message: " << errorMsg << std::endl;
    } else {
        std::cout << "DLL loaded successfully" << std::endl;
        FreeLibrary(hModule);
    }
    return 0;
}
```

4. Run the application. It crashes immediately during DLL initialization, before `LoadLibrary()` returns.

## Expected Behavior
The DLL should load successfully without crashing, allowing the CUDA execution provider to be used for inference acceleration.

## Actual Behavior
The application crashes with an unhandled exception during the DLL's TLS callback initialization:
```
Exception thrown at 0x00007FF982E66931 (onnxruntime_providers_cuda.dll): 0xC0000005: 
Access violation reading location 0x0000000000000000.
```

## Error Output / Stack Trace

### Debugger Output
```
Exception thrown at 0x00007FF982E66931 (onnxruntime_providers_cuda.dll) in workbench_gpu.exe: 
0xC0000005: Access violation reading location 0x0000000000000000.

Unhandled exception at 0x00007FF982E66931 (onnxruntime_providers_cuda.dll) in workbench_gpu.exe: 
0xC0000005: Access violation reading location 0x0000000000000000.
```

### Loader Trace (via gflags.exe)
```
LdrpInitializeNode - INFO: Calling init routine 00007FF982F7ED1C for DLL "onnxruntime_providers_cuda.dll"
LdrpCallTlsInitializers - INFO: Calling TLS callback 00007FF982F7DECC for DLL "onnxruntime_providers_cuda.dll" at 00007FF9825E0000
LdrpCallTlsInitializers - INFO: Calling TLS callback 00007FF982F7E600 for DLL "onnxruntime_providers_cuda.dll" at 00007FF9825E0000
Exception thrown at 0x00007FF982E66931 (onnxruntime_providers_cuda.dll): 0xC0000005: Access violation reading location 0x0000000000000000.
```

## Additional Information

### What I've Tried
- ✅ Verified all CUDA and cuDNN dependencies are present and accessible via PATH
- ✅ Tested with both Debug and Release builds - same crash
- ✅ Pre-initialized CUDA context before loading (`cudaSetDevice(0); cudaFree(0);`) - didn't help
- ✅ Loaded DLLs in dependency order (onnxruntime.dll → onnxruntime_providers_shared.dll → onnxruntime_providers_cuda.dll)
- ✅ Used official ONNX Runtime C API for initialization - crashes the same way
- ✅ Verified there is only a single CUDA installation on system
- ✅ Confirmed NVIDIA driver supports CUDA 12.9 via nvidia-smi

### Analysis
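The loader trace shows the DLL's init routine being called, followed by two TLS callbacks (`00007FF982F7DECC` and `00007FF982F7E600`). The access violation is raised right after the second callback is invoked. The faulting instruction (`00007FF982E66931`) is not the callback's entry point, so the null-pointer read presumably happens in a function reached from the second TLS callback rather than in the callback itself.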

## Impact
This issue completely prevents the use of CUDA 12.9 acceleration with ONNX Runtime on Windows when building from source.

## Dependencies
![Image](https://github.com/user-attachments/assets/740f49e7-20ee-41ee-b507-b4719ec88d87)

[snap_trace_tail.txt](https://github.com/user-attachments/files/21628805/snap_trace_tail.txt)



### Urgency

This issue prevents the use of CUDA 12.9 with ONNX Runtime on Windows when building from source. 

### Target platform

Windows

### Build script

Same as the `build.bat` command in step 1 of **Steps to Reproduce**.

### Error / output

Same as the Debugger Output and Loader Trace shown under **Error Output / Stack Trace**.

### Visual Studio Version

Visual Studio 2022 v17.14.6 (June 2025)

### GCC / Compiler Version

_No response_
