Closed
Description
Describe the bug
Hey! I am learning to use SYCL, but I ran into an issue when using sycl::atomic_ref::exchange. Things work fine on the CPU, but when I switched to the GPU, even a very simple test (see below) crashes with a CUDA error. Other atomic primitives such as store or load work fine.
To reproduce
- Include code snippet as short as possible
#include <sycl.hpp>
int main()
{
    sycl::queue queue(sycl::gpu_selector_v);
    std::cout << "Device: " << queue.get_device().get_info<sycl::info::device::name>() << std::endl;
    queue.submit([&](sycl::handler& diana)
    {
        sycl::stream out(1024, 256, diana);
        diana.parallel_for(1, [=](sycl::id<> id)
        {
            int memory = 3;
            sycl::atomic_ref<int,
                             sycl::memory_order::relaxed,
                             sycl::memory_scope::work_item> at(memory);
            int load = at.exchange(123);
            out << "id " << id << " load " << load << sycl::endl;
        });
    });
    queue.wait_and_throw();
}
- Specify the command which should be used to compile the program
icpx -fsycl -fsycl-targets=nvptx64-nvidia-cuda main.cpp
- Specify the command which should be used to launch the program
./a.out
- Indicate what is wrong and what was expected
This is my output. The program crashes, whereas I expected the exchange to return the previous value (3) and the kernel to print it.
Device: NVIDIA GeForce RTX 4090
<CUDA>[ERROR]:
UR CUDA ERROR:
Value: 719
Name: CUDA_ERROR_LAUNCH_FAILED
Description: unspecified launch failure
Function: urEnqueueMemBufferRead
Source Location: /tmp/tmp.nlKu2FwFq5/intel-llvm-mirror/build/_deps/unified-runtime-src/source/adapters/cuda/enqueue.cpp:1777
terminate called after throwing an instance of 'sycl::_V1::exception'
what(): Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Aborted (core dumped)
Environment
- OS: Fedora Linux 40 x86_64 6.11.6-200.fc40.x86_64
- Target device and vendor: NVIDIA GeForce RTX 4090
- DPC++ version:
icpx --version output:
Intel(R) oneAPI DPC++/C++ Compiler 2025.0.0 (2025.0.0.20241008)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2025.0/bin/compiler
Configuration file: /opt/intel/oneapi/compiler/2025.0/bin/compiler/../icpx.cfg
- Dependencies version: Header of nvidia-smi:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
And output of sycl-ls --verbose:
[opencl:cpu][opencl:0] Intel(R) OpenCL, AMD Ryzen 9 3900X 12-Core Processor OpenCL 3.0 (Build 0) [2024.18.10.0.08_160000]
[cuda:gpu][cuda:0] NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 4090 8.9 [CUDA 12.6]
Platforms: 2
Platform [#1]:
Version : OpenCL 3.0 LINUX
Name : Intel(R) OpenCL
Vendor : Intel(R) Corporation
Devices : 1
Device [#0]:
Type : cpu
Version : OpenCL 3.0 (Build 0)
Name : AMD Ryzen 9 3900X 12-Core Processor
Vendor : Intel(R) Corporation
Driver : 2024.18.10.0.08_160000
Num SubDevices : 0
Num SubSubDevices : 0
Aspects : cpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations usm_system_allocations usm_atomic_host_allocations usm_atomic_shared_allocations atomic64 ext_oneapi_srgb ext_oneapi_native_assert ext_intel_legacy_image ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_tangle_group ext_oneapi_private_alloca
info::device::sub_group_sizes: 4 8 16 32 64
Architecture: x86_64
Platform [#2]:
Version : CUDA 12.6
Name : NVIDIA CUDA BACKEND
Vendor : NVIDIA Corporation
Devices : 1
Device [#0]:
Type : gpu
Version : 8.9
Name : NVIDIA GeForce RTX 4090
Vendor : NVIDIA Corporation
Driver : CUDA 12.6
UUID : 1367131105491041301142711512019110415220878
Num SubDevices : 0
Num SubSubDevices : 0
Aspects : gpu fp16 fp64 online_compiler online_linker queue_profiling usm_device_allocations usm_host_allocations usm_shared_allocations ext_intel_pci_address usm_atomic_shared_allocations atomic64 ext_intel_device_info_uuid ext_oneapi_cuda_async_barrier ext_intel_free_memory ext_intel_device_id ext_intel_memory_clock_rate ext_intel_memory_bus_width ext_oneapi_bindless_images ext_oneapi_bindless_images_shared_usm ext_oneapi_bindless_images_1d_usm ext_oneapi_bindless_images_2d_usm ext_oneapi_external_memory_import ext_oneapi_external_semaphore_import ext_oneapi_mipmap ext_oneapi_mipmap_anisotropy ext_oneapi_mipmap_level_reference ext_oneapi_ballot_group ext_oneapi_fixed_size_group ext_oneapi_opportunistic_group ext_oneapi_graph ext_oneapi_limited_graph ext_oneapi_cubemap ext_oneapi_cubemap_seamless_filtering ext_oneapi_bindless_sampled_image_fetch_1d_usm ext_oneapi_bindless_sampled_image_fetch_2d_usm ext_oneapi_bindless_sampled_image_fetch_2d ext_oneapi_bindless_sampled_image_fetch_3d ext_oneapi_queue_profiling_tag ext_oneapi_virtual_mem ext_oneapi_image_array ext_oneapi_unique_addressing_per_dim ext_oneapi_bindless_images_sample_1d_usm ext_oneapi_bindless_images_sample_2d_usm
(Warning interleaved in the output:) Images are not fully supported by the CUDA BE, their support is disabled by default. Their partial support can be activated by setting SYCL_PI_CUDA_ENABLE_IMAGE_SUPPORT environment variable at runtime.
info::device::sub_group_sizes: 32
Architecture: nvidia_gpu_sm_89
default_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 4090 8.9 [CUDA 12.6]
accelerator_selector() : No device of requested type available. Please chec...
cpu_selector() : cpu, Intel(R) OpenCL, AMD Ryzen 9 3900X 12-Core Processor OpenCL 3.0 (Build 0) [2024.18.10.0.08_160000]
gpu_selector() : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 4090 8.9 [CUDA 12.6]
custom_selector(gpu) : gpu, NVIDIA CUDA BACKEND, NVIDIA GeForce RTX 4090 8.9 [CUDA 12.6]
custom_selector(cpu) : cpu, Intel(R) OpenCL, AMD Ryzen 9 3900X 12-Core Processor OpenCL 3.0 (Build 0) [2024.18.10.0.08_160000]
custom_selector(acc) : No device of requested type available. Please chec...
Additional context
No response