Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
65 commits
Select commit Hold shift + click to select a range
464174d
Implement backend content extension
lbushi25 Jan 14, 2025
217270c
Merge branch 'device_image_backend_content' of https://github.com/lbu…
lbushi25 Jan 15, 2025
776a3ed
Add tests for extension
lbushi25 Jan 15, 2025
3080741
Merge branch 'intel:sycl' into device_image_backend_content
lbushi25 Jan 15, 2025
12bdf3f
Change extension spec status section
lbushi25 Jan 15, 2025
a4f6372
Merge branch 'device_image_backend_content' of https://github.com/lbu…
lbushi25 Jan 15, 2025
c15a31a
Add more tests and fix ABI related failures
lbushi25 Jan 16, 2025
0ba58c7
Add checks for span feature
lbushi25 Jan 16, 2025
50018cb
Refactoring
lbushi25 Jan 17, 2025
b20f042
More refactoring
lbushi25 Jan 17, 2025
2c7b110
Formatting
lbushi25 Jan 17, 2025
b884eec
Merge branch 'intel:sycl' into device_image_backend_content
lbushi25 Jan 17, 2025
36f4095
Add windows symbols
lbushi25 Jan 17, 2025
5d4a6bf
Add check for std::span feature
lbushi25 Jan 17, 2025
273f6df
Add comments explaining limitations of free function kernel usage
lbushi25 Jan 17, 2025
192cf75
Add asserts for extra safety
lbushi25 Jan 17, 2025
5d7d500
Define feature macro for device image backend content
lbushi25 Jan 17, 2025
5050a3c
Improve testing logic
lbushi25 Jan 17, 2025
ab765f9
Make comments more helpful
lbushi25 Jan 18, 2025
e8be859
Enhance tests
lbushi25 Jan 20, 2025
fa68a24
Resolve merge
lbushi25 Jan 20, 2025
74c0772
Remove rogue binary file
lbushi25 Jan 20, 2025
78e7ae6
Improve L0_interop_test.cpp to avoid false positives
lbushi25 Jan 20, 2025
b2594e8
Use __has_include to guard the inclusion of <span>
lbushi25 Jan 21, 2025
5b88761
Update basic_test.cpp
lbushi25 Jan 21, 2025
ccdd15c
Apply feedback
lbushi25 Jan 22, 2025
e4914b8
Merge branch 'sycl' into device_image_backend_content
lbushi25 Jan 22, 2025
21f4a9b
Update sycl_symbols_linux.dump
lbushi25 Jan 22, 2025
91853e0
Update sycl_symbols_linux.dump
lbushi25 Jan 22, 2025
70c168a
Add windows symbols
lbushi25 Jan 23, 2025
a43e3b4
Merge branch 'device_image_backend_content' of https://github.com/lbu…
lbushi25 Jan 23, 2025
cc5189e
Apply suggestions
lbushi25 Jan 23, 2025
7b8458c
Fix symbols
lbushi25 Jan 23, 2025
ab163f7
Add c++20 flag to compilation of tests
lbushi25 Jan 23, 2025
247932e
Update negative_test.cpp
lbushi25 Jan 23, 2025
688521d
Update sycl_symbols_linux.dump
lbushi25 Jan 23, 2025
c1a5ea7
Update sycl_symbols_linux.dump
lbushi25 Jan 23, 2025
bb5b787
Update basic_test.cpp
lbushi25 Jan 23, 2025
a3f837d
Update basic_test.cpp
lbushi25 Jan 23, 2025
d6a2ca5
Modify spec and add windows symbols
lbushi25 Jan 23, 2025
d5c214c
Add windows symbols
lbushi25 Jan 23, 2025
587941f
Resolve merge conflict in spec
lbushi25 Jan 23, 2025
2e38117
Update basic_test.cpp
lbushi25 Jan 23, 2025
4ee145e
Update basic_test.cpp
lbushi25 Jan 23, 2025
3308fac
Simplify basic_test.cpp
lbushi25 Jan 23, 2025
bc695cf
Update sycl_ext_oneapi_device_image_backend_content.asciidoc
lbushi25 Jan 23, 2025
3b2691f
Make span const
lbushi25 Jan 23, 2025
cf94a3e
Merge branch 'device_image_backend_content' of https://github.com/lbu…
lbushi25 Jan 23, 2025
743c9a7
Fix test failures on HIP/CUDA
lbushi25 Jan 24, 2025
97637cf
Fix test failures on HIP/CUDA
lbushi25 Jan 24, 2025
afe72b8
Fix test failures on HIP/CUDA
lbushi25 Jan 24, 2025
d1f2644
Update sycl_symbols_windows.dump
lbushi25 Jan 24, 2025
910ce34
Add OpenCL interoperability test
lbushi25 Jan 28, 2025
d267155
Add OpenCL interoperability test
lbushi25 Jan 28, 2025
a9e1e21
Merge branch 'sycl' into device_image_backend_content
lbushi25 Jan 28, 2025
2c916f6
Add CUDA test
lbushi25 Jan 31, 2025
16186ba
Merge branch 'device_image_backend_content' of https://github.com/lbu…
lbushi25 Jan 31, 2025
77eda18
Add CUDA test
lbushi25 Jan 31, 2025
a29db25
Fix typo in CUDA test
lbushi25 Feb 5, 2025
248fc93
Update CUDA_interop_test.cpp
lbushi25 Feb 5, 2025
c666340
Merge branch 'intel:sycl' into device_image_backend_content
lbushi25 Feb 5, 2025
4ccce5a
Fix compilation error in CUDA_interop_test.cpp
lbushi25 Feb 5, 2025
2e4d7f8
Apply suggestions
lbushi25 Feb 6, 2025
9aa2184
Fix compilation error in L0 test
lbushi25 Feb 6, 2025
6a88f93
Adjust symbols
lbushi25 Feb 6, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -44,12 +44,10 @@ the SYCL specification refer to that revision.

== Status

This is a proposed extension specification, intended to gather community
feedback.
Interfaces defined in this specification may not be implemented yet or may be
in a preliminary state.
The specification itself may also change in incompatible ways before it is
finalized.
This is an experimental extension specification, intended to provide early
access to features and gather community feedback. Interfaces defined in this
specification are implemented in {dpcpp}, but they are not finalized and may
change incompatibly in future versions of {dpcpp} without prior notice.
*Shipping software products should not rely on APIs defined in this
specification.*

Expand Down Expand Up @@ -101,7 +99,7 @@ class device_image {
backend ext_oneapi_get_backend() const noexcept;
std::vector<std::byte> ext_oneapi_get_backend_content() const;
std::span<std::byte> ext_oneapi_get_backend_content_view() const; // Requires C++20
std::span<const std::byte> ext_oneapi_get_backend_content_view() const; // Requires C++20
/*...*/
};
Expand Down Expand Up @@ -148,15 +146,15 @@ See below for a description of the formats used by {dpcpp}.
a@
[source,c++]
----
std::span<std::byte> ext_oneapi_get_content_backend_view() const;
std::span<const std::byte> ext_oneapi_get_content_backend_view() const;
----
!====

Minimum C++ Version: {cpp}20

_Constraints:_ Available only when `State` is `bundle_state::executable`.

_Returns:_ A view of the raw backend content for this device image.
_Returns:_ An immutable view of the raw backend content for this device image.
The data behind this view has the same lifetime as the `device_image` object.
The format of this data is implementation-defined.
See below for a description of the formats used by {dpcpp}.
Expand Down
46 changes: 40 additions & 6 deletions sycl/include/sycl/kernel_bundle.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,12 +26,15 @@
#include <sycl/ext/oneapi/properties/property.hpp> // build_options
#include <sycl/ext/oneapi/properties/property_value.hpp> // and log

#include <array> // for array
#include <cstddef> // for std::byte
#include <cstring> // for size_t, memcpy
#include <functional> // for function
#include <iterator> // for distance
#include <memory> // for shared_ptr, operator==, hash
#include <array> // for array
#include <cstddef> // for std::byte
#include <cstring> // for size_t, memcpy
#include <functional> // for function
#include <iterator> // for distance
#include <memory> // for shared_ptr, operator==, hash
#if __has_include(<span>)
#include <span>
#endif
#include <string> // for string
#include <type_traits> // for enable_if_t, remove_refer...
#include <utility> // for move
Expand Down Expand Up @@ -123,6 +126,13 @@ class __SYCL_EXPORT device_image_plain {

template <class T>
friend T detail::createSyclObjFromImpl(decltype(T::impl) ImplObj);

backend ext_oneapi_get_backend_impl() const noexcept;

#if (!defined(_HAS_STD_BYTE) || _HAS_STD_BYTE != 0)
std::pair<const std::byte *, const std::byte *>
ext_oneapi_get_backend_content_view_impl() const;
#endif // HAS_STD_BYTE
};
} // namespace detail

Expand All @@ -145,6 +155,30 @@ class device_image : public detail::device_image_plain,
return device_image_plain::has_kernel(KernelID, Dev);
}

backend ext_oneapi_get_backend() const noexcept {
return device_image_plain::ext_oneapi_get_backend_impl();
}

#if (!defined(_HAS_STD_BYTE) || _HAS_STD_BYTE != 0)
template <sycl::bundle_state T = State,
typename = std::enable_if_t<T == bundle_state::executable>>
std::vector<std::byte> ext_oneapi_get_backend_content() const {
const auto view =
device_image_plain::ext_oneapi_get_backend_content_view_impl();
return std::vector(view.first, view.second);
}

#ifdef __cpp_lib_span
template <sycl::bundle_state T = State,
typename = std::enable_if_t<T == bundle_state::executable>>
std::span<const std::byte> ext_oneapi_get_backend_content_view() const {
const auto view =
device_image_plain::ext_oneapi_get_backend_content_view_impl();
return std::span<const std::byte>{view.first, view.second};
}
#endif // __cpp_lib_span
#endif // _HAS_STD_BYTE

private:
device_image(detail::DeviceImageImplPtr Impl)
: device_image_plain(std::move(Impl)) {}
Expand Down
1 change: 1 addition & 0 deletions sycl/source/feature_test.hpp.in
Original file line number Diff line number Diff line change
Expand Up @@ -113,6 +113,7 @@ inline namespace _V1 {
#define SYCL_EXT_ONEAPI_WORK_GROUP_SCRATCH_MEMORY 1
#define SYCL_EXT_ONEAPI_WORK_GROUP_STATIC 1
#define SYCL_EXT_ONEAPI_NUM_COMPUTE_UNITS 1
#define SYCL_EXT_ONEAPI_DEVICE_IMAGE_BACKEND_CONTENT 1
// In progress yet
#define SYCL_EXT_ONEAPI_ATOMIC16 0

Expand Down
13 changes: 13 additions & 0 deletions sycl/source/kernel_bundle.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,19 @@ ur_native_handle_t device_image_plain::getNative() const {
return impl->getNative();
}

backend device_image_plain::ext_oneapi_get_backend_impl() const noexcept {
return impl->get_context().get_backend();
}

std::pair<const std::byte *, const std::byte *>
device_image_plain::ext_oneapi_get_backend_content_view_impl() const {
return std::make_pair(
reinterpret_cast<const std::byte *>(
impl->get_bin_image_ref()->getRawData().BinaryStart),
reinterpret_cast<const std::byte *>(
impl->get_bin_image_ref()->getRawData().BinaryEnd));
}

////////////////////////////
///// kernel_bundle_plain
///////////////////////////
Expand Down
30 changes: 30 additions & 0 deletions sycl/test-e2e/DeviceImageBackendContent/CUDA_interop_test.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
// RUN: %{build} %cuda_options -o %t.out
// RUN: %{run} %t.out
// REQUIRES: cuda, cuda_dev_kit

#include <cuda.h>
#include <sycl/backend.hpp>
#include <sycl/detail/core.hpp>
#include <vector>

int main() {
sycl::queue q;
sycl::context ctxt = q.get_context();
sycl::kernel_id k_id = sycl::get_kernel_id<class mykernel>();
auto bundle =
sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctxt, {k_id});
assert(!bundle.empty());
sycl::kernel krn = bundle.get_kernel(k_id);
sycl::buffer<int> buf(sycl::range<1>(1));
q.submit([&](sycl::handler &cgh) {
sycl::accessor acc(buf, cgh);
cgh.single_task<class mykernel>(krn, [=]() { acc[0] = 42; });
});
const auto img = *(bundle.begin());
const auto bytes = img.ext_oneapi_get_backend_content();
CUmodule m;
CUresult result =
cuModuleLoadData(&m, reinterpret_cast<const void *>(bytes.data()));
assert(result == CUDA_SUCCESS);
return 0;
}
94 changes: 94 additions & 0 deletions sycl/test-e2e/DeviceImageBackendContent/L0_interop_test.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
// REQUIRES: level_zero, level_zero_dev_kit, aspect-usm_shared_allocations
// RUN: %{build} %level_zero_options -fno-sycl-dead-args-optimization -o %t.out
// RUN: %{run} %t.out
//
#include <level_zero/ze_api.h>
#include <sycl/detail/core.hpp>
#include <sycl/ext/oneapi/backend/level_zero.hpp>
#include <sycl/ext/oneapi/free_function_queries.hpp>
#include <sycl/usm.hpp>
#include <vector>

namespace syclext = sycl::ext::oneapi;
namespace syclexp = sycl::ext::oneapi::experimental;

extern "C" SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(
(syclexp::nd_range_kernel<1>)) void iota(int *ptr) {
size_t id = syclext::this_work_item::get_nd_item<1>().get_global_linear_id();
ptr[id] = 42;
}

int main() {
sycl::queue q;
sycl::context ctxt = q.get_context();
sycl::device d = ctxt.get_devices()[0];
// The following ifndef is required due to a number of limitations of free
// function kernels. See CMPLRLLVM-61498.
// TODO: Remove it once these limitations are no longer there.
#ifndef __SYCL_DEVICE_ONLY__
// First, run the kernel using the SYCL API.
auto bundle = sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctxt);
sycl::kernel_id iota_id = syclexp::get_kernel_id<iota>();
sycl::kernel k_iota = bundle.get_kernel(iota_id);
int *ptr = sycl::malloc_shared<int>(1, q);
*ptr = 0;
q.submit([&](sycl::handler &cgh) {
cgh.set_args(ptr);
cgh.parallel_for(sycl::nd_range{{1}, {1}}, k_iota);
}).wait();

// Now, run the kernel by first getting its image as an executable,
// making an L0 kernel out of it and then making a SYCL kernel out of
// the L0 kernel. Run this kernel on the SYCL API and verify
// that it has the same result as the kernel that was run directly on SYCL
// API. First, get a kernel bundle that contains the kernel "iota".
auto exe_bndl = sycl::get_kernel_bundle<sycl::bundle_state::executable>(
ctxt, {d},
[&](const sycl::device_image<sycl::bundle_state::executable> &img) {
return img.has_kernel(iota_id, d);
});
assert(!exe_bndl.empty());
std::vector<std::byte> bytes;
const sycl::device_image<sycl::bundle_state::executable> &img =
*(exe_bndl.begin());
bytes = img.ext_oneapi_get_backend_content();

auto ZeContext = sycl::get_native<sycl::backend::ext_oneapi_level_zero>(ctxt);
auto ZeDevice = sycl::get_native<sycl::backend::ext_oneapi_level_zero>(d);

ze_result_t status;
ze_module_desc_t moduleDesc = {
ZE_STRUCTURE_TYPE_MODULE_DESC,
nullptr,
ZE_MODULE_FORMAT_IL_SPIRV,
bytes.size(),
reinterpret_cast<unsigned char *>(bytes.data()),
nullptr,
nullptr};
ze_module_handle_t ZeModule;
status = zeModuleCreate(ZeContext, ZeDevice, &moduleDesc, &ZeModule, nullptr);
assert(status == ZE_RESULT_SUCCESS);

ze_kernel_desc_t kernelDesc = {ZE_STRUCTURE_TYPE_KERNEL_DESC, nullptr, 0,
"__sycl_kernel_iota"};
ze_kernel_handle_t ZeKernel;
status = zeKernelCreate(ZeModule, &kernelDesc, &ZeKernel);
assert(status == ZE_RESULT_SUCCESS);
sycl::kernel k_iota_twin =
sycl::make_kernel<sycl::backend::ext_oneapi_level_zero>(
{sycl::make_kernel_bundle<sycl::backend::ext_oneapi_level_zero,
sycl::bundle_state::executable>({ZeModule},
ctxt),
ZeKernel},
ctxt);
int *ptr_twin = sycl::malloc_shared<int>(1, q);
*ptr_twin = 1;
q.submit([&](sycl::handler &cgh) {
cgh.set_args(ptr_twin);
cgh.parallel_for(sycl::nd_range{{1}, {1}}, k_iota_twin);
}).wait();
assert(*ptr_twin == *ptr);
sycl::free(ptr, q);
sycl::free(ptr_twin, q);
#endif
}
81 changes: 81 additions & 0 deletions sycl/test-e2e/DeviceImageBackendContent/OCL_interop_test.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
// REQUIRES: opencl, opencl_icd, aspect-usm_shared_allocations
// RUN: %{build} %opencl_lib -fno-sycl-dead-args-optimization -o %t.out
// RUN: %{run} %t.out
//
#include <sycl/backend.hpp>
#include <sycl/detail/cl.h>
#include <sycl/detail/core.hpp>
#include <sycl/ext/oneapi/free_function_queries.hpp>
#include <sycl/usm.hpp>
#include <vector>

namespace syclext = sycl::ext::oneapi;
namespace syclexp = sycl::ext::oneapi::experimental;

extern "C" SYCL_EXT_ONEAPI_FUNCTION_PROPERTY(
(syclexp::nd_range_kernel<1>)) void iota(int *ptr) {
size_t id = syclext::this_work_item::get_nd_item<1>().get_global_linear_id();
ptr[id] = 42;
}

int main() {
sycl::queue q;
sycl::context ctxt = q.get_context();
sycl::device d = ctxt.get_devices()[0];
// The following ifndef is required due to a number of limitations of free
// function kernels. See CMPLRLLVM-61498.
// TODO: Remove it once these limitations are no longer there.
#ifndef __SYCL_DEVICE_ONLY__
// First, run the kernel using the SYCL API.

auto bundle = sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctxt);
sycl::kernel_id iota_id = syclexp::get_kernel_id<iota>();
sycl::kernel k_iota = bundle.get_kernel(iota_id);

int *ptr = sycl::malloc_shared<int>(1, q);
*ptr = 0;
q.submit([&](sycl::handler &cgh) {
cgh.set_args(ptr);
cgh.parallel_for(sycl::nd_range{{1}, {1}}, k_iota);
}).wait();
// Now, run the kernel by first getting its image as an executable,
// making an OCL kernel out of it and then making a SYCL kernel out of
// the OCL kernel. Run this kernel on the SYCL API and verify
// that it has the same result as the kernel that was run directly on SYCL
// API. First, get a kernel bundle that contains the kernel "iota".
auto exe_bndl = sycl::get_kernel_bundle<sycl::bundle_state::executable>(
ctxt, {d},
[&](const sycl::device_image<sycl::bundle_state::executable> &img) {
return img.has_kernel(iota_id, d);
});
assert(!exe_bndl.empty());
std::vector<std::byte> bytes;
const sycl::device_image<sycl::bundle_state::executable> &img =
*(exe_bndl.begin());
bytes = img.ext_oneapi_get_backend_content();
std::cout << bytes.size() << std::endl;
auto clContext = sycl::get_native<sycl::backend::opencl>(ctxt);
auto clDevice = sycl::get_native<sycl::backend::opencl>(d);

cl_int status;
auto clProgram = clCreateProgramWithIL(
clContext, reinterpret_cast<unsigned char *>(bytes.data()), bytes.size(),
&status);
assert(status == CL_SUCCESS);
status = clBuildProgram(clProgram, 1, &clDevice, "", nullptr, nullptr);
assert(status == CL_SUCCESS);
auto clKernel = clCreateKernel(clProgram, "__sycl_kernel_iota", &status);
assert(status == CL_SUCCESS);
sycl::kernel k_iota_twin =
sycl::make_kernel<sycl::backend::opencl>(clKernel, ctxt);
int *ptr_twin = sycl::malloc_shared<int>(1, q);
*ptr_twin = 1;
q.submit([&](sycl::handler &cgh) {
cgh.set_args(ptr_twin);
cgh.parallel_for(sycl::nd_range{{1}, {1}}, k_iota_twin);
}).wait();
assert(*ptr_twin == *ptr);
sycl::free(ptr, q);
sycl::free(ptr_twin, q);
#endif
}
50 changes: 50 additions & 0 deletions sycl/test-e2e/DeviceImageBackendContent/basic_test.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
// RUN: %{build} -std=c++20 -o %t.out
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had some complains from Codeplay developers about such tests.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you point me to a thread where I can have a look at the concerns?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a link, try searching through git log with changes containing -std=++20

// RUN: %{run} %t.out

#include <sycl/detail/core.hpp>
#include <sycl/kernel_bundle.hpp>
#include <type_traits>

int main() {
sycl::queue q;
sycl::context ctxt = q.get_context();
sycl::buffer<int> buf(sycl::range<1>(1));
sycl::kernel_id k_id = sycl::get_kernel_id<class mykernel>();
auto bundle =
sycl::get_kernel_bundle<sycl::bundle_state::executable>(ctxt, {k_id});
assert(!bundle.empty());
sycl::kernel krn = bundle.get_kernel(k_id);
q.submit([&](sycl::handler &cgh) {
sycl::accessor acc(buf, cgh);
cgh.single_task<class mykernel>(krn, [=]() { acc[0] = 42; });
});
sycl::backend backend;
std::vector<std::byte> bytes;
#ifdef __cpp_lib_span
std::span<const std::byte> bytes_view;
#endif
for (const auto &img : bundle) {
// Check that all 3 functions of the api return correct types and compile.
// Furthermore, check that the backend corresponds to the backend of the
// bundle Check that the view of the content is indeed equal to the
// content.
static_assert(std::is_same_v<decltype(img.ext_oneapi_get_backend()),
decltype(backend)>);
static_assert(std::is_same_v<decltype(img.ext_oneapi_get_backend_content()),
decltype(bytes)>);
backend = img.ext_oneapi_get_backend();
assert(backend == bundle.get_backend());
bytes = img.ext_oneapi_get_backend_content();
#ifdef __cpp_lib_span
static_assert(
std ::is_same_v<decltype(img.ext_oneapi_get_backend_content_view()),
decltype(bytes_view)>);
bytes_view = img.ext_oneapi_get_backend_content_view();
assert(bytes_view.size() == bytes.size());
for (size_t i = 0; i < bytes.size(); ++i) {
assert(bytes[i] == bytes_view[i]);
}
#endif
}
return 0;
}
Loading