Skip to content
Merged
Show file tree
Hide file tree
Changes from 18 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
862c187
[SYCL][RTC] Initial support for device globals
jopperm Jan 9, 2025
e4ef41c
Add `ext_oneapi_has_device_global`.
jopperm Jan 10, 2025
d45415d
Add get_address, get_size, and attempt to design a copy-API.
jopperm Jan 13, 2025
e6d205b
Add symbols to dump (linux only for now)
jopperm Jan 13, 2025
4d0d08d
Drop proposed copy methods on kernel_bundle
jopperm Jan 13, 2025
03c24ca
Add proper device check
jopperm Jan 13, 2025
00dd1b2
Return USM pointer from ext_oneapi_get_device_global_address
jopperm Jan 13, 2025
1aaa3b8
Add missing methods on kernel_bundle
jopperm Jan 13, 2025
4b7dd5c
Add test for device_image_scope globals.
jopperm Jan 13, 2025
1f57684
Windows symbols
jopperm Jan 14, 2025
afba729
Move device global tests to separate file and mark unsupported on ope…
jopperm Jan 15, 2025
d1177c8
Bump sycl.hpp counter for added RTC test.
jopperm Jan 15, 2025
e96e7e0
Use bundle's context for adhoc queue.
jopperm Jan 15, 2025
5c529aa
Merge remote-tracking branch 'upstream/sycl' into rtc-devglobs
jopperm Jan 16, 2025
dd081f6
Fix unused variable in structured bindings for old GCC
jopperm Jan 16, 2025
e5df76d
Use unordered set
jopperm Jan 24, 2025
104123e
Merge remote-tracking branch 'upstream/sycl' into rtc-devglobs
jopperm Jan 24, 2025
880b5ad
Bump sycl.hpp counter again
jopperm Jan 24, 2025
a8868fb
Address feedback.
jopperm Jan 28, 2025
7ec7639
Merge remote-tracking branch 'upstream/sycl' into rtc-devglobs
jopperm Jan 28, 2025
e2e0cc7
Update REQUIRES and UNSUPPORTED tags in test
jopperm Jan 28, 2025
2ea6266
Merge remote-tracking branch 'upstream/sycl' into rtc-devglobs
jopperm Feb 28, 2025
6409766
Untangle destruction of device global map entries
jopperm Mar 12, 2025
ae8f617
Merge remote-tracking branch 'upstream/sycl' into rtc-devglobs
jopperm Mar 12, 2025
92dce3d
Drop device from queries
jopperm Mar 12, 2025
265eee1
Revert format change
jopperm Mar 12, 2025
a0ab174
Windows symbols
jopperm Mar 12, 2025
62d223b
Update sycl/test/e2e_test_requirements/no_sycl_hpp_in_e2e_tests.cpp
jopperm Mar 12, 2025
1be7568
Fail a bit earlier and more consistently
jopperm Mar 12, 2025
46261f0
Merge remote-tracking branch 'upstream/sycl' into rtc-devglobs
jopperm Mar 14, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions sycl-jit/jit-compiler/lib/rtc/DeviceCompilation.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -513,8 +513,9 @@ jit_compiler::performPostLink(std::unique_ptr<llvm::Module> Module,
/*IROutputOnly=*/false, EmitOnlyKernelsAsEntryPoints);
assert(Splitter->hasMoreSplits());

// TODO: Call `verifyNoCrossModuleDeviceGlobalUsage` if device globals shall
// be processed.
if (auto Err = Splitter->verifyNoCrossModuleDeviceGlobalUsage()) {
return std::move(Err);
}

// TODO: This allocation assumes that there are no further splits required,
// i.e. there are no mixed SYCL/ESIMD modules.
Expand Down Expand Up @@ -557,7 +558,7 @@ jit_compiler::performPostLink(std::unique_ptr<llvm::Module> Module,
GlobalBinImageProps PropReq{
/*EmitKernelParamInfo=*/true, /*EmitProgramMetadata=*/true,
/*EmitExportedSymbols=*/true, /*EmitImportedSymbols=*/true,
/*DeviceGlobals=*/false};
/*DeviceGlobals=*/true};
PropertySetRegistry Properties =
computeModuleProperties(MDesc.getModule(), MDesc.entries(), PropReq);
// TODO: Manually add `compile_target` property as in
Expand Down
60 changes: 60 additions & 0 deletions sycl/include/sycl/kernel_bundle.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -201,6 +201,21 @@ class __SYCL_EXPORT kernel_bundle_plain {
return ext_oneapi_get_kernel(detail::string_view{name});
}

bool ext_oneapi_has_device_global(const std::string &name,
const device &dev) {
return ext_oneapi_has_device_global(detail::string_view{name}, dev);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need the indirection via string_view?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the design for ext_oneapi_has_kernel, which underwent ABI neutralisation (cf. #13447).

}

void *ext_oneapi_get_device_global_address(const std::string &name,
const device &dev) {
return ext_oneapi_get_device_global_address(detail::string_view{name}, dev);
}

size_t ext_oneapi_get_device_global_size(const std::string &name,
const device &dev) {
return ext_oneapi_get_device_global_size(detail::string_view{name}, dev);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: if possible, it would be nice to mark these methods as const.

Probably it's better to add documentation for new APIs first so that feedback from spec reviewers can be taken into account.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that seems to be possible. Currently I have declared these functions following the other new members of the kernel_compiler extension. I hope to start on the extension spec soon; I can bring the constness discussion over to that PR if you'd like.


protected:
// \returns a kernel object which represents the kernel identified by
// kernel_id passed
Expand Down Expand Up @@ -229,6 +244,13 @@ class __SYCL_EXPORT kernel_bundle_plain {
private:
bool ext_oneapi_has_kernel(detail::string_view name);
kernel ext_oneapi_get_kernel(detail::string_view name);

bool ext_oneapi_has_device_global(detail::string_view name,
const device &dev);
void *ext_oneapi_get_device_global_address(detail::string_view name,
const device &dev);
size_t ext_oneapi_get_device_global_size(detail::string_view name,
const device &dev);
};

} // namespace detail
Expand Down Expand Up @@ -449,6 +471,44 @@ class kernel_bundle : public detail::kernel_bundle_plain,
return detail::kernel_bundle_plain::ext_oneapi_get_kernel(name);
}

/////////////////////////
// ext_oneapi_has_device_global
// only true if created from source and has this global for the given device
/////////////////////////
template <bundle_state _State = State,
typename = std::enable_if_t<_State == bundle_state::executable>>
bool ext_oneapi_has_device_global(const std::string &name,
const device &dev) {
return detail::kernel_bundle_plain::ext_oneapi_has_device_global(name, dev);
}

/////////////////////////
// ext_oneapi_get_device_global_address
// kernel_bundle must be created from source, throws if device global is not
// present for the given device, or has `device_image_scope` property.
// Returns a USM pointer to the variable's allocation on the device.
/////////////////////////
template <bundle_state _State = State,
typename = std::enable_if_t<_State == bundle_state::executable>>
void *ext_oneapi_get_device_global_address(const std::string &name,
const device &dev) {
return detail::kernel_bundle_plain::ext_oneapi_get_device_global_address(
name, dev);
}

/////////////////////////
// ext_oneapi_get_device_global_size
// kernel_bundle must be created from source, throws if device global is not
// present for the given device. Returns size in bytes.
/////////////////////////
template <bundle_state _State = State,
typename = std::enable_if_t<_State == bundle_state::executable>>
size_t ext_oneapi_get_device_global_size(const std::string &name,
const device &dev) {
return detail::kernel_bundle_plain::ext_oneapi_get_device_global_size(name,
dev);
}

private:
kernel_bundle(detail::KernelBundleImplPtr Impl)
: kernel_bundle_plain(std::move(Impl)) {}
Expand Down
7 changes: 6 additions & 1 deletion sycl/source/detail/jit_compiler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1150,14 +1150,19 @@ sycl_device_binaries jit_compiler::createDeviceBinaryImage(
}

for (const auto &FPS : DevImgInfo.Properties) {
bool IsDeviceGlobalsPropSet =
FPS.Name == __SYCL_PROPERTY_SET_SYCL_DEVICE_GLOBALS;
PropertySetContainer PropSet{FPS.Name.c_str()};
for (const auto &FPV : FPS.Values) {
if (FPV.IsUIntValue) {
PropSet.addProperty(
PropertyContainer{FPV.Name.c_str(), FPV.UIntValue});
} else {
std::string PrefixedName =
(IsDeviceGlobalsPropSet ? OffloadEntryPrefix : "") +
FPV.Name.c_str();
PropSet.addProperty(PropertyContainer{
FPV.Name.c_str(), FPV.Bytes.begin(), FPV.Bytes.size(),
PrefixedName.c_str(), FPV.Bytes.begin(), FPV.Bytes.size(),
sycl_property_type::SYCL_PROPERTY_TYPE_BYTE_ARRAY});
}
}
Expand Down
150 changes: 144 additions & 6 deletions sycl/source/detail/kernel_bundle_impl.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,7 @@
#include <cstdint>
#include <cstring>
#include <memory>
#include <unordered_set>
#include <vector>

#include "split_string.hpp"
Expand Down Expand Up @@ -380,8 +381,9 @@ class kernel_bundle_impl {
// program manager integration, only for sycl_jit language
kernel_bundle_impl(context Ctx, std::vector<device> Devs,
const std::vector<kernel_id> &KernelIDs,
std::vector<std::string> KNames, std::string Pfx,
syclex::source_language Lang)
const std::vector<std::string> &KNames,
const std::vector<std::string> &DGNames,
const std::string &Pfx, syclex::source_language Lang)
: kernel_bundle_impl(Ctx, Devs, KernelIDs, bundle_state::executable) {
assert(Lang == syclex::source_language::sycl_jit);
// Mark this bundle explicitly as "interop" to ensure that its kernels are
Expand All @@ -391,6 +393,7 @@ class kernel_bundle_impl {
// from the (unprefixed) kernel name.
MIsInterop = true;
KernelNames = KNames;
DeviceGlobalNames = DGNames;
Prefix = Pfx;
Language = Lang;
}
Expand Down Expand Up @@ -509,17 +512,65 @@ class kernel_bundle_impl {
// `jit_compiler::compileSYCL(..)` uses `CompilationID + '$'` as prefix
// for offload entry names.
std::string Prefix = CompilationID + '$';
auto PrefixLen = Prefix.length();
for (const auto &KernelID : PM.getAllSYCLKernelIDs()) {
std::string_view KernelName{KernelID.get_name()};
if (KernelName.find(Prefix) == 0) {
KernelIDs.push_back(KernelID);
KernelName.remove_prefix(Prefix.length());
KernelName.remove_prefix(PrefixLen);
KernelNames.emplace_back(KernelName);
}
}

return std::make_shared<kernel_bundle_impl>(
MContext, MDevices, KernelIDs, KernelNames, Prefix, Language);
// Determine IDs of all device globals referenced by this bundle's
// kernels. These IDs are also prefixed.
std::unordered_set<std::string> DeviceGlobalIDSet;
std::vector<std::string> DeviceGlobalIDVec;
std::vector<std::string> DeviceGlobalNames;
for (const auto &RawImg : PM.getRawDeviceImages(KernelIDs)) {
for (const auto &DeviceGlobalProp : RawImg->getDeviceGlobals()) {
std::string_view DeviceGlobalName{DeviceGlobalProp->Name};
assert(DeviceGlobalName.find(Prefix) == 0);
bool Inserted = false;
std::tie(std::ignore, Inserted) =
DeviceGlobalIDSet.emplace(DeviceGlobalName);
if (Inserted) {
DeviceGlobalIDVec.emplace_back(DeviceGlobalName);
DeviceGlobalName.remove_prefix(PrefixLen);
DeviceGlobalNames.emplace_back(DeviceGlobalName);
}
}
}

// Create the executable bundle.
auto ExecBundle = std::make_shared<kernel_bundle_impl>(
MContext, MDevices, KernelIDs, KernelNames, DeviceGlobalNames, Prefix,
Language);

// Device globals are usually statically allocated and registered in the
// integration footer, which we don't have in the RTC context. Instead, we
// dynamically allocate storage tied to the executable kernel bundle.
for (auto *DeviceGlobalEntry :
PM.getDeviceGlobalEntries(DeviceGlobalIDVec)) {

size_t AllocSize = DeviceGlobalEntry->MDeviceGlobalTSize; // init value
if (!DeviceGlobalEntry->MIsDeviceImageScopeDecorated) {
// USM pointer. TODO: it's actually a decorated multi_ptr.
AllocSize += sizeof(void *);
}
auto Alloc = std::make_unique<std::byte[]>(AllocSize);
std::string_view DeviceGlobalName{DeviceGlobalEntry->MUniqueId};
PM.addOrInitDeviceGlobalEntry(Alloc.get(), DeviceGlobalName.data());
ExecBundle->DeviceGlobalAllocations.push_back(std::move(Alloc));

// Drop the RTC prefix from the entry's symbol name. Note that the PM
// still manages this device global under its prefixed name.
assert(DeviceGlobalName.find(Prefix) == 0);
DeviceGlobalName.remove_prefix(PrefixLen);
DeviceGlobalEntry->MUniqueId = DeviceGlobalName;
}

return ExecBundle;
}

ur_program_handle_t UrProgram = nullptr;
Expand Down Expand Up @@ -626,6 +677,8 @@ class kernel_bundle_impl {
KernelNames, Language);
}

// Utility methods for kernel_compiler functionality
private:
std::string adjust_kernel_name(const std::string &Name,
syclex::source_language Lang) {
// Once name demangling support is in, we won't need this.
Expand All @@ -637,6 +690,54 @@ class kernel_bundle_impl {
return isMangled ? Name : "__sycl_kernel_" + Name;
}

std::string mangle_device_global_name(const std::string &Name) {
// TODO: Support device globals declared in namespaces.
return "_Z" + std::to_string(Name.length()) + Name;
}

bool is_valid_device(const device &DeviceCand) {
// Check if the device is in this bundle's list of devices.
if (std::count(MDevices.begin(), MDevices.end(), DeviceCand)) {
return true;
}

// Otherwise, if the device candidate is a sub-device it is also valid if
// its parent is valid.
if (!getSyclObjImpl(DeviceCand)->isRootDevice()) {
try {
return is_valid_device(
DeviceCand.get_info<info::device::parent_device>());
} catch (std::exception &e) {
__SYCL_REPORT_EXCEPTION_TO_STREAM("exception in is_valid_device", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This __SYCL_REPORT_EXCEPTION_TO_STREAM macro is usually used in noexcept functions (like destructors) when we have no other option. It doesn't seem needed here. Why not let the exception go free so that the caller can decide? If your intention is to suppress it, then I'd recommend doing that and leaving a comment why.

Also, that macro will assert(false) in ndef NDEBUG , which will bring the app to a halt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I totally missed that. The blueprint for that method came from device_bundle_impl::has_kernel, which indeed is noexpect.

}
}
return false;
}

DeviceGlobalMapEntry *get_device_global_entry(const std::string &Name,
const device &Dev) {
if (Language != syclex::source_language::sycl_jit || Prefix.empty()) {
throw sycl::exception(make_error_code(errc::invalid),
"Querying device globals by name is only available "
"in kernel_bundles successfully built from "
"kernel_bundle<bundle_state:ext_oneapi_source> "
"with 'sycl_jit' source language.");
}

if (!ext_oneapi_has_device_global(Name, Dev)) {
throw sycl::exception(make_error_code(errc::invalid),
"device global '" + Name +
"' not found in kernel_bundle");
}

std::vector<DeviceGlobalMapEntry *> Entries =
ProgramManager::getInstance().getDeviceGlobalEntries(
{Prefix + mangle_device_global_name(Name)});
assert(Entries.size() == 1);
return Entries.front();
}

public:
bool ext_oneapi_has_kernel(const std::string &Name) {
auto it = std::find(KernelNames.begin(), KernelNames.end(),
adjust_kernel_name(Name, Language));
Expand Down Expand Up @@ -698,6 +799,40 @@ class kernel_bundle_impl {
return detail::createSyclObjFromImpl<kernel>(KernelImpl);
}

bool ext_oneapi_has_device_global(const std::string &Name,
const device &Dev) {
if (!is_valid_device(Dev)) {
return false;
}

std::string MangledName = mangle_device_global_name(Name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we currently check this part?

only true if kernel_bundle was created from source

Would that enable an early return, avoiding the linear search?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A non-RTC bundle would just have an empty list of names, so the linear search isn't really a problem, but we can at least detect that before doing the string operations (and map lookups in case of kernels); see 1be7568.

return std::find(DeviceGlobalNames.begin(), DeviceGlobalNames.end(),
MangledName) != DeviceGlobalNames.end();
}

void *ext_oneapi_get_device_global_address(const std::string &Name,
const device &Dev) {
DeviceGlobalMapEntry *Entry = get_device_global_entry(Name, Dev);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to check the validity of the device here, too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validity of device is checked in ext_oneapi_has_device_global, and get_device_global_entry throws an exception if ext_oneapi_has_device_global returns false. Is that sufficient or should the device validity be checked first?

if (Entry->MIsDeviceImageScopeDecorated) {
throw sycl::exception(make_error_code(errc::invalid),
"Cannot query USM pointer for device global with "
"'device_image_scope' property");
}

// TODO: Is this the right approach? Should we just pass the queue as an
// argument?
queue InitQueue{MContext, Dev};
auto &USMMem =
Entry->getOrAllocateDeviceGlobalUSM(getSyclObjImpl(InitQueue));
InitQueue.wait_and_throw();
return USMMem.getPtr();
}

size_t ext_oneapi_get_device_global_size(const std::string &Name,
const device &Dev) {
return get_device_global_entry(Name, Dev)->MDeviceGlobalTSize;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to check the validity of the device here, too?

}

bool empty() const noexcept { return MDeviceImages.empty(); }

backend get_backend() const noexcept {
Expand Down Expand Up @@ -956,10 +1091,13 @@ class kernel_bundle_impl {
// Language is for both state::source and state::executable.
syclex::source_language Language = syclex::source_language::opencl;
const std::variant<std::string, std::vector<std::byte>> Source;
// only kernel_bundles created from source have KernelNames member.
// only kernel_bundles created from source have the following members.
std::vector<std::string> KernelNames;
std::vector<std::string> DeviceGlobalNames;
std::string Prefix;
include_pairs_t IncludePairs;

std::vector<std::unique_ptr<std::byte[]>> DeviceGlobalAllocations;
};

} // namespace detail
Expand Down
16 changes: 16 additions & 0 deletions sycl/source/kernel_bundle.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,22 @@ kernel kernel_bundle_plain::ext_oneapi_get_kernel(detail::string_view name) {
return impl->ext_oneapi_get_kernel(name.data(), impl);
}

bool kernel_bundle_plain::ext_oneapi_has_device_global(detail::string_view name,
const device &dev) {
return impl->ext_oneapi_has_device_global(name.data(), dev);
}

void *kernel_bundle_plain::ext_oneapi_get_device_global_address(
detail::string_view name, const device &dev) {
return impl->ext_oneapi_get_device_global_address(name.data(), dev);
}

size_t
kernel_bundle_plain::ext_oneapi_get_device_global_size(detail::string_view name,
const device &dev) {
return impl->ext_oneapi_get_device_global_size(name.data(), dev);
}

//////////////////////////////////
///// sycl::detail free functions
//////////////////////////////////
Expand Down
Loading
Loading