-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[Liboffload] Add function for checking ELF file device compatibility #156259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-offload Author: Jonas Greifenhain (cadivus) ChangesIntroduces a new API to check whether an ELF binary is compatible with a given device. Full diff: https://github.com/llvm/llvm-project/pull/156259.diff 2 Files Affected:
diff --git a/offload/liboffload/API/Device.td b/offload/liboffload/API/Device.td
index 5b54c79d83f9d..10931fa1ca807 100644
--- a/offload/liboffload/API/Device.td
+++ b/offload/liboffload/API/Device.td
@@ -129,3 +129,27 @@ def olGetDeviceInfoSize : Function {
Return<"OL_ERRC_INVALID_DEVICE">
];
}
+
+def olElfIsCompatibleWithDevice : Function {
+ let desc = "Checks if the given ELF binary is compatible with the specified device.";
+ let details = [
+ "This function determines whether an ELF image can be executed on the specified device."
+ ];
+ let params = [
+ Param<"ol_device_handle_t", "Device", "handle of the device to check against", PARAM_IN>,
+ Param<"const void*", "ElfData", "pointer to the ELF image data in memory", PARAM_IN>,
+ Param<"size_t", "ElfSize", "size in bytes of the ELF image", PARAM_IN>,
+ Param<"bool*", "IsCompatible", "set to true if the ELF is compatible, false otherwise", PARAM_OUT>
+ ];
+ let returns = [
+ Return<"OL_ERRC_INVALID_DEVICE", [
+ "If the provided device handle is invalid."
+ ]>,
+ Return<"OL_ERRC_INVALID_ARGUMENT", [
+ "If `ElfData` is null or `ElfSize` is zero."
+ ]>,
+ Return<"OL_ERRC_NULL_POINTER", [
+ "If `IsCompatible` is null."
+ ]>
+ ];
+}
diff --git a/offload/liboffload/src/OffloadImpl.cpp b/offload/liboffload/src/OffloadImpl.cpp
index 7e8e297831f45..a93dc064e839c 100644
--- a/offload/liboffload/src/OffloadImpl.cpp
+++ b/offload/liboffload/src/OffloadImpl.cpp
@@ -592,6 +592,27 @@ Error olGetDeviceInfoSize_impl(ol_device_handle_t Device,
return olGetDeviceInfoImplDetail(Device, PropName, 0, nullptr, PropSizeRet);
}
+Error olElfIsCompatibleWithDevice_impl(ol_device_handle_t Device,
+ const void *ElfData, size_t ElfSize,
+ bool *IsCompatible) {
+ GenericDeviceTy *DeviceTy = Device->Device;
+ int32_t DeviceId = DeviceTy->getDeviceId();
+ GenericPluginTy &DevicePlugin = DeviceTy->Plugin;
+
+ StringRef Image(reinterpret_cast<const char *>(ElfData), ElfSize);
+
+ Expected<bool> ResultOrErr = DevicePlugin.isELFCompatible(DeviceId, Image);
+ if (!ResultOrErr) {
+ consumeError(ResultOrErr.takeError());
+ return createOffloadError(
+ ErrorCode::INVALID_ARGUMENT,
+ "elf compatibility can not be checked for device");
+ }
+
+ *IsCompatible = *ResultOrErr;
+ return Error::success();
+}
+
Error olIterateDevices_impl(ol_device_iterate_cb_t Callback, void *UserData) {
for (auto &Platform : OffloadContext::get().Platforms) {
for (auto &Device : Platform.Devices) {
|
6ddb6d5
to
614943f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ELF is too specific, just make it an image and let the runtime check if it's LLVM-IR, PTX, ELF, SPIR-V, whatever.
Thanks for your feedback! How would you do that since my goal here is to call the |
Use the same type of entry point |
This is preparation for the language offloading (I can't create stacked PRs here since I can't push to user branches). I will use the function here: |
We should instead let |
Thanks for making this - We've had to work around this in UR by querying the device itself and branching on AMD/Nvidia. I agree with the prior comments that this should accept any image, rather than just ELF binaries. However, I also wonder if this is the best way of doing it. It'd be nice to allow users to check whether a backend supports a specific format before they load (or even compile) a binary. I think a better design might be two new device info queries:
Both of which returning arrays of what formats/targets they support in priority order. Then the user can query ahead of time and load only the appropriate binary. |
Expected<bool> ResultOrErr = DevicePlugin.isELFCompatible(DeviceId, Image); | ||
if (!ResultOrErr) { | ||
consumeError(ResultOrErr.takeError()); | ||
return createOffloadError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
createOffloadError
has an overload which allows it to accept another error.
You mean |
Yes, you can detect CUDA fatbins w/ magic bytes, though the difficulty is getting the appropriate ELF out since we rely on binary tooling, I wonder if there's some API for that on the NVIDIA side. |
@jhuber6 The code for the detection exists, I just need to move it:
If you disable a few assertions across LLVM, you can technically load Cuda fat binaries. To not just hack it, I think it needs many changes across ofloading. |
@jhuber6 I looked more into it. There is a cuda function that can load fat binaries. The problem is that we will never know which Cubin was loaded. llvm-project/offload/plugins-nextgen/common/src/RPC.cpp Lines 163 to 167 in cc9acb9
The ELF is needed here: llvm-project/offload/plugins-nextgen/common/src/GlobalHandler.cpp Lines 88 to 94 in 023a98c
How about moving this fatbin loading logic |
Yeah that's what I mentioned about possibly needing to extract. Might be able to do without it for fall back to looking up the loaded symbol later. But yes, we should support this natively for compatible platforms. |
Introduces a new API to check whether an ELF binary is compatible with a given device.