
Conversation

@cadivus commented Aug 31, 2025

Introduces a new API to check whether an ELF binary is compatible with a given device.


Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository; in that case, you can instead tag reviewers by name in a comment, using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "pinging" the PR, i.e. adding a comment that says "Ping". The common courtesy ping rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot (Member) commented Aug 31, 2025

@llvm/pr-subscribers-offload

Author: Jonas Greifenhain (cadivus)

Changes

Introduces a new API to check whether an ELF binary is compatible with a given device.


Full diff: https://github.com/llvm/llvm-project/pull/156259.diff

2 Files Affected:

  • (modified) offload/liboffload/API/Device.td (+24)
  • (modified) offload/liboffload/src/OffloadImpl.cpp (+21)
diff --git a/offload/liboffload/API/Device.td b/offload/liboffload/API/Device.td
index 5b54c79d83f9d..10931fa1ca807 100644
--- a/offload/liboffload/API/Device.td
+++ b/offload/liboffload/API/Device.td
@@ -129,3 +129,27 @@ def olGetDeviceInfoSize : Function {
     Return<"OL_ERRC_INVALID_DEVICE">
   ];
 }
+
+def olElfIsCompatibleWithDevice : Function {
+  let desc = "Checks if the given ELF binary is compatible with the specified device.";
+  let details = [
+    "This function determines whether an ELF image can be executed on the specified device."
+  ];
+  let params = [
+    Param<"ol_device_handle_t", "Device", "handle of the device to check against", PARAM_IN>,
+    Param<"const void*", "ElfData", "pointer to the ELF image data in memory", PARAM_IN>,
+    Param<"size_t", "ElfSize", "size in bytes of the ELF image", PARAM_IN>,
+    Param<"bool*", "IsCompatible", "set to true if the ELF is compatible, false otherwise", PARAM_OUT>
+  ];
+  let returns = [
+    Return<"OL_ERRC_INVALID_DEVICE", [
+      "If the provided device handle is invalid."
+    ]>,
+    Return<"OL_ERRC_INVALID_ARGUMENT", [
+      "If `ElfData` is null or `ElfSize` is zero."
+    ]>,
+    Return<"OL_ERRC_NULL_POINTER", [
+      "If `IsCompatible` is null."
+    ]>
+  ];
+}
diff --git a/offload/liboffload/src/OffloadImpl.cpp b/offload/liboffload/src/OffloadImpl.cpp
index 7e8e297831f45..a93dc064e839c 100644
--- a/offload/liboffload/src/OffloadImpl.cpp
+++ b/offload/liboffload/src/OffloadImpl.cpp
@@ -592,6 +592,27 @@ Error olGetDeviceInfoSize_impl(ol_device_handle_t Device,
   return olGetDeviceInfoImplDetail(Device, PropName, 0, nullptr, PropSizeRet);
 }
 
+Error olElfIsCompatibleWithDevice_impl(ol_device_handle_t Device,
+                                       const void *ElfData, size_t ElfSize,
+                                       bool *IsCompatible) {
+  GenericDeviceTy *DeviceTy = Device->Device;
+  int32_t DeviceId = DeviceTy->getDeviceId();
+  GenericPluginTy &DevicePlugin = DeviceTy->Plugin;
+
+  StringRef Image(reinterpret_cast<const char *>(ElfData), ElfSize);
+
+  Expected<bool> ResultOrErr = DevicePlugin.isELFCompatible(DeviceId, Image);
+  if (!ResultOrErr) {
+    consumeError(ResultOrErr.takeError());
+    return createOffloadError(
+        ErrorCode::INVALID_ARGUMENT,
+        "elf compatibility can not be checked for device");
+  }
+
+  *IsCompatible = *ResultOrErr;
+  return Error::success();
+}
+
 Error olIterateDevices_impl(ol_device_iterate_cb_t Callback, void *UserData) {
   for (auto &Platform : OffloadContext::get().Platforms) {
     for (auto &Device : Platform.Devices) {

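For illustration, a minimal sketch of how a caller might use the proposed entry point. The helper below is not part of the patch, and it assumes the generated C entry point signals failure with a non-null ol_result_t; the exact convention should be checked against the generated Offload header.

// Illustrative helper, not part of this patch: returns true only if the call
// succeeds and reports the ELF as compatible with the device.
static bool canRunElfOnDevice(ol_device_handle_t Device, const void *ElfData,
                              size_t ElfSize) {
  bool IsCompatible = false;
  if (olElfIsCompatibleWithDevice(Device, ElfData, ElfSize, &IsCompatible))
    return false; // The query itself failed; treat the image as unusable.
  return IsCompatible;
}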
@cadivus force-pushed the llvm_upstream/elf-compatibility-check branch from 6ddb6d5 to 614943f on August 31, 2025 at 17:52
@jhuber6 (Contributor) left a comment

ELF is too specific, just make it an image and let the runtime check if it's LLVM-IR, PTX, ELF, SPIR-V, whatever.

@cadivus (Author) commented Aug 31, 2025

> ELF is too specific, just make it an image and let the runtime check if it's LLVM-IR, PTX, ELF, SPIR-V, whatever.

Thanks for your feedback! How would you do that since my goal here is to call the isELFCompatible function of the device plugin?

@jhuber6 (Contributor) commented Aug 31, 2025

> ELF is too specific, just make it an image and let the runtime check if it's LLVM-IR, PTX, ELF, SPIR-V, whatever.

> Thanks for your feedback! How would you do that since my goal here is to call the isELFCompatible function of the device plugin?

Use the same type of entry point libomptarget uses. We only support LLVM-IR and ELF right now, but that will likely be expanded in the future. Though, a part of me is wondering if we even need this. We already have a load-image function. Couldn't people just try to load the image and check the error? I guess this is necessary in cases where people want to know whether an image can run on a device, but don't actually want to load it even if it can.
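For reference, a rough sketch of that "just try to load it" alternative. The olCreateProgram signature and the olDestroyProgram call are assumptions here and should be checked against the generated Offload header.

// Hypothetical probe: attempt to load the image and discard the program on
// success. olCreateProgram/olDestroyProgram signatures are assumed.
static bool probeByLoading(ol_device_handle_t Device, const void *ImageData,
                           size_t ImageSize) {
  ol_program_handle_t Program = nullptr;
  if (olCreateProgram(Device, ImageData, ImageSize, &Program))
    return false; // Load failed; treat the image as incompatible.
  olDestroyProgram(Program); // We only wanted the yes/no answer.
  return true;
}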

@cadivus (Author) commented Aug 31, 2025

> Though, a part of me is wondering if we even need this.

This is preparation for the language offloading (I can't create stacked PRs here since I can't push to user branches). I will use the function here:
https://github.com/jdoerfert/llvm-project/blob/llvm_kernel_languages/offload/languages/kernel/src/LanguageRegistration.cpp

@jhuber6 (Contributor) commented Aug 31, 2025

> Though, a part of me is wondering if we even need this.

> This is preparation for the language offloading (I can't create stacked PRs here since I can't push to user branches). I will use the function here: https://github.com/jdoerfert/llvm-project/blob/llvm_kernel_languages/offload/languages/kernel/src/LanguageRegistration.cpp

We should instead let olCreateImage or whatever it's called consume a fat binary directly. If we specifically need a query then we can have olIsImageSupported or some other function name. Basically, we want a single entry point and rely on the runtime to detect the appropriate magic bits.
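For illustration, detection of that sort might look roughly like this; the enum and helper are made up for the example and are not the liboffload implementation.

#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical classifier, dispatching on the leading magic bytes of an image.
enum class ImageKind { Elf, LlvmBitcode, SpirV, CudaFatbin, Unknown };

static ImageKind classifyImage(const void *Data, size_t Size) {
  if (!Data || Size < 4)
    return ImageKind::Unknown;
  const auto *Bytes = static_cast<const unsigned char *>(Data);
  if (!std::memcmp(Bytes, "\x7f" "ELF", 4))
    return ImageKind::Elf;
  if (!std::memcmp(Bytes, "BC\xc0\xde", 4))
    return ImageKind::LlvmBitcode;
  uint32_t Word;
  std::memcpy(&Word, Bytes, sizeof(Word));
  if (Word == 0x07230203) // SPIR-V module magic (host-endian check only).
    return ImageKind::SpirV;
  if (Word == 0x466243b1) // NVIDIA fatbin wrapper magic (see snippet below).
    return ImageKind::CudaFatbin;
  return ImageKind::Unknown;
}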

@RossBrunton (Contributor) commented

Thanks for making this. We've had to work around this in UR by querying the device itself and branching on AMD/NVIDIA.

I agree with the prior comments that this should accept any image, rather than just ELF binaries.

However, I also wonder if this is the best way of doing it. It'd be nice to allow users to check whether a backend supports a specific format before they load (or even compile) a binary. I think a better design might be two new device info queries:

getDeviceInfo(OL_DEVICE_INFO_IMAGE_FORMAT) -> { IMAGE_FORMAT_ELF, IMAGE_FORMAT_ELF_BUNDLED }
getDeviceInfo(OL_DEVICE_INFO_IMAGE_TARGET) -> { IMAGE_TARGET_AMD }

Both would return arrays of the formats/targets they support, in priority order. Then the user can query ahead of time and load only the appropriate binary.
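From the caller's side, that proposal might look roughly like the sketch below. All of the OL_DEVICE_INFO_IMAGE_* and OL_IMAGE_FORMAT_* names are hypothetical, and olGetDeviceInfo's exact signature should be taken from the generated header.

#include <cstddef>
#include <vector>

// Hypothetical sketch of the proposal; none of the OL_*IMAGE* names exist yet.
size_t SizeInBytes = 0;
olGetDeviceInfoSize(Device, OL_DEVICE_INFO_IMAGE_FORMAT, &SizeInBytes);
std::vector<ol_image_format_t> Formats(SizeInBytes / sizeof(ol_image_format_t));
olGetDeviceInfo(Device, OL_DEVICE_INFO_IMAGE_FORMAT, SizeInBytes, Formats.data());
// Formats come back in priority order; take the first one we can supply.
for (ol_image_format_t Format : Formats)
  if (Format == OL_IMAGE_FORMAT_ELF)
    break; // We ship an ELF image for this target; use it.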

@RossBrunton self-requested a review on September 1, 2025 at 09:28
  Expected<bool> ResultOrErr = DevicePlugin.isELFCompatible(DeviceId, Image);
  if (!ResultOrErr) {
    consumeError(ResultOrErr.takeError());
    return createOffloadError(

createOffloadError has an overload which allows it to accept another error.
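i.e. something like the following; the parameter order of that overload is assumed here and should be checked against the helper's declaration:

  if (!ResultOrErr)
    return createOffloadError(ErrorCode::INVALID_ARGUMENT, ResultOrErr.takeError(),
                              "elf compatibility can not be checked for device");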

@cadivus (Author) commented Sep 2, 2025

> We should instead let olCreateImage or whatever it's called consume a fat binary directly.

You mean olCreateProgram in offload/liboffload/API/Program.td? So I should basically move the loading functions from https://github.com/jdoerfert/llvm-project/blob/llvm_kernel_languages/offload/languages/kernel/src/LanguageRegistration.cpp there?

@jhuber6 (Contributor) commented Sep 2, 2025

Yes, you can detect CUDA fatbins with magic bytes, though the difficulty is getting the appropriate ELF out, since we rely on binary tooling. I wonder if there's some API for that on the NVIDIA side.

@cadivus (Author) commented Sep 2, 2025

@jhuber6 The code for the detection exists, I just need to move it:

const char *llvmRegisterFatBinary(const char *Binary) {

  const auto *FW = reinterpret_cast<const FatbinWrapperTy *>(Binary);
  // printf("%s : %i : %s (%p:%p) :: %i\n", __PRETTY_FUNCTION__, FW->Magic,
  //        FW->Data, FW->Data, FW->DataEnd, FW->Version);

  // printf("%s : %s : %lu\n", FW->Data, HIP_FATBIN_MAGIC_STR,
  //        HIP_FATBIN_MAGIC_STR_LEN);
  if (FW->Magic == 0x466243b1) {
    readTUFatbin(Binary, FW);
  } else if (FW->Magic == 0x48495046) {
    if (!memcmp(FW->Data, HIP_FATBIN_MAGIC_STR, HIP_FATBIN_MAGIC_STR_LEN))
      readHIPFatbinEntries(Binary, FW->Data);
    else
      readTUFatbin(Binary, FW);
  } else {
    fprintf(stderr, "Unknown fatbin format");
  }

  return Binary;
}

(https://github.com/jdoerfert/llvm-project/blob/3c8c45d17024356438278f48f9d9e894f96c8c4e/offload/languages/kernel/src/LanguageRegistration.cpp#L219)

If you disable a few assertions across LLVM, you can technically load CUDA fat binaries. To do it properly rather than just hacking it in, I think it needs many changes across offloading.

@cadivus (Author) commented Sep 2, 2025

@jhuber6 I looked more into it. There is a CUDA function that can load fat binaries. The problem is that we will never know which cubin was loaded.

bool RPCServerTy::isDeviceUsingRPC(plugin::GenericDeviceTy &Device,
                                   plugin::GenericGlobalHandlerTy &Handler,
                                   plugin::DeviceImageTy &Image) {
  return Handler.isSymbolInImage(Device, Image, "__llvm_rpc_client");
}

The ELF is needed here:

bool GenericGlobalHandlerTy::isSymbolInImage(GenericDeviceTy &Device,
                                             DeviceImageTy &Image,
                                             StringRef SymName) {
  // Get the ELF object file for the image. Notice the ELF object may already
  // be created in previous calls, so we can reuse it. If this is unsuccessful
  // just return false as we couldn't find it.
  auto ELFObjOrErr = getELFObjectFile(Image);

How about moving this fatbin loading logic
https://github.com/jdoerfert/llvm-project/blob/3c8c45d17024356438278f48f9d9e894f96c8c4e/offload/languages/kernel/src/LanguageRegistration.cpp#L57
to olCreateProgram (and the same for HIP) and closing this PR?

@jhuber6 (Contributor) commented Sep 2, 2025

Yeah, that's what I mentioned about possibly needing to extract it. We might be able to do without it, or fall back to looking up the loaded symbol later. But yes, we should support this natively for compatible platforms.

@cadivus (Author) commented Sep 4, 2025

@jhuber6 This PR implements the fat bin loading for CUDA: #156955
If the approach is fine, I will create a PR for AMD HIP too.
