[mlir][Target] Support Fatbin target for static nvptxcompiler #118044

MikaOvO · 2024-11-29T03:12:13Z

Background

In lib/Target/LLVM/NVVM/Target.cpp, NVPTXSerializer compile PTX to binary with two different flows controlled by MLIR_ENABLE_NVPTXCOMPILER.

If building mlir with -DMLIR_ENABLE_NVPTXCOMPILER=ON, the flow does not check if the target is gpu::CompilationTarget::Fatbin, and compile PTX to cubin directly, which is not consistent with another flow.

Implement

Use static nvfatbin library.

I have tested it locally, the two flows can return the same Fatbin result after inputing the same GpuModule.

llvmbot · 2024-11-29T03:12:45Z

@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir

Author: Zichen Lu (MikaOvO)

Changes

In lib/Target/LLVM/NVVM/Target.cpp, NVPTXSerializer compile PTX to binary with two different flows controlled by MLIR_ENABLE_NVPTXCOMPILER.

If building mlir with -DMLIR_ENABLE_NVPTXCOMPILER=ON, the flow does not check if the target is gpu::CompilationTarget::Fatbin, and compile PTX to cubin directly, which is not consistent with another flow.

I have tested it locally, the two flows can return the same Fatbin result after inputing the same GpuModule.

Full diff: https://github.com/llvm/llvm-project/pull/118044.diff

2 Files Affected:

(modified) mlir/lib/Target/LLVM/CMakeLists.txt (+14-2)
(modified) mlir/lib/Target/LLVM/NVVM/Target.cpp (+41)

diff --git a/mlir/lib/Target/LLVM/CMakeLists.txt b/mlir/lib/Target/LLVM/CMakeLists.txt
index 422f7e5fa7caec..4843377f1797fa 100644
--- a/mlir/lib/Target/LLVM/CMakeLists.txt
+++ b/mlir/lib/Target/LLVM/CMakeLists.txt
@@ -66,12 +66,14 @@ if ("NVPTX" IN_LIST LLVM_TARGETS_TO_BUILD)
       set(MLIR_CUDAToolkit_ROOT ${CUDAToolkit_LIBRARY_ROOT})
     endif()
 
-    # Add the `nvptxcompiler` library.
+    # Add the `nvptxcompiler` library and the `nvfatbin` library.
     if(MLIR_ENABLE_NVPTXCOMPILER)
       # Find the `nvptxcompiler` library.
       # TODO: Bump the MLIR CMake version to 3.25 and use `CUDA::nvptxcompiler_static`.
       find_library(MLIR_NVPTXCOMPILER_LIB_PATH nvptxcompiler_static
                   PATHS ${CUDAToolkit_LIBRARY_DIR} NO_DEFAULT_PATH)
+      find_library(MLIR_NVFATBIN_LIB_PATH nvfatbin_static
+                  PATHS ${CUDAToolkit_LIBRARY_DIR} NO_DEFAULT_PATH)
 
       # Fail if `nvptxcompiler_static` couldn't be found.
       if(MLIR_NVPTXCOMPILER_LIB_PATH STREQUAL "MLIR_NVPTXCOMPILER_LIB_PATH-NOTFOUND")
@@ -79,14 +81,24 @@ if ("NVPTX" IN_LIST LLVM_TARGETS_TO_BUILD)
                 "Requested using the `nvptxcompiler` library backend but it couldn't be found.")
       endif()
 
+      # Fail if `nvfatbin_static` couldn't be found.
+      if(MLIR_NVFATBIN_LIB_PATH STREQUAL "MLIR_NVFATBIN_LIB_PATH-NOTFOUND")
+        message(FATAL_ERROR
+                "Requested using the `nvfatbin` library backend but it couldn't be found.")
+      endif()    
+
       add_library(MLIR_NVPTXCOMPILER_LIB STATIC IMPORTED GLOBAL)
+      add_library(MLIR_NVFATBIN_LIB STATIC IMPORTED GLOBAL)
       # Downstream projects can modify this path and use it in CMake. For example:
       # add_library(MLIR_NVPTXCOMPILER_LIB STATIC IMPORTED GLOBAL)
       # set_property(TARGET MLIR_NVPTXCOMPILER_LIB PROPERTY IMPORTED_LOCATION ${...})  
       # where `...` is to be replaced with the path to the library.
       set_property(TARGET MLIR_NVPTXCOMPILER_LIB PROPERTY IMPORTED_LOCATION ${MLIR_NVPTXCOMPILER_LIB_PATH})
-      # Link against `nvptxcompiler_static`. TODO: use `CUDA::nvptxcompiler_static`.
+      set_property(TARGET MLIR_NVFATBIN_LIB PROPERTY IMPORTED_LOCATION ${MLIR_NVFATBIN_LIB_PATH})  
+      # Link against `nvptxcompiler_static` and `nvfatbin_static`. 
+      # TODO: use `CUDA::nvptxcompiler_static` and `CUDA::nvfatbin_static`.
       target_link_libraries(MLIRNVVMTarget PRIVATE MLIR_NVPTXCOMPILER_LIB)
+      target_link_libraries(MLIRNVVMTarget PRIVATE MLIR_NVFATBIN_LIB)
       target_include_directories(obj.MLIRNVVMTarget PUBLIC ${CUDAToolkit_INCLUDE_DIRS})
     endif()
   else()
diff --git a/mlir/lib/Target/LLVM/NVVM/Target.cpp b/mlir/lib/Target/LLVM/NVVM/Target.cpp
index bca26e3a0e84a9..1c5aa24dffc9d0 100644
--- a/mlir/lib/Target/LLVM/NVVM/Target.cpp
+++ b/mlir/lib/Target/LLVM/NVVM/Target.cpp
@@ -473,6 +473,18 @@ NVPTXSerializer::compileToBinary(const std::string &ptxCode) {
     }                                                                          \
   } while (false)
 
+#include <nvFatbin.h>
+
+#define RETURN_ON_NVFATBIN_ERROR(expr)                                         \
+  do {                                                                         \
+    auto result = (expr);                                                      \
+    if (result != nvFatbinResult::NVFATBIN_SUCCESS) {                          \
+      emitError(loc) << llvm::Twine(#expr).concat(" failed with error: ")      \
+                     << nvFatbinGetErrorString(result);                        \
+      return std::nullopt;                                                     \
+    }                                                                          \
+  } while (false)
+
 std::optional<SmallVector<char, 0>>
 NVPTXSerializer::compileToBinaryNVPTX(const std::string &ptxCode) {
   Location loc = getOperation().getLoc();
@@ -486,6 +498,11 @@ NVPTXSerializer::compileToBinaryNVPTX(const std::string &ptxCode) {
       targetOptions.tokenizeCmdOptions();
   cmdOpts.second.append(
       {"-arch", getTarget().getChip().data(), "--opt-level", optLevel.c_str()});
+  bool useFatbin32 = false;
+  for (const char *option : cmdOpts.second) {
+    if (StringRef(option) == "-32")
+      useFatbin32 = true;
+  }
 
   // Create the compiler handle.
   RETURN_ON_NVPTXCOMPILER_ERROR(
@@ -538,6 +555,30 @@ NVPTXSerializer::compileToBinaryNVPTX(const std::string &ptxCode) {
   });
 #undef DEBUG_TYPE
   RETURN_ON_NVPTXCOMPILER_ERROR(nvPTXCompilerDestroy(&compiler));
+
+  if (targetOptions.getCompilationTarget() == gpu::CompilationTarget::Fatbin) {
+    const char *cubinOpts[1] = {"-64"};
+    if (useFatbin32) {
+      cubinOpts[0] = {"-32"};
+    }
+    nvFatbinHandle handle;
+
+    auto chip = getTarget().getChip();
+    chip.consume_front("sm_");
+
+    RETURN_ON_NVFATBIN_ERROR(nvFatbinCreate(&handle, cubinOpts, 1));
+    RETURN_ON_NVFATBIN_ERROR(nvFatbinAddCubin(
+        handle, binary.data(), binary.size(), chip.data(), nullptr));
+    RETURN_ON_NVFATBIN_ERROR(nvFatbinAddPTX(
+        handle, ptxCode.data(), ptxCode.size(), chip.data(), nullptr, nullptr));
+
+    size_t fatbinSize;
+    RETURN_ON_NVFATBIN_ERROR(nvFatbinSize(handle, &fatbinSize));
+    SmallVector<char, 0> fatbin(fatbinSize, 0);
+    RETURN_ON_NVFATBIN_ERROR(nvFatbinGet(handle, (void *)fatbin.data()));
+    RETURN_ON_NVFATBIN_ERROR(nvFatbinDestroy(&handle));
+    return fatbin;
+  }
   return binary;
 }
 #endif // MLIR_ENABLE_NVPTXCOMPILER

MikaOvO · 2024-11-29T03:35:32Z

Should we add a MLIR_ENABLE_NVFATBIN?

Dinistro

Dropped a few drive by nits

mlir/lib/Target/LLVM/NVVM/Target.cpp

mlir/lib/Target/LLVM/CMakeLists.txt

github-actions · 2024-12-02T06:24:40Z

✅ With the latest revision this PR passed the C/C++ code formatter.

MikaOvO · 2024-12-02T06:27:35Z

Hi, I I added MLIR_ENABLE_NVPTXCOMPILER_NVFATBIN to prevent users who are using NVPTXCompiler from getting errors about not finding libnvfatbin_static.a.

Could you please help review this and provide suggestions on whether the macro RETURN_ON_NVFATBIN_ERROR needs to be changed to lambda? Thanks!
@joker-eph @grypp

joker-eph · 2024-12-05T21:22:49Z

Hi, I I added MLIR_ENABLE_NVPTXCOMPILER_NVFATBIN to prevent users who are using NVPTXCompiler from getting errors about not finding libnvfatbin_static.a.

What happens in this case:

If building mlir with -DMLIR_ENABLE_NVPTXCOMPILER=ON, the flow does not check if the target is gpu::CompilationTarget::Fatbin, and compile PTX to cubin directly, which is not consistent with another flow.

How do you preserve consistency when MLIR_ENABLE_NVPTXCOMPILER_NVFATBIN=OFF?

joker-eph · 2024-12-05T21:23:30Z

to prevent users who are using NVPTXCompiler from getting errors about not finding libnvfatbin_static.a.

Isn't libnvptx_compiler.a distributed alongside libnvfatbin_static.a? If yes, then which use-case is this addressing?

MikaOvO · 2024-12-06T03:21:28Z

How do you preserve consistency when MLIR_ENABLE_NVPTXCOMPILER_NVFATBIN=OFF?
Isn't libnvptx_compiler.a distributed alongside libnvfatbin_static.a? If yes, then which use-case is this addressing?

My concern is downstream projects that use custom extracted Cuda Toolkit (not extracted libnvfatbin_static.a) will directly cause the original build to fail after this MR if there is no MLIR_ENABLE_NVPTXCOMPILER_NVFATBIN.

Of course, this does not preserve consistency when MLIR_ENABLE_NVPTXCOMPILER_NVFATBIN=OFF.

joker-eph · 2024-12-06T05:09:48Z

My concern is downstream projects that use custom extracted Cuda Toolkit (not extracted libnvfatbin_static.a) will directly cause the original build to fail after this MR if there is no MLIR_ENABLE_NVPTXCOMPILER_NVFATBIN.

I don't see this as a problem? Seems like a legitimate break and they should update their flow.

Of course, this does not preserve consistency when MLIR_ENABLE_NVPTXCOMPILER_NVFATBIN=OFF.

That makes it something we should not support then IMO.
If someone care about this configuration, they could propose a patch to implement it fully I think (there is a command line version of the fatbin library I imagine?)

MikaOvO · 2024-12-06T05:24:24Z

You are right. I thought there are two approaches here:
The first is to remove MLIR_ENABLE_NVPTXCOMPILER_NVFATBIN directly, the other is using fatbinary (do you mean this command line version of the fatbin library?) when MLIR_ENABLE_NVPTXCOMPILER_NVFATBIN=OFF.

Which one is better? Thanks.

joker-eph · 2024-12-06T05:39:12Z

Either is fine, but I would take the simplest approach unless there is an actual identified need.

MikaOvO · 2024-12-10T06:47:48Z

I have fixed by removing MLIR_ENABLE_NVPTXCOMPILER_NVFATBIN, could you help review? Thanks! @joker-eph

MikaOvO requested a review from grypp as a code owner November 29, 2024 03:12

llvmbot added mlir:llvm mlir labels Nov 29, 2024

MikaOvO mentioned this pull request Nov 29, 2024

[mlir][nvvm] NVPTXSerializer does not support FATBIN target in compileToBinaryNVPTX flow #116747

Closed

Dinistro reviewed Nov 29, 2024

View reviewed changes

mlir/lib/Target/LLVM/NVVM/Target.cpp Outdated Show resolved Hide resolved

mlir/lib/Target/LLVM/NVVM/Target.cpp Show resolved Hide resolved

mlir/lib/Target/LLVM/NVVM/Target.cpp Outdated Show resolved Hide resolved

MikaOvO force-pushed the users/mikaovo/Support_Fatbin_target_for_static_nvptxcompiler branch from 05d10a6 to bbfcb48 Compare November 29, 2024 09:43

joker-eph reviewed Nov 29, 2024

View reviewed changes

mlir/lib/Target/LLVM/CMakeLists.txt Outdated Show resolved Hide resolved

MikaOvO force-pushed the users/mikaovo/Support_Fatbin_target_for_static_nvptxcompiler branch 2 times, most recently from 640d37c to 2b562b9 Compare December 2, 2024 06:21

MikaOvO force-pushed the users/mikaovo/Support_Fatbin_target_for_static_nvptxcompiler branch from 2b562b9 to 34456e0 Compare December 2, 2024 06:26

MikaOvO force-pushed the users/mikaovo/Support_Fatbin_target_for_static_nvptxcompiler branch 4 times, most recently from deb7ef5 to d6c59d4 Compare December 2, 2024 07:36

[mlir][Target] Support Fatbin target for static nvptxcompiler

66fea13

MikaOvO force-pushed the users/mikaovo/Support_Fatbin_target_for_static_nvptxcompiler branch from 91fdf32 to 66fea13 Compare December 10, 2024 06:01

joker-eph approved these changes Dec 10, 2024

View reviewed changes

joker-eph merged commit 4971e53 into llvm:main Dec 10, 2024
8 checks passed

[mlir][Target] Support Fatbin target for static nvptxcompiler #118044

[mlir][Target] Support Fatbin target for static nvptxcompiler #118044

Uh oh!

Conversation

MikaOvO commented Nov 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Background

Implement

Uh oh!

llvmbot commented Nov 29, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MikaOvO commented Nov 29, 2024

Uh oh!

Dinistro left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Dec 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MikaOvO commented Dec 2, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

joker-eph commented Dec 5, 2024

Uh oh!

joker-eph commented Dec 5, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MikaOvO commented Dec 6, 2024

Uh oh!

joker-eph commented Dec 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MikaOvO commented Dec 6, 2024

Uh oh!

joker-eph commented Dec 6, 2024

Uh oh!

MikaOvO commented Dec 10, 2024

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

MikaOvO commented Nov 29, 2024 •

edited

Loading

llvmbot commented Nov 29, 2024 •

edited

Loading

github-actions bot commented Dec 2, 2024 •

edited

Loading

MikaOvO commented Dec 2, 2024 •

edited

Loading

joker-eph commented Dec 5, 2024 •

edited

Loading

joker-eph commented Dec 6, 2024 •

edited

Loading