Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion mlir/include/mlir/Dialect/GPU/Transforms/Passes.td
Original file line number Diff line number Diff line change
Expand Up @@ -113,7 +113,9 @@ def GpuModuleToBinaryPass
Option<"compilationTarget", "format", "std::string", [{"fatbin"}],
"The target representation of the compilation process.">,
Option<"elfSection", "section", "std::string", [{""}],
"ELF section where binary is to be located.">
"ELF section where binary is to be located.">,
Option<"dumpIntermediates", "dump-intermediates", "std::string", [{""}],
"Directory to dump intermediate artifacts (LLVM IR, device assembly).">
];
}

Expand Down
71 changes: 68 additions & 3 deletions mlir/lib/Dialect/GPU/Transforms/ModuleToBinary.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,9 @@

#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringSwitch.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/ToolOutputFile.h"

using namespace mlir;
using namespace mlir::gpu;
Expand All @@ -26,6 +29,26 @@ namespace mlir {
#include "mlir/Dialect/GPU/Transforms/Passes.h.inc"
} // namespace mlir

static LogicalResult
dumpToFile(Operation *op, StringRef dumpDir, const llvm::Twine &filename,
function_ref<void(llvm::raw_ostream &)> writeContent) {
if (dumpDir.empty())
return success();

llvm::SmallString<128> path(dumpDir);
llvm::sys::path::append(path, filename);

std::error_code ec;
llvm::ToolOutputFile output(path, ec, llvm::sys::fs::OF_None);
if (ec)
return op->emitError() << "Failed to create file '" << path
<< "': " << ec.message();

writeContent(output.os());
output.keep();
return success();
}

namespace {
class GpuModuleToBinaryPass
: public impl::GpuModuleToBinaryPassBase<GpuModuleToBinaryPass> {
Expand Down Expand Up @@ -64,11 +87,53 @@ void GpuModuleToBinaryPass::runOnOperation() {
SmallVector<Attribute> librariesToLink;
for (const std::string &path : linkFiles)
librariesToLink.push_back(StringAttr::get(&getContext(), path));

Operation *op = getOperation();

// Create dump directory if specified.
if (!dumpIntermediates.empty()) {
if (std::error_code ec =
llvm::sys::fs::create_directories(dumpIntermediates)) {
op->emitError() << "Failed to create dump directory '"
<< dumpIntermediates << "': " << ec.message();
return signalPassFailure();
}
}

// Create callbacks for dumping intermediate artifacts if requested.
auto initialIRCallback = [&](llvm::Module &mod) {
if (failed(
dumpToFile(op, dumpIntermediates, mod.getName() + ".initial.ll",
[&](llvm::raw_ostream &os) { mod.print(os, nullptr); })))
signalPassFailure();
Comment on lines +107 to +110
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either we make callbacks always successful, ie. no signalPassFailure(), or we need to make callbacks return a LogicalResult and update everywhere, eg. https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Dialect/GPU/IR/CompilationInterfaces.h#L130-L133

IMO it's better making them return a LogicalResult.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either we make callbacks always successful, ie. no signalPassFailure(),

I don't quite see why? The LogicalResult does not necessarily seem relevant to the ModuleToObject::run logic right now.
The callbacks are setup here and the failure can be handled out-of-band.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this actually looks straightforward enough change, I can do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption of ModuleToObject logic is that these callbacks don't create invalid/failed states.
IMO if we allow them to trigger a signalPassFailure, ModuleToObject should fail, otherwise ModuleToObject could continue when it should abort.

In this particular case, while failing to dump it's unlikely to crash ModuleToObject, It's a waste of resources to let it continue if we know much earlier that the pass it's going to fail.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assumption of ModuleToObject logic is that these callbacks don't create invalid/failed states.
IMO if we allow them to trigger a signalPassFailure, ModuleToObject should fail, otherwise ModuleToObject could continue when it should abort.

I don't agree: from the perspective other the ModuleToObject logic. we want to preserve this assumption that these callbacks don't create invalid states.
The failure is unrelated to the ModuleToObject logic here, hence it should be transparent and handled in the pass.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does the pass knows if ModuleToObject succeeds?

ModuleToObject isn't failable right now, but I'm failing to link the question to the current discussion actually?

Also, in a frail system (take at least disk almost full, low ram, SWAP full...), the initial callback can fail, put the system in a an even more fragile state. Since ModuleToObject never became aware, it will continue, create some temp files (both NVVM and ROCDL targets create files), and fail again, but with a higher potential to crash, and it will be harder to debug where the problem started.

I don't quite see a concern with this, LLVM isn't resilient to system issues such as "swap filing up" or "machine running out of memory" in general.

Copy link
Contributor

@fabianmcg fabianmcg Dec 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is failable: https://github.com/llvm/llvm-project/blob/main/mlir/include/mlir/Target/LLVM/ModuleToObject.h#L44

(TODO for me, use FailureOr instead of std::optional)

While LLVM it's not resilient to the issues I mentioned. It makes for a better debugging/user experience to fail at the earliest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think making these callbacks fallible is good improvement regardless of this PR, and it doesn't looks like particularly complicated change, I can do it as separate PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have objections with the change depending on how it looks, I'm not convinced that these should be failable conceptually (in terms of what these APIs are meant to provide).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

};

auto linkedIRCallback = [&](llvm::Module &mod) {
if (failed(
dumpToFile(op, dumpIntermediates, mod.getName() + ".linked.ll",
[&](llvm::raw_ostream &os) { mod.print(os, nullptr); })))
signalPassFailure();
};

auto optimizedIRCallback = [&](llvm::Module &mod) {
if (failed(
dumpToFile(op, dumpIntermediates, mod.getName() + ".opt.ll",
[&](llvm::raw_ostream &os) { mod.print(os, nullptr); })))
signalPassFailure();
};

auto isaCallback = [&](StringRef isa) {
if (failed(dumpToFile(op, dumpIntermediates, "kernel.isa",
[&](llvm::raw_ostream &os) { os << isa; })))
signalPassFailure();
};

TargetOptions targetOptions(toolkitPath, librariesToLink, cmdOptions,
elfSection, *targetFormat, lazyTableBuilder);
elfSection, *targetFormat, lazyTableBuilder,
initialIRCallback, linkedIRCallback,
optimizedIRCallback, isaCallback);
if (failed(transformGpuModulesToBinaries(
getOperation(), OffloadingLLVMTranslationAttrInterface(nullptr),
targetOptions)))
op, OffloadingLLVMTranslationAttrInterface(nullptr), targetOptions)))
return signalPassFailure();
}

Expand Down
8 changes: 7 additions & 1 deletion mlir/lib/Target/LLVM/ModuleToObject.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -143,7 +143,13 @@ LogicalResult ModuleToObject::loadBitcodeFilesFromList(

std::unique_ptr<llvm::Module>
ModuleToObject::translateToLLVMIR(llvm::LLVMContext &llvmContext) {
return translateModuleToLLVMIR(&getOperation(), llvmContext);
Operation &op = getOperation();
StringRef name = "LLVMDialectModule";
// Try to get nicer name from the operation.
if (auto symOp = dyn_cast<SymbolOpInterface>(op);
symOp && symOp.getNameAttr())
name = symOp.getNameAttr().getValue();
return translateModuleToLLVMIR(&op, llvmContext, name);
}

LogicalResult
Expand Down
10 changes: 9 additions & 1 deletion mlir/lib/Target/LLVM/ROCDL/Target.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -95,7 +95,11 @@ SerializeGPUModuleBase::SerializeGPUModuleBase(
Operation &module, ROCDLTargetAttr target,
const gpu::TargetOptions &targetOptions)
: ModuleToObject(module, target.getTriple(), target.getChip(),
target.getFeatures(), target.getO()),
target.getFeatures(), target.getO(),
targetOptions.getInitialLlvmIRCallback(),
targetOptions.getLinkedLlvmIRCallback(),
targetOptions.getOptimizedLlvmIRCallback(),
targetOptions.getISACallback()),
target(target), toolkitPath(targetOptions.getToolkitPath()),
librariesToLink(targetOptions.getLibrariesToLink()) {

Expand Down Expand Up @@ -428,6 +432,10 @@ std::optional<SmallVector<char, 0>> SerializeGPUModuleBase::moduleToObjectImpl(
getOperation().emitError() << "failed translating the module to ISA";
return std::nullopt;
}

if (isaCallback)
isaCallback(serializedISA.value());

#define DEBUG_TYPE "serialize-to-isa"
LLVM_DEBUG({
llvm::dbgs() << "ISA for module: "
Expand Down
17 changes: 17 additions & 0 deletions mlir/test/Dialect/GPU/module-to-binary-nvvm-intermediates.mlir
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
// REQUIRES: host-supports-nvptx
// RUN: rm -rf %t
// RUN: mlir-opt %s --gpu-module-to-binary='format=isa dump-intermediates=%t' | FileCheck %s
// RUN: test -f %t/kernel_module.initial.ll
// RUN: test -f %t/kernel_module.linked.ll
// RUN: test -f %t/kernel_module.opt.ll
// RUN: test -f %t/kernel.isa

module attributes {gpu.container_module} {
// CHECK-LABEL: gpu.binary @kernel_module

gpu.module @kernel_module [#nvvm.target<chip = "sm_70">] {
llvm.func @kernel(%arg0: f32) {
llvm.return
}
}
}
17 changes: 17 additions & 0 deletions mlir/test/Dialect/GPU/module-to-binary-rocdl-intermediates.mlir
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
// REQUIRES: host-supports-amdgpu
// RUN: rm -rf %t
// RUN: mlir-opt %s --gpu-module-to-binary='format=isa dump-intermediates=%t' | FileCheck %s
// RUN: test -f %t/kernel_module.initial.ll
// RUN: test -f %t/kernel_module.linked.ll
// RUN: test -f %t/kernel_module.opt.ll
// RUN: test -f %t/kernel.isa

module attributes {gpu.container_module} {
// CHECK-LABEL: gpu.binary @kernel_module

gpu.module @kernel_module [#rocdl.target<chip = "gfx942">] {
llvm.func @kernel(%arg0: f32) {
llvm.return
}
}
}