Skip to content

Conversation

@jackalcooper
Copy link
Contributor

@jackalcooper jackalcooper commented Aug 13, 2025

This PR introduces a mechanism to defer JIT engine initialization, enabling registration of required symbols before global constructor execution.

Problem

Modules containing gpu.module generate global constructors (e.g., kernel load/unload) that execute during engine creation. This can force premature symbol resolution, causing failures when:

  • Symbols are registered via mlirExecutionEngineRegisterSymbol after creation
  • Global constructors exist (even if not directly using unresolved symbols, e.g., an external function declaration)
  • GPU modules introduce mandatory binary loading logic

Usage

// Create engine without initialization
MlirExecutionEngine jit = mlirExecutionEngineCreate(..., /*shouldInitialize=*/false);

// Register required symbols
mlirExecutionEngineRegisterSymbol(jit, ...);

// Explicitly initialize (runs global constructors)
mlirExecutionEngineInitialize(jit);

@llvmbot
Copy link
Member

llvmbot commented Aug 13, 2025

@llvm/pr-subscribers-mlir

@llvm/pr-subscribers-mlir-execution-engine

Author: Shenghang Tsai (jackalcooper)

Changes

This PR introduces a mechanism to defer JIT engine initialization, enabling registration of required symbols before global constructor execution.

Problem

Modules containing GPU operations generate global constructors (e.g., kernel load/unload) that execute during engine creation. This can force premature symbol resolution, causing failures when:

  • Symbols are registered via mlirExecutionEngineRegisterSymbol after creation
  • Global constructors exist (even if not directly using unresolved symbols, e.g., an external function declaration)
  • GPU modules are present due to their mandatory constructor logic

Usage

// Create engine without initialization
MlirExecutionEngine jit = mlirExecutionEngineCreate(..., /*shouldInitialize=*/false);

// Register required symbols
mlirExecutionEngineRegisterSymbol(jit, ...);

// Explicitly initialize (runs global constructors)
mlirExecutionEngineInitialize(jit);

Full diff: https://github.com/llvm/llvm-project/pull/153373.diff

6 Files Affected:

  • (modified) mlir/include/mlir-c/ExecutionEngine.h (+7-3)
  • (modified) mlir/include/mlir/ExecutionEngine/ExecutionEngine.h (+7)
  • (modified) mlir/lib/CAPI/ExecutionEngine/ExecutionEngine.cpp (+6-1)
  • (modified) mlir/lib/ExecutionEngine/ExecutionEngine.cpp (+15-8)
  • (modified) mlir/test/CAPI/execution_engine.c (+57-2)
  • (modified) mlir/unittests/ExecutionEngine/Invoke.cpp (+40)
diff --git a/mlir/include/mlir-c/ExecutionEngine.h b/mlir/include/mlir-c/ExecutionEngine.h
index 99cddc5c2598d..eee56ee032c97 100644
--- a/mlir/include/mlir-c/ExecutionEngine.h
+++ b/mlir/include/mlir-c/ExecutionEngine.h
@@ -42,9 +42,13 @@ DEFINE_C_API_STRUCT(MlirExecutionEngine, void);
 /// that will be loaded are specified via `numPaths` and `sharedLibPaths`
 /// respectively.
 /// TODO: figure out other options.
-MLIR_CAPI_EXPORTED MlirExecutionEngine mlirExecutionEngineCreate(
-    MlirModule op, int optLevel, int numPaths,
-    const MlirStringRef *sharedLibPaths, bool enableObjectDump);
+MLIR_CAPI_EXPORTED MlirExecutionEngine
+mlirExecutionEngineCreate(MlirModule op, int optLevel, int numPaths,
+                          const MlirStringRef *sharedLibPaths,
+                          bool enableObjectDump, bool shouldInitialize);
+
+/// Execute the global constructors from the module.
+MLIR_CAPI_EXPORTED void mlirExecutionEngineInitialize(MlirExecutionEngine jit);
 
 /// Destroy an ExecutionEngine instance.
 MLIR_CAPI_EXPORTED void mlirExecutionEngineDestroy(MlirExecutionEngine jit);
diff --git a/mlir/include/mlir/ExecutionEngine/ExecutionEngine.h b/mlir/include/mlir/ExecutionEngine/ExecutionEngine.h
index 96ccebcd5685e..49b5d11685f49 100644
--- a/mlir/include/mlir/ExecutionEngine/ExecutionEngine.h
+++ b/mlir/include/mlir/ExecutionEngine/ExecutionEngine.h
@@ -99,6 +99,9 @@ struct ExecutionEngineOptions {
   /// If `enablePerfNotificationListener` is set, the JIT compiler will notify
   /// the llvm's global Perf notification listener.
   bool enablePerfNotificationListener = true;
+
+  /// Setting initialize=false to defer initialization
+  bool shouldInitialize = true;
 };
 
 /// JIT-backed execution engine for MLIR. Assumes the IR can be converted to
@@ -227,6 +230,8 @@ class ExecutionEngine {
       llvm::function_ref<llvm::orc::SymbolMap(llvm::orc::MangleAndInterner)>
           symbolMap);
 
+  void initialize();
+
 private:
   /// Ordering of llvmContext and jit is important for destruction purposes: the
   /// jit must be destroyed before the context.
@@ -250,6 +255,8 @@ class ExecutionEngine {
   /// Destroy functions in the libraries loaded by the ExecutionEngine that are
   /// called when this ExecutionEngine is destructed.
   SmallVector<LibraryDestroyFn> destroyFns;
+
+  bool isInitialized = false;
 };
 
 } // namespace mlir
diff --git a/mlir/lib/CAPI/ExecutionEngine/ExecutionEngine.cpp b/mlir/lib/CAPI/ExecutionEngine/ExecutionEngine.cpp
index 306cebd236be9..624858c10eb7c 100644
--- a/mlir/lib/CAPI/ExecutionEngine/ExecutionEngine.cpp
+++ b/mlir/lib/CAPI/ExecutionEngine/ExecutionEngine.cpp
@@ -22,7 +22,7 @@ using namespace mlir;
 extern "C" MlirExecutionEngine
 mlirExecutionEngineCreate(MlirModule op, int optLevel, int numPaths,
                           const MlirStringRef *sharedLibPaths,
-                          bool enableObjectDump) {
+                          bool enableObjectDump, bool shouldInitialize) {
   static bool initOnce = [] {
     llvm::InitializeNativeTarget();
     llvm::InitializeNativeTargetAsmParser(); // needed for inline_asm
@@ -60,6 +60,7 @@ mlirExecutionEngineCreate(MlirModule op, int optLevel, int numPaths,
   jitOptions.jitCodeGenOptLevel = static_cast<llvm::CodeGenOptLevel>(optLevel);
   jitOptions.sharedLibPaths = libPaths;
   jitOptions.enableObjectDump = enableObjectDump;
+  jitOptions.shouldInitialize = shouldInitialize;
   auto jitOrError = ExecutionEngine::create(unwrap(op), jitOptions);
   if (!jitOrError) {
     consumeError(jitOrError.takeError());
@@ -68,6 +69,10 @@ mlirExecutionEngineCreate(MlirModule op, int optLevel, int numPaths,
   return wrap(jitOrError->release());
 }
 
+extern "C" void mlirExecutionEngineInitialize(MlirExecutionEngine jit) {
+  unwrap(jit)->initialize();
+}
+
 extern "C" void mlirExecutionEngineDestroy(MlirExecutionEngine jit) {
   delete (unwrap(jit));
 }
diff --git a/mlir/lib/ExecutionEngine/ExecutionEngine.cpp b/mlir/lib/ExecutionEngine/ExecutionEngine.cpp
index f704fbfbe8fff..7862cda3203c8 100644
--- a/mlir/lib/ExecutionEngine/ExecutionEngine.cpp
+++ b/mlir/lib/ExecutionEngine/ExecutionEngine.cpp
@@ -106,7 +106,7 @@ void ExecutionEngine::dumpToObjectFile(StringRef filename) {
   }
   // Compilation is lazy and it doesn't populate object cache unless requested.
   // In case object dump is requested before cache is populated, we need to
-  // force compilation manually. 
+  // force compilation manually.
   if (cache->isEmpty()) {
     for (std::string &functionName : functionNames) {
       auto result = lookupPacked(functionName);
@@ -400,13 +400,9 @@ ExecutionEngine::create(Operation *m, const ExecutionEngineOptions &options,
     return symbolMap;
   };
   engine->registerSymbols(runtimeSymbolMap);
-
-  // Execute the global constructors from the module being processed.
-  // TODO: Allow JIT initialize for AArch64. Currently there's a bug causing a
-  // crash for AArch64 see related issue #71963.
-  if (!engine->jit->getTargetTriple().isAArch64())
-    cantFail(engine->jit->initialize(engine->jit->getMainJITDylib()));
-
+  if (options.shouldInitialize) {
+    engine->initialize();
+  }
   return std::move(engine);
 }
 
@@ -451,3 +447,14 @@ Error ExecutionEngine::invokePacked(StringRef name,
 
   return Error::success();
 }
+
+void ExecutionEngine::initialize() {
+  if (isInitialized) {
+    return;
+  }
+  // TODO: Allow JIT initialize for AArch64. Currently there's a bug causing a
+  // crash for AArch64 see related issue #71963.
+  if (!jit->getTargetTriple().isAArch64())
+    cantFail(jit->initialize(jit->getMainJITDylib()));
+  isInitialized = true;
+}
diff --git a/mlir/test/CAPI/execution_engine.c b/mlir/test/CAPI/execution_engine.c
index 4751288c3ee4b..1bf6d5d72be6e 100644
--- a/mlir/test/CAPI/execution_engine.c
+++ b/mlir/test/CAPI/execution_engine.c
@@ -69,7 +69,7 @@ void testSimpleExecution(void) {
   mlirRegisterAllLLVMTranslations(ctx);
   MlirExecutionEngine jit = mlirExecutionEngineCreate(
       module, /*optLevel=*/2, /*numPaths=*/0, /*sharedLibPaths=*/NULL,
-      /*enableObjectDump=*/false);
+      /*enableObjectDump=*/false, /*shouldInitialize=*/true);
   if (mlirExecutionEngineIsNull(jit)) {
     fprintf(stderr, "Execution engine creation failed");
     exit(2);
@@ -125,7 +125,7 @@ void testOmpCreation(void) {
   // against the OpenMP library.
   MlirExecutionEngine jit = mlirExecutionEngineCreate(
       module, /*optLevel=*/2, /*numPaths=*/0, /*sharedLibPaths=*/NULL,
-      /*enableObjectDump=*/false);
+      /*enableObjectDump=*/false, /*shouldInitialize=*/true);
   if (mlirExecutionEngineIsNull(jit)) {
     fprintf(stderr, "Engine creation failed with OpenMP");
     exit(2);
@@ -137,6 +137,60 @@ void testOmpCreation(void) {
   mlirContextDestroy(ctx);
 }
 
+// Helper variable to track callback invocations
+static int initCnt = 0;
+
+// Callback function that will be called during JIT initialization
+static void initCallback(void) { initCnt += 1; }
+
+// CHECK-LABEL: Running test 'testGlobalCtorJitCallback'
+void testGlobalCtorJitCallback(void) {
+  MlirContext ctx = mlirContextCreate();
+  registerAllUpstreamDialects(ctx);
+
+  // Create module with global constructor that calls our callback
+  MlirModule module = mlirModuleCreateParse(
+      ctx, mlirStringRefCreateFromCString(
+               // clang-format off
+"module {                                                                       \n"
+"  llvm.mlir.global_ctors ctors = [@ctor], priorities = [0 : i32], data = [#llvm.zero] \n"
+"  llvm.func @ctor() {                                                          \n"
+"    func.call @init_callback() : () -> ()                                      \n"
+"    llvm.return                                                                \n"
+"  }                                                                            \n"
+"  func.func private @init_callback() attributes { llvm.emit_c_interface }      \n"
+"}                                                                              \n"
+               // clang-format on
+               ));
+
+  lowerModuleToLLVM(ctx, module);
+  mlirRegisterAllLLVMTranslations(ctx);
+
+  // Create execution engine with initialization disabled
+  MlirExecutionEngine jit = mlirExecutionEngineCreate(
+      module, /*optLevel=*/2, /*numPaths=*/0, /*sharedLibPaths=*/NULL,
+      /*enableObjectDump=*/false, /*shouldInitialize=*/false);
+
+  if (mlirExecutionEngineIsNull(jit)) {
+    fprintf(stderr, "Execution engine creation failed");
+    exit(2);
+  }
+
+  // Register callback symbol before initialization
+  mlirExecutionEngineRegisterSymbol(
+      jit, mlirStringRefCreateFromCString("_mlir_ciface_init_callback"),
+      (void *)(uintptr_t)initCallback);
+
+  mlirExecutionEngineInitialize(jit);
+
+  // CHECK: Init count: 1
+  printf("Init count: %d\n", initCnt);
+
+  mlirExecutionEngineDestroy(jit);
+  mlirModuleDestroy(module);
+  mlirContextDestroy(ctx);
+}
+
 int main(void) {
 
 #define _STRINGIFY(x) #x
@@ -147,5 +201,6 @@ int main(void) {
 
   TEST(testSimpleExecution);
   TEST(testOmpCreation);
+  TEST(testGlobalCtorJitCallback);
   return 0;
 }
diff --git a/mlir/unittests/ExecutionEngine/Invoke.cpp b/mlir/unittests/ExecutionEngine/Invoke.cpp
index 887db227cfc4b..f44fe76faff1b 100644
--- a/mlir/unittests/ExecutionEngine/Invoke.cpp
+++ b/mlir/unittests/ExecutionEngine/Invoke.cpp
@@ -316,4 +316,44 @@ TEST(NativeMemRefJit, MAYBE_JITCallback) {
     ASSERT_EQ(elt, coefficient * count++);
 }
 
+static int initCnt = 0;
+// A helper function that will be called during the JIT's initialization.
+static void initCallback() { initCnt += 1; }
+
+TEST(GlobalCtorJit, MAYBE_JITCallback) {
+  std::string moduleStr = R"mlir(
+  llvm.mlir.global_ctors ctors = [@ctor], priorities = [0 : i32], data = [#llvm.zero]
+  llvm.func @ctor() {
+    func.call @init_callback() : () -> ()
+    llvm.return
+  }
+  func.func private @init_callback() attributes { llvm.emit_c_interface }
+  )mlir";
+
+  DialectRegistry registry;
+  registerAllDialects(registry);
+  registerBuiltinDialectTranslation(registry);
+  registerLLVMDialectTranslation(registry);
+  MLIRContext context(registry);
+  auto module = parseSourceString<ModuleOp>(moduleStr, &context);
+  ASSERT_TRUE(!!module);
+  ASSERT_TRUE(succeeded(lowerToLLVMDialect(*module)));
+  ExecutionEngineOptions jitOptions;
+  // Defer the initialization to register symbols used in ctors.
+  jitOptions.shouldInitialize = false;
+  auto jitOrError = ExecutionEngine::create(*module, jitOptions);
+  ASSERT_TRUE(!!jitOrError);
+  auto jit = std::move(jitOrError.get());
+  // Define any extra symbols so they're available at initialization.
+  jit->registerSymbols([&](llvm::orc::MangleAndInterner interner) {
+    llvm::orc::SymbolMap symbolMap;
+    symbolMap[interner("_mlir_ciface_init_callback")] = {
+        llvm::orc::ExecutorAddr::fromPtr(initCallback),
+        llvm::JITSymbolFlags::Exported};
+    return symbolMap;
+  });
+  jit->initialize();
+  ASSERT_EQ(initCnt, 1);
+}
+
 #endif // _WIN32

bool enablePerfNotificationListener = true;

/// Setting initialize=false to defer initialization
bool shouldInitialize = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we need a flag for this, seems to me that since we introduce a separate initialize() method we don't need an option and users can just call initialize() separately.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why we need a flag for this, seems to me that since we introduce a separate initialize() method we don't need an option and users can just call initialize() separately.

It is introduced to reduce the changes downstream has to make.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not the usual criteria of development in LLVM though: we aim to keep the code here maintainable and preserve our ability to innovate. Downstream know they need to adapt when we find design issues we fix upstream.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we lazily initialize() when needed if the user hasn't called initialize()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we lazily initialize() when needed if the user hasn't called initialize()?

Good idea. The shouldInitialize flag is removed. Inside invokePacked we initialize() (noop if already initialized).

[](PyExecutionEngine &executionEngine) {
mlirExecutionEngineInitialize(executionEngine.get());
},
"Initialize the ExecutionEngine.")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand the doc? This doc isn't very useful as it just repeats the method name. We need to describe the expectations from the user point of view.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand the doc? This doc isn't very useful as it just repeats the method name. We need to describe the expectations from the user point of view.

Docs for C++, C, and Python has been version-synchronized, with added guidance for calling different 'register symbols' variants.

@joker-eph joker-eph changed the title [MLIR][GPU][CAPI] Deferred ExecutionEngine Initialization for Symbol Registration [MLIR] Split ExecutionEngine Initialization out of ctor into an explicit method call Aug 13, 2025
Copy link
Collaborator

@joker-eph joker-eph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

@joker-eph joker-eph merged commit 2f93693 into llvm:main Aug 13, 2025
9 checks passed
@kstoimenov
Copy link
Contributor

I think this broke sanitizer bots: https://lab.llvm.org/buildbot/#/builders/24/builds/11523. Could you please take a look?

@joker-eph
Copy link
Collaborator

Reverting in #153477 to unbreak the bot during the investigation.
@jackalcooper please open a new PR with the fix when you have it.

joker-eph added a commit that referenced this pull request Aug 13, 2025
…an explicit method call" (#153477)

Reverts #153373

Sanitizer bot is broken
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 13, 2025
… ctor into an explicit method call" (#153477)

Reverts llvm/llvm-project#153373

Sanitizer bot is broken
makslevental pushed a commit to makslevental/python_bindings_fork that referenced this pull request Aug 14, 2025
…an explicit method call" (#153477)

Reverts llvm/llvm-project#153373

Sanitizer bot is broken
makslevental pushed a commit to makslevental/python_bindings_fork that referenced this pull request Aug 14, 2025
…an explicit method call" (#153477)

Reverts llvm/llvm-project#153373

Sanitizer bot is broken
makslevental pushed a commit to makslevental/python_bindings_fork that referenced this pull request Aug 14, 2025
…an explicit method call" (#153477)

Reverts llvm/llvm-project#153373

Sanitizer bot is broken
bump-llvm bot pushed a commit to makslevental/python_bindings_fork that referenced this pull request Aug 14, 2025
…an explicit method call" (#153477)

Reverts llvm/llvm-project#153373

Sanitizer bot is broken
makslevental pushed a commit to makslevental/python_bindings_fork that referenced this pull request Aug 14, 2025
…an explicit method call" (#153477)

Reverts llvm/llvm-project#153373

Sanitizer bot is broken
bump-llvm bot pushed a commit to makslevental/python_bindings_fork that referenced this pull request Aug 14, 2025
…an explicit method call" (#153477)

Reverts llvm/llvm-project#153373

Sanitizer bot is broken
joker-eph added a commit that referenced this pull request Aug 17, 2025
…cit method call (#153524)

Retry landing #153373
## Major changes from previous attempt
- remove the test in CAPI because no existing tests in CAPI deal with
sanitizer exemptions
- update `mlir/docs/Dialects/GPU.md` to reflect the new behavior: load
GPU binary in global ctors, instead of loading them at call site.
- skip the test on Aarch64 since we have an issue with initialization there

---------

Co-authored-by: Mehdi Amini <[email protected]>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 17, 2025
…to an explicit method call (#153524)

Retry landing llvm/llvm-project#153373
## Major changes from previous attempt
- remove the test in CAPI because no existing tests in CAPI deal with
sanitizer exemptions
- update `mlir/docs/Dialects/GPU.md` to reflect the new behavior: load
GPU binary in global ctors, instead of loading them at call site.
- skip the test on Aarch64 since we have an issue with initialization there

---------

Co-authored-by: Mehdi Amini <[email protected]>
makslevental pushed a commit to makslevental/python_bindings_fork that referenced this pull request Aug 18, 2025
…cit method call (#153524)

Retry landing llvm/llvm-project#153373
## Major changes from previous attempt
- remove the test in CAPI because no existing tests in CAPI deal with
sanitizer exemptions
- update `mlir/docs/Dialects/GPU.md` to reflect the new behavior: load
GPU binary in global ctors, instead of loading them at call site.
- skip the test on Aarch64 since we have an issue with initialization there

---------

Co-authored-by: Mehdi Amini <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants