Conversation

@xal-0
Member

@xal-0 xal-0 commented Nov 4, 2025

Overview

This PR overhauls the way linking works in Julia, in both the JIT and AOT compilation. The goal is to generate LLVM IR that depends only on the source IR, eliminating both nondeterminism and statefulness. This serves two purposes. First, if the IR is predictable, we can cache compiled objects using the bitcode hash as a key, similar to how the ThinLTO cache works; #58592 was an early experiment along these lines. Second, we can reuse work done in a previous session, like pkgimages, but for the JIT.

We accomplish this by generating names that are unique only within the current LLVM module, removing most uses of the globalUniqueGeneratedNames counter. The replacement for jl_codegen_params_t, jl_codegen_output_t, represents a Julia "translation unit", and tracks the information we'll need to link the compiled module into the running session. When linking, we manipulate the JITLink LinkGraph (after compilation) instead of renaming functions in the LLVM IR (before).
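As a rough illustration of the caching this enables (a toy Python model, not Julia's implementation; the `ObjectCache` class and compile callback are made up), deterministic bitcode can serve directly as a content-addressed cache key:

```python
import hashlib

# Toy model of a content-addressed object cache: because the emitted IR now
# depends only on the source IR, the bitcode bytes themselves can serve as
# the cache key, as in LLVM's ThinLTO cache.  All names are hypothetical.
class ObjectCache:
    def __init__(self):
        self._store = {}

    def key(self, bitcode: bytes) -> str:
        # Stable key: hash of the (deterministic) bitcode.
        return hashlib.sha256(bitcode).hexdigest()

    def get_or_compile(self, bitcode: bytes, compile_fn):
        k = self.key(bitcode)
        if k not in self._store:
            self._store[k] = compile_fn(bitcode)
        return self._store[k]

cache = ObjectCache()
calls = []
compile_fn = lambda bc: (calls.append(bc), b"obj:" + bc)[1]

obj1 = cache.get_or_compile(b"define double @julia_baz_0 ...", compile_fn)
obj2 = cache.get_or_compile(b"define double @julia_baz_0 ...", compile_fn)
assert obj1 == obj2 and len(calls) == 1  # second lookup is a cache hit
```

With nondeterministic names (e.g. a global counter in the symbol name), the two bitcode buffers would differ byte-for-byte and the cache would never hit.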

Example

julia> @noinline foo(x) = x + 2.0
       baz(x) = foo(foo(x))

       code_llvm(baz, (Int64,); dump_module=true, optimize=false)

Nightly:

[...]
@"+Core.Float64#774" = private unnamed_addr constant ptr @"+Core.Float64#774.jit"
@"+Core.Float64#774.jit" = private alias ptr, inttoptr (i64 4797624416 to ptr)

; Function Signature: baz(Int64)
;  @ REPL[1]:2 within `baz`
define double @julia_baz_772(i64 signext %"x::Int64") #0 {
top:
  %pgcstack = call ptr @julia.get_pgcstack()
  %0 = call double @j_foo_775(i64 signext %"x::Int64")
  %1 = call double @j_foo_776(double %0)
  ret double %1
}

; Function Attrs: noinline optnone
define nonnull ptr @jfptr_baz_773(ptr %"function::Core.Function", ptr noalias nocapture noundef readonly %"args::Any[]", i32 %"nargs::UInt32") #1 {
top:
  %pgcstack = call ptr @julia.get_pgcstack()
  %0 = getelementptr inbounds i8, ptr %"args::Any[]", i32 0
  %1 = load ptr, ptr %0, align 8
  %.unbox = load i64, ptr %1, align 8
  %2 = call double @julia_baz_772(i64 signext %.unbox)
  %"+Core.Float64#774" = load ptr, ptr @"+Core.Float64#774", align 8
  %Float64 = ptrtoint ptr %"+Core.Float64#774" to i64
  %3 = inttoptr i64 %Float64 to ptr
  %current_task = getelementptr inbounds i8, ptr %pgcstack, i32 -152
  %"box::Float64" = call noalias nonnull align 8 dereferenceable(8) ptr @julia.gc_alloc_obj(ptr %current_task, i64 8, ptr %3) #5
  store double %2, ptr %"box::Float64", align 8
  ret ptr %"box::Float64"
}
[...]

Diff after this PR. Notice how each symbol gets the lowest possible integer suffix that will make it unique to the module, and how the two specializations for foo get different names:

@@ -4,18 +4,18 @@
 target triple = "arm64-apple-darwin24.6.0"
 
-@"+Core.Float64#774" = external global ptr
+@"+Core.Float64#_0" = external global ptr
 
 ; Function Signature: baz(Int64)
 ;  @ REPL[1]:2 within `baz`
-define double @julia_baz_772(i64 signext %"x::Int64") #0 {
+define double @julia_baz_0(i64 signext %"x::Int64") #0 {
 top:
   %pgcstack = call ptr @julia.get_pgcstack()
-  %0 = call double @j_foo_775(i64 signext %"x::Int64")
-  %1 = call double @j_foo_776(double %0)
+  %0 = call double @j_foo_0(i64 signext %"x::Int64")
+  %1 = call double @j_foo_1(double %0)
   ret double %1
 }
 
 ; Function Attrs: noinline optnone
-define nonnull ptr @jfptr_baz_773(ptr %"function::Core.Function", ptr noalias nocapture noundef readonly %"args::Any[]", i32 %"nargs::UInt32") #1 {
+define nonnull ptr @jfptr_baz_0(ptr %"function::Core.Function", ptr noalias nocapture noundef readonly %"args::Any[]", i32 %"nargs::UInt32") #1 {
 top:
   %pgcstack = call ptr @julia.get_pgcstack()
@@ -23,7 +23,7 @@
   %1 = load ptr, ptr %0, align 8
   %.unbox = load i64, ptr %1, align 8
-  %2 = call double @julia_baz_772(i64 signext %.unbox)
-  %"+Core.Float64#774" = load ptr, ptr @"+Core.Float64#774", align 8
-  %Float64 = ptrtoint ptr %"+Core.Float64#774" to i64
+  %2 = call double @julia_baz_0(i64 signext %.unbox)
+  %"+Core.Float64#_0" = load ptr, ptr @"+Core.Float64#_0", align 8
+  %Float64 = ptrtoint ptr %"+Core.Float64#_0" to i64
   %3 = inttoptr i64 %Float64 to ptr
   %current_task = getelementptr inbounds i8, ptr %pgcstack, i32 -152
@@ -39,8 +39,8 @@
 
 ; Function Signature: foo(Int64)
-declare double @j_foo_775(i64 signext) #3
+declare double @j_foo_0(i64 signext) #3
 
 ; Function Signature: foo(Float64)
-declare double @j_foo_776(double) #4
+declare double @j_foo_1(double) #4
 
 attributes #0 = { "frame-pointer"="all" "julia.fsig"="baz(Int64)" "probe-stack"="inline-asm" }
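The suffix scheme can be sketched in a few lines (a hypothetical helper, not the PR's actual code): one counter per base name, scoped to the module, so the chosen names depend only on emission order within that module:

```python
# Sketch of module-local symbol naming: each base name gets the lowest
# integer suffix not yet used in this module, replacing the old
# globalUniqueGeneratedNames counter.
def make_namer():
    counters = {}
    def fresh(base: str) -> str:
        n = counters.get(base, 0)
        counters[base] = n + 1
        return f"{base}_{n}"
    return fresh

fresh = make_namer()
assert fresh("julia_baz") == "julia_baz_0"
assert fresh("j_foo") == "j_foo_0"   # foo(Int64) specialization
assert fresh("j_foo") == "j_foo_1"   # foo(Float64) specialization

# A second module starts from scratch, so identical source IR yields
# identical names -- the property that makes the output cacheable.
fresh2 = make_namer()
assert fresh2("julia_baz") == "julia_baz_0"
```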

List of changes

  • Many sources of statefulness and nondeterminism in the emitted LLVM IR have been eliminated, namely:

    • Function symbols defined for CodeInstances
    • Global symbols referring to data on the Julia heap
    • Undefined function symbols referring to invoked external CodeInstances
  • jl_codegen_params_t has become jl_codegen_output_t. It now represents one Julia "translation unit". More than one CodeInstance can be emitted to the same jl_codegen_output_t, if desired, though in the JIT every CI currently gets its own. One motivation is to allow emitting code on multiple threads and to avoid the bitcode serialize/deserialize step we currently do, if that proves worthwhile.

    When we are done emitting to a jl_codegen_output_t, we call .finish(), which discards the intermediate state and returns only the LLVM module and the info needed for linking (jl_linker_info_t).

  • The new JLMaterializationUnit wraps emitting Julia LLVM modules and the associated jl_linker_info_t. It informs ORC that we can materialize symbols for the CIs defined by that output, and picks globally unique names for them. When it is materialized, it resolves all the call targets and generates trampolines for CodeInstances that are invoked but have the wrong calling convention, or are not yet compiled.

  • We now postpone linking decisions to after codegen whenever possible. For example, emit_invoke no longer tries to find a compiled version of the CodeInstance, and it no longer generates trampolines to adapt calling conventions. jl_analyze_workqueue's job has been absorbed into JuliaOJIT::linkOutput.

  • Some image_codegen differences have been removed:

    • Codegen no longer cares if a compiled CodeInstance came from an image. During ahead-of-time linking, we generate thunk functions that load the address from the fvars table.
  • jl_emit_native_impl now emits every CodeInstance into one jl_codegen_output_t. We also defer the creation of the llvm::Linker for llvmcalls, whose construction cost grows with the size of the destination module, until the very end.

  • RTDyld is removed completely, since it does not give us the control over linking that JITLink does. Since Add JLJITLinkMemoryManager (ports memory manager to JITLink) #60105, platforms that previously used the optimized memory manager now use the new one.
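The jl_codegen_output_t lifecycle described in the list above can be modeled in a few lines (a Python sketch with invented names; the real type holds an LLVM module and a jl_linker_info_t):

```python
# Rough model of the "translation unit" lifecycle: several code instances
# can be emitted into one output; finish() drops the intermediate codegen
# state and keeps only what the linker needs.
class CodegenOutput:
    def __init__(self):
        self.module = []          # stand-in for the LLVM module
        self.call_targets = []    # stand-in for jl_linker_info_t
        self._scratch = {}        # intermediate state, discarded by finish()

    def emit(self, ci_name, invokes=()):
        self.module.append(f"define {ci_name}")
        self.call_targets.extend(invokes)
        self._scratch[ci_name] = "per-CI codegen state"

    def finish(self):
        module, info = self.module, self.call_targets
        self.module = self.call_targets = self._scratch = None
        return module, info

out = CodegenOutput()
out.emit("julia_baz_0", invokes=["j_foo_0", "j_foo_1"])
module, linker_info = out.finish()
assert linker_info == ["j_foo_0", "j_foo_1"]
assert out.module is None  # intermediate state is gone
```

Deferring the call-target resolution into the returned linker info is what lets linking decisions move to after codegen.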

General refactoring

  • Adapt the jl_callingconv_t enum from staticdata.c into jl_invoke_api_t and use it in more places. There is one enumerator for each special jl_callptr_t function that can go in a CodeInstance's invoke field, as well as one that indicates an invoke wrapper should be there. There is a convenience function for reading an invoke pointer and getting the API type, and vice versa.
  • Avoid using magic string values, and try to directly pass pointers to LLVM Function * or ORC string pool entries when possible.
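A sketch of the jl_invoke_api_t idea (a Python model; the enumerators and pointer values here are illustrative, not Julia's actual identifiers):

```python
from enum import Enum, auto

# One enumerator per special jl_callptr_t that can sit in a CodeInstance's
# invoke field, plus one meaning "an ordinary invoke wrapper is present".
class InvokeAPI(Enum):
    NULL = auto()          # invoke pointer not yet set
    FPTR1 = auto()         # generic args-array entry point
    SPECSIG = auto()       # specialized-signature entry point
    CONST_RETURN = auto()  # returns a constant, no code needed
    WRAPPER = auto()       # a generated invoke wrapper is present

# Pretend addresses of the special functions.
SPECIAL_PTRS = {0xA1: InvokeAPI.FPTR1, 0xA2: InvokeAPI.SPECSIG,
                0xA3: InvokeAPI.CONST_RETURN}
PTR_FOR_API = {api: p for p, api in SPECIAL_PTRS.items()}

def invoke_api_for_ptr(ptr: int) -> InvokeAPI:
    if ptr == 0:
        return InvokeAPI.NULL
    # Any pointer that is not one of the special functions must be a
    # generated invoke wrapper for that particular CodeInstance.
    return SPECIAL_PTRS.get(ptr, InvokeAPI.WRAPPER)

assert invoke_api_for_ptr(0) is InvokeAPI.NULL
assert invoke_api_for_ptr(0xA1) is InvokeAPI.FPTR1
assert invoke_api_for_ptr(0xBEEF) is InvokeAPI.WRAPPER
assert PTR_FOR_API[InvokeAPI.CONST_RETURN] == 0xA3
```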

Future work

  • DLSymOptimizer should be mostly removed, in favour of emitting raw ccalls and redirecting them to the appropriate target during linking.

  • We should support ahead-of-time linking multiple jl_codegen_output_ts together, in order to parallelize LLVM IR emission when compiling a system image.

  • We still pass strings to emit_call_specfun_other, even though the prototype for the function is now created by jl_codegen_output_t::get_call_target. We should hold on to the calling convention info so it doesn't have to be recomputed.

xal-0 added commits (first batch November 3, 2025 15:28):
Use JITLink everywhere

Rename jlcall_type, add jl_funcs_invoke_ptr

Move JLLinkingLayer into JuliaOJIT

Use jl_invoke_api_t elsewhere

Rename JL_INVOKE_JFPTR -> JL_INVOKE_SPECSIG

Put all special symbol names in one place

Add helper for specsig -> tojlinvoke (fptr1) and use it

Fix invariants for code_outputs

Document JIT invariants better; remove invalid assertions

Replace workqueue, partially support OpaqueClosure

Add JIT tests

Stop using strings so much

Don't create an LLVM::Linker unless necessary

Generate trampolines in aot_link_output

GCChecker annotations, misc changes

Re-add emit_always_inline

Get JLDebuginfoPlugin and eh_frame working again

Re-add OpaqueClosure MethodInstance global root

Fix GCChecker annotations

Clean up TODOs

Read dump compile

Use multiple threads in the JIT

Add PLT/GOT for external fns

Name Julia PLT GOT entries

Do emit_llvmcall_modules at the end

Suppress clang-tidy, static analyzer warnings

Keep temporary_roots alive during emit_always_inline

Mark pkg PLT thunks noinline

Don't attempt to emit inline codeinsts when IR is too large or missing

Improve thunk generation on x86

Fix infinite loop in emit_always_inline if inlining not possible

Use local names for global targets

Fix jl_get_llvmf_defn_impl cfunction hacks
@xal-0 xal-0 added the compiler:codegen (Generation of LLVM IR and native code) and compiler:llvm (For issues that relate to LLVM) labels Nov 4, 2025
Comment on lines 872 to 877
class JLMaterializationUnit : public orc::MaterializationUnit {
public:
static JLMaterializationUnit Create(JuliaOJIT &JIT, ObjectLinkingLayer &OL,
std::unique_ptr<jl_linker_info_t> Info,
std::unique_ptr<MemoryBuffer> Obj) JL_NOTSAFEPOINT
{
Member:

Nice! I have been wanting this for a long time!

Would it make sense to have a C-API for creating these? So that LLVM.jl could create them?

Member Author:

Possibly, though I would not want to expose it in a way that would lock in some of the design choices, like how JLMaterializationUnit owns the object buffer.

I'm undecided on how much work should be deferred to materialization. Right now jl_compile_codeinst_now blocks all threads waiting on compilation until everything is compiled to object files, like on master. I'd like to leave the door open to letting ORC decide when to compile.

Member:

Yeah, I have been wanting to try an ORC-based setup for GPUCompiler.

Member Author:

Still no C API, but fwiw I have switched this most recent version over to doing compilation in JLMaterializationUnit::materialize.

xal-0 added a commit to xal-0/julia that referenced this pull request Nov 11, 2025
Ports our RTDyLD memory manager to JITLink in order to avoid memory use
regressions after switching to JITLink everywhere (JuliaLang#60031).  This is
essentially a direct port: finalization must happen all at once, because
it invalidates all allocation `wr_ptr`s.  I decided it wasn't worth it
to associate `OnFinalizedFunction` callbacks with each block, since they
are large enough to make it extremely likely that all in-flight
allocations land in the same block; everything must be relocated before
finalization can happen.

I plan to add support for DualMapAllocator on ARM64 macOS, as well as an
alternative for executable memory to come later.  For now, we fall back
to the old MapperJITLinkMemoryManager.
xal-0 added a commit to xal-0/julia that referenced this pull request Nov 11, 2025
Ports our RTDyLD memory manager to JITLink in order to avoid memory use
regressions after switching to JITLink everywhere (JuliaLang#60031).  This is a
direct port: finalization must happen all at once, because it
invalidates all allocation `wr_ptr`s.  I decided it wasn't worth it to
associate `OnFinalizedFunction` callbacks with each block, since they
are large enough to make it extremely likely that all in-flight
allocations land in the same block; everything must be relocated before
finalization can happen.

I plan to add support for DualMapAllocator on ARM64 macOS, as well as an
alternative for executable memory later.  For now, we fall back to the
old MapperJITLinkMemoryManager.
xal-0 added a commit to xal-0/julia that referenced this pull request Nov 11, 2025
Ports our RTDyLD memory manager to JITLink in order to avoid memory use
regressions after switching to JITLink everywhere (JuliaLang#60031).  This is a
direct port: finalization must happen all at once, because it
invalidates all allocation `wr_ptr`s.  I decided it wasn't worth it to
associate `OnFinalizedFunction` callbacks with each block, since they
are large enough to make it extremely likely that all in-flight
allocations land in the same block; everything must be relocated before
finalization can happen.

I plan to add support for DualMapAllocator on ARM64 macOS, as well as an
alternative for executable memory later.  For now, we fall back to the
old MapperJITLinkMemoryManager.

Release JLJITLinkMemoryManager lock when calling FinalizedCallbacks
xal-0 added a commit to xal-0/julia that referenced this pull request Nov 11, 2025
Ports our RTDyLD memory manager to JITLink in order to avoid memory use
regressions after switching to JITLink everywhere (JuliaLang#60031).  This is a
direct port: finalization must happen all at once, because it
invalidates all allocation `wr_ptr`s.  I decided it wasn't worth it to
associate `OnFinalizedFunction` callbacks with each block, since they
are large enough to make it extremely likely that all in-flight
allocations land in the same block; everything must be relocated before
finalization can happen.

I plan to add support for DualMapAllocator on ARM64 macOS, as well as an
alternative for executable memory later.  For now, we fall back to the
old MapperJITLinkMemoryManager.

Release JLJITLinkMemoryManager lock when calling FinalizedCallbacks
xal-0 added a commit to xal-0/julia that referenced this pull request Nov 11, 2025
Ports our RTDyLD memory manager to JITLink in order to avoid memory use
regressions after switching to JITLink everywhere (JuliaLang#60031).  This is a
direct port: finalization must happen all at once, because it
invalidates all allocation `wr_ptr`s.  I decided it wasn't worth it to
associate `OnFinalizedFunction` callbacks with each block, since they
are large enough to make it extremely likely that all in-flight
allocations land in the same block; everything must be relocated before
finalization can happen.

I plan to add support for DualMapAllocator on ARM64 macOS, as well as an
alternative for executable memory later.  For now, we fall back to the
old MapperJITLinkMemoryManager.

Release JLJITLinkMemoryManager lock when calling FinalizedCallbacks
xal-0 added a commit to xal-0/julia that referenced this pull request Nov 11, 2025
Ports our RTDyLD memory manager to JITLink in order to avoid memory use
regressions after switching to JITLink everywhere (JuliaLang#60031).  This is a
direct port: finalization must happen all at once, because it
invalidates all allocation `wr_ptr`s.  I decided it wasn't worth it to
associate `OnFinalizedFunction` callbacks with each block, since they
are large enough to make it extremely likely that all in-flight
allocations land in the same block; everything must be relocated before
finalization can happen.

I plan to add support for DualMapAllocator on ARM64 macOS, as well as an
alternative for executable memory later.  For now, we fall back to the
old MapperJITLinkMemoryManager.

Release JLJITLinkMemoryManager lock when calling FinalizedCallbacks
xal-0 added a commit that referenced this pull request Nov 13, 2025
@adienes
Member

adienes commented Nov 14, 2025

eliminating both nondeterminism and the effect of redefining methods in the same session

there are several open issues observing inference changes when methods are redefined; does this PR affect those?

@xal-0
Member Author

xal-0 commented Nov 14, 2025

No, this PR only changes code generation.

Unfortunately the "portable" LLVM way of generating thunks doesn't generate the
code we want.  Instead, on platforms where it makes sense, we'll steal the LLD
PLT thunk code, but in disassembled form.  At some point this should be moved to
after linking, where it can be in assembled form again.  Amusingly it will be
more portable in assembled form, because the assembler syntax for relocations
differs between object formats.
@xal-0
Member Author

xal-0 commented Nov 18, 2025

This new commit fixes some horrible code generation in emit_pkg_plt_thunk by just emitting inline assembly, using PLT thunks stolen from LLD. This will be less hacky when it happens after linking. Since that requires the renaming of symbols post-compilation, it is out of scope for this PR.

KristofferC pushed a commit that referenced this pull request Nov 19, 2025
(cherry picked from commit 6fa0e75)
xal-0 added a commit to xal-0/julia that referenced this pull request Nov 20, 2025
Replace all uses of `ptrdiff_t slide` and `int64_t slide` with `uint64_t`.  If a
JITted object is ever assigned an address in the upper half of the address space
on a platform with `sizeof(char *) = 4`, which is quite common on 32-bit Linux,
the following can happen:

In JITDebugInfoRegistry::registerJITObject, `SectionAddr - SectionLoadAddr`
is computed in uint64_t (ok), then cast to ptrdiff_t (two's complement of
the uint64_t version mod 2^32).  This is apparently implementation-defined
behaviour rather than undefined.

Say SectionAddr = 0x1000UL, SectionLoadAddr = 0xe93b2000UL and
size_t pointer = 0xe93b20abU.
```
(ptrdiff_t)(SectionAddr - SectionLoadAddr) == (ptrdiff_t)0xffffffff16c4f000
                                           == 382005248
```

jl_DI_for_fptr implicitly converts the ptrdiff_t to int64_t:
```
(int64_t)382005248 == 382005248L
```

lookup_pointer adds `size_t pointer` to `int64_t slide`.  Both are converted
to int64_t because it can represent every size_t:
```
(int64_t)0xe93b20abU + 382005248L == 3912966315L + 382005248L
                                  == 4294971563L
```

This is converted back to uint64_t by makeAddress, resulting in an address other
than the 0x10ab we expected:
```
(uint64_t)4294971563L == 0x1000010abUL
```

It is easier to use unsigned integers everywhere we need a difference, since
they avoid the problem of losing upper bits after sign extension and avoid weird
UB from signed overflow.

Cherry-picked from JuliaLang#60031.

[1] https://buildkite.com/julialang/julia-master/builds/52196/steps/canvas?sid=019a9d6f-14a6-4ffc-be19-f2f835d1e719
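The conversion chain in the commit message above can be replayed with plain integers (a sketch; the masking stands in for the C integer widths on a 32-bit platform):

```python
# Model C integer widths with Python ints.
M32, M64 = (1 << 32) - 1, (1 << 64) - 1

def to_i32(u):
    # Reinterpret the low 32 bits as a signed value (ptrdiff_t on ILP32).
    u &= M32
    return u - (1 << 32) if u >= (1 << 31) else u

SectionAddr, SectionLoadAddr, pointer = 0x1000, 0xE93B2000, 0xE93B20AB

# uint64_t subtraction wraps; the true slide is this 64-bit value.
slide_u64 = (SectionAddr - SectionLoadAddr) & M64
assert slide_u64 == 0xFFFFFFFF16C4F000

# Old path: truncate to 32-bit ptrdiff_t, widen to int64_t, then add.
slide_bad = to_i32(slide_u64)
assert slide_bad == 382005248
assert (pointer + slide_bad) & M64 == 0x1000010AB   # wrong address

# New path: keep everything uint64_t; wraparound gives the right answer.
assert (pointer + slide_u64) & M64 == 0x10AB
```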
@xal-0 xal-0 requested a review from vtjnash February 2, 2026 23:11
@xal-0
Member Author

xal-0 commented Feb 10, 2026

@nanosoldier runtests()

@nanosoldier
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

Report summary

❗ Packages that crashed

1 package crashed only on the current version.

  • A segmentation fault happened: 1 package

272 packages crashed on the previous version too.

✖ Packages that failed

35 packages failed only on the current version.

  • Package fails to precompile: 1 package
  • Package has test failures: 6 packages
  • Package tests unexpectedly errored: 3 packages
  • Tests became inactive: 1 package
  • Test duration exceeded the time limit: 24 packages

1203 packages failed on the previous version too.

✔ Packages that passed tests

5 packages passed tests only on the current version.

  • Other: 5 packages

5557 packages passed tests on the previous version too.

~ Packages that at least loaded

3439 packages successfully loaded on the previous version too.

➖ Packages that were skipped altogether

1 package was skipped only on the current version.

  • Package could not be installed: 1 package

907 packages were skipped on the previous version too.

@maleadt
Member

maleadt commented Feb 10, 2026

Is this expected to be breaking for low-level packages like AllocCheck or CompilerCaching.jl? Both now fail with duplicate definition errors when putting stuff in the Julia JIT (LLVM error: Duplicate definition of symbol 'jfptr_parent_0').

@vtjnash
Member

vtjnash commented Feb 10, 2026

Putting stuff directly in Julia's JIT is generally expected to be fairly buggy, but we should make a new JITDylib for each thing you put in, to at least keep such trivial conflicts from crashing.

@gbaraldi
Member

They are putting it in the external JITDylib. Enzyme will also have an issue here; the idea is to have good debug info. We may need a replacement API that does the linking tricks.

// now add it to our compilation results
jl_code_instance_t *codeinst = (jl_code_instance_t*)item;

// TODO: check
Member:

check what?

Member Author:

The "TODO: check" comments are places where I wanted to return to make sure the behaviour is unchanged from upstream. In this case I need to make sure I didn't cause a regression from aab7490: if it's okay to skip JL_CI_FLAGS_FROM_IMAGE code instances here and emit them later, in emit_always_inline.

@xal-0
Copy link
Member Author

xal-0 commented Feb 10, 2026

RE: @maleadt

Is this expected to be breaking for low-level packages like AllocCheck or CompilerCaching.jl?

Yes, but I added a new flag in CodegenParams to smooth the transition, while we figure out a more stable API for packages that integrate closely with the JIT.

The fix for AllocCheck.jl is to make the following change to GPUCompiler.jl:

diff --git i/src/jlgen.jl w/src/jlgen.jl
index 2812a02..ffae1d5 100644
--- i/src/jlgen.jl
+++ w/src/jlgen.jl
@@ -752,6 +752,9 @@ function compile_method_instance(@nospecialize(job::CompilerJob))
     if v"1.12.0-DEV.2126" <= VERSION < v"1.13-" || VERSION >= v"1.13.0-DEV.285"
         cgparams = (; force_emit_all = true , cgparams...)
     end
+    if v"1.14.0-DEV.1688" <= VERSION
+        cgparams = (; unique_names = true, cgparams...)
+    end
     params = Base.CodegenParams(; cgparams...)
 
     # generate IR

That makes jl_emit_native produce session-unique names for functions, so they can be added directly to the JIT like before. I will have to do some testing with CompilerCaching.jl.

}

static jl_llvm_functions_t jl_emit_oc_wrapper(orc::ThreadSafeModule &m, jl_codegen_params_t &params, jl_method_instance_t *mi, jl_value_t *rettype)
// TODO: handle jl_invoke_type properly
Member:

This sounds important, although I don't know what jl_invoke_type is (jl_invoke_api_t?)

Should this be fixed up, or elaborated so that it's clearer what's left TODO?

# define jl_unreachable() __builtin_unreachable()
#else
# define jl_unreachable() ((void)jl_assume(0))
#endif
Member:

We probably want this to depend on JL_NDEBUG and be more like assert(false) in debug builds, right?

P.S. I have wanted this for ages

Member Author:

I don't disagree, but all I did was move it from the bottom of this file :P

@xal-0
Member Author

xal-0 commented Feb 12, 2026

@nanosoldier runtests(["CompilerCaching", "FMICore", "FunctionOperators", "Visor", "BorrowChecker", "DataFlowTasks", "Keccak", "RungeKuttaToolKit", "AllocCheck", "Nemo", "VectorizationBase", "Ariadne", "FactorRotations", "CrystalNets", "Juniper", "GLPK", "SCS", "Clarabel", "CDDLib", "DMRGenie", "RegularizedProblems", "Tesserae", "IMASdd", "SPlit", "EffectiveWaves", "GenericCharacterTables", "BEAST", "GongBetaAdrenergicSignaling", "StochasticDelayDiffEq", "SpiDy", "PeriodicMatrices", "Trixi", "AlgebraOfGraphics", "HypersurfaceRegions", "PowerGraphics", "UnfoldMakie"])

@xal-0 xal-0 force-pushed the local-names-linking branch from 576b136 to 13e2328 on February 12, 2026 19:09
@xal-0
Member Author

xal-0 commented Feb 12, 2026

13e2328 fixes a bug that would cause bootstrapping to fail maybe 1/20 times, but I'd appreciate feedback on how I fixed it. This is an overview of the life cycle of a CodeInstance, post-local-names-linking (to be cleaned up and added to devdocs, eventually):

  1. A fresh CodeInstance is created by calling jl_new_codeinst.
  2. The CodeInstance is rooted: inference roots CodeInstances that will be added to the JIT by publishing them to the global cache. jl_eval_thunk does not root the MethodInstance the CodeInstance belongs to globally, but keeps it alive until it is invoked.
  3. jl_emit_codeinst_to_jit is called, creating a jl_emitted_output_t with the LLVM IR and jl_linker_info_t for this CodeInstance.
  4. jl_emit_codeinsts_to_jit then adds the output to the JIT, with JuliaOJIT::addOutput. addOutput takes the LinkerMutex, creating a JLMaterializationUnit for the CodeInstance.
  5. JLMaterializationUnit::Create chooses fresh ORC symbols for the CodeInstance functions, installing the CodeInstance => CISymbolPtr pair in the CISymbols map.
  6. The JLMaterializationUnit is added to JD, defining the new symbols in CISymbols, while still holding the LinkerMutex lock. If we did not hold the lock, a CodeInstance that invokes this CodeInstance could be added, and could look up the ORC symbols before a definition is available.
  7. It is now okay for the CodeInstance to be garbage collected, if that is possible (currently, only possible for top-level thunks).
  8. Compilation of the CodeInstance is triggered by jl_compile_codeinst. This calls JuliaOJIT::publishCIs, which triggers an ORC lookup for the symbols for that CI, found in CISymbols.
  9. ORC materializes the JLMaterializationUnit that defines those symbols. JLMaterializationUnit::materialize runs DLSymOptimizer, optimizes, and compiles the LLVM module into an object file in memory.
  10. A jitlink::LinkGraph is created for the object file, but it must be linked to the correct symbols for every invoked CodeInstance by JuliaOJIT::linkOutput. This is where the jl_linker_info_t is used.
  11. JuliaOJIT::linkOutput takes the LinkerMutex lock, then renames the specptr/invoke functions for this CodeInstance to match the symbols stored in CISymbols.
  12. JuliaOJIT::linkCallTarget is called for each invoked CodeInstance: we check if this CodeInstance has existing symbols in CISymbols. If it does not, but is defined in an image, we make a fresh symbol and define it. Otherwise, this CodeInstance has an invoke to a CodeInstance that was never added to the JIT, or has not finished emitting LLVM IR. We link to fresh symbols defined by a new JLTrampolineMaterializationUnit (TODO: describe this process more).
  13. Once the final address of the invoke/specptr functions is known, we publish this to the global cache with jl_publish_compiled_ci.

13e2328 fixed a bug that was rather painful to find:

  1. A CodeInstance (call it CI1) for a top-level thunk was added to the JIT by jl_eval_thunk (these are somewhat rare).
  2. We added a CodeInstance ptr => CISymbolPtr mapping for it to CISymbols.
  3. jl_eval_thunk invokes the associated MethodInstance.
  4. The MethodInstance and CodeInstance for the thunk are garbage collected.
  5. Some time later, a new CodeInstance is allocated (call it CI2), and it gets the same address as the old one.
  6. Type inference publishes it to the global cache. Some other CodeInstance CI3 invokes CI2, and CI3 is added to the JIT before CI2 is.
  7. When materializing CI3, linkCallTarget sees the mapping from CI2 to the ORC symbols for the top-level thunk, which are still valid, and links to it.
  8. Extremely weird errors result after jumping to the top-level thunk, because the thunks typically take no arguments and return nothing, so they don't usually crash. Generally something in CI3 explodes due to an unexpected nothing.

I don't love the solution of deleting entries from CISymbols when a new CodeInstance is allocated, but I also can't think of a better alternative.
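A toy model of the address-reuse hazard (illustrative names; CISymbols here is just a dict keyed by a fake address):

```python
# CISymbols is keyed by CodeInstance address, so if a CodeInstance dies and
# a new one is allocated at the same address, a lookup finds the stale
# entry unless the entry is deleted when the new object is created.
ci_symbols = {}

addr = 0x7F00DEAD0000               # pretend allocator address
ci_symbols[addr] = "jlplt_thunk_0"  # symbols for the top-level thunk (CI1)

# CI1 is garbage collected; later CI2 is allocated at the same address.
# Without cleanup, linking CI3 -> CI2 resolves to the dead thunk's symbols:
assert ci_symbols.get(addr) == "jlplt_thunk_0"   # stale: wrong target!

# The fix: drop any existing entry when a CodeInstance is (re)allocated.
ci_symbols.pop(addr, None)
assert ci_symbols.get(addr) is None              # CI3 now gets a fresh link
```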

xal-0 added a commit to xal-0/julia that referenced this pull request Feb 12, 2026
When JL_NDEBUG is undefined, we should use these as assertions.
Suggested by @topolarity in
JuliaLang#60031 (comment).
