Skip to content

Commit b88f64f

Browse files
authored
make codegen threadsafe, sinking the necessary lock now into JuliaOJIT (JuliaLang#55106)
This adds a new helper `jl_read_codeinst_invoke` that should help manage reading the state out of a CodeInstance correctly everywhere. Then replaces all of the places where we have optimizations in codegen where we check for this (to build a name in the JIT for it) with that call. And finally moves the `jl_codegen_lock` into `jl_ExecutionEngine->jitlock` so that it is now more clear that this is only protecting concurrent access to the JIT state it manages (which includes the invoke field of all CodeInstance objects). In a subsequent followup, that `jitlock` and `codeinst_in_flight` will be replaced with something akin to the new engine (for CodeInfo inference) which helps partition that JIT lock mechanism (for CodeInstance / JIT insertion) to correspond just to a single CodeInstance, and not globally to all of them.
1 parent e496b2e commit b88f64f

File tree

11 files changed

+154
-132
lines changed

11 files changed

+154
-132
lines changed

doc/src/devdocs/jit.md

Lines changed: 7 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -59,20 +59,22 @@ In addition, there are a number of different transitional states that occur duri
5959

6060
1. When writing `invoke`, `specsigflags`, and `specptr`:
6161
1. Perform an atomic compare-exchange operation of specptr assuming the old value was NULL. This compare-exchange operation should have at least acquire-release ordering, to provide ordering guarantees of the remaining memory operations in the write.
62-
2. If `specptr` was non-null, cease the write operation and wait for bit 0b10 of `specsigflags` to be written.
62+
2. If `specptr` was non-null, cease the write operation and wait for bit 0b10 of `specsigflags` to be written, then restart from step 1 if desired.
6363
3. Write the new low bit of `specsigflags` to its final value. This may be a relaxed write.
6464
4. Write the new `invoke` pointer to its final value. This must have at least a release memory ordering to synchronize with reads of `invoke`.
6565
5. Set the second bit of `specsigflags` to 1. This must be at least a release memory ordering to synchronize with reads of `specsigflags`. This step completes the write operation and announces to all other threads that all fields have been set.
6666
2. When reading all of `invoke`, `specsigflags`, and `specptr`:
67-
1. Read the `invoke` field with at least an acquire memory ordering. This load will be referred to as `initial_invoke`.
68-
2. If `initial_invoke` is NULL, the codeinst is not yet executable. `invoke` is NULL, `specsigflags` may be treated as 0b00, `specptr` may be treated as NULL.
69-
3. Read the `specptr` field with at least an acquire memory ordering.
67+
1. Read the `specptr` field with any memory ordering.
68+
2. Read the `invoke` field with at least an acquire memory ordering. This load will be referred to as `initial_invoke`.
69+
3. If `initial_invoke` is NULL, the codeinst is not yet executable. `invoke` is NULL, `specsigflags` may be treated as 0b00, `specptr` may be treated as NULL.
7070
4. If `specptr` is NULL, then the `initial_invoke` pointer must not be relying on `specptr` to guarantee correct execution. Therefore, `invoke` is non-null, `specsigflags` may be treated as 0b00, `specptr` may be treated as NULL.
7171
5. If `specptr` is non-null, then `initial_invoke` might not be the final `invoke` field that uses `specptr`. This can occur if `specptr` has been written, but `invoke` has not yet been written. Therefore, spin on the second bit of `specsigflags` until it is set to 1 with at least acquire memory ordering.
72-
6. Re-read the `invoke` field with at least an acquire memory ordering. This load will be referred to as `final_invoke`.
72+
6. Re-read the `invoke` field with any memory ordering. This load will be referred to as `final_invoke`.
7373
7. Read the `specsigflags` field with any memory ordering.
7474
8. `invoke` is `final_invoke`, `specsigflags` is the value read in step 7, `specptr` is the value read in step 3.
7575
3. When updating a `specptr` to a different but equivalent function pointer:
7676
1. Perform a release store of the new function pointer to `specptr`. Races here must be benign, as the old function pointer is required to still be valid, and any new ones are also required to be valid as well. Once a pointer has been written to `specptr`, it must always be callable whether or not it is later overwritten.
7777

78+
Correctly reading these fields is implemented in `jl_read_codeinst_invoke`.
79+
7880
Although these write, read, and update steps are complicated, they ensure that the JIT can update codeinsts without invalidating existing codeinsts, and that the JIT can update codeinsts without invalidating existing `invoke` pointers. This allows the JIT to potentially reoptimize functions at higher optimization levels in the future, and also will allow the JIT to support concurrent compilation of functions in the future.

src/aotcompile.cpp

Lines changed: 0 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -360,7 +360,6 @@ void *jl_create_native_impl(jl_array_t *methods, LLVMOrcThreadSafeModuleRef llvm
360360

361361
// compile all methods for the current world and type-inference world
362362

363-
JL_LOCK(&jl_codegen_lock);
364363
auto target_info = clone.withModuleDo([&](Module &M) {
365364
return std::make_pair(M.getDataLayout(), Triple(M.getTargetTriple()));
366365
});
@@ -412,7 +411,6 @@ void *jl_create_native_impl(jl_array_t *methods, LLVMOrcThreadSafeModuleRef llvm
412411
// finally, make sure all referenced methods also get compiled or fixed up
413412
jl_compile_workqueue(params, policy);
414413
}
415-
JL_UNLOCK(&jl_codegen_lock); // Might GC
416414
JL_GC_POP();
417415

418416
// process the globals array, before jl_merge_module destroys them
@@ -1991,7 +1989,6 @@ void jl_get_llvmf_defn_impl(jl_llvmf_dump_t* dump, jl_method_instance_t *mi, jl_
19911989
uint8_t measure_compile_time_enabled = jl_atomic_load_relaxed(&jl_measure_compile_time_enabled);
19921990
if (measure_compile_time_enabled)
19931991
compiler_start_time = jl_hrtime();
1994-
JL_LOCK(&jl_codegen_lock);
19951992
auto target_info = m.withModuleDo([&](Module &M) {
19961993
return std::make_pair(M.getDataLayout(), Triple(M.getTargetTriple()));
19971994
});
@@ -2006,7 +2003,6 @@ void jl_get_llvmf_defn_impl(jl_llvmf_dump_t* dump, jl_method_instance_t *mi, jl_
20062003
// To get correct names in the IR this needs to be at least 2
20072004
output.debug_level = params.debug_info_level;
20082005
auto decls = jl_emit_code(m, mi, src, output);
2009-
JL_UNLOCK(&jl_codegen_lock); // Might GC
20102006

20112007
Function *F = NULL;
20122008
if (m) {

src/cgutils.cpp

Lines changed: 0 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -4367,22 +4367,6 @@ static Value *emit_memoryref_ptr(jl_codectx_t &ctx, const jl_cgval_t &ref, const
43674367
return data;
43684368
}
43694369

4370-
jl_value_t *jl_fptr_wait_for_compiled(jl_value_t *f, jl_value_t **args, uint32_t nargs, jl_code_instance_t *m)
4371-
{
4372-
// This relies on the invariant that the JIT will have set the invoke ptr
4373-
// by the time it releases the codegen lock. If the code is refactored to make the
4374-
// codegen lock more fine-grained, this will have to be replaced by a per-codeinstance futex.
4375-
size_t nthreads = jl_atomic_load_acquire(&jl_n_threads);
4376-
// This should only be possible if there's more than one thread. If not, either there's a bug somewhere
4377-
// that resulted in this not getting cleared, or we're about to deadlock. Either way, that's bad.
4378-
if (nthreads == 1) {
4379-
jl_error("Internal error: Reached jl_fptr_wait_for_compiled in single-threaded execution.");
4380-
}
4381-
JL_LOCK(&jl_codegen_lock);
4382-
JL_UNLOCK(&jl_codegen_lock);
4383-
return jl_atomic_load_acquire(&m->invoke)(f, args, nargs, m);
4384-
}
4385-
43864370
// Reset us back to codegen debug type
43874371
#undef DEBUG_TYPE
43884372
#define DEBUG_TYPE "julia_irgen_codegen"

src/codegen.cpp

Lines changed: 45 additions & 58 deletions
Original file line numberDiff line numberDiff line change
@@ -5188,30 +5188,26 @@ static jl_cgval_t emit_invoke(jl_codectx_t &ctx, const jl_cgval_t &lival, ArrayR
51885188
}
51895189

51905190
// Check if it is already compiled (either JIT or externally)
5191-
if (cache_valid) {
5191+
if (need_to_emit && cache_valid) {
51925192
// optimization: emit the correct name immediately, if we know it
51935193
// TODO: use `emitted` map here too to try to consolidate names?
5194-
// WARNING: isspecsig is protected by the codegen-lock. If that lock is removed, then the isspecsig load needs to be properly atomically sequenced with this.
5195-
auto fptr = jl_atomic_load_relaxed(&codeinst->specptr.fptr);
5196-
if (fptr) {
5197-
while (!(jl_atomic_load_acquire(&codeinst->specsigflags) & 0b10)) {
5198-
jl_cpu_pause();
5199-
}
5200-
invoke = jl_atomic_load_relaxed(&codeinst->invoke);
5201-
if (specsig ? jl_atomic_load_relaxed(&codeinst->specsigflags) & 0b1 : invoke == jl_fptr_args_addr) {
5202-
protoname = jl_ExecutionEngine->getFunctionAtAddress((uintptr_t)fptr, codeinst);
5203-
if (ctx.external_linkage) {
5204-
// TODO: Add !specsig support to aotcompile.cpp
5205-
// Check that the codeinst is containing native code
5206-
if (specsig && jl_atomic_load_relaxed(&codeinst->specsigflags) & 0b100) {
5207-
external = true;
5208-
need_to_emit = false;
5209-
}
5210-
}
5211-
else { // ctx.use_cache
5194+
uint8_t specsigflags;
5195+
jl_callptr_t invoke;
5196+
void *fptr;
5197+
jl_read_codeinst_invoke(codeinst, &specsigflags, &invoke, &fptr, 0);
5198+
if (specsig ? specsigflags & 0b1 : invoke == jl_fptr_args_addr) {
5199+
protoname = jl_ExecutionEngine->getFunctionAtAddress((uintptr_t)fptr, invoke, codeinst);
5200+
if (ctx.external_linkage) {
5201+
// TODO: Add !specsig support to aotcompile.cpp
5202+
// Check that the codeinst is containing native code
5203+
if (specsig && (specsigflags & 0b100)) {
5204+
external = true;
52125205
need_to_emit = false;
52135206
}
52145207
}
5208+
else { // ctx.use_cache
5209+
need_to_emit = false;
5210+
}
52155211
}
52165212
}
52175213
if (need_to_emit) {
@@ -6737,7 +6733,7 @@ static Value *get_scope_field(jl_codectx_t &ctx)
67376733
"current_scope");
67386734
}
67396735

6740-
static Function *emit_tojlinvoke(jl_code_instance_t *codeinst, Module *M, jl_codegen_params_t &params)
6736+
static Function *emit_tojlinvoke(jl_code_instance_t *codeinst, StringRef theFptrName, Module *M, jl_codegen_params_t &params)
67416737
{
67426738
++EmittedToJLInvokes;
67436739
jl_codectx_t ctx(M->getContext(), params, codeinst);
@@ -6754,11 +6750,8 @@ static Function *emit_tojlinvoke(jl_code_instance_t *codeinst, Module *M, jl_cod
67546750
ctx.builder.SetInsertPoint(b0);
67556751
Function *theFunc;
67566752
Value *theFarg;
6757-
auto invoke = jl_atomic_load_relaxed(&codeinst->invoke);
6758-
bool cache_valid = params.cache;
67596753

6760-
if (cache_valid && invoke != NULL && invoke != &jl_fptr_wait_for_compiled) {
6761-
StringRef theFptrName = jl_ExecutionEngine->getFunctionAtAddress((uintptr_t)invoke, codeinst);
6754+
if (!theFptrName.empty()) {
67626755
theFunc = cast<Function>(
67636756
M->getOrInsertFunction(theFptrName, jlinvoke_func->_type(ctx.builder.getContext())).getCallee());
67646757
theFarg = literal_pointer_val(ctx, (jl_value_t*)codeinst);
@@ -6925,6 +6918,7 @@ static Function* gen_cfun_wrapper(
69256918
bool nest = (!ff || unionall_env);
69266919
jl_value_t *astrt = (jl_value_t*)jl_any_type;
69276920
void *callptr = NULL;
6921+
jl_callptr_t invoke = NULL;
69286922
int calltype = 0;
69296923
if (aliasname)
69306924
name = aliasname;
@@ -6933,16 +6927,10 @@ static Function* gen_cfun_wrapper(
69336927
if (lam && params.cache) {
69346928
// TODO: this isn't ideal to be unconditionally calling type inference (and compile) from here
69356929
codeinst = jl_compile_method_internal(lam, world);
6936-
auto invoke = jl_atomic_load_acquire(&codeinst->invoke);
6937-
auto fptr = jl_atomic_load_relaxed(&codeinst->specptr.fptr);
6930+
uint8_t specsigflags;
6931+
void *fptr;
6932+
jl_read_codeinst_invoke(codeinst, &specsigflags, &invoke, &fptr, 0);
69386933
assert(invoke);
6939-
if (fptr) {
6940-
while (!(jl_atomic_load_acquire(&codeinst->specsigflags) & 0b10)) {
6941-
jl_cpu_pause();
6942-
}
6943-
invoke = jl_atomic_load_relaxed(&codeinst->invoke);
6944-
}
6945-
// WARNING: this invoke load is protected by the codegen-lock. If that lock is removed, then the isspecsig load needs to be properly atomically sequenced with this.
69466934
if (invoke == jl_fptr_args_addr) {
69476935
callptr = fptr;
69486936
calltype = 1;
@@ -6952,7 +6940,7 @@ static Function* gen_cfun_wrapper(
69526940
callptr = (void*)codeinst->rettype_const;
69536941
calltype = 2;
69546942
}
6955-
else if (jl_atomic_load_relaxed(&codeinst->specsigflags) & 0b1) {
6943+
else if (specsigflags & 0b1) {
69566944
callptr = fptr;
69576945
calltype = 3;
69586946
}
@@ -7230,7 +7218,7 @@ static Function* gen_cfun_wrapper(
72307218
jlfunc_sret = false;
72317219
Function *theFptr = NULL;
72327220
if (calltype == 1) {
7233-
StringRef fname = jl_ExecutionEngine->getFunctionAtAddress((uintptr_t)callptr, codeinst);
7221+
StringRef fname = jl_ExecutionEngine->getFunctionAtAddress((uintptr_t)callptr, invoke, codeinst);
72347222
theFptr = cast_or_null<Function>(jl_Module->getNamedValue(fname));
72357223
if (!theFptr) {
72367224
theFptr = Function::Create(ctx.types().T_jlfunc, GlobalVariable::ExternalLinkage,
@@ -7275,7 +7263,7 @@ static Function* gen_cfun_wrapper(
72757263
assert(calltype == 3);
72767264
// emit a specsig call
72777265
bool gcstack_arg = JL_FEAT_TEST(ctx, gcstack_arg);
7278-
StringRef protoname = jl_ExecutionEngine->getFunctionAtAddress((uintptr_t)callptr, codeinst);
7266+
StringRef protoname = jl_ExecutionEngine->getFunctionAtAddress((uintptr_t)callptr, invoke, codeinst);
72797267
jl_returninfo_t returninfo = get_specsig_function(ctx, M, NULL, protoname, lam->specTypes, astrt, is_opaque_closure, gcstack_arg);
72807268
FunctionType *cft = returninfo.decl.getFunctionType();
72817269
jlfunc_sret = (returninfo.cc == jl_returninfo_t::SRet);
@@ -9887,7 +9875,7 @@ jl_llvm_functions_t jl_emit_codeinst(
98879875
!(params.imaging_mode || jl_options.incremental)) { // don't delete code when generating a precompile file
98889876
// Never end up in a situation where the codeinst has no invoke, but also no source, so we never fall
98899877
// through the cracks of SOURCE_MODE_ABI.
9890-
jl_atomic_store_release(&codeinst->invoke, &jl_fptr_wait_for_compiled);
9878+
jl_atomic_store_release(&codeinst->invoke, jl_fptr_wait_for_compiled_addr);
98919879
jl_atomic_store_release(&codeinst->inferred, jl_nothing);
98929880
}
98939881
}
@@ -9913,34 +9901,29 @@ void jl_compile_workqueue(
99139901
// try to emit code for this item from the workqueue
99149902
StringRef preal_decl = "";
99159903
bool preal_specsig = false;
9916-
auto invoke = jl_atomic_load_acquire(&codeinst->invoke);
9917-
bool cache_valid = params.cache;
9918-
// WARNING: isspecsig is protected by the codegen-lock. If that lock is removed, then the isspecsig load needs to be properly atomically sequenced with this.
9919-
if (cache_valid && invoke != NULL && invoke != &jl_fptr_wait_for_compiled) {
9920-
auto fptr = jl_atomic_load_relaxed(&codeinst->specptr.fptr);
9921-
if (fptr) {
9922-
while (!(jl_atomic_load_acquire(&codeinst->specsigflags) & 0b10)) {
9923-
jl_cpu_pause();
9924-
}
9925-
// in case we are racing with another thread that is emitting this function
9926-
invoke = jl_atomic_load_relaxed(&codeinst->invoke);
9927-
}
9904+
jl_callptr_t invoke = NULL;
9905+
if (params.cache) {
9906+
// WARNING: this correctness is protected by an outer lock
9907+
uint8_t specsigflags;
9908+
void *fptr;
9909+
jl_read_codeinst_invoke(codeinst, &specsigflags, &invoke, &fptr, 0);
9910+
//if (specsig ? specsigflags & 0b1 : invoke == jl_fptr_args_addr)
99289911
if (invoke == jl_fptr_args_addr) {
9929-
preal_decl = jl_ExecutionEngine->getFunctionAtAddress((uintptr_t)fptr, codeinst);
9912+
preal_decl = jl_ExecutionEngine->getFunctionAtAddress((uintptr_t)fptr, invoke, codeinst);
99309913
}
9931-
else if (jl_atomic_load_relaxed(&codeinst->specsigflags) & 0b1) {
9932-
preal_decl = jl_ExecutionEngine->getFunctionAtAddress((uintptr_t)fptr, codeinst);
9914+
else if (specsigflags & 0b1) {
9915+
preal_decl = jl_ExecutionEngine->getFunctionAtAddress((uintptr_t)fptr, invoke, codeinst);
99339916
preal_specsig = true;
99349917
}
99359918
}
9936-
else {
9919+
if (preal_decl.empty()) {
99379920
auto it = params.compiled_functions.find(codeinst);
99389921
if (it == params.compiled_functions.end()) {
99399922
// Reinfer the function. The JIT came along and removed the inferred
99409923
// method body. See #34993
99419924
if (policy != CompilationPolicy::Default &&
99429925
jl_atomic_load_relaxed(&codeinst->inferred) == jl_nothing) {
9943-
// Codegen lock is held, so SOURCE_MODE_FORCE_SOURCE_UNCACHED is not required
9926+
// XXX: SOURCE_MODE_FORCE_SOURCE is wrong here (neither sufficient nor necessary)
99449927
codeinst = jl_type_infer(codeinst->def, jl_atomic_load_relaxed(&codeinst->max_world), SOURCE_MODE_FORCE_SOURCE);
99459928
}
99469929
if (codeinst) {
@@ -9970,7 +9953,10 @@ void jl_compile_workqueue(
99709953
// expected specsig
99719954
if (!preal_specsig) {
99729955
// emit specsig-to-(jl)invoke conversion
9973-
Function *preal = emit_tojlinvoke(codeinst, mod, params);
9956+
StringRef invokeName;
9957+
if (invoke != NULL)
9958+
invokeName = jl_ExecutionEngine->getFunctionAtAddress((uintptr_t)invoke, invoke, codeinst);
9959+
Function *preal = emit_tojlinvoke(codeinst, invokeName, mod, params);
99749960
proto.decl->setLinkage(GlobalVariable::InternalLinkage);
99759961
//protodecl->setAlwaysInline();
99769962
jl_init_function(proto.decl, params.TargetTriple);
@@ -9987,7 +9973,10 @@ void jl_compile_workqueue(
99879973
// expected non-specsig
99889974
if (preal_decl.empty() || preal_specsig) {
99899975
// emit jlcall1-to-(jl)invoke conversion
9990-
preal_decl = emit_tojlinvoke(codeinst, mod, params)->getName();
9976+
StringRef invokeName;
9977+
if (invoke != NULL)
9978+
invokeName = jl_ExecutionEngine->getFunctionAtAddress((uintptr_t)invoke, invoke, codeinst);
9979+
preal_decl = emit_tojlinvoke(codeinst, invokeName, mod, params)->getName();
99919980
}
99929981
}
99939982
if (!preal_decl.empty()) {
@@ -10289,8 +10278,6 @@ extern "C" JL_DLLEXPORT_CODEGEN void jl_teardown_codegen_impl() JL_NOTSAFEPOINT
1028910278
if (jl_ExecutionEngine)
1029010279
jl_ExecutionEngine->printTimers();
1029110280
PrintStatistics();
10292-
JL_LOCK(&jl_codegen_lock); // TODO: If this lock gets removed reconsider
10293-
// LLVM global state/destructors (maybe a rwlock)
1029410281
}
1029510282

1029610283
// the rest of this file are convenience functions

0 commit comments

Comments
 (0)