This aims to summarize a design discussion between @vtjnash, @JeffBezanson, @StefanKarpinski, @gbaraldi, @topolarity, @xal-0, @mlechu, @oscardssmith and myself around ccall. Except where otherwise annotated, I've tried to capture my understanding of the consensus view, but it is of course possible that I have misunderstood or failed to remember an objection. In such cases, the error is mine.
Discussion of the problems and current design
The current evaluation semantics of ccall are quite old and predate us having a particularly good understanding of what the evaluation semantics of the language should be (and in particular, predate any notion of world ages, partitioned bindings, effects, etc.). For this reason, it is somewhat hard to give a coherent description of the current design.
However, I will try my best.
Syntax based disambiguation of (non-)library case
There are two basic cases for ccall. The first is a call with a plain symbol or pointer:
ccall(sym, ...)
The second is
ccall((sym, lib), ...)
These cases are distinguished both syntactically and semantically. In particular,
x = sym
ccall(x)
is always the same as ccall(sym), but the same is not true for x = (sym, lib).
Evaluation semantics of the first argument
We first consider the basic case without libraries. Here we are familiar
with the usual ccall syntax:
julia> ccall(:sin, Float64, (Float64,), 1.0)
0.8414709848078965
What happens if we use an expression instead?
julia> f1() = ccall((println("Hello"); :sin), Float64, (Float64,), 1.0)
f1 (generic function with 1 method)
julia> f1()
Hello
0.8414709848078965
julia> f1()
Hello
0.8414709848078965
So far so normal. However, this is where it starts to get weird. I think the best way I can describe it is that ccall tries to determine the symbol by attempting the following, in order:
1. Statically looking at the expression that is syntactically inside ccall and determining whether or not it can figure out the value of the first argument.
2. Using inference's constant propagation
3. Using codegen's constant propagation (but not LLVM's)
However, for the non-lib case, the first of these is generally a no-op.
This leads to the following observed behavior:
f2() = ccall((@noinline identity(:sin)), Float64, (Float64,), 1.0)
f3() = ccall(Base.compilerbarrier(:const, :sin), Float64, (Float64,), 1.0)
f4() = ccall(Base.compilerbarrier(:const, (@noinline identity(:sin))), Float64, (Float64,), 1.0)
f5() = ccall((@noinline identity(Base.compilerbarrier(:const, :sin))), Float64, (Float64,), 1.0)
julia> f2()
0.8414709848078965
julia> f3()
0.8414709848078965
julia> f4()
0.8414709848078965
julia> f5()
ERROR: TypeError: in ccall: first argument not a pointer or valid constant expression, expected Ptr, got a value of type Symbol
Stacktrace:
[1] f5()
@ Main ./REPL[19]:1
[2] top-level scope
@ REPL[24]:1
To me, this is entirely unintuitive, and I had to actually go read the source to figure out which cases would work and which wouldn't.
Additional complications from lib lowering
The situation becomes more complicated when using the lib syntax:
julia> ccall(((println("Hello"); :sin), :openlibm), Float64, (Float64,), 1.0)
Internal error: encountered unexpected error during compilation of top-level scope:
ErrorException("unsupported or misplaced expression \"block\" in function top-level scope")
ijl_errorf at /home/keno/julia/src/rtutils.c:77
emit_expr at /home/keno/julia/src/codegen.cpp:6694
julia> function g1()
    x = (println("Hello"); :sin)
    ccall((x, :openlibm), Float64, (Float64,), 1.0)
end
ERROR: syntax: ccall function name and library expression cannot reference local variables
Stacktrace:
[1] top-level scope
@ REPL[1]:1
julia> function g2()
    x = ((println("Hello"); :sin), :libopenlibm)
    ccall(x, Float64, (Float64,), 1.0)
end
g2 (generic function with 1 method)
julia> g2()
Hello
0.8414709848078965
julia> syms() = (println("Hello"); :sin)
syms (generic function with 1 method)
julia> ccall((syms(), :openlibm), Float64, (Float64,), 1.0)
Hello
ERROR: TypeError: in ccall: first argument not a pointer or valid constant expression, expected Ptr, got a value of type Tuple{Symbol, Symbol}
Stacktrace:
[1] top-level scope
@ ./REPL[3]:1
Additional complications from bindings partition
Post-bindings partition, there is an additional complication that both cases 1 and 3 depend on inferred world age bounds of the rest of the function, which can lead to completely non-intuitive behavior. This is #57749, although I will mention it here also:
julia> const sinsym = :sin
:sin
julia> g3() = ccall((sinsym, :libopenlibm), Float64, (Float64,), 1.0)
g3 (generic function with 1 method)
julia> g3()
ERROR: TypeError: in ccall: first argument not a pointer or valid constant expression, expected Ptr, got a value of type Tuple{Symbol, Symbol}
Stacktrace:
[1] g3()
@ Main ./REPL[15]:1
[2] top-level scope
@ REPL[16]:1
julia> const completely_unrelated = 2
2
julia> g4() = (completely_unrelated; ccall((sinsym, :libopenlibm), Float64, (Float64,), 1.0))
g4 (generic function with 1 method)
julia> g4()
0.8414709848078965
This is arguably a separate inference bug that should be addressed by properly modeling this in inference, but stems from the same underlying confusion around the evaluation semantics of the first argument of ccall.
Hidden generic function call inside :foreigncall
This one might be a bit academic, but as of #50074, there is a hidden call to Libdl.dlopen inside :foreigncall. This dynamic call edge is not modeled and is thus susceptible to #265-like issues, invisible to trimming, etc. Now, there is already non-standard caching here, but I wanted to list it for completeness.
Implicit caching of dlsym
This one isn't so much a problem as it is an aspect of the current design that needs to be preserved for performance. In particular, the codegen for :foreigncall currently looks something like this (in pseudo-C syntax):
static void *_cache;
// Expr(:foreigncall, (lib, sym), ...)
{
    if (!_cache) {
        _cache = jl_lazy_load_and_lookup(lib, sym); // Potentially calls Libdl.dlopen internally
    }
    (*_cache)(...);
}
Plus some optimizations to fold away the lookup if it can be resolved statically by the JIT or to turn the lookup into a PLT-like structure.
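As a rough illustration of the caching pattern above, here is a minimal Julia sketch that does the same thing by hand. The names SIN_CACHE and cached_sin are hypothetical and exist only for this example, and it assumes that "libopenlibm" can be opened by name in the running process, as in the examples earlier. It is not what codegen actually emits.

```julia
using Libdl

# Hypothetical stand-in for the per-call-site `_cache` slot in the pseudo-C above.
const SIN_CACHE = Ref{Ptr{Cvoid}}(C_NULL)

function cached_sin(x::Float64)
    ptr = SIN_CACHE[]
    if ptr == C_NULL
        # Slow path, taken only on the first call: open the library and
        # resolve the symbol, then remember the resulting pointer.
        ptr = Libdl.dlsym(Libdl.dlopen("libopenlibm"), :sin)
        SIN_CACHE[] = ptr
    end
    # Fast path: call through the cached function pointer.
    return ccall(ptr, Float64, (Float64,), x)
end
```

After the first call, subsequent calls skip the dlopen/dlsym lookup entirely, which is the property the current design relies on for performance.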
Additional desirable features
An additional desirable feature that was discussed was that in the context of --trim, we would like to have the ability to statically link executables without assuming the presence of a dynamic linker at runtime, ideally while preserving the namespace scoping behavior of the current design. This is a little tricky, because the underlying system linker generally does not have namespacing. How exactly to do this is outside the scope of this document, but the key consideration is that this constrains us to a solution in which juliac might be able to turn the dynamic references into static ones.
Proposed solutions
The first and most immediate question to answer is what the evaluation scope of the first argument of ccall is. I think there are roughly five reasonable answers:
1. It gets evaluated in toplevel scope at definition time, i.e. the following would error:
function foo()
    ccall(sym, ...)
end # Error: UndefVarError(:sym)
const sym = ...
2. It gets evaluated (with the usual evaluation semantics) at the same time as the cache logic, i.e. we'd have the following:
foo() = ccall((println("Hello"); :sin), ...)
julia> foo()
Hello
0.8414709848078965
julia> foo() # Does not print the second time because the lookup is already cached
0.8414709848078965
3. The expression and dlsym lookup get evaluated every time.
4. The expression gets evaluated every time, but the dlsym lookup is cached the first time it gets evaluated (for a particular native code instance).
5. The expression gets evaluated every time, but the dlsym lookup is cached the first time it gets evaluated (for a particular native code instance) and gets recached when the value of the expression changes.
Discussion
Option 3 is the most straightforward behavior in that it is completely dynamic and does not rely on any compiler information. However, because of the lack of caching, it is also prohibitively slow. That said, I think everyone agreed that if it were fast, it would be a good semantic, which is a useful guiding principle for selecting which option to use.
Option 4 is somewhat reminiscent of what we have right now, except that we have extra requirements (i.e. the symbol name needs to be a constant expression). These requirements make the behavior of ccall very confusing, though to be fair, they also by default prohibit some problematic cases.
In particular, there's a question about non-constant cases like
julia> globalsin = :sin
julia> f() = ccall(globalsin, Float64, (Float64,), 1.0)
julia> f()
julia> globalsin = :cos
julia> f()
What should this do? Currently this case is disallowed due to the (semantically strange) constantness detection on the first argument. We do not have constantness detection anywhere else in the language, and in general our semantics are entirely value and type based, so we do need to decide some behavior for this case.
The tradeoff here is essentially one of performance vs surprise. Option 4 has decent performance, but a high surprise level: the answer would change whenever f gets re-codegen'ed, which is not something that users are traditionally expected to have a mental model of. Option 2 has the same problem (but differs for ccall((println("Hello"); globalsin), ...)). Option 1 retains the performance while solving the surprise problem by never re-evaluating (although Revise might do so explicitly). Option 5 goes in the opposite direction, sacrificing some performance (in the rare, unlikely, fully dynamic case), but reducing surprise by being closer to the naive Option 3.
As a general design principle in Julia, we do tend to choose the most dynamic behavior (as long as it is still possible to optimize the common case), which would weigh in favor of option 5.
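To make the Option 5 behavior concrete, here is a hand-written Julia sketch of a one-entry cache that redoes the lookup only when the symbol value changes. SymCache, lookup!, CALLSITE_CACHE and call_unary are hypothetical names for this illustration, and it assumes "libopenlibm" can be opened by name. The real mechanism would live inside :foreigncall rather than in user code.

```julia
using Libdl

# Hypothetical one-entry cache: remembers the last symbol seen at a call
# site and the pointer it resolved to.
mutable struct SymCache
    sym::Symbol
    ptr::Ptr{Cvoid}
    lookups::Int   # counts slow-path lookups, for illustration only
end
SymCache() = SymCache(Symbol(""), C_NULL, 0)

function lookup!(cache::SymCache, lib::AbstractString, sym::Symbol)
    # Redo the dlsym lookup on first use or when the symbol value changed.
    if cache.ptr == C_NULL || cache.sym !== sym
        cache.ptr = Libdl.dlsym(Libdl.dlopen(lib), sym)
        cache.sym = sym
        cache.lookups += 1
    end
    return cache.ptr
end

const CALLSITE_CACHE = SymCache()

function call_unary(sym::Symbol, x::Float64)
    # The symbol expression is evaluated on every call; only the lookup is cached.
    ptr = lookup!(CALLSITE_CACHE, "libopenlibm", sym)
    return ccall(ptr, Float64, (Float64,), x)
end
```

With this sketch, repeated calls with :sin perform a single lookup, and switching to :cos triggers exactly one more. For the globalsin example above, this would mean f() follows the current value of the global rather than whatever value was seen when f was last compiled.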
Implementation Considerations
Given the above considerations, I think the general consensus was that option 5 is preferred. It is close enough to the current semantics (and identical in the cases that people actually use) that we should be able to do it without a deprecation cycle.
To implement this, I had suggested the following lowering:
@assume_effects :consistent_once_per_process :effect_free :consistent_termination dlsym(...)::Ptr{Cvoid} = ...
function do_call()
    # ccall((bar, lib), ...)
    ptr = inline_cache(dlsym(lib, bar))
    $(Expr(:foreigncall, ptr, ...))
end
where consistent_once_per_process is a new non-IPO effect annotation that specifies that the result of a call may be assumed :consistent within each process (i.e. may be cached for egal arguments), and inline_cache is a new builtin that is semantically a no-op, but signals to the optimizer that it should perform the caching optimization (using :consistent-cy inference to determine the scope of the region to cache). @vtjnash objected to this scheme on complexity grounds, and @JeffBezanson objected on IR size grounds. However, I think people generally agreed that the semantics were reasonable.
Thus, the ultimate proposal is to treat the above as the semantic representation of what :foreigncall should do, but keep both the dlsym generic call and the inline_cache inside :foreigncall where they are now. The various places that need to analyze this (inference, trimming, etc.) should then model :foreigncall to include the generic call to dlsym, including providing edges for it.