Clarify ccall evaluation semantics

This aims to summarize a design discussion between @vtjnash, @JeffBezanson, @StefanKarpinski, @gbaraldi, @topolarity, @xal-0, @mlechu, @oscardssmith and myself around ccall. Except where otherwise annotated, I've tried to capture my understanding of the consensus view, but it is of course possible that I have misunderstood or failed to remember an objection. In such cases, the error is mine.

## Discussion of the problems and current design

The current evaluation semantics of `ccall` are quite old and predate us having a particularly good understanding of what the evaluation semantics of the language should be (and in particular, predate any notion of world ages, partitioned bindings, effects, etc.). For this reason, it is somewhat hard to give a coherent description of the current design.
However, I will try my best.

### Syntax based disambiguation of (non-)library case

There are two basic cases for ccall. The first is a call with a plain symbol or pointer:

```
ccall(sym, ...)
```

The second is

```
ccall((sym, lib), ...)
```

These cases are distinguished both syntactically and semantically. In particular,
```
x = sym
ccall(x)
```
is always the same as `ccall(sym)`, but the same is not true for `x = (sym, lib)`.
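
As a rough illustration of why the distinction can be made syntactically, the two forms already look different in the parsed AST, before anything is evaluated. This is only a sketch: the helper names `first_arg` and `is_literal_tuple` are made up for this example, and it does not reproduce what lowering actually does with the tuple.

```julia
# Parse both call forms without evaluating them (no library is touched).
lib_form   = Meta.parse("ccall((:sin, :libopenlibm), Float64, (Float64,), 1.0)")
plain_form = Meta.parse("ccall(x, Float64, (Float64,), 1.0)")

# First argument of the call expression (args[1] is the callee name `ccall`).
first_arg(ex::Expr) = ex.args[2]

# A literal tuple appears as `Expr(:tuple, ...)` in the AST.
is_literal_tuple(arg) = arg isa Expr && arg.head === :tuple

is_literal_tuple(first_arg(lib_form))    # true
is_literal_tuple(first_arg(plain_form))  # false: the argument is just the symbol `x`
```

In the second form, nothing in the syntax says whether `x` will hold a symbol, a pointer, or a tuple; that is only known once `x` has a value.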

## Evaluation semantics of the first argument

We first consider the basic case without libraries. Here we are familiar
with the usual ccall syntax:

```
julia> ccall(:sin, Float64, (Float64,), 1.0)
0.8414709848078965
```

What happens if we use an expression instead?

```
julia> f1() = ccall((println("Hello"); :sin), Float64, (Float64,), 1.0)
f1 (generic function with 1 method)

julia> f1()
Hello
0.8414709848078965

julia> f1()
Hello
0.8414709848078965
```

So far so normal. However, this is where it starts to get weird. I think the
best way I can describe it is that ccall tries to determine the symbol by
attempting the following, in order:

1. Statically looking at the expression that is syntactically inside `ccall` and
   determining whether or not it can figure out the value of the first argument.

2. Using inference's constant propagation

3. Using codegen's constant propagation (but not LLVM's)

However, for the non-lib case, the first of these is generally a no-op.

This leads to the following observed behavior:

```julia
f2() = ccall((@noinline identity(:sin)), Float64, (Float64,), 1.0)
f3() = ccall(Base.compilerbarrier(:const, :sin), Float64, (Float64,), 1.0)
f4() = ccall(Base.compilerbarrier(:const, (@noinline identity(:sin))), Float64, (Float64,), 1.0)
f5() = ccall((@noinline identity(Base.compilerbarrier(:const, :sin))), Float64, (Float64,), 1.0)
```

```julia-repl
julia> f2()
0.8414709848078965

julia> f3()
0.8414709848078965

julia> f4()
0.8414709848078965

julia> f5()
ERROR: TypeError: in ccall: first argument not a pointer or valid constant expression, expected Ptr, got a value of type Symbol
Stacktrace:
 [1] f5()
   @ Main ./REPL[19]:1
 [2] top-level scope
   @ REPL[24]:1
```

To me, this is entirely unintuitive, and I actually had to go read the source to figure out which cases would work and which wouldn't.

### Additional complications from lib lowering

The situation becomes more complicated when using the lib syntax:
```
julia> ccall(((println("Hello"); :sin), :openlibm), Float64, (Float64,), 1.0)
Internal error: encountered unexpected error during compilation of top-level scope:
ErrorException("unsupported or misplaced expression \"block\" in function top-level scope")
ijl_errorf at /home/keno/julia/src/rtutils.c:77
emit_expr at /home/keno/julia/src/codegen.cpp:6694

julia> function g1()
               x = (println("Hello"); :sin)
               ccall((x, :openlibm), Float64, (Float64,), 1.0)
       end
ERROR: syntax: ccall function name and library expression cannot reference local variables
Stacktrace:
 [1] top-level scope
   @ REPL[1]:1

julia> function g2()
          x = ((println("Hello"); :sin), :libopenlibm)
          ccall(x, Float64, (Float64,), 1.0)
       end
g2 (generic function with 1 method)

julia> g2()
Hello
0.8414709848078965

julia> syms() = (println("Hello"); :sin)
syms (generic function with 1 method)

julia> ccall((syms(), :openlibm), Float64, (Float64,), 1.0)
Hello
ERROR: TypeError: in ccall: first argument not a pointer or valid constant expression, expected Ptr, got a value of type Tuple{Symbol, Symbol}
Stacktrace:
 [1] top-level scope
   @ ./REPL[3]:1
```

### Additional complications from bindings partition

Since the introduction of binding partitions, there is an additional complication: both cases 1 and 3 depend on the inferred world age bounds of the rest of the function, which can lead to completely non-intuitive behavior. This is #57749, although I will mention it here also:

```julia-repl
julia> const sinsym = :sin
:sin

julia> g3() = ccall((sinsym, :libopenlibm), Float64, (Float64,), 1.0)
g3 (generic function with 1 method)

julia> g3()
ERROR: TypeError: in ccall: first argument not a pointer or valid constant expression, expected Ptr, got a value of type Tuple{Symbol, Symbol}
Stacktrace:
 [1] g3()
   @ Main ./REPL[15]:1
 [2] top-level scope
   @ REPL[16]:1

julia> const completely_unrelated = 2
2

julia> g4() = (completely_unrelated; ccall((sinsym, :libopenlibm), Float64, (Float64,), 1.0))
g4 (generic function with 1 method)

julia> g4()
0.8414709848078965
```

This is arguably a separate inference bug that should be addressed by properly modeling this in inference, but stems from the same underlying confusion around the evaluation semantics of the first argument of ccall.

### Hidden generic function call inside :foreigncall

This one might be a bit academic, but as of #50074, there is a hidden call to `Libdl.dlopen` inside `:foreigncall`. This dynamic call edge is not modeled and is thus
susceptible to #265-like issues, invisible to trimming, etc. Now, there is already non-standard caching here, but I wanted to list it for completeness.

### Implicit caching of dlsym

This one isn't so much a problem as it is an aspect of the current design that needs
to be preserved for performance. In particular, the codegen for `:foreigncall` currently looks something like this (in pseudo-C syntax):

```
static void *_cache;
// Expr(:foreigncall, (lib, sym), ...)
{
	if (!_cache) {
		_cache = jl_lazy_load_and_lookup(lib, sym); // Potentially calls Libdl.dlopen internally
	}
	(*_cache)(...);
}
```

Plus some optimizations to fold away the lookup if it can be resolved statically by the JIT or to turn the lookup into a PLT-like structure.
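
For readers who prefer Julia to pseudo-C, the same lookup-once pattern can be written by hand. This is a sketch of the idea only, not what codegen emits: `SIN_CACHE`, `LOOKUPS`, and `cached_sin` are names invented here, and it assumes `libopenlibm` can be opened by name, as in the transcripts above.

```julia
using Libdl

# Plays the role of `_cache` in the pseudo-C: starts out as a null pointer.
const SIN_CACHE = Ref{Ptr{Cvoid}}(C_NULL)
# Counts how many times the slow path (dlopen + dlsym) actually runs.
const LOOKUPS = Ref(0)

function cached_sin(x::Float64)
    if SIN_CACHE[] == C_NULL
        # Slow path: open the library and resolve the symbol.
        # Assumption: "libopenlibm" is loadable by name on this installation.
        SIN_CACHE[] = dlsym(dlopen("libopenlibm"), :sin)
        LOOKUPS[] += 1
    end
    # Fast path: call through the cached function pointer.
    ptr = SIN_CACHE[]
    return ccall(ptr, Float64, (Float64,), x)
end

cached_sin(1.0)
cached_sin(2.0)
LOOKUPS[]  # 1: the second call reused the cached pointer
```

The cache here is a global, whereas the real one is attached to a particular piece of generated native code, which is why re-codegen of a function can reset it.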

## Additional desirable features

An additional desirable feature that was discussed was that in the context of `--trim`, we would like to have the ability to statically link executables without assuming the presence of a dynamic linker at runtime, ideally while preserving the namespace scoping behavior of the current design. This is a little tricky, because the underlying system linker generally does not have namespacing. How exactly to do this is outside the scope of this document, but the key consideration is that this constrains us to solutions in which `juliac` is able to turn the dynamic references into static ones.

## Proposed solutions

The first and most immediate question to answer is what the evaluation scope of the first argument of `ccall` is. I think there are several reasonable answers:

1. It gets evaluated in toplevel scope at definition time, i.e. the following would error:
```
function foo()
	ccall(sym, ...)
end # Error: UndefVarError(:sym)
const sym = ...
```

2. It gets evaluated (with the usual evaluation semantics) at the same time as the cache logic. I.e. we'd have the following:

```
foo() = ccall((println("Hello"); :sin), ...)

julia> foo()
Hello
0.8414709848078965

julia> foo() # Does not print the second time because the lookup is already cached
0.8414709848078965
```

3. The expression and the `dlsym` lookup get evaluated every time.

4. The expression gets evaluated every time, but the `dlsym` lookup is cached the first time it gets evaluated (for a particular native code instance).

5. The expression gets evaluated every time, but the `dlsym` lookup is cached the first time it gets evaluated (for a particular native code instance) and gets recached when the value of the expression changes.

### Discussion

Option 3 is the most straightforward behavior in that it is completely dynamic and does not rely on any compiler information. However, because of the lack of caching, it is also prohibitively slow. That said, I think everyone agreed that if it were fast, it would be a good semantic, which is a useful guiding principle for selecting which option to use.

Option 4 is somewhat reminiscent of what we have right now, except that we have extra requirements (i.e. the symbol name needs to be a constant expression). These requirements make the behavior of `ccall` very confusing, though to be fair, they also by default prohibit some problematic cases.

In particular, there's a question about non-constant cases like
```
julia> globalsin = :sin
julia> f() = ccall(globalsin, Float64, (Float64,), 1.0)
julia> f()
julia> globalsin = :cos
julia> f()
```

What should this do? Currently this case is disallowed due to the (semantically strange) constantness detection on the first argument. We do not have constantness detection anywhere else in the language, and in general our semantics are entirely value and type based, so we do need to decide some behavior for this case.

The tradeoff here is essentially one of performance vs. surprise. Option 4 has decent performance, but a high surprise level: the answer would change whenever `f` gets re-codegen'ed, which is not something that users are traditionally expected to have a mental model of. Option 2 has the same problem (but differs for `ccall((println("Hello"); globalsin), ...)`). Option 1 goes in the direction of retaining the performance while solving the surprise problem by never re-evaluating (although Revise might do so explicitly). Option 5 goes in the opposite direction, sacrificing some performance (in the rare, fully dynamic case) but reducing surprise by being closer to the naive Option 3.

As a general design principle in Julia, we do tend to choose the most dynamic behavior (as long as it is still possible to optimize the common case), which would weigh in favor of option 5.
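
To make the difference between options 4 and 5 concrete for the `globalsin` example, here is a hand-written emulation of the two caching policies. It is a sketch under assumptions, not the proposed implementation: `FirstLookupCache`, `KeyedCache`, `lookup!`, and `call1` are hypothetical names, the caches are ordinary Julia objects rather than per-native-code slots, and `libopenlibm` is assumed to be loadable by name as in the transcripts above.

```julia
using Libdl

# Assumption: "libopenlibm" is loadable by name on this installation.
const LIB = dlopen("libopenlibm")

# Option 4: remember the pointer from the first lookup and never look again.
mutable struct FirstLookupCache
    ptr::Ptr{Cvoid}
end

function lookup!(c::FirstLookupCache, name::Symbol)
    if c.ptr == C_NULL
        c.ptr = dlsym(LIB, name)
    end
    return c.ptr
end

# Option 5: also remember which name was cached and redo the lookup on change.
mutable struct KeyedCache
    name::Symbol
    ptr::Ptr{Cvoid}
end

function lookup!(c::KeyedCache, name::Symbol)
    if c.ptr == C_NULL || c.name !== name
        c.ptr = dlsym(LIB, name)
        c.name = name
    end
    return c.ptr
end

# Call a `double f(double)` C function through a raw pointer.
call1(ptr::Ptr{Cvoid}, x::Float64) = ccall(ptr, Float64, (Float64,), x)

opt4 = FirstLookupCache(C_NULL)
opt5 = KeyedCache(:none, C_NULL)

globalsin = :sin
a4 = call1(lookup!(opt4, globalsin), 1.0)  # sin(1.0)
a5 = call1(lookup!(opt5, globalsin), 1.0)  # sin(1.0)

globalsin = :cos
b4 = call1(lookup!(opt4, globalsin), 1.0)  # still sin(1.0): stale cache
b5 = call1(lookup!(opt5, globalsin), 1.0)  # cos(1.0): cache was refreshed
```

The extra cost of option 5 in this emulation is a single `!==` comparison on the common path, which is the sense in which the common case remains cheap.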

### Implementation Considerations

Given the above considerations, I think the general consensus was that option 5 is preferred. It is close enough to the current semantics (and identical in the cases
that people actually use) that we should be able to do it without a deprecation cycle.

To implement this, I had suggested the following lowering:

```
@assume_effects :consistent_once_per_process :effect_free :consistent_termination dlsym(...)::Ptr{Cvoid} = ... 

function do_call()
	# ccall((bar, lib), ...)
	ptr = inline_cache(dlsym(lib, bar))
	$(Expr(:foreigncall, ptr, ...))
end
```

where `consistent_once_per_process` is a new non-IPO effect annotation that specifies that the result of a call may be assumed `:consistent` within each process (i.e. may be cached for `egal` arguments), and `inline_cache` is a new builtin that is semantically a no-op, but tells the optimizer to perform the caching optimization (using `:consistent`-cy inference to determine the scope of the region to cache). @vtjnash objected to this scheme on complexity grounds, and @JeffBezanson objected on IR size grounds. However, I think people generally agreed that the semantics were reasonable.

Thus, the ultimate proposal is to treat the above as the semantic representation of what `:foreigncall` should do, but keep both the `dlsym` generic call and the `inline_cache` inside `:foreigncall`, where they are now. The various places that need to
analyze this (inference, trimming, etc.) should then model `:foreigncall` to include the generic call to `dlsym`, including providing edges for it.


