Skip to content

ccall: make distinction of pointer vs name a syntactic distinction #59165

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Jul 31, 2025

We have long expected users to be explicit about the library name for ccall, and the @ccall macro has even always enforced that. That means users should have already been using explicit syntax, even though it wasn't strictly enforced. And indeed, the other syntax forms weren't handled reliably anyways (since doing so would require linearizing IR if and only if the runtime values required it, which is not something that is computable, and thus was often done wrong). This now intends to align the runtime and compiler to expect only those syntax forms that we could reliably handle in the past without errors, and adds explicit errors for other cases, most of which we previously knew would be unreliable due to reliance upon inference in making particular decisions for the semantics. The ccall function is already very special since it is more like a actual macro (it does not exist as a binding or value), so we can make unusual syntax decisions like this, mirroring @ccall also.

This fixes #57931, mostly by restricting the set of things that are allowed to the set of things that have an obvious and pre-existing behavior to be guaranteed to do that behavior. The hope is to PkgEval this to check if any packages are doing something unusual and see if we even need to allow anything else.

This drops support for #37123, since we were going to use that for LazyLibraries, be we decided that approach was quite buggy and that PR would make static compilation quite impossible to support, so we instead actually implemented LazyLibraries with a different approach. It could be re-enabled, but we never had correct lowering or inference support for it, so it is presumably still unused.

The goal is to cause breakage only where the package authors really failed to express intent with syntax, and otherwise to explicitly maintain support by adding cases to normalize the given syntax into one of the supported cases. All existing functionality (and more) can be accessed by explicit management of a pointer or by a LazyLibrary-like type, so this shouldn't cause any reduction in possible functionality, just possibly altered syntax.

In followup work, we may want to apply some similar improvements to the cglobal semantics, though not nearly as urgent, since cglobal on a pointer is permitted as a complicated bitcast right now, which doesn't make sense and probably should be deprecated, so that there is no ambiguity with that function's meaning.

Mostly written by Claude

@vtjnash vtjnash added minor change Marginal behavior change acceptable for a minor release ffi foreign function interfaces, ccall, etc. backport 1.12 Change should be backported to release-1.12 labels Jul 31, 2025
@Keno
Copy link
Member

Keno commented Jul 31, 2025

Can we specify that the variable must be a constant binding (in the evaluation world) and have it participate in the invalidation? I don't know that I like the time traveling binding behavior. Seems like a nice change otherwise though.

@vtjnash vtjnash added the needs pkgeval Tests for all registered packages should be run with this change label Jul 31, 2025
@vtjnash
Copy link
Member Author

vtjnash commented Jul 31, 2025

No, far too much code (all JLLs before they are recompiled to be lazy) relies upon the dynamic behavior of allowing a non-constant binding to evaluate at runtime and cache. Basically, I don't consider this to be time traveling anymore either, but rather that it is intended to express equivalent behavior to there existing a separate const cache variable, which can be out of date (ahead or behind) the variable that is supposed to compute it, and thus enables explicitly rejecting the notion of time travel here (data never time travels, only bindings and functions):

const call_target = Ref(C_NULL)
global lib = "runtime value"
global name = "runtime value"
f() = @atomiconce call_target[] = dlsym(name, dlopen(lib))

@Keno
Copy link
Member

Keno commented Jul 31, 2025

Ok. Perhaps we could still invalidate on binding replacement though? I don't think that'd be much harder to implement, and gets us 90% of the way there (and non-const could be disallowed in the future or in strict mode).

@vtjnash
Copy link
Member Author

vtjnash commented Jul 31, 2025

I'm still not sure I quite follow the request. Are you suggesting we add a semantic dependence on data, or simply best-effort? I think we already will do a best effort attempt which should happen in most cases (particularly after Jeff's PR, and other related corrections to trimming)

@Keno
Copy link
Member

Keno commented Jul 31, 2025

My request is to make that invalidation a semantic guarantee by making the completely generic fallback path record the world age it was created in and then checking if the latest binding partition covers that world age.

@topolarity topolarity changed the title ccall: make distinction of pointer vs name a syntatic distinction ccall: make distinction of pointer vs name a syntactic distinction Jul 31, 2025
@topolarity
Copy link
Member

topolarity commented Jul 31, 2025

Thanks for taking the initiative!

This feels like a much more pragmatic (and complete) solution to #57931 than anything else we've implemented so far.

I think I prefer this to the more dynamic options proposed in #57931, esp. since those only covered the case of a dynamically-chosen library ccall'd with statically-evaluated ABI / types. That seems like a narrow use case (and is anyway already possible with a function pointer and some amount of manual caching, as you mention) so preserving our existing static-by-default semantics, as determined by syntax, feels like a much less radical change to me.

This drops support for #37123, since we were going to use that for LazyLibraries, be we decided that approach was quite buggy and that PR would make static compilation quite impossible to support

Why does this make static compilation impossible?

It doesn't seem like, e.g., the behavior proposed in solution (5) in #57931 is impossible to statically compile, assuming it is type-stable (but not necessarily value-stable) - but maybe you were referring just to the particular implementation in that PR?

@vtjnash
Copy link
Member Author

vtjnash commented Jul 31, 2025

Why does this make static compilation impossible?

It seemed mainly that we couldn't quite define whether it was a static name expression or a dynamic pointer so it couldn't be quite linearized so it was sort of an embedded eval call in some unknown world

@@ -151,7 +151,7 @@ using ..GMP: BigInt, Limb, BITS_PER_LIMB, libgmp
const mpz_t = Ref{BigInt}
const bitcnt_t = Culong

gmpz(op::Symbol) = (Symbol(:__gmpz_, op), libgmp)
gmpz(op::Symbol) = Expr(:tuple, QuoteNode(Symbol(:__gmpz_, op)), GlobalRef(MPZ, :libgmp))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't be surprised if some other packages do this. We might need to allow it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we might, but until PkgEval results finish, we won't know if we need to add yet more cases or are fine with what we already have covering all uses. It is fairly easy to add (it is the same as the syntactic string case), but we won't know if it worth documenting yet another optional form of the same thing until we finish the test run.

if (nargs_tuple == 0)
jl_error("ccall function name cannot be empty tuple");
if (nargs_tuple > 2)
jl_error("ccall function name tuple can have at most 2 elements");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these kinds of syntax validity checks should be done by lowering.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are value checks (it is preparing to call jl_toplevel_eval), so they are not straightforward to have duplicated in both of our lowering implementations now

@JeffBezanson
Copy link
Member

I'm not sure we should do this. I'm willing to hold out for a version that eliminates static_eval completely and has simpler semantics.

@vtjnash
Copy link
Member Author

vtjnash commented Aug 1, 2025

This version does eliminate it semantically, though I didn't quite remove it, it only gets trivial cases now and only as an optimization (to peek at whether the expression is constant before emitting a branch to contain it). It is still used by cglobal, which I left as followup work.

We have long expected users to be explicit about the library name for
`ccall`, and the `@ccall` macro has even always enforced that. That
means users should have already been using explicit syntax. And indeed,
other syntax forms weren't handled reliably (since doing so would
require linearizing IR if and only if the runtime values required it,
which is not something that is computable, and thus was often done
wrong). This now aligns the runtime and compiler to expect only syntax
forms that we could reliably handle before without errors, and adds
explicit errors for cases that we previously knew would be unreliable
because they relied upon inference making particular decisions for the
semantics. The `ccall` function is already very special since it is more
like a actual macro (it does not exist as a binding or value), so we can
make unusual syntax decisions like this, mirroring `@ccall`.

This drops support for #37123,
since we were going to use that for LazyLibraries, be we decided that
approach was quite buggy and that PR would make static compilation quite
impossible to support, so we instead actually implemented LazyLibraries
with a different approach. It could be re-enabled, but we never had
correct lowering or inference support for it, so it is presumably
still unused.

The goal is to cause breakage only where the package authors really
failed to express intent with syntax, and otherwise to explicitly
maintain support by adding cases to normalize the given syntax into one
of the supported cases. All existing functionality (and more) can be
accessed by explicit management of a pointer or by a LazyLibrary-like
type, so this shouldn't cause any reduction in possible functionality,
just possibly altered syntax.
@vtjnash vtjnash force-pushed the jn/syntax-verified-ccall branch from 8aed4d7 to cce193b Compare August 1, 2025 16:56
@topolarity
Copy link
Member

Unsurprisingly, looks like JuliaInterpreter and friends need some adjusting: https://buildkite.com/julialang/julia-master/builds/49600#01986719-957f-4f8a-9636-9470d50ae64b/836-1055

@vtjnash
Copy link
Member Author

vtjnash commented Aug 2, 2025

Yes, it looks like given the choices among possible buggy reimplementations of the ill-defined static_eval previously implemented here, it chose one of the buggiest versions (it just calls eval on the first argument and embeds the result into the new IR)

@vtjnash
Copy link
Member Author

vtjnash commented Aug 2, 2025

@nanosoldier runtests()

@nanosoldier
Copy link
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

Report summary

❗ Packages that crashed

5 packages crashed on the previous version too.

✖ Packages that failed

988 packages failed only on the current version.

  • Package fails to precompile: 908 packages
  • Illegal method overwrites during precompilation: 13 packages
  • Package has test failures: 5 packages
  • Package tests unexpectedly errored: 21 packages
  • Networking-related issues were detected: 9 packages
  • There were unidentified errors: 2 packages
  • Tests became inactive: 1 packages
  • Test duration exceeded the time limit: 28 packages
  • Test log exceeded the size limit: 1 packages

1192 packages failed on the previous version too.

✔ Packages that passed tests

12 packages passed tests only on the current version.

  • Other: 12 packages

4832 packages passed tests on the previous version too.

~ Packages that at least loaded

3 packages successfully loaded only on the current version.

  • Other: 3 packages

2857 packages successfully loaded on the previous version too.

➖ Packages that were skipped altogether

6 packages were skipped only on the current version.

  • Package could not be installed: 6 packages

904 packages were skipped on the previous version too.

@vtjnash
Copy link
Member Author

vtjnash commented Aug 2, 2025

It looks like a majority of those were due to a recent breaking change in FFTW (JuliaMath/FFTW.jl#207) which could just be partly reverted it seems (it had been in preparation for LazyLibraries, but then we went a different way with implementation of those, so now this should be reverted to actually support them)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 1.12 Change should be backported to release-1.12 ffi foreign function interfaces, ccall, etc. minor change Marginal behavior change acceptable for a minor release needs pkgeval Tests for all registered packages should be run with this change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clarify ccall evaluation semantics
5 participants