
Conversation

nickrobinson251 (Member)

No description provided.

KristofferC and others added 30 commits March 1, 2024 10:30
…re in the sysimage (JuliaLang#52841)

When the triggers of an extension are in the sysimage it is easy to end up with
cycles in package loading. Say we have a package A with extensions BExt and
CExt, and say that B and C are in the sysimage.

- Upon loading A, we will immediately start to precompile BExt (because
the trigger B is "loaded" by virtue of being in the sysimage).
- BExt will load A, which will cause CExt to start precompiling (again,
because C is in the sysimage).
- CExt will load A, which will now cause BExt to start loading, and we
get a cycle.

This PR fixes that by looking at which modules are actually `require`d,
rather than at which modules are loaded, and using only that to drive
the loading of extensions.

Fixes JuliaLang#52132.

(cherry picked from commit 08d229f)
# Conflicts:
#	VERSION
This needed updating for 1.10 (#102).

* port pool stats to 1.10

* increment/decrement current_pg_count

---------

Co-authored-by: K Pamnany <[email protected]>
Prepend `[signal (X) ]thread (Y) ` to each backtrace line that is
displayed.

Co-authored-by: Diogo Netto <[email protected]>
* Add GC metric `last_incremental_sweep`

* Update gc.c

* Update gc.c
Prevent transparent huge pages (THP) from overallocating physical memory.

Co-authored-by: Adnan Alhomssi <[email protected]>
Pass the types to the allocator functions.

-------

Before this PR, we were missing the types for allocations in two cases:

1. allocations from codegen
2. allocations in `gc_managed_realloc_`

The second one is easy: those are always used for buffers, right?

For the first one: we extend the allocation functions called from
codegen, to take the type as a parameter, and set the tag there.

I kept the old interfaces around, since I think that they cannot be
removed due to supporting legacy code?

------

An example of the generated code:
```llvm
  %ptls_field6 = getelementptr inbounds {}**, {}*** %4, i64 2
  %13 = bitcast {}*** %ptls_field6 to i8**
  %ptls_load78 = load i8*, i8** %13, align 8
  %box = call noalias nonnull dereferenceable(32) {}* @ijl_gc_pool_alloc_typed(i8* %ptls_load78, i32 1184, i32 32, i64 4366152144) #7
```

Fixes JuliaLang#43688.
Fixes JuliaLang#45268.

Co-authored-by: Valentin Churavy <[email protected]>
Sweeping of object pools will either construct a free list through dead objects (if there is at least one live object in a given page) or return the page to the OS (if there are no live objects whatsoever). With this PR, we construct the free lists for each GC page in parallel.
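To make the shape of the work concrete, here is a schematic Julia sketch of per-page parallel free-list construction; the `Page` type, its mark bits, and the `returned` flags are toy stand-ins for illustration, not the GC's actual C data structures.

```julia
# Toy model only: the real sweep operates on the GC's C page metadata.
struct Page
    marks::Vector{Bool}     # one mark bit per object slot
    free_list::Vector{Int}  # indices of dead slots, reusable by the allocator
end
Page(marks) = Page(marks, Int[])

function parallel_sweep!(pages::Vector{Page}, returned::Vector{Bool})
    Threads.@threads for i in eachindex(pages)
        page = pages[i]
        if !any(page.marks)
            returned[i] = true  # no live objects: the whole page can go back to the OS
        else
            # at least one live object: link the dead slots into a free list
            empty!(page.free_list)
            append!(page.free_list, findall(!, page.marks))
        end
    end
end

pages = [Page(rand(Bool, 8)) for _ in 1:4]
returned = fill(false, length(pages))
parallel_sweep!(pages, returned)
```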
GC threads don't have tasks associated with them.
Presence is controlled by a build-time option. Start a separate
thread which simply sleeps. When heartbeats are enabled, this
thread wakes up at specified intervals to verify that user code
is heartbeating as requested and if not, prints task backtraces.

Also fixes the call to `maxthreadid` in `generate_precompile.jl`.
When enabling heartbeats, the user must specify:
- `heartbeat_s`: `jl_heartbeat()` must be called at least once every `heartbeat_s` seconds; if it
  isn't, a one-line heartbeat-loss report is printed
- `show_tasks_after_n`: after this many `heartbeat_s` intervals have passed without `jl_heartbeat()`
  being called, print task backtraces and stop all reporting
- `reset_after_n`: after this many `heartbeat_s` intervals have passed with `jl_heartbeat()`
  being called, print a heartbeats-recovered message and reset reporting
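As a usage sketch, user code keeps the heartbeat alive by calling `jl_heartbeat()` from its main loop. The `ccall` signature below (no arguments, no return value) and the `process` helper are assumptions for illustration only.

```julia
# Sketch only: assumes a Julia build with heartbeats enabled.
process(item) = sum(abs2, item)   # hypothetical stand-in for real work

function worker_loop(work_items)
    for item in work_items
        process(item)
        ccall(:jl_heartbeat, Cvoid, ())   # signal liveness to the heartbeat thread
    end
end

worker_loop([rand(100) for _ in 1:10])
```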
`pool_live_bytes` was previously lazily updated during the GC, meaning
it was only accurate right after a GC.

Make this metric accurate when gathered at any point after a GC has happened, not only immediately after one.
Otherwise we may just observe `gc_n_threads = 0` (`jl_gc_collect` sets
it to 0 at the very end of its body) and this function becomes a no-op.
…uliaLang#52164)

One of the limitations is that it's only accurate right after the GC.
Still might be helpful for observability purposes.
We're suffering from heavy fragmentation in some of our workloads.

Add a build-time option to enable 4k pages (instead of 16k) in the GC,
since that improves memory utilization considerably for us.

The drawback is that this may increase the number of `madvise` system calls
in the sweeping phase by a factor of four, but concurrent page sweeping
should help with some of that.
…uliaLang#52943)

**EDIT**: fixes JuliaLang#52937 by
decreasing the contention on the page lists and only waking GC threads
up if we have a sufficiently large number of pages.

Seems to address the regression from the MWE of
JuliaLang#52937:

- master:
```
../julia-master/julia --project=. run_benchmarks.jl serial obj_arrays issue-52937 -n5 --gcthreads=1
bench = "issue-52937.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      24841 │     818 │        78 │        740 │           44 │             10088 │       96 │          3 │
│  median │      24881 │     834 │        83 │        751 │           45 │             10738 │       97 │          3 │
│ maximum │      25002 │     891 │        87 │        803 │           48 │             11074 │      112 │          4 │
│   stdev │         78 │      29 │         4 │         26 │            1 │               393 │        7 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
 ../julia-master/julia --project=. run_benchmarks.jl serial obj_arrays issue-52937 -n5 --gcthreads=8
bench = "issue-52937.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      29113 │    5200 │        68 │       5130 │           12 │              9724 │       95 │         18 │
│  median │      29354 │    5274 │        69 │       5204 │           12 │             10456 │       96 │         18 │
│ maximum │      29472 │    5333 │        70 │       5264 │           14 │             11913 │       97 │         18 │
│   stdev │        138 │      54 │         1 │         55 │            1 │               937 │        1 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
```

- PR:
```
../julia-master/julia --project=. run_benchmarks.jl serial obj_arrays issue-52937 -n5 --gcthreads=1
bench = "issue-52937.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      24475 │     761 │        77 │        681 │           40 │              9499 │       94 │          3 │
│  median │      24845 │     775 │        80 │        698 │           43 │             10793 │       97 │          3 │
│ maximum │      25128 │     811 │        85 │        726 │           47 │             12820 │      113 │          3 │
│   stdev │        240 │      22 │         3 │         21 │            3 │              1236 │        8 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
../julia-master/julia --project=. run_benchmarks.jl serial obj_arrays issue-52937 -n5 --gcthreads=8
bench = "issue-52937.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      24709 │     679 │        70 │        609 │           11 │              9981 │       95 │          3 │
│  median │      24869 │     702 │        70 │        631 │           12 │             10705 │       96 │          3 │
│ maximum │      24911 │     708 │        72 │        638 │           13 │             10820 │       98 │          3 │
│   stdev │         79 │      12 │         1 │         12 │            1 │               401 │        1 │          0 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
```

Also, performance on `objarray.jl` (an example of benchmark in which
sweeping parallelizes well with the current implementation) seems fine:

- master:
```
../julia-master/julia --project=. run_benchmarks.jl multithreaded bigarrays -n5 --gcthreads=1      
bench = "objarray.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      19301 │   10792 │      7485 │       3307 │         1651 │               196 │     4519 │         56 │
│  median │      21415 │   12646 │      9094 │       3551 │         1985 │               241 │     6576 │         59 │
│ maximum │      21873 │   13118 │      9353 │       3765 │         2781 │               330 │     8793 │         60 │
│   stdev │       1009 │     932 │       757 │        190 │          449 │                50 │     1537 │          2 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
../julia-master/julia --project=. run_benchmarks.jl multithreaded bigarrays -n5 --gcthreads=8
bench = "objarray.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      13135 │    4377 │      3350 │       1007 │          491 │               231 │     6062 │         33 │
│  median │      13164 │    4540 │      3370 │       1177 │          669 │               256 │     6383 │         35 │
│ maximum │      13525 │    4859 │      3675 │       1184 │          748 │               320 │     7528 │         36 │
│   stdev │        183 │     189 │       146 │         77 │          129 │                42 │      584 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
```

- PR:
```
../julia-master/julia --project=. run_benchmarks.jl multithreaded bigarrays -n5 --gcthreads=1    
bench = "objarray.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      19642 │   10931 │      7566 │       3365 │         1653 │               204 │     5688 │         56 │
│  median │      21441 │   12717 │      8948 │       3770 │         1796 │               217 │     6972 │         59 │
│ maximum │      23494 │   14643 │     10576 │       4067 │         2513 │               248 │     8229 │         62 │
│   stdev │       1408 │    1339 │      1079 │        267 │          393 │                19 │      965 │          2 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
../julia-master/julia --project=. run_benchmarks.jl multithreaded bigarrays -n5 --gcthreads=8
bench = "objarray.jl"
┌─────────┬────────────┬─────────┬───────────┬────────────┬──────────────┬───────────────────┬──────────┬────────────┐
│         │ total time │ gc time │ mark time │ sweep time │ max GC pause │ time to safepoint │ max heap │ percent gc │
│         │         ms │      ms │        ms │         ms │           ms │                us │       MB │          % │
├─────────┼────────────┼─────────┼───────────┼────────────┼──────────────┼───────────────────┼──────────┼────────────┤
│ minimum │      13365 │    4544 │      3389 │       1104 │          516 │               255 │     6349 │         34 │
│  median │      13445 │    4624 │      3404 │       1233 │          578 │               275 │     6385 │         34 │
│ maximum │      14413 │    5278 │      3837 │       1441 │          753 │               300 │     7547 │         37 │
│   stdev │        442 │     303 │       194 │        121 │           89 │                18 │      522 │          1 │
└─────────┴────────────┴─────────┴───────────┴────────────┴──────────────┴───────────────────┴──────────┴────────────┘
```
This PR continues the work of the following PR:

Prevent OOMs during heap snapshot: Change to streaming out the snapshot
data (JuliaLang#51518 )

Here is the commit history:

```
* Streaming the heap snapshot!

This should prevent the engine from OOMing while recording the snapshot!

Now we just need to sample the files, either online, before downloading, or offline after downloading :)

If we're gonna do it offline, we'll want to gzip the files before downloading them.

* Allow custom filename; use original API

* Support legacy heap snapshot interface. Add reassembly function.

* Add tests

* Apply suggestions from code review

* Update src/gc-heap-snapshot.cpp

* Change to always save the parts in the same directory

This way you can always recover from an OOM

* Fix bug in reassembler: from_node and to_node were in the wrong order

* Fix correctness mistake: The edges have to be reordered according to the node order. That's the whole reason this is tricky.

But i'm not sure now whether the SoAs approach is actually an optimization.... It seems like we should probably prefer to inline the Edges right into the vector, rather than having to do another random lookup into the edges table?

* Debugging messed up edge array idxs

* Disable log message

* Write the .nodes and .edges as binary data

* Remove unnecessary logging

* fix merge issues

* attempt to add back the orphan node checking logic
```
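For context, a hedged sketch of the streaming workflow; the `streaming` keyword and the `Profile.HeapSnapshot.assemble_snapshot` helper follow the upstream PR's API and may differ in this backport.

```julia
using Profile

# Assumed API: stream the snapshot into part files (.nodes/.edges/.strings plus
# metadata) so recording never has to hold the whole snapshot in memory.
Profile.take_heap_snapshot("app.heapsnapshot"; streaming = true)

# Later (e.g. after surviving an OOM), reassemble the parts into one file.
Profile.HeapSnapshot.assemble_snapshot("app.heapsnapshot", "app_assembled.heapsnapshot")
```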

---------

Co-authored-by: Nathan Daly <[email protected]>
Co-authored-by: Nathan Daly <[email protected]>
…53512)

This is a partial back-port of JuliaLang#50924, where we discovered that the
optimizer would ignore:
  1. must-throw `%XX = SlotNumber(_)` statements
  2. must-throw `goto #bb if not %x` statements

This is mostly harmless, except that in the case of (1) we can
accidentally fall through the statically deleted (`Const()`-wrapped)
code from inference and end up observing a control-flow edge that never
existed.

If the spurious edge is to a catch block, then the edge is invalid
semantically and breaks our SSA conversion.

This one-line change fixes (1) but not (2), which is enough for IR
validity.

Resolves part of JuliaLang#53366.

(cherry picked from commit 035d17a)
…liaLang#53553)

typeintersect: fix `UnionAll` unaliasing bug caused by innervars.
(cherry picked from commit 56f1c8a)
d-netto and others added 4 commits February 2, 2025 18:47
…7045) (#208)

This is still a work in progress, but it should help determine what a
straggler thread was doing during the stop-the-world phase and why it
failed to reach a safepoint in a timely manner.

We've encountered long TTSP issues in production, and this tool should
provide a valuable means to accurately diagnose them.
nickrobinson251 changed the title from "[DO NOT MERGE] Comparison of v1.10.2+RAI to v1.10.2" to "DO NOT MERGE: Comparison of v1.10.2+RAI to v1.10.2" on Feb 5, 2025
charnik and others added 25 commits February 10, 2025 20:09
…215)

Minor tweak to the error message: embed the exit code of the Julia child
process that failed to compile the package.
* inference: avoid inferring unreachable code methods (JuliaLang#51317)

(cherry picked from commit 0a82b71)

* inference: ensure inferring reachable code methods (JuliaLang#57088)

PR JuliaLang#51317 was a bit over-eager about inferring unreachable
code methods. Filter out the Vararg case, since that can be handled by
simply removing it instead of discarding the whole call.

Fixes JuliaLang#56628

(cherry picked from commit eb9f24c)

---------

Co-authored-by: Jameson Nash <[email protected]>
…gler (JuliaLang#57579)

In the line of C code:

```C
const int64_t timeout = jl_options.timeout_for_safepoint_straggler_s * 1000000000;
```

`jl_options.timeout_for_safepoint_straggler_s` is an `int16_t` and
`1000000000` is an `int32_t`.

The result of `jl_options.timeout_for_safepoint_straggler_s *
1000000000` will be an `int32_t`, which may not be large enough to hold
the value of `jl_options.timeout_for_safepoint_straggler_s` after
converting it to nanoseconds, leading to overflow.
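The same promotion behaviour can be reproduced from Julia with deliberately sized integers (the values below are illustrative, not taken from the original report):

```julia
julia> timeout_s = Int16(10)   # plays the role of timeout_for_safepoint_straggler_s
10

julia> timeout_s * Int32(1_000_000_000)   # promotes only to Int32, so the product wraps
1410065408

julia> Int64(timeout_s) * Int32(1_000_000_000)   # widening before the multiply avoids the overflow
10000000000
```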
This was making it too aggressive at filtering out "duplicate" results,
leading to possible inference mistakes or missing guardsig entries.

Fixes:
JuliaLang#50722 (comment)
(cherry picked from commit 762801c)

Co-authored-by: Jameson Nash <[email protected]>
* Use faster PRNG in the allocations profiler (JuliaLang#57761)

`rand()` locks and is slow. This uses the seed from `ptls`.

* Use correct arguments for 1.10's `cong()`

Also remove a warning from array.c
This does not fix the underlying issue that can occur here, which is a
collision of build_ids.lo between modules in IR decompression. Fixing
that requires a somewhat significant overhaul of the serialization of IR
(probably using the module identity as a key). This change does mean we
use a lot more of the available bits, which makes collisions a lot less
likely (they were already extremely rare), but hrtime tends to use only
the lower bits of a 64-bit integer, so this will hopefully add some more
randomness and make collisions even less likely.

Co-authored-by: Gabriel Baraldi <[email protected]>
* static-show: improve accuracy of some printings (JuliaLang#52799)

- Show strings with escaping, rather than trying to output the text
unmodified.
- Show symbols with the same formatting as Strings
- Avoid accidentally defining a broken Core.show method for NamedTuple

* Make more types survive `jl_static_show` unambiguously (JuliaLang#58512)

Makes more types survive `jl_static_show` unambiguously:
- Symbols
- Symbols printed in the `:var"foo"` form use raw string escaping,
fixing `:var"a\b"`, `:var"a\\"`, `:var"$a"`, etc.
  - Symbols that require parens use parens (`:(=)`, ...)
- Signed integers: Except for `Int`, signed integers print like
`Int8(1)`.
- Floats: floats are printed in a naive but reversible (TODO: double
check) way. `Inf(16|32|)` and `NaN(16|32|)` are printed, and
`Float16`/`Float32` print the type (`Float32(1.5)`). `Float64`s are
printed with a trailing `.0` if it is necessary to disambiguate from
`Int`.

Fixes JuliaLang#52677,
JuliaLang#58484 (comment),
JuliaLang#58484 (comment),
and the specific case mentioned in JuliaLang#58484. Improves the situation for
round-trip (inexhaustive list):
- Non-canonical NaNs
- BFloat16
- User-defined primitive types. This one is tricky, because they can
have a size different from any type we have literals for.

* Use `julia__gnu_h2f_ieee` instead of `julia_half_to_float`

`julia_half_to_float` came in with an LLVM version upgrade after
v1.10.

---------

Co-authored-by: Jameson Nash <[email protected]>
Co-authored-by: Sam Schweigel <[email protected]>
* gf.c: include const-return methods in `--trace-compile`

These are never compiled by LLVM, but we want to log them since they are
inferred / compiled by our own compiler.

* Drop comment for const-return compilation traces

---------

Co-authored-by: Cody Tapscott <[email protected]>
…#240)

Pkg currently has to start a separate process to run precompilation for
the test environment, which is annoying for multiple reasons.

Corresponding Pkg PR: JuliaLang/Pkg.jl#3792

Co-authored-by: Kristoffer Carlsson <[email protected]>
Now that we re-export quite a lot from `Core`, it seems sensible to
remove. This allows constructors like `Tuple{Type{Vector{Foo}},
UndefInitializer, Tuple{Int}}` to precompile properly.
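As an illustration, with a hypothetical `struct Foo end` the constructor signature quoted above can be fed to `precompile` directly (a sketch, not taken from the PR's test suite):

```julia
struct Foo end

# The tuple type is the constructor signature mentioned above:
# Vector{Foo}(undef, (n,)), i.e. (Type{Vector{Foo}}, UndefInitializer, Tuple{Int}).
precompile(Tuple{Type{Vector{Foo}}, UndefInitializer, Tuple{Int}})
```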

Appears to have a minimal effect on the stdlib pkgimages:
```diff
--- before.txt  2025-05-23 08:36:20.171870043 -0400
+++ after.txt   2025-05-22 14:48:49.003869097 -0400
@@ -47,7 +47,7 @@
  20K ../julia/usr/share/julia/compiled/v1.13/Logging/pkgimage.so
  20K ../julia/usr/share/julia/compiled/v1.13/Logging/pkgimage.so
 3.5M ../julia/usr/share/julia/compiled/v1.13/Markdown/pkgimage.so
-3.6M ../julia/usr/share/julia/compiled/v1.13/Markdown/pkgimage.so
+3.5M ../julia/usr/share/julia/compiled/v1.13/Markdown/pkgimage.so
 184K ../julia/usr/share/julia/compiled/v1.13/Mmap/pkgimage.so
 184K ../julia/usr/share/julia/compiled/v1.13/Mmap/pkgimage.so
  28K ../julia/usr/share/julia/compiled/v1.13/MozillaCACerts_jll/pkgimage.so
```
Alternative to JuliaLang#58146.

We want to compile a subset of the possible specializations of a
function. To this end, we have a number of manually written `precompile`
statements. Creating this list is, unfortunately, error-prone, and the
list is also liable to go stale. Thus we'd like to validate each
`precompile` statement in the list.

The simple answer is, of course, to actually run the `precompile`s, and
we naturally do so, but this takes time.

We would like a relatively quick way to check the validity of a
`precompile` statement.
This is a dev-loop optimization, to allow us to check "is-precompilable"
in unit tests.

We can't use `hasmethod` as it has both false positives (too loose):
```julia
julia> hasmethod(sum, (AbstractVector,))
true

julia> precompile(sum, (AbstractVector,))
false

julia> Base.isprecompilable(sum, (AbstractVector,)) # <- this PR
false
```
and also false negatives (too strict):
```julia
julia> bar(@nospecialize(x::AbstractVector{Int})) = 42
bar (generic function with 1 method)

julia> hasmethod(bar, (AbstractVector,))
false

julia> precompile(bar, (AbstractVector,))
true

julia> Base.isprecompilable(bar, (AbstractVector,)) # <- this PR
true
```
We can't use `hasmethod && isconcretetype` as it has false negatives
(too strict):
```julia
julia> has_concrete_method(f, argtypes) = all(isconcretetype, argtypes) && hasmethod(f, argtypes)
has_concrete_method (generic function with 1 method)

julia> has_concrete_method(bar, (AbstractVector,))
false

julia> has_concrete_method(convert, (Type{Int}, Int32))
false

julia> precompile(convert, (Type{Int}, Int32))
true

julia> Base.isprecompilable(convert, (Type{Int}, Int32))  # <- this PR
true
```
`Base.isprecompilable` is essentially `precompile` without the actual
compilation.
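A minimal sketch of the intended dev-loop use in a unit test, reusing the examples above (assumes this PR's `Base.isprecompilable` is available):

```julia
using Test

@testset "precompile statements stay valid" begin
    @test Base.isprecompilable(convert, (Type{Int}, Int32))
    @test !Base.isprecompilable(sum, (AbstractVector,))
end
```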

Co-authored-by: Kiran Pamnany <[email protected]>
…iaLang#59327) (#251)

Fixes JuliaLang#59326.

Change the logic that decides not to specialize a function parameter
when the supplied argument is a `Function` and that function is not
used, so that it still works when the declared type (the `SpecType`) is
a `Union{Function, Nothing}` or any other union that contains `Function`.

The logic is changed from a hardcoded rule of `type_i == Function ||
type_i == Any || type_i == Base.Callable` to `type_i >: Function`.

This covers all of the above cases, but also includes custom
`Union{Function, T}` such as `Union{Function, Nothing}`.
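A quick REPL check shows why the subtyping test subsumes the old hardcoded cases:

```julia
julia> Function >: Function, Any >: Function, Base.Callable >: Function
(true, true, true)

julia> Union{Function, Nothing} >: Function   # the new rule also covers this
true
```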

---------

Co-authored-by: Nick Robinson <[email protected]>
Co-authored-by: Jameson Nash <[email protected]>