[POC] gc: permanently mark pkg/sysimage objects to speed up GC #61474

Draft
topolarity wants to merge 1 commit into master from ct/image-gc

@topolarity topolarity commented Apr 1, 2026

Image objects are already never freed and are fairly rarely mutated, which makes them good candidates for "pretenuring".

Load image objects as permanently marked (GC_OLD_MARKED) so that gc_try_setmark_tag returns 0 immediately and the mark phase never enters the image subgraph. Track any mutations in a dedicated image_remset, which effectively roots any "new" referents that are referenced by the image subgraph.
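To illustrate the idea above, here is a minimal toy sketch in C, not the code in this PR. The names `toy_obj`, `toy_try_setmark`, `toy_mark`, and `toy_pretenure` and the tag values are hypothetical; only the behavior "already-marked objects return 0 from the set-mark check, so the marker never descends into them" is taken from the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative tag values only; the real GC bit layout is not shown here. */
enum { TOY_CLEAN = 0, TOY_MARKED = 1, TOY_OLD_MARKED = 3 };

typedef struct toy_obj {
    uint8_t tag;               /* mark state of this object */
    struct toy_obj *child[2];  /* outgoing references, may be NULL */
} toy_obj;

/* Returns 1 if this call set the mark (the caller must scan the object),
 * or 0 if the object was already marked and can be skipped. */
static int toy_try_setmark(toy_obj *o)
{
    if (o->tag & TOY_MARKED)
        return 0;
    o->tag |= TOY_MARKED;
    return 1;
}

/* Recursive mark; returns how many objects were actually scanned. */
static size_t toy_mark(toy_obj *o)
{
    if (o == NULL || !toy_try_setmark(o))
        return 0;
    size_t scanned = 1;
    for (int i = 0; i < 2; i++)
        scanned += toy_mark(o->child[i]);
    return scanned;
}

/* Models loading an image object as permanently marked. */
static void toy_pretenure(toy_obj *o)
{
    assert(o != NULL);
    o->tag = TOY_OLD_MARKED;
}
```

In this model, a heap object that points into a two-object "image" costs three scans when nothing is pretenured and one scan once both image objects have been passed to `toy_pretenure`. That is the effect the benchmark table is measuring at a much larger scale.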

This significantly speeds up (full) GC pause times when the heap is dominated by immutable objects in the sysimage / pkgimage heaps:

  GC full (not partial / quick) collection times:
  ┌───────────────────────────────┬─────────┬─────────┬─────────┐
  │           Benchmark           │ 1.12.5  │ Nightly │ This PR │
  ├───────────────────────────────┼─────────┼─────────┼─────────┤
  │ 1. Baseline (sysimage only)   │ 37.3 ms │ 40.3 ms │ 1.4 ms  │
  ├───────────────────────────────┼─────────┼─────────┼─────────┤
  │ 2. After loading packages     │ 39.4 ms │ 43.2 ms │ 3.4 ms  │
  ├───────────────────────────────┼─────────┼─────────┼─────────┤
  │ 3. Allocation pressure (1 MB) │ 40.0 ms │ 43.7 ms │ 3.5 ms  │ 
  ├───────────────────────────────┼─────────┼─────────┼─────────┤
  │ 4. Growing live set           │ 40.8 ms │ 43.8 ms │ 4.0 ms  │
  ├───────────────────────────────┼─────────┼─────────┼─────────┤
  │ 5. After method definitions   │ 40.5 ms │ 43.0 ms │ 4.1 ms  │
  ├───────────────────────────────┼─────────┼─────────┼─────────┤
  │ 6. Heavy JIT (eval in loop)   │ 42.3 ms │ 47.1 ms │ 7.4 ms  │
  ├───────────────────────────────┼─────────┼─────────┼─────────┤
  │ 7. Mixed 5-cycle pattern      │ 84.4 ms │ 93.3 ms │ 10.9 ms │
  ├───────────────────────────────┼─────────┼─────────┼─────────┤
  │ 8. Large transient (10 MB)    │ 41.8 ms │ 46.7 ms │ 4.4 ms  │
  └───────────────────────────────┴─────────┴─────────┴─────────┘

Partial / "quick" GC pause times are essentially unchanged. Benchmark script

Co-authored-by: Claude Opus 4.6 noreply@anthropic.com 🤖
This commit is almost entirely written by Claude after laying out a plan together.

Draft because I think the sweep in staticdata.c is probably unnecessary. I'm hoping to remove that before this is ready for review. This can likely also be generalized slightly to a "permalloc / pretenure" operation that applies to objects not necessarily in images. I'm not sure whether it'd be possible to "unfreeze" those objects once they are promoted to permalloc'd / pretenured, though.

@topolarity topolarity changed the title [POC] gc: pretenure image objects to skip sys/pkgimage sub-heap in mark phase [POC] gc: permanently mark pkg/sysimage objects to speed up GC Apr 1, 2026
Load image objects as permanently marked (GC_OLD_MARKED) so
gc_try_setmark_tag returns 0 immediately and the mark phase never
enters the image subgraph. This reduces full-sweep mark times by
10-25x for typical workloads with loaded packages.

Image objects live in separate mmap'd regions, are never freed, and
are rarely mutated, making them ideal candidates for pretenuring.

A persistent `image_remset` (htable) tracks image objects that have
been mutated to reference non-image (collectable) objects. These are
discovered at image load time by gc_scan_sysimg_remset and added
incrementally by the write barrier in jl_gc_queue_root. After a full
sweep (which clears per-thread remsets), gc_queue_image_remset pushes
these entries to the mark queue so their children are properly traced.
Quick sweeps don't need this because old objects retain their marks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
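The remset flow described in the commit message can be sketched roughly as follows. This is a toy model under stated assumptions, not the PR's implementation: `toy_store`, `toy_queue_image_remset`, and the struct layouts are hypothetical, a fixed-size array with an `in_remset` flag stands in for the htable, and marking is done inline rather than by pushing to a mark queue.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_REMSET_CAP 16

typedef struct toy_obj {
    int in_image;           /* 1 if the object lives in an image (never freed) */
    int marked;             /* mark bit for collectable objects */
    int in_remset;          /* 1 once recorded in the image remset */
    struct toy_obj *field;  /* single reference field, may be NULL */
} toy_obj;

typedef struct {
    toy_obj *items[TOY_REMSET_CAP];
    size_t len;
} toy_remset;

/* Store with a write barrier: an image object that starts pointing at a
 * non-image (collectable) object is recorded once in the remset. */
static void toy_store(toy_remset *rs, toy_obj *parent, toy_obj *child)
{
    parent->field = child;
    if (parent->in_image && child != NULL && !child->in_image &&
        !parent->in_remset) {
        assert(rs->len < TOY_REMSET_CAP);
        rs->items[rs->len++] = parent;
        parent->in_remset = 1;
    }
}

/* After a full sweep, treat each recorded image object as a root and mark
 * the collectable objects reachable through it. Returns the number of
 * objects newly marked. */
static size_t toy_queue_image_remset(const toy_remset *rs)
{
    size_t newly_marked = 0;
    for (size_t i = 0; i < rs->len; i++) {
        toy_obj *c = rs->items[i]->field;
        while (c != NULL && !c->in_image && !c->marked) {
            c->marked = 1;
            newly_marked++;
            c = c->field;
        }
    }
    return newly_marked;
}
```

The entry is kept even if the image object's field is later overwritten, which is conservative but safe in this model: a stale entry only costs an extra check, while a missing one would leave a live object unmarked.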

vchuravy commented Apr 1, 2026

I don't quite follow why we need a different remset? The core of the remset idea is that we keep track of the dynamic frontier between generations, so I don't see anything preventing us from using the current remset for the eternal/permalloc generation.

Or is the issue that during mark we need to keep track of whether the object is coming from an image?

So my idea would be that during a full GC you could look at the remset and scan all eternal objects. This would move us more cleanly to a three-generation GC, and of course the write barrier would need to enqueue eternal objects that see writes of child objects in the young or old generation.

@JeffBezanson JeffBezanson added GC Garbage collector performance Must go faster labels Apr 1, 2026