runtime (gc_blocks.go): use a linked stack to scan marked objects #5102
Conversation
This includes the commit from #5101, so that should be merged first.
This improves performance significantly:
The blocks GC originally used a fixed-size stack to hold objects to scan. When this stack overflowed, the GC would fully rescan all marked objects. This could cause the GC to degrade to O(n^2) when scanning large linked data structures.

Instead of using a fixed-size stack, we now add a pointer field to the start of each object. This pointer field is used to implement an unbounded linked stack. This also consolidates the heap object scanning into one place, which simplifies the process.

This comes at the cost of introducing a pointer field at the start of the object, plus the cost of aligning the result. This translates to:
- 16 bytes of overhead on x86/arm64 with the conservative collector
- 0 bytes of overhead on x86/arm64 with the precise collector (the layout field cost gets aligned up to 16 bytes anyway)
- 8 bytes of overhead on other 64-bit systems
- 4 bytes of overhead on 32-bit systems
- 2 bytes of overhead on AVR
Loop over valid pointer locations in heap objects instead of checking whether each location is valid. The conservative scanning code is now shared between markRoots and the heap scan.

This also removes the ending alignment requirement from markRoots, since the new scan* functions do not require an aligned length. This requirement was occasionally violated by the Linux global marking code. This saves some code space and has negligible impact on performance.
Force-pushed from cfbf6c9 to 11d283d.
I also decided to add the scanning logic rework commit to this PR because it is closely related.
deadprogram
left a comment
I think the speed increase justifies the small header. From my initial benchmarks it does appear to have an impact on the larger nested/linked data structures.
Here are my benchmarks from tinybench (before/after):
Anyone else have any feedback before we merge?
dgryski
left a comment
The gc_blocks and gc_conservative work LGTM, but I wouldn't mind a second set of eyes on the gc_precise changes.
I gave it a few more goings-over and it still LGTM. Now merging, thanks @niaow for all this awesome work!