Description
We have agreed that we should let #1355 get merged as long as it works, even if not perfectly. But to properly support cancellable (or upgradable) concurrent GC, there are many things to be considered.
Motivation
This is motivated by the Concurrent Immix plan. A concurrent GC consists of an initial pause, a concurrent marking phase, and a final pause. During the concurrent marking phase, mutators run concurrently with GC workers, so a mutator may run out of memory before marking finishes. When that happens, we have two options.
- Let the allocating mutator block until the current GC finishes (after the final pause). Other mutators can continue, but they will also block if they try to allocate and fail.
- Cancel the current concurrent marking, and restart with a stop-the-world GC. (In other words, we "upgrade" the concurrent GC to a STW GC.) Other mutators will stop, too, waiting for the next STW GC.
I (Kunshan) personally favor the first option, i.e. letting the concurrent GC finish. Reasons are:
- Cancelling the current GC wastes all the work already done by the concurrent GC.
- The complexity of cancelling an already running GC is high, including resetting metadata, etc., and is prone to error.
- Root-unreachable objects conservatively kept alive during this concurrent GC are guaranteed not to be reached during the next GC (even if it is concurrent).
On the other hand, the main reason for supporting cancellable concurrent GC is the following:
- A concurrent GC works on the heap snapshot taken when the GC was triggered (i.e. the snapshot-at-the-beginning, SATB), and objects allocated during the concurrent GC are conservatively considered alive. If the mutators manage to exhaust the memory, they must have been allocating very fast and generating a lot of garbage. Therefore the SATB-based concurrent GC may not free memory fast enough to keep up with the allocation rate of the mutators.
This issue is mainly about what to do if we do want to implement cancellable (or upgradable) concurrent GC.
Things to consider
In #1355, the InitialMark pause generates work packets in a hidden bucket, and moves the work packets into the Unconstrained bucket after mutators are resumed. Then GC workers start working on
To cancel a concurrent GC, we need to
- Let all GC workers stop after finishing their current work packets, and park.
- The last parked worker removes all pending work packets from the Unconstrained bucket. (Other buckets should be empty.) Then it closes all buckets.
- Clear the `concurrent_marking_active` state so that mutators no longer conservatively keep newly allocated lines alive.
- Add a `ScheduleCollection` work packet.
- The `ScheduleCollection` work packet schedules a `StopMutators` work packet which stops mutators.
- Ensure mark bits are cleared, including line mark bits.
- Proceed to do what a STW GC normally does.
First of all, there should be a point of no return during concurrent marking. Exactly one of the two things below shall happen.
- GC workers have finished concurrent marking, and have decided to initiate the FinalMark pause.
- A mutator interrupts the concurrent marking, and all GC workers shall stop after their current work packet. No GC workers shall then attempt to initiate the FinalMark pause.
This should be implemented with an atomic operation. If the mutator wins the race, we will cancel the concurrent marking. Otherwise the mutators will have to wait for the FinalMark pause to finish.
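One way to picture this race is a single atomic state word that both sides try to move out of the "marking" state with a compare-and-exchange. The following is only a minimal sketch under that assumption; `MarkingState`, `Outcome`, and the state constants are hypothetical names for illustration and are not taken from #1355 or from mmtk-core.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical states of one concurrent marking cycle (not MMTk names).
const MARKING: u8 = 0;
const FINAL_MARK_CLAIMED: u8 = 1;
const CANCELLED: u8 = 2;

/// Result of trying to leave the `MARKING` state.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Outcome {
    /// The caller moved the state and must act on its decision.
    Won,
    /// GC workers got there first; the FinalMark pause will run.
    LostToFinalMark,
    /// A mutator got there first; concurrent marking is being cancelled.
    LostToCancel,
}

pub struct MarkingState(AtomicU8);

impl MarkingState {
    pub fn new() -> Self {
        Self(AtomicU8::new(MARKING))
    }

    /// Only one caller can ever move the state away from `MARKING`.
    fn transition(&self, target: u8) -> Outcome {
        match self
            .0
            .compare_exchange(MARKING, target, Ordering::AcqRel, Ordering::Acquire)
        {
            Ok(_) => Outcome::Won,
            Err(FINAL_MARK_CLAIMED) => Outcome::LostToFinalMark,
            Err(_) => Outcome::LostToCancel,
        }
    }

    /// Called by a GC worker that finds concurrent marking finished.
    pub fn try_start_final_mark(&self) -> Outcome {
        self.transition(FINAL_MARK_CLAIMED)
    }

    /// Called by a mutator that failed to allocate during marking.
    pub fn try_cancel(&self) -> Outcome {
        self.transition(CANCELLED)
    }
}
```

In this sketch, a mutator that gets `LostToFinalMark` would simply wait for the FinalMark pause, and a GC worker that gets `LostToCancel` would park instead of initiating FinalMark.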
All GC workers should stop before clearing work buckets. If any GC worker is executing a work packet, it may add more work packets to the buckets.
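The ordering constraint above (park everyone first, then drain) could be sketched with a park counter guarded by the same lock as the pending queue. This is an illustrative model only: `Scheduler`, `park_for_cancel`, and the use of `u32` ids in place of real work packets are assumptions, not the mmtk-core scheduler API.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical stand-in for one work bucket plus a park counter.
pub struct Scheduler {
    inner: Mutex<Inner>,
    num_workers: usize,
}

struct Inner {
    parked: usize,
    pending: VecDeque<u32>, // packet ids stand in for real work packets
}

impl Scheduler {
    pub fn new(num_workers: usize, packets: impl IntoIterator<Item = u32>) -> Self {
        Self {
            inner: Mutex::new(Inner {
                parked: 0,
                pending: packets.into_iter().collect(),
            }),
            num_workers,
        }
    }

    /// A packet that is still executing may push follow-up packets.
    pub fn add_packet(&self, id: u32) {
        self.inner.lock().unwrap().pending.push_back(id);
    }

    /// Called by a worker after it finishes its current packet.
    /// Only the last worker to park drains the queue; it gets back
    /// the number of discarded packets. Earlier workers get `None`.
    pub fn park_for_cancel(&self) -> Option<usize> {
        let mut inner = self.inner.lock().unwrap();
        inner.parked += 1;
        if inner.parked == self.num_workers {
            let dropped = inner.pending.len();
            inner.pending.clear();
            Some(dropped)
        } else {
            None
        }
    }
}
```

Because the drain happens only when the counter reaches the worker count, a packet added by a still-running worker is always in the queue before it is cleared.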
We should not start clearing metadata (such as the mark bits) until all mutators have stopped. While mutators are still running, they may conservatively mark newly allocated lines. Even after we clear `concurrent_marking_active`, mutators may not see the change to the global variable promptly.
We (developers) also need to manually check whether each space clears its metadata at the beginning of a GC or at the end of a GC. Some metadata (such as the VO bits) must be cleared at the end of GC. Other metadata, notably the mark bits, may be cleared either at the beginning or at the end. If a piece of metadata is normally cleared at the end, we must insert a clearing operation at the beginning of the STW GC that is "upgraded" from a cancelled concurrent GC.
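To make that manual audit less error-prone, one could record the clearing policy per metadata kind and derive the extra clears from it. The sketch below assumes such a table exists; `ClearAt`, `MetadataSpec`, `extra_clears_on_upgrade`, and the policies in the usage example are made up for illustration and do not describe what any mmtk-core space actually does.

```rust
/// When a space normally clears a given piece of metadata.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ClearAt {
    /// Cleared at the beginning of each GC.
    Start,
    /// Cleared at the end of each GC.
    End,
}

/// Hypothetical per-metadata record, filled in by auditing each space.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MetadataSpec {
    pub name: &'static str,
    pub clear_at: ClearAt,
}

/// Names of the metadata that need an explicit clear at the start of a
/// STW GC upgraded from a cancelled concurrent GC: the cancelled GC
/// never reached its end, so its end-of-GC clears never ran.
pub fn extra_clears_on_upgrade(specs: &[MetadataSpec]) -> Vec<&'static str> {
    specs
        .iter()
        .filter(|s| s.clear_at == ClearAt::End)
        .map(|s| s.name)
        .collect()
}
```

For example, given a table where `mark_bits` is `ClearAt::End` and `line_mark_bits` is `ClearAt::Start`, the function returns only `mark_bits`, since the line mark bits would be cleared by the normal start-of-GC path anyway.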