Update GC docs for incremental collection. #1379
@@ -112,7 +112,7 @@ simple type cast from the original object: :code:`((PyGC_Head *)(the_object)-1)`
 As is explained later in the `Optimization: reusing fields to save memory`_ section,
 these two extra fields are normally used to keep doubly linked lists of all the
 objects tracked by the garbage collector (these lists are the GC generations, more on
-that in the `Optimization: generations`_ section), but they are also
+that in the `Optimization: incremental collection`_ section), but they are also
 reused to fulfill other purposes when the full doubly linked list structure is not
 needed as a memory optimization.
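A minimal Python sketch of the doubly linked lists mentioned in this hunk, under stated assumptions: ``GCHead`` and ``GCList`` are hypothetical names, Python attributes stand in for the two ``PyGC_Head`` link fields, and a sentinel node plays the role of a list header. It only illustrates why this structure suits the GC: unlinking an object and appending it to another list both take constant time, so moving an object between generations never requires walking a list.

```python
class GCHead:
    """Hypothetical stand-in for PyGC_Head: two link fields per tracked object."""
    def __init__(self, name):
        self.name = name
        self.prev = self  # an unlinked node points at itself
        self.next = self


class GCList:
    """Hypothetical circular doubly linked list with a sentinel header node."""
    def __init__(self):
        self.head = GCHead("<sentinel>")

    def append(self, node):
        # Insert just before the sentinel, i.e. at the tail: O(1).
        last = self.head.prev
        last.next = node
        node.prev = last
        node.next = self.head
        self.head.prev = node

    @staticmethod
    def unlink(node):
        # Splice the node out of whichever list holds it: O(1).
        node.prev.next = node.next
        node.next.prev = node.prev
        node.prev = node.next = node

    def move(self, node):
        # Moving between lists (e.g. generations) is unlink + append.
        GCList.unlink(node)
        self.append(node)

    def names(self):
        out, node = [], self.head.next
        while node is not self.head:
            out.append(node.name)
            node = node.next
        return out


young, old = GCList(), GCList()
a, b, c = GCHead("a"), GCHead("b"), GCHead("c")
for obj in (a, b, c):
    young.append(obj)
old.move(b)
print(young.names(), old.names())  # ['a', 'c'] ['b']
```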
@@ -356,37 +356,68 @@ follows these steps in order:
 the reference counts fall to 0, triggering the destruction of all unreachable
 objects.
 
-Optimization: generations
-=========================
+Optimization: incremental collection
+====================================
 
 In order to limit the time each garbage collection takes, the GC
-implementation for the default build uses a popular optimization:
-generations. The main idea behind this concept is the assumption that most
-objects have a very short lifespan and can thus be collected soon after their
-creation. This has proven to be very close to the reality of many Python
+implementation for the default build uses incremental collection with two
+generations.
+
+The purpose of generations is to take advantage of what is known as the weak
+generational hypothesis: Most objects die young.
+This has proven to be very close to the reality of many Python
 programs as many temporary objects are created and destroyed very quickly.
 
 To take advantage of this fact, all container objects are segregated into
-three spaces/generations. Every new
-object starts in the first generation (generation 0). The previous algorithm is
-executed only over the objects of a particular generation and if an object
-survives a collection of its generation it will be moved to the next one
-(generation 1), where it will be surveyed for collection less often. If
-the same object survives another GC round in this new generation (generation 1)
-it will be moved to the last generation (generation 2) where it will be
-surveyed the least often.
-
-The GC implementation for the free-threaded build does not use multiple
-generations. Every collection operates on the entire heap.
+two generations: young and old. Every new object starts in the young generation.
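The two-generation layout and the weak generational hypothesis can be illustrated with a toy model. This is a simplification, not CPython's implementation: ``ToyHeap`` is hypothetical, a caller-supplied ``is_alive`` predicate stands in for the real reachability analysis, and only the young generation is collected here.

```python
from dataclasses import dataclass, field


@dataclass
class ToyHeap:
    """Hypothetical two-generation heap; not CPython's data structures."""
    young: list = field(default_factory=list)
    old: list = field(default_factory=list)

    def allocate(self, obj):
        # Every new object starts in the young generation.
        self.young.append(obj)
        return obj

    def collect_young(self, is_alive):
        # Free dead young objects and promote the survivors to old.
        survivors = [o for o in self.young if is_alive(o)]
        freed = len(self.young) - len(survivors)
        self.old.extend(survivors)
        self.young.clear()
        return freed


heap = ToyHeap()
kept = set()
for i in range(1000):
    heap.allocate(i)
    if i % 100 == 0:  # only 1 object in 100 outlives its loop iteration
        kept.add(i)
freed = heap.collect_young(lambda o: o in kept)
print(freed, len(heap.old))  # 990 10
```

Of the 1,000 objects allocated, only the 10 still alive reach the old generation; the other 990 are freed by the first young collection, which is the pattern the hypothesis predicts for typical programs.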
-To collect all unreachable cycles in the heap, the garbage collector must scan the
+To detect and collect all unreachable cycles in the heap, the garbage collector must scan the
Maybe consider a different term here (and throughout below) since "cycle" already has an important meaning here and it can be confusing if we overload it:
-whole heap. This whole heap scan is called a cycle.
+whole heap. This whole heap scan is called a full collection.
Maybe use "full scavenge" instead of "cycle".
A collection is an increment, so that's not a good term.
Cycle is overloaded, so that's not great either.
"Scavenge" is the least ambiguous, but obscure.
Anyone have any other suggestions? I'll go with "scavenge" if not.
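As a rough model of what a full scavenge means, the hypothetical function below splits the old generation into fixed-size increments, always taking the least recently scanned objects first, until every object has been scanned once. The young generation and the reachable objects that real increments also pull in are deliberately left out, and ``increment_size`` is an assumed parameter, not a CPython setting.

```python
from collections import deque


def full_scavenge(old_objects, increment_size):
    """Hypothetical model: split one full scavenge into increments."""
    if increment_size < 1:
        raise ValueError("increment_size must be at least 1")
    unscanned = deque(old_objects)  # least recently scanned objects at the front
    scanned = []                    # objects already scanned in this full scavenge
    increments = []
    while unscanned:
        take = min(increment_size, len(unscanned))
        batch = [unscanned.popleft() for _ in range(take)]
        scanned.extend(batch)
        increments.append(batch)
    return increments, scanned


increments, scanned = full_scavenge(range(10), 4)
print(increments)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Taken together, the increments cover every object exactly once, which is the property a full scavenge needs.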
-In order to limit the time each garbage collection takes, the previous algorithm
+To limit the time each garbage collection takes, the detection and collection algorithm
-* All any objects reachable from those objects that have not yet been scanned this cycle.
+* All objects reachable from those objects that have not yet been scanned this cycle.
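The composition of an increment described in the suggestion above can be sketched as a graph traversal. This is a hypothetical model: ``build_increment``, the ``refs`` adjacency dictionary, the ``scanned`` set and the ``budget`` parameter are illustrative names, not CPython APIs. It starts from the young generation plus the ``budget`` least recently scanned old objects, then adds every object reachable from them that has not yet been scanned in the current full scavenge.

```python
def build_increment(young, old_lru, refs, scanned, budget):
    """Hypothetical sketch: gather the objects that make up one increment.

    young:   objects in the young generation (always included)
    old_lru: old-generation objects, least recently scanned first
    refs:    dict mapping each object to the objects it references
    scanned: objects already scanned during this full scavenge
    budget:  how many unscanned old objects to start from
    """
    seeds = list(young) + [o for o in old_lru if o not in scanned][:budget]
    increment, seen = [], set()
    stack = list(reversed(seeds))
    while stack:
        obj = stack.pop()
        if obj in seen or obj in scanned:
            continue  # already included, or already scanned this full scavenge
        seen.add(obj)
        increment.append(obj)
        # Transitive closure: anything reachable and unscanned joins too.
        stack.extend(reversed(refs.get(obj, [])))
    return increment


young = ["y"]
old_lru = ["a", "b", "c", "d"]
refs = {"y": ["c"]}  # a young object references an old one
print(build_increment(young, old_lru, refs, scanned=set(), budget=1))
# ['y', 'c', 'a']
print(build_increment(young, old_lru, refs, scanned={"c"}, budget=1))
# ['y', 'a']
```

Here ``c`` joins the first increment because the young object ``y`` refers to it, even though it is not among the least recently scanned objects; once ``c`` has been scanned, it is skipped.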
Maybe clarify that the objects that started in the old generation are considered "youngest of the old" instead of "oldest of the old" now (there's probably a better way of phrasing it).
It isn't really the oldest, it is the "least recently scanned". I'll rework this section.
-Any objects surviving this collection are moved to the old generation.
+Any young generation objects surviving this collection are moved to the old generation, and reachable objects in the old generation remain in the old generation.
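The rule in the suggestion above can be expressed as a small hypothetical helper. Sets stand in for the generation lists, and ``unreachable`` is assumed to be the output of cycle detection over the increment; ``finish_increment`` is an illustrative name, not a CPython function.

```python
def finish_increment(increment, young, old, unreachable):
    """Hypothetical: apply the result of collecting one increment in place."""
    for obj in increment:
        if obj in unreachable:
            # Garbage: freed from whichever generation held it.
            young.discard(obj)
            old.discard(obj)
        elif obj in young:
            # Young survivor: promoted to the old generation.
            young.remove(obj)
            old.add(obj)
        # Reachable old-generation objects simply stay in the old generation.


young = {"y1", "y2"}
old = {"a", "b"}
finish_increment(["y1", "y2", "a"], young, old, unreachable={"y2"})
print(sorted(young), sorted(old))  # [] ['a', 'b', 'y1']
```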
ollection from cycles.
Using "cycle" for both a "reference cycle" and a "GC execution" is a bit confusing; can we use other terminology?
I've used "full scavenge"
To make sure that I understand this section on unreachable cycles, I think you're saying that we want to ensure that we fully capture the unreachable cycle because we want to ensure that the cycle is either fully gc'd or not, to avoid partial processing, which could be problematic later on?
If so, maybe it's worth mentioning explicitly.
If we don't scan the full cycle at once, we cannot collect it. It is otherwise safe.
The old generational collector would scan partial cycles all the time; it just delayed the collection of the cycle, at worst until a full collection.
With the incremental collector, if we only scan part of a cycle, it may never be collected, which would be a problem.
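The point made in the reply above can be made concrete with a simplified Python model of cycle identification by reference counting: subtract the references that originate inside the scanned set, then treat anything still referenced from outside the set, and everything it reaches, as reachable. The function name and the hand-written ``refs`` and ``refcount`` inputs are hypothetical. Two objects ``A`` and ``B`` refer only to each other, so both are garbage.

```python
def find_unreachable(objs, refs, refcount):
    """Simplified cycle identification restricted to the objects in objs.

    refs:     dict mapping each object to the objects it references
    refcount: dict mapping each object to its total reference count
    Both inputs are hypothetical, hand-written for this example.
    """
    in_set = set(objs)
    gc_refs = {o: refcount[o] for o in objs}
    # Subtract every reference that comes from inside the scanned set.
    for o in objs:
        for target in refs.get(o, []):
            if target in in_set:
                gc_refs[target] -= 1
    # A positive count means a reference from outside the set: reachable,
    # and so is everything reachable from it inside the set.
    reachable = set()
    stack = [o for o in objs if gc_refs[o] > 0]
    while stack:
        o = stack.pop()
        if o not in reachable:
            reachable.add(o)
            stack.extend(t for t in refs.get(o, []) if t in in_set)
    return in_set - reachable


# A and B reference only each other: an unreachable cycle.
refs = {"A": ["B"], "B": ["A"]}
refcount = {"A": 1, "B": 1}
print(find_unreachable(["A"], refs, refcount))               # set()
print(sorted(find_unreachable(["A", "B"], refs, refcount)))  # ['A', 'B']
```

Scanning ``A`` alone finds nothing: ``B``'s reference to ``A`` looks like an external reference, so ``A`` is treated as reachable. Only when both members of the cycle are scanned together is the cycle found, which is why each increment includes the unscanned objects reachable from its starting objects.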
I think this could benefit from a concrete, simple example showing how this can happen and how it will eventually be cleaned up, either in English or with a diagram or pseudocode.
This probably should be called out in an Important block.
-allocations minus the number of deallocations exceeds ``threshold_0``,
+allocations minus the number of deallocations exceeds ``threshold0``,
Correct as mentioned in @mr-bronson's message. Good catch @mr-bronson. 😄
-collection starts. ``threshold_1`` determines the fraction of the old
-collection that is included in the increment.
-The fraction is inversely proportional to ``threshold_1``,
-as historically a larger ``threshold_1`` meant that old generation
-collections were performed less frequently.
+collection starts. ``threshold1`` determines the fraction of the old
+collection that is included in the increment.
+The fraction is inversely proportional to ``threshold1``,
+as historically a larger ``threshold1`` meant that old generation
+collections were performed less frequently.
 ``threshold2`` is ignored.
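The thresholds discussed in this suggestion are exposed through the ``gc`` module's ``get_threshold()`` and ``set_threshold(threshold0[, threshold1[, threshold2]])`` functions. The snippet below only reads them and changes them temporarily; the values 1000, 20 and 10 are arbitrary, and the exact effect of ``threshold1`` and ``threshold2`` depends on whether the interpreter uses the incremental collector described in this PR.

```python
import gc

saved = gc.get_threshold()  # (threshold0, threshold1, threshold2)
try:
    # threshold0: a collection starts once allocations minus deallocations
    # exceed it.
    # threshold1: per the text above, a larger value means a smaller
    # fraction of the old generation in each increment.
    # threshold2: described above as ignored by the incremental collector.
    gc.set_threshold(1000, 20, 10)
    during = gc.get_threshold()
    print(during)
finally:
    gc.set_threshold(*saved)  # restore the previous settings
```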
-Prior to 3.13, there were three generations. For that reason the
+Prior to 3.14, there were three generations. For that reason the