Commit b42bbea

Documentation of the C API memory management, inline comments, rename UpdateRefNode
1 parent 928308e commit b42bbea

6 files changed, +160 -29 lines changed

docs/contributor/IMPLEMENTATION_DETAILS.md

Lines changed: 117 additions & 0 deletions
@@ -165,3 +165,120 @@ For embedders, it may be important to be able to interrupt Python threads by
other means. We use the TruffleSafepoint mechanism to mark our threads waiting
to acquire the GIL as blocked for the purpose of safepoints. The Truffle
safepoint action mechanism can thus be used to kill threads waiting on the GIL.

## C Extensions and Memory Management

### High-level

C extensions assume reference counting, but on the managed side we want to leverage
the Java tracing GC. This creates a mismatch. The approach is to combine the two.

On the native side we use reference counting. The native code is responsible for doing
the counting, i.e., calling the `Py_IncRef` and `Py_DecRef` API functions. Inside those
functions we add special handling for the point when the first reference from the native
code is created and when the last reference from the native code is destroyed.

On the managed side we rely on tracing GC, so managed references are not ref-counted.
For the ref-counting scheme on the native side, we approximate all the managed references
as a single reference, i.e., we increment the refcount when an object is referenced from managed
code, and using a `PhantomReference` and a reference queue we decrement the refcount when
there are no longer any managed references (but we do not clean up the object as long as
`refcount > 0`, because that means that there are still native references to it).
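
The single-reference approximation can be sketched in plain C. This is only an illustrative model, not the GraalPy implementation: the struct, the function names, and the value chosen for `MANAGED_REFCNT` (the constant named in the Details section) are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value: the text only says MANAGED_REFCNT is a larger number. */
#define MANAGED_REFCNT 10

/* Hypothetical model of an object that takes part in native refcounting. */
typedef struct {
    int64_t refcount; /* the native refcount field */
    bool freed;       /* set when the native memory would be released */
} sketch_object;

/* The object becomes referenced from managed code: all managed references
 * together contribute one fixed amount to the refcount. */
void managed_side_acquires(sketch_object *o) {
    o->refcount += MANAGED_REFCNT;
}

/* The reference queue reports that no managed references remain. */
void managed_side_releases(sketch_object *o) {
    o->refcount -= MANAGED_REFCNT;
    if (o->refcount == 0) {
        o->freed = true; /* no native references are left either */
    }
}

/* Ordinary native reference counting. */
void native_incref(sketch_object *o) {
    o->refcount++;
}

void native_decref(sketch_object *o) {
    if (--o->refcount == 0) {
        o->freed = true;
    }
}
```

In this model an object that loses its managed references while one native reference is still outstanding stays allocated, and the final `native_decref` releases it.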

### Details

There are two kinds of Python objects in GraalPy: managed and native.

#### Managed Objects

Managed objects are allocated in the interpreter. If there is no native code involved,
we do not do anything special and let the Java GC handle them. When a managed object
leaks to a native extension:

* We wrap it in `PythonObjectNativeWrapper`. This is mostly in order to provide a different
  interop protocol: we do not want to expose `toNative` and `asPointer` on Python objects.

* When NFI or Sulong call `toNative`/`asPointer` we:
    * Allocate C memory that will represent the object on the native side (including the refcount field)
    * Add a mapping of that memory address to the `PythonObjectNativeWrapper` object to a hash map `CApiTransitions.nativeLookup`.
    * Initialize the refcount field to the constant `MANAGED_REFCNT` (a large number, because some
      extensions special-case small refcount values)
    * Create `PythonObjectReference`: a weak reference to the `PythonObjectNativeWrapper`.
      When this reference is enqueued (i.e., no managed references exist), we decrement the refcount by
      `MANAGED_REFCNT` and if the refcount falls back to `0`, we deallocate the native memory of the object,
      otherwise we need to wait for the native code to eventually call `Py_DecRef` and make it `0`.

* When extension code wants to create a new reference, it will call `Py_IncRef`.
  In the C implementation of `Py_IncRef` we check if a managed object with
  `refcount==MANAGED_REFCNT` wants to increment its refcount. In that case, the native code is
  creating the first reference to the managed object, so we must make sure to keep the object alive
  as long as there are some native references. We set a field `PythonObjectReference.strongReference`,
  which will keep the `PythonObjectNativeWrapper` alive even when all other managed references die.

* When extension code is done with the object, it will call `Py_DecRef`.
  In the C implementation of `Py_DecRef` we check if a managed object with `refcount==MANAGED_REFCNT+1`
  wants to decrement its refcount to `MANAGED_REFCNT`, which means that there are no native references
  to that object anymore. In that case we clear the `PythonObjectReference.strongReference` field,
  and the memory management is then again left solely to the Java tracing GC.
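
The two refcount thresholds described in the list above can be modelled as follows. This is a minimal sketch under stated assumptions: `managed_stub`, the `stub_*` function names, and the value of `MANAGED_REFCNT` are hypothetical, and the boolean field merely stands in for whether `PythonObjectReference.strongReference` is set on the Java side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MANAGED_REFCNT 10 /* hypothetical value */

/* Hypothetical model of the native memory allocated for a managed object. */
typedef struct {
    int64_t refcount;
    bool strong_reference; /* models PythonObjectReference.strongReference being set */
} managed_stub;

/* Models the toNative/asPointer step: only managed code refers to the object. */
void stub_to_native(managed_stub *s) {
    s->refcount = MANAGED_REFCNT;
    s->strong_reference = false;
}

/* Models the Py_IncRef check: the first native reference pins the wrapper. */
void stub_incref(managed_stub *s) {
    if (s->refcount == MANAGED_REFCNT) {
        s->strong_reference = true;
    }
    s->refcount++;
}

/* Models the Py_DecRef check: dropping the last native reference unpins it. */
void stub_decref(managed_stub *s) {
    s->refcount--;
    if (s->refcount == MANAGED_REFCNT) {
        s->strong_reference = false;
    }
}
```

Only the transitions across `MANAGED_REFCNT` touch the strong reference, so intermediate increments and decrements need no extra bookkeeping in this model.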

#### Native Objects

Native objects are backed by native memory and might never leak to managed code. As long as they do not
leak to managed code, they are reference counted as usual, and the `Py_DecRef` call that reaches
`0` deallocates the object. If a native object does leak to managed code:

* We increment the refcount of the native object by `MANAGED_REFCNT`
* We create:
    * `PythonAbstractNativeObject` Java object to represent it
    * `NativeObjectReference`, a weak reference to the `PythonAbstractNativeObject`.
* Save the mapping from the native object address to the `NativeObjectReference`
  object into hash map `CApiTransitions.nativeLookup` (next time this native object leaks to
  the managed code, we only fetch the existing wrapper and don't do any of this).
* When `NativeObjectReference` is enqueued, we decrement the refcount by `MANAGED_REFCNT`
  and if it falls to `0`, it means that there are no references to the object even from
  native code, so we can destroy it. If it does not fall to `0`, we just wait for the native
  code to eventually call `Py_DecRef` that makes it fall to `0`.
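
The steps for a native object leaking to managed code can be sketched like this. The fixed-size table is a simplified stand-in for the `CApiTransitions.nativeLookup` hash map, and all type names, function names, and constant values are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MANAGED_REFCNT 10 /* hypothetical value */
#define TABLE_SIZE 16     /* hypothetical capacity of the toy lookup table */

typedef struct {
    int64_t refcount;
} native_obj;

/* One entry stands in for a NativeObjectReference registered in nativeLookup. */
typedef struct {
    native_obj *obj;
    int in_use;
} lookup_entry;

static lookup_entry native_lookup[TABLE_SIZE];

/* First leak: bump the refcount by MANAGED_REFCNT and register the object.
 * Later leaks: return the existing entry and leave the refcount alone. */
lookup_entry *leak_to_managed(native_obj *obj) {
    lookup_entry *free_slot = NULL;
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        if (native_lookup[i].in_use && native_lookup[i].obj == obj) {
            return &native_lookup[i];
        }
        if (!native_lookup[i].in_use && free_slot == NULL) {
            free_slot = &native_lookup[i];
        }
    }
    if (free_slot == NULL) {
        return NULL; /* toy table is full */
    }
    obj->refcount += MANAGED_REFCNT;
    free_slot->obj = obj;
    free_slot->in_use = 1;
    return free_slot;
}

/* The weak reference was enqueued: give back the managed share.
 * Returns 1 if the object can be destroyed now, 0 if native code still holds it. */
int wrapper_collected(lookup_entry *entry) {
    native_obj *obj = entry->obj;
    entry->obj = NULL;
    entry->in_use = 0;
    obj->refcount -= MANAGED_REFCNT;
    return obj->refcount == 0;
}
```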

### Cycle GC

We leverage CPython's GC module to detect cycles for objects that participate
in the reference counting scheme (native objects or managed objects that leaked to native).
See: https://devguide.python.org/internals/garbage-collector/index.html.

There are two issues:

* Objects that are referenced from the managed code have refcount >= `MANAGED_REFCNT` and
  until Java GC runs we do not know if they are garbage or not.
* We cannot traverse the managed objects: since we don't do refcounting on the managed
  side, we cannot traverse them and decrement refcounts to see if there is a cycle.

The high-level solution is that when we see a "dead" cycle going through a managed object
(i.e., a cycle not referenced by any native object from the "outside" of the collected set),
we fully replicate the object graph (and the cycle) on the managed side (native objects
in the cycle that were not yet referenced from managed code get a new `NativeObjectReference`
created and their refcount incremented by `MANAGED_REFCNT`). Managed objects already refer
to the `PythonAbstractNativeObject` wrappers of the native objects (e.g., some Python container
with managed storage), but we also make the native wrappers refer to whatever their referents
are on the Java side (we use `tp_traverse` to find their referents).

Then we make the managed objects in the cycle only weakly referenced on the Java side.
One can think about this as pushing the baseline reference count at which the
object becomes eligible for being GC'ed and thus freed. Normally when the object has
`refcount > MANAGED_REFCNT` we keep it alive with a strong reference, assuming that
there are some native references to it. In this case, we know that all the native
references to that object are part of a potentially dead cycle, and we do not
count them into this limit. Let us call this limit the *weak to strong limit*.

After this, if the managed objects are garbage, eventually Java GC will collect them
together with the whole cycle.

If some of the managed objects are not garbage, and they leak back to native code,
the native code can then access and resurrect the whole cycle. With respect to refcount
integrity this is fine, because we did not alter the refcounts. The native references
between the objects are still factored into their refcounts. What may seem like a problem
is that we pushed the *weak to strong limit* for some objects. Such an object may leak to
native code and get `Py_IncRef`'ed, making its reference strong again. Since `Py_DecRef`
checks the same `MANAGED_REFCNT` limit for all objects, the subsequent `Py_DecRef`
call for this object will not detect that the reference should be made weak again!
However, this is OK; it only delays the collection: we will make it weak again in
the next run of the cycle GC.
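
The scenario from the last paragraph can be traced with a small model. This is a sketch under stated assumptions rather than the actual collector: `cycle_member`, its fields, the three functions, and the value of `MANAGED_REFCNT` are hypothetical, and the count of references coming from inside the cycle is given directly instead of being discovered through `tp_traverse`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MANAGED_REFCNT 10 /* hypothetical value */

/* Hypothetical model of a managed object that is part of a native cycle. */
typedef struct {
    int64_t refcount;        /* real refcount, never altered by the cycle GC */
    int64_t refs_from_cycle; /* native references held by other cycle members */
    bool strong_reference;   /* models the strong reference on the Java side */
} cycle_member;

/* Cycle GC step: if all native references come from inside the cycle,
 * make the object weakly referenced but leave the refcount untouched. */
void cycle_gc_visit(cycle_member *m) {
    if (m->refcount > MANAGED_REFCNT &&
        m->refcount - m->refs_from_cycle == MANAGED_REFCNT) {
        m->strong_reference = false;
    }
}

/* The object leaks back to native code and is Py_IncRef'ed; per the text
 * this makes the reference strong again. */
void leak_and_incref(cycle_member *m) {
    m->refcount++;
    m->strong_reference = true;
}

/* Py_DecRef compares against the fixed MANAGED_REFCNT limit only. */
void member_decref(cycle_member *m) {
    m->refcount--;
    if (m->refcount == MANAGED_REFCNT) {
        m->strong_reference = false;
    }
}
```

Starting from a refcount of `MANAGED_REFCNT + 1` with one reference from inside the cycle, a visit weakens the object, an increment followed by a decrement leaves it strong (the refcount returns to `MANAGED_REFCNT + 1`, not `MANAGED_REFCNT`), and the next visit weakens it again.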

graalpython/com.oracle.graal.python.cext/src/gcmodule.c

Lines changed: 13 additions & 7 deletions
@@ -457,7 +457,7 @@ validate_list(PyGC_Head *head, enum flagstates flags)
 static int
 visit_reachable(PyObject *op, PyGC_Head *reachable);

-/* A traversal callback for move_weak_candidates.
+/* A traversal callback for move_unreachable.
 *
 * This function is only used to traverse the referents of an object that was
 * moved from 'unreachable' to 'young' because it was made reachable due to a
@@ -729,7 +729,7 @@ visit_reachable(PyObject *op, PyGC_Head *reachable)
 // an untracked object.
 assert(UNTAG(gc)->_gc_next != 0);

-const Py_ssize_t gc_refs_reset = is_managed(gc) ? MANAGED_REFCNT + 1 : 1;
+const Py_ssize_t gc_refs_reset = is_managed(gc) ? MANAGED_REFCNT + 1 : 1; // GraalPy change
 if (UNTAG(gc)->_gc_next & NEXT_MASK_UNREACHABLE) {
 /* This had gc_refs = 0 when move_unreachable got
 * to it, but turns out it's reachable after all.
@@ -749,15 +749,18 @@ visit_reachable(PyObject *op, PyGC_Head *reachable)
 _PyGCHead_SET_PREV(next, prev);

 gc_list_append(gc, reachable);
-gc_set_refs(gc, gc_refs_reset);
+gc_set_refs(gc, gc_refs_reset); // GraalPy change: 1->gc_refs_reset
 }
-else if (gc_refs == 0 || (gc_refs == MANAGED_REFCNT && is_managed(gc))) {
+else if (gc_refs == 0 ||
+// GraalPy change: the additional condition, managed objects with MANAGED_REFCNT
+// may be still reachable from managed, we must not declare them unreachable
+(gc_refs == MANAGED_REFCNT && is_managed(gc))) {
 /* This is in move_unreachable's 'young' list, but
 * the traversal hasn't yet gotten to it. All
 * we need to do is tell move_unreachable that it's
 * reachable.
 */
-gc_set_refs(gc, gc_refs_reset);
+gc_set_refs(gc, gc_refs_reset); // GraalPy change: 1->gc_refs_reset
 }
 /* Else there's nothing to do.
 * If gc_refs > 0, it must be in move_unreachable's 'young'
@@ -837,12 +840,12 @@ move_unreachable(PyGC_Head *young, PyGC_Head *unreachable,
 "refcount is too small");
 // NOTE: visit_reachable may change gc->_gc_next when
 // young->_gc_prev == gc. Don't do gc = GC_NEXT(gc) before!
-// GraalPy change
-// CALL_TRAVERSE(traverse, op, visit_reachable, (void *)young);
 if (PyTruffle_PythonGC()) {
+// GraalPy change: this branch, else branch is original CPython code
 cycle.head = NULL;
 cycle.n = 0;
 assert (cycle.reachable == weak_candidates );
+/* visit_collect_managed_referents is visit_reachable + capture the references into "cycle" */
 CALL_TRAVERSE(traverse, op, visit_collect_managed_referents, (void *)&cycle);

 /* replicate any native reference to managed objects to Java */
@@ -856,6 +859,9 @@ move_unreachable(PyGC_Head *young, PyGC_Head *unreachable,

 if (gc_refcnt == MANAGED_REFCNT && is_managed(gc) &&
 Py_REFCNT(op) > MANAGED_REFCNT) {
+// The refcount fell to MANAGED_REFCNT, we have a managed object that was
+// part of a cycle and is no longer referenced from native space.
+
 // Assertion is enough because if Python GC is disabled, we will
 // never track managed objects.
 assert (PyTruffle_PythonGC());

graalpython/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/cext/PythonCextBuiltins.java

Lines changed: 3 additions & 3 deletions
@@ -133,7 +133,7 @@
 import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.HandleContext;
 import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.HandlePointerConverter;
 import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.NativePtrToPythonWrapperNode;
-import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.UpdateRefNode;
+import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.UpdateStrongRefNode;
 import com.oracle.graal.python.builtins.objects.cext.common.CExtCommonNodes.CoerceNativePointerToLongNode;
 import com.oracle.graal.python.builtins.objects.cext.common.CExtCommonNodes.TransformExceptionToNativeNode;
 import com.oracle.graal.python.builtins.objects.cext.common.CExtCommonNodesFactory.TransformExceptionToNativeNodeGen;
@@ -1588,7 +1588,7 @@ static Object doNative(Object weakCandidates,
 @Cached CStructAccess.ReadI64Node readI64Node,
 @Cached CStructAccess.WriteLongNode writeLongNode,
 @Cached NativePtrToPythonWrapperNode nativePtrToPythonWrapperNode,
-@Cached UpdateRefNode updateRefNode) {
+@Cached UpdateStrongRefNode updateRefNode) {
 // guaranteed by the guard
 assert PythonContext.get(inliningTarget).isNativeAccessAllowed();
 assert PythonContext.get(inliningTarget).getOption(PythonOptions.PythonGC);
@@ -1618,7 +1618,7 @@ static Object doNative(Object weakCandidates,
 if (GC_LOGGER.isLoggable(Level.FINE)) {
 GC_LOGGER.fine(PythonUtils.formatJString("Breaking reference cycle for %s", abstractObjectNativeWrapper.ref));
 }
-updateRefNode.execute(inliningTarget, abstractObjectNativeWrapper, PythonAbstractObjectNativeWrapper.MANAGED_REFCNT);
+updateRefNode.clearStrongRef(inliningTarget, abstractObjectNativeWrapper);
 }

 // next = GC_NEXT(gc)

graalpython/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/cext/PythonCextObjectBuiltins.java

Lines changed: 3 additions & 3 deletions
@@ -92,7 +92,7 @@
 import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.NativeToPythonNode;
 import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.PythonToNativeNode;
 import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.ToPythonWrapperNode;
-import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.UpdateRefNode;
+import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.UpdateStrongRefNode;
 import com.oracle.graal.python.builtins.objects.cext.common.GetNextVaArgNode;
 import com.oracle.graal.python.builtins.objects.cext.structs.CFields;
 import com.oracle.graal.python.builtins.objects.cext.structs.CStructAccess;
@@ -165,7 +165,7 @@ abstract static class PyTruffle_NotifyRefCount extends CApiBinaryBuiltinNode {
 @Specialization
 static Object doGeneric(PythonAbstractObjectNativeWrapper wrapper, long refCount,
 @Bind("this") Node inliningTarget,
-@Cached UpdateRefNode updateRefNode) {
+@Cached UpdateStrongRefNode updateRefNode) {
 assert CApiTransitions.readNativeRefCount(HandlePointerConverter.pointerToStub(wrapper.getNativePointer())) == refCount;
 // refcounting on an immortal object should be a NOP
 assert refCount != PythonAbstractObjectNativeWrapper.IMMORTAL_REFCNT;
@@ -180,7 +180,7 @@ abstract static class PyTruffle_BulkNotifyRefCount extends CApiBinaryBuiltinNode
 @Specialization
 static Object doGeneric(Object arrayPointer, int len,
 @Bind("this") Node inliningTarget,
-@Cached UpdateRefNode updateRefNode,
+@Cached UpdateStrongRefNode updateRefNode,
 @Cached CStructAccess.ReadPointerNode readPointerNode,
 @Cached ToPythonWrapperNode toPythonWrapperNode) {

graalpython/com.oracle.graal.python/src/com/oracle/graal/python/builtins/objects/cext/capi/CExtNodes.java

Lines changed: 4 additions & 4 deletions
@@ -106,7 +106,7 @@
 import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.NativeToPythonTransferNode;
 import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.PythonToNativeNode;
 import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.ResolveHandleNode;
-import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.UpdateRefNode;
+import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitions.UpdateStrongRefNode;
 import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitionsFactory.NativeToPythonNodeGen;
 import com.oracle.graal.python.builtins.objects.cext.capi.transitions.CApiTransitionsFactory.PythonToNativeNodeGen;
 import com.oracle.graal.python.builtins.objects.cext.capi.transitions.GetNativeWrapperNode;
@@ -1189,7 +1189,7 @@ static void doDecref(Node inliningTarget, Object pointerObj,
 @Cached(inline = false) CApiTransitions.ToPythonWrapperNode toPythonWrapperNode,
 @Cached InlinedBranchProfile isWrapperProfile,
 @Cached InlinedBranchProfile isNativeObject,
-@Cached UpdateRefNode updateRefNode,
+@Cached UpdateStrongRefNode updateRefNode,
 @Cached(inline = false) CStructAccess.ReadI64Node readRefcount,
 @Cached(inline = false) CStructAccess.WriteLongNode writeRefcount,
 @Cached(inline = false) PCallCapiFunction callDealloc) {
@@ -1295,7 +1295,7 @@ public static Object executeUncached(Object pointerObject) {
 @Specialization
 static Object resolveLongCached(Node inliningTarget, long pointer,
 @Exclusive @Cached ResolveHandleNode resolveHandleNode,
-@Exclusive @Cached UpdateRefNode updateRefNode) {
+@Exclusive @Cached UpdateStrongRefNode updateRefNode) {
 Object lookup = CApiTransitions.lookupNative(pointer);
 if (lookup != null) {
 if (lookup instanceof PythonAbstractObjectNativeWrapper objectNativeWrapper) {
@@ -1313,7 +1313,7 @@ static Object resolveLongCached(Node inliningTarget, long pointer,
 static Object resolveGeneric(Node inliningTarget, Object pointerObject,
 @CachedLibrary(limit = "3") InteropLibrary lib,
 @Exclusive @Cached ResolveHandleNode resolveHandleNode,
-@Exclusive @Cached UpdateRefNode updateRefNode) {
+@Exclusive @Cached UpdateStrongRefNode updateRefNode) {
 if (lib.isPointer(pointerObject)) {
 Object lookup;
 long pointer;
