@@ -96,9 +96,9 @@ files to get an idea of how rules are set for patches to be applied.

We always run with a GIL, because C extensions in CPython expect to do so and
are usually not written to be reentrant. The reason to always have the GIL
- enabled is that when using Python, at least Sulong/LLVM is always available in
- the same context and we cannot know if someone may be using that (or another
- polyglot language or the Java host interop) to start additional threads that
+ enabled is that when using Python, another polyglot language or the Java host
+ interop can be available in the same context, and we cannot know whether someone
+ may be using one of them to start additional threads that
could call back into Python. This could legitimately happen in C extensions when
the C extension authors use knowledge of how CPython works to do something
GIL-less in a C thread that is fine to do on CPython's data structures, but not
@@ -171,7 +171,8 @@ safepoint action mechanism can thus be used to kill threads waiting on the GIL.
### High-level

C extensions assume reference counting, but on the managed side we want to leverage
- Java tracing GC. This creates a mismatch. The approach is to combine the two.
+ Java tracing GC. This creates a mismatch. The approach is to do both, reference
+ counting and tracing GC, at the same time.

On the native side we use reference counting. The native code is responsible for doing
the counting, i.e., calling the `Py_IncRef` and `Py_DecRef` API functions. Inside those
@@ -193,12 +194,12 @@ There are two kinds of Python objects in GraalPy: managed and native.

Managed objects are allocated in the interpreter. If there is no native code involved,
we do not do anything special and let the Java GC handle them. When a managed object
- leaks to native extension:
+ is passed to native extension code:

* We wrap it in `PythonObjectNativeWrapper`. This is mostly in order to provide different
interop protocol: we do not want to expose `toNative` and `asPointer` on Python objects.

- * When NFI or Sulong call `toNative`/`asPointer` we:
+ * When NFI calls `toNative`/`asPointer` we:
* Allocate C memory that will represent the object on the native side (including the refcount field)
* Add a mapping of that memory address to the `PythonObjectNativeWrapper` object to a hash map `CApiTransitions.nativeLookup`.
* We initialize the refcount field to a constant `MANAGED_REFCNT` (larger number, because some
@@ -216,38 +217,44 @@ as long as there are some native references. We set a field `PythonObjectReferen
which will keep the `PythonObjectNativeWrapper` alive even when all other managed references die.

* When extension code is done with the object, it will call `Py_DecRef`.
- In the C implementation of `Py_DecRef` we check if a managed object with refcount== MANAGED_REFCNT+1
+ In the C implementation of `Py_DecRef` we check if a managed object with `refcount == MANAGED_REFCNT+1`
wants to decrement its refcount to MANAGED_REFCNT, which means that there are no native references
to that object anymore. In such case we clear the `PythonObjectReference.strongReference` field,
and the memory management is then again left solely to the Java tracing GC.
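
The refcount threshold described above can be illustrated with a small Python model. This is only a sketch under assumptions, not GraalPy's implementation: the value of `MANAGED_REFCNT`, the class `ReferenceModel`, and its methods are hypothetical, and the rule that the strong reference is set whenever the count rises above `MANAGED_REFCNT` is inferred from the surrounding text.

```python
# Toy model of a managed object that was handed to native code.
# Hypothetical names and values; this is not GraalPy's actual code.
MANAGED_REFCNT = 1 << 20  # assumed value, chosen only for this sketch


class ReferenceModel:
    """Stands in for the bookkeeping around PythonObjectNativeWrapper."""

    def __init__(self, wrapper):
        self._wrapper = wrapper          # held directly here for simplicity
        self.refcount = MANAGED_REFCNT   # no native references yet
        self.strong_reference = None     # cleared: only the tracing GC decides

    def inc_ref(self):
        # Native code took a reference: pin the wrapper (assumed rule).
        self.refcount += 1
        self.strong_reference = self._wrapper

    def dec_ref(self):
        # Last native reference is going away: hand the object back to the GC.
        if self.refcount == MANAGED_REFCNT + 1:
            self.strong_reference = None
        self.refcount -= 1


wrapper = object()
ref = ReferenceModel(wrapper)
ref.inc_ref()
ref.inc_ref()
ref.dec_ref()   # one native reference remains, wrapper stays pinned
ref.dec_ref()   # back at MANAGED_REFCNT, strong reference cleared
```

The model keeps a direct reference to the wrapper for simplicity; the point is only the `MANAGED_REFCNT + 1` check in `dec_ref`.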

#### Native Objects

- Native objects are backed by native memory and may never leak to managed code. If they do not
- leak to managed code, they are reference counted as usual, where `Py_DecRef` call that reaches
- `0` will deallocate the object. If a native object does leak to managed code:
+ Native objects allocated using `PyObject_GC_New` in the native code are backed by native memory
+ and might never be passed to managed code (as a return value of an extension function or as an argument
+ to some C API call). If a native object is not made available to managed code, it is just reference
+ counted as usual, where a `Py_DecRef` call that reaches `0` will deallocate the object. If a native
+ object is passed to managed code:

* We increment the refcount of the native object by `MANAGED_REFCNT`
* We create:
- * `PythonAbstractNativeObject` Java object to represent it
+ * `PythonAbstractNativeObject` Java object to mirror it on the managed side
* `NativeObjectReference`, a weak reference to the `PythonAbstractNativeObject`.
- * Save the mapping from the native object address to the `NativeObjectReference`
- object into hash map `CApiTransitions.nativeLookup` (next time this native object leaks to
- the managed code, we only fetch the existing wrapper and don't do any of this).
+ * Add mapping: native object address => `NativeObjectReference` into hash map `CApiTransitions.nativeLookup`
+ * Next time we just fetch the existing wrapper and don't do any of this
* When `NativeObjectReference` is enqueued, we decrement the refcount by `MANAGED_REFCNT`
- and if it falls to `0`, it means that there are no references to the object even from
- native code, we can destroy it. If it does not fall to `0`, we just wait for the native
- code to eventually call `Py_DecRef` that makes it fall to `0`.
+ * If the refcount falls to `0`, it means that there are no references to the object even from
+ native code, and we can destroy it. If it does not fall to `0`, we just wait for the native
+ code to eventually call `Py_DecRef` that makes it fall to `0`.
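
The native-object steps listed above can be traced with a small Python model. It is a sketch under assumptions rather than the real mechanism: `NativeObjectModel`, `to_managed`, `on_reference_enqueued`, and the value of `MANAGED_REFCNT` are hypothetical, the dictionary only stands in for `CApiTransitions.nativeLookup`, and dropping the map entry when the reference is enqueued is an assumption not stated in the text.

```python
# Toy model of a native object that becomes visible to managed code.
# Hypothetical names and values; this is not GraalPy's actual code.
MANAGED_REFCNT = 1 << 20  # assumed value, chosen only for this sketch

native_lookup = {}  # address -> managed mirror (stand-in for nativeLookup)


class NativeObjectModel:
    def __init__(self):
        self.refcount = 1    # one reference held by native code
        self.freed = False


def to_managed(address, obj):
    """The native object is passed to managed code."""
    mirror = native_lookup.get(address)
    if mirror is None:
        obj.refcount += MANAGED_REFCNT       # managed side now counts as an owner
        mirror = {"address": address}        # stand-in for PythonAbstractNativeObject
        native_lookup[address] = mirror
    return mirror                            # later calls reuse the same mirror


def on_reference_enqueued(address, obj):
    """The tracing GC found the managed mirror unreachable."""
    del native_lookup[address]               # assumption: stale entry is dropped
    obj.refcount -= MANAGED_REFCNT
    if obj.refcount == 0:
        obj.freed = True                     # no native references either


def dec_ref(obj):
    """Plain reference counting on the native side."""
    obj.refcount -= 1
    if obj.refcount == 0:
        obj.freed = True


obj = NativeObjectModel()
first = to_managed(0x1000, obj)
second = to_managed(0x1000, obj)     # same mirror, refcount bumped only once
on_reference_enqueued(0x1000, obj)   # native code still holds one reference
dec_ref(obj)                         # last reference gone, object is freed
```

If native code drops its reference first, the count stays at `MANAGED_REFCNT` and the object is freed only once the mirror's reference is enqueued.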
+
+ #### Weak References
+
+ TODO

### Cycle GC

We leverage the CPython's GC module to detect cycles for objects that participate
- in the reference counting scheme (native objects or managed objects that leaked to native).
+ in the reference counting scheme (native objects or managed objects that got passed
+ to native code).
See: https://devguide.python.org/internals/garbage-collector/index.html.

There are two issues:

- * Objects that are referenced from the managed code have refcount >= `MANAGED_REFCNT` and
+ * Objects that are referenced from the managed code have `refcount >= MANAGED_REFCNT` and
until Java GC runs we do not know if they are garbage or not.
* We cannot traverse the managed objects: since we don't do refcounting on the managed
side, we cannot traverse them and decrement refcounts to see if there is a cycle.
@@ -272,13 +279,13 @@ count them into this limit. Let us call this limit *weak to strong limit*.
After this, if the managed objects are garbage, eventually Java GC will collect them
together with the whole cycle.

- If some of the managed objects are not garbage, and they leak back to native code,
+ If some of the managed objects are not garbage, and they are passed back to native code,
the native code can then access and resurrect the whole cycle. W.r.t. the refcounts
integrity this is fine, because we did not alter the refcounts. The native references
between the objects are still factored in their refcounts. What may seem like a problem
- is that we pushed the *weak to strong limit* for some objects. Such object may leak to
- native, get `Py_IncRef`'ed making it strong reference again. Since `Py_DecRef` is
+ is that we pushed the *weak to strong limit* for some objects. Such an object may be
+ passed to native, get `Py_IncRef`'ed, making it a strong reference again. Since `Py_DecRef` is
checking the same `MANAGED_REFCNT` limit for all objects, the subsequent `Py_DecRef`
call for this object will not detect that the reference should be made weak again!
However, this is OK, it only prolongs the collection: we will make it weak again in
- the next run of the cycle GC.
+ the next run of the cycle GC on the native side.