Commit fe73dc7

[GR-60660] Multi context C extensions with copying
PullRequest: graalpython/3537
2 parents 52f842c + 269c596 commit fe73dc7

40 files changed, +1799 -300 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,8 @@ language runtime. The main focus is on user-observable behavior of the engine.
 * Added `GRAALPY_VERSION` and `GRAALPY_VERSION_NUM` C macros.
 * Remove `ginstall` module. It hasn't been necessary for several releases. Please, use `pip install`.
 * Remove experimental `SetupLLVMLibraryPaths` option. It was used to pre-set library path for LLVM toolchain's libc++. The path can still be set manually.
+* Added `GRAALPY_VERSION` and `GRAALPY_VERSION_NUM` C macros
+* Added experimental `python.IsolateNativeModules` option to allow loading native extensions multiple times in different contexts. See [the documentation](https://github.com/oracle/graalpython/blob/master/docs/user/Native-Extensions.md) for more information.

 ## Version 24.1.0
 * GraalPy is now considered stable for pure Python workloads. While many workloads involving native extension modules work, we continue to consider them experimental. You can use the command-line option `--python.WarnExperimentalFeatures` to enable warnings for such modules at runtime. In Java embeddings the warnings are enabled by default and you can suppress them by setting the context option 'python.WarnExperimentalFeatures' to 'false'.

docs/contributor/IMPLEMENTATION_DETAILS.md

Lines changed: 13 additions & 0 deletions
@@ -302,3 +302,16 @@ checking the same `MANAGED_REFCNT` limit for all objects, the subsequent `Py_Dec
 call for this object will not detect that the reference should be made weak again!
 However, this is OK, it only prolongs the collection: we will make it weak again in
 the next run of the cycle GC on the native side.
+
+## C extension copying
+
+On Linux, Python native extensions expect to look up Python C API functions in the global namespace and specify no explicit dependency on any libpython.
+To isolate them, we copy them with a new name, change their `SONAME`, add a `DT_NEEDED` dependency on a copy of our libpython shared object, and finally load them with `RTLD_LOCAL`.
+
+On Windows there is no global namespace, so native extensions already have a dependency on our libpython DLL.
+We copy them and just change the dependency to point to the context-local copy of libpython rather than the global one.
+
+On macOS, while two-level namespaces exist, Python extensions historically use `-undefined dynamic_lookup` where they (just like on Linux) expect to find C API functions in any loaded image.
+We have to apply a workaround similar to the one on Linux: copy to a new name, change the `LC_ID_DYLIB` to that name, and add an `LC_LOAD_DYLIB` load command to make the linker load the symbols from our libpython.
+
+Note that any code signatures are invalidated by this process.
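
The per-platform steps described in this section can be summarized as a small lookup keyed by binary format. The following is only an illustrative sketch, not GraalPy's implementation: the helper names (`detect_format`, `isolation_steps`) and the step descriptions are hypothetical, and the only facts relied on are the well-known magic numbers of ELF, PE, and Mach-O files.

```python
# Hypothetical helper: maps a shared library's magic bytes to the
# isolation steps described in the text. Not GraalPy's actual code.
ISOLATION_STEPS = {
    "elf": [
        "copy under a new name",
        "change SONAME",
        "add DT_NEEDED on the libpython copy",
        "load with RTLD_LOCAL",
    ],
    "pe": [
        "copy under a new name",
        "point the libpython import at the context-local copy",
    ],
    "macho": [
        "copy under a new name",
        "change LC_ID_DYLIB to the new name",
        "add LC_LOAD_DYLIB for the libpython copy",
    ],
}

MACHO_MAGICS = {
    b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",  # 32-bit, both byte orders
    b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe",  # 64-bit, both byte orders
    b"\xca\xfe\xba\xbe",                       # universal (fat) binary
}


def detect_format(header: bytes) -> str:
    """Classify a shared library by the first bytes of the file."""
    if header[:4] == b"\x7fELF":
        return "elf"
    if header[:4] in MACHO_MAGICS:
        return "macho"
    if header[:2] == b"MZ":
        return "pe"
    raise ValueError("unrecognized shared library format")


def isolation_steps(header: bytes) -> list:
    """Return the list of rewriting steps for the detected format."""
    return ISOLATION_STEPS[detect_format(header)]
```

Calling `isolation_steps(open(path, "rb").read(4))` on an extension module would then yield the list of rewrites the text names for that platform; the actual patching of the binary is deliberately left out of this sketch.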

docs/user/Native-Extensions.md

Lines changed: 36 additions & 6 deletions
@@ -18,9 +18,39 @@ Please do not update `pip` or use alternative tools such as `uv`.
 ## Embedding limitations

 Python native extensions run by default as native binaries, with full access to the underlying system.
-Native code is entirely unrestricted and can circumvent any security protections Truffle or the JVM may provide.
-Native data structures are not subject to the Java GC and the combination of them with Java data structures may lead to memory leaks.
-Native libraries generally cannot be loaded multiple times into the same process, and they may contain global state that cannot be safely reset.
-Thus, it is not possible to create multiple GraalPy contexts that access native modules within the same JVM.
-This includes the case when you create a context, close it, and then create another context.
-The second context will not be able to access native extensions.
+This has a few implications:
+
+1. Native code is entirely unrestricted and can circumvent any security protections Truffle or the JVM may provide.
+2. Native data structures are not subject to the Java GC and the combination of them with Java data structures may lead to increased memory pressure or memory leaks.
+3. Native libraries generally cannot be loaded multiple times into the same process, and they may contain global state that cannot be safely reset.
+
+### Full Native Access
+
+The Context API allows setting options such as `allowIO`, `allowHostAccess`, `allowThreads` and more on the created contexts.
+To use Python native extensions on GraalPy, the `allowNativeAccess` option must be set to true, but this opens the door to full native access.
+This means that while Python code may be denied access to the host file system, thread or subprocess creation, and more, the native extension is under no such restriction.
+
+### Memory Management
+
+Python C extensions, like the CPython reference implementation, use reference counting for memory management.
+This is fundamentally incompatible with JVM GCs.
+
+Java objects may end up being referenced from native data structures which the JVM cannot trace, so to avoid crashing, GraalPy keeps such Java objects strongly referenced.
+To avoid memory leaks, GraalPy implements a cycle detector that regularly traces references between Java objects and native objects that have crossed between the two worlds and cleans up strong references that are no longer needed.
+
+On the other side, reference-counted native extension objects may end up being referenced from Java objects, and in this case GraalPy bumps their reference count to make them unreclaimable.
+Any such references to native extension objects are registered with a `java.lang.ref.WeakReference` and when the JVM GC has collected the owning Java object, the reference count of the native object is reduced again.
+
+Both of these mechanisms together mean there is additional delay between objects becoming unreachable and their memory being reclaimed when compared to the CPython implementation.
+This can manifest in increased memory usage when running C extensions.
+You can tweak the Context options `python.BackgroundGCTaskInterval`, `python.BackgroundGCTaskThreshold`, and `python.BackgroundGCTaskMinimum` to mitigate this.
+They control the minimum interval between cycle detections, how much RSS memory must have increased since the last time to trigger the cycle detector, and the absolute minimum RSS under which no cycle detection should be done.
+You can also manually trigger the detector with the Python `gc.collect()` call.
+
+### Multi-Context and Native Libraries
+
+To support creating multiple GraalPy contexts that access native modules within the same JVM or Native Image, we need to isolate them from each other.
+The current strategy for this is to copy the libraries and modify them such that the dynamic library loader of the operating system will isolate them for us.
+To do this, all GraalPy contexts in the same process (not just those in the same engine!) must set the `python.IsolateNativeModules` option to `true`.
+
+For more details on this, see [our implementation details](https://github.com/oracle/graalpython/blob/master/docs/contributor/IMPLEMENTATION_DETAILS.md#c-extension-copying).
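
As a small illustration of the manual trigger mentioned in the Memory Management section, the sketch below requests a collection after a piece of work that leaves a reference cycle behind. The names `Node`, `make_cycle`, and `collect_after` are made up for this example, and it uses plain Python objects rather than a real C extension. The value returned by `gc.collect()` is implementation-specific, so the sketch does not rely on any particular number.

```python
import gc


class Node:
    """Plain Python object used to build a reference cycle."""

    def __init__(self):
        self.peer = None


def make_cycle():
    a, b = Node(), Node()
    a.peer, b.peer = b, a  # a and b keep each other alive after return


def collect_after(work):
    """Run `work`, then request a collection and return gc.collect()'s result."""
    work()
    return gc.collect()


found = collect_after(make_cycle)
```

In a long-running embedding, a call like `collect_after(...)` could be placed after batches of work that allocate many native objects, as an alternative to tuning the background GC task options.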

graalpython/com.oracle.graal.python.cext/include/cpython/object.h

Lines changed: 3 additions & 3 deletions
@@ -1,4 +1,4 @@
-/* Copyright (c) 2020, 2023, Oracle and/or its affiliates.
+/* Copyright (c) 2020, 2024, Oracle and/or its affiliates.
  * Copyright (C) 1996-2020 Python Software Foundation
  *
  * Licensed under the PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2
@@ -42,8 +42,8 @@ PyAPI_FUNC(Py_ssize_t) _Py_GetRefTotal(void);
 typedef struct _Py_Identifier {
     const char* string;
     /* XXX Truffle change: CPython migrated away from keeping the pointer directly
-     * in the struct to support subinterpreters. We don't have subinterpreters, so
-     * we keep the object pointer for now */
+     * in the struct to support subinterpreters. We do subinterpreters differently,
+     * so we keep the object pointer for now */
     PyObject *object;
 } _Py_Identifier;

graalpython/com.oracle.graal.python.cext/modules/clinic/_bz2module.c.h

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-/* Copyright (c) 2019, 2023, Oracle and/or its affiliates.
+/* Copyright (c) 2019, 2024, Oracle and/or its affiliates.
  * Copyright (C) 1996-2020 Python Software Foundation
  *
  * Licensed under the PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2

graalpython/com.oracle.graal.python.cext/modules/clinic/_testmultiphase.c.h

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-/* Copyright (c) 2022, 2023, Oracle and/or its affiliates.
+/* Copyright (c) 2022, 2024, Oracle and/or its affiliates.
  * Copyright (C) 1996-2022 Python Software Foundation
  *
  * Licensed under the PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2

graalpython/com.oracle.graal.python.cext/modules/clinic/pyexpat.c.h

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-/* Copyright (c) 2021, 2023, Oracle and/or its affiliates.
+/* Copyright (c) 2021, 2024, Oracle and/or its affiliates.
  * Copyright (C) 1996-2021 Python Software Foundation
  *
  * Licensed under the PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2

graalpython/com.oracle.graal.python.cext/src/capi.c

Lines changed: 58 additions & 0 deletions
@@ -280,6 +280,13 @@ PyObject* _Py_EllipsisObjectReference;
 PyObject* _Py_NoneStructReference;
 PyObject* _Py_NotImplementedStructReference;

+/*
+ * This holds the thread-local reference to the PythonThreadState on the
+ * managed side. If additional contexts used the C API with the same Python
+ * library, we have to either update when switching contexts, or before each
+ * downcall, or just leave this NULL at all times and incur an upcall in the
+ * getter.
+ */
 THREAD_LOCAL PyThreadState *tstate_current = NULL;

 static void initialize_globals() {
@@ -916,7 +923,34 @@ void initialize_hashes();
 // defined in 'floatobject.c'
 void _PyFloat_InitState(PyInterpreterState* state);

+/*
+ * This is used to allow Truffle to enter/leave the context on native threads
+ * that were not created by NFI/Truffle/Java and thus not previously attached
+ * to the context. See e.g. PyGILState_Ensure. This is used by some C
+ * extensions to allow calling Python APIs from natively created threads. This
+ * poses a problem if multiple contexts use the same library, since we cannot
+ * know which context should be entered. CPython has the same problem (see
+ * https://docs.python.org/3/c-api/init.html#bugs-and-caveats), in particular
+ * the following quote:
+ *
+ *     Furthermore, extensions (such as ctypes) using these APIs to allow calling
+ *     of Python code from non-Python created threads will probably be broken
+ *     when using sub-interpreters.
+ *
+ * If we try to use the same libpython for multiple contexts, we can only
+ * behave in a similar (likely broken) way as CPython: natively created threads
+ * that use the PyGILState_* APIs to allow calling into Python will attach to the
+ * first interpreter that initialized the C API (and thus set the
+ * TRUFFLE_CONTEXT pointer) only.
+ */
 Py_LOCAL_SYMBOL TruffleContext* TRUFFLE_CONTEXT;
+
+/*
+ * This is only set during VM shutdown, so on the native side can only be used
+ * to guard things that do not work during VM shutdown, not to guard things
+ * that do not work during context shutdown! (This means that it would be safe
+ * to share this global across multiple contexts.)
+ */
 Py_LOCAL_SYMBOL int32_t graalpy_finalizing;

 PyAPI_FUNC(void) initialize_graal_capi(TruffleEnv* env, void **builtin_closures, GCState *gc) {
@@ -928,6 +962,30 @@ PyAPI_FUNC(void) initialize_graal_capi(TruffleEnv* env, void **builtin_closures,

     _PyGC_InitState(gc);

+    /*
+     * Initializing all these global fields with pointers to different contexts
+     * could be ok even if all contexts share this library and its globals.
+     * Each native stub is allocated using the AllocateNode and filled by each
+     * context. The long value of the stub pointer is then given an index via
+     * nativeStubLookupReserve, and the index is stored both in the stub for
+     * fast access from native as well as in the PythonObjectReference which
+     * wraps the stub on the managed side. The table is per context. Given that
+     * C API initialisation is deterministic and we initialise only with
+     * quasi-immortal objects, those indices are never repurposed for other
+     * objects. So, while later contexts override the pointers for those global
+     * objects with pointers to their own stubs, the indices stored in those
+     * stubs are the same across all contexts, and thus mapping to
+     * context-specific objects works as intended. It'd be a bit dodgy that the
+     * ob_refcnt fields of those objects would show whacky behaviour. Maybe we
+     * could just assert that everything stored during initialisation of the
+     * GraalPy C API has IMMORTAL_REFCNT and that all stub indices actually
+     * match. The only real problem would be if the last context exits and
+     * actually frees the stub memory. We would have to VM-globally delay
+     * freeing the "latest" stubs, but that's no big deal, we would simply keep
+     * a reference to the "latest" handle stub table globally. When it changes
+     * and the associated context has already exited, we can free it right away;
+     * when a context exits and its table is the "latest", we delay freeing it.
+     */
     initialize_builtins(builtin_closures);
     PyTruffle_Log(PY_TRUFFLE_LOG_FINE, "initialize_builtins: %fs", ((double) (clock() - t)) / CLOCKS_PER_SEC);
     Py_Truffle_Options = GraalPyTruffle_Native_Options();
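
The argument in the comment above (shared C globals, per-context lookup tables, identical indices thanks to deterministic initialisation) can be made concrete with a toy model. This is a sketch under loud assumptions: `Context`, `shared_globals`, `initialize_capi`, and `resolve` are hypothetical stand-ins written for this illustration and do not mirror GraalPy's actual data structures.

```python
# Toy model, not GraalPy's data structures: all names are hypothetical.
class Context:
    def __init__(self, name):
        self.name = name
        self.table = []  # per-context table: index -> managed object

    def reserve(self, obj):
        """Append obj to this context's table and return its index."""
        self.table.append(obj)
        return len(self.table) - 1


# Stand-in for the library's C globals, shared by every context.
shared_globals = {}


def initialize_capi(ctx):
    # Deterministic order, so every context hands out the same indices.
    for symbol in ("None", "NotImplemented", "Ellipsis"):
        index = ctx.reserve(f"{symbol}@{ctx.name}")
        # Later contexts overwrite the shared stub, but the index is unchanged.
        shared_globals[symbol] = {"index": index, "owner": ctx.name}


def resolve(ctx, symbol):
    """Look up the shared stub's index in the current context's own table."""
    return ctx.table[shared_globals[symbol]["index"]]


first, second = Context("ctx1"), Context("ctx2")
initialize_capi(first)
initialize_capi(second)
```

Here the shared stub for `"None"` ends up owned by the second context, yet `resolve(first, "None")` still finds the first context's own object because the index is the same in both tables. The freeing problem the comment describes corresponds to the owner of the shared stubs going away while other contexts still read the index from them.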

graalpython/com.oracle.graal.python.shell/src/com/oracle/graal/python/shell/GraalPythonMain.java

Lines changed: 7 additions & 0 deletions
@@ -177,6 +177,7 @@ protected List<String> preprocessArguments(List<String> givenArgs, Map<String, S
 boolean posixBackendSpecified = false;
 boolean sha3BackendSpecified = false;
 boolean installSignalHandlersSpecified = false;
+boolean isolateNativeModulesSpecified = false;
 for (Iterator<String> argumentIterator = arguments.iterator(); argumentIterator.hasNext();) {
     String arg = argumentIterator.next();
     origArgs.add(arg);
@@ -276,6 +277,9 @@ protected List<String> preprocessArguments(List<String> givenArgs, Map<String, S
 if (matchesPythonOption(arg, "InstallSignalHandlers")) {
     installSignalHandlersSpecified = true;
 }
+if (matchesPythonOption(arg, "IsolateNativeModules")) {
+    isolateNativeModulesSpecified = true;
+}
 // possibly a polyglot argument
 unrecognized.add(arg);
 continue;
@@ -432,6 +436,9 @@ protected List<String> preprocessArguments(List<String> givenArgs, Map<String, S
 if (!installSignalHandlersSpecified) {
     polyglotOptions.put("python.InstallSignalHandlers", "true");
 }
+if (!isolateNativeModulesSpecified) {
+    polyglotOptions.put("python.IsolateNativeModules", "false");
+}
 // Never emit warnings that mess up the output
 unrecognized.add("--engine.WarnInterpreterOnly=false");
 return unrecognized;

graalpython/com.oracle.graal.python.test.integration/pom.xml

Lines changed: 1 addition & 1 deletion
@@ -177,7 +177,7 @@ Additionally, one can change the polyglot artifacts version with
     <groupId>org.graalvm.python</groupId>
     <artifactId>python-embedding</artifactId>
     <version>${com.oracle.graal.python.test.polyglot.version}</version>
-</dependency>
+</dependency>
 <dependency>
     <groupId>junit</groupId>
     <artifactId>junit</artifactId>
