| 1 | +## Python's global thread state |
| 2 | + |
| 3 | +In CPython, each stack frame is allocated on the heap, and there's a global |
| 4 | +thread state holding on to the chain of currently handled exceptions (e.g. if |
| 5 | +you're nested inside `except:` blocks) as well as the exception that is |
| 6 | +currently propagating (e.g. while the stack is being unwound). |
| 7 | + |
| 8 | +In PyPy, this is done via their virtualizable frames and a global reference to |
| 9 | +the current top frame. Each frame also has a "virtual reference" to its parent |
| 10 | +frame, so code can just "force" these references to make the stack reachable if |
| 11 | +necessary. |
| 12 | + |
| 13 | +Unfortunately, the elegant solution of "virtual references" doesn't work for us, |
| 14 | +mostly because we're not a tracing JIT: we want the reference to be "virtual" |
| 15 | +even when there are multiple compilation units. PyPy's solution doesn't achieve |
| 16 | +this either, but it only hurts them for nested loops when large stacks must |
| 17 | +be forced to the heap. |
| 18 | + |
| 19 | +In Graal Python, the implementation is thus a bit more involved. Here's how it |
| 20 | +works. |
| 21 | + |
| 22 | +#### The PFrame.Reference |
| 23 | + |
| 24 | +A `PFrame.Reference` is created when entering a Python function. By default it |
| 25 | +only holds on to another reference, that of the Python caller. If there are |
| 26 | +non-Python frames between the newly entered frame and the last Python frame, |
| 27 | +those are ignored - our linked list only connects Python frames. The entry point |
| 28 | +into the interpreter has a `PFrame.Reference` with no caller. |
| 29 | + |
| 30 | +###### ExecutionContext.CallContext and ExecutionContext.CalleeContext |
| 31 | + |
| 32 | +If we're only calling between Python functions, we always pass our |
| 33 | +`PFrame.Reference` as an implicit argument to any callees. On entry, they create |
| 34 | +their own `PFrame.Reference` as the next link in this backwards-connected linked |
| 35 | +list. Usually the `PFrame.Reference` doesn't hold anything else, so this is |
| 36 | +pretty cheap even in the non-inlined case. |
| 37 | + |
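The backwards-connected list described above can be illustrated with a minimal sketch. `FrameRef` and `PythonCallSketch` are hypothetical stand-ins invented for this example; they are not the real `PFrame.Reference` or the actual call protocol, and the reference is passed as an explicit parameter here only to make the linking visible.

```java
// Illustrative sketch only: FrameRef is a hypothetical stand-in for PFrame.Reference.
final class FrameRef {
    private final FrameRef callerRef; // null for the interpreter entry point

    FrameRef(FrameRef callerRef) {
        this.callerRef = callerRef;
    }

    FrameRef getCallerRef() {
        return callerRef;
    }

    // Number of Python frames reachable from this reference, itself included.
    int depth() {
        int count = 0;
        for (FrameRef ref = this; ref != null; ref = ref.callerRef) {
            count++;
        }
        return count;
    }
}

final class PythonCallSketch {
    // A "Python function": creates its own reference on entry, linked to the caller's.
    static int innerFunction(FrameRef callerRef) {
        FrameRef self = new FrameRef(callerRef);
        return self.depth();
    }

    static int outerFunction(FrameRef callerRef) {
        FrameRef self = new FrameRef(callerRef);
        return innerFunction(self); // hand our own reference to the callee
    }

    static int entryPoint() {
        FrameRef root = new FrameRef(null); // the entry point has no caller
        return outerFunction(root);
    }
}
```

Because each reference holds nothing but a pointer to its caller, creating one per call stays cheap as long as nothing forces the frame to the heap.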
| 38 | +When an event forces the frame to materialize on the heap, the reference is |
| 39 | +filled. This usually only happens when someone uses `sys._getframe` or |
| 40 | +accesses the traceback of an exception. If the stack is still live, we walk the |
| 41 | +stack, insert the "calling node", and create a "PyFrame" object that mirrors |
| 42 | +the locals in the Truffle frame. But we also need to be able to do this for |
| 43 | +frames that are no longer live, e.g. when an exception was a few frames up. To |
| 44 | +ensure this, we set a boolean flag on `PFrame.Reference` to mark it as "escaped" |
| 45 | +when it is attached to an exception (or anything else) but not yet accessed. |
| 46 | +Whenever a Python call returns and its `PFrame.Reference` was marked as such, |
| 47 | +the "PyFrame" is also filled in. This way, the stack is lazily forced to the |
| 48 | +heap as we return from functions. If we're lucky and it is never actually |
| 49 | +accessed *and* the calls are all inlined, those fill-in operations can be |
| 50 | +escape-analyzed away. |
| 51 | + |
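A rough sketch of the "escaped" flag and the fill-in on return might look like the following. All names (`EscapingRef`, `CalleeExitSketch`, the `pyFrame` map) are assumptions made up for illustration, and the locals are modelled as a plain map rather than a Truffle frame.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a frame reference that can be marked as escaped.
final class EscapingRef {
    final EscapingRef callerRef;
    boolean escaped;             // set when attached to an exception (or anything else)
    Map<String, Object> pyFrame; // heap copy of the locals, null until filled in

    EscapingRef(EscapingRef callerRef) {
        this.callerRef = callerRef;
    }

    void markAsEscaped() {
        escaped = true;
    }
}

final class CalleeExitSketch {
    // Simulates one Python call: run the body, then fill in the frame on exit if needed.
    static EscapingRef call(EscapingRef callerRef, boolean attachToException) {
        EscapingRef self = new EscapingRef(callerRef);
        Map<String, Object> locals = new HashMap<>(); // stands in for the live frame
        locals.put("x", 42);

        if (attachToException) {
            self.markAsEscaped(); // the frame is referenced, but nobody has read it yet
        }

        // On return: the live frame is about to disappear, so copy it if it escaped.
        if (self.escaped && self.pyFrame == null) {
            self.pyFrame = new HashMap<>(locals);
        }
        return self;
    }
}
```

In this sketch a call that never escapes leaves `pyFrame` as `null`, which corresponds to the cheap common case where no heap frame is created at all.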
| 52 | +To implement all this, we use the `ExecutionContext.CallContext` and |
| 53 | +`ExecutionContext.CalleeContext` classes. These also use profiling information |
| 54 | +to eagerly fill in frame information if, for example, the callees actually |
| 55 | +access the stack, so that no further stack walks need to take place. |
| 56 | + |
| 57 | +###### ExecutionContext.IndirectCallContext and ExecutionContext.IndirectCalleeContext |
| 58 | + |
| 59 | +If we're mixing Python frames with non-Python frames, or if we are making calls |
| 60 | +to methods and cannot pass the Truffle frame, we need to store the last |
| 61 | +`PFrame.Reference` on the context so that, if we ever return into a Python |
| 62 | +function, it can properly link to the last frame. However, this is potentially |
| 63 | +expensive, because it means storing a linked list of frames on the context. So |
| 64 | +instead, we do it lazily. When an "indirect" Python callee needs its |
| 65 | +caller, it initially walks the stack to find it. But it will also tell the last |
| 66 | +Python node that made a call to a "foreign" callee that it will have to store |
| 67 | +its `PFrame.Reference` globally in the future for it to be available later. |
| 68 | + |
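The lazy protocol for indirect calls can be approximated as follows. This is a toy model under stated assumptions: the `Deque` stands in for the walkable stack, `contextRef` for the slot on the context, and `callSiteMustStore` for the flag a callee sets on the calling node. None of these names come from the real `ExecutionContext` classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class IndirectCallSketch {
    static final Deque<String> stack = new ArrayDeque<>(); // simulated call stack
    static String contextRef;         // last Python frame reference, if it was stored
    static boolean callSiteMustStore; // set by a callee that had to walk the stack
    static int stackWalks;            // counts the expensive fallback

    static String pythonCaller() {
        stack.push("caller");
        String saved = contextRef;
        if (callSiteMustStore) {
            contextRef = "caller"; // only pay for the store once a callee asked for it
        }
        try {
            return foreignCode();
        } finally {
            contextRef = saved; // restore whatever was stored before the call
            stack.pop();
        }
    }

    // Non-Python code in between: it cannot pass the reference along.
    static String foreignCode() {
        return indirectPythonCallee();
    }

    static String indirectPythonCallee() {
        if (contextRef != null) {
            return contextRef; // fast path: the caller stored its reference
        }
        stackWalks++;             // slow path: walk the stack this one time
        callSiteMustStore = true; // and ask the call site to store it from now on
        return stack.peek();
    }
}
```

Calling `pythonCaller()` twice in this model walks the stack only on the first call; the second call finds the reference on the context.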
| 69 | +#### The current PException |
| 70 | + |
| 71 | +Now that we have a mechanism to lazily make available only as much frame state |
| 72 | +as needed, we use the same mechanism to also pass the currently handled |
| 73 | +exception. Unlike CPython, we do not use a stack of currently handled exceptions. |
| 74 | +Instead, we utilize the Java call stack by always passing the current exception |
| 75 | +and holding on to the last (if any) in a local variable. |
| 76 | + |
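As a small illustration of keeping the previous exception in a local variable instead of on an explicit stack, consider this sketch. The class and method names are hypothetical, and exceptions are modelled as plain strings.

```java
final class HandledExceptionSketch {
    // A "Python callee" that reports which exception it sees as currently handled.
    static String currentlyHandled(String current) {
        return current;
    }

    // Simulates entering and leaving one `except:` block.
    static String[] exceptBlock() {
        String current = null;      // nothing is being handled on entry
        String previous = current;  // the last exception lives in a Java local

        current = "ValueError";     // entering the except block
        String inside = currentlyHandled(current); // passed along to callees

        current = previous;         // leaving the block restores the old state
        String after = currentlyHandled(current);
        return new String[] {inside, after};
    }
}
```

Nested handlers would each keep their own `previous` local, so the Java call stack itself plays the role of the stack of handled exceptions.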
| 77 | +## Abstract operations on Python objects |
| 78 | + |
| 79 | +Many generic operations on Python objects in CPython are defined in the files |
| 80 | +`abstract.c` and `abstract.h`. These operations are widely used, and their |
| 81 | +interplay and intricacies are a cause of conversion, error message, and |
| 82 | +control flow bugs when not mimicked correctly. Our current approach is to |
| 83 | +provide many of these abstract operations as part of the |
| 84 | +`PythonObjectLibrary`. Usually, this means there are at least two messages for |
| 85 | +each operation - one that takes a `ThreadState` argument, and one that |
| 86 | +doesn't. The intent is to allow passing of exception state and caller |
| 87 | +information similar to how we do it with the `PFrame` argument even across |
| 88 | +library messages, which cannot take a `VirtualFrame`. |
| 89 | + |
| 90 | +All nodes that are used in message implementations must allow uncached |
| 91 | +usage. Often (e.g. in the case of the generic `CallNode`) they offer execute |
| 92 | +methods with and without frames. If a `ThreadState` was passed to the message, a |
| 93 | +frame to pass to the node can be reconstructed using |
| 94 | +`PArguments.frameForCall(threadState)`. Here's an example: |
| 95 | + |
| 96 | +```java |
| 97 | +@ExportMessage |
| 98 | +long messageWithState(ThreadState state, |
| 99 | +                @Cached CallNode callNode) { |
| 100 | +    Object callable = ... |
| 101 | + |
| 102 | +    if (state != null) { // a ThreadState was passed: rebuild a frame from it |
| 103 | +        return callNode.execute(PArguments.frameForCall(state), callable, arguments); |
| 104 | +    } else { // no state available: use the frameless execute method |
| 105 | +        return callNode.execute(callable, arguments); |
| 106 | +    } |
| 107 | +} |
| 108 | +``` |
| 109 | + |
| 110 | +*Note*: It is **always** preferable to call an `execute` method with a |
| 111 | +`VirtualFrame` when both one with and without exist! The reason is that this |
| 112 | +avoids materialization of the frame state in more cases, as described in the |
| 113 | +section on Python's global thread state above. |