Commit fe11052

add docs on implementation details
1 parent 27439ed commit fe11052

1 file changed: +113 -0 lines changed

doc/IMPLEMENTATION_DETAILS.md

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
## Python's global thread state

In CPython, each stack frame is allocated on the heap, and a global thread
state holds on to the chain of currently handled exceptions (e.g. when you're
nested inside `except:` blocks) as well as the exception currently in flight
(e.g. while the stack is being unwound).

In PyPy, this is done via their virtualizable frames and a global reference to
the current top frame. Each frame also has a "virtual reference" to its parent
frame, so code can just "force" these references to make the stack reachable if
necessary.

Unfortunately, the elegant solution of "virtual references" doesn't work for us,
mostly because we're not a tracing JIT: we want the reference to be "virtual"
even when there are multiple compilation units. PyPy's solution doesn't achieve
this either, but it only hurts them for nested loops when large stacks must
be forced to the heap.

In Graal Python, the implementation is thus a bit more involved. Here's how it
works.

#### The PFrame.Reference

A `PFrame.Reference` is created when entering a Python function. By default it
only holds on to another reference, that of the Python caller. If there are
non-Python frames between the newly entered frame and the last Python frame,
those are ignored - our linked list only connects Python frames. The entry point
into the interpreter has a `PFrame.Reference` with no caller.
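
The linked list described above can be pictured with a small, self-contained sketch. The class `FrameRefSketch` and its members are hypothetical stand-ins and not the actual `PFrame.Reference` API; the only things taken from the text are that each reference points to the reference of its Python caller and that the entry point has no caller.

```java
// Hypothetical stand-in for PFrame.Reference; all names are illustrative only.
final class FrameRefSketch {
    // Reference of the calling Python frame, or null at the interpreter entry point.
    final FrameRefSketch callerRef;
    // Only used to make the demo readable; not something the text describes.
    final String functionName;

    FrameRefSketch(FrameRefSketch callerRef, String functionName) {
        this.callerRef = callerRef;
        this.functionName = functionName;
    }

    // Walks the backwards-connected list and counts the Python frames on it.
    int depth() {
        int n = 0;
        for (FrameRefSketch r = this; r != null; r = r.callerRef) {
            n++;
        }
        return n;
    }
}
```

Creating an entry reference with a `null` caller and then one reference per nested call yields a chain that can be followed from the innermost frame back to the entry point, without touching any non-Python frames in between.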

###### ExecutionContext.CallContext and ExecutionContext.CalleeContext

If we're only calling between Python functions, we always pass our
`PFrame.Reference` as an implicit argument to any callees. On entry, they
create their own `PFrame.Reference` as the next link in this
backwards-connected linked list. Usually the `PFrame.Reference` doesn't hold
anything else, so this is pretty cheap even in the non-inlined case.

When an event forces the frame to materialize on the heap, the reference is
filled. This usually only happens when someone uses `sys._getframe` or
accesses the traceback of an exception. If the stack is still live, we walk the
stack, insert the "calling node", and create a "PyFrame" object that mirrors
the locals in the Truffle frame. But we also need to be able to do this for
frames that are no longer live, e.g. when an exception was raised a few frames
up. To ensure this, we set a boolean flag on `PFrame.Reference` to mark it as
"escaped" when it is attached to an exception (or anything else) but not yet
accessed. Whenever a Python call returns and its `PFrame.Reference` was marked
this way, the "PyFrame" is also filled in. This way, the stack is lazily forced
to the heap as we return from functions. If we're lucky and it is never actually
accessed *and* the calls are all inlined, those fill-in operations can be
escape-analyzed away.
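
The "escaped" flag and the fill-in on return can be modelled roughly as follows. This is a minimal sketch under stated assumptions: `LazyFrameSketch`, `Ref`, `HeapFrame` and `onReturn` are hypothetical names, the locals are represented by a plain map, and marking the caller as escaped after a fill-in is an assumption about how the forcing continues up the stack, not something the text spells out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of lazy frame materialization; not the real Graal Python classes.
final class LazyFrameSketch {
    // Stand-in for the heap "PyFrame" object: a snapshot of the frame's locals.
    static final class HeapFrame {
        final Map<String, Object> locals;

        HeapFrame(Map<String, Object> locals) {
            this.locals = new HashMap<>(locals);
        }
    }

    // Stand-in for PFrame.Reference.
    static final class Ref {
        final Ref callerRef;
        boolean escaped;     // attached to an exception (or similar) but not yet accessed
        HeapFrame heapFrame; // filled only when materialization is required

        Ref(Ref callerRef) {
            this.callerRef = callerRef;
        }
    }

    // Models the exit of a Python call: fill the reference only if it escaped.
    static void onReturn(Ref ref, Map<String, Object> liveLocals) {
        if (ref.escaped && ref.heapFrame == null) {
            ref.heapFrame = new HeapFrame(liveLocals);
            // Assumption: the caller is marked as well, so that forcing
            // continues frame by frame as the stack unwinds.
            if (ref.callerRef != null) {
                ref.callerRef.escaped = true;
            }
        }
    }
}
```

References that never escape are never filled, so in this model the cost of a normal return is a single boolean check.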

To implement all this, we use the `ExecutionContext.CallContext` and
`ExecutionContext.CalleeContext` classes. These also use profiling information to
eagerly fill in frame information if the callees actually access the stack, for
example, so that no further stack walks need to take place.

###### ExecutionContext.IndirectCallContext and ExecutionContext.IndirectCalleeContext

If we're mixing Python frames with non-Python frames, or if we are making calls
to methods and cannot pass the Truffle frame, we need to store the last
`PFrame.Reference` on the context so that, if we ever return into a Python
function, it can properly link to the last frame. However, this is potentially
expensive, because it means storing a linked list of frames on the context. So
instead, we do it only lazily. When an "indirect" Python callee needs its
caller, it initially walks the stack to find it. But it will also tell the last
Python node that made a call to a "foreign" callee that it will have to store
its `PFrame.Reference` globally in the future for it to be available later.
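
The lazy scheme for indirect calls can be sketched like this. Everything here is a hypothetical model: `IndirectCallSketch`, `Context`, `CallSite` and the `simulatedStack` deque (which stands in for a real stack walk) are illustrative names, not the actual `IndirectCallContext` API. The sketch only shows the idea from the text: the first callee that needs its caller pays for a stack walk and tells the call site to store its reference on the context from then on.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the lazy "indirect call" protocol.
final class IndirectCallSketch {
    // Stand-in for PFrame.Reference.
    static final class Ref {
        final String name;

        Ref(String name) {
            this.name = name;
        }
    }

    // Stand-in for the per-context state.
    static final class Context {
        Ref lastRef; // only set by call sites that were told to store it
        final Deque<CallSite> simulatedStack = new ArrayDeque<>(); // replaces a real stack walk
        int stackWalks; // counts how often the slow path was taken
    }

    // Stand-in for the Python node that calls into a "foreign" callee.
    static final class CallSite {
        final Ref ref;
        boolean mustStoreRef; // flipped by a callee that needed its caller

        CallSite(Ref ref) {
            this.ref = ref;
        }

        Ref callForeign(Context ctx) {
            Ref saved = ctx.lastRef;
            if (mustStoreRef) {
                ctx.lastRef = ref; // fast path for later calls
            }
            ctx.simulatedStack.push(this);
            try {
                return indirectCallee(ctx);
            } finally {
                ctx.simulatedStack.pop();
                ctx.lastRef = saved; // restore on the way out
            }
        }
    }

    // An "indirect" Python callee that needs the reference of its Python caller.
    static Ref indirectCallee(Context ctx) {
        if (ctx.lastRef != null) {
            return ctx.lastRef;
        }
        ctx.stackWalks++; // slow path: find the caller by walking the stack
        CallSite caller = ctx.simulatedStack.peek();
        caller.mustStoreRef = true; // ask the caller to store its reference next time
        return caller.ref;
    }
}
```

In this model, calling `callForeign` twice on the same call site results in exactly one stack walk; the second call finds the reference on the context.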

#### The current PException

Now that we have a mechanism to lazily make available only as much frame state
as needed, we use the same mechanism to also pass the currently handled
exception. Unlike CPython, we do not use a stack of currently handled
exceptions. Instead, we utilize the Java call stack by always passing the
current exception and holding on to the last one (if any) in a local variable.
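
To make the idea of using the Java call stack instead of an explicit exception stack concrete, here is a minimal sketch. The class `ExceptionStateSketch` and its methods are hypothetical; plain `RuntimeException`s stand in for `PException`, and the log string exists only for the demonstration.

```java
// Hypothetical model: the currently handled exception travels as an argument,
// and the previous one is kept in a Java local while a handler is active.
final class ExceptionStateSketch {
    static String describe(RuntimeException e) {
        return e == null ? "none" : e.getMessage();
    }

    static void outer(StringBuilder log) {
        log.append(describe(null)).append(';'); // nothing is being handled yet
        try {
            throw new IllegalStateException("outer");
        } catch (IllegalStateException e) {
            // While handling e, pass it to the callee as the current exception.
            inner(e, log);
        }
    }

    static void inner(RuntimeException handledByCaller, StringBuilder log) {
        RuntimeException current = handledByCaller;
        log.append(describe(current)).append(';');
        try {
            throw new IllegalArgumentException("inner");
        } catch (IllegalArgumentException e) {
            RuntimeException last = current; // the "last" exception lives in a local
            current = e;
            log.append(describe(current)).append(';');
            current = last; // leaving the handler restores the outer state
        }
        log.append(describe(current)).append(';');
    }
}
```

Calling `outer` with an empty `StringBuilder` records `none;outer;inner;outer;`: the nested handler sees its own exception, and the caller's exception becomes current again once that handler is left, without any separate stack data structure.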

## Abstract operations on Python objects

Many generic operations on Python objects in CPython are defined in the files
`abstract.c` and `abstract.h`. These operations are widely used, and their
interplay and intricacies cause conversion, error message, and control flow
bugs when not mimicked correctly. Our current approach is to provide many of
these abstract operations as part of the `PythonObjectLibrary`. Usually, this
means there are at least two messages for each operation - one that takes a
`ThreadState` argument, and one that doesn't. The intent is to allow passing of
exception state and caller information, similar to how we do it with the
`PFrame` argument, even across library messages, which cannot take a
`VirtualFrame`.

All nodes that are used in message implementations must allow uncached
usage. Often (e.g. in the case of the generic `CallNode`) they offer execute
methods with and without frames. If a `ThreadState` was passed to the message, a
frame to pass to the node can be reconstructed using
`PArguments.frameForCall(threadState)`. Here's an example:

```java
@ExportMessage
long messageWithState(ThreadState state,
                @Cached CallNode callNode) {
    Object callable = ...

    if (state != null) {
        return callNode.execute(PArguments.frameForCall(state), callable, arguments);
    } else {
        return callNode.execute(callable, arguments);
    }
}
```

*Note*: It is **always** preferable to call an `execute` method with a
`VirtualFrame` when both one with and one without exist! The reason is that this
avoids materialization of the frame state in more cases, as described in the
section on Python's global thread state above.
