Commit 5cb8901

Merge branch 'master'
2 parents 65a6606 + 2ab52b6 commit 5cb8901

120 files changed, +5668 -2668 lines changed

CHANGELOG.md

Lines changed: 3 additions & 0 deletions

```diff
@@ -7,6 +7,9 @@ language runtime. The main focus is on user-observable behavior of the engine.
 
 * Jython Compatiblity: Implement `from JavaType import *` to import all static members of a Java class
 * Jython Compatiblity: Implement importing Python code from inside JAR files by adding `path/to/jarfile.jar!path/inside/jar` to `sys.path`
+* Added support for date and time interop.
+* Added support for setting the time zone via `Context.Builder.timeZone`.
+* PEP 570 - Python Positional-Only Parameters implemented
 
 ## Version 19.3.0
 
```

doc/IMPLEMENTATION_DETAILS.md

Lines changed: 117 additions & 0 deletions

## Python's global thread state

In CPython, each stack frame is allocated on the heap, and a global thread state holds on to the chain of currently handled exceptions (e.g. if you're nested inside `except:` blocks) as well as the exception currently in flight (e.g. while the stack is being unwound).

In PyPy, this is done via their virtualizable frames and a global reference to the current top frame. Each frame also has a "virtual reference" to its parent frame, so code can "force" these references to make the stack reachable when necessary.

Unfortunately, the elegant solution of "virtual references" doesn't work for us, mostly because we're not a tracing JIT: we want the reference to stay "virtual" even across multiple compilation units. PyPy's solution doesn't achieve that either, but for them it only hurts in nested loops, when large stacks must be forced to the heap.

In Graal Python, the implementation is thus a bit more involved. Here's how it works.

#### The PFrame.Reference

A `PFrame.Reference` is created when entering a Python function. By default, it only holds on to one other reference: that of the Python caller. If there are non-Python frames between the newly entered frame and the last Python frame, those are skipped; our linked list only connects Python frames. The entry point into the interpreter has a `PFrame.Reference` with no caller.
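
The linked list described above can be sketched in plain Java as follows. This is an illustrative model only, not the actual `PFrame.Reference` class: `FrameRef`, its fields, and `pythonStack()` are hypothetical names. It shows that each entered Python function links only to the previous Python frame, and that the entry point's reference has no caller.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for PFrame.Reference: one node per entered Python function. */
final class FrameRef {
    final String functionName; // hypothetical: identifies the Python function
    final FrameRef caller;     // the previous *Python* frame; null only at the entry point

    FrameRef(String functionName, FrameRef caller) {
        this.functionName = functionName;
        this.caller = caller;
    }

    /** Follows the backwards-linked list from this frame to the interpreter entry point. */
    List<String> pythonStack() {
        List<String> names = new ArrayList<>();
        for (FrameRef r = this; r != null; r = r.caller) {
            names.add(r.functionName);
        }
        return names;
    }
}

class FrameRefDemo {
    public static void main(String[] args) {
        FrameRef entry = new FrameRef("<module>", null); // interpreter entry point: no caller
        FrameRef f = new FrameRef("f", entry);
        // Suppose f calls a Java builtin that calls back into the Python function g:
        // the builtin gets no link of its own, so g's caller is f's reference.
        FrameRef g = new FrameRef("g", f);
        System.out.println(g.pythonStack()); // prints [g, f, <module>]
    }
}
```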

###### ExecutionContext.CallContext and ExecutionContext.CalleeContext

If we're only calling from Python into Python, we pass our `PFrame.Reference` as an implicit argument to any callees. On entry, they create their own `PFrame.Reference` as the next link in this backwards-linked list. As an optimization, we use assumptions both on the calling node and on the callee root node to avoid passing the reference (in the caller) and linking it (on the callee side). This assumption is invalidated the first time the reference is actually needed. Even then, the `PFrame.Reference` often doesn't hold on to anything else, because it was only used for traversal, so this is fairly cheap even in the non-inlined case.
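
A simplified plain-Java model of the "don't pass the reference until someone needs it" optimization follows. It assumes a plain boolean flag (`RefUnusedAssumption`) as a stand-in for a Truffle `Assumption`, and it places only one such flag on the call site, whereas the text describes assumptions on both the calling node and the callee root node. `CallerRef`, `CallSite`, `Callee`, and `needsCaller` are hypothetical names.

```java
/** Stand-in for a Truffle Assumption: valid until invalidated, never re-validated. */
final class RefUnusedAssumption {
    private boolean valid = true;
    boolean isValid() { return valid; }
    void invalidate() { valid = false; }
}

/** Hypothetical frame reference that links to the caller's reference. */
final class CallerRef {
    final CallerRef caller;
    CallerRef(CallerRef caller) { this.caller = caller; }
}

/** Hypothetical call site: passes its reference only after some callee needed it. */
final class CallSite {
    final RefUnusedAssumption refUnused = new RefUnusedAssumption();

    CallerRef call(CallerRef current, Callee callee) {
        // Fast path: while the assumption holds, nothing is passed.
        CallerRef passed = refUnused.isValid() ? null : current;
        return callee.enter(passed, this);
    }
}

/** Hypothetical callee root; 'needsCaller' models e.g. a call to sys._getframe(1). */
final class Callee {
    final boolean needsCaller;
    Callee(boolean needsCaller) { this.needsCaller = needsCaller; }

    CallerRef enter(CallerRef passed, CallSite site) {
        if (needsCaller && passed == null) {
            // First real use: invalidate so that all future calls pass the reference.
            // A real implementation would recover the caller by walking the stack here;
            // this sketch simply leaves the link empty for this one call.
            site.refUnused.invalidate();
        }
        return new CallerRef(passed);
    }
}
```

After the first invalidation, every later call through the same site pays the (small) cost of passing the reference, which mirrors the one-way nature of an assumption.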

When an event forces the frame to materialize on the heap, the reference is filled in. This usually only happens when someone uses `sys._getframe` or accesses the traceback of an exception. If the stack is still live, we walk the stack, insert the "calling node", and create a "PyFrame" object that mirrors the locals in the Truffle frame. But we also need to be able to do this for frames that are no longer live, e.g. when an exception was raised a few frames further up. To ensure this, we set a boolean flag on `PFrame.Reference` to mark it as "escaped" when it is attached to an exception (or anything else) but not yet accessed. Whenever a Python call returns and its `PFrame.Reference` was marked this way, the "PyFrame" is also filled in by copying from the VirtualFrame. This way, the stack is lazily forced to the heap as we return from functions. If we're lucky and it is never actually accessed *and* the calls are all inlined, those fill-in operations can be escape-analyzed away.
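
The "escaped" flag and the fill-in on return can be modeled in plain Java as below. This is a sketch under simplifying assumptions: a `Map` stands in for both the VirtualFrame's locals and the heap "PyFrame", and `LazyFrameRef` and `PyCall` are hypothetical names, not Graal Python classes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a frame reference that may escape before it is materialized. */
final class LazyFrameRef {
    boolean escaped;             // set when attached to an exception (or anything else)
    Map<String, Object> pyFrame; // stand-in for the heap "PyFrame"; null until filled in

    boolean isMaterialized() { return pyFrame != null; }
}

final class PyCall {
    /**
     * Models one Python call: 'locals' stands in for the VirtualFrame's local slots,
     * and 'body' may mark 'ref' as escaped (e.g. by raising an exception).
     */
    static void call(LazyFrameRef ref, Map<String, Object> locals, Runnable body) {
        try {
            body.run();
        } finally {
            // On return, normal or exceptional: copy the locals only if the reference escaped.
            if (ref.escaped && !ref.isMaterialized()) {
                ref.pyFrame = new HashMap<>(locals);
            }
        }
    }
}
```

A call whose reference never escapes allocates nothing extra; only escaped references pay for the copy, and they pay it at return time rather than eagerly.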

To implement all this, we use the `ExecutionContext.CallContext` and `ExecutionContext.CalleeContext` classes. These also use profiling information, for example to eagerly fill in frame information if the callees actually access the stack, so that no further stack walks need to take place.

###### ExecutionContext.IndirectCallContext and ExecutionContext.IndirectCalleeContext

If we're mixing Python frames with non-Python frames, or if we are making calls to methods and cannot pass the Truffle frame, we need to store the last `PFrame.Reference` on the context so that, if a Python function is ever entered again from there, it can properly link to the last Python frame. However, this is potentially expensive, because it means storing a linked list of frames on the context. So instead, we do it only lazily. When an "indirect" Python callee needs its caller, it initially walks the stack to find it. But it also tells the last Python node that made a call to a "foreign" callee that, from then on, it has to store its `PFrame.Reference` globally so that it is available later.
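
The lazy publication scheme above can be approximated in plain Java. This is a hedged sketch, not the `IndirectCallContext` implementation: `ThreadContext`, `IndirectCallSite`, `IndirectCallee`, and `mustPublishRef` are hypothetical names, and the real stack walk is replaced by a `walkedRef` parameter that simply supplies the result such a walk would find.

```java
/** Hypothetical per-thread context slot holding the last published Python frame reference. */
final class ThreadContext {
    Object topFrameRef; // written only by call sites that were asked to publish
    int stackWalks;     // counts slow-path lookups, for illustration only
}

/** Hypothetical Python call site whose callee is non-Python ("foreign") code. */
final class IndirectCallSite {
    boolean mustPublishRef; // set once a re-entrant Python callee needed its caller

    void callForeign(ThreadContext ctx, Object myRef, Runnable foreignCode) {
        Object saved = ctx.topFrameRef;
        if (mustPublishRef) {
            ctx.topFrameRef = myRef; // make our reference available to re-entrant callees
        }
        try {
            foreignCode.run();
        } finally {
            ctx.topFrameRef = saved; // restore the outer value on the way out
        }
    }
}

/** Hypothetical "indirect" Python callee that was entered from foreign code. */
final class IndirectCallee {
    /** 'walkedRef' stands in for the result a real stack walk would produce. */
    Object findCaller(ThreadContext ctx, IndirectCallSite lastPythonSite, Object walkedRef) {
        if (ctx.topFrameRef != null) {
            return ctx.topFrameRef; // fast path: the call site published its reference
        }
        ctx.stackWalks++;                     // slow path: walk the stack once...
        lastPythonSite.mustPublishRef = true; // ...and ask the call site to publish from now on
        return walkedRef;
    }
}
```

The first re-entry pays for one stack walk; every later re-entry through the same call site finds the reference on the context.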

#### The current PException

Now that we have a mechanism to lazily make available only as much frame state as needed, we use the same mechanism to also pass the currently handled exception. Unlike CPython, we do not use a stack of currently handled exceptions; instead, we utilize the Java call stack by always passing the current exception and holding on to the previously handled one (if any) in a local variable.
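
A minimal plain-Java sketch of how a Java local can replace an explicit exception stack is shown below. It assumes an `ExcState` object, passed along with each call, as a stand-in for passing the current exception as an argument; `ExcState` and `ExceptBlock` are hypothetical names.

```java
/** Hypothetical per-call state carrying the currently handled exception. */
final class ExcState {
    Object currentException; // what sys.exc_info() would report; null if none
}

final class ExceptBlock {
    /** Runs 'handler' as an except: block for 'caught', then restores the outer exception. */
    static void handle(ExcState state, Object caught, Runnable handler) {
        Object previous = state.currentException; // the outer exception lives in a Java local
        state.currentException = caught;
        try {
            handler.run();
        } finally {
            state.currentException = previous;    // restored when the except: block exits
        }
    }
}
```

Nested `except:` blocks then nest Java activations, and each activation's `previous` local plays the role of one entry in CPython's exception stack.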

## Abstract operations on Python objects

Many generic operations on Python objects in CPython are defined in the files `abstract.c` and `abstract.h`. These operations are widely used, and their interplay and intricacies are a common source of conversion, error-message, and control-flow bugs when they are not mimicked correctly. Our current approach is to provide many of these abstract operations as part of the `PythonObjectLibrary`. Usually, this means there are at least two messages for each operation: one that takes a `ThreadState` argument, and one that doesn't. The intent is to allow passing exception state and caller information, similar to how we do it with the `PFrame` argument, even across library messages, which cannot take a VirtualFrame.
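
The two-message convention can be sketched in plain Java. `LengthOp`, `State`, `lengthWithState`, and `length` are hypothetical names chosen for illustration, not the actual `PythonObjectLibrary` messages, and `State` is only a stand-in for `ThreadState`. The stateless variant delegates with a `null` state, which is why a message implementation has to handle the `state != null` and `state == null` cases separately.

```java
/** Hypothetical operation following the two-message convention. */
abstract class LengthOp {
    /** Hypothetical stand-in for ThreadState (e.g. caller info, current exception). */
    static final class State {
        final String callerName;
        State(String callerName) { this.callerName = callerName; }
    }

    /** Variant that may receive caller state; state == null means "no state available". */
    abstract int lengthWithState(Object receiver, State state);

    /** Stateless variant: simply delegates with a null state. */
    final int length(Object receiver) {
        return lengthWithState(receiver, null);
    }
}

final class StringLength extends LengthOp {
    String lastCaller; // records what the operation saw, for illustration

    @Override
    int lengthWithState(Object receiver, State state) {
        lastCaller = state == null ? "<unknown>" : state.callerName;
        return ((String) receiver).length();
    }
}
```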

All nodes that are used in message implementations must allow uncached usage. Often (e.g. in the case of the generic `CallNode`) they offer execute methods both with and without frames. If a `ThreadState` was passed to the message, a frame to pass to the node can be reconstructed using `PArguments.frameForCall(threadState)`. Here's an example:

```java
@ExportMessage
long messageWithState(ThreadState state,
                @Cached CallNode callNode) {
    Object callable = ...

    if (state != null) {
        return callNode.execute(PArguments.frameForCall(state), callable, arguments);
    } else {
        return callNode.execute(callable, arguments);
    }
}
```

*Note*: When an `execute` method exists both with and without a `VirtualFrame` parameter, it is **always** preferable to call the one that takes the frame! The reason is that this avoids materialization of the frame state in more cases, as described in the section on Python's global thread state above.

graalpython/com.oracle.graal.python.cext/src/unicodeobject.c

Lines changed: 169 additions & 3 deletions

```diff
@@ -109,6 +109,109 @@ static Py_ssize_t unicode_aswidechar(PyObject *unicode, wchar_t *w, Py_ssize_t s
     }
 }
 
+#define _PyUnicode_UTF8(op) \
+    (((PyCompactUnicodeObject*)(op))->utf8)
+#define _PyUnicode_UTF8_LENGTH(op) \
+    (((PyCompactUnicodeObject*)(op))->utf8_length)
+#define _PyUnicode_WSTR(op) \
+    (((PyASCIIObject*)(op))->wstr)
+#define _PyUnicode_WSTR_LENGTH(op) \
+    (((PyCompactUnicodeObject*)(op))->wstr_length)
+#define _PyUnicode_LENGTH(op) \
+    (((PyASCIIObject *)(op))->length)
+#define _PyUnicode_STATE(op) \
+    (((PyASCIIObject *)(op))->state)
+#define _PyUnicode_DATA_ANY(op) \
+    (((PyUnicodeObject*)(op))->data.any)
+
+POLYGLOT_DECLARE_TYPE(PyUnicodeObject);
+
+PyUnicodeObject* unicode_subtype_new(PyTypeObject *type, PyObject *unicode) {
+    PyObject *self;
+    Py_ssize_t length, char_size;
+    int share_wstr, share_utf8;
+    unsigned int kind;
+    void *data;
+
+    if (unicode == NULL)
+        return NULL;
+    assert(_PyUnicode_CHECK(unicode));
+    if (PyUnicode_READY(unicode) == -1) {
+        Py_DECREF(unicode);
+        return NULL;
+    }
+
+    self = type->tp_alloc(type, 0);
+    if (self == NULL) {
+        Py_DECREF(unicode);
+        return NULL;
+    }
+    kind = PyUnicode_KIND(unicode);
+    length = PyUnicode_GET_LENGTH(unicode);
+
+    _PyUnicode_LENGTH(self) = length;
+    _PyUnicode_STATE(self).interned = 0;
+    _PyUnicode_STATE(self).kind = kind;
+    _PyUnicode_STATE(self).compact = 0;
+    _PyUnicode_STATE(self).ascii = _PyUnicode_STATE(unicode).ascii;
+    _PyUnicode_STATE(self).ready = 1;
+    _PyUnicode_WSTR(self) = NULL;
+    _PyUnicode_UTF8_LENGTH(self) = 0;
+    _PyUnicode_UTF8(self) = NULL;
+    _PyUnicode_WSTR_LENGTH(self) = 0;
+    _PyUnicode_DATA_ANY(self) = NULL;
+
+    share_utf8 = 0;
+    share_wstr = 0;
+    if (kind == PyUnicode_1BYTE_KIND) {
+        char_size = 1;
+        if (PyUnicode_MAX_CHAR_VALUE(unicode) < 128)
+            share_utf8 = 1;
+    }
+    else if (kind == PyUnicode_2BYTE_KIND) {
+        char_size = 2;
+        if (sizeof(wchar_t) == 2)
+            share_wstr = 1;
+    }
+    else {
+        assert(kind == PyUnicode_4BYTE_KIND);
+        char_size = 4;
+        if (sizeof(wchar_t) == 4)
+            share_wstr = 1;
+    }
+
+    /* Ensure we won't overflow the length. */
+    if (length > (PY_SSIZE_T_MAX / char_size - 1)) {
+        PyErr_NoMemory();
+        // Py_DECREF(unicode);
+        // Py_DECREF(self);
+        return NULL;
+    }
+    data = malloc((length + 1) * char_size);
+    if (data == NULL) {
+        PyErr_NoMemory();
+        // Py_DECREF(unicode);
+        // Py_DECREF(self);
+        return NULL;
+    }
+
+    _PyUnicode_DATA_ANY(self) = data;
+    if (share_utf8) {
+        _PyUnicode_UTF8_LENGTH(self) = length;
+        _PyUnicode_UTF8(self) = data;
+    }
+    if (share_wstr) {
+        _PyUnicode_WSTR_LENGTH(self) = length;
+        _PyUnicode_WSTR(self) = (wchar_t *)data;
+    }
+
+    memcpy(data, PyUnicode_DATA(unicode),
+              kind * (length + 1));
+    assert(_PyUnicode_CheckConsistency(self, 1));
+    Py_DECREF(unicode);
+    return (PyUnicodeObject*) polyglot_from_PyUnicodeObject((PyUnicodeObject*)self);
+}
+
 PyObject* PyUnicode_FromString(const char* o) {
     return to_sulong(polyglot_from_string(o, SRC_CS));
 }
```

```diff
@@ -245,9 +348,8 @@ PyObject* PyUnicode_FromObject(PyObject* o) {
     return UPCALL_CEXT_O(_jls_PyUnicode_FromObject, native_to_java(o));
 }
 
-UPCALL_ID(PyUnicode_GetLength);
 Py_ssize_t PyUnicode_GetLength(PyObject *unicode) {
-    return UPCALL_CEXT_L(_jls_PyUnicode_GetLength, native_to_java(unicode));
+    return PyUnicode_GET_LENGTH(unicode);
 }
 
 UPCALL_ID(PyUnicode_Concat);
```

```diff
@@ -305,7 +407,7 @@ PyObject * PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *err
     PyObject *result;
     void *jerrors = errors != NULL ? polyglot_from_string(errors, SRC_CS) : NULL;
     int bo = byteorder != NULL ? *byteorder : 0;
-    return polyglot_invoke(PY_TRUFFLE_CEXT, "PyTruffle_Unicode_DecodeUTF32", s, size, native_to_java(jerrors), bo, NULL);
+    return polyglot_invoke(PY_TRUFFLE_CEXT, "PyTruffle_Unicode_DecodeUTF32", polyglot_from_i8_array(s, size), size, native_to_java(jerrors), bo, NULL);
 }
 
 Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *w, Py_ssize_t size) {
```

```diff
@@ -525,3 +627,67 @@ UPCALL_ID(PyUnicode_Replace);
 PyObject * PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount) {
     return UPCALL_CEXT_O(_jls_PyUnicode_Replace, native_to_java(str), native_to_java(substr), native_to_java(replstr), maxcount);
 }
+
+/* Generic helper macro to convert characters of different types.
+   from_type and to_type have to be valid type names, begin and end
+   are pointers to the source characters which should be of type
+   "from_type *". to is a pointer of type "to_type *" and points to the
+   buffer where the result characters are written to. */
+#define _PyUnicode_CONVERT_BYTES(from_type, to_type, begin, end, to) \
+    do { \
+        to_type *_to = (to_type *)(to); \
+        const from_type *_iter = (from_type *)(begin); \
+        const from_type *_end = (from_type *)(end); \
+        Py_ssize_t n = (_end) - (_iter); \
+        const from_type *_unrolled_end = \
+            _iter + _Py_SIZE_ROUND_DOWN(n, 4); \
+        while (_iter < (_unrolled_end)) { \
+            _to[0] = (to_type) _iter[0]; \
+            _to[1] = (to_type) _iter[1]; \
+            _to[2] = (to_type) _iter[2]; \
+            _to[3] = (to_type) _iter[3]; \
+            _iter += 4; _to += 4; \
+        } \
+        while (_iter < (_end)) \
+            *_to++ = (to_type) *_iter++; \
+    } while (0)
+
+
+POLYGLOT_DECLARE_TYPE(Py_UCS4);
+
+/* used from Java only to decode a native unicode object */
+void* native_unicode_as_string(PyObject *string) {
+    Py_UCS4 *target = NULL;
+    int kind = 0;
+    void *data = NULL;
+    void *result = NULL;
+    Py_ssize_t len;
+    if (PyUnicode_READY(string) == -1) {
+        PyErr_Format(PyExc_TypeError, "provided unicode object is not ready");
+        return NULL;
+    }
+    kind = PyUnicode_KIND(string);
+    data = PyUnicode_DATA(string);
+    len = PyUnicode_GET_LENGTH(string);
+    if (kind == PyUnicode_1BYTE_KIND) {
+        Py_UCS1 *start = (Py_UCS1 *) data;
+        if (PyUnicode_IS_COMPACT_ASCII(string)) {
+            return polyglot_from_string_n((const char *)data, sizeof(Py_UCS1) * len, "ascii");
+        }
+        return polyglot_from_string_n((const char *)data, sizeof(Py_UCS1) * len, "latin1");
+    }
+    else if (kind == PyUnicode_2BYTE_KIND) {
+        Py_UCS2 *start = (Py_UCS2 *) data;
+        target = PyMem_New(Py_UCS4, len);
+        if (!target) {
+            PyErr_NoMemory();
+            return NULL;
+        }
+        _PyUnicode_CONVERT_BYTES(Py_UCS2, Py_UCS4, start, start + len, target);
+        result = polyglot_from_string_n((const char *)target, sizeof(Py_UCS4) * len, "UTF-32");
+        free(target);
+        return result;
+    }
+    assert(kind == PyUnicode_4BYTE_KIND);
+    return polyglot_from_string_n((const char *)data, sizeof(Py_UCS4) * len, "UTF-32");
+}
```

```diff
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
+ * Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.
  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
  *
  * The Universal Permissive License (UPL), Version 1.0
@@ -38,46 +38,15 @@
  * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  * SOFTWARE.
  */
-package com.oracle.graal.python.nodes.string;
 
-import com.oracle.graal.python.builtins.objects.str.LazyString;
-import com.oracle.graal.python.builtins.objects.str.PString;
-import com.oracle.truffle.api.dsl.Cached;
-import com.oracle.truffle.api.dsl.GenerateUncached;
-import com.oracle.truffle.api.dsl.Specialization;
-import com.oracle.truffle.api.nodes.Node;
-import com.oracle.truffle.api.profiles.BranchProfile;
-import com.oracle.truffle.api.profiles.ValueProfile;
+package com.oracle.graal.python.test.basic;
 
-@GenerateUncached
-public abstract class StringLenNode extends Node {
-    public abstract int execute(Object self);
+import static com.oracle.graal.python.test.PythonTests.assertPrints;
+import org.junit.Test;
 
-    @Specialization
-    public int len(String self) {
-        return self.length();
-    }
-
-    @Specialization
-    public int len(PString self,
-                    @Cached("createClassProfile()") ValueProfile classProfile,
-                    @Cached("create()") BranchProfile uncommonStringTypeProfile) {
-        Object profiled = classProfile.profile(self.getCharSequence());
-        if (profiled instanceof String) {
-            return ((String) profiled).length();
-        } else if (profiled instanceof LazyString) {
-            return ((LazyString) profiled).length();
-        } else {
-            uncommonStringTypeProfile.enter();
-            return ((CharSequence) profiled).length();
-        }
-    }
-
-    public static StringLenNode create() {
-        return StringLenNodeGen.create();
-    }
-
-    public static StringLenNode getUncached() {
-        return StringLenNodeGen.getUncached();
+public class ComplexTexts {
+    @Test
+    public void negativeZero() {
+        assertPrints("(1-0j)\n", "print(complex(1,-0.0))");
     }
 }
```
