Commit f7a2c06

[GR-40566] Bytecode interpreter as default
PullRequest: graalpython/2409
2 parents 09e5d43 + d3eddc8 commit f7a2c06

22 files changed: +54 -310 lines changed

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,9 @@ language runtime. The main focus is on user-observable behavior of the engine.
 
 ## Version 22.3.0
 * Rename GraalPython to GraalPy. This change also updates the launchers we ship to include symlinks from `python` and `python3` to `graalpy`, for better integration with other tools.
+* New interpreter backend based on interpreting bytecode. This change should bring better startup performance and memory footprint while retaining good JIT-compiled performance. There is no support for GraalVM instrumentation tools on the bytecode backend yet, so using one of the instrumentation options (e.g. `--inspect`) falls back on the AST backend.
+* New parser generated from CPython's new PEG grammar definition. It brings better compatibility and enables us to implement the `ast` module.
+* Added support for tracing API (`sys.settrace`) which makes `pdb` and related tools work on GraalPy.
 * Updated our pip support to automatically choose the best version for known packages. You can use `pip install pandas`, and pip will select the versions of pandas and numpy that we test in the GraalPy CI.
 
 ## Version 22.2.0
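
The `sys.settrace` changelog entry above refers to the standard CPython tracing hook. As a minimal sketch of what that API does (the helper name `collect_trace_events` and the sample function `add` are illustrative only and not part of this commit), the following records the events a tracer receives for one function call. It assumes the runtime follows CPython's documented `sys.settrace` semantics.

```python
import sys


def collect_trace_events(func, *args):
    """Run func under sys.settrace and return (event, line offset) pairs."""
    events = []
    first_line = func.__code__.co_firstlineno

    def tracer(frame, event, arg):
        # Record only events that belong to the function under test.
        if frame.f_code is func.__code__:
            events.append((event, frame.f_lineno - first_line))
        # Returning the tracer keeps line-level tracing active in this frame.
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        # Always restore whatever tracer was installed before.
        sys.settrace(previous)
    return events


def add(x, y):
    total = x + y
    return total


events = collect_trace_events(add, 1, 2)
```

Debuggers such as `pdb` are built on this same hook, which is why the changelog ties the two together.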

ci.jsonnet

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-{ "overlay": "31221c98c6c5240fb2cb8f22be51dea1b325f108" }
+{ "overlay": "55b5a0614a8d46864bb0197b92d2e20033e58ed7" }

docs/contributor/IMPLEMENTATION_DETAILS.md

Lines changed: 0 additions & 65 deletions
@@ -1,70 +1,5 @@
 # Implementation Details
 
-## Abstract Operations on Python Objects
-
-Many generic operations on Python objects in CPython are defined in the header
-files `object.h` and `abstract.h`. These operations are widely used and their
-interplay and intricacies are the cause for the conversion, error message, and
-control flow bugs when not mimicked correctly. Our current approach is to
-provide many of these abstract operations as part of the `PythonObjectLibrary`.
-
-### Common operations in the PythonObjectLibrary
-
-The code has evolved over time, so not all built-in nodes are prime examples of
-messages that should be used from the PythonObjectLibrary. We are refactoring
-this as we go, but here are a few examples for things you can (or should soon be
-able to) use the PythonObjectLibrary for:
-
-- casting and coercion to `java.lang.String`, array-sized Java `int`, Python
-  index, fileno, `double`, filesystem path, iterator, and more
-- reading the class of an object
-- accessing the `__dict__` attribute of an object
-- hashing objects and testing for equality
-- testing for truthy-ness
-- getting the length
-- testing for abstract types such as `mapping`, `sequence`, `callable`
-- invoking methods or executing callables
-- access objects through the buffer protocol
-
-### PythonObjectLibrary functions with and without state
-
-Usually, there are at least two messages for each operation - one that takes a
-`ThreadState` argument, and one that doesn't. The intent is to allow passing of
-exception state and caller information similar to how we do it with the `PFrame`
-argument even across library messages, which cannot take a VirtualFrame.
-
-All nodes that are used in message implementations must allow uncached
-usage. Often (e.g. in the case of the generic `CallNode`) they offer execute
-methods with and without frames. If a `ThreadState` was passed to the message, a
-frame to pass to the node can be reconstructed using
-`PArguments.frameForCall(threadState)`. Here's an example:
-
-```java
-@ExportMessage
-long messageWithState(ThreadState state,
-                @Cached CallNode callNode) {
-    Object callable = ...
-
-    if (state != null) {
-        return callNode.execute(PArguments.frameForCall(state), callable, arguments);
-    } else {
-        return callNode.execute(callable, arguments);
-    }
-}
-```
-
-*Note*: It is **always** preferable to call an `execute` method with a
-`VirtualFrame` when both one with and without exist! The reason is that this
-avoids materialization of the frame state in more cases, as described on the
-section on Python's global thread state above.
-
-### Other libraries in the codebase
-
-Accessing hashing storages (the storage for `dict`, `set`, and `frozenset`)
-should be done via the `HashingStorageLibrary`. We are in the process of
-creating a `SequenceStorageLibrary` for sequence types (`tuple`, `list`) to
-replace the `SequenceStorageNodes` collection of classes.
-
 ## Python Global Thread State
 
 In CPython, each stack frame is allocated on the heap, and there's a global

docs/contributor/MISSING.md

Lines changed: 1 addition & 7 deletions
@@ -36,7 +36,6 @@ This is just a snapshot as of 2021-07-29.
 
 #### These we should re-implement
 * **_codecs_cn, _codecs_hk, _codecs_iso2022, _codecs_jp, _codecs_kr, _codecs_tw, _multibytecodec**: We can just use our own codecs
-* **_ctypes, _ctypes_test**: Work in progress
 * **_string**: Empty right now, but its only two methods that we can re-implement
 * **_tracemalloc**: Memory allocation tracing, we should substitute with the Truffle instrument.
 * **_uuid**: Can be implemented ourselves, is just 1 function
@@ -47,25 +46,20 @@ This is just a snapshot as of 2021-07-29.
 * **parser**: We need to implement this for our parser
 
 ### Incompleteness on our part:
-* **_ast**: Used in various places, including the help system. Would be nice to support, ours is an empty shell
-* **_contextvars**: Very incomplete
-* **_multiprocessing**: Work in progress
+* **_contextvars**: Work in progress
 * **_signal**: Work in progress
 * **mmap**: We use this as a mixture from the C module, Python, and Java code. Needs major optimizations.
 * **resource**: This is about resources, there should be Truffle APIs for this (there are issues open)
 * **unicodedata**: A bit incomplete, but not difficult. Maybe should use a Java ICU library
 
 ### Basically complete or easy to make so
-* **_collections**: We've mostly implemented this in Python (a lot is taken from PyPy), but should intrinsify the module in Java for better performance
 * **_md5**: We use the Python impl from PyPy, but should intrinsify as Java code for performance
 * **_random**
 * **_sha1**: We use the Python impl from PyPy, but should intrinsify as Java code for performance
 * **_sha256**: We use the Python impl from PyPy, but should intrinsify as Java code for performance
 * **_sha512**: We use the Python impl from PyPy, but should intrinsify as Java code for performance
 * **binascii**: Just missing a few methods
-* **codecs**
 * **functools**: Missing a few functions, we mostly implemented it in Python, but should intrinsify the module in Java for better performance
 * **itertools**: We mostly just implement all this in Python (a lot is taken from PyPy), but should intrinsify the module in Java for better performance
 * **locale**: Partially Truffle APIs, should probably use more to play nice for embedders
 * **readline**: We re-implemented this in terms of JLine used in our launcher
-* **zipimport**: We have reimplemented this, but Python 3.8 is moving to a pure-Python impl that we can use

docs/user/ParserDetails.md

Lines changed: 2 additions & 52 deletions
@@ -8,30 +8,6 @@ permalink: /reference-manual/python/ParserDetails/
 
 This guide elaborates on how Python files are parsed on the GraalVM Python runtime.
 
-## Parser Performance
-
-#### Loading code from serialized `.pyc` files is faster than parsing the `.py` file using ANTLR.
-
-Creating the abstract syntax tree (AST) for a Python source has two phases.
-The first one creates a simple syntax tree (SST) and a scope tree.
-The second phase transforms the SST to the [Truffle Language Implementation framework](https://github.com/oracle/graal/blob/master/truffle/docs/README.md) tree.
-
-For the transformation, the scope tree it needed.
-The scope tree contains scope locations for variable and function definitions, and information about scopes.
-The simple syntax tree contains nodes mirroring the source.
-Comparing the SST and the Language Implementation framework tree, the SST is much smaller.
-It contains just the nodes representing the source in a simple way.
-One SST node is usually translated to many the Language Implementation framework nodes.
-
-The simple syntax tree can be created in two ways: with ANTLR parsing, or deserialization from an appropriate `*.pyc` file.
-If there is no appropriate `.pyc` file for a source, then the source is parsed with ANTLR.
-If the Python standard import logic finds an appropriate `.pyc` file, it will just trigger deserialization of the SST and scope tree from it.
-
-The deserialization is much faster than source parsing with ANTLR and needs only roughly 30% of the time that ANTLR needs.
-Of course, the first import of a new file is a little bit slower -- besides parsing with ANTLR, the Python standard library import logic serializes the resulting code object to a `.pyc` file, which in our case means
-the SST and scope tree are serialized such a file.
-
-
 ## Creating and Managing pyc Files
 
 #### `.pyc` files are created automatically by the GraalVM Python runtime when no or an invalid `.pyc` file is found matching the desired `.py` file.
@@ -48,7 +24,7 @@ The hashcode is generated only based on the Python source by calling `source.has
 The `.pyc` files are also regenerated if a magic number in the Python parser is changed.
 The magic number is hard-coded in the Python source and can not be changed by the user (unless of course that user has access to the bytecode of Python).
 
-The developers of GraalVM's Python runtime change the magic number when the format of SST or scope tree binary data is altered.
+The developers of GraalVM's Python runtime change the magic number when the bytecode format changes.
 This is an implementation detail, so the magic number does not have to correspond to the version of GraalVM's Python runtime (just like in CPython).
 The magic number of pyc is a function of the concrete Python runtime Java code that is running.
 
@@ -76,29 +52,7 @@ top_folder
 By default the `__pycache__` directory is created on the same directory level as a source code file and in this directory all `.pyc` files from the same directory are stored.
 This folder may store `.pyc` files created with different versions of Python (including, e.g., CPython), so the user may see files ending in `*.cpython3-6.pyc` for example.
 
-The current implementation also includes a copy of the original source text in the `.pyc` file.
-This is a minor performance optimization so you can create a `Source` object with the path to the original source file, but you do not need to read the original `*.py` file, which speeds up the process obtaining the Language Implementation framework tree (just one file is read).
-The structure of a `.graalpy.pyc` file is this:
-```python
-MAGIC_NUMBER
-source text
-binary data - scope tree
-binary data - simple syntax tree
-```
-
-Note that the `.pyc` files are not an effective means to hide Python library source code from guest code, since the original source can still be recovered.
-Even if the source were omitted, the syntax tree contains enough information to decompile into source code easily.
-
-The serialized SST and scope tree are stored in a Python `code` object as well, as the content of the attribute `co_code` (which contains bytecode on CPython). For example:
-```python
->>> def add(x, y):
-... return x+y
-...
->>> add.__code__.co_code
-b'\x01\x00\x00\x02[]K\xbf\xd1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 ...'
-```
-
-#### `.pyc` files are largely managed automatically by the runtime in a manner compatible to CPython. Like on CPython there are options to specify their location, and if they should be written at all, and both of these options can be changed by guest code.
+`.pyc` files are largely managed automatically by the runtime in a manner compatible to CPython. Like on CPython there are options to specify their location, and if they should be written at all, and both of these options can be changed by guest code.
 
 The creation of `*.pyc` files can be controlled in the same ways as on CPython
 (c.f. https://docs.python.org/3/using/cmdline.html):
@@ -134,10 +88,6 @@ files must be removed by the embedder as required.
 
 ## Security Considerations
 
-The serialization of SST and scope tree is hand-written and during deserialization, it is not possible to load classes other than SST Nodes.
-Java serialization or other frameworks are not used to serialize Java objects.
-The main reason is performance, but this has the effect that no class loading can be forced by a maliciously crafted `.pyc` file.
-
 All file operations (obtaining the data, timestamps, and writing `pyc` files)
 are done through the [FileSystem API](https://www.graalvm.org/sdk/javadoc/org/graalvm/polyglot/io/FileSystem.html). Embedders can modify all of these operations by means of custom (e.g., read-only) `FileSystem` implementations.
 The embedder can also effectively disable the creation of `.pyc` files by disabling I/O permissions for GraalVM's Python runtime.
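
The ParserDetails.md text in this diff says that whether `.pyc` files are written can be changed by guest code, as on CPython. On CPython the switch is `sys.dont_write_bytecode` (the runtime counterpart of the `-B` flag), and the location is controlled by `sys.pycache_prefix`. The sketch below assumes GraalPy exposes the same attribute, which the diff implies but does not spell out. The helper `import_without_pyc` and the module name `pyc_demo_module` are hypothetical.

```python
import importlib
import os
import sys
import tempfile


def import_without_pyc(module_name, source):
    """Import a throwaway module with bytecode writing disabled.

    Returns the sorted directory listing after the import, so the caller
    can check that no __pycache__ directory was created.
    """
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, module_name + ".py"), "w") as f:
            f.write(source)
        old_flag = sys.dont_write_bytecode
        sys.dont_write_bytecode = True  # runtime equivalent of the -B flag
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            importlib.import_module(module_name)
            return sorted(os.listdir(tmp))
        finally:
            # Undo every global change made for the demonstration.
            sys.path.remove(tmp)
            sys.modules.pop(module_name, None)
            sys.dont_write_bytecode = old_flag


entries = import_without_pyc("pyc_demo_module", "VALUE = 42\n")
```

With the flag left at its default of `False`, the same import would normally leave a `__pycache__` directory next to the source file.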

docs/user/Tooling.md

Lines changed: 8 additions & 15 deletions
@@ -5,25 +5,18 @@ link_title: Tooling Support for Python
 permalink: /reference-manual/python/Tooling/
 ---
 # Tooling Support for Python
-
-GraalVM's Python runtime is incomplete and cannot launch the standard Python debugger `pdb`.
-However, it can run the tools that GraalVM provides.
-The `graalpy --help:tools` command will give you more information about tools currently supported on Python.
+GraalVM Python runtime can run many standard Python tools as well as tools from the GraalVM ecosystem.
+The `graalpy --help:tools` command will give you more information about GraalVM tools currently supported on Python.
 
 ## Debugger
+The built-in `breakpoint()` function will use `pdb` by default.
 
-To enable debugging, pass the `--inspect` option to the `graalpy` launcher.
-For example:
-```shell
-graalpy --inspect -c "breakpoint(); import os; os.exit()"
-Debugger listening on port 9229.
-To start debugging, open the following URL in Chrome:
-chrome-devtools://devtools/bundled/js_app.html?ws=127.0.1.1:9229/76fcb6dd-35267eb09c3
-```
+### PDB
+The standard python debugger `pdb` is supported on GraalVM. Refer to the offical [PDB documentation](https://docs.python.org/3/library/pdb.html) for usage.
 
-The standard Python built-in `breakpoint()` will work using the [GraalVM's Chrome Inspector](https://github.com/oracle/graal/blob/master/docs/tools/chrome-debugger.md) implementation.
-You can inspect variables, set watch expressions, interactively evaluate code snippets, etc.
-However, this only works if you pass `--inspect` or some other inspect option. Otherwise, `pdb` is triggered as on CPython (and does not currently work).
+### Chrome Inspector
+To enable [GraalVM's Chrome Inspector](https://github.com/oracle/graal/blob/master/docs/tools/chrome-debugger.md) debugger, pass the `--inspect` option to the `graalpy` launcher.
+The built-in `breakpoint()` function will work using the Chrome Inspector implementation when `--inspect` is passed.
 
 ## Code Coverage
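
The Debugger section of Tooling.md in the hunk above says that `breakpoint()` uses `pdb` by default and the Chrome Inspector when `--inspect` is passed. On CPython this switch goes through `sys.breakpointhook`, which `breakpoint()` calls and which defaults to `pdb.set_trace`. How GraalPy wires in the inspector is not shown in this diff, so the sketch below demonstrates only the CPython-level hook. The helper name `call_breakpoint_with_hook` is illustrative.

```python
import sys


def call_breakpoint_with_hook(hook):
    """Temporarily install `hook` as sys.breakpointhook and call breakpoint()."""
    original = sys.breakpointhook
    sys.breakpointhook = hook
    try:
        # breakpoint() forwards its arguments to sys.breakpointhook.
        breakpoint()
    finally:
        # Restore the previous hook so later breakpoint() calls are unaffected.
        sys.breakpointhook = original


calls = []
call_breakpoint_with_hook(lambda *args, **kwargs: calls.append("hit"))
```

Replacing the hook is a convenient way to exercise code that contains `breakpoint()` calls without dropping into an interactive debugger.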
graalpython/com.oracle.graal.python.test/src/com/oracle/graal/python/test/PythonTests.java

Lines changed: 7 additions & 19 deletions
@@ -112,12 +112,7 @@ public static Context enterContext(Map<String, String> options, String[] args) {
         PythonTests.outArray.reset();
         PythonTests.errArray.reset();
         Context prevContext = context;
-        Context.Builder builder = Context.newBuilder().engine(engine).allowExperimentalOptions(true).allowAllAccess(true).options(options).arguments("python", args).option("python.Executable",
-                        executable);
-        if (usingBytecodeCompiler()) {
-            builder.option("python.EnableBytecodeInterpreter", "true").option("python.DisableFrozenModules", "true");
-        }
-        context = builder.build();
+        context = Context.newBuilder().engine(engine).allowExperimentalOptions(true).allowAllAccess(true).options(options).arguments("python", args).option("python.Executable", executable).build();
         context.initialize("python");
         assert prevContext == null;
         context.enter();
@@ -143,6 +138,10 @@ public static void skipOnBytecodeInterpreter() {
         Assume.assumeFalse(PythonOptions.EnableBytecodeInterpreter.getDefaultValue());
     }
 
+    public static void skipOnLegacyASTInterpreter() {
+        Assume.assumeTrue(PythonOptions.EnableBytecodeInterpreter.getDefaultValue());
+    }
+
     public static void assertBenchNoError(Path scriptName, String[] args) {
         final ByteArrayOutputStream byteArrayErr = new ByteArrayOutputStream();
         final ByteArrayOutputStream byteArrayOut = new ByteArrayOutputStream();
@@ -345,23 +344,12 @@ public static File getTestFile(Path filename) {
         }
     }
 
-    public static boolean usingBytecodeCompiler() {
-        return System.getProperty("useBytecodeCompiler") != null;
-    }
-
-    private static org.graalvm.polyglot.Source.Builder configureBuilder(org.graalvm.polyglot.Source.Builder builder) {
-        if (usingBytecodeCompiler()) {
-            return builder.mimeType(PythonLanguage.MIME_TYPE_SOURCE_FOR_BYTECODE);
-        }
-        return builder;
-    }
-
     public static org.graalvm.polyglot.Source createSource(String source) {
-        return configureBuilder(org.graalvm.polyglot.Source.newBuilder("python", source, "Unnamed")).buildLiteral();
+        return org.graalvm.polyglot.Source.newBuilder("python", source, "Unnamed").buildLiteral();
     }
 
     public static org.graalvm.polyglot.Source createSource(File path) throws IOException {
-        return configureBuilder(org.graalvm.polyglot.Source.newBuilder("python", path)).build();
+        return org.graalvm.polyglot.Source.newBuilder("python", path).build();
     }
 
     public static Value runScript(String[] args, File path, OutputStream out, OutputStream err) {

graalpython/com.oracle.graal.python.test/src/com/oracle/graal/python/test/grammar/ArgumentsTests.java

Lines changed: 3 additions & 3 deletions
@@ -27,11 +27,11 @@
 
 import static com.oracle.graal.python.test.PythonTests.assertLastLineErrorContains;
 import static com.oracle.graal.python.test.PythonTests.assertPrints;
-import static com.oracle.graal.python.test.PythonTests.usingBytecodeCompiler;
 
-import org.junit.Assume;
 import org.junit.Test;
 
+import com.oracle.graal.python.test.PythonTests;
+
 public class ArgumentsTests {
 
     @Test
@@ -146,7 +146,7 @@ public void KwArgs2() {
     @Test
     public void kwargsMerge() {
         // TODO AST interpreter doesn't maintain the order correctly
-        Assume.assumeTrue(usingBytecodeCompiler());
+        PythonTests.skipOnLegacyASTInterpreter();
         assertPrints("{'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}\n", "\n" +
                         "def foo(**kwargs):\n" +
                         " print(kwargs)\n" +

graalpython/com.oracle.graal.python.test/src/com/oracle/graal/python/test/runtime/TracingTests.java

Lines changed: 3 additions & 3 deletions
@@ -40,16 +40,16 @@
  */
 package com.oracle.graal.python.test.runtime;
 
-import com.oracle.graal.python.test.PythonTests;
-import org.junit.Assume;
 import org.junit.Before;
 import org.junit.Test;
 
+import com.oracle.graal.python.test.PythonTests;
+
 public class TracingTests {
 
     @Before
     public void ensureBytecode() {
-        Assume.assumeTrue(PythonTests.usingBytecodeCompiler());
+        PythonTests.skipOnLegacyASTInterpreter();
     }
 
     @Test

graalpython/com.oracle.graal.python.test/src/tests/test_sys_settrace.py

Lines changed: 1 addition & 0 deletions
@@ -155,6 +155,7 @@ def test_case(self):
     return test_case
 
 @unittest.skipIf(skip, 'sys.settrace only works in the bytecode interpreter')
+@unittest.skipIf(True, 'Disabled due to GR-40754')
 class TraceTests(unittest.TestCase):
     def trace(self, frame, event, arg):
         code = frame.f_code
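
The hunk above stacks a second `unittest.skipIf` decorator on `TraceTests`. As a small sketch of the effect, the following reproduces the two decorators on a stand-in class and runs it through the standard loader. The module-level `skip = False` and the `test_placeholder` method are assumptions made for illustration; the real `skip` value is computed elsewhere in `test_sys_settrace.py` and is not shown in this diff.

```python
import unittest

# Hypothetical stand-in for the module-level flag used in test_sys_settrace.py.
skip = False


@unittest.skipIf(skip, 'sys.settrace only works in the bytecode interpreter')
@unittest.skipIf(True, 'Disabled due to GR-40754')
class TraceTests(unittest.TestCase):
    def test_placeholder(self):
        # Would fail if executed; the unconditional skip prevents that.
        self.fail("never runs while the class is skipped")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TraceTests)
result = unittest.TestResult()
suite.run(result)
```

Because the inner decorator's condition is the literal `True`, every test in the class is reported as skipped with the `GR-40754` reason, regardless of the value of `skip`.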
