Commit 29defcb

Merge branch 'master' into feature/GR-20901
# Conflicts:
#   CHANGELOG.md
#   graalpython/com.oracle.graal.python/src/com/oracle/graal/python/builtins/modules/BuiltinFunctions.java
#   graalpython/com.oracle.graal.python/src/com/oracle/graal/python/parser/PythonParserImpl.java
#   graalpython/com.oracle.graal.python/src/com/oracle/graal/python/parser/antlr/Python3Parser.java
#   graalpython/com.oracle.graal.python/src/com/oracle/graal/python/runtime/object/PythonObjectFactory.java
2 parents 518d092 + 249c4aa commit 29defcb

File tree

507 files changed

+16468
-9480
lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -78,3 +78,5 @@ Python3.g4.stamp
7878
/.*.csv*
7979
/graal_dumps
8080
*.jfr
81+
82+
graalpython/com.oracle.graal.python/src/com/oracle/graal/python/parser/antlr/.antlr/

CHANGELOG.md

Lines changed: 3 additions & 1 deletion
@@ -3,7 +3,9 @@
33
This changelog summarizes major changes between GraalVM versions of the Python
44
language runtime. The main focus is on user-observable behavior of the engine.
55

6-
## Version 20.1.1
6+
## Version 20.2
7+
8+
* Escaping Unicode characters using the character names in strings like "\N{GREEK CAPITAL LETTER DELTA}".
79
* When a `*.py` file is imported, `*.pyc` file is created. It contains binary data to speed up parsing.
810
* Adding option `PyCachePrefix`, which is equivalent to PYTHONPYCACHEPREFIX environment variable, which is also accepted now.
911
* Adding option `DontWriteBytecodeFlag`. Equivalent to the Python `-B` flag. Don't write bytecode files.
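
The changelog entries above can be illustrated with a minimal Python sketch. It shows the `\N{...}` named escape and the CPython-level attributes that mirror `-B` and `PYTHONPYCACHEPREFIX`. It assumes a CPython-compatible `sys` module (`sys.pycache_prefix` exists from Python 3.8); how the GraalPython options `DontWriteBytecodeFlag` and `PyCachePrefix` are surfaced at runtime is not shown in this diff, so the attribute reads below are only an assumption about equivalent behavior.

```python
import sys
import unicodedata

# Named Unicode escape, as mentioned in the 20.2 changelog entry.
delta = "\N{GREEK CAPITAL LETTER DELTA}"

# The same character looked up by name at runtime, and its name read back.
looked_up = unicodedata.lookup("GREEK CAPITAL LETTER DELTA")
name = unicodedata.name(delta)

# CPython exposes the -B flag and the pycache prefix on the sys module.
# getattr() guards against interpreters older than 3.8.
dont_write = sys.dont_write_bytecode
prefix = getattr(sys, "pycache_prefix", None)

print(delta, name, dont_write, prefix)
```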

SECURITY.md

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
# Reporting Security Vulnerabilities
2+
3+
The GraalVM team values the independent security research community and believes
4+
that responsible disclosure of security vulnerabilities in GraalVM Community
5+
Edition as well as GraalVM Enterprise Edition helps us ensure the security and
6+
privacy of all our users.
7+
8+
If you believe you have found a security vulnerability, please submit a report
9+
to [email protected] preferably with a proof of concept. Please refer to
10+
[Reporting
11+
Vulnerabilities](https://www.oracle.com/corporate/security-practices/assurance/vulnerability/reporting.html)
12+
for additional information including our public encryption key for secure
13+
email. We ask that you do not contact project contributors directly or through
14+
other channels about a report.
15+
16+
### Security Updates, Alerts and Bulletins
17+
18+
GraalVM Community Edition security updates will be released on a quarterly basis
19+
in conjunction with the GraalVM Enterprise Edition security updates that are part
20+
of the Oracle Critical Patch Update program. Security updates are released on
21+
the Tuesday closest to the 17th day of January, April, July and October. A
22+
pre-release announcement will be published on the Thursday preceding each
23+
Critical Patch Update release. For additional information including past
24+
advisories, please refer to [Security
25+
Alerts](https://www.oracle.com/security-alerts/).
26+
27+
### Security-Related Information
28+
29+
Please refer to the [GraalVM Security
30+
Guide](https://www.graalvm.org/docs/security-guide/) for security related topics
31+
such as how to support trusted and less trusted code execution using the Truffle
32+
language framework, or compiler mitigations for transitive execution
33+
attacks. However please note that we do not currently support the execution of
34+
untrusted or adversarial code. Non-vulnerability related security issues may be
35+
discussed on GitHub Issues or the Security channel in the [GraalVM Slack
36+
Workspace](https://graalvm.slack.com/)
37+

THIRD_PARTY_LICENSE.txt

Lines changed: 420 additions & 1 deletion
Large diffs are not rendered by default.

ci.jsonnet

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1 +1 @@
1-
{ "overlay": "342d54c7c779683771fd346ae919235334523e58" }
1+
{ "overlay": "c949a434d869d789978d54fd6f3e8ebe3a03c82a" }

docs/contributor/CONTRIBUTING.md

Lines changed: 27 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -51,24 +51,48 @@ are:
5151
- `python-unittest` - Run the unittests written in Python, including those for the C extension API
5252
- `python-license` - Check that all files have the correct copyright headers applied to them
5353

54-
###### Builtin modules and classes
54+
###### Built-In modules and classes
5555

56-
For the most part, builtin modules and classes are implemented in the
56+
For the most part, built-in modules and classes are implemented in the
5757
`com.oracle.graal.python.builtins` package. For each module or class, there's
5858
a Java class annotated with `@CoreFunctions`. Each function in a module or a class
5959
is implemented in a Node annotated with `@Builtin`. Take a look at the existing
6060
implementations to get a feel for how this is done. For now, when adding new
6161
classes or modules, they need to be added to the list in
6262
`com.oracle.graal.python.builtins.Python3Core`.
6363

64-
Some builtin functions, modules, and classes are implemented in pure Python. The
64+
Some built-in functions, modules, and classes are implemented in pure Python. The
6565
files for this are in `graalpython/lib-graalpython`. These files are listed in
6666
the Java `com.oracle.graal.python.builtins.Python3Core` class. Take a look at
6767
these files to see what they do. If a file is called exactly as a built-in
6868
module is, it is executed in the context of that module during startup, so some
6969
of our modules are implemented both in Java and Python. If the name matches no
7070
existing module, the file is executed just for the side-effects.
7171

72+
When implementing a new (or fixing an existing) built-in, take a look at the
73+
CPython source. The layout and naming of modules and types is kept similar to
74+
the CPython source so it should be relatively easy to find the right piece of
75+
code. For some special dunder methods (`__add__`, `__getitem__`,
76+
`__getattribute__`, ...) you may have to figure out the C API slot names for
77+
them to find the right piece of code (`nb_add`, `sq_item`, `tp_getattr`, ...).
78+
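
As a rough aid for that lookup, here is a minimal sketch of a dunder-to-slot table. The table is illustrative and incomplete, and `SLOTS` and `slots_for` are hypothetical names, not part of GraalPython or CPython. The slot names follow CPython's type object layout. Note that in current CPython `__getattribute__` is served by `tp_getattro`, while `tp_getattr` is the older variant that takes a C string, and that `__getitem__` can map to either `mp_subscript` or `sq_item`.

```python
# Illustrative, incomplete mapping from special method names to the
# CPython type slots that are worth searching for in the C sources.
SLOTS = {
    "__add__": ("nb_add", "sq_concat"),
    "__getitem__": ("mp_subscript", "sq_item"),
    "__getattribute__": ("tp_getattro",),
    "__len__": ("mp_length", "sq_length"),
    "__call__": ("tp_call",),
    "__hash__": ("tp_hash",),
    "__iter__": ("tp_iter",),
    "__next__": ("tp_iternext",),
}


def slots_for(dunder):
    """Return the candidate slot names for a dunder, or () if unknown."""
    return SLOTS.get(dunder, ())


print(slots_for("__getitem__"))
```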
79+
You will find that often there are specific C API methods that are called to
80+
convert or coerce arguments, to look up methods either starting on the object or
81+
only on the class, to call a callable object or invoke a method, and more. In
82+
general, most of these methods should have equivalents in our
83+
`PythonObjectLibrary`. See the
84+
[IMPLEMENTATION_DETAILS.md](./IMPLEMENTATION_DETAILS.md) file for details on
85+
that library. If something is missing that is commonly used, we probably have
86+
some Node for it, but it may be a good idea to add it to the
87+
`PythonObjectLibrary` for easier discovery.
88+
89+
Sometimes, you will not easily find what exactly happens for a given piece of
90+
code when that involves more than just a simple built-in call. The `dis` module
91+
on CPython can often help get an angle on what a particular piece of code is
92+
doing. You can call `dis.dis` on any Python function and it will print you
93+
details of the bytecode and associated data, which can be a good starting point
94+
to browse through the CPython source.
95+
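
A minimal sketch of the `dis` workflow described above, meant to be run on CPython. The function `add` is a hypothetical example. The output is captured into a string only so it can be inspected programmatically; calling `dis.dis(add)` directly prints to stdout. The exact opcode names differ between CPython versions (for example `BINARY_ADD` before 3.11 and `BINARY_OP` from 3.11 on), so the listing is a starting point for searching the CPython source, not a stable format.

```python
import dis
import io


def add(a, b):
    # A trivial function whose bytecode involves a binary operation.
    return a + b


# dis.dis accepts a file argument; capture the listing instead of printing it.
buf = io.StringIO()
dis.dis(add, file=buf)
listing = buf.getvalue()

print(listing)
```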
7296
###### Python C API
7397

7498
The C implementation and headers for our C API are in

docs/contributor/IMPLEMENTATION_DETAILS.md

Lines changed: 70 additions & 43 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,71 @@
11
# Implementation Details
22

3-
### Python Global Thread State
3+
## Abstract Operations on Python Objects
4+
5+
Many generic operations on Python objects in CPython are defined in the header
6+
files `object.h` and `abstract.h`. These operations are widely used and their
7+
interplay and intricacies are the cause for the conversion, error message, and
8+
control flow bugs when not mimicked correctly. Our current approach is to
9+
provide many of these abstract operations as part of the `PythonObjectLibrary`.
10+
11+
### Common operations in the PythonObjectLibrary
12+
13+
The code has evolved over time, so not all built-in nodes are prime examples of
14+
messages that should be used from the PythonObjectLibrary. We are refactoring
15+
this as we go, but here are a few examples for things you can (or should soon be
16+
able to) use the PythonObjectLibrary for:
17+
18+
- casting and coercion to `java.lang.String`, array-sized Java `int`, Python
19+
index, fileno, `double`, filesystem path, iterator, and more
20+
- reading the class of an object
21+
- accessing the `__dict__` attribute of an object
22+
- hashing objects and testing for equality
23+
- testing for truthy-ness
24+
- getting the length
25+
- testing for abstract types such as `mapping`, `sequence`, `callable`
26+
- invoking methods or executing callables
27+
- accessing objects through the buffer protocol
28+
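
For orientation, the operations listed above have Python-level counterparts whose observable behavior the library messages need to reproduce. The following is a minimal sketch in plain Python, not the `PythonObjectLibrary` API itself; the classes `Ten` and `Dir` and the `results` dictionary are hypothetical names used only for illustration.

```python
import operator
import os
from collections.abc import Mapping, Sequence


class Ten:
    # Supports coercion to a Python index via __index__.
    def __index__(self):
        return 10


class Dir:
    # Supports coercion to a filesystem path via __fspath__.
    def __fspath__(self):
        return "/tmp/example"


results = {
    "index": operator.index(Ten()),          # coercion to index
    "fspath": os.fspath(Dir()),              # coercion to filesystem path
    "double": float("1.5"),                  # coercion to double
    "class": type([]).__name__,              # reading the class of an object
    "truthy": bool([]),                      # testing for truthiness
    "length": len((1, 2, 3)),                # getting the length
    "is_mapping": isinstance({}, Mapping),   # abstract type test
    "is_sequence": isinstance([], Sequence), # abstract type test
    "callable": callable(len),               # abstract type test
    "hash_eq": hash(1) == hash(1.0) and 1 == 1.0,  # hashing and equality
    "buffer": bytes(memoryview(b"abc")[1:]), # buffer protocol access
}

print(results)
```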
29+
### PythonObjectLibrary functions with and without state
30+
31+
Usually, there are at least two messages for each operation - one that takes a
32+
`ThreadState` argument, and one that doesn't. The intent is to allow passing of
33+
exception state and caller information similar to how we do it with the `PFrame`
34+
argument even across library messages, which cannot take a VirtualFrame.
35+
36+
All nodes that are used in message implementations must allow uncached
37+
usage. Often (e.g. in the case of the generic `CallNode`) they offer execute
38+
methods with and without frames. If a `ThreadState` was passed to the message, a
39+
frame to pass to the node can be reconstructed using
40+
`PArguments.frameForCall(threadState)`. Here's an example:
41+
42+
```java
43+
@ExportMessage
44+
long messageWithState(ThreadState state,
45+
@Cached CallNode callNode) {
46+
Object callable = ...
47+
48+
if (state != null) {
49+
return callNode.execute(PArguments.frameForCall(state), callable, arguments);
50+
} else {
51+
return callNode.execute(callable, arguments);
52+
}
53+
}
54+
```
55+
56+
*Note*: It is **always** preferable to call an `execute` method with a
57+
`VirtualFrame` when both one with and without exist! The reason is that this
58+
avoids materialization of the frame state in more cases, as described in the
59+
section on Python's global thread state below.
60+
61+
### Other libraries in the codebase
62+
63+
Accessing hashing storages (the storage for `dict`, `set`, and `frozenset`)
64+
should be done via the `HashingStorageLibrary`. We are in the process of
65+
creating a `SequenceStorageLibrary` for sequence types (`tuple`, `list`) to
66+
replace the `SequenceStorageNodes` collection of classes.
67+
68+
## Python Global Thread State
469

570
In CPython, each stack frame is allocated on the heap, and there's a global
671
thread state holding on to the chain of currently handled exceptions (e.g. if
@@ -21,15 +86,15 @@ be forced to the heap.
2186
In Graal Python, the implementation is thus a bit more involved. Here's how it
2287
works.
2388

24-
#### The PFrame.Reference
89+
### The PFrame.Reference
2590

2691
A `PFrame.Reference` is created when entering a Python function. By default it
2792
only holds on to another reference, that of the Python caller. If there are
2893
non-Python frames between the newly entered frame and the last Python frame,
2994
those are ignored - our linked list only connects Python frames. The entry point
3095
into the interpreter has a `PFrame.Reference` with no caller.
3196

32-
###### ExecutionContext.CallContext and ExecutionContext.CalleeContext
97+
#### ExecutionContext.CallContext and ExecutionContext.CalleeContext
3398

3499
If we're only calling between Python, we pass our `PFrame.Reference` as implicit
35100
argument to any callees. On entry, they will create their own `PFrame.Reference`
@@ -60,7 +125,7 @@ ExecutionContext.CalleeContext classes. These also use profiling information to
60125
eagerly fill in frame information if the callees actually access the stack, for
61126
example, so that no further stack walks need to take place.
62127

63-
###### ExecutionContext.IndirectCallContext and ExecutionContext.IndirectCalleeContext
128+
#### ExecutionContext.IndirectCallContext and ExecutionContext.IndirectCalleeContext
64129

65130
If we're mixing Python frames with non-Python frames, or if we are making calls
66131
to methods and cannot pass the Truffle frame, we need to store the last
@@ -72,48 +137,10 @@ caller, it initially walks the stack to find it. But it will also tell the last
72137
Python node that made a call to a "foreign" callee that it will have to store
73138
its `PFrame.Reference` globally in the future for it to be available later.
74139

75-
#### The current PException
140+
### The current PException
76141

77142
Now that we have a mechanism to lazily make available only as much frame state
78143
as needed, we use the same mechanism to also pass the currently handled
79144
exception. Unlike CPython we do not use a stack of currently handled exceptions,
80145
instead we utilize the call stack of Java by always passing the current exception
81146
and holding on to the last (if any) in a local variable.
82-
83-
### Abstract Operations on Python Objects
84-
85-
Many generic operations on Python objects in CPython are defined in the header
86-
files `abstract.c` and `abstract.h`. These operations are widely used and their
87-
interplay and intricacies are the cause for the conversion, error message, and
88-
control flow bugs when not mimicked correctly. Our current approach is to
89-
provide many of these abstract operations as part of the
90-
`PythonObjectLibrary`. Usually, this means there are at least two messages for
91-
each operation - one that takes a `ThreadState` argument, and one that
92-
doesn't. The intent is to allow passing of exception state and caller
93-
information similar to how we do it with the `PFrame` argument even across
94-
library messages, which cannot take a VirtualFrame.
95-
96-
All nodes that are used in message implementations must allow uncached
97-
usage. Often (e.g. in the case of the generic `CallNode`) they offer execute
98-
methods with and without frames. If a `ThreadState` was passed to the message, a
99-
frame to pass to the node can be reconstructed using
100-
`PArguments.frameForCall(threadState)`. Here's an example:
101-
102-
```java
103-
@ExportMessage
104-
long messageWithState(ThreadState state,
105-
@Cached CallNode callNode) {
106-
Object callable = ...
107-
108-
if (state != null) {
109-
return callNode.execute(PArguments.frameForCall(state), callable, arguments);
110-
} else {
111-
return callNode.execute(callable, arguments);
112-
}
113-
}
114-
```
115-
116-
*Note*: It is **always** preferable to call an `execute` method with a
117-
`VirtualFrame` when both one with and without exist! The reason is that this
118-
avoids materialization of the frame state in more cases, as described on the
119-
section on Python's global thread state above.

docs/user/FAQ.md

Lines changed: 65 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,65 @@
1+
# Frequently Asked Questions
2+
3+
### Does module/package XYZ work on GraalPython?
4+
5+
It depends, but is currently unlikely. Our first goal with GraalPython was to
6+
show that we can run NumPy and related packages using the managed GraalVM LLVM
7+
implementation. Now that we have done so we are hard at work to improve the
8+
number of passing CPython unittests. We are also beginning to track our
9+
compatibility with popular PyPI packages and expect to increase our coverage
10+
there soon.
11+
12+
### Can GraalPython replace my Jython use case?
13+
14+
We hope it can, but there are some caveats, like Python code subclassing Java
15+
classes or use through the `javax.script.ScriptEngine` not being
16+
supported. Please see our [migration document](./JYTHON) for details.
17+
18+
### Do I need to compile and run native modules as LLVM bitcode to use GraalPython?
19+
20+
If you want to run C extensions or use certain built-in features, yes, you need
21+
to build the module with GraalPython and then it will run using the GraalVM LLVM
22+
runtime. However, many of the core features of Python (including e.g. large
23+
parts of the `os` API) are implemented in pure Java and many standard library
24+
modules and packages work without running any LLVM bitcode. So even though
25+
GraalPython depends on GraalVM LLVM, for many use cases you can disallow native
26+
modules entirely.
27+
28+
### Can I use GraalVM sandboxing features with GraalPython?
29+
30+
Yes! As an embedder, you can selectively disable features. As an example, you
31+
can disable native code execution or filesystem access. If you are a user of
32+
GraalVM Enterprise Edition, you will also find that the managed execution mode
33+
for LLVM fully works for running extensions such as NumPy in a safer manner.
34+
35+
### Do all the GraalVM polyglot features work?
36+
37+
We are doing our best to ensure the polyglot features of GraalVM work as a
38+
Python user would expect. There are still many cases where expectations are
39+
unclear or where multiple behaviors are imaginable. We are actively looking at
40+
use cases and are continuously evolving the implementation to provide the most
41+
convenient and least surprising behavior.
42+
43+
### What is the performance I can expect from GraalPython?
44+
45+
For pure Python code, performance after warm-up can be expected to be around 5-6
46+
times faster than CPython 3.8 (or 6-7x faster than Jython). For native
47+
extensions running as LLVM bitcode, we are currently slower than CPython - you
48+
can expect to see between 0.1x and 0.5x performance.
49+
50+
### I heard languages with JIT compilers have slow startup. Is that true for GraalPython?
51+
52+
It depends. When you use the GraalVM native image feature with GraalPython or
53+
use the GraalPython launcher in GraalVM its startup is competitive with
54+
CPython. In any case, both with native image or when running on JVM we first
55+
need to warm up to reach peak performance. This is a complicated story in
56+
itself, but in general it can take a while (a minute or two) after you have
57+
reached and are running your core workload. We are continuously working on
58+
improving this.
59+
60+
### Can I share warmed up code between multiple Python contexts?
61+
62+
Yes, this works, and you will find that starting up multiple contexts in the
63+
same engine and running the same or similar code in them will get increasingly
64+
faster, because the compiled code is shared across contexts. However, the peak
65+
performance in this setup is currently lower than in the single context case.

docs/user/README.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,4 +3,5 @@
33
A primary goal of this Python implementation is to support SciPy and its
44
constituent libraries as well as work with other data science and machine
55
learning libraries from the rich Python ecosystem. At this point, the Python
6-
implementation is made available for experimentation and curious end-users.
6+
implementation is made available for experimentation and curious end-users. See
7+
our [FAQ](./FAQ) for commonly asked questions about this Python implementation.

graalpython/com.oracle.graal.python.benchmarks/python/harness.py

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -292,6 +292,17 @@ def run(self):
292292
else:
293293
print("### iteration=%s, name=%s, duration=%s" % (iteration, self.bench_module.__name__,
294294
duration_str))
295+
# a bit of fuzzy logic to avoid timing out on configurations
296+
# that are slow, without having to rework our logic for getting
297+
# default iterations
298+
if os.environ.get("CI") and iteration >= 4 and duration > 20:
299+
import statistics
300+
v = durations[-4:]
301+
if statistics.stdev(v) / min(v) < 0.03:
302+
# with less than 3 percent variance across ~20s
303+
# iterations, we can safely stop here
304+
break
305+
295306

296307
print(_HRULE)
297308
print("### teardown ... ")
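
The early-exit check added in this hunk can be read in isolation as the following minimal sketch. The function name `can_stop_early` and its parameters are hypothetical; the sketch assumes `durations` holds one duration in seconds per completed iteration, with the current iteration last, and it leaves out the `CI` environment check. Note that the quantity compared against 0.03 is the sample standard deviation of the last four durations divided by their minimum, a relative spread, not a variance in the statistical sense.

```python
import statistics


def can_stop_early(durations, window=4, min_duration=20.0, threshold=0.03):
    """Return True if the last `window` iterations are slow and stable.

    Mirrors the condition in the hunk above: enough iterations have run,
    the latest one took longer than `min_duration` seconds, and the
    relative spread (stdev / min) of the recent durations is below
    `threshold`.
    """
    if len(durations) < window or durations[-1] <= min_duration:
        return False
    recent = durations[-window:]
    return statistics.stdev(recent) / min(recent) < threshold


print(can_stop_early([21.0, 21.1, 21.05, 21.2]))
print(can_stop_early([21.0, 25.0, 30.0, 22.0]))
```

Dividing by the minimum rather than the mean makes the check slightly stricter when one iteration is unusually fast.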
