
Commit 50c4659

Adding documentation.
1 parent e2f4f23 commit 50c4659

2 files changed, +88 -0 lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
@@ -3,6 +3,10 @@
This changelog summarizes major changes between GraalVM versions of the Python
language runtime. The main focus is on user-observable behavior of the engine.

## Version 20.1.1

* When a `*.py` file is imported, a `*.pyc` file is created. It contains binary data that speeds up parsing.

## Version 20.1.0

* Update language support target and standard library to 3.8.2

docs/contributor/PARSER_DETAILS.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
This document describes how Python files are parsed in the GraalPython
implementation, that is, how we obtain a Truffle tree from a source.

Creating the Truffle tree for a Python source has two phases. The first phase
creates a simple syntax tree (SST) and a scope tree; the second phase transforms
the SST into the Truffle tree, which requires the scope tree. The scope tree
contains the scope locations of variable and function definitions and
information about the scopes. The simple syntax tree contains nodes that mirror
the source.
Compared to the Truffle tree, the SST is much smaller: it contains just the
nodes that represent the source in a simple way. One SST node is usually
translated into many Truffle nodes.

The simple syntax tree can be created in two ways: by parsing the source with
ANTLR, or by deserializing it from the corresponding `*.pyc` file. In both cases
the scope tree is obtained together with the SST. If there is no corresponding
`.pyc` file for a source, the source is parsed with ANTLR and the resulting SST
and scope tree are serialized to the `.pyc` file. The next time, we don't have
to use the ANTLR parser, because the result is already serialized in the `.pyc`
file: instead of parsing the source file with ANTLR, we just deserialize the SST
and scope tree from the `.pyc` file. Deserialization is much faster than parsing
the source with ANTLR; it needs roughly 30% of the time that the ANTLR parser
needs. Of course, the first run is a little bit slower, because we need to save
the SST and scope tree to the `.pyc` file.
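
The load-or-parse decision described above can be sketched in Python. This is a toy model under stated assumptions, not the GraalPython implementation (which is written in Java): the hypothetical `parse_source` stands in for the ANTLR phase by using Python's own `ast` module, the "SST" and "scope tree" are just lists of names, JSON stands in for the binary format, and the modification-time check is an assumption, since the text does not say how a `.pyc` file is matched to its source.

```python
import ast
import json
import tempfile
from pathlib import Path

parse_calls = 0  # counts how often the expensive parsing phase really runs


def parse_source(source: str) -> dict:
    """Hypothetical stand-in for the ANTLR phase: builds a toy SST and scope tree."""
    global parse_calls
    parse_calls += 1
    module = ast.parse(source)
    sst = [type(node).__name__ for node in module.body]
    scopes = [node.name for node in ast.walk(module)
              if isinstance(node, (ast.FunctionDef, ast.ClassDef))]
    return {"sst": sst, "scopes": scopes}


def load_trees(py_file: Path, cache_file: Path) -> dict:
    """Deserialize from the cache when it is usable, otherwise parse and write it."""
    # Assumption: a cache file at least as new as its source counts as usable.
    if cache_file.exists() and cache_file.stat().st_mtime >= py_file.stat().st_mtime:
        # Fast path: no parsing, just deserialization.
        return json.loads(cache_file.read_text(encoding="utf-8"))
    # Slow first run: parse, then save the result for the next time.
    trees = parse_source(py_file.read_text(encoding="utf-8"))
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(trees), encoding="utf-8")
    return trees


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "sourceA.py")
    src.write_text("def add(x, y):\n    return x + y\n", encoding="utf-8")
    cache = Path(tmp, "__pycache__", "sourceA.graalpython.pyc")
    first = load_trees(src, cache)   # parses and writes the cache file
    second = load_trees(src, cache)  # served from the cache file
```

After both calls `parse_calls` is 1: the second call only deserializes, which is why only the first run pays the parsing cost.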

In the folder structure it looks like this:

```
top_folder
    __pycache__
        sourceA.graalpython.pyc
        sourceB.graalpython.pyc
    sourceA.py
    sourceB.py
    sub_folder
        __pycache__
            sourceX.graalpython.pyc
        sourceX.py
```

The `__pycache__` directory is created at the same directory level as the source
code file, and all `*.pyc` files for the sources in that directory are stored in
it. The directory can also contain files created by CPython, so the user may see
files with an extension such as `*.cpython-36.pyc` there as well.
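
The naming rule for cache files can be sketched as a small function. The `graalpython` tag is taken from the file names in the folder tree above; making it a parameter is an assumption so that the same hypothetical `cache_path` helper also produces CPython-style names. CPython itself exposes this rule as `importlib.util.cache_from_source()`, driven by `sys.implementation.cache_tag`.

```python
from pathlib import PurePosixPath


def cache_path(source: str, cache_tag: str = "graalpython") -> str:
    """Map a source path to <dir>/__pycache__/<stem>.<cache_tag>.pyc."""
    src = PurePosixPath(source)
    return str(src.parent / "__pycache__" / f"{src.stem}.{cache_tag}.pyc")


print(cache_path("top_folder/sourceA.py"))
# top_folder/__pycache__/sourceA.graalpython.pyc
print(cache_path("top_folder/sub_folder/sourceX.py"))
# top_folder/sub_folder/__pycache__/sourceX.graalpython.pyc
print(cache_path("top_folder/sourceA.py", "cpython-36"))
# top_folder/__pycache__/sourceA.cpython-36.pyc
```

Because the tag is part of the file name, files written by different implementations can live side by side in the same `__pycache__` directory.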

The current implementation also stores a copy of the original source text in the
`.pyc` file. The reason is that we create the Truffle Source object from this
text (with the path to the original source file), so we do not need to read the
original `*.py` file. This speeds up obtaining the Truffle tree, because we read
just one file.

The structure of a `.graalpython.pyc` file is as follows:

```
MAGIC_NUMBER
source text
binary data - scope tree
binary data - simple syntax tree
```
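
A writer and reader for this four-part layout might look like the following sketch. The byte-level encoding is not specified here, so the magic number value, the 4-byte big-endian magic, and the 4-byte length prefix in front of each section are all hypothetical; the scope tree and SST are treated as opaque byte strings. Returning the source text from the same file is what lets the loader skip reading the `*.py` file.

```python
import struct
from typing import Tuple

MAGIC_NUMBER = 0x47505943  # hypothetical value; the real constant is not given here


def write_pyc(source_text: str, scope_tree: bytes, sst: bytes) -> bytes:
    """Serialize: magic, then source text, scope tree and SST, each length-prefixed."""
    out = bytearray(struct.pack(">I", MAGIC_NUMBER))
    for section in (source_text.encode("utf-8"), scope_tree, sst):
        out += struct.pack(">I", len(section))  # assumed 4-byte length prefix
        out += section
    return bytes(out)


def read_pyc(data: bytes) -> Tuple[str, bytes, bytes]:
    """Inverse of write_pyc; rejects files with a foreign magic number."""
    (magic,) = struct.unpack_from(">I", data, 0)
    if magic != MAGIC_NUMBER:
        raise ValueError("not a compatible .pyc file")
    offset = 4
    sections = []
    for _ in range(3):
        (length,) = struct.unpack_from(">I", data, offset)
        offset += 4
        if offset + length > len(data):
            raise ValueError("truncated .pyc file")
        sections.append(data[offset:offset + length])
        offset += length
    text, scope_tree, sst = sections
    return text.decode("utf-8"), scope_tree, sst


blob = write_pyc("x = 1\n", b"\x01scope", b"\x02sst")
text, scope_tree, sst = read_pyc(blob)
```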

The serialized SST and scope tree are also stored in the code object, in the
`co_code` attribute.

For example:
```
>>> def add(x, y):
...     print('Running x+y')
...     return x+y
...
>>> co = add.__code__
>>> co.co_code
b'\x01\x00\x00\x02[]K\xbf\xd1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 ...'
```

Writing `*.pyc` files can be disabled or enabled in the same ways as in CPython:

* environment variable `PYTHONDONTWRITEBYTECODE`: if this is set to a non-empty
  string, Python won't try to write `.pyc` files on the import of source modules.
* command line option `-B`: if given, Python won't try to write `.pyc` files on
  the import of source modules.
* in code: setting the `dont_write_bytecode` attribute of the built-in `sys`
  module.
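
The three switches can be exercised from Python itself. This sketch sets `sys.dont_write_bytecode` in the current process and starts child interpreters with `-B` and with `PYTHONDONTWRITEBYTECODE` to show that each switch ends up in `sys.dont_write_bytecode`. It assumes `sys.executable` points to a runnable interpreter; the helper name `child_flag` is made up for the example.

```python
import os
import subprocess
import sys

# In code: affects only imports performed after this assignment.
sys.dont_write_bytecode = True


def child_flag(*options, env=None):
    """Start a fresh interpreter with the given options and report its flag."""
    result = subprocess.run(
        [sys.executable, *options, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Command line option -B.
with_b = child_flag("-B")
# Environment variable set to a non-empty string.
with_env = child_flag(env={**os.environ, "PYTHONDONTWRITEBYTECODE": "1"})
```

Both `with_b` and `with_env` are `"True"`, because the option and the environment variable are read at interpreter startup and exposed through the same `sys` attribute that the in-code switch sets.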

## Security
The serialization of the SST and scope tree is hand-written, and during
deserialization it is not possible to load any classes other than SST nodes. It
doesn't use Java serialization or any other framework to serialize Java objects.
The main reason was performance, which can be maximized this way. The second
reason was security.
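
The idea behind the security argument, a hand-written decoder that can only ever produce node types it knows about, can be sketched as follows. It is written in Python for consistency with the rest of this page, whereas the real code is Java, and the node classes, tag values, and wire format are all hypothetical. The key property is that the byte stream carries only small integer tags, so it has no way to name an arbitrary class for the decoder to instantiate.

```python
import struct
from dataclasses import dataclass


@dataclass
class NumberNode:  # hypothetical SST node types
    value: int


@dataclass
class NameNode:
    name: str


@dataclass
class AddNode:
    left: object
    right: object


TAG_NUMBER, TAG_NAME, TAG_ADD = 1, 2, 3  # hypothetical tag values


def encode(node) -> bytes:
    """Hand-written serializer: one tag byte followed by the node's payload."""
    if isinstance(node, NumberNode):
        return bytes([TAG_NUMBER]) + struct.pack(">q", node.value)
    if isinstance(node, NameNode):
        raw = node.name.encode("utf-8")
        return bytes([TAG_NAME]) + struct.pack(">H", len(raw)) + raw
    if isinstance(node, AddNode):
        return bytes([TAG_ADD]) + encode(node.left) + encode(node.right)
    raise TypeError(f"not an SST node: {node!r}")


def decode(data: bytes, offset: int = 0):
    """Return (node, next_offset); only the node types listed here can be created."""
    tag = data[offset]
    offset += 1
    if tag == TAG_NUMBER:
        (value,) = struct.unpack_from(">q", data, offset)
        return NumberNode(value), offset + 8
    if tag == TAG_NAME:
        (length,) = struct.unpack_from(">H", data, offset)
        offset += 2
        return NameNode(data[offset:offset + length].decode("utf-8")), offset + length
    if tag == TAG_ADD:
        left, offset = decode(data, offset)
        right, offset = decode(data, offset)
        return AddNode(left, right), offset
    # Anything else is rejected instead of being looked up as a class.
    raise ValueError(f"unknown SST node tag {tag}")


tree = AddNode(NameNode("x"), NumberNode(1))
blob = encode(tree)
decoded, end = decode(blob)
```

Contrast this with general-purpose object serialization (Java serialization, or Python's `pickle`), where the stream itself decides which classes get instantiated.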
