
Commit 2ad6559

[GR-20901] Better (de)serialization of code objects
PullRequest: graalpython/876
2 parents 9909b30 + 3b175aa commit 2ad6559

File tree

151 files changed

+5621
-718
lines changed

CHANGELOG.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,10 @@ language runtime. The main focus is on user-observable behavior of the engine.
66
## Version 20.2
77

88
* Escaping Unicode characters using the character names in strings like "\N{GREEK CAPITAL LETTER DELTA}".
9+
* When a `*.py` file is imported, a `*.pyc` file is created. It contains binary data to speed up parsing.
10+
* Adding the option `PyCachePrefix`, which is equivalent to the `PYTHONPYCACHEPREFIX` environment variable. The environment variable is now accepted as well.
11+
* Adding the option `DontWriteBytecodeFlag`, which is equivalent to the Python `-B` flag: bytecode files are not written.
12+
* The command-line option `-B` now works.
913

1014
## Version 20.1.0
1115

docs/contributor/PARSER_DETAILS.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
This document describes how Python files are parsed in the GraalPython
implementation, that is, how we obtain a Truffle tree from a source.

Creating a Truffle tree for a Python source has two phases. The first phase
creates a simple syntax tree (SST) and a scope tree. The second phase transforms
the SST into the Truffle tree; this transformation needs the scope tree. The
scope tree contains scope locations for variable and function definitions and
information about scopes. The simple syntax tree contains nodes mirroring the
source. Compared with the Truffle tree, the SST is much smaller: it contains just
the nodes that represent the source in a simple way. One SST node is usually
translated to many Truffle nodes.

The simple syntax tree can be created in two ways: by parsing with ANTLR, or by
deserialization from the appropriate `*.pyc` file. In both cases the scope tree
is created together with it. If there is no appropriate `.pyc` file for a source,
the source is parsed with ANTLR and the resulting SST and scope tree are
serialized to the `.pyc` file. The next time, we don't have to use the ANTLR
parser, because the result is already serialized in the `.pyc` file; instead of
parsing the source file with ANTLR, we just deserialize the SST and scope tree
from the `.pyc` file. Deserialization is much faster than parsing the source with
ANTLR: it needs roughly 30% of the time the ANTLR parser needs. Of course, the
first run is a little slower, because we need to save the SST and scope tree to
the `.pyc` file.
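
The parse-or-deserialize decision described above can be sketched in a few lines of Python. This is only an illustration, not the GraalPython code: `load_tree` and its `parse`, `serialize` and `deserialize` parameters are hypothetical stand-ins for the ANTLR parser and the SST/scope-tree (de)serializer, and treating a cache file as "appropriate" when it is not older than the source is an assumption made here for the sake of the example.

```python
import os


def load_tree(source_path, cache_path, parse, serialize, deserialize):
    """Return the tree for source_path, reusing cache_path when it is usable.

    parse, serialize and deserialize are caller-supplied callables (hypothetical
    placeholders for the ANTLR parser and the SST/scope-tree serializer).
    """
    # Assumption: the cache is usable if it exists and is not older than the source.
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)):
        with open(cache_path, "rb") as f:
            return deserialize(f.read())

    # Slow path: parse the source, then store the result for the next run.
    with open(source_path, "r", encoding="utf-8") as f:
        tree = parse(f.read())
    cache_dir = os.path.dirname(cache_path)
    if cache_dir:
        os.makedirs(cache_dir, exist_ok=True)
    with open(cache_path, "wb") as f:
        f.write(serialize(tree))
    return tree
```

Calling `load_tree` twice for the same unchanged source should invoke `parse` only on the first call; the second call takes the deserialization path.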

In the folder structure it looks like this:

```
top_folder
    __pycache__
        sourceA.graalpython.pyc
        sourceB.graalpython.pyc
    sourceA.py
    sourceB.py
    sub_folder
        __pycache__
            sourceX.graalpython.pyc
        sourceX.py
```

The `__pycache__` directory is created at the same directory level as the source
file, and it stores the `*.pyc` files for all sources in that directory. It can
also contain files created by CPython, so the user may see files with an
extension such as `*.cpython-36.pyc` there as well.
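
To see which cache file a given interpreter would use for a source file, the standard library offers `importlib.util.cache_from_source` and `sys.implementation.cache_tag`. The sketch below only wraps these two calls; `cache_file_for` is a hypothetical helper name. That GraalPython reports the tag `graalpython` is inferred from the file names shown above and is not verified here.

```python
import importlib.util
import os
import sys


def cache_file_for(source_path):
    """Return the .pyc path the running interpreter would use for source_path."""
    return importlib.util.cache_from_source(source_path)


if __name__ == "__main__":
    path = cache_file_for(os.path.join("top_folder", "sourceA.py"))
    # The tag between the module name and ".pyc" identifies the interpreter.
    print(sys.implementation.cache_tag, path)
```

Because the tag is part of the file name, cache files written by different interpreters can sit side by side in the same `__pycache__` directory.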

The current implementation also stores a copy of the original source text in the
`.pyc` file. From this copy we create the Truffle `Source` object with the path
to the original source file, without having to read the original `*.py` file.
This speeds up obtaining the Truffle tree, because we read just one file.

The structure of a `.graalpython.pyc` file is this:

```
MAGIC_NUMBER
source text
binary data - scope tree
binary data - simple syntax tree
```
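
As a rough illustration of such a sectioned file, the sketch below writes a magic value followed by three length-prefixed sections and reads them back. Everything concrete in it is an assumption: `DEMO_MAGIC`, the 4-byte big-endian length prefixes, and the function names `pack_pyc` and `unpack_pyc` are invented for the example, and the real encoding of a `.graalpython.pyc` file is not specified in this document.

```python
import struct

# Hypothetical magic value; the real MAGIC_NUMBER is not given in this document.
DEMO_MAGIC = b"DEMO"


def pack_pyc(source_text, scope_tree, sst):
    """Concatenate the magic value and three length-prefixed sections."""
    out = [DEMO_MAGIC]
    for section in (source_text.encode("utf-8"), scope_tree, sst):
        out.append(struct.pack(">I", len(section)))  # 4-byte big-endian length
        out.append(section)
    return b"".join(out)


def unpack_pyc(data):
    """Split data produced by pack_pyc back into its three sections."""
    if data[:len(DEMO_MAGIC)] != DEMO_MAGIC:
        raise ValueError("bad magic number")
    pos = len(DEMO_MAGIC)
    sections = []
    for _ in range(3):
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        sections.append(data[pos:pos + length])
        pos += length
    source_text, scope_tree, sst = sections
    return source_text.decode("utf-8"), scope_tree, sst
```

Checking the magic value first lets a reader reject files written by another tool or another format version before touching the binary sections.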

The serialized SST and scope tree are also stored in the code object, in its
`co_code` attribute.

For example:
```
>>> def add(x, y):
...     print('Running x+y')
...     return x+y
...
>>> co = add.__code__
>>> co.co_code
b'\x01\x00\x00\x02[]K\xbf\xd1\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00 ...'
```

Creating `*.pyc` files can be disabled or enabled in the same ways as in CPython:

* environment variable: `PYTHONDONTWRITEBYTECODE` - if this is set to a non-empty string,
  Python won't try to write `.pyc` files on the import of source modules.
* command-line option: `-B` - if given, Python won't try to write `.pyc` files on
  the import of source modules.
* in code: setting the attribute `dont_write_bytecode` of the built-in `sys` module
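
The third option can be tried out with a small script. The sketch below uses only standard-library calls (`sys.dont_write_bytecode`, `importlib.import_module`, `importlib.invalidate_caches`); the helper name `import_without_cache` and the throwaway module are made up for the example. It was written against CPython semantics, and the assumption is that GraalPython behaves the same way.

```python
import importlib
import os
import sys
import tempfile


def import_without_cache(module_name, module_source):
    """Import a throwaway module with bytecode writing disabled.

    Returns (module, cache_dir_exists).
    """
    directory = tempfile.mkdtemp()
    with open(os.path.join(directory, module_name + ".py"), "w") as f:
        f.write(module_source)

    previous = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # same effect as -B or PYTHONDONTWRITEBYTECODE
    sys.path.insert(0, directory)
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(module_name)
    finally:
        # Restore the interpreter state so later imports are unaffected.
        sys.path.remove(directory)
        sys.dont_write_bytecode = previous

    return module, os.path.isdir(os.path.join(directory, "__pycache__"))
```

With the flag set, the import succeeds but no `__pycache__` directory should appear next to the source file.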

## Security
The serialization of the SST and scope tree is handwritten, and during
deserialization it is not possible to load classes other than SSTNodes. It
doesn't use Java serialization or any other framework for serializing Java
objects. The main reason was performance, which can be maximized this way. The
second reason was security.
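
The idea of a handwritten format that can only ever produce known node types can be sketched as follows. This is not the GraalPython serializer: the node classes `Num` and `Add`, the tag values, and the byte layout are all hypothetical and chosen only to keep the example small. The point is that the reader dispatches on a fixed set of tags, so the input cannot name an arbitrary class to instantiate.

```python
import struct


class Num:
    def __init__(self, value):
        self.value = value


class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right


# Hypothetical tags; only the node kinds listed here can ever be created.
TAG_NUM, TAG_ADD = 1, 2


def write_node(node, out):
    """Append the encoding of node to the bytearray out."""
    if isinstance(node, Num):
        out.extend(struct.pack(">Bi", TAG_NUM, node.value))
    elif isinstance(node, Add):
        out.append(TAG_ADD)
        write_node(node.left, out)
        write_node(node.right, out)
    else:
        raise TypeError("not a serializable node: %r" % type(node))


def read_node(data, pos=0):
    """Return (node, next_pos); unknown tags are rejected."""
    tag = data[pos]
    if tag == TAG_NUM:
        (value,) = struct.unpack_from(">i", data, pos + 1)
        return Num(value), pos + 5
    if tag == TAG_ADD:
        left, pos = read_node(data, pos + 1)
        right, pos = read_node(data, pos)
        return Add(left, right), pos
    raise ValueError("unknown node tag %d" % tag)
```

Unlike a general-purpose object serializer, a reader of this shape has no code path that resolves class names from the input, which is the property the paragraph above relies on.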

graalpython/com.oracle.graal.python.shell/src/com/oracle/graal/python/shell/GraalPythonMain.java

Lines changed: 11 additions & 2 deletions
@@ -87,6 +87,7 @@ public static void main(String[] args) {
8787
private List<String> relaunchArgs;
8888
private boolean wantsExperimental = false;
8989
private Map<String, String> enginePolyglotOptions;
90+
private boolean dontWriteBytecode = false;
9091

9192
@Override
9293
protected List<String> preprocessArguments(List<String> givenArgs, Map<String, String> polyglotOptions) {
@@ -102,6 +103,7 @@ protected List<String> preprocessArguments(List<String> givenArgs, Map<String, S
102103
String arg = arguments.get(i);
103104
switch (arg) {
104105
case "-B":
106+
dontWriteBytecode = true;
105107
break;
106108
case "-c":
107109
i += 1;
@@ -374,6 +376,11 @@ protected void launch(Builder contextBuilder) {
374376
noUserSite = noUserSite || System.getenv("PYTHONNOUSERSITE") != null;
375377
verboseFlag = verboseFlag || System.getenv("PYTHONVERBOSE") != null;
376378
unbufferedIO = unbufferedIO || System.getenv("PYTHONUNBUFFERED") != null;
379+
dontWriteBytecode = dontWriteBytecode || System.getenv("PYTHONDONTWRITEBYTECODE") != null;
380+
String cachePrefix = System.getenv("PYTHONPYCACHEPREFIX");
381+
if (cachePrefix != null) {
382+
contextBuilder.option("python.PyCachePrefix", cachePrefix);
383+
}
377384
}
378385

379386
String executable = getContextOptionIfSetViaCommandLine("python.Executable");
@@ -392,6 +399,7 @@ protected void launch(Builder contextBuilder) {
392399
contextBuilder.option("python.InspectFlag", Boolean.toString(inspectFlag));
393400
contextBuilder.option("python.VerboseFlag", Boolean.toString(verboseFlag));
394401
contextBuilder.option("python.IsolateFlag", Boolean.toString(isolateFlag));
402+
contextBuilder.option("python.DontWriteBytecodeFlag", Boolean.toString(dontWriteBytecode));
395403
if (verboseFlag) {
396404
contextBuilder.option("log.python.level", "FINE");
397405
}
@@ -548,8 +556,7 @@ protected String getLanguageId() {
548556
protected void printHelp(OptionCategory maxCategory) {
549557
print("usage: python [option] ... (-c cmd | file) [arg] ...\n" +
550558
"Options and arguments (and corresponding environment variables):\n" +
551-
"-B : on CPython, this disables writing .py[co] files on import;\n" +
552-
" GraalPython does not use bytecode, and thus this flag has no effect\n" +
559+
"-B : this disables writing .py[co] files on import\n" +
553560
"-c cmd : program passed in as string (terminates option list)\n" +
554561
// "-d : debug output from parser; also PYTHONDEBUG=x\n" +
555562
"-E : ignore PYTHON* environment variables (such as PYTHONPATH)\n" +
@@ -601,6 +608,8 @@ protected void printHelp(OptionCategory maxCategory) {
601608
" as specifying the -R option: a random value is used to seed the hashes of\n" +
602609
" str, bytes and datetime objects. It can also be set to an integer\n" +
603610
" in the range [0,4294967295] to get hash values with a predictable seed.\n" +
611+
"PYTHONPYCACHEPREFIX: if this is set, GraalPython will write .pyc files in a mirror\n" +
612+
" directory tree at this path, instead of in __pycache__ directories within the source tree.\n" +
604613
"GRAAL_PYTHON_ARGS: the value is added as arguments as if passed on the\n" +
605614
" commandline. There is one special case: any `$$' in the value is replaced\n" +
606615
" with the current process id. To pass a literal `$$', you must escape the\n" +

graalpython/com.oracle.graal.python.test/src/com/oracle/graal/python/test/parser/FuncDefTests.java

Lines changed: 12 additions & 0 deletions
@@ -191,6 +191,18 @@ public void functionDef21() throws Exception {
191191
"foo(1,2)\n");
192192
}
193193

194+
@Test
195+
public void functionDef22() throws Exception {
196+
checkScopeAndTree(
197+
"def test():\n" +
198+
" a = 1;\n" +
199+
" def fn1(): pass\n" +
200+
" def fn2(): pass\n" +
201+
" return locals()\n" +
202+
"\n" +
203+
"print(test())\n");
204+
}
205+
194206
@Test
195207
public void decorator01() throws Exception {
196208
checkScopeAndTree();

graalpython/com.oracle.graal.python.test/src/com/oracle/graal/python/test/parser/ParserTestBase.java

Lines changed: 14 additions & 3 deletions
@@ -43,6 +43,7 @@
4343
import com.oracle.graal.python.PythonLanguage;
4444
import com.oracle.graal.python.parser.PythonParserImpl;
4545
import com.oracle.graal.python.parser.ScopeInfo;
46+
import com.oracle.graal.python.parser.sst.SSTNode;
4647
import com.oracle.graal.python.runtime.PythonContext;
4748
import com.oracle.graal.python.runtime.PythonParser;
4849
import com.oracle.graal.python.runtime.exception.PException;
@@ -79,6 +80,7 @@ public class ParserTestBase {
7980
@Rule public TestName name = new TestName();
8081

8182
private ScopeInfo lastGlobalScope;
83+
private SSTNode lastSST;
8284

8385
public ParserTestBase() {
8486
PythonTests.enterContext();
@@ -106,13 +108,18 @@ public Node parse(Source source, PythonParser.ParserMode mode) {
106108
PythonParser parser = context.getCore().getParser();
107109
Node result = ((PythonParserImpl) parser).parseN(mode, context.getCore(), source, null);
108110
lastGlobalScope = ((PythonParserImpl) parser).getLastGlobaScope();
111+
lastSST = ((PythonParserImpl) parser).getLastSST();
109112
return result;
110113
}
111114

112115
protected ScopeInfo getLastGlobalScope() {
113116
return lastGlobalScope;
114117
}
115118

119+
protected SSTNode getLastSST() {
120+
return lastSST;
121+
}
122+
116123
public void checkSyntaxError(String source) throws Exception {
117124
boolean thrown = false;
118125
try {
@@ -253,14 +260,14 @@ public void checkScopeResult(String source, PythonParser.ParserMode mode) throws
253260
assertDescriptionMatches(scopes.toString(), goldenScopeFile);
254261
}
255262

256-
private String printTreeToString(Node node) {
263+
protected String printTreeToString(Node node) {
257264
ParserTreePrinter visitor = new ParserTreePrinter();
258265
visitor.printFormatStringLiteralDetail = printFormatStringLiteralValues;
259266
node.accept(visitor);
260267
return visitor.getTree();
261268
}
262269

263-
protected void assertDescriptionMatches(String actual, File goldenFile) throws IOException {
270+
protected void assertDescriptionMatches(String actual, File goldenFile) throws Exception {
264271
if (!goldenFile.exists()) {
265272
if (!goldenFile.createNewFile()) {
266273
assertTrue("Cannot create file " + goldenFile.getAbsolutePath(), false);
@@ -272,6 +279,10 @@ protected void assertDescriptionMatches(String actual, File goldenFile) throws I
272279
}
273280
String expected = readFile(goldenFile);
274281

282+
assertDescriptionMatches(actual, expected, goldenFile.getName());
283+
}
284+
285+
protected void assertDescriptionMatches(String actual, String expected, String someName) throws Exception {
275286
final String expectedTrimmed = expected.trim();
276287
final String actualTrimmed = actual.trim();
277288

@@ -290,7 +301,7 @@ protected void assertDescriptionMatches(String actual, File goldenFile) throws I
290301

291302
// There are some differences between expected and actual content --> Test failed
292303

293-
assertTrue("Not matching goldenfile: " + goldenFile.getName() + lineSeparator(2) + getContentDifferences(expectedUnified, actualUnified), false);
304+
assertTrue("Not matching results: " + (someName == null ? "" : someName) + lineSeparator(2) + getContentDifferences(expectedUnified, actualUnified), false);
294305
}
295306
}
296307

graalpython/com.oracle.graal.python.test/src/com/oracle/graal/python/test/parser/ParserTreePrinter.java

Lines changed: 7 additions & 1 deletion
@@ -1,5 +1,5 @@
11
/*
2-
* Copyright (c) 2019, Oracle and/or its affiliates. All rights reserved.
2+
* Copyright (c) 2019, 2020, Oracle and/or its affiliates. All rights reserved.
33
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
44
*
55
* The Universal Permissive License (UPL), Version 1.0
@@ -443,6 +443,12 @@ private void addSignature(Signature signature) {
443443
add(signature.takesPositionalOnly());
444444
sb.append(", requiresKeywordArgs=");
445445
add(signature.takesRequiredKeywordArgs());
446+
if (signature.getVarargsIdx() > -1) {
447+
sb.append(", varArgsIdx=").append(signature.getVarargsIdx());
448+
}
449+
if (signature.getPositionalOnlyArgIndex() > -1) {
450+
sb.append(", positionalOnlyIdx=").append(signature.getPositionalOnlyArgIndex());
451+
}
446452
newLine();
447453
level++;
448454
if (signature.getParameterIds() != null && signature.getParameterIds().length > 0) {
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,120 @@
1+
/*
2+
* Copyright (c) 2020, Oracle and/or its affiliates. All rights reserved.
3+
* DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
4+
*
5+
* The Universal Permissive License (UPL), Version 1.0
6+
*
7+
* Subject to the condition set forth below, permission is hereby granted to any
8+
* person obtaining a copy of this software, associated documentation and/or
9+
* data (collectively the "Software"), free of charge and under any and all
10+
* copyright rights in the Software, and any and all patent rights owned or
11+
* freely licensable by each licensor hereunder covering either (i) the
12+
* unmodified Software as contributed to or provided by such licensor, or (ii)
13+
* the Larger Works (as defined below), to deal in both
14+
*
15+
* (a) the Software, and
16+
*
17+
* (b) any piece of software and/or hardware listed in the lrgrwrks.txt file if
18+
* one is included with the Software each a "Larger Work" to which the Software
19+
* is contributed by such licensors),
20+
*
21+
* without restriction, including without limitation the rights to copy, create
22+
* derivative works of, display, perform, and distribute the Software and make,
23+
* use, sell, offer for sale, import, export, have made, and have sold the
24+
* Software and the Larger Work(s), and to sublicense the foregoing rights on
25+
* either these or other terms.
26+
*
27+
* This license is subject to the following condition:
28+
*
29+
* The above copyright notice and either this complete permission notice or at a
30+
* minimum a reference to the UPL must be included in all copies or substantial
31+
* portions of the Software.
32+
*
33+
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
34+
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
35+
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
36+
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
37+
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
38+
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
39+
* SOFTWARE.
40+
*/
41+
package com.oracle.graal.python.test.parser;
42+
43+
import com.oracle.graal.python.PythonLanguage;
44+
import com.oracle.graal.python.parser.sst.FunctionDefSSTNode;
45+
import com.oracle.graal.python.parser.sst.LambdaSSTNode;
46+
import com.oracle.graal.python.parser.sst.SSTNode;
47+
import com.oracle.graal.python.parser.sst.SSTNodeWithScopeFinder;
48+
import com.oracle.graal.python.runtime.PythonParser;
49+
import com.oracle.truffle.api.source.Source;
50+
import org.junit.Assert;
51+
import org.junit.Test;
52+
53+
public class SSTNodeWithScopeFinderTest extends ParserTestBase {
54+
55+
@Test
56+
public void calllArgTest() throws Exception {
57+
String code = "class Tests():\n" +
58+
" def test1(self):\n" +
59+
" rdd = self.parallelize(range(100))\n" +
60+
" assert rdd.reduce(lambda a, b: a+b) == 4950";
61+
checkFinder(code, code.indexOf("def test1"), code.indexOf("950") + 3, false);
62+
checkFinder(code, code.indexOf("lambda a"), code.indexOf("a+b") + 3, true);
63+
}
64+
65+
@Test
66+
public void tryTest() throws Exception {
67+
String code = "try:\n" +
68+
" import sys\n" +
69+
" process = None\n" +
70+
"\n" +
71+
" def fn1():\n" +
72+
" return 20\n" +
73+
"\n" +
74+
"except ImportError:\n" +
75+
"\n" +
76+
" def fn2():\n" +
77+
" return 30" +
78+
"\n";
79+
checkFinder(code, code.indexOf("def fn1"), code.indexOf("except"), false);
80+
checkFinder(code, code.indexOf("def fn2"), code.indexOf("30") + 3, false);
81+
}
82+
83+
@Test
84+
public void ifTest() throws Exception {
85+
String code = " if True:\n" +
86+
" a = 1\n" +
87+
" def fn1():\n" +
88+
" return 10\n" +
89+
" elif False:\n" +
90+
" b = 2\n" +
91+
" def fn2():\n" +
92+
" return 20\n" +
93+
" else:\n" +
94+
" def fn3():\n" +
95+
" return 30\n";
96+
checkFinder(code, code.indexOf("def fn1"), code.indexOf("elif"), false);
97+
checkFinder(code, code.indexOf("def fn2"), code.indexOf("else"), false);
98+
checkFinder(code, code.indexOf("def fn3"), code.indexOf("30") + 3, false);
99+
}
100+
101+
private void checkFinder(String code, int startOffset, int endOffset, boolean isLambda) {
102+
SSTNode result = findNodeWithScope(code, startOffset, endOffset);
103+
Assert.assertNotNull("No node with scope was found ", result);
104+
if (isLambda) {
105+
Assert.assertTrue("Was expected LambdaSSTNode, but " + result.getClass().getSimpleName() + " found.", result instanceof LambdaSSTNode);
106+
} else {
107+
Assert.assertTrue("Was expected FunctionDefSSTNode, but " + result.getClass().getSimpleName() + " found.", result instanceof FunctionDefSSTNode);
108+
}
109+
Assert.assertTrue("Start or end offset is not the expected one.", result.getStartOffset() == startOffset && result.getEndOffset() == endOffset);
110+
}
111+
112+
private SSTNode findNodeWithScope(String code, int startOffset, int endOffset) {
113+
Source source = Source.newBuilder(PythonLanguage.ID, code, "NodeWithScopeFinderTest").build();
114+
parse(source, PythonParser.ParserMode.File);
115+
SSTNode lastSST = getLastSST();
116+
SSTNodeWithScopeFinder finder = new SSTNodeWithScopeFinder(startOffset, endOffset);
117+
return lastSST.accept(finder);
118+
}
119+
120+
}
