Skip to content

Commit a0d1cea

Browse files
committed
Python: Add investigation of field-flow problem
TL;DR; we used a too low value for `--max-import-depth` :(
1 parent f988e1f commit a0d1cea

File tree

61 files changed

+631
-0
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

61 files changed

+631
-0
lines changed
Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
While working on the field-flow tests, I encountered some very strange behavior. By moving some tests into a new file, they suddenly started working :O
2+
3+
This folder contains the artifacts from investigating this problem, so we can recall the facts (but besides that, don't have much value in itself).
4+
5+
The test files can be found in `src/`, and I have set of a bunch of different tests with different extractor options in the `test-*` folders.
6+
7+
The core of the problem is that in _some_ configuration of extractor options, after seeing the code below, points-to gives up trying to resolve calls :flushed:
8+
9+
```py
10+
import os
11+
cond = os.urandom(1)[0] > 128
12+
13+
if cond:
14+
pass
15+
16+
if cond:
17+
pass
18+
```
19+
20+
This seems to have been caused by not allowing enough imports to be resolved. There is also some interaction with splitting, since turning that off also removes the problem.
21+
22+
But allowing our test to see more imports is more representative of what happens when analyzing real code, so that's the better approach :+1: (and going above 3 does not seem to change anything in this case).
23+
24+
I've thought about whether we can write a query to reliably cases such as this, but I don't see any solutions. However, we can easily try running all our tests with `--max-import-depth=100` and see if anything changes from this.
25+
26+
# Seeing the solutions work
27+
28+
Doing `diff -u -r test-1-normal/ test-5-max-import-depth-3/` shows that all the calls we should be able to resolve, are now resolved properly. and critically this line is added:
29+
30+
```diff
31+
+| ../src/urandom_problem.py:43:6:43:8 | ControlFlowNode for foo | Fixed missing result:flow="SOURCE, l:-15 -> foo" |
32+
```
33+
34+
<details>
35+
<summary>full diff</summary>
36+
37+
```diff
38+
diff '--color=auto' -u -r test-1-normal/NormalDataflowTest.expected test-5-max-import-depth-3/NormalDataflowTest.expected
39+
--- test-1-normal/NormalDataflowTest.expected 2022-02-27 10:33:00.603882599 +0100
40+
+++ test-5-max-import-depth-3/NormalDataflowTest.expected 2022-02-28 10:10:08.930743800 +0100
41+
@@ -1,2 +1,3 @@
42+
missingAnnotationOnSINK
43+
failures
44+
+| ../src/urandom_problem.py:43:6:43:8 | ControlFlowNode for foo | Fixed missing result:flow="SOURCE, l:-15 -> foo" |
45+
diff '--color=auto' -u -r test-1-normal/options test-5-max-import-depth-3/options
46+
--- test-1-normal/options 2022-02-27 10:36:51.124793909 +0100
47+
+++ test-5-max-import-depth-3/options 2022-02-27 11:01:43.908098372 +0100
48+
@@ -1 +1 @@
49+
-semmle-extractor-options: --max-import-depth=1 -R ../src
50+
+semmle-extractor-options: --max-import-depth=3 -R ../src
51+
diff '--color=auto' -u -r test-1-normal/UnresolvedCalls.expected test-5-max-import-depth-3/UnresolvedCalls.expected
52+
--- test-1-normal/UnresolvedCalls.expected 2022-02-28 10:09:19.213742437 +0100
53+
+++ test-5-max-import-depth-3/UnresolvedCalls.expected 2022-02-28 10:10:08.638737921 +0100
54+
@@ -0,0 +1,5 @@
55+
+| ../src/isfile_no_problem.py:34:33:34:70 | Comment # $ unresolved_call=os.path.isfile(..) | Missing result:unresolved_call=os.path.isfile(..) |
56+
+| ../src/urandom_no_if_no_problem.py:34:31:34:64 | Comment # $ unresolved_call=os.urandom(..) | Missing result:unresolved_call=os.urandom(..) |
57+
+| ../src/urandom_problem.py:34:31:34:64 | Comment # $ unresolved_call=os.urandom(..) | Missing result:unresolved_call=os.urandom(..) |
58+
+| ../src/urandom_problem.py:42:18:42:47 | Comment # $ unresolved_call=give_src() | Missing result:unresolved_call=give_src() |
59+
+| ../src/urandom_problem.py:43:11:43:75 | Comment # $ unresolved_call=SINK(..) MISSING: flow="SOURCE, l:-15 -> foo" | Missing result:unresolved_call=SINK(..) |
60+
diff '--color=auto' -u -r test-1-normal/UnresolvedPointsToCalls.expected test-5-max-import-depth-3/UnresolvedPointsToCalls.expected
61+
--- test-1-normal/UnresolvedPointsToCalls.expected 2022-02-28 10:09:19.033738812 +0100
62+
+++ test-5-max-import-depth-3/UnresolvedPointsToCalls.expected 2022-02-28 10:12:48.572752108 +0100
63+
@@ -1,5 +1 @@
64+
-| ../src/urandom_no_if_no_problem.py:34:8:34:20 | ../src/urandom_no_if_no_problem.py:34 | os.urandom(..) |
65+
| ../src/urandom_no_import_no_problem.py:34:8:34:20 | ../src/urandom_no_import_no_problem.py:34 | os.urandom(..) |
66+
-| ../src/urandom_problem.py:34:8:34:20 | ../src/urandom_problem.py:34 | os.urandom(..) |
67+
-| ../src/urandom_problem.py:42:7:42:16 | ../src/urandom_problem.py:42 | give_src() |
68+
-| ../src/urandom_problem.py:43:1:43:9 | ../src/urandom_problem.py:43 | SINK(..) |
69+
```
70+
71+
</details>
72+
73+
There are no benefit in increasing import depth above 3 for this test-example:
74+
75+
```diff
76+
$ diff -u -r test-4-max-import-depth-100/ test-5-max-import-depth-3/
77+
--- test-4-max-import-depth-100/options 2022-02-28 10:02:09.269071781 +0100
78+
+++ test-5-max-import-depth-3/options 2022-02-27 11:01:43.908098372 +0100
79+
@@ -1 +1 @@
80+
-semmle-extractor-options: --max-import-depth=100 -R ../src
81+
+semmle-extractor-options: --max-import-depth=3 -R ../src
82+
```
83+
84+
Also notice that using import depth 2 actually makes things worse, as we no longer handle the `isfile_no_problem.py` file properly :facepalm: :sweat_smile:
85+
86+
```diff
87+
diff '--color=auto' -u -r test-4-max-import-depth-100/NormalDataflowTest.expected test-6-max-import-depth-2/NormalDataflowTest.expected
88+
--- test-4-max-import-depth-100/NormalDataflowTest.expected 2022-02-28 10:10:02.206608379 +0100
89+
+++ test-6-max-import-depth-2/NormalDataflowTest.expected 2022-02-28 10:10:13.882716665 +0100
90+
@@ -1,3 +1,5 @@
91+
missingAnnotationOnSINK
92+
+| ../src/isfile_no_problem.py:43:6:43:8 | ../src/isfile_no_problem.py:43 | ERROR, you should add `# $ MISSING: flow` annotation | foo |
93+
failures
94+
+| ../src/isfile_no_problem.py:43:11:43:41 | Comment # $ flow="SOURCE, l:-15 -> foo" | Missing result:flow="SOURCE, l:-15 -> foo" |
95+
| ../src/urandom_problem.py:43:6:43:8 | ControlFlowNode for foo | Fixed missing result:flow="SOURCE, l:-15 -> foo" |
96+
```
Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
# These are defined so that we can evaluate the test code.
2+
NONSOURCE = "not a source"
3+
SOURCE = "source"
4+
5+
6+
def is_source(x):
7+
return x == "source" or x == b"source" or x == 42 or x == 42.0 or x == 42j
8+
9+
10+
def SINK(x):
11+
if is_source(x):
12+
print("OK")
13+
else:
14+
print("Unexpected flow", x)
15+
16+
17+
def SINK_F(x):
18+
if is_source(x):
19+
print("Unexpected flow", x)
20+
else:
21+
print("OK")
22+
23+
# ------------------------------------------------------------------------------
24+
# Actual tests
25+
# ------------------------------------------------------------------------------
26+
27+
def give_src():
28+
return SOURCE
29+
30+
foo = give_src()
31+
SINK(foo) # $ flow="SOURCE, l:-3 -> foo"
32+
33+
import os
34+
cond = eval("False")
35+
36+
if cond:
37+
pass
38+
39+
if cond:
40+
pass
41+
42+
foo = give_src()
43+
SINK(foo) # $ flow="SOURCE, l:-15 -> foo"
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
# These are defined so that we can evaluate the test code.
2+
NONSOURCE = "not a source"
3+
SOURCE = "source"
4+
5+
6+
def is_source(x):
7+
return x == "source" or x == b"source" or x == 42 or x == 42.0 or x == 42j
8+
9+
10+
def SINK(x):
11+
if is_source(x):
12+
print("OK")
13+
else:
14+
print("Unexpected flow", x)
15+
16+
17+
def SINK_F(x):
18+
if is_source(x):
19+
print("Unexpected flow", x)
20+
else:
21+
print("OK")
22+
23+
# ------------------------------------------------------------------------------
24+
# Actual tests
25+
# ------------------------------------------------------------------------------
26+
27+
def give_src():
28+
return SOURCE
29+
30+
foo = give_src()
31+
SINK(foo) # $ flow="SOURCE, l:-3 -> foo"
32+
33+
import os
34+
cond = os.path.isfile(__file__) # $ unresolved_call=os.path.isfile(..)
35+
36+
if cond:
37+
pass
38+
39+
if cond:
40+
pass
41+
42+
foo = give_src()
43+
SINK(foo) # $ flow="SOURCE, l:-15 -> foo"
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
# These are defined so that we can evaluate the test code.
2+
NONSOURCE = "not a source"
3+
SOURCE = "source"
4+
5+
6+
def is_source(x):
7+
return x == "source" or x == b"source" or x == 42 or x == 42.0 or x == 42j
8+
9+
10+
def SINK(x):
11+
if is_source(x):
12+
print("OK")
13+
else:
14+
print("Unexpected flow", x)
15+
16+
17+
def SINK_F(x):
18+
if is_source(x):
19+
print("Unexpected flow", x)
20+
else:
21+
print("OK")
22+
23+
# ------------------------------------------------------------------------------
24+
# Actual tests
25+
# ------------------------------------------------------------------------------
26+
27+
def give_src():
28+
return SOURCE
29+
30+
foo = give_src()
31+
SINK(foo) # $ flow="SOURCE, l:-3 -> foo"
32+
33+
import os
34+
cond = 1 + 1 == 2
35+
36+
if cond:
37+
pass
38+
39+
if cond:
40+
pass
41+
42+
foo = give_src()
43+
SINK(foo) # $ flow="SOURCE, l:-15 -> foo"
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
# These are defined so that we can evaluate the test code.
2+
NONSOURCE = "not a source"
3+
SOURCE = "source"
4+
5+
6+
def is_source(x):
7+
return x == "source" or x == b"source" or x == 42 or x == 42.0 or x == 42j
8+
9+
10+
def SINK(x):
11+
if is_source(x):
12+
print("OK")
13+
else:
14+
print("Unexpected flow", x)
15+
16+
17+
def SINK_F(x):
18+
if is_source(x):
19+
print("Unexpected flow", x)
20+
else:
21+
print("OK")
22+
23+
# ------------------------------------------------------------------------------
24+
# Actual tests
25+
# ------------------------------------------------------------------------------
26+
27+
def give_src():
28+
return SOURCE
29+
30+
foo = give_src()
31+
SINK(foo) # $ flow="SOURCE, l:-3 -> foo"
32+
33+
import os
34+
cond = os.urandom(1)[0] > 128 # $ unresolved_call=os.urandom(..)
35+
36+
# if cond:
37+
# pass
38+
#
39+
# if cond:
40+
# pass
41+
42+
foo = give_src()
43+
SINK(foo) # $ flow="SOURCE, l:-15 -> foo"
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
# These are defined so that we can evaluate the test code.
2+
NONSOURCE = "not a source"
3+
SOURCE = "source"
4+
5+
6+
def is_source(x):
7+
return x == "source" or x == b"source" or x == 42 or x == 42.0 or x == 42j
8+
9+
10+
def SINK(x):
11+
if is_source(x):
12+
print("OK")
13+
else:
14+
print("Unexpected flow", x)
15+
16+
17+
def SINK_F(x):
18+
if is_source(x):
19+
print("Unexpected flow", x)
20+
else:
21+
print("OK")
22+
23+
# ------------------------------------------------------------------------------
24+
# Actual tests
25+
# ------------------------------------------------------------------------------
26+
27+
def give_src():
28+
return SOURCE
29+
30+
foo = give_src()
31+
SINK(foo) # $ flow="SOURCE, l:-3 -> foo"
32+
33+
# import os
34+
cond = os.urandom(1)[0] > 128 # $ unresolved_call=os.urandom(..)
35+
36+
# if cond:
37+
# pass
38+
#
39+
# if cond:
40+
# pass
41+
42+
foo = give_src()
43+
SINK(foo) # $ flow="SOURCE, l:-15 -> foo"
Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
# These are defined so that we can evaluate the test code.
2+
NONSOURCE = "not a source"
3+
SOURCE = "source"
4+
5+
6+
def is_source(x):
7+
return x == "source" or x == b"source" or x == 42 or x == 42.0 or x == 42j
8+
9+
10+
def SINK(x):
11+
if is_source(x):
12+
print("OK")
13+
else:
14+
print("Unexpected flow", x)
15+
16+
17+
def SINK_F(x):
18+
if is_source(x):
19+
print("Unexpected flow", x)
20+
else:
21+
print("OK")
22+
23+
# ------------------------------------------------------------------------------
24+
# Actual tests
25+
# ------------------------------------------------------------------------------
26+
27+
def give_src():
28+
return SOURCE
29+
30+
foo = give_src()
31+
SINK(foo) # $ flow="SOURCE, l:-3 -> foo"
32+
33+
import os
34+
cond = os.urandom(1)[0] > 128 # $ unresolved_call=os.urandom(..)
35+
36+
if cond:
37+
pass
38+
39+
if cond:
40+
pass
41+
42+
foo = give_src() # $ unresolved_call=give_src()
43+
SINK(foo) # $ unresolved_call=SINK(..) MISSING: flow="SOURCE, l:-15 -> foo"
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
missingAnnotationOnSINK
2+
failures
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,2 @@
1+
import python
2+
import experimental.dataflow.TestUtil.NormalDataflowTest
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
| ../src/eval_no_problem.py | has splitting |
2+
| ../src/isfile_no_problem.py | has splitting |
3+
| ../src/simple_no_problem.py | has splitting |
4+
| ../src/urandom_no_if_no_problem.py | does not have splitting |
5+
| ../src/urandom_no_import_no_problem.py | does not have splitting |
6+
| ../src/urandom_problem.py | has splitting |

0 commit comments

Comments
 (0)