Commit 6f852aa

Ruby: Document flow summary syntax
1 parent 0a4a851 commit 6f852aa

1 file changed

+192
-0
lines changed

ruby/ql/docs/flow_summaries.md

Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
1+
# Flow summaries
2+
3+
Flow summaries describe how data flows through methods whose definition is not
4+
included in the database, such as methods in the standard library or a gem.
5+
6+
Say we have the following code:
7+
8+
```rb
9+
x = gets
10+
y = x.chomp
11+
system(y)
12+
```
13+
14+
This code reads a line from STDIN, strips the trailing newline, and executes it
15+
as a shell command. Assuming `x` is considered tainted, we want the argument `y`
16+
to be tainted in the call to `system`.
17+
18+
`chomp` is a standard library method in the `String` class for which we
19+
have no source code, so we include a flow summary for it:
20+
21+
```ql
22+
private class ChompSummary extends SimpleSummarizedCallable {
23+
ChompSummary() { this = "chomp" }
24+
25+
override predicate propagatesFlowExt(string input, string output, boolean preservesValue) {
26+
input = "Argument[self]" and
27+
output = "ReturnValue" and
28+
preservesValue = false
29+
}
30+
}
31+
```
32+
33+
The shared dataflow library will use this summary to construct a fake definition
34+
for `chomp`. The behaviour of this definition depends on the body of
35+
`propagatesFlowExt`. In this case, the method will propagate taint flow from the
36+
`self` argument (i.e. the receiver) to the return value.
37+
38+
If `preservesValue = true` then value flow is propagated. If it is `false` then
39+
only taint flow is propagated.
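
As a rough illustration in plain Ruby: `chomp` returns a new string derived from the receiver, so only taint flow applies. A method such as `itself` returns the receiver unchanged, which is the kind of method one would expect to model with `preservesValue = true`. The use of `itself` here is an assumed example, not a summary taken from the library.

```ruby
x = "ls -l\n"

y = x.chomp    # a new string derived from x (taint flow)
z = x.itself   # the very same object as x (value flow)

puts y == x        # false: the value changed, but it still depends on x
puts z.equal?(x)   # true: the value is preserved exactly
```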
40+
41+
Any call to `chomp` in the database will be translated, in the dataflow graph,
42+
to a call to this fake definition.
43+
44+
`input` and `output` define the "from" and "to" locations in the flow summary.
45+
They use a custom string-based syntax which is similar but not identical to
46+
Models as Data. These strings are often referred to as access paths.
47+
48+
# Syntax
49+
50+
Access paths consist of zero or more components separated by dots (`.`). The
51+
permitted components differ for input and output paths. The meaning of each
52+
component is defined relative to the implicit context of the component, which
53+
itself is defined by the preceding access path. For example,
54+
55+
```
56+
Argument[0].Element[1].ReturnValue
57+
```
58+
59+
refers to the return value of the element at index 1 in the array at argument 0
60+
of the method call.
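
To make the structure concrete, here is a minimal Ruby sketch that breaks an access path into its components and their bracketed contents. The `Component` struct and `parse_access_path` helper are hypothetical names used only for illustration; this is not how the CodeQL library parses paths.

```ruby
# Illustrative only: a tiny parser for the access path syntax described above.
Component = Struct.new(:name, :specifiers)

def parse_access_path(path)
  # Match a component name with an optional bracketed list.
  # Scanning with a regex (instead of splitting on ".") keeps bracket
  # contents such as `1..` intact.
  path.scan(/([A-Za-z]+)(?:\[([^\]]*)\])?/).map do |name, specs|
    Component.new(name, specs ? specs.split(",").map(&:strip) : [])
  end
end

parse_access_path("Argument[0].Element[1].ReturnValue").each do |c|
  puts "#{c.name} #{c.specifiers.inspect}"
end
# Prints:
#   Argument ["0"]
#   Element ["1"]
#   ReturnValue []
```

Each component is interpreted in the context established by the components before it, so the list is read left to right.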
61+
62+
## `Argument` and `Parameter`
63+
64+
The `Argument` and `Parameter` components refer respectively to an argument to a
65+
call or a parameter of a callable. They contain one or more _specifiers_[^1] which
66+
constrain the range of arguments/parameters that the component refers to. For
67+
example, `Argument[0]` refers to the first argument.
68+
69+
If multiple specifiers are given then the result is a disjunction, meaning that
70+
the component refers to any argument/parameter that satisfies at least one of
71+
the specifiers. For example, `Argument[0, 1]` refers to the first and second
72+
arguments.
73+
74+
### Specifiers
75+
76+
#### `self`
77+
The receiver of the call.
78+
79+
#### `<integer>`
80+
The argument to the method call at the position given by the integer. For
81+
example, `Argument[0]` refers to the first argument to the call.
82+
83+
#### `<integer>..`
84+
An argument to the call at a position greater than or equal to the integer. For
85+
example, `Argument[1..]` refers to all arguments except the first one. This
86+
specifier is not available on `Parameter` components.
87+
88+
#### `<string>:`
89+
A keyword argument to the call with the given name. For example,
90+
`Argument[foo:]` refers to the keyword argument `foo:` in the call.
91+
92+
#### `block`
93+
The block argument passed to the call, if any.
94+
95+
#### `any`
96+
Any argument to the call. TODO: does this include self and block args?
97+
98+
#### `any-named`
99+
TODO
100+
101+
#### `hash-splat`
102+
The special "hash splat" argument/parameter, which is written as `**args`.
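
The following Ruby call is a sketch of how the specifiers above line up with the parts of a call site. The `Greeter` class and its `greet` method are hypothetical and exist only to give every kind of argument a place to go.

```ruby
# Hypothetical receiver class; the method simply echoes what it was given.
class Greeter
  def greet(first, second, *rest, foo: nil, **others, &blk)
    { first: first, second: second, rest: rest, foo: foo, others: others,
      block_result: blk ? blk.call : nil }
  end
end

recv = Greeter.new
extra = { bar: 5 }

result = recv.greet("a", "b", "c", foo: 4, **extra) { :from_block }
# recv             -> Argument[self]
# "a"              -> Argument[0]
# "b"              -> Argument[1]
# "b", "c"         -> Argument[1..]
# foo: 4           -> Argument[foo:]
# **extra          -> Argument[hash-splat]
# { :from_block }  -> Argument[block]
```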
103+
104+
## `ReturnValue`
105+
`ReturnValue` refers to the return value of the element identified in the
106+
preceding access path. For example, `Argument[0].ReturnValue` refers to the
107+
return value of the first argument. Of course this only makes sense if the first
108+
argument is a callable.
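
A small Ruby sketch of the situation that `Argument[0].ReturnValue` describes. The `call_it` method is hypothetical; a summary for a method like it might plausibly use `Argument[0].ReturnValue` as its input and `ReturnValue` as its output, though that pairing is an assumption made for illustration.

```ruby
# Hypothetical method whose only job is to call its first argument.
def call_it(fn)
  fn.call
end

source = -> { "tainted data" }   # the callable passed as Argument[0]
result = call_it(source)         # whatever `source` returns is Argument[0].ReturnValue

puts result   # prints: tainted data
```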
109+
110+
## `Element`
111+
This component refers to elements inside a collection of some sort. Typically
112+
this is an Array or Hash. Elements are considered to have an index, which is an
113+
integer in arrays and a symbol or string in hashes (even though hashes can have
114+
arbitrary objects as keys). Elements can also have an unknown index, which means
115+
we know the element exists in the collection but we don't know where.
116+
117+
Many of the specifiers have an optional suffix `!`. If this suffix is used then
118+
the specifier excludes elements at unknown indices. Otherwise, these are
119+
included by default.
120+
121+
### Specifiers
122+
123+
#### `?`
124+
An element at an unknown index.
125+
126+
#### `any`
127+
An element at any known or unknown index.
128+
129+
#### `<integer>`, `<integer>!`
130+
An element at the index given by the integer.
131+
132+
#### `<integer>..`, `<integer>..!`
133+
Any element at a known index greater than or equal to the integer.
134+
135+
#### `<string>`, `<string>!`
136+
An element at the index given by the string. The string should match the result of
137+
`serialize()` on the `ConstantValue` that represents the index. This is
138+
typically something like `foo` for the string key `"foo"` and `:foo` for the
139+
symbol `:foo`.
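
For orientation, the Ruby snippet below pairs a few ordinary element reads with the `Element` specifier that would describe the same location, following the descriptions above. The variable names are illustrative only.

```ruby
arr  = ["zero", "one", "two"]
hash = { "foo" => "string key", :foo => "symbol key" }

second = arr[1]        # Element[1]:    the element at integer index 1
tail   = arr[1..]      # Element[1..]:  elements at index 1 or greater
by_str = hash["foo"]   # Element[foo]:  the string key "foo"
by_sym = hash[:foo]    # Element[:foo]: the symbol key :foo
```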
140+
141+
## `Field`
142+
TODO
143+
144+
## `WithElement`
145+
This component restricts the set of elements that are included in the preceding
146+
access path to those at a specific set of indices. The specifiers are the
147+
same as those for `Element`.
148+
149+
When used in an input path this component has the effect of copying
150+
all relevant elements from the input to the output. For example, in the
151+
following summary:
152+
153+
```ql
154+
input = "Argument[0].WithElement[1, 2]" and
155+
output = "ReturnValue"
156+
```
157+
158+
any data in indices 1 and 2 of the first argument will be copied to indices 1
159+
and 2 of the return value. We use this in many Hash summaries that return the
160+
receiver, in order to preserve any data stored in it. For example, the summary
161+
for `Hash#to_h` is
162+
163+
```ql
164+
input = "Argument[self].WithElement[any]" and
165+
output = "ReturnValue" and
166+
preservesValue = true
167+
```
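
In Ruby terms, the behaviour this summary captures looks roughly like the sketch below: whatever is stored under a key in the receiver is found under the same key in the result. The hash contents are made up for illustration.

```ruby
h = { a: "tainted", b: "clean" }

copy = h.to_h   # every element of the receiver is preserved in the result

puts copy[:a]             # prints: tainted
puts copy.keys == h.keys  # prints: true
```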
168+
169+
TODO: I've not seen this component used in an output path; I don't know if it makes
170+
sense to do so, or what meaning it would have.
171+
172+
## `WithoutElement`
173+
This component is used to exclude certain elements from the set included in the
174+
preceding access path. It takes the same specifiers as `WithElement` and
175+
`Element`.
176+
177+
When used in an input path this component has the effect of excluding the
178+
relevant elements when copying from input to output. For example, in the
179+
following summary:
180+
181+
```ql
182+
input = "Argument[0].WithoutElement[0]" and
183+
output = "ReturnValue"
184+
```
185+
186+
any data in any index of the first argument will be copied to the return value,
187+
with the exception of data at index 0.
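
One way to picture the two filtering components is as a select and a reject over a collection's contents. The Ruby sketch below models the contents as an index-to-data map; the `with_element` and `without_element` helpers are hypothetical and are not part of the dataflow library.

```ruby
# Illustrative model only: a collection's contents as an index => data map.
def with_element(contents, indices)
  # Keep only the data stored at the listed indices.
  contents.select { |index, _| indices.include?(index) }
end

def without_element(contents, indices)
  # Keep everything except the data stored at the listed indices.
  contents.reject { |index, _| indices.include?(index) }
end

arg0 = { 0 => "a", 1 => "b", 2 => "c" }

puts with_element(arg0, [1, 2]) == { 1 => "b", 2 => "c" }   # true
puts without_element(arg0, [0]) == { 1 => "b", 2 => "c" }   # true
```

In both cases the surviving data stays at its original index, which matches the "copied to the same indices" behaviour described above.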
188+
189+
TODO: I've not seen this component used in an output path; I don't know if it makes
190+
sense to do so, or what meaning it would have.
191+
192+
[^1]: I've chosen this name to avoid overloading the word "argument".
