| 1 | +# Flow summaries |
| 2 | + |
| 3 | +Flow summaries describe how data flows through methods whose definition is not |
| 4 | +included in the database, such as methods in the standard library or a gem. |
| 5 | + |
| 6 | +Say we have the following code: |
| 7 | + |
| 8 | +```rb |
| 9 | +x = gets |
| 10 | +y = x.chomp |
| 11 | +system(y) |
| 12 | +``` |
| 13 | + |
| 14 | +This code reads a line from STDIN, strips the trailing newline, and executes it |
| 15 | +as a shell command. Assuming `x` is considered tainted, we want the argument `y` |
| 16 | +to be tainted in the call to `system`. |
| 17 | + |
| 18 | +`chomp` is a standard library method in the `String` class for which we |
| 19 | +have no source code, so we include a flow summary for it: |
| 20 | + |
| 21 | +```ql |
| 22 | +private class ChompSummary extends SimpleSummarizedCallable { |
| 23 | + ChompSummary() { this = "chomp" } |
| 24 | + |
| 25 | + override predicate propagatesFlowExt(string input, string output, boolean preservesValue) { |
| 26 | + input = "Argument[self]" and |
| 27 | + output = "ReturnValue" and |
| 28 | + preservesValue = false |
| 29 | + } |
| 30 | +} |
| 31 | +``` |
| 32 | + |
| 33 | +The shared dataflow library will use this summary to construct a fake definition |
| 34 | +for `chomp`. The behaviour of this definition depends on the body of |
| 35 | +`propagatesFlowExt`. In this case, the method will propagate taint flow from the |
| 36 | +`self` argument (i.e. the receiver) to the return value. |
| 37 | + |
| 38 | +If `preservesValue = true` then value flow is propagated. If it is `false` then |
| 39 | +only taint flow is propagated. |
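
As an illustration of why the `chomp` summary above uses `preservesValue = false`, the plain Ruby sketch below shows that the return value is derived from the receiver without being the same value. The variable names are illustrative only and are not part of the summary.

```ruby
x = "ls -l\n"   # stands in for a tainted line read from STDIN
y = x.chomp     # a new string derived from x, without the trailing newline

value_preserved = (y == x)          # the contents differ, so the value is not preserved
derived_from_x  = x.start_with?(y)  # y is still a prefix of x, so taint should carry over
```

A method that returned its receiver unchanged would instead be a candidate for `preservesValue = true`.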
| 40 | + |
| 41 | +Any call to `chomp` in the database will be translated, in the dataflow graph, |
| 42 | +to a call to this fake definition. |
| 43 | + |
| 44 | +`input` and `output` define the "from" and "to" locations in the flow summary. |
| 45 | +They use a custom string-based syntax which is similar but not identical to |
| 46 | +the syntax used by Models as Data. These strings are often referred to as access paths. |
| 47 | + |
| 48 | +# Syntax |
| 49 | + |
| 50 | +Access paths consist of zero or more components separated by dots (`.`). The |
| 51 | +permitted components differ for input and output paths. The meaning of each |
| 52 | +component is defined relative to the implicit context of the component, which |
| 53 | +itself is defined by the preceding access path. For example, |
| 54 | + |
| 55 | +``` |
| 56 | +Argument[0].Element[1].ReturnValue |
| 57 | +``` |
| 58 | + |
| 59 | +refers to the return value of the element at index 1 in the array at argument 0 |
| 60 | +of the method call. |
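
To make the example path concrete, here is a minimal Ruby sketch of a call where `Argument[0].Element[1].ReturnValue` has something to refer to. The method `call_second` and the lambdas are hypothetical and exist only for illustration.

```ruby
# Hypothetical method: calls the callable stored at index 1 of its argument.
def call_second(callables)
  callables[1].call
end

first  = -> { "a" }
second = -> { "b" }

# Argument[0]                        -> the array [first, second]
# Argument[0].Element[1]             -> the lambda `second`
# Argument[0].Element[1].ReturnValue -> the value returned by `second`
result = call_second([first, second])
```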
| 61 | + |
| 62 | +## `Argument` and `Parameter` |
| 63 | + |
| 64 | +The `Argument` and `Parameter` components refer respectively to an argument to a |
| 65 | +call or a parameter of a callable. They contain one or more _specifiers_[^1] which |
| 66 | +constrain the range of arguments/parameters that the component refers to. For |
| 67 | +example, `Argument[0]` refers to the first argument. |
| 68 | + |
| 69 | +If multiple specifiers are given then the result is a disjunction, meaning that |
| 70 | +the component refers to any argument/parameter that satisfies at least one of |
| 71 | +the specifiers. For example, `Argument[0, 1]` refers to the first and second |
| 72 | +arguments. |
| 73 | + |
| 74 | +### Specifiers |
| 75 | + |
| 76 | +#### `self` |
| 77 | +The receiver of the call. |
| 78 | + |
| 79 | +#### `<integer>` |
| 80 | +The argument to the method call at the position given by the integer. For |
| 81 | +example, `Argument[0]` refers to the first argument to the call. |
| 82 | + |
| 83 | +#### `<integer>..` |
| 84 | +An argument to the call at a position greater than or equal to the integer. For |
| 85 | +example, `Argument[1..]` refers to all arguments except the first one. This |
| 86 | +specifier is not available on `Parameter` components. |
| 87 | + |
| 88 | +#### `<string>:` |
| 89 | +A keyword argument to the call with the given name. For example, |
| 90 | +`Argument[foo:]` refers to the keyword argument `foo:` in the call. |
| 91 | + |
| 92 | +#### `block` |
| 93 | +The block argument passed to the call, if any. |
| 94 | + |
| 95 | +#### `any` |
| 96 | +Any argument to the call. TODO: does this include self and block args? |
| 97 | + |
| 98 | +#### `any-named` |
| 99 | +TODO |
| 100 | + |
| 101 | +#### `hash-splat` |
| 102 | +The special "hash splat" argument/parameter, which is written as `**args`. |
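
The following Ruby sketch labels each part of a single call with the specifier that would, on the reading given above, select it. The class `Recorder` and its method `record` are hypothetical, and the specifiers with open TODOs (`any`, `any-named`) are deliberately left out.

```ruby
# Hypothetical class and method, used only to label the parts of a call.
class Recorder
  def record(first, *rest, foo: nil, **opts, &blk)
    { first: first, rest: rest, foo: foo, opts: opts, block_result: blk&.call }
  end
end

recorder = Recorder.new          # Argument[self]: the receiver
extra = { bar: 4 }
seen = recorder.record(
  1,                             # Argument[0]
  2, 3,                          # Argument[1..]
  foo: :x,                       # Argument[foo:]
  **extra                        # Argument[hash-splat]
) { :from_block }                # Argument[block]
```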
| 103 | + |
| 104 | +## `ReturnValue` |
| 105 | +`ReturnValue` refers to the return value of the element identified in the |
| 106 | +preceding access path. For example, `Argument[0].ReturnValue` refers to the |
| 107 | +return value of the first argument. Of course this only makes sense if the first |
| 108 | +argument is a callable. |
| 109 | + |
| 110 | +## `Element` |
| 111 | +This component refers to elements inside a collection of some sort. Typically |
| 112 | +this is an Array or Hash. Elements are considered to have an index, which is an |
| 113 | +integer in arrays and a symbol or string in hashes (even though hashes can have |
| 114 | +arbitrary objects as keys). Elements can also have an unknown index, which means |
| 115 | +we know the element exists in the collection but we don't know where. |
| 116 | + |
| 117 | +Many of the specifiers have an optional suffix `!`. If this suffix is used then |
| 118 | +the specifier excludes elements at unknown indices. Otherwise, these are |
| 119 | +included by default. |
| 120 | + |
| 121 | +### Specifiers |
| 122 | + |
| 123 | +#### `?` |
| 124 | +An element at an unknown index. |
| 125 | + |
| 126 | +#### `any` |
| 127 | +An element at any known or unknown index. |
| 128 | + |
| 129 | +#### `<integer>`, `<integer>!` |
| 130 | +An element at the index given by the integer. |
| 131 | + |
| 132 | +#### `<integer>..`, `<integer>..!` |
| 133 | +Any element at a known index greater than or equal to the integer. |
| 134 | + |
| 135 | +#### `<string>`, `<string>!` |
| 136 | +An element at the index given by the string. The string should match the result of |
| 137 | +`serialize()` on the `ConstantValue` that represents the index. This is |
| 138 | +typically something like `foo` for the string key `"foo"` and `:foo` for the |
| 139 | +symbol `:foo`. |
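
As a rough guide to how these specifiers line up with ordinary Ruby values, the sketch below pairs a few lookups with the specifier that would describe the same element. The variable names are illustrative, and the remark about `Element[?]` is an assumption about a typical use rather than a statement about any particular summary.

```ruby
arr  = ["a", "b", "c"]
hash = { "foo" => 1, :foo => 2 }

at_index_1   = arr[1]        # the element Element[1] would describe for arr
string_keyed = hash["foo"]   # the element Element[foo] would describe for hash
symbol_keyed = hash[:foo]    # the element Element[:foo] would describe for hash

# After shuffling, the position of each element cannot be known statically,
# which is the kind of situation Element[?] is meant for (an assumption here).
shuffled = arr.shuffle
```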
| 140 | + |
| 141 | +## `Field` |
| 142 | +TODO |
| 143 | + |
| 144 | +## `WithElement` |
| 145 | +This component restricts the set of elements that are included in the preceding |
| 146 | +access path to those at a specific set of indices. The specifiers are the |
| 147 | +same as those for `Element`. |
| 148 | + |
| 149 | +When used in an input path this component has the effect of copying |
| 150 | +all relevant elements from the input to the output. For example, in the |
| 151 | +following summary: |
| 152 | + |
| 153 | +```ql |
| 154 | +input = "Argument[0].WithElement[1, 2]" and |
| 155 | +output = "ReturnValue" |
| 156 | +``` |
| 157 | + |
| 158 | +any data in indices 1 and 2 of the first argument will be copied to indices 1 |
| 159 | +and 2 of the return value. We use this in many Hash summaries that return the |
| 160 | +receiver, in order to preserve any data stored in it. For example, the summary |
| 161 | +for `Hash#to_h` is |
| 162 | + |
| 163 | +```ql |
| 164 | +input = "Argument[self].WithElement[any]" and |
| 165 | +output = "ReturnValue" and |
| 166 | +preservesValue = true |
| 167 | +``` |
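
The runtime behaviour that this `Hash#to_h` summary models can be seen in a short Ruby sketch. The hash contents are made up for illustration.

```ruby
original = { a: "tainted", b: "clean" }
result = original.to_h

# Every key in the receiver still maps to the same value in the result,
# which is what WithElement[any] combined with preservesValue = true describes.
same_a = result[:a].equal?(original[:a])
```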
| 168 | + |
| 169 | +TODO: I've not seen this component used in an output path; I don't know if it makes |
| 170 | +sense to do so, or what meaning it would have. |
| 171 | + |
| 172 | +## `WithoutElement` |
| 173 | +This component is used to exclude certain elements from the set included in the |
| 174 | +preceding access path. It takes the same specifiers as `WithElement` and |
| 175 | +`Element`. |
| 176 | + |
| 177 | +When used in an input path this component has the effect of excluding the |
| 178 | +relevant elements when copying from input to output. For example, in the |
| 179 | +following summary: |
| 180 | + |
| 181 | +```ql |
| 182 | +input = "Argument[0].WithoutElement[0]" and |
| 183 | +output = "ReturnValue" |
| 184 | +``` |
| 185 | + |
| 186 | +any data in any index of the first argument will be copied to the return value, |
| 187 | +with the exception of data at index 0. |
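
A Ruby method whose behaviour would fit a summary of this shape might look like the sketch below. The method `without_first` is hypothetical. Note that it keeps the remaining elements at their original indices, unlike something like `drop(1)`, which would shift them.

```ruby
# Hypothetical method, for illustration only: returns a copy of the array
# with the element at index 0 blanked out and all other indices unchanged.
def without_first(items)
  copy = items.dup
  copy[0] = nil
  copy
end

input  = ["secret", "b", "c"]
output = without_first(input)
```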
| 188 | + |
| 189 | +TODO: I've not seen this component used in an output path; I don't know if it makes |
| 190 | +sense to do so, or what meaning it would have. |
| 191 | + |
| 192 | +[^1]: I've chosen this name to avoid overloading the word "argument". |