# Flow summaries

Flow summaries describe how data flows through methods whose definition is not
included in the database, such as methods in the standard library or in a gem.

Say we have the following code:

```rb
x = gets
y = x.chomp
system(y)
```

This code reads a line from STDIN, strips the trailing newline (if present), and
executes the result as a shell command. Assuming `x` is considered tainted, we
want the argument `y` to be tainted in the call to `system`.

`chomp` is a built-in method of the `String` class for which we have no source
code, so we include a flow summary for it:

```ql
private class ChompSummary extends SimpleSummarizedCallable {
  ChompSummary() { this = "chomp" }

  override predicate propagatesFlowExt(string input, string output, boolean preservesValue) {
    input = "Argument[self]" and
    output = "ReturnValue" and
    preservesValue = false
  }
}
```

The shared dataflow library will use this summary to construct a fake definition
for `chomp`. The behaviour of this definition depends on the body of
`propagatesFlowExt`. In this case, the method will propagate taint flow from the
`self` argument (i.e. the receiver) to the return value.

If `preservesValue = true` then value flow is propagated. If it is `false` then
only taint flow is propagated.

Any call to `chomp` in the database will be translated, in the dataflow graph,
to a call to this fake definition.
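Plain Ruby, with no CodeQL involved, shows the difference between the two kinds of flow. `String#chomp` returns a new string whose content is derived from the receiver, which is why the summary above uses taint flow. `Object#itself` is included only for contrast: it returns its receiver unchanged, which is the kind of behaviour a value-preserving summary would describe. This is an illustrative sketch, not part of the dataflow library.

```ruby
# Plain Ruby illustration; no CodeQL APIs are involved.
x = "ls -l\n"

# String#chomp builds a new String from the receiver: the content is
# derived from `x`, but the result is a different object (taint flow).
y = x.chomp
puts y.equal?(x) # prints false
puts y.inspect   # prints "ls -l"

# Object#itself returns the very same object, which is what a
# value-preserving summary (preservesValue = true) would describe.
z = x.itself
puts z.equal?(x) # prints true
```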

`input` and `output` define the "from" and "to" locations in the flow summary.
They use a custom string-based syntax which is similar to that used in the `path`
column of the Models as Data format. These strings are often referred to as
access paths.

Note: The behaviour documented below is tested in
`dataflow/flow-summaries/behaviour.ql`. Where specific quirks exist, we may
reference a particular test case in this file which demonstrates the quirk.

# Syntax

Access paths consist of zero or more components separated by dots (`.`). The
permitted components differ for input and output paths. Each component is
interpreted relative to the context established by the access path that
precedes it. For example,

```
Argument[0].Element[1].ReturnValue
```

refers to the return value of the element at index 1 in the array passed as
argument 0 of the method call.
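Some specifiers, such as `1..`, contain dots of their own, so an access path cannot simply be split on every `.`. The helper `split_access_path` below is a hypothetical sketch and is not part of CodeQL. It splits only on dots outside square brackets and then separates comma-separated specifiers. It assumes that no specifier contains brackets or commas, so quoted string keys such as `"a,b"` are not handled.

```ruby
# Hypothetical helper, not part of CodeQL: splits an access path into
# [component_name, specifiers] pairs. Assumes specifiers never contain
# brackets or commas themselves.
def split_access_path(path)
  components = []
  current = +""
  depth = 0

  path.each_char do |ch|
    depth += 1 if ch == "["
    depth -= 1 if ch == "]"
    if ch == "." && depth.zero?
      components << current # a dot outside brackets ends a component
      current = +""
    else
      current << ch
    end
  end
  components << current unless current.empty?

  components.map do |component|
    if (m = component.match(/\A(\w+)\[(.*)\]\z/))
      [m[1], m[2].split(",").map(&:strip)]
    else
      [component, []] # e.g. ReturnValue, which takes no specifiers
    end
  end
end

p split_access_path("Argument[0].Element[1].ReturnValue")
# prints [["Argument", ["0"]], ["Element", ["1"]], ["ReturnValue", []]]
p split_access_path("Argument[1..].Element[0, 1]")
# prints [["Argument", ["1.."]], ["Element", ["0", "1"]]]
```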

## `Argument` and `Parameter`

The `Argument` and `Parameter` components refer respectively to an argument to a
call or a parameter of a callable. They contain one or more _specifiers_[^1] which
constrain the range of arguments/parameters that the component refers to. For
example, `Argument[0]` refers to the first argument.

If multiple specifiers are given then the result is a disjunction, meaning that
the component refers to any argument/parameter that satisfies at least one of
the specifiers. For example, `Argument[0, 1]` refers to the first and second
arguments.

### Specifiers

#### `self`
The receiver of the call.

#### `<integer>`
The argument to the method call at the position given by the integer. For
example, `Argument[0]` refers to the first argument to the call.

#### `<integer>..`
An argument to the call at a position greater than or equal to the integer. For
example, `Argument[1..]` refers to all arguments except the first one. This
specifier is not available on `Parameter` components.

#### `<string>:`
A keyword argument to the call with the given name. For example,
`Argument[foo:]` refers to the keyword argument `foo:` in the call.

#### `block`
The block argument passed to the call, if any.

#### `any`
Any argument to the call, except the `self` and `block` arguments.

#### `any-named`
Any keyword argument to the call.

#### `hash-splat`
The special "hash splat" argument/parameter, which is written as `**args`.
When used in an `Argument` component, this specifier refers to a special dataflow
node which is constructed at the call site, containing any elements in a hash
splat argument (`**args`) along with any explicit keyword arguments (`foo: bar`).
The node behaves like a normal dataflow node for a hash, meaning that you can
access specific elements of it using the `Element` component.
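The synthetic `hash-splat` node exists only in the dataflow graph. It mirrors what Ruby itself does for a method declared with a `**` parameter: the explicit keyword arguments and the contents of any `**` argument arrive together as a single Hash. The method `foo` below is a hypothetical stand-in used only for illustration.

```ruby
# Plain Ruby; `foo` is a hypothetical method used only for illustration.
def foo(**kwargs)
  kwargs # one Hash holding every keyword argument, however it was passed
end

h = { a: "from the hash splat" }
result = foo(b: "explicit keyword", **h)

puts result[:a] # prints from the hash splat
puts result[:b] # prints explicit keyword
```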

For example, the following flow summary states that values flow from any keyword
arguments (including those in a hash splat) to the return value:

```ql
input = "Argument[hash-splat].Element[any]" and
output = "ReturnValue" and
preservesValue = true
```

Assuming this summary is for a global method `foo`, the following test will pass:

```rb
a = source "a"
b = source "b"

h = {a: a}

x = foo(b: b, **h)

sink x # $ hasValueFlow=a hasValueFlow=b
```

If the method returns the hash itself, you will need to use `WithElement` in
order to preserve taint/value in its elements. For example:

```ql
input = "Argument[hash-splat].WithElement[any]" and
output = "ReturnValue" and
preservesValue = true
```

```rb
a = source "a"
x = foo(a: a)
sink x[:a] # $ hasValueFlow=a
```
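The matching rules for the `Argument` specifiers listed above can be summarised with a small model. Everything in this sketch is hypothetical and for illustration only: `Arg`, `specifier_matches?` and `argument_matches?` are not CodeQL APIs. `hash-splat` is left out because it denotes a synthetic node rather than an actual argument, and only `Argument` (not `Parameter`) is modelled.

```ruby
# Hypothetical model of call arguments, for illustration only.
# kind is :self, :positional, :keyword or :block; value is the position
# (Integer) for positional arguments or the name (String) for keywords.
Arg = Struct.new(:kind, :value)

def specifier_matches?(spec, arg)
  case spec
  when "self"          then arg.kind == :self
  when "block"         then arg.kind == :block
  when "any"           then %i[positional keyword].include?(arg.kind)
  when "any-named"     then arg.kind == :keyword
  when /\A(\d+)\z/     then arg.kind == :positional && arg.value == $1.to_i
  when /\A(\d+)\.\.\z/ then arg.kind == :positional && arg.value >= $1.to_i
  when /\A(.+):\z/     then arg.kind == :keyword && arg.value == $1
  else false # e.g. hash-splat, which this sketch does not model
  end
end

# Several specifiers form a disjunction: one matching specifier is enough.
def argument_matches?(specs, arg)
  specs.any? { |spec| specifier_matches?(spec, arg) }
end

args = [
  Arg.new(:self, nil), Arg.new(:positional, 0), Arg.new(:positional, 1),
  Arg.new(:positional, 2), Arg.new(:keyword, "foo"), Arg.new(:block, nil)
]

# Argument[0, 1] selects the first and second positional arguments.
p args.select { |arg| argument_matches?(["0", "1"], arg) }.map(&:value) # prints [0, 1]
```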

## `ReturnValue`
`ReturnValue` refers to the return value of the element identified in the
preceding access path. For example, `Argument[0].ReturnValue` refers to the
return value of the first argument. Of course this only makes sense if the first
argument is a callable.
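The plain Ruby below shows the situation this component describes. `call_first` is a hypothetical method whose result is whatever its first argument returns when called, so a summary for it could use `Argument[0].ReturnValue` as input and `ReturnValue` as output. The same reasoning applies to a block argument, written `Argument[block].ReturnValue`. In `Array#map`, for example, each value the block returns becomes an element of the result.

```ruby
# Plain Ruby illustration. `call_first` is a hypothetical method.
# A summary for it could read:
#   input = "Argument[0].ReturnValue" and output = "ReturnValue"
def call_first(callable)
  callable.call
end

result = call_first(-> { "data from the lambda" })
puts result # prints data from the lambda

# With a block, the data comes from Argument[block].ReturnValue instead:
# each value the block returns ends up as an element of the result.
doubled = [1, 2, 3].map { |n| n * 2 }
p doubled # prints [2, 4, 6]
```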

## `Element`
This component refers to elements inside a collection of some sort. Typically
this is an Array or Hash. Elements are considered to have an index, which is an
integer in arrays and a symbol or string in hashes (even though hashes can have
arbitrary objects as keys). Elements can also have an unknown index, which means
we know the element exists in the collection but we don't know where.

Many of the specifiers have an optional suffix `!`. If this suffix is used then
the specifier excludes elements at unknown indices. Otherwise, these are
included by default.

### Specifiers

#### `?`
If used in an input path: an element at an unknown index. If used in an output
path: an element at any known or unknown index. In other words, `?` in an output
path means the same as `any`.

#### `any`
An element at any known or unknown index.

#### `<integer>`, `<integer>!`
An element at the index given by the integer.

#### `<integer>..`, `<integer>..!`
Any element at a known index greater than or equal to the integer.

#### `<string>`, `<string>!`
An element at the index given by the string. The string should match the result
of `serialize()` on the `ConstantValue` that represents the index. For a string
with contents `foo` this is `"foo"` and for a symbol `:foo` it is `:foo`. The
Ruby values `true`, `false` and `nil` can be written verbatim. See tests 31-33
for examples.
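For simple keys, Ruby's own `inspect` produces the spellings listed above: `"foo"` for a string, `:foo` for a symbol, and `true`, `false` and `nil` verbatim. The hypothetical helper `element_component` below relies on this to build an `Element` component from a literal key. It is only an approximation of `ConstantValue.serialize()`, which is not called here, and the two may differ for keys with escape sequences or unusual symbols.

```ruby
# Hypothetical helper: builds an Element component for a literal key.
# Assumption: for simple keys, Ruby's `inspect` spells the key the way
# ConstantValue.serialize() does; keys with escape sequences may differ.
def element_component(key, exclude_unknown: false)
  suffix = exclude_unknown ? "!" : ""
  "Element[#{key.inspect}#{suffix}]"
end

puts element_component("foo")                       # prints Element["foo"]
puts element_component(:foo)                        # prints Element[:foo]
puts element_component(nil)                         # prints Element[nil]
puts element_component(:foo, exclude_unknown: true) # prints Element[:foo!]
```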

## `Field`
A "field" in the object. In practice this refers to a value stored in an
instance variable in the object. The only valid specifier is `@<string>`, where
`<string>` is the name of the instance variable. Currently we assume that a
setter call such as `x.foo = bar` means there is a field `foo` in `x`, backed by
an instance variable `@foo`.

For example, the access path `Argument[0].Field[@foo]` would refer to the value `"foo"` in

```rb
x = SomeClass.new
x.foo = "foo"
some_call(x)
```
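The assumption holds for setters generated by `attr_accessor` or `attr_writer`, which store into the matching instance variable. A hand-written setter, however, is free to store its value elsewhere. The plain Ruby below shows both cases. `SomeClass` is given an `attr_accessor` here as an assumption, and `RenamedSetter` is a hypothetical class added for contrast.

```ruby
# Plain Ruby illustration of the setter/instance-variable assumption.
class SomeClass
  attr_accessor :foo # generates foo and foo=, backed by @foo
end

x = SomeClass.new
x.foo = "foo"
p x.instance_variable_get(:@foo) # prints "foo"

# Hypothetical class: a hand-written setter that stores into @bar instead,
# so the Field[@foo] assumption would not match the real behaviour.
class RenamedSetter
  def foo=(value)
    @bar = value
  end
end

r = RenamedSetter.new
r.foo = "foo"
p r.instance_variable_get(:@foo) # prints nil
p r.instance_variable_get(:@bar) # prints "foo"
```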

## `WithElement`
This component restricts the set of elements that are included in the preceding
access path to those at a specific set of indices. The specifiers are the same
as those for `Element`. It is only valid in an input path.

This component has the effect of copying all relevant elements from the input to
the output. For example, in the following summary:

```ql
input = "Argument[0].WithElement[1, 2]" and
output = "ReturnValue"
```

any data in indices 1 and 2 of the first argument will be copied to indices 1
and 2 of the return value. We use this in many Hash summaries that return the
receiver, in order to preserve any data stored in it. For example, the summary
for `Hash#to_h` is

```ql
input = "Argument[self].WithElement[any]" and
output = "ReturnValue" and
preservesValue = true
```
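The copying behaviour can be pictured with a small model of a value's abstract contents. The model holds data attached to the container itself plus a map from index to the data stored there, with `:unknown` standing for an unknown index. `AbstractValue` and `with_element` are hypothetical names for illustration, not part of CodeQL. Following the rules described for `Element`, a specifier without `!` also keeps elements at unknown indices. As the `WithoutElement` section notes, data on the container itself is not copied.

```ruby
# Hypothetical model for illustration only (not CodeQL's implementation).
# container: data attached to the value itself.
# elements:  index => data stored at that index; :unknown marks an
#            element whose index is not known.
AbstractValue = Struct.new(:container, :elements)

# Models WithElement[...]: keep only the selected elements. Without the
# `!` suffix, elements at unknown indices are kept too. Container-level
# data is not copied.
def with_element(value, indices, exclude_unknown: false)
  kept = value.elements.select do |index, _data|
    if index == :unknown
      !exclude_unknown
    else
      indices == :any || indices.include?(index)
    end
  end
  AbstractValue.new([], kept)
end

arg0 = AbstractValue.new(
  ["container data"],
  { 0 => ["a"], 1 => ["b"], 2 => ["c"], :unknown => ["d"] }
)

# input = "Argument[0].WithElement[1, 2]", output = "ReturnValue"
ret = with_element(arg0, [1, 2])
p ret.elements.keys # prints [1, 2, :unknown]

# Roughly WithElement[1!, 2!]: elements at unknown indices are dropped.
p with_element(arg0, [1, 2], exclude_unknown: true).elements.keys # prints [1, 2]
```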

## `WithoutElement`
This component is used to exclude certain elements from the set included in the
preceding access path. It takes the same specifiers as `WithElement` and
`Element`. It is only valid in an input path.

This component has the effect of excluding the relevant elements when copying
from input to output. It is useful for modelling methods that remove elements
from a collection. For example, to model a method that removes the first element
from the receiver, we can write:

```ql
input = "Argument[self].WithoutElement[0]" and
output = "Argument[self]"
```

Note that both the input and output refer to the receiver. The effect of this
summary is that use-use flow between the receiver in the method call and a
subsequent use of the same receiver will be blocked:

```ruby
a[0] = source 0
a[1] = source 1

a.remove_first # use-use flow from `a` on this line to `a` below will be blocked.
               # there will still be flow from `[post-update] a` to `a` below.

sink a[0]
sink a[1] # $ hasValueFlow=1
```

It is also important to note that in a summary such as

```ql
input = "Argument[self].WithoutElement[0]" and
output = "ReturnValue"
```

if `Argument[self]` itself carries data (as opposed to data stored in its
elements), that data will be copied to `ReturnValue`. If you only want to copy
data in elements, and not in the container itself, add `WithElement[any]` to the
input path:

```ql
input = "Argument[self].WithoutElement[0].WithElement[any]" and
output = "ReturnValue"
```

See tests 53 and 54 for examples of this behaviour.
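The difference between the two summaries can be pictured with a hypothetical model of a value's abstract contents: data on the container itself plus a map from known index to the data stored there. `AbstractValue`, `without_element` and `only_elements` are illustrative names, not CodeQL APIs. Elements at unknown indices are left out of this sketch.

```ruby
# Hypothetical model for illustration only (not CodeQL's implementation).
# container: data attached to the value itself.
# elements:  known index => data stored at that index.
AbstractValue = Struct.new(:container, :elements)

# Models WithoutElement[...]: copy everything except the listed indices,
# including any data attached to the container itself.
def without_element(value, indices)
  kept = value.elements.reject { |index, _data| indices.include?(index) }
  AbstractValue.new(value.container, kept)
end

# Models a trailing WithElement[any]: keep the elements, drop container data.
def only_elements(value)
  AbstractValue.new([], value.elements)
end

receiver = AbstractValue.new(["container data"], { 0 => ["source 0"], 1 => ["source 1"] })

# input = "Argument[self].WithoutElement[0]"
plain = without_element(receiver, [0])
p plain.container     # prints ["container data"]
p plain.elements.keys # prints [1]

# input = "Argument[self].WithoutElement[0].WithElement[any]"
filtered = only_elements(without_element(receiver, [0]))
p filtered.container     # prints []
p filtered.elements.keys # prints [1]
```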

[^1]: I've chosen this name to avoid overloading the word "argument".