|
2 | 2 | * This file provides the second phase of the `cpp/invalid-pointer-deref` query that identifies flow
|
3 | 3 | * from the out-of-bounds pointer identified by the `AllocationToInvalidPointer.qll` library to
|
4 | 4 | * a dereference of the out-of-bounds pointer.
|
| 5 | + * |
| 6 | + * Consider the following snippet: |
| 7 | + * ```cpp |
| 8 | + * 1. char* base = (char*)malloc(size); |
| 9 | + * 2. char* end = base + size; |
| 10 | + * 3. for(char *p = base; p <= end; p++) { |
| 11 | + * 4. use(*p); // BUG: Should have been bounded by `p < end`. |
| 12 | + * 5. } |
| 13 | + * ``` |
| 14 | + * For this snippet, this file identifies the flow from `base + size` to `end`. We call `base + size` the "dereference source" and `end` |
| 15 | + * the "dereference sink". Although `end` itself is not dereferenced, we use this term because we perform dataflow to find |
| 16 | + * a dereferenced use of a pointer `x` such that `x <= end`. In the above example, `x` is `p` |
| 17 | + * on line 4. |
| 18 | + * |
| 19 | + * Merely _constructing_ a pointer that's out-of-bounds is fine if the pointer is never dereferenced (in reality, the |
| 20 | + * standard only guarantees that it is safe to move the pointer one element past the last element, but we ignore that |
| 21 | + * here). So this step is about identifying which of the out-of-bounds pointers found by `pointerAddInstructionHasBounds` |
| 22 | + * in `AllocationToInvalidPointer.qll` are actually being dereferenced. We do this using a regular dataflow |
| 23 | + * configuration (see `InvalidPointerToDerefConfig`). |
| 24 | + * |
| 25 | + * The dataflow traversal defines the set of sources as any dataflow node `n` such that there exists a pointer-arithmetic |
| 26 | + * instruction `pai` found by `AllocationToInvalidPointer.qll` for which `n.asInstruction() >= pai + deltaDerefSourceAndPai` holds. |
| 27 | + * Here, `deltaDerefSourceAndPai` is the constant difference between the source we track for finding a dereference and the |
| 28 | + * pointer-arithmetic instruction. |
| 29 | + * |
| 30 | + * The set of sinks is defined as any dataflow node `n` such that `addr <= n.asInstruction() + deltaDerefSinkAndDerefAddress` |
| 31 | + * for some address operand `addr` and constant difference `deltaDerefSinkAndDerefAddress`. Since an address operand is |
| 32 | + * always consumed by an instruction that performs a dereference, this lets us identify a "bad dereference". We call the |
| 33 | + * instruction that consumes the address operand the "operation". |
| 34 | + * |
| 35 | + * For example, consider the flow from `base + size` to `end` above. The sink is `end` on line 3 because |
| 36 | + * `p <= end.asInstruction() + deltaDerefSinkAndDerefAddress`, where `p` is the address operand in `use(*p)` and |
| 37 | + * `deltaDerefSinkAndDerefAddress >= 0`. The load attached to `*p` is the "operation". To ensure that the path makes |
| 38 | + * intuitive sense, we only pick operations that are control-flow reachable from the dereference sink. |
| 39 | + * |
| 40 | + * To compute how many elements the dereference is beyond the end position of the allocation, we sum the two deltas |
| 41 | + * `deltaDerefSourceAndPai` and `deltaDerefSinkAndDerefAddress`. This is done in the `operationIsOffBy` predicate |
| 42 | + * (which is the only predicate exposed by this file). |
| 43 | + * |
| 44 | + * Handling false positives: |
| 45 | + * |
| 46 | + * Consider the following snippet: |
| 47 | + * ```cpp |
| 48 | + * 1. char *p = new char[size]; |
| 49 | + * 2. char *end = p + size; |
| 50 | + * 3. if (p < end) { |
| 51 | + * 4. p += 1; |
| 52 | + * 5. } |
| 53 | + * 6. if (p < end) { |
| 54 | + * 7. int val = *p; // GOOD |
| 55 | + * 8. } |
| 56 | + * ``` |
| 57 | + * This is safe because `p` is guarded to be strictly less than `end` on line 6 before the dereference on line 7. However, if we |
| 58 | + * run the query on the above without further modifications we would see an alert on line 7. This is because range analysis infers |
| 59 | + * that `p <= end` after the increment on line 4, and thus the result of `p += 1` is seen as a valid dereference source. This |
| 60 | + * node then flows to `p` on line 6 (which is a valid dereference sink since it non-strictly upper bounds an address operand), and |
| 61 | + * range analysis then infers that the address operand of `*p` (i.e., `p`) is non-strictly upper bounded by `p`, and thus reports |
| 62 | + * an alert on line 7. |
| 63 | + * |
| 64 | + * To handle the above false positive, we define a barrier that identifies guards such as `p < end` that ensure that a value |
| 65 | + * is less than the pointer-arithmetic instruction that computed the invalid pointer. This is done in the `InvalidPointerToDerefBarrier` |
| 66 | + * module. Since the node we are tracking is not necessarily _equal_ to the pointer-arithmetic instruction, but rather satisfies |
| 67 | + * `node.asInstruction() <= pai + deltaDerefSourceAndPai`, we need to account for the delta when checking if a guard is sufficiently |
| 68 | + * strong to infer that a future dereference is safe. To do this, we check that the guard guarantees that a node `n` satisfies |
| 69 | + * `n < node + k` where `node` is a node we know is equal to the value of the dereference source (i.e., it satisfies |
| 70 | + * `node.asInstruction() <= pai + deltaDerefSourceAndPai`) and `k <= deltaDerefSourceAndPai`. Combining this we have |
| 71 | + * `n < node + k <= node + deltaDerefSourceAndPai <= pai + 2*deltaDerefSourceAndPai` (TODO: Oops. This math doesn't quite work out. |
| 72 | + * I think this is because we need to redefine the `BarrierConfig` to start flow at the pointer-arithmetic instruction instead of |
| 73 | + * at the dereference source. When combined with TODO above it's easy to show that this guard ensures that the dereference is safe). |
5 | 74 | */
|
6 | 75 |
|
7 | 76 | private import cpp
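
The two C++ snippets in the header comment can be modelled as a small standalone sketch. It uses indices instead of raw pointers so that it never performs the out-of-bounds read itself. The helper names `indicesRead`, `readsOutOfBounds` and `guardedRead` are hypothetical, introduced only for illustration, and are not part of the CodeQL library or of the query's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (illustration only): models the loop
//   for (char* p = base; p <= end; p++) use(*p);
// with `end = base + size`, using indices instead of pointers.
// Returns every index the loop would read.
std::vector<std::size_t> indicesRead(std::size_t size, bool inclusiveBound) {
  std::vector<std::size_t> read;
  const std::size_t end = size;  // index form of `base + size`
  for (std::size_t p = 0; inclusiveBound ? p <= end : p < end; ++p) {
    read.push_back(p);
  }
  return read;
}

// Hypothetical helper: true if any recorded index lies outside [0, size).
bool readsOutOfBounds(const std::vector<std::size_t>& read, std::size_t size) {
  for (std::size_t i : read) {
    if (i >= size) return true;
  }
  return false;
}

// Models the guarded snippet: `p` is advanced once under `p < end`, and the
// dereference happens only under a second `p < end` check. Returns the index
// that would be read, or -1 if the guard skips the dereference.
long guardedRead(std::size_t size) {
  std::size_t p = 0;
  const std::size_t end = size;
  if (p < end) p += 1;
  if (p < end) {
    assert(p < size);  // the strict guard implies the read is in bounds
    return static_cast<long>(p);
  }
  return -1;
}
```

With the inclusive bound `p <= end`, the last index read equals `size`, so the access lands exactly at `base + size`. With the strict bound, or with the second guard in `guardedRead`, every read stays inside the allocation, which is the property the barrier described in the comment is meant to recognise.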
|
|