Skip to content

Commit 60c7c84

Browse files
authored
Merge pull request github#13774 from MathiasVP/add-more-invalid-deref-documentation
C++: Add more documentation to the `cpp/invalid-pointer-deref` query
2 parents 3767ce5 + 9f2ee0d commit 60c7c84

File tree

3 files changed

+162
-16
lines changed

3 files changed

+162
-16
lines changed

cpp/ql/lib/semmle/code/cpp/security/InvalidPointerDereference/AllocationToInvalidPointer.qll

Lines changed: 48 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,54 @@
11
/**
22
* This file provides the first phase of the `cpp/invalid-pointer-deref` query that identifies flow
33
* from an allocation to a pointer-arithmetic instruction that constructs a pointer that is out of bounds.
4+
*
5+
* Consider the following snippet:
6+
* ```cpp
7+
* 1. char* base = (char*)malloc(size);
8+
* 2. char* end = base + size;
9+
* 3. for(int *p = base; p <= end; p++) {
10+
* 4. use(*p); // BUG: Should have been bounded by `p < end`.
11+
* 5. }
12+
* ```
13+
* this file identifies the flow from `new int[size]` to `base + size`.
14+
*
15+
* This is done using the product-flow library. The configuration tracks flow from the pair
16+
* `(allocation, size of allocation)` to a pair `(a, b)` where there exists a pointer-arithmetic instruction
17+
* `pai = a + r` such that `b` is a dataflow node where `b <= r`. Because there will be a dataflow-path from
18+
* `allocation` to `a` this means that the `pai` will compute a pointer that is some number of elements beyond
19+
* the end position of the allocation. See `pointerAddInstructionHasBounds` for the implementation of this.
20+
*
21+
* In the above example, the pair `(a, b)` is `(base, size)` with `base` and `size` coming from the expression
22+
* `base + size` on line 2, which is also the pointer-arithmetic instruction. In general, the pair does not necessarily
23+
* correspond directly to the operands of the pointer-arithmetic instruction.
24+
* In the following example, the pair is again `(base, size)`, but with `base` coming from line 3 and `size` from line 2,
25+
* and the pointer-arithmetic instruction being `base + n` on line 3:
26+
* ```cpp
27+
* 1. int* base = new int[size];
28+
* 2. if(n <= size) {
29+
* 3. int* end = base + n;
30+
* 4. for(int* p = base; p <= end; ++p) {
31+
* 5. *p = 0; // BUG: Should have been bounded by `p < end`.
32+
* 6. }
33+
* 7. }
34+
* ```
35+
*
36+
* Handling false positives:
37+
*
38+
* Consider a snippet such as:
39+
* ```cpp
40+
* 1. int* base = new int[size];
41+
* 2. int n = condition() ? size : 0;
42+
* 3. if(n >= size) return;
43+
* 4. int* end = base + n;
44+
* 5. for(int* p = base; p <= end; ++p) {
45+
* 6. *p = 0; // This is fine since `end < base + size`
46+
* 7. }
47+
* ```
48+
* In order to remove this false positive we define a barrier (see `SizeBarrier::SizeBarrierConfig`) that finds the
49+
* possible guards that compares a value to the size of the allocation. In the above example, this is the `(n >= size)`
50+
* guard on line 3. `SizeBarrier::getABarrierNode` then defines any node that is guarded by such a guard as a barrier
51+
* in the dataflow configuration.
452
*/
553

654
private import cpp
@@ -149,22 +197,6 @@ private module InterestingPointerAddInstruction {
149197
/**
150198
* A product-flow configuration for flow from an `(allocation, size)` pair to a
151199
* pointer-arithmetic operation `pai` such that `pai <= allocation + size`.
152-
*
153-
* The goal of this query is to find patterns such as:
154-
* ```cpp
155-
* 1. char* begin = (char*)malloc(size);
156-
* 2. char* end = begin + size;
157-
* 3. for(int *p = begin; p <= end; p++) {
158-
* 4. use(*p);
159-
* 5. }
160-
* ```
161-
*
162-
* We do this by splitting the task up into two configurations:
163-
* 1. `AllocToInvalidPointerConfig` find flow from `malloc(size)` to `begin + size`, and
164-
* 2. `InvalidPointerToDerefConfig` finds flow from `begin + size` to an `end` (on line 3).
165-
*
166-
* Finally, the range-analysis library will find a load from (or store to) an address that
167-
* is non-strictly upper-bounded by `end` (which in this case is `*p`).
168200
*/
169201
private module Config implements ProductFlow::StateConfigSig {
170202
class FlowState1 = Unit;

cpp/ql/lib/semmle/code/cpp/security/InvalidPointerDereference/InvalidPointerToDereference.qll

Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,75 @@
22
* This file provides the second phase of the `cpp/invalid-pointer-deref` query that identifies flow
33
* from the out-of-bounds pointer identified by the `AllocationToInvalidPointer.qll` library to
44
* a dereference of the out-of-bounds pointer.
5+
*
6+
* Consider the following snippet:
7+
* ```cpp
8+
* 1. char* base = (char*)malloc(size);
9+
* 2. char* end = base + size;
10+
* 3. for(char *p = base; p <= end; p++) {
11+
* 4. use(*p); // BUG: Should have been bounded by `p < end`.
12+
* 5. }
13+
* ```
14+
* this file identifies the flow from `base + size` to `end`. We call `base + size` the "dereference source" and `end`
15+
* the "dereference sink" (even though `end` is not actually dereferenced we will use this term because we will perform
16+
* dataflow to find a use of a pointer `x` such that `x <= end` which is dereferenced. In the above example, `x` is `p`
17+
* on line 4).
18+
*
19+
* Merely _constructing_ a pointer that's out-of-bounds is fine if the pointer is never dereferenced (in reality, the
20+
* standard only guarantees that it is safe to move the pointer one element past the last element, but we ignore that
21+
* here). So this step is about identifying which of the out-of-bounds pointers found by `pointerAddInstructionHasBounds`
22+
* in `AllocationToInvalidPointer.qll` are actually being dereferenced. We do this using a regular dataflow
23+
* configuration (see `InvalidPointerToDerefConfig`).
24+
*
25+
* The dataflow traversal defines the set of sources as any dataflow node `n` such that there exists a pointer-arithmetic
26+
* instruction `pai` found by `AllocationToInvalidPointer.qll` and a `n.asInstruction() >= pai + deltaDerefSourceAndPai`.
27+
* Here, `deltaDerefSourceAndPai` is the constant difference between the source we track for finding a dereference and the
28+
* pointer-arithmetic instruction.
29+
*
30+
* The set of sinks is defined as any dataflow node `n` such that `addr <= n.asInstruction() + deltaDerefSinkAndDerefAddress`
31+
* for some address operand `addr` and constant difference `deltaDerefSinkAndDerefAddress`. Since an address operand is
32+
* always consumed by an instruction that performs a dereference this lets us identify a "bad dereference". We call the
33+
* instruction that consumes the address operand the "operation".
34+
*
35+
* For example, consider the flow from `base + size` to `end` above. The sink is `end` on line 3 because
36+
* `p <= end.asInstruction() + deltaDerefSinkAndDerefAddress`, where `p` is the address operand in `use(*p)` and
37+
* `deltaDerefSinkAndDerefAddress >= 0`. The load attached to `*p` is the "operation". To ensure that the path makes
38+
* intuitive sense, we only pick operations that are control-flow reachable from the dereference sink.
39+
*
40+
* To compute how many elements the dereference is beyond the end position of the allocation, we sum the two deltas
41+
* `deltaDerefSourceAndPai` and `deltaDerefSinkAndDerefAddress`. This is done in the `operationIsOffBy` predicate
42+
* (which is the only predicate exposed by this file).
43+
*
44+
* Handling false positives:
45+
*
46+
* Consider the following snippet:
47+
* ```cpp
48+
* 1. char *p = new char[size];
49+
* 2. char *end = p + size;
50+
* 3. if (p < end) {
51+
* 4. p += 1;
52+
* 5. }
53+
* 6. if (p < end) {
54+
* 7. int val = *p; // GOOD
55+
* 8. }
56+
* ```
57+
* this is safe because `p` is guarded to be strictly less than `end` on line 6 before the dereference on line 7. However, if we
58+
* run the query on the above without further modifications we would see an alert on line 7. This is because range analysis infers
59+
* that `p <= end` after the increment on line 4, and thus the result of `p += 1` is seen as a valid dereference source. This
60+
* node then flows to `p` on line 6 (which is a valid dereference sink since it non-strictly upper bounds an address operand), and
61+
* range analysis then infers that the address operand of `*p` (i.e., `p`) is non-strictly upper bounded by `p`, and thus reports
62+
* an alert on line 7.
63+
*
64+
* In order to handle the above false positive, we define a barrier that identifies guards such as `p < end` that ensures that a value
65+
* is less than the pointer-arithmetic instruction that computed the invalid pointer. This is done in the `InvalidPointerToDerefBarrier`
66+
* module. Since the node we are tracking is not necessarily _equal_ to the pointer-arithmetic instruction, but rather satisfies
67+
* `node.asInstruction() <= pai + deltaDerefSourceAndPai`, we need to account for the delta when checking if a guard is sufficiently
68+
* strong to infer that a future dereference is safe. To do this, we check that the guard guarantees that a node `n` satisfies
69+
* `n < node + k` where `node` is a node we know is equal to the value of the dereference source (i.e., it satisfies
70+
* `node.asInstruction() <= pai + deltaDerefSourceAndPai`) and `k <= deltaDerefSourceAndPai`. Combining this we have
71+
* `n < node + k <= node + deltaDerefSourceAndPai <= pai + 2*deltaDerefSourceAndPai` (TODO: Oops. This math doesn't quite work out.
72+
* I think this is because we need to redefine the `BarrierConfig` to start flow at the pointer-arithmetic instruction instead of
73+
* at the dereference source. When combined with TODO above it's easy to show that this guard ensures that the dereference is safe).
574
*/
675

776
private import cpp

cpp/ql/src/experimental/Security/CWE/CWE-193/InvalidPointerDeref.ql

Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,51 @@
1515
* external/cwe/cwe-787
1616
*/
1717

18+
/*
19+
* High-level description of the query:
20+
*
21+
* The goal of this query is to identify issues such as:
22+
* ```cpp
23+
* 1. int* base = new int[size];
24+
* 2. int* end = base + size;
25+
* 3. for(int* p = base; p <= end; ++p) {
26+
* 4. *p = 0; // BUG: Should have been bounded by `p < end`.
27+
* 5. }
28+
* ```
29+
* In order to do this, we split the problem into three subtasks:
30+
* 1. First, we find flow from `new int[size]` to `base + size`.
31+
* 2. Then, we find flow from `base + size` to `end` (on line 3).
32+
* 3. Finally, we use range-analysis to find a write to (or read from) a pointer that may be greater than or equal to `end`.
33+
*
34+
* Step 1 is implemented in `AllocationToInvalidPointer.qll`, and step 2 is implemented by
35+
* `InvalidPointerToDereference.qll`. See those files for the description of these.
36+
*
37+
* This file imports both libraries and defines a final dataflow configuration that constructs the full path from
38+
* the allocation to the dereference of the out-of-bounds pointer. This is done for several reasons:
39+
* 1. It means the user is able to inspect the entire path from the allocation to the dereference, which can be useful
40+
* to understand the problem highlighted.
41+
* 2. It ensures that the call-contexts line up correctly when we transition from step 1 to step 2. See the
42+
* `test_missing_call_context_1` and `test_missing_call_context_2` tests for how this may flag false positives
43+
* without this final configuration.
44+
*
45+
* The source of the final path is an allocation that is:
46+
* 1. identified as flowing to an invalid pointer (by `AllocationToInvalidPointer`), and
47+
* 2. for which the invalid pointer flows to a dereference (as identified by `InvalidPointerToDereference`).
48+
*
49+
* The path can be described in 3 "chunks":
50+
* 1. One path from the allocation to the construction of the invalid pointer
51+
* 2. Another path from the construction of the invalid pointer to the final pointer that is about to be dereferenced.
52+
* 3. Finally, a single step from the dataflow node that represents the final pointer to the dereference.
53+
*
54+
* Step 1 happens when the flow state is `TInitial`, and step 2 and 3 happen when the flow state is `TPointerArith(pai)`
55+
* where the pointer-arithmetic instruction `pai` tracks the instruction that generated the out-of-bounds pointer. This
56+
* instruction is used in the construction of the alert message.
57+
*
58+
* The set of pointer-arithmetic instructions that define the `TPointerArith` flow state is restricted to be the pointer-
59+
* arithmetic instructions that both receive flow from the allocation (as identified by `AllocationToInvalidPointer.qll`),
60+
* and further flow to a dereference (as identified by `InvalidPointerToDereference.qll`).
61+
*/
62+
1863
import cpp
1964
import semmle.code.cpp.dataflow.new.DataFlow
2065
import semmle.code.cpp.ir.IR

0 commit comments

Comments
 (0)