|
6 | 6 | * uses, however, it is better to write a query that imports `PrintIR.qll`, extends
|
7 | 7 | * `PrintIRConfiguration`, and overrides `shouldPrintDeclaration()` to select a subset of declarations
|
8 | 8 | * to dump.
|
| 9 | + * |
| 10 | + * Anatomy of a printed IR instruction |
| 11 | + * |
| 12 | + * An instruction: |
| 13 | + * |
| 14 | + * ``` |
| 15 | + * # 2281| v2281_19(void) = Call[~String] : func:r2281_18, this:r2281_17 |
| 16 | + * ``` |
| 17 | + * |
| 18 | + * The prefix `# 2281|` specifies that this instruction was generated by the C++ source code on line 2281. |
| 19 | + * Scrolling up in the printed output, one will eventually find the name of the file to which the line |
| 20 | + * belongs. |
| 21 | + * |
| 22 | + * `v2281_19(void)` is the result of the instruction. Here, `v` means this is a void result or operand (so |
| 23 | + * there should be no later uses of the result; see below for other possible values). The `2281_19` is a |
| 24 | + * unique ID for the result. This is usually just the line number plus a small integer suffix to make it |
| 25 | + * unique within the function. The type of the result here is `void`, because |
| 26 | + * `~String` returns `void`. The type of the result is usually just the name of the appropriate C++ type, |
| 27 | + * but it will sometimes be a type like `glval<int>`, which means the result holds a glvalue, which at the |
| 28 | + * IR level works like a pointer. In other words, in the source code the type was `int`, but it is really |
| 29 | + * more like an `int*`. We see this, for example, in `x = y;`, where `x` is a glvalue. |
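To make the glvalue point concrete, here is a minimal C++ sketch of how `x = y;` decomposes once each glvalue is treated as an address. The function name `assignViaAddresses` and the opcode names in the comments are illustrative assumptions, not actual printed IR.

```cpp
#include <cassert>

// Hypothetical helper: spells out `x = y;` with each glvalue written as an
// explicit address, which is roughly how a `glval<int>` behaves at the IR level.
int assignViaAddresses(int* xAddr, const int* yAddr) {
  assert(xAddr != nullptr && yAddr != nullptr);
  int yValue = *yAddr; // like a `Load`: read the `int` stored at y's address
  *xAddr = yValue;     // like a `Store`: write that `int` to x's address
  return *xAddr;
}
```

Calling `assignViaAddresses(&x, &y)` has the same effect as `x = y;`. The only difference is that the address of `x` appears as an explicit value, which is what the `glval<int>` type expresses.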
| 30 | + * |
| 31 | + * `Call` is the opcode of the instruction. Common opcodes include: |
| 32 | + * |
| 33 | + * * Arithmetic operations: `Add`, `Sub`, `Mul`, etc. |
| 34 | + * * Memory access operations: `Load`, `Store`. |
| 35 | + * * Function calls: `Call`. |
| 36 | + * * Literals: `Constant`. |
| 37 | + * * Variable addresses: `VariableAddress`. |
| 38 | + * * Function entry points: `EnterFunction`. |
| 39 | + * * Return from a function: `Return`, `ReturnVoid`. Note that the value being returned is set separately by a |
| 40 | + * `Store` to a special `#return` variable. |
| 41 | + * * Stack unwinding for a C++ function that throws an exception that escapes the function: `Unwind`. |
| 42 | + * * Common exit point for `Unwind` and `Return`: `ExitFunction`. |
| 43 | + * * SSA-related opcodes: `Phi`, `Chi`. |
| 44 | + * |
| 45 | + * `[~String]` denotes additional information. The information might be present earlier in the IR, as is the case |
| 46 | + * for `Call`, where it is the name of the called function. This is also the case for `Load` and `Store`, where it |
| 47 | + * is the name of the variable that is loaded or stored (if known). In the case of `Constant`, `FieldAddress`, and |
| 48 | + * `VariableAddress`, the information between brackets does not occur earlier. |
| 49 | + * |
| 50 | + * `func:r2281_18` and `this:r2281_17` are the operands of the instruction. The `func:` prefix denotes the operand |
| 51 | + * that holds the address of the called function. The `this:` prefix denotes the argument to the special `this` |
| 52 | + * parameter of an instance member function. `r2281_18` and `r2281_17` are the unique IDs of the operands. Each of these |
| 53 | + * matches the ID of a previously seen result, showing where that value came from. The `r` means that these are |
| 54 | + * "register" operands (see below). |
| 55 | + * |
| 56 | + * Result and operand kinds: |
| 57 | + * |
| 58 | + * Every result and operand is one of these three kinds: |
| 59 | + * |
| 60 | + * * `r` "register". These operands are not stored in any particular memory location. We can think of them as |
| 61 | + * temporary values created during the evaluation of an expression. A register operand almost always has one |
| 62 | + * use, often in the same block as its definition. |
| 63 | + * * `m` "memory". These operands represent accesses to a specific memory location. The location could be a |
| 64 | + * local variable, a global variable, a field of an object, an element of an array, or any memory that we happen |
| 65 | + * to have a pointer to. These only occur as the result of a `Store`, as the source operand of a `Load`, or on the |
| 66 | + * SSA instructions (`Phi`, `Chi`). |
| 67 | + * * `v` "void". Really just a register operand, but we mark register operands of type void with this special prefix |
| 68 | + * so we know that there is no actual value there. |
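The naming scheme for results and operands can be sketched as a small C++ parser. It assumes that a token has the shape `<kind><id>(<type>)`, where the ID is a number with an optional `_<suffix>`, as in the examples in this comment. The `ParsedId` struct and `parseResultToken` function are hypothetical names for illustration and are not part of CodeQL.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper type for illustration; not part of CodeQL.
struct ParsedId {
  char kind;        // 'r' (register), 'm' (memory), or 'v' (void)
  std::string id;   // e.g. "2281_19"
  std::string type; // e.g. "void" or "glval<int>"
};

// Parses a result token such as `v2281_19(void)`. Returns false if the token
// does not match the assumed `<kind><id>(<type>)` shape.
bool parseResultToken(const std::string& token, ParsedId& out) {
  static const std::regex pattern(R"(([rmv])(\d+(?:_\d+)?)\((.*)\))");
  std::smatch match;
  if (!std::regex_match(token, match, pattern)) {
    return false;
  }
  out.kind = match[1].str()[0];
  out.id = match[2].str();
  out.type = match[3].str();
  return true;
}
```

For `v2281_19(void)`, this yields kind `v`, ID `2281_19`, and type `void`. Operands such as `func:r2281_18` carry a prefix and no type, so they would need a separate pattern.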
| 69 | + * |
| 70 | + * Branches in the IR: |
| 71 | + * |
| 72 | + * The IR is divided into basic blocks. At the end of each block, there are one or more edges showing the possible |
| 73 | + * control flow successors of the block. |
| 74 | + * |
| 75 | + * ``` |
| 76 | + * # 44| v44_3(void) = ConditionalBranch : r44_2 |
| 77 | + * #-----| False -> Block 4 |
| 78 | + * #-----| True -> Block 3 |
| 79 | + * ``` |
| 80 | + * Here we have a block that ends with a conditional branch. The two edges show where the control flows to depending |
| 81 | + * on whether the condition is true or false. |
| 82 | + * |
| 83 | + * SSA instructions: |
| 84 | + * |
| 85 | + * We use `Phi` instructions in SSA to create a single definition for a variable that might be assigned on multiple |
| 86 | + * control flow paths. The `Phi` instruction merges the potential values of that variable from each predecessor edge, |
| 87 | + * and the resulting definition is then used wherever that variable is accessed later on. |
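As a rough illustration of where a `Phi` would be needed, consider the C++ function below. The function name and the SSA names in the comments (`x_1`, `x_2`, `x_3`) are assumptions made for this sketch and do not reproduce actual printed IR.

```cpp
#include <cassert>

// Hypothetical example function. `x` is assigned on two different control flow
// paths, so in SSA form the read in `return x;` needs one merged definition.
int pickValue(bool condition) {
  int x;
  if (condition) {
    x = 1; // conceptually: x_1 = 1
  } else {
    x = 2; // conceptually: x_2 = 2
  }
  // conceptually: x_3 = Phi(x_1, x_2), and the return reads x_3
  return x;
}
```

The block containing `return x;` has two predecessor edges, one from each branch, so the merged definition has one input per edge.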
| 88 | + * |
| 89 | + * When dealing with aliased memory, we use the `Chi` instruction to create a single definition for memory that might |
| 90 | + * or might not have been updated by a store, depending on the actual address that was written to. For example, take: |
| 91 | + * |
| 92 | + * ```cpp |
| 93 | + * int x = 5; |
| 94 | + * int y = 7; |
| 95 | + * int* p = condition ? &x : &y; |
| 96 | + * *p = 6; |
| 97 | + * return x; |
| 98 | + * ``` |
| 99 | + * |
| 100 | + * At the point where we store to `*p`, we do not know whether `p` points to `x` or `y`. Thus, we do not know whether |
| 101 | + * `return x;` is going to return the value that `x` was originally initialized to (5), or whether it will return 6, |
| 102 | + * because it was overwritten by `*p = 6;`. We insert a `Chi` instruction immediately after the store to `*p`: |
| 103 | + * |
| 104 | + * ``` |
| 105 | + * r2(int) = Constant[6] |
| 106 | + * r3(int*) = <<value of p>> |
| 107 | + * m4(int) = Store : &r3, r2 // Stores the constant 6 to *p |
| 108 | + * m5(unknown) = Chi : total:m1, partial:m4 |
| 109 | + * ``` |
| 110 | + * The `partial:` operand represents the memory that was just stored. The `total:` operand represents the previous |
| 111 | + * contents of all of the memory that `p` might have pointed to (in this case, both `x` and `y`). The result of the |
| 112 | + * `Chi` represents the new contents of whatever memory the `total:` operand referred to. We usually do not know exactly |
| 113 | + * which parts of that memory were overwritten, but the `Chi` does model that any of that memory could have been modified, so |
| 114 | + * that later instructions do not assume that the memory was unchanged. |
9 | 115 | */
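The aliasing example from the comment can be checked directly by wrapping it in a function. This is only a sketch: the function name `storeThroughPointer` and the `condition` parameter are assumptions added so that the snippet compiles on its own.

```cpp
#include <cassert>

// The snippet from the comment above, wrapped in a hypothetical function.
int storeThroughPointer(bool condition) {
  int x = 5;
  int y = 7;
  int* p = condition ? &x : &y;
  *p = 6;   // may or may not overwrite x, depending on where p points
  return x; // 6 if p pointed to x, otherwise still 5
}
```

Both outcomes are reachable, which is why the store has to be modelled as possibly modifying `x` instead of leaving it definitely unchanged.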
|
10 | 116 |
|
11 | 117 | private import internal.IRInternal
|