Commit 3d6965a

Merge pull request #17665 from jketema/printir-doc
C++: Add some documentation on the printed IR
2 parents 5a4cd1c + ed266da commit 3d6965a

3 files changed

+318
-0
lines changed

cpp/ql/lib/semmle/code/cpp/ir/implementation/aliased_ssa/PrintIR.qll

Lines changed: 106 additions & 0 deletions
@@ -6,6 +6,112 @@
  * uses, however, it is better to write a query that imports `PrintIR.qll`, extends
  * `PrintIRConfiguration`, and overrides `shouldPrintDeclaration()` to select a subset of declarations
  * to dump.
+ *
+ * Anatomy of a printed IR instruction
+ *
+ * An instruction:
+ *
+ * ```
+ * # 2281| v2281_19(void) = Call[~String] : func:r2281_18, this:r2281_17
+ * ```
+ *
+ * The prefix `# 2281|` specifies that this instruction was generated by the C++ source code on line 2281.
+ * Scrolling up in the printed output, one will eventually find the name of the file to which the line
+ * belongs.
+ *
+ * `v2281_19(void)` is the result of the instruction. Here, `v` means this is a void result or operand (so
+ * there should be no later uses of the result; see below for other possible values). The `2281_19` is a
+ * unique ID for the result. This is usually just the line number plus a small integer suffix to make it
+ * unique within the function. The `(void)` suffix is the type of the result; it is `void` here because
+ * `~String` returns `void`. The type of the result is usually just the name of the appropriate C++ type,
+ * but it will sometimes be a type like `glval<int>`, which means the result holds a glvalue, which at the
+ * IR level works like a pointer. In other words, in the source code the type was `int`, but it is really
+ * more like an `int*`. We see this, for example, in `x = y;`, where `x` is a glvalue.
+ *
+ * `Call` is the opcode of the instruction. Common opcodes include:
+ *
+ * * Arithmetic operations: `Add`, `Sub`, `Mul`, etc.
+ * * Memory access operations: `Load`, `Store`.
+ * * Function calls: `Call`.
+ * * Literals: `Constant`.
+ * * Variable addresses: `VariableAddress`.
+ * * Function entry points: `EnterFunction`.
+ * * Return from a function: `Return`, `ReturnVoid`. Note that the value being returned is set separately by a
+ *   `Store` to a special `#return` variable.
+ * * Stack unwinding for C++ functions that throw and where the exception escapes the function: `Unwind`.
+ * * Common exit point for `Unwind` and `Return`: `ExitFunction`.
+ * * SSA-related opcodes: `Phi`, `Chi`.
+ *
+ * `[~String]` denotes additional information. The information might be present earlier in the IR, as is the case
+ * for `Call`, where it is the name of the called function. This is also the case for `Load` and `Store`, where it
+ * is the name of the variable that is loaded or stored (if known). In the case of `Constant`, `FieldAddress`, and
+ * `VariableAddress`, the information between brackets does not occur earlier.
+ *
+ * `func:r2281_18` and `this:r2281_17` are the operands of the instruction. The `func:` prefix denotes the operand
+ * that holds the address of the called function. The `this:` prefix denotes the argument to the special `this`
+ * parameter of an instance member function. `r2281_18` and `r2281_17` are the unique IDs of the operands. Each of these
+ * matches the ID of a previously seen result, showing where that value came from. The `r` means that these are
+ * "register" operands (see below).
+ *
+ * Result and operand kinds:
+ *
+ * Every result and operand is one of these three kinds:
+ *
+ * * `r` "register". These operands are not stored in any particular memory location. We can think of them as
+ *   temporary values created during the evaluation of an expression. A register operand almost always has one
+ *   use, often in the same block as its definition.
+ * * `m` "memory". These operands represent accesses to a specific memory location. The location could be a
+ *   local variable, a global variable, a field of an object, an element of an array, or any memory that we happen
+ *   to have a pointer to. These only occur as the result of a `Store`, the source operand of a `Load`, or on the
+ *   SSA instructions (`Phi`, `Chi`).
+ * * `v` "void". Really just a register operand, but we mark register operands of type void with this special prefix
+ *   so we know that there is no actual value there.
+ *
+ * Branches in the IR:
+ *
+ * The IR is divided into basic blocks. At the end of each block, there are one or more edges showing the possible
+ * control flow successors of the block.
+ *
+ * ```
+ * # 44| v44_3(void) = ConditionalBranch : r44_2
+ * #-----| False -> Block 4
+ * #-----| True -> Block 3
+ * ```
+ * Here we have a block that ends with a conditional branch. The two edges show where the control flows to depending
+ * on whether the condition is true or false.
+ *
+ * SSA instructions:
+ *
+ * We use `Phi` instructions in SSA to create a single definition for a variable that might be assigned on multiple
+ * control flow paths. The `Phi` instruction merges the potential values of that variable from each predecessor edge,
+ * and the resulting definition is then used wherever that variable is accessed later on.
+ *
+ * When dealing with aliased memory, we use the `Chi` instruction to create a single definition for memory that might
+ * or might not have been updated by a store, depending on the actual address that was written to. For example, take:
+ *
+ * ```cpp
+ * int x = 5;
+ * int y = 7;
+ * int* p = condition ? &x : &y;
+ * *p = 6;
+ * return x;
+ * ```
+ *
+ * At the point where we store to `*p`, we do not know whether `p` points to `x` or `y`. Thus, we do not know whether
+ * `return x;` is going to return the value that `x` was originally initialized to (5), or whether it will return 6,
+ * because it was overwritten by `*p = 6;`. We insert a `Chi` instruction immediately after the store to `*p`:
+ *
+ * ```
+ * r2(int) = Constant[6]
+ * r3(int*) = <<value of p>>
+ * m4(int) = Store : &r3, r2 // Stores the constant 6 to *p
+ * m5(unknown) = Chi : total:m1, partial:m4
+ * ```
+ * The `partial:` operand represents the memory that was just stored. The `total:` operand represents the previous
+ * contents of all of the memory that `p` might have pointed to (in this case, both `x` and `y`). The result of the
+ * `Chi` represents the new contents of whatever memory the `total:` operand referred to. We usually do not know exactly
+ * which parts of that memory were overwritten, but the `Chi` does model that any of that memory could have been
+ * modified, so that later instructions do not assume that the memory was unchanged.
  */

private import internal.IRInternal
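
The anatomy described in the comment above (source-line prefix, result kind and ID, result type, opcode, bracketed information, operands) can be sketched as a small parser. This is only an illustration, not part of CodeQL: `PrintedInstruction`, `splitOperands`, and `parseInstruction` are hypothetical names, and the regular expression assumes the simple single-line shape of the two sample instructions in the comment (for example, a result type containing `)` would not match).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical record type for illustration; not part of CodeQL.
struct PrintedInstruction {
  int sourceLine = 0;                 // from the `# 2281|` prefix
  char resultKind = 0;                // 'r', 'm', or 'v'
  std::string resultId;               // e.g. "2281_19"
  std::string resultType;             // e.g. "void"
  std::string opcode;                 // e.g. "Call"
  std::string annotation;             // text between [ and ], may be empty
  std::vector<std::string> operands;  // e.g. "func:r2281_18"
};

// Splits "a, b" into {"a", "b"}, trimming spaces around each item.
std::vector<std::string> splitOperands(const std::string& text) {
  std::vector<std::string> items;
  std::string::size_type start = 0;
  while (start < text.size()) {
    std::string::size_type comma = text.find(',', start);
    std::string item = text.substr(
        start, comma == std::string::npos ? std::string::npos : comma - start);
    std::string::size_type first = item.find_first_not_of(' ');
    if (first != std::string::npos) {
      std::string::size_type last = item.find_last_not_of(' ');
      items.push_back(item.substr(first, last - first + 1));
    }
    if (comma == std::string::npos) break;
    start = comma + 1;
  }
  return items;
}

// Returns true and fills `out` if `line` looks like a printed instruction.
// Edge lines such as `#-----| False -> Block 4` are rejected.
bool parseInstruction(const std::string& line, PrintedInstruction& out) {
  static const std::regex pattern(
      R"(#\s*(\d+)\|\s*([rmv])(\w+)\(([^)]*)\)\s*=\s*(\w+)(?:\[([^\]]*)\])?\s*(?::\s*(.*))?)");
  std::smatch m;
  if (!std::regex_match(line, m, pattern)) return false;
  out.sourceLine = std::stoi(m[1].str());
  out.resultKind = m[2].str()[0];
  out.resultId = m[3].str();
  out.resultType = m[4].str();
  out.opcode = m[5].str();
  out.annotation = m[6].str();
  out.operands = splitOperands(m[7].str());
  return true;
}
```

Applied to the `Call[~String]` line from the comment, this sketch separates the `v` kind from the `2281_19` ID and keeps the `func:` and `this:` prefixes attached to their operands, which mirrors how the comment reads the line from left to right.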

cpp/ql/lib/semmle/code/cpp/ir/implementation/raw/PrintIR.qll

Lines changed: 106 additions & 0 deletions
@@ -6,6 +6,112 @@
  * uses, however, it is better to write a query that imports `PrintIR.qll`, extends
  * `PrintIRConfiguration`, and overrides `shouldPrintDeclaration()` to select a subset of declarations
  * to dump.
+ *
+ * Anatomy of a printed IR instruction
+ *
+ * An instruction:
+ *
+ * ```
+ * # 2281| v2281_19(void) = Call[~String] : func:r2281_18, this:r2281_17
+ * ```
+ *
+ * The prefix `# 2281|` specifies that this instruction was generated by the C++ source code on line 2281.
+ * Scrolling up in the printed output, one will eventually find the name of the file to which the line
+ * belongs.
+ *
+ * `v2281_19(void)` is the result of the instruction. Here, `v` means this is a void result or operand (so
+ * there should be no later uses of the result; see below for other possible values). The `2281_19` is a
+ * unique ID for the result. This is usually just the line number plus a small integer suffix to make it
+ * unique within the function. The `(void)` suffix is the type of the result; it is `void` here because
+ * `~String` returns `void`. The type of the result is usually just the name of the appropriate C++ type,
+ * but it will sometimes be a type like `glval<int>`, which means the result holds a glvalue, which at the
+ * IR level works like a pointer. In other words, in the source code the type was `int`, but it is really
+ * more like an `int*`. We see this, for example, in `x = y;`, where `x` is a glvalue.
+ *
+ * `Call` is the opcode of the instruction. Common opcodes include:
+ *
+ * * Arithmetic operations: `Add`, `Sub`, `Mul`, etc.
+ * * Memory access operations: `Load`, `Store`.
+ * * Function calls: `Call`.
+ * * Literals: `Constant`.
+ * * Variable addresses: `VariableAddress`.
+ * * Function entry points: `EnterFunction`.
+ * * Return from a function: `Return`, `ReturnVoid`. Note that the value being returned is set separately by a
+ *   `Store` to a special `#return` variable.
+ * * Stack unwinding for C++ functions that throw and where the exception escapes the function: `Unwind`.
+ * * Common exit point for `Unwind` and `Return`: `ExitFunction`.
+ * * SSA-related opcodes: `Phi`, `Chi`.
+ *
+ * `[~String]` denotes additional information. The information might be present earlier in the IR, as is the case
+ * for `Call`, where it is the name of the called function. This is also the case for `Load` and `Store`, where it
+ * is the name of the variable that is loaded or stored (if known). In the case of `Constant`, `FieldAddress`, and
+ * `VariableAddress`, the information between brackets does not occur earlier.
+ *
+ * `func:r2281_18` and `this:r2281_17` are the operands of the instruction. The `func:` prefix denotes the operand
+ * that holds the address of the called function. The `this:` prefix denotes the argument to the special `this`
+ * parameter of an instance member function. `r2281_18` and `r2281_17` are the unique IDs of the operands. Each of these
+ * matches the ID of a previously seen result, showing where that value came from. The `r` means that these are
+ * "register" operands (see below).
+ *
+ * Result and operand kinds:
+ *
+ * Every result and operand is one of these three kinds:
+ *
+ * * `r` "register". These operands are not stored in any particular memory location. We can think of them as
+ *   temporary values created during the evaluation of an expression. A register operand almost always has one
+ *   use, often in the same block as its definition.
+ * * `m` "memory". These operands represent accesses to a specific memory location. The location could be a
+ *   local variable, a global variable, a field of an object, an element of an array, or any memory that we happen
+ *   to have a pointer to. These only occur as the result of a `Store`, the source operand of a `Load`, or on the
+ *   SSA instructions (`Phi`, `Chi`).
+ * * `v` "void". Really just a register operand, but we mark register operands of type void with this special prefix
+ *   so we know that there is no actual value there.
+ *
+ * Branches in the IR:
+ *
+ * The IR is divided into basic blocks. At the end of each block, there are one or more edges showing the possible
+ * control flow successors of the block.
+ *
+ * ```
+ * # 44| v44_3(void) = ConditionalBranch : r44_2
+ * #-----| False -> Block 4
+ * #-----| True -> Block 3
+ * ```
+ * Here we have a block that ends with a conditional branch. The two edges show where the control flows to depending
+ * on whether the condition is true or false.
+ *
+ * SSA instructions:
+ *
+ * We use `Phi` instructions in SSA to create a single definition for a variable that might be assigned on multiple
+ * control flow paths. The `Phi` instruction merges the potential values of that variable from each predecessor edge,
+ * and the resulting definition is then used wherever that variable is accessed later on.
+ *
+ * When dealing with aliased memory, we use the `Chi` instruction to create a single definition for memory that might
+ * or might not have been updated by a store, depending on the actual address that was written to. For example, take:
+ *
+ * ```cpp
+ * int x = 5;
+ * int y = 7;
+ * int* p = condition ? &x : &y;
+ * *p = 6;
+ * return x;
+ * ```
+ *
+ * At the point where we store to `*p`, we do not know whether `p` points to `x` or `y`. Thus, we do not know whether
+ * `return x;` is going to return the value that `x` was originally initialized to (5), or whether it will return 6,
+ * because it was overwritten by `*p = 6;`. We insert a `Chi` instruction immediately after the store to `*p`:
+ *
+ * ```
+ * r2(int) = Constant[6]
+ * r3(int*) = <<value of p>>
+ * m4(int) = Store : &r3, r2 // Stores the constant 6 to *p
+ * m5(unknown) = Chi : total:m1, partial:m4
+ * ```
+ * The `partial:` operand represents the memory that was just stored. The `total:` operand represents the previous
+ * contents of all of the memory that `p` might have pointed to (in this case, both `x` and `y`). The result of the
+ * `Chi` represents the new contents of whatever memory the `total:` operand referred to. We usually do not know exactly
+ * which parts of that memory were overwritten, but the `Chi` does model that any of that memory could have been
+ * modified, so that later instructions do not assume that the memory was unchanged.
  */

private import internal.IRInternal
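
The `Phi` paragraph in the comment above describes a variable that is assigned on more than one control flow path. A minimal C++ source sketch of that situation follows. The function name `pick` is made up for illustration, and the IR that would actually be printed for it is not shown here because it is not given in the commit.

```cpp
// Hypothetical example function; not taken from the CodeQL test suite.
int pick(bool condition) {
  int x;
  if (condition) {
    x = 1;  // definition of x on the "true" path
  } else {
    x = 2;  // definition of x on the "false" path
  }
  // Both paths join here. In SSA form, a Phi instruction at the start of the
  // joining block would merge the two definitions of x into a single one,
  // and the read of x below would use that merged definition.
  return x;
}
```

Each branch contributes one definition of `x`, so the block containing `return x;` has two predecessor edges, which is the case the comment says `Phi` exists to handle.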

cpp/ql/lib/semmle/code/cpp/ir/implementation/unaliased_ssa/PrintIR.qll

Lines changed: 106 additions & 0 deletions
@@ -6,6 +6,112 @@
  * uses, however, it is better to write a query that imports `PrintIR.qll`, extends
  * `PrintIRConfiguration`, and overrides `shouldPrintDeclaration()` to select a subset of declarations
  * to dump.
+ *
+ * Anatomy of a printed IR instruction
+ *
+ * An instruction:
+ *
+ * ```
+ * # 2281| v2281_19(void) = Call[~String] : func:r2281_18, this:r2281_17
+ * ```
+ *
+ * The prefix `# 2281|` specifies that this instruction was generated by the C++ source code on line 2281.
+ * Scrolling up in the printed output, one will eventually find the name of the file to which the line
+ * belongs.
+ *
+ * `v2281_19(void)` is the result of the instruction. Here, `v` means this is a void result or operand (so
+ * there should be no later uses of the result; see below for other possible values). The `2281_19` is a
+ * unique ID for the result. This is usually just the line number plus a small integer suffix to make it
+ * unique within the function. The `(void)` suffix is the type of the result; it is `void` here because
+ * `~String` returns `void`. The type of the result is usually just the name of the appropriate C++ type,
+ * but it will sometimes be a type like `glval<int>`, which means the result holds a glvalue, which at the
+ * IR level works like a pointer. In other words, in the source code the type was `int`, but it is really
+ * more like an `int*`. We see this, for example, in `x = y;`, where `x` is a glvalue.
+ *
+ * `Call` is the opcode of the instruction. Common opcodes include:
+ *
+ * * Arithmetic operations: `Add`, `Sub`, `Mul`, etc.
+ * * Memory access operations: `Load`, `Store`.
+ * * Function calls: `Call`.
+ * * Literals: `Constant`.
+ * * Variable addresses: `VariableAddress`.
+ * * Function entry points: `EnterFunction`.
+ * * Return from a function: `Return`, `ReturnVoid`. Note that the value being returned is set separately by a
+ *   `Store` to a special `#return` variable.
+ * * Stack unwinding for C++ functions that throw and where the exception escapes the function: `Unwind`.
+ * * Common exit point for `Unwind` and `Return`: `ExitFunction`.
+ * * SSA-related opcodes: `Phi`, `Chi`.
+ *
+ * `[~String]` denotes additional information. The information might be present earlier in the IR, as is the case
+ * for `Call`, where it is the name of the called function. This is also the case for `Load` and `Store`, where it
+ * is the name of the variable that is loaded or stored (if known). In the case of `Constant`, `FieldAddress`, and
+ * `VariableAddress`, the information between brackets does not occur earlier.
+ *
+ * `func:r2281_18` and `this:r2281_17` are the operands of the instruction. The `func:` prefix denotes the operand
+ * that holds the address of the called function. The `this:` prefix denotes the argument to the special `this`
+ * parameter of an instance member function. `r2281_18` and `r2281_17` are the unique IDs of the operands. Each of these
+ * matches the ID of a previously seen result, showing where that value came from. The `r` means that these are
+ * "register" operands (see below).
+ *
+ * Result and operand kinds:
+ *
+ * Every result and operand is one of these three kinds:
+ *
+ * * `r` "register". These operands are not stored in any particular memory location. We can think of them as
+ *   temporary values created during the evaluation of an expression. A register operand almost always has one
+ *   use, often in the same block as its definition.
+ * * `m` "memory". These operands represent accesses to a specific memory location. The location could be a
+ *   local variable, a global variable, a field of an object, an element of an array, or any memory that we happen
+ *   to have a pointer to. These only occur as the result of a `Store`, the source operand of a `Load`, or on the
+ *   SSA instructions (`Phi`, `Chi`).
+ * * `v` "void". Really just a register operand, but we mark register operands of type void with this special prefix
+ *   so we know that there is no actual value there.
+ *
+ * Branches in the IR:
+ *
+ * The IR is divided into basic blocks. At the end of each block, there are one or more edges showing the possible
+ * control flow successors of the block.
+ *
+ * ```
+ * # 44| v44_3(void) = ConditionalBranch : r44_2
+ * #-----| False -> Block 4
+ * #-----| True -> Block 3
+ * ```
+ * Here we have a block that ends with a conditional branch. The two edges show where the control flows to depending
+ * on whether the condition is true or false.
+ *
+ * SSA instructions:
+ *
+ * We use `Phi` instructions in SSA to create a single definition for a variable that might be assigned on multiple
+ * control flow paths. The `Phi` instruction merges the potential values of that variable from each predecessor edge,
+ * and the resulting definition is then used wherever that variable is accessed later on.
+ *
+ * When dealing with aliased memory, we use the `Chi` instruction to create a single definition for memory that might
+ * or might not have been updated by a store, depending on the actual address that was written to. For example, take:
+ *
+ * ```cpp
+ * int x = 5;
+ * int y = 7;
+ * int* p = condition ? &x : &y;
+ * *p = 6;
+ * return x;
+ * ```
+ *
+ * At the point where we store to `*p`, we do not know whether `p` points to `x` or `y`. Thus, we do not know whether
+ * `return x;` is going to return the value that `x` was originally initialized to (5), or whether it will return 6,
+ * because it was overwritten by `*p = 6;`. We insert a `Chi` instruction immediately after the store to `*p`:
+ *
+ * ```
+ * r2(int) = Constant[6]
+ * r3(int*) = <<value of p>>
+ * m4(int) = Store : &r3, r2 // Stores the constant 6 to *p
+ * m5(unknown) = Chi : total:m1, partial:m4
+ * ```
+ * The `partial:` operand represents the memory that was just stored. The `total:` operand represents the previous
+ * contents of all of the memory that `p` might have pointed to (in this case, both `x` and `y`). The result of the
+ * `Chi` represents the new contents of whatever memory the `total:` operand referred to. We usually do not know exactly
+ * which parts of that memory were overwritten, but the `Chi` does model that any of that memory could have been
+ * modified, so that later instructions do not assume that the memory was unchanged.
  */

private import internal.IRInternal
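
The `Chi` example in the comment above is a fragment rather than a complete function. Wrapping it in a function makes the two possible outcomes concrete. The name `storeThroughAlias` and the `condition` parameter are assumptions added only so that the fragment compiles.

```cpp
// The fragment from the comment, wrapped in a hypothetical function.
int storeThroughAlias(bool condition) {
  int x = 5;
  int y = 7;
  int* p = condition ? &x : &y;  // p may point to either x or y
  *p = 6;                        // overwrites x or y, depending on condition
  return x;                      // 6 if p pointed to x, otherwise still 5
}
```

Both results are reachable at run time, which is why a static analysis cannot treat the memory of `x` as either definitely changed or definitely unchanged after the store, and why the comment describes the `Chi` result as covering all memory that `p` might have pointed to.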
