# Using the shared data-flow library

- This document is aimed towards language maintainers and contain implementation
+ This document is aimed towards language maintainers and contains implementation
details that should be mostly irrelevant to query writers.

## Overview
@@ -40,9 +40,10 @@ module DataFlow {
The `DataFlowImpl.qll` and `DataFlowCommon.qll` files contain the library code
that is shared across languages. These contain `Configuration`-specific and
`Configuration`-independent code, respectively. This organization allows
- multiple copies of the library (for the use case when a query wants to use two
- instances of global data flow and the configuration of one depends on the
- results from the other). Using multiple copies just means duplicating
+ multiple copies of the library to exist without duplicating the
+ `Configuration`-independent predicates (for the use case when a query wants to
+ use two instances of global data flow and the configuration of one depends on
+ the results from the other). Using multiple copies just means duplicating
`DataFlow.qll` and `DataFlowImpl.qll`, for example as:

```
@@ -52,9 +53,9 @@ dataflow/internal/DataFlowImpl2.qll
dataflow/internal/DataFlowImpl3.qll
```

- The `DataFlowImplSpecific.qll` provides all the language-specific classes and
- predicates that the library needs as input and is the topic of the rest of this
- document.
+ The file `DataFlowImplSpecific.qll` provides all the language-specific classes
+ and predicates that the library needs as input and is the topic of the rest of
+ this document.

This file must provide two modules named `Public` and `Private`, which the
shared library code will import publicly and privately, respectively, thus
@@ -88,7 +89,9 @@ Recommendations:
* Define `predicate localFlowStep(Node node1, Node node2)` as an alias of
`simpleLocalFlowStep` and expose it publicly. The reason for this indirection
is that it gives the option of exposing local flow augmented with field flow.
- See the C/C++ implementation, which makes use of this feature.
+ See the C/C++ implementation, which makes use of this feature. Another use of
+ this indirection is to hide synthesized local steps that are only relevant
+ for global flow. See the C# implementation for an example of this.
* Define `predicate localFlow(Node node1, Node node2) { localFlowStep*(node1, node2) }`.
* Make the local flow step relation in `simpleLocalFlowStep` follow
def-to-first-use and use-to-next-use steps for SSA variables. Def-use steps
@@ -141,8 +144,9 @@ must be provided.

First, two types, `DataFlowCall` and `DataFlowCallable`, must be defined. These
should be aliases for whatever language-specific class represents calls and
callables (a "callable" is intended as a broad term covering functions,
- methods, constructors, lambdas, etc.). The call-graph should be defined as a
- predicate:
+ methods, constructors, lambdas, etc.). It can also be useful to represent
+ `DataFlowCall` as an IPA type if implicit calls need to be modelled. The
+ call-graph should be defined as a predicate:
```ql
DataFlowCallable viableCallable(DataFlowCall c)
```
@@ -182,7 +186,7 @@ corresponding `OutNode`s.

Flow through global variables are called jump-steps, since such flow steps
essentially jump from one callable to another completely discarding call
- context.
+ contexts.

Adding support for this type of flow is done with the following predicate:
```ql
@@ -206,10 +210,12 @@ as described above.

The library supports tracking flow through field stores and reads. In order to
support this, a class `Content` and two predicates
- `storeStep(Node node1, Content f, PostUpdateNode node2)` and
- `readStep(Node node1, Content f, Node node2)` must be defined. Besides this,
- certain nodes must have associated `PostUpdateNode`s. The node associated with
- a `PostUpdateNode` should be defined by `PostUpdateNode::getPreUpdateNode()`.
+ `storeStep(Node node1, Content f, Node node2)` and
+ `readStep(Node node1, Content f, Node node2)` must be defined. It generally
+ makes sense for stores to target `PostUpdateNode`s, but this is not a strict
+ requirement. Besides this, certain nodes must have associated
+ `PostUpdateNode`s. The node associated with a `PostUpdateNode` should be
+ defined by `PostUpdateNode::getPreUpdateNode()`.

`PostUpdateNode`s are generally used when we need two data-flow nodes for a
single AST element in order to distinguish the value before and after some
@@ -351,30 +357,27 @@ otherwise be equivalent with respect to compatibility can then be represented
as a single entity (this improves performance). As an example, Java uses erased
types for this purpose and a single equivalence class for all numeric types.

- One also needs to define
+ The type of a `Node` is given by the following predicate
+ ```
+ DataFlowType getNodeType(Node n)
+ ```
+ and every `Node` should have a type.
+
+ One also needs to define the string representation of a `DataFlowType`:
```
- Type Node::getType()
- Type Node::getTypeBound()
- DataFlowType getErasedRepr(Type t)
string ppReprType(DataFlowType t)
```
- where `Type` can be a language-specific name for the types native to the
- language. Of the member predicate `Node::getType()` and `Node::getTypeBound()`
- only the latter is used by the library, but the former is usually nice to have
- if it makes sense for the language. The `getErasedRepr` predicate acts as the
- translation between regular types and the type system used for pruning, the
- shared library will use `getErasedRepr(node.getTypeBound())` to get the
- `DataFlowType` for a node. The `ppReprType` predicate is used for printing a
- type in the labels of `PathNode`s, this can be defined as `none()` if type
- pruning is not used.
+ The `ppReprType` predicate is used for printing a type in the labels of
+ `PathNode`s, this can be defined as `none()` if type pruning is not used.

Finally, one must define `CastNode` as a subclass of `Node` as those nodes
where types should be checked. Usually this will be things like explicit casts.
The shared library will also check types at `ParameterNode`s and `OutNode`s
without needing to include these in `CastNode`. It is semantically perfectly
valid to include all nodes in `CastNode`, but this can hurt performance as it
will reduce the opportunity for the library to compact several local steps into
- one.
+ one. It is also perfectly valid to leave `CastNode` as the empty set, and this
+ should be the default if type pruning is not used.

## Virtual dispatch with call context

@@ -424,9 +427,9 @@ that can be tracked. This is given by the following predicate:
```ql
int accessPathLimit() { result = 5 }
```
- We have traditionally used 5 as a default value here, as we have yet to observe
- the need for this much field nesting. Changing this value has a direct impact
- on performance for large databases.
+ We have traditionally used 5 as a default value here, and real examples have
+ been observed to require at least this much. Changing this value has a direct
+ impact on performance for large databases.

### Hidden nodes