Commit 4aac70d

Dataflow: update doc based on review.
1 parent 4dabbac commit 4aac70d

1 file changed

+35
-32
lines changed

docs/ql-libraries/dataflow/dataflow.md

Lines changed: 35 additions & 32 deletions
@@ -1,6 +1,6 @@
11
# Using the shared data-flow library
22

3-
This document is aimed towards language maintainers and contain implementation
3+
This document is aimed towards language maintainers and contains implementation
44
details that should be mostly irrelevant to query writers.
55

66
## Overview
@@ -40,9 +40,10 @@ module DataFlow {
4040
The `DataFlowImpl.qll` and `DataFlowCommon.qll` files contain the library code
4141
that is shared across languages. These contain `Configuration`-specific and
4242
`Configuration`-independent code, respectively. This organization allows
43-
multiple copies of the library (for the use case when a query wants to use two
44-
instances of global data flow and the configuration of one depends on the
45-
results from the other). Using multiple copies just means duplicating
43+
multiple copies of the library to exist without duplicating the
44+
`Configuration`-independent predicates (for the use case when a query wants to
45+
use two instances of global data flow and the configuration of one depends on
46+
the results from the other). Using multiple copies just means duplicating
4647
`DataFlow.qll` and `DataFlowImpl.qll`, for example as:
4748

4849
```
@@ -52,9 +53,9 @@ dataflow/internal/DataFlowImpl2.qll
5253
dataflow/internal/DataFlowImpl3.qll
5354
```
5455

55-
The `DataFlowImplSpecific.qll` provides all the language-specific classes and
56-
predicates that the library needs as input and is the topic of the rest of this
57-
document.
56+
The file `DataFlowImplSpecific.qll` provides all the language-specific classes
57+
and predicates that the library needs as input and is the topic of the rest of
58+
this document.
5859

5960
This file must provide two modules named `Public` and `Private`, which the
6061
shared library code will import publicly and privately, respectively, thus
@@ -88,7 +89,9 @@ Recommendations:
8889
* Define `predicate localFlowStep(Node node1, Node node2)` as an alias of
8990
`simpleLocalFlowStep` and expose it publicly. The reason for this indirection
9091
is that it gives the option of exposing local flow augmented with field flow.
91-
See the C/C++ implementation, which makes use of this feature.
92+
See the C/C++ implementation, which makes use of this feature. Another use of
93+
this indirection is to hide synthesized local steps that are only relevant
94+
for global flow. See the C# implementation for an example of this.
9295
* Define `predicate localFlow(Node node1, Node node2) { localFlowStep*(node1, node2) }`.
9396
* Make the local flow step relation in `simpleLocalFlowStep` follow
9497
def-to-first-use and use-to-next-use steps for SSA variables. Def-use steps
@@ -141,8 +144,9 @@ must be provided.
141144
First, two types, `DataFlowCall` and `DataFlowCallable`, must be defined. These
142145
should be aliases for whatever language-specific class represents calls and
143146
callables (a "callable" is intended as a broad term covering functions,
144-
methods, constructors, lambdas, etc.). The call-graph should be defined as a
145-
predicate:
147+
methods, constructors, lambdas, etc.). It can also be useful to represent
148+
`DataFlowCall` as an IPA type if implicit calls need to be modelled. The
149+
call-graph should be defined as a predicate:
146150
```ql
147151
DataFlowCallable viableCallable(DataFlowCall c)
148152
```
@@ -182,7 +186,7 @@ corresponding `OutNode`s.
182186

183187
Flow steps through global variables are called jump-steps, since such flow steps
184188
essentially jump from one callable to another completely discarding call
185-
context.
189+
contexts.
186190

187191
Adding support for this type of flow is done with the following predicate:
188192
```ql
@@ -206,10 +210,12 @@ as described above.
206210

207211
The library supports tracking flow through field stores and reads. In order to
208212
support this, a class `Content` and two predicates
209-
`storeStep(Node node1, Content f, PostUpdateNode node2)` and
210-
`readStep(Node node1, Content f, Node node2)` must be defined. Besides this,
211-
certain nodes must have associated `PostUpdateNode`s. The node associated with
212-
a `PostUpdateNode` should be defined by `PostUpdateNode::getPreUpdateNode()`.
213+
`storeStep(Node node1, Content f, Node node2)` and
214+
`readStep(Node node1, Content f, Node node2)` must be defined. It generally
215+
makes sense for stores to target `PostUpdateNode`s, but this is not a strict
216+
requirement. Besides this, certain nodes must have associated
217+
`PostUpdateNode`s. The node associated with a `PostUpdateNode` should be
218+
defined by `PostUpdateNode::getPreUpdateNode()`.
213219

214220
`PostUpdateNode`s are generally used when we need two data-flow nodes for a
215221
single AST element in order to distinguish the value before and after some
@@ -351,30 +357,27 @@ otherwise be equivalent with respect to compatibility can then be represented
351357
as a single entity (this improves performance). As an example, Java uses erased
352358
types for this purpose and a single equivalence class for all numeric types.
353359

354-
One also needs to define
360+
The type of a `Node` is given by the following predicate
361+
```
362+
DataFlowType getNodeType(Node n)
363+
```
364+
and every `Node` should have a type.
365+
366+
One also needs to define the string representation of a `DataFlowType`:
355367
```
356-
Type Node::getType()
357-
Type Node::getTypeBound()
358-
DataFlowType getErasedRepr(Type t)
359368
string ppReprType(DataFlowType t)
360369
```
361-
where `Type` can be a language-specific name for the types native to the
362-
language. Of the member predicate `Node::getType()` and `Node::getTypeBound()`
363-
only the latter is used by the library, but the former is usually nice to have
364-
if it makes sense for the language. The `getErasedRepr` predicate acts as the
365-
translation between regular types and the type system used for pruning, the
366-
shared library will use `getErasedRepr(node.getTypeBound())` to get the
367-
`DataFlowType` for a node. The `ppReprType` predicate is used for printing a
368-
type in the labels of `PathNode`s, this can be defined as `none()` if type
369-
pruning is not used.
370+
The `ppReprType` predicate is used for printing a type in the labels of
371+
`PathNode`s, this can be defined as `none()` if type pruning is not used.
370372

371373
Finally, one must define `CastNode` as a subclass of `Node` as those nodes
372374
where types should be checked. Usually this will be things like explicit casts.
373375
The shared library will also check types at `ParameterNode`s and `OutNode`s
374376
without needing to include these in `CastNode`. It is semantically perfectly
375377
valid to include all nodes in `CastNode`, but this can hurt performance as it
376378
will reduce the opportunity for the library to compact several local steps into
377-
one.
379+
one. It is also perfectly valid to leave `CastNode` as the empty set, and this
380+
should be the default if type pruning is not used.
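
For a language that does not use type pruning, the type-related predicates touched by this hunk can collapse to a single dummy type. The following is a minimal sketch under that assumption, not a definitive implementation: `TAnyType` is a hypothetical name, and `Node` is assumed to be the language's existing data-flow node class.

```ql
// Hypothetical single-element IPA type: every node gets the same DataFlowType,
// so type pruning never rules anything out.
private newtype TDataFlowType = TAnyType()

class DataFlowType extends TDataFlowType {
  string toString() { result = "" }
}

// Every `Node` should have a type; here they all share the one dummy type.
DataFlowType getNodeType(Node n) { result = TAnyType() and exists(n) }

// No type labels are printed on `PathNode`s when type pruning is unused.
string ppReprType(DataFlowType t) { none() }

// `CastNode` left as the empty set, as suggested for this case.
class CastNode extends Node {
  CastNode() { none() }
}
```

A language that does prune on types would instead map each node to a meaningful equivalence class in `getNodeType` and list explicit casts in `CastNode`.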
378381

379382
## Virtual dispatch with call context
380383

@@ -424,9 +427,9 @@ that can be tracked. This is given by the following predicate:
424427
```ql
425428
int accessPathLimit() { result = 5 }
426429
```
427-
We have traditionally used 5 as a default value here, as we have yet to observe
428-
the need for this much field nesting. Changing this value has a direct impact
429-
on performance for large databases.
430+
We have traditionally used 5 as a default value here, and real examples have
431+
been observed to require at least this much. Changing this value has a direct
432+
impact on performance for large databases.
430433

431434
### Hidden nodes
432435
