@@ -125,7 +125,7 @@ For further details about `PostUpdateNode` see [Field flow](#field-flow) below.
125
125
126
126
Nodes corresponding to expressions and parameters are the most common for users
127
127
to interact with so a couple of convenience predicates are generally included:
128
- ```
128
+ ``` ql
129
129
DataFlowExpr Node::asExpr()
130
130
Parameter Node::asParameter()
131
131
ExprNode exprNode(DataFlowExpr n)
@@ -266,10 +266,23 @@ as described above.
266
266
## Field flow
267
267
268
268
The library supports tracking flow through field stores and reads. In order to
269
- support this, a class ` Content ` and two predicates
270
- ` storeStep(Node node1, Content f, Node node2) ` and
271
- ` readStep(Node node1, Content f, Node node2) ` must be defined. It generally
272
- makes sense for stores to target ` PostUpdateNode ` s, but this is not a strict
269
+ support this, two classes ` ContentSet ` and ` Content ` , and two predicates
270
+ ` storeStep(Node node1, ContentSet f, Node node2) ` and
271
+ ` readStep(Node node1, ContentSet f, Node node2) ` , must be defined. The interaction
272
+ between ` ContentSet ` and ` Content ` is defined through
273
+
274
+ ``` ql
275
+ Content ContentSet::getAStoreContent();
276
+ Content ContentSet::getAReadContent();
277
+ ```
278
+
279
+ which means that a ` storeStep(n1, cs, n2) ` will be interpreted as storing into _ any_
280
+ of ` cs.getAStoreContent() ` , and dually that a ` readStep(n1, cs, n2) ` will be
281
+ interpreted as reading from _ any_ of ` cs.getAReadContent() ` . In most cases, there
282
+ will be a simple bijection between ` ContentSet ` and ` Content ` , but when modelling
283
+ for example flow through arrays it can be more involved (see [ Example 4] ( #example-4 ) ).
284
+
285
+ It generally makes sense for stores to target ` PostUpdateNode ` s, but this is not a strict
273
286
requirement. Besides this, certain nodes must have associated
274
287
` PostUpdateNode ` s. The node associated with a ` PostUpdateNode ` should be
275
288
defined by ` PostUpdateNode::getPreUpdateNode() ` .
@@ -294,7 +307,7 @@ through a couple of examples.
294
307
### Example 1
295
308
296
309
Consider the following setter and its call:
297
- ```
310
+ ``` java
298
311
setFoo(obj, x) {
299
312
sink1(obj. foo);
300
313
obj. foo = x;
@@ -335,12 +348,12 @@ call sites.
335
348
In the following two lines we would like flow from ` x ` to reach the
336
349
` PostUpdateNode ` of ` a ` through a sequence of two store steps, and this is
337
350
indeed handled automatically by the shared library.
338
- ```
351
+ ``` java
339
352
a. b. c = x;
340
353
a. getB(). c = x;
341
354
```
342
355
The only requirement for this to work is the existence of ` PostUpdateNode ` s.
343
- For a specified read step (in ` readStep(Node n1, Content f , Node n2) ` ) the
356
+ For a specified read step (in ` readStep(Node n1, ContentSet c , Node n2) ` ) the
344
357
shared library will generate a store step in the reverse direction between the
345
358
corresponding ` PostUpdateNode ` s. A similar store-through-reverse-read will be
346
359
generated for calls that can be summarized by the shared library as getters.
@@ -370,33 +383,132 @@ itself as this represents the value of the object after construction, that is
370
383
after the constructor has run. With this setup of ` ArgumentNode ` s and
371
384
` PostUpdateNode ` s we will achieve the desired flow from ` source ` to ` sink `
372
385
386
+ ### Example 4
387
+
388
+ Assume we want to track flow through arrays precisely:
389
+
390
+ ``` rb
391
+ a[0 ] = tainted
392
+ a[1 ] = not_tainted
393
+ sink(a[0 ]) # bad
394
+ sink(a[1 ]) # good
395
+ sink(a[unknown]) # bad; unknown may be 0
396
+
397
+ b[unknown] = tainted
398
+ sink(b[0 ]) # bad; unknown may be 0
399
+
400
+ c[unknown][0 ] = tainted
401
+ c[unknown][1 ] = not_tainted
402
+ sink(c[0 ][0 ]) # bad; unknown may be 0
403
+ sink(c[0 ][1 ]) # good
404
+ ```
405
+
406
+ This can be done by defining
407
+
408
+ ``` ql
409
+ newtype TContent =
410
+ TKnownArrayElementContent(int i) { i in [0 .. 10] } or
411
+ TUnknownArrayElementContent()
412
+
413
+ class Content extends TContent {
414
+ ...
415
+ }
416
+
417
+ newtype TContentSet =
418
+ TSingletonContent(Content c) or
419
+ TAnyArrayElementContent()
420
+
421
+ class ContentSet extends TContentSet {
422
+ Content getAStoreContent() {
423
+ this = TSingletonContent(result)
424
+ or
425
+ // for reverse stores
426
+ this = TAnyArrayElementContent() and
427
+ result = TUnknownArrayElementContent()
428
+ }
429
+
430
+ Content getAReadContent() {
431
+ this = TSingletonContent(result)
432
+ or
433
+ this = TAnyArrayElementContent() and
434
+ (result = TUnknownArrayElementContent() or result = TKnownArrayElementContent(_))
435
+ }
436
+ }
437
+ ```
438
+
439
+ and we will have the following store/read steps
440
+ ``` rb
441
+ # storeStep(tainted, TSingletonContent(TKnownArrayElementContent(0)), [post update] a)
442
+ a[0 ] = tainted
443
+
444
+ # storeStep(not_tainted, TSingletonContent(TKnownArrayElementContent(1)), [post update] a)
445
+ a[1 ] = not_tainted
446
+
447
+ # readStep(a, TSingletonContent(TKnownArrayElementContent(0)), a[0])
448
+ # readStep(a, TSingletonContent(TUnknownArrayElementContent()), a[0])
449
+ sink(a[0 ]) # bad
450
+
451
+ # readStep(a, TSingletonContent(TKnownArrayElementContent(1)), a[1])
452
+ # readStep(a, TSingletonContent(TUnknownArrayElementContent()), a[1])
453
+ sink(a[1 ]) # good
454
+
455
+ # readStep(a, TAnyArrayElementContent(), a[unknown])
456
+ sink(a[unknown]) # bad; unknown may be 0
457
+
458
+ # storeStep(tainted, TSingletonContent(TUnknownArrayElementContent()), [post update] b)
459
+ b[unknown] = tainted
460
+
461
+ # readStep(b, TSingletonContent(TKnownArrayElementContent(0)), b[0])
462
+ # readStep(b, TSingletonContent(TUnknownArrayElementContent()), b[0])
463
+ sink(b[0 ]) # bad; unknown may be 0
464
+
465
+ # storeStep(tainted, TSingletonContent(TKnownArrayElementContent(0)), [post update] c[unknown])
466
+ # storeStep(not_tainted, TSingletonContent(TKnownArrayElementContent(1)), [post update] c[unknown])
467
+ # readStep(c, TAnyArrayElementContent(), c[unknown])
468
+ # storeStep([post update] c[unknown], TAnyArrayElementContent(), [post update] c) # auto-generated reverse store (see Example 2)
469
+ c[unknown][0 ] = tainted
470
+ c[unknown][1 ] = not_tainted
471
+
472
+
473
+ # readStep(c[0], TSingletonContent(TKnownArrayElementContent(0)), c[0][0])
474
+ # readStep(c[0], TSingletonContent(TUnknownArrayElementContent()), c[0][0])
475
+ # readStep(c[0], TSingletonContent(TKnownArrayElementContent(1)), c[0][1])
476
+ # readStep(c[0], TSingletonContent(TUnknownArrayElementContent()), c[0][1])
477
+ # readStep(c, TSingletonContent(TKnownArrayElementContent(0)), c[0])
478
+ # readStep(c, TSingletonContent(TUnknownArrayElementContent()), c[0])
479
+ sink(c[0 ][0 ]) # bad; unknown may be 0
480
+ sink(c[0 ][1 ]) # good
481
+ ```
482
+
373
483
### Field flow barriers
374
484
375
485
Consider this field flow example:
376
- ```
486
+ ``` java
377
487
obj. f = source;
378
488
obj. f = safeValue;
379
489
sink(obj. f);
380
490
```
381
491
or the similar case when field flow is used to model collection content:
382
- ```
492
+ ``` java
383
493
obj. add(source);
384
494
obj. clear();
385
495
sink(obj. get(key));
386
496
```
387
497
Clearing a field or content like this should act as a barrier, and this can be
388
- achieved by marking the relevant ` Node, Content ` pair as a clear operation in
498
+ achieved by marking the relevant ` Node, ContentSet ` pair as a clear operation in
389
499
the ` clearsContent ` predicate. A reasonable default implementation for fields
390
500
looks like this:
391
501
``` ql
392
- predicate clearsContent(Node n, Content c) {
502
+ predicate clearsContent(Node n, ContentSet c) {
393
503
n = any(PostUpdateNode pun | storeStep(_, c, pun)).getPreUpdateNode()
394
504
}
395
505
```
396
506
However, this relies on the local step relation using the smallest possible
397
507
use-use steps. If local flow is implemented using def-use steps, then
398
508
` clearsContent ` might not be easy to use.
399
509
510
+ Note that ` clearsContent(n, cs) ` is interpreted using ` cs.getAReadContent() ` .
511
+
400
512
## Type pruning
401
513
402
514
The library supports pruning paths when a sequence of value-preserving steps
@@ -415,13 +527,13 @@ as a single entity (this improves performance). As an example, Java uses erased
415
527
types for this purpose and a single equivalence class for all numeric types.
416
528
417
529
The type of a ` Node ` is given by the following predicate
418
- ```
530
+ ``` ql
419
531
DataFlowType getNodeType(Node n)
420
532
```
421
533
and every ` Node ` should have a type.
422
534
423
535
One also needs to define the string representation of a ` DataFlowType ` :
424
- ```
536
+ ``` ql
425
537
string ppReprType(DataFlowType t)
426
538
```
427
539
The ` ppReprType ` predicate is used for printing a type in the labels of
@@ -499,7 +611,7 @@ predicate nodeIsHidden(Node n)
499
611
### Unreachable nodes
500
612
501
613
Consider:
502
- ```
614
+ ``` java
503
615
foo(source1, false );
504
616
foo(source2, true );
505
617
0 commit comments