Skip to content

Commit c38cbe8

Browse files
authored
Merge pull request github#13737 from asgerf/dynamic/fuzzy-models
Dynamic: add Fuzzy token
2 parents 89aa86a + da3eb28 commit c38cbe8

File tree

17 files changed

+316
-17
lines changed

17 files changed

+316
-17
lines changed

docs/codeql/codeql-language-guides/customizing-library-models-for-javascript.rst

Lines changed: 56 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -220,6 +220,59 @@ For example, the **mysql** model that is included with the CodeQL JS analysis in
220220
221221
- ["mysql.Connection", "mysql", "Member[createConnection].ReturnValue"]
222222
223+
Example: Using fuzzy models to simplify modeling
224+
------------------------------------------------
225+
226+
In this example, we'll show how to add the following SQL injection sink using a "fuzzy" model:
227+
228+
.. code-block:: ts
229+
230+
import * as mysql from 'mysql';
231+
const pool = mysql.createPool({...});
232+
pool.getConnection((err, conn) => {
233+
conn.query(q, (err, rows) => {...}); // <-- add 'q' as a SQL injection sink
234+
});
235+
236+
We can recognize this using a fuzzy model, as shown in the following extension:
237+
238+
.. code-block:: yaml
239+
240+
extensions:
241+
- addsTo:
242+
pack: codeql/javascript-all
243+
extensible: sinkModel
244+
data:
245+
- ["mysql", "Fuzzy.Member[query].Argument[0]", "sql-injection"]
246+
247+
- The first column, **"mysql"**, begins the search at places where the `mysql` package is imported.
248+
- **Fuzzy** selects all objects that appear to originate from the `mysql` package, such as the `pool`, `conn`, `err`, and `rows` objects.
249+
- **Member[query]** selects the **query** member from any of those objects. In this case, the only such member is `conn.query`.
250+
In principle, this would also find expressions such as `pool.query` and `err.query`, but in practice such expressions
251+
are not likely to occur, because the `pool` and `err` objects do not have a member named `query`.
252+
- **Argument[0]** selects the first argument of a call to the selected member, that is, the `q` argument to `conn.query`.
253+
- **sql-injection** indicates that this is considered as a sink for the SQL injection query.
254+
255+
For reference, a more detailed model might look like this, as described in the preceding examples:
256+
257+
.. code-block:: yaml
258+
259+
extensions:
260+
- addsTo:
261+
pack: codeql/javascript-all
262+
extensible: sinkModel
263+
data:
264+
- ["mysql.Connection", "Member[query].Argument[0]", "sql-injection"]
265+
266+
- addsTo:
267+
pack: codeql/javascript-all
268+
extensible: typeModel
269+
data:
270+
- ["mysql.Pool", "mysql", "Member[createPool].ReturnValue"]
271+
- ["mysql.Connection", "mysql.Pool", "Member[getConnection].Argument[0].Parameter[1]"]
272+
273+
The model using the **Fuzzy** component is simpler, at the cost of being approximate.
274+
This technique is useful when modeling a large or complex library, where it is difficult to write a detailed model.
275+
223276
Example: Adding flow through 'decodeURIComponent'
224277
-------------------------------------------------
225278

@@ -431,6 +484,9 @@ The following components are supported:
431484
- **MapValue** selects a value of a map object.
432485
- **Awaited** selects the value of a promise.
433486
- **Instance** selects instances of a class.
487+
- **Fuzzy** selects all values that are derived from the current value through a combination of the other operations described in this list.
488+
For example, this can be used to find all values that appear to originate from a particular package. This can be useful for finding method calls
489+
from a known package, but where the receiver type is not known or is difficult to model.
434490

435491
The following components are called "call site filters". They select a subset of the previously-selected calls, if the call fits certain criteria:
436492

javascript/ql/lib/semmle/javascript/frameworks/Vue.qll

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -20,9 +20,7 @@ module Vue {
2020
private class VueExportEntryPoint extends API::EntryPoint {
2121
VueExportEntryPoint() { this = "VueExportEntryPoint" }
2222

23-
override DataFlow::Node getASink() {
24-
result = any(SingleFileComponent c).getModule().getDefaultOrBulkExport()
25-
}
23+
override DataFlow::Node getASink() { result = getModuleFromVueFile(_).getDefaultOrBulkExport() }
2624
}
2725

2826
/**
@@ -455,6 +453,13 @@ module Vue {
455453
}
456454
}
457455

456+
private Module getModuleFromVueFile(VueFile file) {
457+
exists(HTML::ScriptElement elem |
458+
xmlElements(elem, _, _, _, file) and // Avoid materializing all of Locatable.getFile()
459+
result.getTopLevel() = elem.getScript()
460+
)
461+
}
462+
458463
/**
459464
* A single file Vue component in a `.vue` file.
460465
*/
@@ -482,12 +487,7 @@ module Vue {
482487
}
483488

484489
/** Gets the module defined by the `script` tag in this .vue file, if any. */
485-
Module getModule() {
486-
exists(HTML::ScriptElement elem |
487-
xmlElements(elem, _, _, _, file) and // Avoid materializing all of Locatable.getFile()
488-
result.getTopLevel() = elem.getScript()
489-
)
490-
}
490+
Module getModule() { result = getModuleFromVueFile(file) }
491491

492492
override API::Node getComponentRef() {
493493
// There is no explicit `new Vue()` call in .vue files, so instead get all the imports

javascript/ql/lib/semmle/javascript/frameworks/data/internal/ApiGraphModels.qll

Lines changed: 18 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -454,6 +454,14 @@ private API::Node getNodeFromPath(string type, AccessPath path, int n) {
454454
or
455455
// Apply a type step
456456
typeStep(getNodeFromPath(type, path, n), result)
457+
or
458+
// Apply a fuzzy step (without advancing 'n')
459+
path.getToken(n).getName() = "Fuzzy" and
460+
result = Specific::getAFuzzySuccessor(getNodeFromPath(type, path, n))
461+
or
462+
// Skip a fuzzy step (advance 'n' without changing the current node)
463+
path.getToken(n - 1).getName() = "Fuzzy" and
464+
result = getNodeFromPath(type, path, n - 1)
457465
}
458466

459467
/**
@@ -500,6 +508,14 @@ private API::Node getNodeFromSubPath(API::Node base, AccessPath subPath, int n)
500508
// will themselves find by following type-steps.
501509
n > 0 and
502510
n < subPath.getNumToken()
511+
or
512+
// Apply a fuzzy step (without advancing 'n')
513+
subPath.getToken(n).getName() = "Fuzzy" and
514+
result = Specific::getAFuzzySuccessor(getNodeFromSubPath(base, subPath, n))
515+
or
516+
// Skip a fuzzy step (advance 'n' without changing the current node)
517+
subPath.getToken(n - 1).getName() = "Fuzzy" and
518+
result = getNodeFromSubPath(base, subPath, n - 1)
503519
}
504520

505521
/**
@@ -561,7 +577,7 @@ private Specific::InvokeNode getInvocationFromPath(string type, AccessPath path)
561577
*/
562578
bindingset[name]
563579
private predicate isValidTokenNameInIdentifyingAccessPath(string name) {
564-
name = ["Argument", "Parameter", "ReturnValue", "WithArity", "TypeVar"]
580+
name = ["Argument", "Parameter", "ReturnValue", "WithArity", "TypeVar", "Fuzzy"]
565581
or
566582
Specific::isExtraValidTokenNameInIdentifyingAccessPath(name)
567583
}
@@ -572,7 +588,7 @@ private predicate isValidTokenNameInIdentifyingAccessPath(string name) {
572588
*/
573589
bindingset[name]
574590
private predicate isValidNoArgumentTokenInIdentifyingAccessPath(string name) {
575-
name = "ReturnValue"
591+
name = ["ReturnValue", "Fuzzy"]
576592
or
577593
Specific::isExtraValidNoArgumentTokenInIdentifyingAccessPath(name)
578594
}

javascript/ql/lib/semmle/javascript/frameworks/data/internal/ApiGraphModelsSpecific.qll

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -192,6 +192,43 @@ API::Node getExtraSuccessorFromInvoke(API::InvokeNode node, AccessPathToken toke
192192
result.asSink() = node.(DataFlow::CallNode).getReceiver()
193193
}
194194

195+
/**
196+
* Holds if `name` is the name of a built-in method on Object, Array, or String.
197+
*/
198+
private predicate isCommonBuiltinMethodName(string name) {
199+
exists(JS::ExternalInstanceMemberDecl member |
200+
member.getBaseName() in ["Object", "Array", "String"] and
201+
name = member.getName()
202+
)
203+
}
204+
205+
/**
206+
* Holds if fuzzy evaluation should not traverse through `call`.
207+
*/
208+
private predicate blockFuzzyCall(DataFlow::CallNode call) {
209+
isCommonBuiltinMethodName(call.getCalleeName())
210+
}
211+
212+
pragma[inline]
213+
API::Node getAFuzzySuccessor(API::Node node) {
214+
result = node.getAMember() and
215+
// Block traversal into calls to built-ins like .toString() and .substring()
216+
// Since there is no API node representing the call itself, block flow into the callee node.
217+
not exists(DataFlow::CallNode call |
218+
node.asSource() = call.getCalleeNode() and
219+
blockFuzzyCall(call)
220+
)
221+
or
222+
result = node.getAParameter()
223+
or
224+
result = node.getReturn()
225+
or
226+
result = node.getPromised()
227+
or
228+
// include 'this' parameters but not 'this' arguments
229+
result = node.getReceiver() and result.asSource() instanceof DataFlow::ThisNode
230+
}
231+
195232
/**
196233
* Holds if `invoke` matches the JS-specific call site filter in `token`.
197234
*/

javascript/ql/test/library-tests/frameworks/data/test.expected

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -66,6 +66,14 @@ taintFlow
6666
| test.js:231:59:231:66 | source() | test.js:231:59:231:66 | source() |
6767
| test.js:232:59:232:66 | source() | test.js:232:59:232:66 | source() |
6868
| test.js:233:59:233:66 | source() | test.js:233:59:233:66 | source() |
69+
| test.js:237:21:237:28 | source() | test.js:237:21:237:28 | source() |
70+
| test.js:238:25:238:32 | source() | test.js:238:25:238:32 | source() |
71+
| test.js:239:27:239:34 | source() | test.js:239:27:239:34 | source() |
72+
| test.js:241:17:241:24 | source() | test.js:241:17:241:24 | source() |
73+
| test.js:244:33:244:40 | source() | test.js:244:33:244:40 | source() |
74+
| test.js:249:28:249:35 | source() | test.js:249:28:249:35 | source() |
75+
| test.js:252:15:252:22 | source() | test.js:252:15:252:22 | source() |
76+
| test.js:254:32:254:39 | source() | test.js:254:32:254:39 | source() |
6977
isSink
7078
| test.js:54:18:54:25 | source() | test-sink |
7179
| test.js:55:22:55:29 | source() | test-sink |
@@ -136,6 +144,14 @@ isSink
136144
| test.js:231:59:231:66 | source() | test-sink |
137145
| test.js:232:59:232:66 | source() | test-sink |
138146
| test.js:233:59:233:66 | source() | test-sink |
147+
| test.js:237:21:237:28 | source() | test-sink |
148+
| test.js:238:25:238:32 | source() | test-sink |
149+
| test.js:239:27:239:34 | source() | test-sink |
150+
| test.js:241:17:241:24 | source() | test-sink |
151+
| test.js:244:33:244:40 | source() | test-sink |
152+
| test.js:249:28:249:35 | source() | test-sink |
153+
| test.js:252:15:252:22 | source() | test-sink |
154+
| test.js:254:32:254:39 | source() | test-sink |
139155
syntaxErrors
140156
| Member[foo |
141157
| Member[foo] .Member[bar] |

javascript/ql/test/library-tests/frameworks/data/test.js

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -232,3 +232,27 @@ function typeVars() {
232232
testlib.typevar.left.x.getThis().getThis().right.mySink(source()); // NOT OK
233233
testlib.typevar.left.x.right.getThis().getThis().mySink(source()); // NOT OK
234234
}
235+
236+
function fuzzy() {
237+
testlib.fuzzyCall(source()); // NOT OK
238+
testlib.foo.fuzzyCall(source()); // NOT OK
239+
testlib.foo().fuzzyCall(source()); // NOT OK
240+
new testlib.Blah().foo.bar(async p => {
241+
p.fuzzyCall(source()); // NOT OK
242+
p.otherCall(source()); // OK
243+
p.fuzzyCall().laterMethod(source()); // OK
244+
(await p.promise).fuzzyCall(source()); // NOT OK
245+
});
246+
247+
const wrapped = _.partial(testlib.foo, [123]);
248+
wrapped().fuzzyCall(source()); // NOT OK [INCONSISTENCY] - API graphs do not currently propagate return values through partial invocation
249+
wrapped(p => p.fuzzyCall(source())); // NOT OK
250+
251+
const wrappedSink = _.partial(testlib.fuzzyCall);
252+
wrappedSink(source()); // NOT OK
253+
254+
_.partial(testlib.fuzzyCall, source()); // NOT OK
255+
256+
fuzzyCall(source()); // OK - does not come from 'testlib'
257+
require('blah').fuzzyCall(source()); // OK - does not come from 'testlib'
258+
}

javascript/ql/test/library-tests/frameworks/data/test.ql

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -54,6 +54,7 @@ class Sinks extends ModelInput::SinkModelCsv {
5454
"testlib;Member[typevar].TypeVar[ABC].Member[mySink].Argument[0];test-sink",
5555
"testlib;Member[typevar].TypeVar[ABC].TypeVar[ABC].Member[mySink].Argument[1];test-sink",
5656
"testlib;Member[typevar].TypeVar[LeftRight].Member[mySink].Argument[0];test-sink",
57+
"testlib;Fuzzy.Member[fuzzyCall].Argument[0];test-sink"
5758
]
5859
}
5960
}

python/ql/lib/semmle/python/frameworks/data/internal/ApiGraphModels.qll

Lines changed: 18 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -454,6 +454,14 @@ private API::Node getNodeFromPath(string type, AccessPath path, int n) {
454454
or
455455
// Apply a type step
456456
typeStep(getNodeFromPath(type, path, n), result)
457+
or
458+
// Apply a fuzzy step (without advancing 'n')
459+
path.getToken(n).getName() = "Fuzzy" and
460+
result = Specific::getAFuzzySuccessor(getNodeFromPath(type, path, n))
461+
or
462+
// Skip a fuzzy step (advance 'n' without changing the current node)
463+
path.getToken(n - 1).getName() = "Fuzzy" and
464+
result = getNodeFromPath(type, path, n - 1)
457465
}
458466

459467
/**
@@ -500,6 +508,14 @@ private API::Node getNodeFromSubPath(API::Node base, AccessPath subPath, int n)
500508
// will themselves find by following type-steps.
501509
n > 0 and
502510
n < subPath.getNumToken()
511+
or
512+
// Apply a fuzzy step (without advancing 'n')
513+
subPath.getToken(n).getName() = "Fuzzy" and
514+
result = Specific::getAFuzzySuccessor(getNodeFromSubPath(base, subPath, n))
515+
or
516+
// Skip a fuzzy step (advance 'n' without changing the current node)
517+
subPath.getToken(n - 1).getName() = "Fuzzy" and
518+
result = getNodeFromSubPath(base, subPath, n - 1)
503519
}
504520

505521
/**
@@ -561,7 +577,7 @@ private Specific::InvokeNode getInvocationFromPath(string type, AccessPath path)
561577
*/
562578
bindingset[name]
563579
private predicate isValidTokenNameInIdentifyingAccessPath(string name) {
564-
name = ["Argument", "Parameter", "ReturnValue", "WithArity", "TypeVar"]
580+
name = ["Argument", "Parameter", "ReturnValue", "WithArity", "TypeVar", "Fuzzy"]
565581
or
566582
Specific::isExtraValidTokenNameInIdentifyingAccessPath(name)
567583
}
@@ -572,7 +588,7 @@ private predicate isValidTokenNameInIdentifyingAccessPath(string name) {
572588
*/
573589
bindingset[name]
574590
private predicate isValidNoArgumentTokenInIdentifyingAccessPath(string name) {
575-
name = "ReturnValue"
591+
name = ["ReturnValue", "Fuzzy"]
576592
or
577593
Specific::isExtraValidNoArgumentTokenInIdentifyingAccessPath(name)
578594
}

python/ql/lib/semmle/python/frameworks/data/internal/ApiGraphModelsSpecific.qll

Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -108,6 +108,23 @@ API::Node getExtraSuccessorFromInvoke(API::CallNode node, AccessPathToken token)
108108
)
109109
}
110110

111+
pragma[inline]
112+
API::Node getAFuzzySuccessor(API::Node node) {
113+
result = node.getAMember()
114+
or
115+
result = node.getParameter(_)
116+
or
117+
result = node.getKeywordParameter(_)
118+
or
119+
result = node.getReturn()
120+
or
121+
result = node.getASubscript()
122+
or
123+
result = node.getAwaited()
124+
or
125+
result = node.getASubclass()
126+
}
127+
111128
/**
112129
* Holds if `invoke` matches the PY-specific call site filter in `token`.
113130
*/

python/ql/test/library-tests/frameworks/data/test.expected

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,11 @@ taintFlow
1111
| test.py:83:50:83:60 | ControlFlowNode for getSource() | test.py:83:8:83:61 | ControlFlowNode for Attribute() |
1212
| test.py:86:49:86:59 | ControlFlowNode for getSource() | test.py:86:8:86:60 | ControlFlowNode for Attribute() |
1313
| test.py:87:56:87:66 | ControlFlowNode for getSource() | test.py:87:8:87:67 | ControlFlowNode for Attribute() |
14+
| test.py:114:19:114:29 | ControlFlowNode for getSource() | test.py:114:19:114:29 | ControlFlowNode for getSource() |
15+
| test.py:115:20:115:30 | ControlFlowNode for getSource() | test.py:115:20:115:30 | ControlFlowNode for getSource() |
16+
| test.py:116:31:116:41 | ControlFlowNode for getSource() | test.py:116:31:116:41 | ControlFlowNode for getSource() |
17+
| test.py:117:31:117:41 | ControlFlowNode for getSource() | test.py:117:31:117:41 | ControlFlowNode for getSource() |
18+
| test.py:118:35:118:45 | ControlFlowNode for getSource() | test.py:118:35:118:45 | ControlFlowNode for getSource() |
1419
isSink
1520
| test.py:4:8:4:8 | ControlFlowNode for x | test-sink |
1621
| test.py:7:17:7:17 | ControlFlowNode for x | test-sink |
@@ -50,6 +55,11 @@ isSink
5055
| test.py:91:21:91:23 | ControlFlowNode for one | test-sink |
5156
| test.py:91:30:91:32 | ControlFlowNode for two | test-sink |
5257
| test.py:98:6:98:9 | ControlFlowNode for baz2 | test-sink |
58+
| test.py:114:19:114:29 | ControlFlowNode for getSource() | test-sink |
59+
| test.py:115:20:115:30 | ControlFlowNode for getSource() | test-sink |
60+
| test.py:116:31:116:41 | ControlFlowNode for getSource() | test-sink |
61+
| test.py:117:31:117:41 | ControlFlowNode for getSource() | test-sink |
62+
| test.py:118:35:118:45 | ControlFlowNode for getSource() | test-sink |
5363
isSource
5464
| test.py:3:5:3:15 | ControlFlowNode for getSource() | test-source |
5565
| test.py:9:8:9:14 | ControlFlowNode for alias() | test-source |
@@ -89,6 +99,12 @@ isSource
8999
| test.py:104:32:104:37 | ControlFlowNode for param2 | test-source |
90100
| test.py:107:24:107:28 | ControlFlowNode for name1 | test-source |
91101
| test.py:107:31:107:35 | ControlFlowNode for name2 | test-source |
102+
| test.py:114:19:114:29 | ControlFlowNode for getSource() | test-source |
103+
| test.py:115:20:115:30 | ControlFlowNode for getSource() | test-source |
104+
| test.py:116:31:116:41 | ControlFlowNode for getSource() | test-source |
105+
| test.py:117:31:117:41 | ControlFlowNode for getSource() | test-source |
106+
| test.py:118:35:118:45 | ControlFlowNode for getSource() | test-source |
107+
| test.py:119:20:119:30 | ControlFlowNode for getSource() | test-source |
92108
syntaxErrors
93109
| Member[foo |
94110
| Member[foo] .Member[bar] |

0 commit comments

Comments
 (0)