Commit 7c32efc

Merge pull request github#17203 from RasmusWL/threat-models
Python: Add support for threat models
2 parents 381ea93 + 431a1af commit 7c32efc

48 files changed, 473 additions and 74 deletions

docs/codeql/codeql-language-guides/customizing-library-models-for-python.rst

Lines changed: 8 additions & 1 deletion
@@ -427,7 +427,7 @@ Kinds
 Source kinds
 ~~~~~~~~~~~~

-- **remote**: A generic source of remote flow. Most taint-tracking queries will use such a source. Currently this is the only supported source kind.
+See documentation below for :ref:`Threat models <threat-models-python>`.

 Sink kinds
 ~~~~~~~~~~
@@ -449,3 +449,10 @@ Summary kinds

 - **taint**: A summary that propagates taint. This means the output is not necessarily equal to the input, but it was derived from the input in an unrestrictive way. An attacker who controls the input will have significant control over the output as well.
 - **value**: A summary that preserves the value of the input or creates a copy of the input such that all of its object properties are preserved.
+
+.. _threat-models-python:
+
+Threat models
+-------------
+
+.. include:: ../reusables/threat-model-description.rst

docs/codeql/reusables/beta-note-threat-models.rst

Lines changed: 1 addition & 1 deletion
@@ -2,4 +2,4 @@

 Note

-Threat models are currently in beta and subject to change. During the beta, threat models are supported only by Java and C# analysis.
+Threat models are currently in beta and subject to change. During the beta, threat models are supported only by Java, C# and Python analysis.

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+---
+category: feature
+---
+* Added support for custom threat-models, which can be used in most of our taint-tracking queries, see our [documentation](https://docs.github.com/en/code-security/code-scanning/creating-an-advanced-setup-for-code-scanning/customizing-your-advanced-setup-for-code-scanning#extending-codeql-coverage-with-threat-models) for more details.

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+extensions:
+  - addsTo:
+      pack: codeql/threat-models
+      extensible: threatModelConfiguration
+    data:
+      # Since responses are enabled by default in the shared threat-models configuration,
+      # we need to disable it here to keep existing behavior for the python analysis.
+      - ["response", false, -2147483647]

python/ql/lib/qlpack.yml

Lines changed: 2 additions & 0 deletions
@@ -9,10 +9,12 @@ dependencies:
   codeql/dataflow: ${workspace}
   codeql/mad: ${workspace}
   codeql/regex: ${workspace}
+  codeql/threat-models: ${workspace}
   codeql/tutorial: ${workspace}
   codeql/util: ${workspace}
   codeql/xml: ${workspace}
   codeql/yaml: ${workspace}
 dataExtensions:
   - semmle/python/frameworks/**/*.model.yml
+  - ext/*.model.yml
 warnOnImplicitThis: true

python/ql/lib/semmle/python/Concepts.qll

Lines changed: 56 additions & 0 deletions
@@ -10,6 +10,62 @@ private import semmle.python.dataflow.new.RemoteFlowSources
 private import semmle.python.dataflow.new.TaintTracking
 private import semmle.python.Frameworks
 private import semmle.python.security.internal.EncryptionKeySizes
+private import codeql.threatmodels.ThreatModels
+
+/**
+ * A data flow source, for a specific threat-model.
+ *
+ * Extend this class to refine existing API models. If you want to model new APIs,
+ * extend `ThreatModelSource::Range` instead.
+ */
+class ThreatModelSource extends DataFlow::Node instanceof ThreatModelSource::Range {
+  /**
+   * Gets a string that represents the source kind with respect to threat modeling.
+   *
+   * See
+   * - https://github.com/github/codeql/blob/main/docs/codeql/reusables/threat-model-description.rst
+   * - https://github.com/github/codeql/blob/main/shared/threat-models/ext/threat-model-grouping.model.yml
+   */
+  string getThreatModel() { result = super.getThreatModel() }
+
+  /** Gets a string that describes the type of this threat-model source. */
+  string getSourceType() { result = super.getSourceType() }
+}
+
+/** Provides a class for modeling new sources for specific threat-models. */
+module ThreatModelSource {
+  /**
+   * A data flow source, for a specific threat-model.
+   *
+   * Extend this class to model new APIs. If you want to refine existing API models,
+   * extend `ThreatModelSource` instead.
+   */
+  abstract class Range extends DataFlow::Node {
+    /**
+     * Gets a string that represents the source kind with respect to threat modeling.
+     *
+     * See
+     * - https://github.com/github/codeql/blob/main/docs/codeql/reusables/threat-model-description.rst
+     * - https://github.com/github/codeql/blob/main/shared/threat-models/ext/threat-model-grouping.model.yml
+     */
+    abstract string getThreatModel();
+
+    /** Gets a string that describes the type of this threat-model source. */
+    abstract string getSourceType();
+  }
+}
+
+/**
+ * A data flow source that is enabled in the current threat model configuration.
+ */
+class ActiveThreatModelSource extends ThreatModelSource {
+  ActiveThreatModelSource() {
+    exists(string kind |
+      currentThreatModel(kind) and
+      this.getThreatModel() = kind
+    )
+  }
+}

 /**
  * A data-flow node that executes an operating system command,

python/ql/lib/semmle/python/dataflow/new/RemoteFlowSources.qll

Lines changed: 3 additions & 7 deletions
@@ -15,10 +15,7 @@ private import semmle.python.Concepts
  * Extend this class to refine existing API models. If you want to model new APIs,
  * extend `RemoteFlowSource::Range` instead.
  */
-class RemoteFlowSource extends DataFlow::Node instanceof RemoteFlowSource::Range {
-  /** Gets a string that describes the type of this remote flow source. */
-  string getSourceType() { result = super.getSourceType() }
-}
+class RemoteFlowSource extends ThreatModelSource instanceof RemoteFlowSource::Range { }

 /** Provides a class for modeling new sources of remote user input. */
 module RemoteFlowSource {
@@ -28,8 +25,7 @@ module RemoteFlowSource {
    * Extend this class to model new APIs. If you want to refine existing API models,
    * extend `RemoteFlowSource` instead.
    */
-  abstract class Range extends DataFlow::Node {
-    /** Gets a string that describes the type of this remote flow source. */
-    abstract string getSourceType();
+  abstract class Range extends ThreatModelSource::Range {
+    override string getThreatModel() { result = "remote" }
   }
 }
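The relationship between `ThreatModelSource::Range`, `RemoteFlowSource::Range`, and `ActiveThreatModelSource` can be mirrored in plain Python to make the layering easier to follow. This is only an analogy, not CodeQL: the class names `HttpParameter` and `OpenCallSource` and the helper `active_sources` are hypothetical, and the sketch ignores the threat-model grouping that the real `currentThreatModel` predicate may apply.

```python
from abc import ABC, abstractmethod


class ThreatModelSourceRange(ABC):
    """Hypothetical Python stand-in for `ThreatModelSource::Range`."""

    @abstractmethod
    def get_threat_model(self) -> str: ...

    @abstractmethod
    def get_source_type(self) -> str: ...


class RemoteFlowSourceRange(ThreatModelSourceRange):
    """Stand-in for `RemoteFlowSource::Range`: the kind is fixed to "remote"."""

    def get_threat_model(self) -> str:
        return "remote"


class HttpParameter(RemoteFlowSourceRange):
    """Hypothetical remote source; only the source type has to be supplied."""

    def get_source_type(self) -> str:
        return "HTTP parameter"


class OpenCallSource(ThreatModelSourceRange):
    """Stand-in for a non-remote source such as the `open()` model."""

    def get_threat_model(self) -> str:
        return "file"

    def get_source_type(self) -> str:
        return "open()"


def active_sources(sources, enabled_kinds):
    """Keep the sources whose kind is enabled (no group expansion here)."""
    return [s for s in sources if s.get_threat_model() in enabled_kinds]


sources = [HttpParameter(), OpenCallSource()]
remote_only = [s.get_source_type() for s in active_sources(sources, {"remote"})]
with_files = [s.get_source_type() for s in active_sources(sources, {"remote", "file"})]
```

With only `"remote"` enabled, the file source is filtered out, which matches the stated goal of keeping existing behavior unless additional threat models are switched on.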

python/ql/lib/semmle/python/frameworks/PEP249.qll

Lines changed: 18 additions & 0 deletions
@@ -81,6 +81,24 @@ module PEP249 {
     }
   }

+  /** A call to a method that fetches rows from a previous execution. */
+  private class FetchMethodCall extends ThreatModelSource::Range, API::CallNode {
+    FetchMethodCall() {
+      exists(API::Node start |
+        start instanceof DatabaseCursor or start instanceof DatabaseConnection
+      |
+        // note: since we can't currently provide accesspaths for sources, these are all
+        // lumped together, although clearly the fetchmany/fetchall returns a
+        // list/iterable with rows.
+        this = start.getMember(["fetchone", "fetchmany", "fetchall"]).getACall()
+      )
+    }
+
+    override string getThreatModel() { result = "database" }
+
+    override string getSourceType() { result = "cursor.fetch*()" }
+  }
+
   // ---------------------------------------------------------------------------
   // asyncio implementations
   // ---------------------------------------------------------------------------
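For orientation, the following Python snippet shows the kind of analyzed code that the `FetchMethodCall` model appears to target: `fetchone` and `fetchall` calls on a PEP 249 cursor. It uses the standard-library `sqlite3` module with an in-memory database; whether `sqlite3` is covered by the PEP 249 modeling is an assumption here, and the table and function names are made up for illustration.

```python
import sqlite3


def load_names():
    """Run two queries and fetch their rows through a PEP 249 style cursor."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        cur.execute("CREATE TABLE users (name TEXT)")
        cur.execute("INSERT INTO users VALUES (?)", ("alice",))

        cur.execute("SELECT name FROM users")
        first = cur.fetchone()  # would be a "database" source under this model

        cur.execute("SELECT name FROM users")
        rest = cur.fetchall()  # same source kind, although it returns a list of rows
        return first, rest
    finally:
        conn.close()


first_row, all_rows = load_names()
```

As the comment in the diff notes, the model does not distinguish a single row from a list of rows, so both results would be treated alike.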
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+extensions:
+  - addsTo:
+      pack: codeql/python-all
+      extensible: sourceModel
+    data:
+      - ['os', 'Member[getenv].ReturnValue', 'environment']
+      - ['os', 'Member[getenvb].ReturnValue', 'environment']
+      - ['os', 'Member[environ]', 'environment']
+      - ['os', 'Member[environb]', 'environment']
+      - ['posix', 'Member[environ]', 'environment']
+
+      - ['sys', 'Member[argv]', 'commandargs']
+      - ['sys', 'Member[orig_argv]', 'commandargs']
+
+      - ['sys', 'Member[stdin]', 'stdin']
+      - ['builtins', 'Member[input].ReturnValue', 'stdin']
+      - ['builtins', 'Member[raw_input].ReturnValue', 'stdin'] # python 2 only
+
+
+      # if no argument is given, the default is to use sys.argv[1:]
+      - ['argparse.ArgumentParser', 'Member[parse_args,parse_known_args].WithArity[0].ReturnValue', 'commandargs']
+
+      - ['os', 'Member[read].ReturnValue', 'file']
+  - addsTo:
+      pack: codeql/python-all
+      extensible: summaryModel
+    data:
+      - ['argparse.ArgumentParser', 'Member[parse_args,parse_known_args]', 'Argument[0,args:]', 'ReturnValue', 'taint']
+      # note: taint of attribute lookups is handled in QL
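As a rough illustration of what the source rows above describe, the Python snippet below reads from the three local input channels they name: environment variables, command-line arguments, and standard input. The variable name `DEMO_SETTING`, the function `collect_local_inputs`, and the injectable `stdin` parameter are hypothetical and exist only to keep the example runnable without a terminal.

```python
import io
import os
import sys


def collect_local_inputs(stdin=None):
    """Gather values from the channels named in the model rows."""
    # The stream is injectable only so the demo runs non-interactively;
    # real code would typically read sys.stdin directly.
    stream = sys.stdin if stdin is None else stdin
    return {
        "environment": os.getenv("DEMO_SETTING"),  # os.getenv return value
        "commandargs": list(sys.argv[1:]),         # sys.argv
        "stdin": stream.readline(),                # sys.stdin
    }


os.environ["DEMO_SETTING"] = "value-from-env"
inputs = collect_local_inputs(stdin=io.StringIO("typed by the user\n"))
```

Each dictionary key matches the threat-model kind string used in the corresponding model row.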

python/ql/lib/semmle/python/frameworks/Stdlib.qll

Lines changed: 46 additions & 4 deletions
@@ -338,7 +338,7 @@ module StdlibPrivate {
    * Modeling of path related functions in the `os` module.
    * Wrapped in QL module to make it easy to fold/unfold.
    */
-  private module OsFileSystemAccessModeling {
+  module OsFileSystemAccessModeling {
     /**
      * A call to the `os.fsencode` function.
      *
@@ -395,7 +395,7 @@ module StdlibPrivate {
      *
      * See https://docs.python.org/3/library/os.html#os.open
      */
-    private class OsOpenCall extends FileSystemAccess::Range, DataFlow::CallCfgNode {
+    class OsOpenCall extends FileSystemAccess::Range, DataFlow::CallCfgNode {
       OsOpenCall() { this = os().getMember("open").getACall() }

       override DataFlow::Node getAPathArgument() {
@@ -1499,13 +1499,22 @@ module StdlibPrivate {
    * See https://docs.python.org/3/library/functions.html#open
    */
   private class OpenCall extends FileSystemAccess::Range, Stdlib::FileLikeObject::InstanceSource,
-    DataFlow::CallCfgNode
+    ThreatModelSource::Range, DataFlow::CallCfgNode
   {
-    OpenCall() { this = getOpenFunctionRef().getACall() }
+    OpenCall() {
+      this = getOpenFunctionRef().getACall() and
+      // when analyzing stdlib code for os.py we wrongly assume that `os.open` is an
+      // alias of the builtins `open` function
+      not this instanceof OsFileSystemAccessModeling::OsOpenCall
+    }

     override DataFlow::Node getAPathArgument() {
       result in [this.getArg(0), this.getArgByName("file")]
     }
+
+    override string getThreatModel() { result = "file" }
+
+    override string getSourceType() { result = "open()" }
   }

   /**
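The distinction this hunk draws between the builtin `open` and `os.open` is easier to see in analyzed code. In the sketch below, the builtin `open()` call is the one that would match `OpenCall` and carry the `"file"` threat model, while `os.open()` returns a raw file descriptor and is excluded from `OpenCall`; data read through `os.read` is covered separately by a model row elsewhere in this commit. The function name `read_both_ways` and the file name are hypothetical.

```python
import os
import tempfile


def read_both_ways(path):
    """Read the same file via the builtin open() and via os.open()/os.read()."""
    with open(path) as handle:  # builtin open(): the OpenCall case
        text = handle.read()

    fd = os.open(path, os.O_RDONLY)  # os.open(): a descriptor, not a file object
    try:
        raw = os.read(fd, 1024)  # os.read return value
    finally:
        os.close(fd)
    return text, raw


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")
    with open(demo_path, "w") as out:
        out.write("hello")
    text, raw = read_both_ways(demo_path)
```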
@@ -4989,6 +4998,39 @@ module StdlibPrivate {

     override string getKind() { result = Escaping::getHtmlKind() }
   }
+
+  // ---------------------------------------------------------------------------
+  // argparse
+  // ---------------------------------------------------------------------------
+  /**
+   * if result of `parse_args` is tainted (because it uses command-line arguments),
+   * then the parsed values accessed on any attribute lookup is also tainted.
+   */
+  private class ArgumentParserAnyAttributeStep extends TaintTracking::AdditionalTaintStep {
+    override predicate step(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
+      nodeFrom =
+        API::moduleImport("argparse")
+            .getMember("ArgumentParser")
+            .getReturn()
+            .getMember("parse_args")
+            .getReturn()
+            .getAValueReachableFromSource() and
+      nodeTo.(DataFlow::AttrRead).getObject() = nodeFrom
+    }
+  }
+
+  // ---------------------------------------------------------------------------
+  // sys
+  // ---------------------------------------------------------------------------
+  /**
+   * An access of `sys.stdin`/`sys.stdout`/`sys.stderr`, to get additional FileLike
+   * modeling.
+   */
+  private class SysStandardStreams extends Stdlib::FileLikeObject::InstanceSource, DataFlow::Node {
+    SysStandardStreams() {
+      this = API::moduleImport("sys").getMember(["stdin", "stdout", "stderr"]).asSource()
+    }
+  }
 }

 // ---------------------------------------------------------------------------
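The argparse step above concerns attribute reads on the namespace returned by `parse_args`. A small Python example of the pattern, with a hypothetical `--name` option and helper `parse_cli`, might look like this. Passing an explicit list keeps the example deterministic; calling `parse_args()` with no argument makes argparse fall back to `sys.argv[1:]`, which is the case the `commandargs` source row describes.

```python
import argparse


def parse_cli(argv=None):
    """Parse a hypothetical --name option and return its value."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--name", default="world")
    # No argument: argparse reads sys.argv[1:] itself.
    # Explicit list: the values come from the caller-supplied argument.
    namespace = parser.parse_args() if argv is None else parser.parse_args(argv)
    # This attribute read is the kind of lookup the additional taint step covers.
    return namespace.name


greeting_target = parse_cli(["--name", "alice"])
```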
