Commit eaec2d7

Merge pull request github#3888 from shati-patel/go-docs
Learning CodeQL: Add new library modeling guide (Go)
2 parents f917b9e + f98491a commit eaec2d7

2 files changed (+126, -0 lines changed)

Lines changed: 122 additions & 0 deletions

Modeling data flow in Go libraries
==================================

When analyzing a Go program, CodeQL does not examine the source code for
external packages. To track the flow of untrusted data through a library, you
can create a model of the library.

You can find existing models in the ``ql/src/semmle/go/frameworks/`` folder of the
`CodeQL for Go repository <https://github.com/github/codeql-go/tree/main/ql/src/semmle/go/frameworks>`__.
To add a new model, you should make a new file in that folder, named after the library.

Sources
-------

To mark a source of data that is controlled by an untrusted user, we
create a class extending ``UntrustedFlowSource::Range``. Inheritance and
the characteristic predicate of the class should be used to specify
exactly the data-flow node that introduces the data. Here is a short
example from ``Mux.qll``.

.. code-block:: ql

   class RequestVars extends DataFlow::UntrustedFlowSource::Range, DataFlow::CallNode {
     RequestVars() { this.getTarget().hasQualifiedName("github.com/gorilla/mux", "Vars") }
   }

This has the effect that all calls to `the function Vars from the
package mux <http://www.gorillatoolkit.org/pkg/mux#Vars>`__ are
treated as sources of untrusted data.

Flow propagation
----------------

By default, we assume that data does not flow through library
functions. To indicate that a particular function does propagate data,
create a class extending ``TaintTracking::FunctionModel`` (or
``DataFlow::FunctionModel`` if the untrusted user data is passed on
without being modified).
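
The difference between the two kinds of model can be sketched with two Go functions. Both names, ``passThrough`` and ``shout``, are hypothetical and chosen only for illustration. ``passThrough`` returns its argument unchanged, so it is the kind of function a ``DataFlow::FunctionModel`` describes. ``shout`` returns a value that is derived from its argument but is not identical to it. Only taint reaches its result, so a ``TaintTracking::FunctionModel`` fits better.

.. code-block:: go

   package main

   import (
       "fmt"
       "strings"
   )

   // passThrough is a hypothetical library function that returns its
   // argument unmodified: the output is exactly the input value.
   func passThrough(s string) string {
       return s
   }

   // shout is a hypothetical library function whose result is derived
   // from, but not identical to, its argument.
   func shout(s string) string {
       return strings.ToUpper(s) + "!"
   }

   func main() {
       userInput := "hello" // imagine this came from an untrusted source
       fmt.Println(passThrough(userInput) == userInput) // same value: data flow
       fmt.Println(shout(userInput))                    // derived value: taint flow
   }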

Inheritance and the characteristic predicate of the class should specify
the function. The class should also have a member predicate with the signature
``override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp)``
(or ``override predicate hasDataFlow(FunctionInput inp, FunctionOutput outp)``
if extending ``DataFlow::FunctionModel``). The body of this predicate should
constrain ``inp`` and ``outp`` to the input and output between which data flows.

``FunctionInput`` is an abstract representation of the inputs to a
function. The options are:

* the receiver (``inp.isReceiver()``)
* one of the parameters (``inp.isParameter(i)``)
* one of the results (``inp.isResult(i)``, or ``inp.isResult()`` if there is only one result)

It may seem strange that the result of a function could be
considered as a function input, but this is needed in some cases. For
instance, the function ``bufio.NewWriter`` returns a writer ``bw`` that
buffers write operations to an underlying writer ``w``. If tainted data
is written to ``bw``, then it makes sense to propagate that taint back
to the underlying writer ``w``, which can be modeled by saying that
``bufio.NewWriter`` propagates taint from its result to its first
argument.
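
The following Go sketch shows why result-to-argument propagation matches the runtime behavior. Bytes written to the ``*bufio.Writer`` returned by ``bufio.NewWriter`` end up in the underlying writer once the buffer is flushed. The helper name ``writeThroughBuffer`` is hypothetical. ``bufio.NewWriter``, ``WriteString``, and ``Flush`` are standard library calls.

.. code-block:: go

   package main

   import (
       "bufio"
       "bytes"
       "fmt"
   )

   // writeThroughBuffer (hypothetical helper) writes data only to bw, the
   // result of bufio.NewWriter, yet the data ends up in w, its first
   // argument: this is the result -> argument flow described above.
   func writeThroughBuffer(w *bytes.Buffer, data string) error {
       bw := bufio.NewWriter(w) // bw wraps w
       if _, err := bw.WriteString(data); err != nil {
           return err
       }
       // Small writes stay in bw's internal buffer until Flush copies them to w.
       return bw.Flush()
   }

   func main() {
       var w bytes.Buffer
       if err := writeThroughBuffer(&w, "tainted"); err != nil {
           panic(err)
       }
       fmt.Println(w.String())
   }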

Similarly, ``FunctionOutput`` is an abstract representation of the
outputs of a function. The options are:

* the receiver (``outp.isReceiver()``)
* one of the parameters (``outp.isParameter(i)``)
* one of the results (``outp.isResult(i)``, or ``outp.isResult()`` if there is only one result)
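
The Go sketch below gives a rough guide to these positions. It annotates a few standard library calls with the input and output that a model for each might use. These annotations are an illustrative assumption, with parameters numbered from 0 in the same way results are in ``isResult(0)``. They are not taken from the existing models in the repository, and the function name ``flowPositions`` is hypothetical.

.. code-block:: go

   package main

   import (
       "bytes"
       "fmt"
       "io"
       "strings"
   )

   // flowPositions (hypothetical) moves s through several standard library
   // calls, each illustrating a different input/output pair.
   func flowPositions(s string) (string, error) {
       var buf bytes.Buffer

       // Parameter 0 -> receiver: WriteString stores its argument in buf.
       buf.WriteString(s)

       // Receiver -> result: String returns the receiver's contents.
       fromReceiver := buf.String()

       // Parameter 1 -> parameter 0: io.Copy reads from its second argument
       // (src) and writes into its first (dst). strings.NewReader is itself
       // a parameter 0 -> result flow.
       var dst bytes.Buffer
       if _, err := io.Copy(&dst, strings.NewReader(fromReceiver)); err != nil {
           return "", err
       }
       return dst.String(), nil
   }

   func main() {
       out, err := flowPositions("untrusted")
       if err != nil {
           panic(err)
       }
       fmt.Println(out)
   }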

Here is an example from ``Gin.qll``, which has been slightly simplified.

.. code-block:: ql

   private class ParamsGet extends TaintTracking::FunctionModel, Method {
     ParamsGet() { this.hasQualifiedName("github.com/gin-gonic/gin", "Params", "Get") }

     override predicate hasTaintFlow(FunctionInput inp, FunctionOutput outp) {
       inp.isReceiver() and outp.isResult(0)
     }
   }

This has the effect that calls to the ``Get`` method with receiver type
``Params`` from the ``gin-gonic/gin`` package allow taint to flow from
the receiver to the first result. In other words, if ``p`` has type
``Params`` and taint can flow to it, then after the line
``x, _ := p.Get("foo")`` taint can also flow to ``x``.
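
To see why only the first result is tainted, here is a simplified, self-contained Go sketch of the ``Params`` type. It is a local re-creation for illustration, so that the example builds without the gin module, and it is not gin's source. In gin, ``Params.Get`` returns the value for a key together with a ``bool`` that reports whether the key was found. The string result is copied out of the receiver, while the ``bool`` carries no user data, which is why the model uses ``outp.isResult(0)``.

.. code-block:: go

   package main

   import "fmt"

   // Param and Params are simplified local copies of gin's types.
   type Param struct {
       Key   string
       Value string
   }

   type Params []Param

   // Get returns the value of the first Param whose key matches name and
   // true; if there is no match it returns "" and false.
   func (ps Params) Get(name string) (string, bool) {
       for _, entry := range ps {
           if entry.Key == name {
               return entry.Value, true // result 0 is copied from the receiver
           }
       }
       return "", false
   }

   func main() {
       // Imagine p was populated from an untrusted request URL.
       p := Params{{Key: "foo", Value: "<script>"}}
       x, ok := p.Get("foo") // taint flows from p to x (result 0), not to ok
       fmt.Println(x, ok)
   }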

Sanitizers
----------

It is not necessary to indicate that library functions are sanitizers.
Their bodies are not analyzed, so unless a model says otherwise, it is
assumed that data does not flow through them.

Sinks
-----

Data-flow sinks are specified by queries rather than by library models.
However, you can use library models to indicate when functions belong to
special categories. Queries can then use these categories when specifying
sinks. Classes representing these special categories are contained in
``ql/src/semmle/go/Concepts.qll`` in the `CodeQL for Go repository
<https://github.com/github/codeql-go/blob/main/ql/src/semmle/go/Concepts.qll>`__.
``Concepts.qll`` includes classes for logger mechanisms,
HTTP response writers, HTTP redirects, and marshaling and unmarshaling
functions.

Here is a short example from ``Stdlib.qll``, which has been slightly simplified.

.. code-block:: ql

   private class PrintfCall extends LoggerCall::Range, DataFlow::CallNode {
     PrintfCall() { this.getTarget().hasQualifiedName("fmt", ["Print", "Printf", "Println"]) }

     override DataFlow::Node getAMessageComponent() { result = this.getAnArgument() }
   }

This has the effect that any call to ``Print``, ``Printf``, or
``Println`` in the package ``fmt`` is recognized as a logger call.
Any query that uses logger calls as a sink will then identify when tainted data
has been passed as an argument to ``Print``, ``Printf``, or ``Println``.
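
The Go sketch below shows the kind of code such a query looks for. It passes a user-controlled string to ``fmt.Printf``. Because the model treats every argument as a message component, both the format argument here and a tainted format string would count. The names ``logUser`` and ``captureStdout`` and the sample input are hypothetical. They are used only so that the logged text can be inspected. The embedded newline shows how unsanitized input can forge an extra log entry, as in log injection.

.. code-block:: go

   package main

   import (
       "fmt"
       "io"
       "os"
   )

   // logUser logs a user-supplied name with fmt.Printf, a call the model
   // above recognizes as a logger call.
   func logUser(name string) {
       fmt.Printf("login attempt by %s\n", name)
   }

   // captureStdout runs f and returns everything it wrote to os.Stdout.
   func captureStdout(f func()) (string, error) {
       r, w, err := os.Pipe()
       if err != nil {
           return "", err
       }
       old := os.Stdout
       os.Stdout = w
       f()
       w.Close()
       os.Stdout = old
       out, err := io.ReadAll(r)
       r.Close()
       return string(out), err
   }

   func main() {
       // Untrusted input with an embedded newline forges a second log line.
       out, err := captureStdout(func() { logUser("alice\nlogin succeeded for admin") })
       if err != nil {
           panic(err)
       }
       fmt.Print(out)
   }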

docs/language/learn-ql/go/ql-for-go.rst

Lines changed: 4 additions & 0 deletions
@@ -8,9 +8,13 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat
 
    introduce-libraries-go
    ast-class-reference
+   library-modeling-go
 
 - `Basic Go query <https://lgtm.com/help/lgtm/console/ql-go-basic-example>`__: Learn to write and run a simple CodeQL query using LGTM.
 
 - :doc:`CodeQL library for Go <introduce-libraries-go>`: When you're analyzing a Go program, you can make use of the large collection of classes in the CodeQL library for Go.
 
 - :doc:`Abstract syntax tree classes for working with Go programs <ast-class-reference>`: CodeQL has a large selection of classes for representing the abstract syntax tree of Go programs.
+
+- :doc:`Modeling data flow in Go libraries <library-modeling-go>`: When analyzing a Go program, CodeQL does not examine the source code for external packages.
+  To track the flow of untrusted data through a library, you can create a model of the library.
