Commit 7ef4cc4

Java: Add flow through examples.
1 parent f6ef558 commit 7ef4cc4

1 file changed: +103 -4 lines changed

docs/codeql/codeql-language-guides/customizing-library-models-for-java.rst

Lines changed: 103 additions & 4 deletions
@@ -45,7 +45,7 @@ Please note that this sink is already added to the CodeQL Java analysis.

    public static void taintsink(Connection conn, String query) throws SQLException {
        Statement stmt = conn.createStatement();
-       stmt.execute(query);
+       stmt.execute(query); // The argument passed to this method is a SQL injection sink.
    }

This can be achieved by adding the following data extension.
@@ -86,7 +86,7 @@ Please note that this source is already added to the CodeQL Java analysis.
.. code-block:: java

    public static InputStream tainted(Socket socket) throws IOException {
-       InputStream stream = socket.getInputStream();
+       InputStream stream = socket.getInputStream(); // The return value of this method is a remote source.
        return stream;
    }

@@ -119,9 +119,108 @@ The remaining values are used to define the **access path**, the **kind**, and t
- The eighth value **remote** is the kind of the source. The source kind determines which queries the source is in scope for. **remote** applies to many of the security-related queries, as it means a remote source of untrusted data. For example, the SQL injection query uses **remote** sources.
- The ninth value **manual** is the provenance of the source, which is used to identify its origin.

- Example: Adding flow through '<TODO>' methods.
- ----------------------------------------------
+ Example: Adding flow through the **concat** method.
+ ---------------------------------------------------
In this example, we will see how to define flow through a method for a simple case.
This pattern covers many of the cases where we need to define flow through a method.
Please note that the flow through the **concat** method is already added to the CodeQL Java analysis.

.. code-block:: java

    public static String taintflow(String s1, String s2) {
        String t = s1.concat(s2); // There is taint flow from s1 and s2 to t.
        return t;
    }

This can be achieved by adding the following data extension.
Models that describe flow through a method in this way are called summary models.

.. code-block:: yaml

    extensions:
      - addsTo:
          pack: codeql/java-all
          extensible: summaryModel
        data:
          - ["java.lang", "String", False, "concat", "(String)", "", "Argument[-1]", "ReturnValue", "taint", "manual"]
          - ["java.lang", "String", False, "concat", "(String)", "", "Argument[0]", "ReturnValue", "taint", "manual"]

Reasoning:

Since we are adding flow through a method, we need to add tuples to the **summaryModel** extension point.
Each tuple defines flow from one input (the qualifier or an argument) to the return value.
The first five values identify the method (callable) for which we are defining flow.
These are the same for both of the rows above.

- The first value **java.lang** is the package name.
- The second value **String** is the class (type) name.
- The third value **False** is a flag indicating whether the summary also applies to all overrides of the method.
- The fourth value **concat** is the method name.
- The fifth value **(String)** is the method input type signature.

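To make the column layout concrete, the two **concat** rows can be mirrored in plain Java. This is a minimal sketch: the ``SummaryRowDemo`` class, the ``SummaryRow`` record and its field names are hypothetical and are not part of CodeQL; they only label the ten positions described in this section. The sketch shows that both rows share the same first five values and differ only in the input access path.

```java
import java.util.List;

public class SummaryRowDemo {
    // Hypothetical record mirroring the ten positions of a summaryModel row.
    // The field names are illustrative labels, not a CodeQL API.
    public record SummaryRow(
            String packageName,  // 1st value, e.g. "java.lang"
            String typeName,     // 2nd value, e.g. "String"
            boolean subtypes,    // 3rd value: does the row also apply to overrides?
            String methodName,   // 4th value, e.g. "concat"
            String signature,    // 5th value, e.g. "(String)"
            String ext,          // 6th value, usually ""
            String input,        // 7th value: access path data flows from
            String output,       // 8th value: access path data flows to
            String kind,         // 9th value: "taint" or "value"
            String provenance) { // 10th value, e.g. "manual"

        // The first five values identify the callable the row describes.
        public List<Object> callable() {
            return List.of(packageName, typeName, subtypes, methodName, signature);
        }
    }

    // The two rows from the data extension for String.concat.
    public static final List<SummaryRow> CONCAT_ROWS = List.of(
            new SummaryRow("java.lang", "String", false, "concat", "(String)", "",
                    "Argument[-1]", "ReturnValue", "taint", "manual"),
            new SummaryRow("java.lang", "String", false, "concat", "(String)", "",
                    "Argument[0]", "ReturnValue", "taint", "manual"));

    public static void main(String[] args) {
        SummaryRow qualifierRow = CONCAT_ROWS.get(0);
        SummaryRow argumentRow = CONCAT_ROWS.get(1);
        // Both rows describe the same method ...
        System.out.println(qualifierRow.callable().equals(argumentRow.callable())); // true
        // ... but start from different inputs.
        System.out.println(qualifierRow.input() + " / " + argumentRow.input()); // Argument[-1] / Argument[0]
    }
}
```
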
For most practical purposes the sixth value is not relevant.
The remaining values are used to define the **access path**, the **kind**, and the **provenance** (origin) of the summary.

- The seventh value is the access path to the input where data flows from. **Argument[-1]** is the access path to the qualifier (**s1** in the example) and **Argument[0]** is the access path to the first argument (**s2** in the example).
- The eighth value is the access path to the output where data flows to. **ReturnValue** means that the input flows to the return value of the method (**t** in the example).
- The ninth value **taint** is the kind of the flow. **taint** means that taint is propagated: the output is derived from the input but is not necessarily the same value.
- The tenth value **manual** is the provenance of the summary, which is used to identify its origin.

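Why two rows with the **taint** kind are needed can be checked by running the method from the example. In this sketch (the class name ``ConcatFlowDemo`` and the sample strings are hypothetical), the result of ``s1.concat(s2)`` contains data from both the qualifier and the argument, yet it is a new string equal to neither input, which is consistent with modelling the flow as taint rather than value.

```java
public class ConcatFlowDemo {
    // Same shape as the taintflow method in the concat example.
    public static String taintflow(String s1, String s2) {
        String t = s1.concat(s2); // data from both s1 and s2 reaches t
        return t;
    }

    public static void main(String[] args) {
        String s1 = "user=";
        String s2 = "alice";
        String t = taintflow(s1, s2);
        System.out.println(t);                                  // user=alice
        System.out.println(t.startsWith(s1) && t.endsWith(s2)); // true: both inputs contribute
        System.out.println(t.equals(s1) || t.equals(s2));       // false: t is a new value
    }
}
```

If only one of the two rows were present, data entering through the other input would not be tracked to ``t``.
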
Example: Adding flow through the **map** method.
------------------------------------------------
In this example, we will see a more complex case of modelling flow through a method.
This pattern shows how to model flow through higher-order methods and collection types.
Please note that the flow through the **map** method is already added to the CodeQL Java analysis.

.. code-block:: java

    public static Stream<String> taintflow(Stream<String> s) {
        Stream<String> l = s.map(e -> e.concat("\n"));
        return l;
    }

This can be achieved by adding the following data extension.

.. code-block:: yaml

    extensions:
      - addsTo:
          pack: codeql/java-all
          extensible: summaryModel
        data:
          - ["java.util.stream", "Stream", True, "map", "(Function)", "", "Argument[-1].Element", "Argument[0].Parameter[0]", "value", "manual"]
          - ["java.util.stream", "Stream", True, "map", "(Function)", "", "Argument[0].ReturnValue", "ReturnValue.Element", "value", "manual"]

Reasoning:

Since we are adding flow through a method, we need to add tuples to the **summaryModel** extension point.
Each tuple defines part of the flow that makes up the total flow through the method.
The first five values identify the method (callable) for which we are defining flow.
These are the same for both of the rows above.

- The first value **java.util.stream** is the package name.
- The second value **Stream** is the class (type) name.
- The third value **True** is a flag indicating whether the summary also applies to all overrides of the method.
- The fourth value **map** is the method name.
- The fifth value **(Function)** is the method input type signature.

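One way to see why the third value is **False** for **concat** but **True** for **map**: ``java.lang.String`` is a ``final`` class, so **concat** has no overrides, whereas ``java.util.stream.Stream`` is an interface, so every call to **map** runs an implementation supplied by some other class. The sketch below (the class ``SubtypesFlagDemo`` is hypothetical) checks both facts with reflection; it illustrates the distinction and is not taken from the CodeQL sources.

```java
import java.lang.reflect.Modifier;
import java.util.stream.Stream;

public class SubtypesFlagDemo {
    // String is final: no subclass can override String.concat.
    public static boolean stringIsFinal() {
        return Modifier.isFinal(String.class.getModifiers());
    }

    // Stream is an interface: Stream.map is always implemented by a subtype.
    public static boolean streamIsInterface() {
        return Stream.class.isInterface();
    }

    // The runtime class of a real stream is some implementation of Stream, never Stream itself.
    public static boolean streamImplIsSubtype() {
        Stream<String> s = Stream.of("a", "b");
        Class<?> impl = s.getClass();
        return impl != Stream.class && Stream.class.isAssignableFrom(impl);
    }

    public static void main(String[] args) {
        System.out.println(stringIsFinal());       // true
        System.out.println(streamIsInterface());   // true
        System.out.println(streamImplIsSubtype()); // true
    }
}
```
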
For most practical purposes the sixth value is not relevant.
The remaining values are used to define the **access path**, the **kind**, and the **provenance** (origin) of the summary.

- The seventh value is the access path to the **input** where data flows from.
- The eighth value is the access path to the **output** where data flows to.

For the first row:

- The seventh value is **Argument[-1].Element**, which is the access path to the elements of the qualifier (the elements of the stream **s** in the example).
- The eighth value is **Argument[0].Parameter[0]**, which is the access path to the first parameter of the **Function** argument of **map** (the lambda parameter **e** in the example).

For the second row:

- The seventh value is **Argument[0].ReturnValue**, which is the access path to the return value of the **Function** argument of **map** (the return value of the lambda in the example).
- The eighth value is **ReturnValue.Element**, which is the access path to the elements of the return value of **map** (the elements of the stream **l** in the example).

For both rows:

- The ninth value **value** is the kind of the flow. **value** means that the value itself is propagated, without modification.
- The tenth value **manual** is the provenance of the summary, which is used to identify its origin.

That is, the first row models value flow from the elements of the qualifier stream into the first parameter of the **Function** provided to **map**.
The second row models value flow from the return value of the **Function** to the elements of the stream returned from **map**.
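
The two rows can be traced at runtime. In this sketch (the class ``MapFlowDemo`` and its two logging lists are hypothetical additions to the example), the lambda records every parameter it receives and every value it returns. Each element of **s** arrives as the lambda parameter (first row), and each object returned by the lambda appears, as the very same object, as an element of **l** (second row), which is consistent with both rows using the **value** kind.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MapFlowDemo {
    // Values observed as the lambda parameter e (Argument[0].Parameter[0]).
    public static final List<String> seenParameters = new ArrayList<>();
    // Values returned by the lambda (Argument[0].ReturnValue).
    public static final List<String> returnedValues = new ArrayList<>();

    // Same shape as the taintflow method in the map example, with logging added.
    public static Stream<String> taintflow(Stream<String> s) {
        Function<String, String> f = e -> {
            seenParameters.add(e);     // an element of the qualifier stream reaches e
            String r = e.concat("\n");
            returnedValues.add(r);     // the lambda's return value ...
            return r;
        };
        Stream<String> l = s.map(f);
        return l;
    }

    public static void main(String[] args) {
        // map is lazy: the lambda only runs when the stream is consumed.
        List<String> l = taintflow(Stream.of("a", "b")).collect(Collectors.toList());
        System.out.println(seenParameters);                    // [a, b]
        // ... becomes an element of the returned stream: the same object.
        System.out.println(l.get(0) == returnedValues.get(0)); // true
    }
}
```
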

Example: Adding **neutral** methods.
------------------------------------
