docs/codeql/codeql-language-guides/customizing-library-models-for-java.rst
Lines changed: 18 additions & 27 deletions
@@ -10,14 +10,9 @@ Customizing Library Models for Java
10
10
11
11
The Java analysis can be customized by adding library models (summaries, sinks and sources) in data extension files.
12
12
A model is a definition of a behavior of a library element, such as a method, that is used to improve the data flow analysis precision by identifying more results.
13
-
Most of the security related queries are *taint tracking* queries that try to find paths from a *source* of untrusted input to a *sink* that represents a vulnerability.
14
-
Furthermore, the taint tracking queries also need to know how data can flow through elements that are not included in the source code - these are named *summaries*.
13
+
Most of the security related queries are taint tracking queries that try to find paths from a source of untrusted input to a sink that represents a vulnerability. Sources are the starting points of a taint tracking data flow analysis, and sinks are the end points of a taint tracking data flow analysis.
15
14
16
-
That is:
17
-
18
-
- **sources** are the starting points of a taint tracking data flow analysis.
19
-
- **sinks** are the end points of a taint tracking data flow analysis.
20
-
- **summaries** are models of elements that allow us to synthesize the elements flow behavior without having them in the source code. This is especially helpful when using a third party (or the standard) library.
15
+
Furthermore, the taint tracking queries also need to know how data can flow through elements that are not included in the source code. These are named summaries: they are models of elements that allow us to synthesize the element's flow behavior without having them in the source code. This is especially helpful when using a third-party (or the standard) library.
21
16
22
17
The models are defined using data extensions where each tuple constitutes a model.
23
18
A data extension file for Java is a YAML file in the form:
@@ -46,9 +41,9 @@ The extensible predicates are populated using data extensions specified in YAML
46
41
47
42
In the sections below, we will provide examples of how to add tuples to the different extensible predicates.
48
43
The extensible predicates are used to customize and improve the existing data flow queries, by providing sources, sinks, and flow through (summaries) for library elements.
49
-
The :ref:`reference-material` section will provide details on the *mini DSLs* that defines models for each extensible predicate.
44
+
The :ref:`reference-material` section will provide details on the *mini DSLs* that define models for each extensible predicate.
50
45
51
-
Example: Taint sink in the **java.sql** package.
46
+
Example: Taint sink in the **java.sql** package
52
47
------------------------------------------------
53
48
54
49
In this example we will show how to model the argument of the **execute** method as a SQL injection sink.
@@ -62,7 +57,7 @@ Please note that this sink is already added to the CodeQL Java analysis.
62
57
stmt.execute(query); // The argument to this method is a SQL injection sink.
63
58
}
64
59
65
-
This means that we want to add a tuple to the **sinkModel**\(package, type, subtypes, name, signature, ext, input, kind, provenance) extensible predicate, which can be achieved by adding the following to a data extension file:
60
+
We need to add a tuple to the **sinkModel**\(package, type, subtypes, name, signature, ext, input, kind, provenance) extensible predicate. To do this, add the following to a data extension file:
66
61
67
62
.. code-block:: yaml
68
63
@@ -73,7 +68,6 @@ This means that we want to add a tuple to the **sinkModel**\(package, type, subt
Since we are adding a new sink, we need to add a tuple to the **sinkModel** extensible predicate.
79
73
The first five values identify the callable (in this case a method) to be modeled as a sink.
@@ -91,7 +85,7 @@ The remaining values are used to define the **access path**, the **kind**, and t
91
85
- The eighth value **sql** is the kind of the sink. The sink kind determines which queries the sink is in scope for; in this case, the SQL injection queries.
92
86
- The ninth value **manual** is the provenance of the sink, which is used to identify the origin of the sink.
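Put together, the nine values could form a tuple like the one in the following sketch. The **java.sql** package, the **execute** name, the **sql** kind and the **manual** provenance come from the description above. The **addsTo** header, the pack name **codeql/java-all**, the **Statement** type, the subtypes flag, the **(String)** signature and the **Argument[0]** input are assumptions made for illustration and may differ from the block in the guide.

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/java-all      # assumed target pack
         extensible: sinkModel
       data:
         # package, type, subtypes, name, signature, ext, input, kind, provenance
         # "Statement", True, "(String)" and "Argument[0]" are assumed values
         - ["java.sql", "Statement", True, "execute", "(String)", "", "Argument[0]", "sql", "manual"]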
93
87
94
-
Example: Taint source from the **java.net** package.
88
+
Example: Taint source from the **java.net** package
In this example we show how to model the return value from the **getInputStream** method as a **remote** source.
97
91
This is the **getInputStream** method in the **Socket** class, which is located in the **java.net** package.
@@ -104,7 +98,7 @@ Please note that this source is already added to the CodeQL Java analysis.
104
98
...
105
99
}
106
100
107
-
This means that we want to add a tuple to the **sourceModel**\(package, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate, which can be achieved by adding the following to a data extension file:
101
+
We need to add a tuple to the **sourceModel**\(package, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate. To do this, add the following to a data extension file:
108
102
109
103
.. code-block:: yaml
110
104
@@ -115,7 +109,6 @@ This means that we want to add a tuple to the **sourceModel**\(package, type, su
Since we are adding a new source, we need to add a tuple to the **sourceModel** extensible predicate.
121
114
The first five values identify the callable (in this case a method) to be modeled as a source.
@@ -133,7 +126,7 @@ The remaining values are used to define the **access path**, the **kind**, and t
133
126
- The eighth value **remote** is the kind of the source. The source kind is used to define the queries where the source is in scope. **remote** applies to many of the security-related queries as it means a remote source of untrusted data. For example, the SQL injection query uses **remote** sources.
134
127
- The ninth value **manual** is the provenance of the source, which is used to identify the origin of the source.
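As a sketch, the nine values for this source could be written as the tuple below. The **java.net** package, the **Socket** class, the **getInputStream** method, the **remote** kind and the **manual** provenance come from the text above. The **addsTo** header, the pack name **codeql/java-all**, the subtypes flag, the **()** signature and the **ReturnValue** output are assumptions made for illustration.

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/java-all      # assumed target pack
         extensible: sourceModel
       data:
         # package, type, subtypes, name, signature, ext, output, kind, provenance
         # False, "()" and "ReturnValue" are assumed values
         - ["java.net", "Socket", False, "getInputStream", "()", "", "ReturnValue", "remote", "manual"]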
135
128
136
-
Example: Add flow through the **concat** method.
129
+
Example: Add flow through the **concat** method
137
130
------------------------------------------------
138
131
In this example we show how to model flow through a method for a simple case.
139
132
This pattern covers many of the cases where we need to define flow through a method.
@@ -146,7 +139,7 @@ Please note that the flow through the **concat** method is already added to the
146
139
...
147
140
}
148
141
149
-
This means that we want to add tuples to the **summaryModel**\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate, which can be achieved by adding the following to a data extension file:
142
+
We need to add tuples to the **summaryModel**\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate. To do this, add the following to a data extension file:
150
143
151
144
.. code-block:: yaml
152
145
@@ -181,7 +174,7 @@ The remaining values are used to define the **access path**, the **kind**, and t
181
174
- The ninth value **taint** is the kind of the flow. **taint** means that taint is propagated through the call.
182
175
- The tenth value **manual** is the provenance of the summary, which is used to identify the origin of the summary.
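A summary for **concat** might then be sketched with two tuples, one for each place taint can enter the call. The **taint** kind and the **manual** provenance come from the list above. The **addsTo** header, the pack name **codeql/java-all**, the **java.lang** package, the **String** type, the **(String)** signature and the access paths **Argument[this]**, **Argument[0]** and **ReturnValue** are assumptions made for illustration.

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/java-all      # assumed target pack
         extensible: summaryModel
       data:
         # package, type, subtypes, name, signature, ext, input, output, kind, provenance
         # Assumed: taint flows from the qualifier to the return value ...
         - ["java.lang", "String", False, "concat", "(String)", "", "Argument[this]", "ReturnValue", "taint", "manual"]
         # ... and from the first argument to the return value.
         - ["java.lang", "String", False, "concat", "(String)", "", "Argument[0]", "ReturnValue", "taint", "manual"]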
183
176
184
-
Example: Add flow through the **map** method.
177
+
Example: Add flow through the **map** method
185
178
---------------------------------------------
186
179
This example shows a more complex case of modeling flow through a method.
187
180
This pattern shows how to model flow through higher order methods and collection types.
@@ -194,7 +187,7 @@ Please note that the flow through the **map** method is already added to the Cod
194
187
...
195
188
}
196
189
197
-
This can be achieved by adding the following to a data extension file:
190
+
To do this, add the following to a data extension file:
198
191
199
192
.. code-block:: yaml
200
193
@@ -206,7 +199,6 @@ This can be achieved by adding the following to a data extension file:
Since we are adding flow through a method, we need to add tuples to the **summaryModel** extensible predicate.
212
204
Each tuple defines part of the flow that comprises the total flow through the **map** method.
@@ -225,24 +217,24 @@ The remaining values are used to define the **access path**, the **kind**, and t
225
217
- The seventh value is the access path to the **input** (where data flows from).
226
218
- The eighth value is the access path to the **output** (where data flows to).
227
219
228
-
For the first row the
220
+
For the first row:
229
221
230
222
- The seventh value is **Argument[this].Element**, which is the access path to the elements of the qualifier (the elements of the stream **s** in the example).
231
223
- The eighth value is **Argument[0].Parameter[0]**, which is the access path to the first parameter of the **Function** argument of **map** (the lambda parameter **e** in the example).
232
224
233
-
For the second row the
225
+
For the second row:
234
226
235
227
- The seventh value is **Argument[0].ReturnValue**, which is the access path to the return value of the **Function** argument of **map** (the return value of the lambda in the example).
236
228
- The eighth value is **ReturnValue.Element**, which is the access path to the elements of the return value of **map** (the elements of the stream **l** in the example).
237
229
238
-
The remaining values for both rows
230
+
For the remaining values for both rows:
239
231
240
232
- The ninth value **value** is the kind of the flow. **value** means that the value is preserved.
241
233
- The tenth value **manual** is the provenance of the summary, which is used to identify the origin of the summary.
242
234
243
-
That is, the first row models that there is value flow from the elements of the qualifier stream into the first argument of the Function provided to **map** and the second row models that there is value flow from the return value of the Function to the elements of the stream returned from **map**.
235
+
That is, the first row models that there is value flow from the elements of the qualifier stream into the first argument of the function provided to **map** and the second row models that there is value flow from the return value of the function to the elements of the stream returned from **map**.
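The two rows described above could be sketched as follows. The access paths, the **value** kind and the **manual** provenance are taken from the description. The **addsTo** header, the pack name **codeql/java-all**, the **java.util.stream** package, the **Stream** type, the subtypes flag and the **(Function)** signature are assumptions made for illustration.

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/java-all      # assumed target pack
         extensible: summaryModel
       data:
         # package, type, subtypes, name, signature, ext, input, output, kind, provenance
         # First row: elements of the qualifier stream flow into the lambda parameter.
         - ["java.util.stream", "Stream", True, "map", "(Function)", "", "Argument[this].Element", "Argument[0].Parameter[0]", "value", "manual"]
         # Second row: the lambda's return value flows into the elements of the returned stream.
         - ["java.util.stream", "Stream", True, "map", "(Function)", "", "Argument[0].ReturnValue", "ReturnValue.Element", "value", "manual"]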
244
236
245
-
Example: Add a **neutral** method.
237
+
Example: Add a **neutral** method
246
238
----------------------------------
247
239
In this example we will show how to model the **now** method as being neutral.
248
240
This is purely for completeness and has no impact on the analysis.
@@ -256,7 +248,7 @@ Please note that the neutral model for the **now** method is already added.
256
248
...
257
249
}
258
250
259
-
This means that we want to add a tuple to the **neutralModel**\(package, type, name, signature, provenance) extensible predicate, which can be achieved by adding the following to a data extension file:
251
+
We need to add a tuple to the **neutralModel**\(package, type, name, signature, provenance) extensible predicate. To do this, add the following to a data extension file:
260
252
261
253
.. code-block:: yaml
262
254
@@ -267,7 +259,6 @@ This means that we want to add a tuple to the **neutralModel**\(package, type, n
267
259
data:
268
260
- ["java.time", "Instant", "now", "()", "manual"]
269
261
270
-
Reasoning:
271
262
272
263
Since we are adding a neutral model, we need to add tuples to the **neutralModel** extensible predicate.
273
264
The first four values identify the callable (in this case a method) to be modeled as a neutral, and the fifth value is the provenance (origin) of the neutral.
@@ -290,7 +281,7 @@ Extensible predicates
290
281
---------------------
291
282
292
283
Below is a description of the columns for each extensible predicate.
293
-
Sources, Sinks, Summaries and Neutrals are commonly known as Models.
284
+
Sources, sinks, summaries and neutrals are commonly known as models.
294
285
The semantics of many of the columns of the extensible predicates are shared.