Commit 43fd342

michaelnebel and subatoi committed
Apply suggestions from code review
Co-authored-by: Ben Ahmady <[email protected]>
1 parent ad42f7d commit 43fd342

1 file changed: +18 -27 lines changed

docs/codeql/codeql-language-guides/customizing-library-models-for-java.rst

Lines changed: 18 additions & 27 deletions
@@ -10,14 +10,9 @@ Customizing Library Models for Java
1010

1111
The Java analysis can be customized by adding library models (summaries, sinks and sources) in data extension files.
1212
A model is a definition of a behavior of a library element, such as a method, that is used to improve the data flow analysis precision by identifying more results.
13-
Most of the security related queries are *taint tracking* queries that try to find paths from a *source* of untrusted input to a *sink* that represents a vulnerability.
14-
Furthermore, the taint tracking queries also need to know how data can flow through elements that are not included in the source code - these are named *summaries*.
13+
Most of the security related queries are taint tracking queries that try to find paths from a source of untrusted input to a sink that represents a vulnerability. Sources are the starting points of a taint tracking data flow analysis, and sinks are the end points of a taint tracking data flow analysis.
1514

16-
That is:
17-
18-
- **sources** are the starting points of a taint tracking data flow analysis.
19-
- **sinks** are the end points of a taint tracking data flow analysis.
20-
- **summaries** are models of elements that allow us to synthesize the elements flow behavior without having them in the source code. This is especially helpful when using a third party (or the standard) library.
15+
Furthermore, the taint tracking queries also need to know how data can flow through elements that are not included in the source code. These are named summaries: they are models of elements that allow us to synthesize the elements' flow behavior without having them in the source code. This is especially helpful when using a third-party (or the standard) library.
2116

2217
The models are defined using data extensions where each tuple constitutes a model.
2318
A data extension file for Java is a YAML file in the form:
@@ -46,9 +41,9 @@ The extensible predicates are populated using data extensions specified in YAML
4641

4742
In the sections below, we will provide examples of how to add tuples to the different extensible predicates.
4843
The extensible predicates are used to customize and improve the existing data flow queries, by providing sources, sinks, and flow through (summaries) for library elements.
49-
The :ref:`reference-material` section will provide details on the *mini DSLs* that defines models for each extensible predicate.
44+
The :ref:`reference-material` section will provide details on the *mini DSLs* that define models for each extensible predicate.
5045

51-
Example: Taint sink in the **java.sql** package.
46+
Example: Taint sink in the **java.sql** package
5247
------------------------------------------------
5348

5449
In this example we will show how to model the argument of the **execute** method as a SQL injection sink.
@@ -62,7 +57,7 @@ Please note that this sink is already added to the CodeQL Java analysis.
6257
stmt.execute(query); // The argument to this method is a SQL injection sink.
6358
}
6459
65-
This means that we want to add a tuple to the **sinkModel**\(package, type, subtypes, name, signature, ext, input, kind, provenance) extensible predicate, which can be achieved by adding the following to a data extension file:
60+
We need to add a tuple to the **sinkModel**\(package, type, subtypes, name, signature, ext, input, kind, provenance) extensible predicate. To do this, add the following to a data extension file:
6661

6762
.. code-block:: yaml
6863
@@ -73,7 +68,6 @@ This means that we want to add a tuple to the **sinkModel**\(package, type, subt
7368
data:
7469
- ["java.sql", "Statement", True, "execute", "(String)", "", "Argument[0]", "sql", "manual"]
7570
76-
Reasoning:
7771
7872
Since we are adding a new sink, we need to add a tuple to the **sinkModel** extensible predicate.
7973
The first five values identify the callable (in this case a method) to be modeled as a sink.
@@ -91,7 +85,7 @@ The remaining values are used to define the **access path**, the **kind**, and t
9185
- The eighth value **sql** is the kind of the sink. The sink kind is used to define the queries where the sink is in scope. In this case - the SQL injection queries.
9286
- The ninth value **manual** is the provenance of the sink, which is used to identify the origin of the sink.
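
As a rough sketch, the tuple above might sit in a complete data extension file as follows. The **extensions**, **addsTo**, **pack** and **extensible** keys and the **codeql/java-all** pack name are not visible in this hunk; they are assumed here from the general data extension layout. Only the **data** row is taken from the example.

.. code-block:: yaml

   # Assumed file wrapper; only the data row comes from the example above.
   extensions:
     - addsTo:
         pack: codeql/java-all      # assumed name of the pack being extended
         extensible: sinkModel      # the extensible predicate that receives the tuple
       data:
         # package, type, subtypes, name, signature, ext, input, kind, provenance
         - ["java.sql", "Statement", True, "execute", "(String)", "", "Argument[0]", "sql", "manual"]

Under this layout, each additional sink for the same extensible predicate would be one more row under **data**.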
9387

94-
Example: Taint source from the **java.net** package.
88+
Example: Taint source from the **java.net** package
9589
----------------------------------------------------
9690
In this example we show how to model the return value from the **getInputStream** method as a **remote** source.
9791
This is the **getInputStream** method in the **Socket** class, which is located in the **java.net** package.
@@ -104,7 +98,7 @@ Please note that this source is already added to the CodeQL Java analysis.
10498
...
10599
}
106100
107-
This means that we want to add a tuple to the **sourceModel**\(package, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate, which can be achieved by adding the following to a data extension file:
101+
We need to add a tuple to the **sourceModel**\(package, type, subtypes, name, signature, ext, output, kind, provenance) extensible predicate. To do this, add the following to a data extension file:
108102

109103
.. code-block:: yaml
110104
@@ -115,7 +109,6 @@ This means that we want to add a tuple to the **sourceModel**\(package, type, su
115109
data:
116110
- ["java.net", "Socket", False, "getInputStream", "()", "", "ReturnValue", "remote", "manual"]
117111
118-
Reasoning:
119112
120113
Since we are adding a new source, we need to add a tuple to the **sourceModel** extensible predicate.
121114
The first five values identify the callable (in this case a method) to be modeled as a source.
@@ -133,7 +126,7 @@ The remaining values are used to define the **access path**, the **kind**, and t
133126
- The eighth value **remote** is the kind of the source. The source kind is used to define the queries where the source is in scope. **remote** applies to many of the security related queries as it means a remote source of untrusted data. As an example the SQL injection query uses **remote** sources.
134127
- The ninth value **manual** is the provenance of the source, which is used to identify the origin of the source.
135128

136-
Example: Add flow through the **concat** method.
129+
Example: Add flow through the **concat** method
137130
------------------------------------------------
138131
In this example we show how to model flow through a method for a simple case.
139132
This pattern covers many of the cases where we need to define flow through a method.
@@ -146,7 +139,7 @@ Please note that the flow through the **concat** method is already added to the
146139
...
147140
}
148141
149-
This means that we want to add tuples to the **summaryModel**\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate, which can be achieved by adding the following to a data extension file:
142+
We need to add tuples to the **summaryModel**\(package, type, subtypes, name, signature, ext, input, output, kind, provenance) extensible predicate. To do this, add the following to a data extension file:
150143

151144
.. code-block:: yaml
152145
@@ -181,7 +174,7 @@ The remaining values are used to define the **access path**, the **kind**, and t
181174
- The ninth value **taint** is the kind of the flow. **taint** means that taint is propagated through the call.
182175
- The tenth value **manual** is the provenance of the summary, which is used to identify the origin of the summary.
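
The rows for this example are not visible in this hunk. A minimal sketch of what they might look like, assuming that **concat** is the **String.concat(String)** method from **java.lang** and that taint flows to the return value from both the qualifier and the argument:

.. code-block:: yaml

   # Sketch only: the wrapper keys, pack name and both rows are assumptions,
   # written to match the column order of summaryModel described above.
   extensions:
     - addsTo:
         pack: codeql/java-all
         extensible: summaryModel
       data:
         # Taint flows from the qualifier (the string concat is called on) to the return value.
         - ["java.lang", "String", False, "concat", "(String)", "", "Argument[this]", "ReturnValue", "taint", "manual"]
         # Taint flows from the first argument to the return value.
         - ["java.lang", "String", False, "concat", "(String)", "", "Argument[0]", "ReturnValue", "taint", "manual"]

Two rows are used because each tuple describes a single input-to-output path.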
183176

184-
Example: Add flow through the **map** method.
177+
Example: Add flow through the **map** method
185178
---------------------------------------------
186179
In this example, we will see a more complex example of modeling flow through a method.
187180
This pattern shows how to model flow through higher order methods and collection types.
@@ -194,7 +187,7 @@ Please note that the flow through the **map** method is already added to the Cod
194187
...
195188
}
196189
197-
This can be achieved by adding the following to a data extension file:
190+
To do this, add the following to a data extension file:
198191

199192
.. code-block:: yaml
200193
@@ -206,7 +199,6 @@ This can be achieved by adding the following to a data extension file:
206199
- ["java.util.stream", "Stream", True, "map", "(Function)", "", "Argument[this].Element", "Argument[0].Parameter[0]", "value", "manual"]
207200
- ["java.util.stream", "Stream", True, "map", "(Function)", "", "Argument[0].ReturnValue", "ReturnValue.Element", "value", "manual"]
208201
209-
Reasoning:
210202
211203
Since we are adding flow through a method, we need to add tuples to the **summaryModel** extensible predicate.
212204
Each tuple defines part of the flow that comprises the total flow through the **map** method.
@@ -225,24 +217,24 @@ The remaining values are used to define the **access path**, the **kind**, and t
225217
- The seventh value is the access path to the **input** (where data flows from).
226218
- The eighth value is the access path to the **output** (where data flows to).
227219

228-
For the first row the
220+
For the first row:
229221

230222
- The seventh value is **Argument[this].Element**, which is the access path to the elements of the qualifier (the elements of the stream **s** in the example).
231223
- The eighth value is **Argument[0].Parameter[0]**, which is the access path to the first parameter of the **Function** argument of **map** (the lambda parameter **e** in the example).
232224

233-
For the second row the
225+
For the second row:
234226

235227
- The seventh value is **Argument[0].ReturnValue**, which is the access path to the return value of the **Function** argument of **map** (the return value of the lambda in the example).
236228
- The eighth value is **ReturnValue.Element**, which is the access path to the elements of the return value of **map** (the elements of the stream **l** in the example).
237229

238-
The remaining values for both rows
230+
For the remaining values for both rows:
239231

240232
- The ninth value **value** is the kind of the flow. **value** means that the value is preserved.
241233
- The tenth value **manual** is the provenance of the summary, which is used to identify the origin of the summary.
242234

243-
That is, the first row models that there is value flow from the elements of the qualifier stream into the first argument of the Function provided to **map** and the second row models that there is value flow from the return value of the Function to the elements of the stream returned from **map**.
235+
That is, the first row models that there is value flow from the elements of the qualifier stream into the first argument of the function provided to **map** and the second row models that there is value flow from the return value of the function to the elements of the stream returned from **map**.
244236

245-
Example: Add a **neutral** method.
237+
Example: Add a **neutral** method
246238
----------------------------------
247239
In this example we will show how to model the **now** method as being neutral.
248240
This is purely for completeness and has no impact on the analysis.
@@ -256,7 +248,7 @@ Please note that the neutral model for the **now** method is already added.
256248
...
257249
}
258250
259-
This means that we want to add a tuple to the **neutralModel**\(package, type, name, signature, provenance) extensible predicate, which can be achieved by adding the following to a data extension file:
251+
We need to add a tuple to the **neutralModel**\(package, type, name, signature, provenance) extensible predicate. To do this, add the following to a data extension file:
260252

261253
.. code-block:: yaml
262254
@@ -267,7 +259,6 @@ This means that we want to add a tuple to the **neutralModel**\(package, type, n
267259
data:
268260
- ["java.time", "Instant", "now", "()", "manual"]
269261
270-
Reasoning:
271262
272263
Since we are adding a neutral model, we need to add tuples to the **neutralModel** extensible predicate.
273264
The first four values identify the callable (in this case a method) to be modeled as a neutral and the fifth value is the provenance (origin) of the neutral.
@@ -290,7 +281,7 @@ Extensible predicates
290281
---------------------
291282

292283
Below is a description of the columns for each extensible predicate.
293-
Sources, Sinks, Summaries and Neutrals are commonly known as Models.
284+
Sources, sinks, summaries and neutrals are commonly known as models.
294285
The semantics of many of the columns of the extensible predicates are shared.
295286

296287
The shared columns are:
