docs/codeql/codeql-language-guides/codeql-for-java.rst
Lines changed: 0 additions & 2 deletions
@@ -45,5 +45,3 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat
45
45
- :doc:`Working with source locations <working-with-source-locations>`: You can use the location of entities within Java code to look for potential errors. Locations allow you to deduce the presence, or absence, of white space which, in some cases, may indicate a problem.
46
46
47
47
- :doc:`Abstract syntax tree classes for working with Java programs <abstract-syntax-tree-classes-for-working-with-java-programs>`: CodeQL has a large selection of classes for representing the abstract syntax tree of Java programs.
48
-
49
-
- :doc:`Customizing library models for Java <customizing-library-models-for-java>`: You can customize the CodeQL library for Java to model the behavior of your own Java libraries using data extensions.
docs/codeql/codeql-language-guides/customizing-library-models-for-java.rst
Lines changed: 20 additions & 20 deletions
@@ -10,14 +10,14 @@ Customizing Library Models for Java
10
10
11
11
The Java analysis can be customized by adding library models (summaries, sinks and sources) in data extension files.
12
12
A model is a definition of a behavior of a library element, such as a method, that is used to improve the data flow analysis precision by identifying more results.
13
-
Most of the security related queries are *taint tracking* queries that tries to find paths from a *source* of untrusted input to a *sink* that represents a vulnerability.
13
+
Most of the security related queries are *taint tracking* queries that try to find paths from a *source* of untrusted input to a *sink* that represents a vulnerability.
14
14
Furthermore, the taint tracking queries also need to know how data can flow through elements that are not included in the source code - these are named *summaries*.
15
15
16
-
That is
16
+
That is:
17
17
18
18
- **sources** are the starting points of a taint tracking data flow analysis.
19
19
- **sinks** are the end points of a taint tracking data flow analysis.
20
-
- **summaries** are models of elements that allows us to synthesize the elements flow behavior without having them in the source code. This is especially helpful when using a third party (or the standard) library.
20
+
- **summaries** are models of elements that allow us to synthesize the elements flow behavior without having them in the source code. This is especially helpful when using a third party (or the standard) library.
21
21
22
22
The models are defined using data extensions where each tuple constitutes a model.
23
23
A data extension file for Java is a YAML file in the form:
@@ -35,7 +35,7 @@ A data extension file for Java is a YAML file in the form:
35
35
36
36
Data extensions contribute to the extensible predicates defined in the CodeQL library. For more information on how to define data extensions and extensible predicates as well as how to wire them up, see the :ref:`data-extensions` documentation.
37
37
38
-
The CodeQL library for Java expose the following extensible predicates:
38
+
The CodeQL library for Java exposes the following extensible predicates:
39
39
40
40
- **sourceModel**\(package, type, subtypes, name, signature, ext, output, kind, provenance). This is used for **source** models.
41
41
- **sinkModel**\(package, type, subtypes, name, signature, ext, input, kind, provenance). This is used for **sink** models.
@@ -44,9 +44,9 @@ The CodeQL library for Java expose the following extensible predicates:
44
44
45
45
The extensible predicates are populated using data extensions specified in YAML files.
46
46
47
-
In the sections below, we will show by example how to add tuples to the different extensible predicates.
47
+
In the sections below, we will provide examples of how to add tuples to the different extensible predicates.
48
48
The extensible predicates are used to customize and improve the existing data flow queries, by providing sources, sinks, and flow through (summaries) for library elements.
49
-
The :ref:`reference-material` section will provide details on the *mini DSLs* that define models for each extensible predicate.
49
+
The :ref:`reference-material` section will provide details on the *mini DSLs* that defines models for each extensible predicate.
50
50
51
51
Example: Taint sink in the **java.sql** package.
52
52
------------------------------------------------
@@ -84,11 +84,11 @@ The first five values identify the callable (in this case a method) to be modele
84
84
- The fourth value **execute** is the method name.
85
85
- The fifth value **(String)** is the method input type signature.
86
86
87
-
For most practical purposes the sixth value is not relevant.
87
+
The sixth value is only relevant internally and can be omitted in most use cases.
88
88
The remaining values are used to define the **access path**, the **kind**, and the **provenance** (origin) of the sink.
89
89
90
90
- The seventh value **Argument[0]** is the **access path** to the first argument passed to the method, which means that this is the location of the sink.
91
-
- The eighth value **sql** is the kind of the sink. The sink kind is used to define the queries where the sink is in scope. In this case - the SQL injection queries.
91
+
- The eighth value **sql** is the kind of the sink. The sink kind is used to define the queries where the sink is in scope. In this case - the SQL injection queries.
92
92
- The ninth value **manual** is the provenance of the sink, which is used to identify the origin of the sink.
93
93
94
94
Example: Taint source from the **java.net** package.
@@ -122,7 +122,7 @@ The first five values identify the callable (in this case a method) to be modele
122
122
123
123
- The first value **java.net** is the package name.
124
124
- The second value **Socket** is the name of the class (type) that contains the source.
125
-
- The third value **False** flag indicates whether or not the source also applies to all overrides of the method.
125
+
- The third value **False** is a flag that indicates whether or not the source also applies to all overrides of the method.
126
126
- The fourth value **getInputStream** is the method name.
127
127
- The fifth value **()** is the method input type signature.
128
128
@@ -164,12 +164,12 @@ Since we are adding flow through a method, we need to add tuples to the **summar
164
164
Each tuple defines flow from one argument to the return value.
165
165
The first row defines flow from the qualifier (**s1** in the example) to the return value (**t** in the example) and the second row defines flow from the first argument (**s2** in the example) to the return value (**t** in the example).
166
166
167
-
The first five values are used to identify the callable (in this case a method) which we are defining a summary for.
167
+
The first five values identify the callable (in this case a method) to be modeled as a summary.
168
168
These are the same for both of the rows above as we are adding two summaries for the same method.
169
169
170
170
- The first value **java.lang** is the package name.
171
171
- The second value **String** is the class (type) name.
172
-
- The third value **False** is a flag indicating whether the summary also applies to all overrides of the method.
172
+
- The third value **False** is a flag that indicates whether or not the summary also applies to all overrides of the method.
173
173
- The fourth value **concat** is the method name.
174
174
- The fifth value **(String)** is the method input type signature.
175
175
@@ -183,7 +183,7 @@ The remaining values are used to define the **access path**, the **kind**, and t
183
183
184
184
Example: Add flow through the **map** method.
185
185
---------------------------------------------
186
-
In this example, we will see a more complex example of modelling flow through a method.
186
+
In this example, we will see a more complex example of modeling flow through a method.
187
187
This pattern shows how to model flow through higher order methods and collection types.
188
188
Please note that the flow through the **map** method is already added to the CodeQL Java analysis.
189
189
@@ -194,7 +194,7 @@ Please note that the flow through the **map** method is already added to the Cod
194
194
...
195
195
}
196
196
197
-
This can be achieved by adding the following data extension.
197
+
This can be achieved by adding the following to a data extension file:
198
198
199
199
.. code-block:: yaml
200
200
@@ -215,7 +215,7 @@ These are the same for both of the rows above as we are adding two summaries for
215
215
216
216
- The first value **java.util.stream** is the package name.
217
217
- The second value **Stream** is the class (type) name.
218
-
- The third value **True** is a flag indicating whether the summary also applies to all overrides of the method.
218
+
- The third value **True** is a flag that indicates whether or not the summary also applies to all overrides of the method.
219
219
- The fourth value **map** is the method name.
220
220
- The fifth value **Function** is the method input type signature.
221
221
@@ -245,7 +245,7 @@ That is, the first row models that there is value flow from the elements of the
245
245
Example: Add a **neutral** method.
246
246
----------------------------------
247
247
In this example we will show how to model the **now** method as being neutral.
248
-
This is purely for consistency and has no impact on the analysis.
248
+
This is purely for completeness and has no impact on the analysis.
249
249
A neutral model is used to define that there is no flow through a method.
250
250
Please note that the neutral model for the **now** method is already added.
251
251
@@ -284,7 +284,7 @@ Reference material
284
284
------------------
285
285
286
286
The following sections provide reference material for extensible predicates.
287
-
This includes descriptions of each of the arguments (eg. access paths, kinds and provenance).
287
+
This includes descriptions of each of the arguments (e.g. access paths, kinds and provenance).
288
288
289
289
Extensible predicates
290
290
---------------------
@@ -299,7 +299,7 @@ The shared columns are:
299
299
- **type**: Name of the type containing the element(s) to be modeled.
300
300
- **subtypes**: A boolean flag indicating whether the model should also apply to all overrides of the selected element(s).
301
301
- **name**: Name of the element (optional). If this is left blank, it means all elements matching the previous selection criteria.
302
-
- **signature**: Type signature of the selected element (optional). If this is left blank it means all elements matching the previous selection criteria.
302
+
- **signature**: Type signature of the selected element (optional). If this is left blank, it means all elements matching the previous selection criteria.
303
303
- **ext**: Specifies additional API-graph-like edges (mostly empty) and out of scope for this document.
304
304
- **provenance**: Provenance (origin) of the model definition.
305
305
@@ -417,12 +417,12 @@ The following values are supported:
417
417
- **ai-generated**: The model was generated by AI and added to the extensible predicate.
418
418
419
419
The provenance is used to distinguish between models that are manually added to the extensible predicate and models that are automatically generated.
420
-
Furthermore, it impacts the data flow analysis in the following way
420
+
Furthermore, it impacts the data flow analysis in the following way:
421
421
422
422
- A **manual** model takes precedence over **generated** models. If a **manual** model exists for an element then all generated models are ignored.
423
-
- A **generated** or **ai-generated** model is ignored during analysis, if the source code of the element it is modelling is available.
423
+
- A **generated** or **ai-generated** model is ignored during analysis, if the source code of the element it is modeling is available.
424
424
425
-
That is, generated models are less trusted than manual models and only used if neither source code or a manual model is available.
425
+
That is, generated models are less trusted than manual models and only used if neither source code nor a manual model is available.