Commit aba7d84

michaelnebel, aeisenberg, and jcogs33 committed
Apply suggestions from code review
Co-authored-by: Andrew Eisenberg <[email protected]>
Co-authored-by: Jami <[email protected]>
1 parent 5659b58 commit aba7d84

2 files changed: +20 -22 lines changed

docs/codeql/codeql-language-guides/codeql-for-java.rst

Lines changed: 0 additions & 2 deletions
@@ -45,5 +45,3 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat
 - :doc:`Working with source locations <working-with-source-locations>`: You can use the location of entities within Java code to look for potential errors. Locations allow you to deduce the presence, or absence, of white space which, in some cases, may indicate a problem.

 - :doc:`Abstract syntax tree classes for working with Java programs <abstract-syntax-tree-classes-for-working-with-java-programs>`: CodeQL has a large selection of classes for representing the abstract syntax tree of Java programs.
-
-- :doc:`Customizing library models for Java <customizing-library-models-for-java>`: You can customize the CodeQL library for Java to model the behavior of your own Java libraries using data extensions.

docs/codeql/codeql-language-guides/customizing-library-models-for-java.rst

Lines changed: 20 additions & 20 deletions
@@ -10,14 +10,14 @@ Customizing Library Models for Java

 The Java analysis can be customized by adding library models (summaries, sinks and sources) in data extension files.
 A model is a definition of a behavior of a library element, such as a method, that is used to improve the data flow analysis precision by identifying more results.
-Most of the security related queries are *taint tracking* queries that tries to find paths from a *source* of untrusted input to a *sink* that represents a vulnerability.
+Most of the security related queries are *taint tracking* queries that try to find paths from a *source* of untrusted input to a *sink* that represents a vulnerability.
 Furthermore, the taint tracking queries also need to know how data can flow through elements that are not included in the source code - these are named *summaries*.

-That is
+That is:

 - **sources** are the starting points of a taint tracking data flow analysis.
 - **sinks** are the end points of a taint tracking data flow analysis.
-- **summaries** are models of elements that allows us to synthesize the elements flow behavior without having them in the source code. This is especially helpful when using a third party (or the standard) library.
+- **summaries** are models of elements that allow us to synthesize the elements flow behavior without having them in the source code. This is especially helpful when using a third party (or the standard) library.

 The models are defined using data extensions where each tuple constitutes a model.
 A data extension file for Java is a YAML file in the form:
@@ -35,7 +35,7 @@ A data extension file for Java is a YAML file in the form:

 Data extensions contribute to the extensible predicates defined in the CodeQL library. For more information on how to define data extensions and extensible predicates as well as how to wire them up, see the :ref:`data-extensions` documentation.

-The CodeQL library for Java expose the following extensible predicates:
+The CodeQL library for Java exposes the following extensible predicates:

 - **sourceModel**\(package, type, subtypes, name, signature, ext, output, kind, provenance). This is used for **source** models.
 - **sinkModel**\(package, type, subtypes, name, signature, ext, input, kind, provenance). This is used for **sink** models.
@@ -44,9 +44,9 @@ The CodeQL library for Java expose the following extensible predicates:

 The extensible predicates are populated using data extensions specified in YAML files.

-In the sections below, we will show by example how to add tuples to the different extensible predicates.
+In the sections below, we will provide examples of how to add tuples to the different extensible predicates.
 The extensible predicates are used to customize and improve the existing data flow queries, by providing sources, sinks, and flow through (summaries) for library elements.
-The :ref:`reference-material` section will provide details on the *mini DSLs* that define models for each extensible predicate.
+The :ref:`reference-material` section will provide details on the *mini DSLs* that defines models for each extensible predicate.

 Example: Taint sink in the **java.sql** package.
 ------------------------------------------------
@@ -84,11 +84,11 @@ The first five values identify the callable (in this case a method) to be modele
 - The fourth value **execute** is the method name.
 - The fifth value **(String)** is the method input type signature.

-For most practical purposes the sixth value is not relevant.
+The sixth value is only relevant internally and can be omitted in most use cases.
 The remaining values are used to define the **access path**, the **kind**, and the **provenance** (origin) of the sink.

 - The seventh value **Argument[0]** is the **access path** to the first argument passed to the method, which means that this is the location of the sink.
-- The eighth value **sql** is the kind of the sink. The sink kind is used to define the queries where the sink is in scope. In this case - the SQL injection queries.
+- The eighth value **sql** is the kind of the sink. The sink kind is used to define the queries where the sink is in scope. In this case - the SQL injection queries.
 - The ninth value **manual** is the provenance of the sink, which is used to identify the origin of the sink.

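For reference, the sink tuple that these nine values describe might look roughly like the sketch below. The **extensions** / **addsTo** wrapper, the pack name **codeql/java-all**, the type **Statement**, the subtypes flag **True**, and the empty sixth value are not shown in this hunk and are assumptions; the other values come from the bullets above.

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/java-all      # assumed pack name
         extensible: sinkModel
       data:
         # package, type, subtypes, name, signature, ext, input, kind, provenance
         # "Statement" and True are assumed; "" leaves the sixth (ext) value empty.
         - ["java.sql", "Statement", True, "execute", "(String)", "", "Argument[0]", "sql", "manual"]

The column order follows the **sinkModel** signature listed earlier in the file, so the access path lands in the **input** column.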
 Example: Taint source from the **java.net** package.
@@ -122,7 +122,7 @@ The first five values identify the callable (in this case a method) to be modele

 - The first value **java.net** is the package name.
 - The second value **Socket** is the name of the class (type) that contains the source.
-- The third value **False** flag indicates whether or not the source also applies to all overrides of the method.
+- The third value **False** is a flag that indicates whether or not the source also applies to all overrides of the method.
 - The fourth value **getInputStream** is the method name.
 - The fifth value **()** is the method input type signature.

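A minimal sketch of the corresponding source tuple follows. The first five values are taken from the bullets above. The wrapper keys, the pack name **codeql/java-all**, the access path **ReturnValue**, the kind **remote**, and the provenance **manual** fall outside this hunk and are assumptions.

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/java-all      # assumed pack name
         extensible: sourceModel
       data:
         # package, type, subtypes, name, signature, ext, output, kind, provenance
         # "ReturnValue", "remote" and "manual" are assumed values.
         - ["java.net", "Socket", False, "getInputStream", "()", "", "ReturnValue", "remote", "manual"]

For a source, the access path sits in the **output** column because the tainted data leaves the method rather than entering it.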
@@ -164,12 +164,12 @@ Since we are adding flow through a method, we need to add tuples to the **summar
 Each tuple defines flow from one argument to the return value.
 The first row defines flow from the qualifier (**s1** in the example) to the return value (**t** in the example) and the second row defines flow from the first argument (**s2** in the example) to the return value (**t** in the example).

-The first five values are used to identify the callable (in this case a method) which we are defining a summary for.
+The first five values identify the callable (in this case a method) to be modeled as a summary.
 These are the same for both of the rows above as we are adding two summaries for the same method.

 - The first value **java.lang** is the package name.
 - The second value **String** is the class (type) name.
-- The third value **False** is a flag indicating whether the summary also applies to all overrides of the method.
+- The third value **False** is a flag that indicates whether or not the summary also applies to all overrides of the method.
 - The fourth value **concat** is the method name.
 - The fifth value **(String)** is the method input type signature.

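The two rows discussed in this hunk could be sketched as follows. The rows themselves are not shown in the diff, so the column order (input before output), the qualifier access path **Argument[this]**, the kind **taint**, and the pack name **codeql/java-all** are assumptions. Older library versions may spell the qualifier as **Argument[-1]**.

.. code-block:: yaml

   extensions:
     - addsTo:
         pack: codeql/java-all      # assumed pack name
         extensible: summaryModel
       data:
         # Assumed column order:
         # package, type, subtypes, name, signature, ext, input, output, kind, provenance
         # Row 1: flow from the qualifier (s1) to the return value (t).
         - ["java.lang", "String", False, "concat", "(String)", "", "Argument[this]", "ReturnValue", "taint", "manual"]
         # Row 2: flow from the first argument (s2) to the return value (t).
         - ["java.lang", "String", False, "concat", "(String)", "", "Argument[0]", "ReturnValue", "taint", "manual"]

Both rows share the first five values because they describe the same method; only the input access path differs.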
@@ -183,7 +183,7 @@ The remaining values are used to define the **access path**, the **kind**, and t

 Example: Add flow through the **map** method.
 ---------------------------------------------
-In this example, we will see a more complex example of modelling flow through a method.
+In this example, we will see a more complex example of modeling flow through a method.
 This pattern shows how to model flow through higher order methods and collection types.
 Please note that the flow through the **map** method is already added to the CodeQL Java analysis.

@@ -194,7 +194,7 @@ Please note that the flow through the **map** method is already added to the Cod
 ...
 }

-This can be achieved by adding the following data extension.
+This can be achieved by adding the following to a data extension file:

 .. code-block:: yaml

@@ -215,7 +215,7 @@ These are the same for both of the rows above as we are adding two summaries for

 - The first value **java.util.stream** is the package name.
 - The second value **Stream** is the class (type) name.
-- The third value **True** is a flag indicating whether the summary also applies to all overrides of the method.
+- The third value **True** is a flag that indicates whether or not the summary also applies to all overrides of the method.
 - The fourth value **map** is the method name.
 - The fifth value **Function** is the method input type signature.

@@ -245,7 +245,7 @@ That is, the first row models that there is value flow from the elements of the
 Example: Add a **neutral** method.
 ----------------------------------
 In this example we will show how to model the **now** method as being neutral.
-This is purely for consistency and has no impact on the analysis.
+This is purely for completeness and has no impact on the analysis.
 A neutral model is used to define that there is no flow through a method.
 Please note that the neutral model for the **now** method is already added.

@@ -284,7 +284,7 @@ Reference material
 ------------------

 The following sections provide reference material for extensible predicates.
-This includes descriptions of each of the arguments (eg. access paths, kinds and provenance).
+This includes descriptions of each of the arguments (e.g. access paths, kinds and provenance).

 Extensible predicates
 ---------------------
@@ -299,7 +299,7 @@ The shared columns are:
 - **type**: Name of the type containing the element(s) to be modeled.
 - **subtypes**: A boolean flag indicating whether the model should also apply to all overrides of the selected element(s).
 - **name**: Name of the element (optional). If this is left blank, it means all elements matching the previous selection criteria.
-- **signature**: Type signature of the selected element (optional). If this is left blank it means all elements matching the previous selection criteria.
+- **signature**: Type signature of the selected element (optional). If this is left blank, it means all elements matching the previous selection criteria.
 - **ext**: Specifies additional API-graph-like edges (mostly empty) and out of scope for this document.
 - **provenance**: Provenance (origin) of the model definition.

@@ -417,12 +417,12 @@ The following values are supported:
 - **ai-generated**: The model was generated by AI and added to the extensible predicate.

 The provenance is used to distinguish between models that are manually added to the extensible predicate and models that are automatically generated.
-Furthermore, it impacts the data flow analysis in the following way
+Furthermore, it impacts the data flow analysis in the following way:

 - A **manual** model takes precedence over **generated** models. If a **manual** model exists for an element then all generated models are ignored.
-- A **generated** or **ai-generated** model is ignored during analysis, if the source code of the element it is modelling is available.
+- A **generated** or **ai-generated** model is ignored during analysis, if the source code of the element it is modeling is available.

-That is, generated models are less trusted than manual models and only used if neither source code or a manual model is available.
+That is, generated models are less trusted than manual models and only used if neither source code nor a manual model is available.


 .. include:: ../reusables/data-extensions.rst
