
Commit 396e24c

committed
Java: Add documentation for access paths and provenance.
1 parent c624536 commit 396e24c

File tree

1 file changed

+50
-21
lines changed

docs/codeql/codeql-language-guides/customizing-library-models-for-java.rst

Lines changed: 50 additions & 21 deletions
@@ -271,7 +271,23 @@ This includes descriptions of each of the arguments (eg. access paths, types, an
271271
Extension points
272272
----------------
273273

274-
Below is a description of the tuple values for each extension point.
274+
Below is a description of the columns for each extension point.
275+
Sources, Sinks, Summaries and Neutrals are commonly known as Models.
276+
The semantics of many of the columns of the extension points are shared.
277+
278+
279+
The shared columns are:
280+
281+
- **package**: Name of the package.
282+
- **type**: Name of the type.
283+
- **subtypes**: A flag indicating whether the model should also apply to all overrides of the selected method(s).
284+
- **name**: Name of the method (optional). If left blank, the model applies to all methods matching the previous selection criteria.
285+
- **signature**: Type signature of the method (optional). If left blank, the model applies to all methods matching the previous selection criteria.
286+
- **ext**: Specifies additional API-graph-like edges (mostly empty).
287+
- **provenance**: Provenance (origin) of the model definition.
288+
289+
The columns **package**, **type**, **subtypes**, **name**, and **signature** are used to select the method(s) that the model applies to.
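The selection rule above can be sketched in Python. This is only an illustration of how blank **name** and **signature** values widen the selection, not how CodeQL evaluates data extensions. The ``Method`` class, the ``selects`` function and the example rows are hypothetical, and the **subtypes** flag is deliberately ignored because resolving overrides requires a type hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """A hypothetical, simplified description of a Java method."""
    package: str
    type: str
    name: str
    signature: str


def selects(row, method):
    """Return True if the selection columns of `row` pick out `method`.

    `row` is (package, type, subtypes, name, signature). The `subtypes`
    flag is ignored here because this sketch has no type hierarchy.
    """
    package, type_, _subtypes, name, signature = row
    if (package, type_) != (method.package, method.type):
        return False
    # A blank name or signature places no further restriction.
    if name and name != method.name:
        return False
    if signature and signature != method.signature:
        return False
    return True


# Illustrative methods and rows (assumed values, not taken from a model pack).
get_input = Method("java.net", "Socket", "getInputStream", "()")
close = Method("java.net", "Socket", "close", "()")

exact_row = ("java.net", "Socket", False, "getInputStream", "()")
any_method_row = ("java.net", "Socket", False, "", "")
```

With these values, ``exact_row`` selects only ``getInputStream``, while ``any_method_row`` selects every method of the type.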
290+
275291
The Access paths section describes in more detail how access paths are composed.
276292
This is the most complicated part of the extension points, and the **mini DSL** for access paths is shared across the extension points.
277293

@@ -280,12 +296,6 @@ sourceModel(package, type, subtypes, name, signature, ext, output, kind, provena
280296

281297
Taint source. Most taint tracking queries will use the sources added to this extension point.
282298

283-
- **package**: Name of the package where the source resides.
284-
- **type**: Name of the type where the source resides.
285-
- **subtypes**: Whether the source should also apply to all overrides of the method.
286-
- **name**: Name of the method where the source resides.
287-
- **signature**: Type signature of the method where the source resides.
288-
- **ext**: Specifies additional API-graph-like edges (mostly empty).
289299
- **output**: Access path to the source, where the possibly tainted data flows from.
290300
- **kind**: Kind of the source.
291301
- **provenance**: Provenance (origin) of the source definition.
@@ -303,15 +313,8 @@ sinkModel(package, type, subtypes, name, signature, ext, input, kind, provenance
303313

304314
Taint sink. As opposed to source kinds, there are many different kinds of sinks as these tend to be more query specific.
305315

306-
- **package**: Name of the package where the sink resides.
307-
- **type**: Name of the type where the sink resides.
308-
- **subtypes**: Whether the sink should also apply to all overrides of the method.
309-
- **name**: Name of the method where the sink resides.
310-
- **signature**: Type signature of the method where the sink resides.
311-
- **ext**: Specifies additional API-graph-like edges (mostly empty).
312316
- **input**: Access path to the sink, where we want to check if possibly tainted data flows to.
313317
- **kind**: Kind of the sink.
314-
- **provenance**: Provenance (origin) of the sink definition.
315318

316319
The following sink kinds are supported:
317320

@@ -347,12 +350,6 @@ summaryModel(package, type, subtypes, name, signature, ext, input, output, kind,
347350

348351
Flow through. This extension point is used to model flow through methods.
349352

350-
- **package**: Name of the package where the method resides.
351-
- **type**: Name of the type where the method resides.
352-
- **subtypes**: Whether the method should also apply to all overrides of the method.
353-
- **name**: Name of the method where we are defining flow through.
354-
- **signature**: Type signature of the method where we are defining flow through.
355-
- **ext**: Specifies additional API-graph-like edges (mostly empty).
356353
- **input**: Access path to the input of the method where data will flow to the output.
357354
- **output**: Access path to the output of the method where data will flow from the input.
358355
- **kind**: Kind of the flow through.
@@ -368,6 +365,38 @@ neutralModel(package, type, name, signature, provenance)
368365

369366
Access paths
370367
------------
368+
The **input** and **output** columns consist of a **.**-separated list of components, which is evaluated from left to right, with each step selecting a new set of values derived from the previous set of values.
369+
370+
The following components are supported:
371+
372+
- **Argument[**\ `n`\ **]** selects the argument at index `n` (zero-indexed).
373+
- **Argument[**\ `-1`\ **]** selects the qualifier of the call.
374+
- **Argument[**\ `n1..n2`\ **]** selects the arguments in the given range (both ends included).
375+
- **Parameter[**\ `n`\ **]** selects the parameter at index `n` (zero-indexed).
376+
- **Parameter[**\ `n1..n2`\ **]** selects the parameters in the given range (both ends included).
377+
- **ReturnValue** selects the return value.
378+
- **Field[**\ `name`\ **]** selects the field with the fully qualified name `name`.
379+
- **SyntheticField[**\ `name`\ **]** selects the synthetic field with name `name`.
380+
- **ArrayElement** selects the elements of an array.
381+
- **Element** selects the elements of a collection-like container.
382+
- **MapKey** selects the element keys of a map.
383+
- **MapValue** selects the element values of a map.
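To make the left-to-right structure concrete, the following Python sketch splits an access path into its components. It is a minimal illustration, not the parser CodeQL uses. The function names are hypothetical, and the sketch assumes that dots inside square brackets (as in a range ``n1..n2`` or a fully qualified field name) belong to the component rather than separating components.

```python
def split_access_path(path):
    """Split an access path into components on dots outside brackets."""
    components = []
    current = []
    depth = 0
    for ch in path:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "." and depth == 0:
            # A top-level dot ends the current component.
            components.append("".join(current))
            current = []
        else:
            current.append(ch)
    components.append("".join(current))
    return components


def parse_component(component):
    """Return (name, argument); argument is None if there are no brackets."""
    if component.endswith("]") and "[" in component:
        name, _, argument = component[:-1].partition("[")
        return name, argument
    return component, None


# Example: the elements of the first argument.
parsed = [parse_component(c) for c in split_access_path("Argument[0].Element")]
```

Here ``parsed`` holds one ``(name, argument)`` pair per component, in evaluation order.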
371384

372385
Provenance
373-
----------
386+
----------
387+
388+
The **provenance** column is used to specify the provenance (origin) of the model definition.
389+
390+
The following values are supported:
391+
392+
- **manual**: The model was manually created (or verified by a human) and added to the extension point.
393+
- **generated**: The model was generated by the model generator and added to the extension point.
394+
- **ai-generated**: The model was generated by AI and added to the extension point.
395+
396+
The provenance is used to distinguish between models that are manually added to the extension point and models that are automatically generated.
397+
Furthermore, it impacts the dataflow analysis in the following way:
398+
399+
- A **manual** model takes precedence over **generated** models. If a **manual** model exists for a method, then all generated models are ignored.
400+
- A **generated** or **ai-generated** model is ignored during analysis if the source code of the method it is modelling is available.
401+
402+
That is, generated models are less trusted than manual models.
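The two rules can be sketched as a small Python filter. This is an illustration under stated assumptions, not the logic of the CodeQL libraries: the function name and the shape of ``models`` are hypothetical, and the sketch assumes that **ai-generated** models are treated like **generated** models when a **manual** model exists, which the text above does not state explicitly.

```python
GENERATED = {"generated", "ai-generated"}


def applicable_models(models, source_available):
    """Filter the models of a single method according to provenance.

    `models` is a list of (description, provenance) pairs for one method.
    `source_available` says whether the method's source code is analysed.
    """
    has_manual = any(prov == "manual" for _, prov in models)
    result = []
    for description, prov in models:
        # Generated models are dropped if a manual model exists
        # or if the source code of the method is available.
        if prov in GENERATED and (has_manual or source_available):
            continue
        result.append((description, prov))
    return result
```

For example, a method with one manual and one generated model keeps only the manual model, and a method with only generated models keeps them unless its source code is available.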
