File: `java/ql/automodel/src/README.md`
Lines changed: 109 additions & 1 deletion
@@ -57,8 +57,116 @@ The +/- and candidate extraction queries largely<sup>[1](#largely-use-characteri

#### :warning: Warning

Do not try to "fix" shortcomings that could be fixed by a better prompt or better example selection by adding language- or mode-specific characteristics. Those "fixes" tend to be confusing downstream, when questions like "why wasn't this location selected as a candidate?" become progressively harder to answer. It's best to rely on characteristics in the code that is shared across all languages and modes (see [Shared Code](#shared-code)).

## Shared Code

A significant part of the behavior of extraction queries is implemented in shared modules. When we add support for new languages, we expect to move the shared code to a separate QL pack. In the meantime, shared code modules must not import any Java libraries.

## Candidate Examples

This section contains a few examples of the kinds of candidates that our queries might select, and why.

:warning: For clarity, this section presents "candidates" that are **actual** sinks. Therefore, the candidates presented here would actually be selected as positive examples in practice - rather than as candidates.

### Framework Mode Candidates

Framework mode is special because we extract candidates (as well as examples) from the implementation of a framework or library, while the resulting models are applied in code bases that are _using_ the framework or library.

In framework mode, endpoints can currently have a number of shapes (see `newtype TFrameworkModeEndpoint` in [AutomodelFrameworkModeCharacteristics.qll](https://github.com/github/codeql/blob/main/java/ql/automodel/src/AutomodelFrameworkModeCharacteristics.qll)). Depending on the kind of endpoint, the candidate is a candidate for one or several extensible types (e.g., `sinkModel`, `sourceModel`).

#### Framework Mode Sink Candidates

Sink candidates in framework mode are inputs to calls. Because framework mode works on the implementation of a callable, these inputs are represented by a method's parameter definitions.

For example, customer code could call the `Files.copy` method:

```java
// customer code using a library
...
Files.copy(userInputPath, outStream);
...
```

In order for `userInputPath` to be modeled as a sink, the corresponding parameter must be selected as a candidate. In the following example, assuming they're not modeled yet, the parameters `source` and `out` would be candidates:
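
A minimal sketch of what such library code might look like. The class name `LibraryFiles` is hypothetical and stands in for the library implementation; the method signature mirrors `java.nio.file.Files.copy(Path, OutputStream)`, to which the sketch simply delegates.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Path;

// Hypothetical stand-in for library code that is analyzed in framework mode.
class LibraryFiles {
    // In framework mode, the parameter definitions `source` and `out`
    // are the sink candidates (assuming they are not modeled yet).
    public static long copy(Path source, OutputStream out) throws IOException {
        // Delegates to the real JDK method; returns the number of bytes copied.
        return java.nio.file.Files.copy(source, out);
    }
}
```
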
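
Parameters can also be source candidates, for example when customer code implements a handler such as `WebSocket.Listener`. In this case, data passed to the program via a web socket connection is a source of remote data. Therefore, when we look at the implementation of `WebSocket.Listener` in framework mode, we need to produce a candidate for each parameter: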
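
A minimal sketch of the library side, under stated assumptions: `SketchListener` is a hypothetical, simplified stand-in for the real `java.net.http.WebSocket.Listener` interface, reduced to the `onText` method with the same parameter list.

```java
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;

// Hypothetical, simplified stand-in for java.net.http.WebSocket.Listener,
// as it would be analyzed in framework mode.
interface SketchListener {
    // Each parameter definition (webSocket, data, last) is a source candidate.
    default CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
        return null; // the sketch does nothing with the message
    }
}
```
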

For framework mode, all parameters of the `onText` method should be candidates. If the candidates result in a model, the corresponding parameters in classes implementing this interface will be recognized as sources of remote data.

:warning: A consequence of this is that we can have endpoints in framework mode that are both sink candidates and source candidates.

##### Return Values as Source Candidates

The other kind of source candidate we model is the return value of a method. For example:
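
A minimal sketch of such a method, using a hypothetical `SketchSocket` class that stands in for library code like `java.net.Socket`. The in-memory buffer is an assumption that replaces a real network connection so the sketch stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;

// Hypothetical stand-in for a library class such as java.net.Socket.
class SketchSocket {
    // Assumption: bytes "received" from the remote peer are kept in memory.
    private final byte[] received;

    SketchSocket(byte[] received) {
        this.received = received;
    }

    // The method definition is the endpoint; its return value carries remote data.
    public InputStream getInputStream() {
        return new ByteArrayInputStream(received);
    }
}
```
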

This method returns remote data, so its return value should be modeled as a source. We therefore want to select the _method_ as a candidate.

### Application Mode Candidates

In application mode, we extract candidates from an application that is using various libraries.

#### Application Mode Source Candidates

##### Overridden Parameters as Source Candidates

In application mode, a parameter of a method that overrides another method is taken as a source candidate. This accounts for cases like the `WebSocket.Listener` example above, where an application implements a "handler" that receives remote data.
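
To illustrate, here is a minimal sketch of application code implementing such a handler. The class name `MyListener` and the `lastMessage` field are hypothetical; `onText` is the method declared by `java.net.http.WebSocket.Listener`.

```java
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;

// Hypothetical application code, as analyzed in application mode.
class MyListener implements WebSocket.Listener {
    // Hypothetical field that keeps the most recent message for later processing.
    String lastMessage = "";

    // onText overrides WebSocket.Listener.onText, so its parameters
    // (webSocket, data, last) are source candidates in application mode.
    @Override
    public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
        lastMessage = data.toString(); // `data` carries remote input
        // A real handler would typically also call webSocket.request(1)
        // to receive further messages; that is omitted in this sketch.
        return null; // null: the data may be reclaimed immediately
    }
}
```
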

##### Return Values as Source Candidates

Just like in framework mode, application mode also has to consider the return value of a call as a source candidate. The difference is that in application mode, we extract from the application sources, not the library sources. Therefore, we use the invocation expression as a candidate (unlike in framework mode, where we use the method definition).
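
As a sketch of the difference, consider the following application code. The class `ConnectionHandler` and the method `readFirstByte` are hypothetical names; `Socket.getInputStream()` is the real JDK method.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.Socket;

// Hypothetical application code, as analyzed in application mode.
class ConnectionHandler {
    static int readFirstByte(Socket socket) throws IOException {
        // The invocation expression `socket.getInputStream()` is the source
        // candidate here, not the definition of getInputStream in the library.
        InputStream in = socket.getInputStream();
        return in.read();
    }
}
```
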

#### Application Mode Sink Candidates

In application mode, arguments to calls are sink candidates.