Commit eb97ce3

Java: automodel extraction query docs, candidate examples

Author: Stephan Brandauer
1 parent bcde466

1 file changed: +109 -1 lines changed

java/ql/automodel/src/README.md

Lines changed: 109 additions & 1 deletion
@@ -57,8 +57,116 @@ The +/- and candidate extraction queries largely<sup>[1](#largely-use-characteri
5757

5858
#### :warning: Warning
5959

60-
Do not to "fix" shortcomings that could be fixed by a better prompt or better example selection by adding language- or mode-specific characteristics . Those "fixes" tend to be confusing downstream when questions like "why wasn't this location selected as a candidate?" is harder and harder to answer. It's best to rely on characteristics in the code that is shared across all languages and modes (see [Shared Code](#shared-code)).
60+
Do not "fix" shortcomings that could be fixed by a better prompt or better example selection by adding language- or mode-specific characteristics. Those "fixes" tend to be confusing downstream, when questions like "why wasn't this location selected as a candidate?" become progressively harder to answer. It's best to rely on characteristics in the code that is shared across all languages and modes (see [Shared Code](#shared-code)).
6161

6262
## Shared Code
6363

6464
A significant part of the behavior of extraction queries is implemented in shared modules. When we add support for new languages, we expect to move the shared code to a separate QL pack. In the meantime, shared code modules must not import any Java libraries.
65+
66+
## Candidate Examples
67+
68+
This section contains a few examples of the kinds of candidates that our queries might select, and why.
69+
70+
:warning: For clarity, this section presents "candidates" that are **actual** sinks. Therefore, the candidates presented here would actually be selected as positive examples in practice - rather than as candidates.
71+
72+
### Framework Mode Candidates
73+
74+
Framework mode is special because we extract candidates (as well as examples) from the implementation of a framework or library, while the resulting models are applied in code bases that _use_ that framework or library.
75+
76+
In framework mode, endpoints can currently have a number of shapes (see `newtype TFrameworkModeEndpoint` in [AutomodelFrameworkModeCharacteristics.qll](https://github.com/github/codeql/blob/main/java/ql/automodel/src/AutomodelFrameworkModeCharacteristics.qll)). Depending on the kind of endpoint, the candidate is a candidate for one or several extensible types (e.g., `sinkModel`, `sourceModel`).
77+
78+
#### Framework Mode Sink Candidates
79+
80+
Sink candidates in framework mode are inputs to calls. Because framework mode works on the implementation of a callable, these inputs are represented by the method's parameter definitions.
81+
82+
For example, customer code could call the `Files.copy` method:
83+
84+
```java
// customer code using a library
// ...
Files.copy(userInputPath, outStream);
// ...
```
90+
91+
In order for `userInputPath` to be modeled as a sink, the corresponding parameter must be selected as a candidate. In the following example, assuming they're not modeled yet, the parameters `source` and `out` would be candidates:
92+
93+
```java
// Files.java
// library code that's analyzed in framework mode
public class Files {
    public static long copy(Path source, OutputStream out) throws IOException {
        // ...
    }
}
```
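
As a rough illustration of what "one candidate per parameter" means, the sketch below uses plain Java reflection to list the parameters of the real `java.nio.file.Files.copy(Path, OutputStream)` method. This is not how the extraction queries work (they are written in QL). The class `ParameterCandidates` and its helper methods are hypothetical names chosen for this example, and the `Argument[i]` labels only mimic the notation used in CodeQL models.

```java
import java.io.OutputStream;
import java.lang.reflect.Method;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: lists one "candidate" label per parameter.
public class ParameterCandidates {

    // Builds one label per parameter position of the given method.
    static List<String> candidatesFor(Method method) {
        List<String> result = new ArrayList<>();
        Class<?>[] types = method.getParameterTypes();
        for (int i = 0; i < types.length; i++) {
            result.add(method.getDeclaringClass().getName() + "." + method.getName()
                    + " Argument[" + i + "] : " + types[i].getSimpleName());
        }
        return result;
    }

    // Convenience wrapper for the Files.copy(Path, OutputStream) overload.
    static List<String> filesCopyCandidates() {
        try {
            Method copy = Files.class.getMethod("copy", Path.class, OutputStream.class);
            return candidatesFor(copy);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        filesCopyCandidates().forEach(System.out::println);
    }
}
```

Both parameters show up, which matches the statement above that `source` and `out` would both be candidates as long as neither is modeled yet.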
102+
103+
#### Framework Mode Source Candidates
104+
105+
Source candidates are a bit more varied than sink candidates:
106+
107+
##### Parameters as Source Candidates
108+
109+
A parameter could be a source, e.g., when a framework passes user-controlled data to a handler defined in customer code.
110+
```java
// customer code using a library:
import java.net.http.WebSocket;
import java.util.concurrent.CompletionStage;

final class MyListener implements WebSocket.Listener {
    @Override
    public CompletionStage<?> onText(WebSocket ws, CharSequence cs, boolean last) {
        // ... process data that was received from the web socket
        return null; // null signals that the message may be reclaimed immediately
    }
}
```
121+
122+
In this case, data passed to the program via a web socket connection is a source of remote data. Therefore, when we look at the implementation of `WebSocket.Listener` in framework mode, we need to produce a candidate for each parameter:
123+
124+
```java
// WebSocket.java
// library code that's analyzed in framework mode
interface Listener {
    // ...
    default CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
        // <omitting default implementation>
    }
    // ...
}
```
135+
136+
For framework mode, all parameters of the `onText` method should be candidates. If the candidates result in a model, the corresponding parameters of `onText` overrides in classes implementing this interface will be recognized as sources of remote data.
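
The flow of data from a library into a customer-defined handler can be sketched without any networking. In the example below, `TextListener`, `deliver`, and `RecordingListener` are hypothetical stand-ins (for `WebSocket.Listener`, the library's internal dispatch, and customer code, respectively); they are not part of any real API. The point is only that the value arriving in the overriding method's parameter originates in the library, which is why the parameter of the interface method is treated as a source candidate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a tiny "library" that calls back into customer code.
public class HandlerSourceSketch {

    // Stand-in for a library interface such as WebSocket.Listener.
    interface TextListener {
        default void onText(CharSequence data, boolean last) {
            // default implementation does nothing
        }
    }

    // Stand-in for library internals: hands remotely received data to the handler.
    static void deliver(TextListener listener, String remoteData) {
        listener.onText(remoteData, true);
    }

    // Customer code: the `data` parameter receives remote (untrusted) input.
    static final class RecordingListener implements TextListener {
        final List<String> received = new ArrayList<>();

        @Override
        public void onText(CharSequence data, boolean last) {
            received.add(data.toString());
        }
    }

    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        deliver(listener, "payload from a remote peer");
        System.out.println(listener.received);
    }
}
```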
137+
138+
:warning: A consequence of this is that an endpoint in framework mode can be both a sink candidate and a source candidate.
139+
140+
##### Return Values as Source Candidates
141+
142+
The other kind of source candidate we model is the return value of a method. For example:
143+
144+
```java
public class Socket {
    // ...
    public InputStream getInputStream() throws IOException {
        // ...
    }
    // ...
}
```
153+
154+
This method returns a source of remote data, so its return value should be modeled as a source. We therefore want to select the _method_ as a candidate.
155+
156+
### Application Mode Candidates
157+
158+
In application mode, we extract candidates from an application that is using various libraries.
159+
160+
#### Application Mode Source Candidates
161+
162+
##### Overridden Parameters as Source Candidates
163+
164+
In application mode, a parameter of a method that overrides another method is taken as a source candidate. This accounts for cases like the `WebSocket.Listener` example above, where an application implements a "handler" that receives remote data.
165+
166+
##### Return Values as Source Candidates
167+
168+
Just like in framework mode, application mode also has to consider the return value of a call as a source candidate. The difference is that in application mode, we extract from the application sources, not the library sources. Therefore, we use the invocation expression as a candidate (unlike in framework mode, where we use the method definition).
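
To make the distinction concrete, here is a minimal sketch of application code. The call expression `socket.getInputStream()` is the kind of endpoint that application mode would consider, whereas framework mode would look at the definition of `getInputStream` inside `Socket`. The class `ReturnValueSourceSketch` and the `fakeSocket` helper are hypothetical and exist only so that the example runs without a network connection.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: application code that consumes a call's return value.
public class ReturnValueSourceSketch {

    // Application code: reads the first line sent by the remote peer.
    static String readFirstLine(Socket socket) {
        try {
            // The invocation below is the endpoint of interest in application mode.
            InputStream in = socket.getInputStream();
            BufferedReader reader =
                    new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Test helper (an assumption for this sketch): an unconnected Socket whose
    // input stream is backed by an in-memory payload instead of the network.
    static Socket fakeSocket(String payload) {
        return new Socket() {
            @Override
            public InputStream getInputStream() {
                return new ByteArrayInputStream(payload.getBytes(StandardCharsets.UTF_8));
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(readFirstLine(fakeSocket("hello\nworld")));
    }
}
```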
169+
170+
#### Application Mode Sink Candidates
171+
172+
In application mode, arguments to calls are sink candidates.
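
For instance, in the hypothetical application code below, the two arguments passed to `Files.copy` (`userInputPath` and `outStream`) are the kind of endpoints that would be considered as sink candidates. The class `ArgumentSinkSketch` and its methods are made up for this example; `roundTrip` only exists so that the sketch can be run against a temporary file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only: application code whose call arguments are endpoints.
public class ArgumentSinkSketch {

    // Application code: copies the file at a (possibly user-controlled) path into memory.
    static String copyToString(Path userInputPath) {
        ByteArrayOutputStream outStream = new ByteArrayOutputStream();
        try {
            // Both arguments of this call are endpoints in application mode.
            Files.copy(userInputPath, outStream);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(outStream.toByteArray(), StandardCharsets.UTF_8);
    }

    // Helper for running the sketch: writes content to a temp file and reads it back.
    static String roundTrip(String content) {
        try {
            Path tmp = Files.createTempFile("automodel-example", ".txt");
            try {
                Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
                return copyToString(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```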
