Commit 1997e0a

Merge pull request github#18427 from asgerf/jss/change-note
JS: Add migration guide and change note
2 parents b6b93dc + 10d5d09 commit 1997e0a

6 files changed, with 328 additions and 8 deletions.

docs/codeql/codeql-language-guides/codeql-for-javascript.rst

Lines changed: 4 additions & 1 deletion
@@ -18,6 +18,7 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat
1818
abstract-syntax-tree-classes-for-working-with-javascript-and-typescript-programs
1919
data-flow-cheat-sheet-for-javascript
2020
customizing-library-models-for-javascript
21+
migrating-javascript-dataflow-queries
2122

2223
- :doc:`Basic query for JavaScript and TypeScript code <basic-query-for-javascript-code>`: Learn to write and run a simple CodeQL query.
2324

@@ -37,4 +38,6 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat
3738

3839
- :doc:`Data flow cheat sheet for JavaScript <data-flow-cheat-sheet-for-javascript>`: This article describes parts of the JavaScript libraries commonly used for variant analysis and in data flow queries.
3940

40-
- :doc:`Customizing library models for JavaScript <customizing-library-models-for-javascript>`: You can model frameworks and libraries that your codebase depends on using data extensions and publish them as CodeQL model packs.
41+
- :doc:`Customizing library models for JavaScript <customizing-library-models-for-javascript>`: You can model frameworks and libraries that your codebase depends on using data extensions and publish them as CodeQL model packs.
42+
43+
- :doc:`Migrating JavaScript dataflow queries <migrating-javascript-dataflow-queries>`: Guide on migrating data flow queries to the new data flow library.
Lines changed: 301 additions & 0 deletions
@@ -0,0 +1,301 @@
1+
.. _migrating-javascript-dataflow-queries:
2+
3+
Migrating JavaScript Dataflow Queries
4+
=====================================
5+
6+
The JavaScript analysis used to have its own data flow library, which differed from the shared data flow
7+
library used by other languages. This library has now been deprecated in favor of the shared library.
8+
9+
This article explains how to migrate JavaScript data flow queries to use the shared data flow library,
10+
and some important differences to be aware of. Note that the article on :ref:`analyzing data flow in JavaScript and TypeScript <analyzing-data-flow-in-javascript-and-typescript>`
11+
provides a general guide to the new data flow library, whereas this article aims to help with migrating existing queries from the old data flow library.
12+
13+
Note that the ``DataFlow::Configuration`` class is still backed by the original data flow library, but has been marked as deprecated.
14+
This means data flow queries using this class will continue to work, albeit with deprecation warnings, until the 1-year deprecation period expires in early 2026.
15+
It is recommended that all custom queries are migrated before this time, to ensure they continue to work in the future.
16+
17+
Data flow queries should be migrated to use ``DataFlow::ConfigSig``-style modules instead of the ``DataFlow::Configuration`` class.
18+
This is identical to the interface found in other languages.
19+
Making this switch causes the query to be backed by the shared data flow library. In other words, a query only uses
20+
the shared data flow library once it has been migrated to ``ConfigSig``-style, as shown in the following table:
21+
22+
.. list-table:: Data flow libraries
23+
:widths: 20 80
24+
:header-rows: 1
25+
26+
* - API
27+
- Implementation
28+
* - ``DataFlow::Configuration``
29+
- Old library (deprecated, to be removed in early 2026)
30+
* - ``DataFlow::ConfigSig``
31+
- Shared library
32+
33+
A straightforward translation to ``DataFlow::ConfigSig``-style is usually possible, although there are some complications
34+
that may cause the query to behave differently.
35+
We'll first cover some straightforward migration examples, and then go over some of the complications that may arise.
36+
37+
Simple migration example
38+
------------------------
39+
40+
A simple example of a query using the old data flow library is shown below:
41+
42+
.. code-block:: ql
43+
44+
/** @kind path-problem */
45+
import javascript
46+
import DataFlow::PathGraph
47+
48+
class MyConfig extends DataFlow::Configuration {
49+
MyConfig() { this = "MyConfig" }
50+
51+
override predicate isSource(DataFlow::Node node) { ... }
52+
53+
override predicate isSink(DataFlow::Node node) { ... }
54+
}
55+
56+
from MyConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink
57+
where cfg.hasFlowPath(source, sink)
58+
select sink, source, sink, "Flow found"
59+
60+
In the new style, the same query looks like this:
61+
62+
.. code-block:: ql
63+
64+
/** @kind path-problem */
65+
import javascript
66+
67+
module MyConfig implements DataFlow::ConfigSig {
68+
predicate isSource(DataFlow::Node node) { ... }
69+
70+
predicate isSink(DataFlow::Node node) { ... }
71+
}
72+
73+
module MyFlow = DataFlow::Global<MyConfig>;
74+
75+
import MyFlow::PathGraph
76+
77+
from MyFlow::PathNode source, MyFlow::PathNode sink
78+
where MyFlow::flowPath(source, sink)
79+
select sink, source, sink, "Flow found"
80+
81+
The changes can be summarized as:
82+
83+
- The ``DataFlow::Configuration`` class was replaced with a module implementing ``DataFlow::ConfigSig``.
84+
- The characteristic predicate was removed (modules have no characteristic predicates).
85+
- Predicates such as ``isSource`` no longer have the ``override`` keyword (as they are defined in a module now).
86+
- The configuration module is being passed to ``DataFlow::Global``, resulting in a new module, called ``MyFlow`` in this example.
87+
- The query imports ``MyFlow::PathGraph`` instead of ``DataFlow::PathGraph``.
88+
- The ``MyConfig cfg`` variable was removed from the ``from`` clause.
89+
- The ``hasFlowPath`` call was replaced with ``MyFlow::flowPath``.
90+
- The type ``DataFlow::PathNode`` was replaced with ``MyFlow::PathNode``.
91+
92+
With these changes, we have produced an equivalent query that is backed by the new data flow library.
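
Queries that report plain alerts rather than paths can be migrated in a similar way. The following is a minimal sketch and not part of the original example: it assumes a ``@kind problem`` query whose old form called ``cfg.hasFlow(source, sink)``, and replaces that call with ``MyFlow::flow(source, sink)``. The source and sink definitions (``RemoteFlowSource`` and the first argument of a call to ``eval``) are placeholders chosen only to make the sketch concrete.

.. code-block:: ql

    /** @kind problem */
    import javascript

    module MyConfig implements DataFlow::ConfigSig {
      // Placeholder source: any remote user input known to the library.
      predicate isSource(DataFlow::Node node) { node instanceof RemoteFlowSource }

      // Placeholder sink: the first argument of a call to the global `eval`.
      predicate isSink(DataFlow::Node node) {
        node = DataFlow::globalVarRef("eval").getACall().getArgument(0)
      }
    }

    module MyFlow = DataFlow::Global<MyConfig>;

    // No PathGraph import and no PathNode types are needed for a plain problem query.
    from DataFlow::Node source, DataFlow::Node sink
    where MyFlow::flow(source, sink)
    select sink, "Flow from $@.", source, "this source"

Because no path explanation is produced, the ``from`` clause ranges over ordinary ``DataFlow::Node`` values instead of ``MyFlow::PathNode``.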
93+
94+
Taint tracking
95+
--------------
96+
97+
For configuration classes extending ``TaintTracking::Configuration``, the migration is similar but with a few differences:
98+
99+
- The ``TaintTracking::Global`` module should be used instead of ``DataFlow::Global``.
100+
- Some predicates originating from ``TaintTracking::Configuration`` should be renamed to match the ``DataFlow::ConfigSig`` interface:
101+
- ``isSanitizer`` should be renamed to ``isBarrier``.
102+
- ``isAdditionalTaintStep`` should be renamed to ``isAdditionalFlowStep``.
103+
104+
Note that there is no such thing as ``TaintTracking::ConfigSig``. The ``DataFlow::ConfigSig`` interface is used for both data flow and taint tracking.
105+
106+
For example:
107+
108+
.. code-block:: ql
109+
110+
class MyConfig extends TaintTracking::Configuration {
111+
MyConfig() { this = "MyConfig" }
112+
113+
override predicate isSanitizer(DataFlow::Node node) { ... }
114+
override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) { ... }
115+
...
116+
}
117+
118+
The above configuration can be migrated to the shared data flow library as follows:
119+
120+
.. code-block:: ql
121+
122+
module MyConfig implements DataFlow::ConfigSig {
123+
predicate isBarrier(DataFlow::Node node) { ... }
124+
predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) { ... }
125+
...
126+
}
127+
128+
module MyFlow = TaintTracking::Global<MyConfig>;
129+
130+
131+
Flow labels and flow states
132+
---------------------------
133+
134+
The ``DataFlow::FlowLabel`` class has been deprecated. Queries that relied on flow labels should use the new `flow state` concept instead.
135+
This is done by implementing ``DataFlow::StateConfigSig`` instead of ``DataFlow::ConfigSig``, and passing the module to ``DataFlow::GlobalWithState``
136+
or ``TaintTracking::GlobalWithState``. See :ref:`using flow state <using-flow-labels-for-precise-data-flow-analysis>` for more details about flow state.
137+
138+
Some changes to be aware of:
139+
140+
- The 4-argument version of ``isAdditionalFlowStep`` now takes parameters in a different order.
141+
It now takes ``node1, state1, node2, state2`` instead of ``node1, node2, state1, state2``.
142+
- Taint steps apply to all flow states, not just the ``taint`` flow label. See more details further down in this article.
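
To illustrate the points above, here is a minimal sketch of a configuration using flow state. The state names ``raw`` and ``parsed``, the ``JSON.parse`` step, and the source and sink definitions are hypothetical and only serve to show the shape of the interface, in particular the parameter order of the 4-argument ``isAdditionalFlowStep``.

.. code-block:: ql

    /** @kind path-problem */
    import javascript

    module MyConfig implements DataFlow::StateConfigSig {
      // Hypothetical flow states; pick names that fit your query.
      class FlowState extends string {
        FlowState() { this = ["raw", "parsed"] }
      }

      // Placeholder source: remote user input starts in the "raw" state.
      predicate isSource(DataFlow::Node node, FlowState state) {
        node instanceof RemoteFlowSource and state = "raw"
      }

      // Placeholder sink: only values in the "parsed" state are reported.
      predicate isSink(DataFlow::Node node, FlowState state) {
        node = DataFlow::globalVarRef("eval").getACall().getArgument(0) and
        state = "parsed"
      }

      // Parameter order is node1, state1, node2, state2.
      predicate isAdditionalFlowStep(
        DataFlow::Node node1, FlowState state1, DataFlow::Node node2, FlowState state2
      ) {
        exists(DataFlow::CallNode call |
          call = DataFlow::globalVarRef("JSON").getAMemberCall("parse") and
          node1 = call.getArgument(0) and
          node2 = call and
          state1 = "raw" and
          state2 = "parsed"
        )
      }
    }

    module MyFlow = TaintTracking::GlobalWithState<MyConfig>;

    import MyFlow::PathGraph

    from MyFlow::PathNode source, MyFlow::PathNode sink
    where MyFlow::flowPath(source, sink)
    select sink, source, sink, "Flow found"

Since this sketch uses ``TaintTracking::GlobalWithState``, both the ``raw`` and ``parsed`` states propagate along the default taint steps.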
143+
144+
Barrier guards
145+
--------------
146+
147+
The predicates ``isBarrierGuard`` and ``isSanitizerGuard`` have been removed.
148+
149+
Instead, the ``isBarrier`` predicate must be used to define all barriers. To do this, barrier guards can be reduced to a set of barrier nodes using the ``DataFlow::MakeBarrierGuard`` module.
150+
151+
For example, consider this data flow configuration using a barrier guard:
152+
153+
.. code-block:: ql
154+
155+
class MyConfig extends DataFlow::Configuration {
156+
override predicate isBarrierGuard(DataFlow::BarrierGuardNode node) {
157+
node instanceof MyBarrierGuard
158+
}
159+
...
160+
}
161+
162+
class MyBarrierGuard extends DataFlow::BarrierGuardNode {
163+
MyBarrierGuard() { ... }
164+
165+
override predicate blocks(boolean outcome, Expr e) { ... }
166+
}
167+
168+
This can be migrated to the shared data flow library as follows:
169+
170+
.. code-block:: ql
171+
172+
module MyConfig implements DataFlow::ConfigSig {
173+
predicate isBarrier(DataFlow::Node node) {
174+
node = DataFlow::MakeBarrierGuard<MyBarrierGuard>::getABarrierNode()
175+
}
176+
...
177+
}
178+
179+
class MyBarrierGuard extends DataFlow::Node {
180+
MyBarrierGuard() { ... }
181+
182+
predicate blocksExpr(boolean outcome, Expr e) { ... }
183+
}
184+
185+
The changes can be summarized as:
186+
- The contents of ``isBarrierGuard`` have been moved to ``isBarrier``.
187+
- The ``node instanceof MyBarrierGuard`` check was replaced with ``node = DataFlow::MakeBarrierGuard<MyBarrierGuard>::getABarrierNode()``.
188+
- The ``MyBarrierGuard`` class no longer has ``DataFlow::BarrierGuardNode`` as a base class. We simply use ``DataFlow::Node`` instead.
189+
- The ``blocks`` predicate has been renamed to ``blocksExpr`` and no longer has the ``override`` keyword.
190+
191+
See :ref:`using flow state <using-flow-labels-for-precise-data-flow-analysis>` for examples of how to use barrier guards with flow state.
192+
193+
Query-specific load and store steps
194+
-----------------------------------
195+
196+
The predicates ``isAdditionalLoadStep``, ``isAdditionalStoreStep``, and ``isAdditionalLoadStoreStep`` have been removed. There is no way to emulate the original behavior.
197+
198+
Library models can still contribute such steps, but they will be applicable to all queries. Also see the section on jump steps further down.
199+
200+
Changes in behavior
201+
--------------------
202+
203+
When the query has been migrated to the new interface, it may seem to behave differently due to some technical differences in the internals of
204+
the two data flow libraries. The most significant changes are described below.
205+
206+
Taint steps now propagate all flow states
207+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
208+
209+
There's an important change from the old data flow library when using flow state and taint-tracking together.
210+
211+
When using ``TaintTracking::GlobalWithState``, all flow states can propagate along taint steps.
212+
In the old data flow library, only the ``taint`` flow label could propagate along taint steps.
213+
A straightforward translation of such a query may therefore result in new flow paths being found, which might be unexpected.
214+
215+
To emulate the old behavior, use ``DataFlow::GlobalWithState`` instead of ``TaintTracking::GlobalWithState``,
216+
and manually add taint steps using ``isAdditionalFlowStep``. The predicate ``TaintTracking::defaultTaintStep`` can be used to access the set of taint steps.
217+
218+
For example:
219+
220+
.. code-block:: ql
221+
222+
module MyConfig implements DataFlow::StateConfigSig {
223+
class FlowState extends string {
224+
FlowState() { this = ["taint", "foo"] }
225+
}
226+
227+
predicate isAdditionalFlowStep(DataFlow::Node node1, FlowState state1, DataFlow::Node node2, FlowState state2) {
228+
// Allow taint steps to propagate the "taint" flow state
229+
TaintTracking::defaultTaintStep(node1, node2) and
230+
state1 = "taint" and
231+
state2 = "taint"
232+
}
233+
234+
...
235+
}
236+
237+
module MyFlow = DataFlow::GlobalWithState<MyConfig>;
238+
239+
240+
Jump steps across function boundaries
241+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
242+
243+
When a flow step crosses a function boundary, that is, it starts and ends in two different functions, it will now be classified as a "jump" step.
244+
245+
Jump steps can be problematic in some cases. Roughly speaking, the data flow library will "forget" which call site it came from when following a jump step.
246+
This can lead to spurious flow paths that go into a function through one call site, and back out of a different call site.
247+
248+
If the step was generated by a library model, that is, the step is applicable to all queries, this is best mitigated by converting the step to a flow summary.
249+
For example, the following library model adds a taint step from ``x`` to ``y`` in ``foo.bar(x, y => {})``:
250+
251+
.. code-block:: ql
252+
253+
class MyStep extends TaintTracking::SharedTaintStep {
254+
override predicate step(DataFlow::Node node1, DataFlow::Node node2) {
255+
exists(DataFlow::CallNode call |
256+
call = DataFlow::moduleMember("foo", "bar").getACall() and
257+
node1 = call.getArgument(0) and
258+
node2 = call.getCallback(1).getParameter(0)
259+
)
260+
}
261+
}
262+
263+
Because this step crosses a function boundary, it becomes a jump step. This can be avoided by converting it to a flow summary as follows:
264+
265+
.. code-block:: ql
266+
267+
class MySummary extends DataFlow::SummarizedCallable {
268+
MySummary() { this = "MySummary" }
269+
270+
override DataFlow::CallNode getACall() { result = DataFlow::moduleMember("foo", "bar").getACall() }
271+
272+
override predicate propagatesFlow(string input, string output, boolean preservesValue) {
273+
input = "Argument[0]" and
274+
output = "Argument[1].Parameter[0]" and
275+
preservesValue = false // taint step
276+
}
277+
}
278+
279+
See :ref:`customizing library models for JavaScript <customizing-library-models-for-javascript>` for details about the format of the ``input`` and ``output`` strings.
280+
The aforementioned article also provides guidance on how to store the flow summary in a data extension.
281+
282+
For query-specific steps that cross function boundaries, that is, steps added with ``isAdditionalFlowStep``, there is currently no way to emulate the original behavior.
283+
A possible workaround is to convert the query-specific step to a flow summary. In this case it should be stored in a data extension to avoid performance issues, although this also means
284+
that all other queries will be able to use the flow summary.
285+
286+
Barriers block all flows
287+
~~~~~~~~~~~~~~~~~~~~~~~~
288+
289+
In the shared data flow library, a barrier blocks all flows, even if the tracked value is inside a content.
290+
291+
In the old data flow library, only barriers specific to the ``data`` flow label blocked flows when the tracked value was inside a content.
292+
293+
This rarely has a significant impact, but some users may see changes in their results because of it.
294+
295+
There is currently no way to emulate the original behavior.
296+
297+
Further reading
298+
---------------
299+
300+
- :ref:`Analyzing data flow in JavaScript and TypeScript <analyzing-data-flow-in-javascript-and-typescript>` provides a general guide to the new data flow library.
301+
- :ref:`Using flow state for precise data flow analysis <using-flow-labels-for-precise-data-flow-analysis>` provides a general guide on using flow state.
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
---
2+
category: deprecated
3+
---
4+
* Custom data flow queries will need to be migrated in order to use the shared data flow library. Until migrated, such queries will compile with deprecation warnings and run with a
5+
deprecated copy of the old data flow library. The deprecation layer will be removed in early 2026, after which any unmigrated queries will stop working.
6+
See more information in the [migration guide](https://codeql.github.com/docs/codeql-language-guides/migrating-javascript-dataflow-queries).
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
1+
---
2+
category: majorAnalysis
3+
---
4+
* All data flow queries now use the same underlying data flow library as the other language analyses, replacing the old one written specifically for JavaScript/TypeScript.
5+
This is a significant change and users may consequently observe differences in the alerts generated by the analysis.

javascript/ql/lib/semmle/javascript/dataflow/Configuration.qll

Lines changed: 5 additions & 4 deletions
@@ -6,10 +6,6 @@
66
* Additional data flow edges can be specified, and conversely certain nodes or
77
* edges can be designated as _barriers_ that block flow.
88
*
9-
* NOTE: The API of this library is not stable yet and may change in
10-
* the future.
11-
*
12-
*
139
* # Technical overview
1410
*
1511
* This module implements a summarization-based inter-procedural data flow
@@ -78,6 +74,11 @@ private import AdditionalFlowSteps
7874
private import internal.DataFlowPrivate as DataFlowPrivate
7975

8076
/**
77+
* DEPRECATED.
78+
* Subclasses of this class should be replaced by a module implementing the new `ConfigSig` or `StateConfigSig` interface.
79+
* See the [migration guide](https://codeql.github.com/docs/codeql-language-guides/migrating-javascript-dataflow-queries) for more details.
80+
*
81+
* #### Legacy documentation
8182
* A data flow tracking configuration for finding inter-procedural paths from
8283
* sources to sinks.
8384
*

javascript/ql/lib/semmle/javascript/dataflow/TaintTracking.qll

Lines changed: 7 additions & 3 deletions
@@ -8,9 +8,6 @@
88
* substrings. As for data flow configurations, additional flow edges can be
99
* specified, and conversely certain nodes or edges can be designated as taint
1010
* _sanitizers_ that block flow.
11-
*
12-
* NOTE: The API of this library is not stable yet and may change in
13-
* the future.
1411
*/
1512

1613
import javascript
@@ -27,6 +24,13 @@ module TaintTracking {
2724
import AdditionalTaintSteps
2825

2926
/**
27+
* DEPRECATED.
28+
* Subclasses of this class should be replaced by a module implementing the new `ConfigSig` or `StateConfigSig` interface.
29+
* See the [migration guide](https://codeql.github.com/docs/codeql-language-guides/migrating-javascript-dataflow-queries) for more details.
30+
*
31+
* When migrating a `TaintTracking::Configuration` to `DataFlow::ConfigSig`, use `TaintTracking::Global<...>` instead of `DataFlow::Global<...>`.
32+
*
33+
* #### Legacy documentation
3034
* A data flow tracking configuration that considers taint propagation through
3135
* objects, arrays, promises and strings in addition to standard data flow.
3236
*
