|
| 1 | +.. _migrating-javascript-dataflow-queries: |
| 2 | + |
| 3 | +Migrating JavaScript Dataflow Queries |
| 4 | +===================================== |
| 5 | + |
| 6 | +The JavaScript analysis used to have its own data flow library, which differed from the shared data flow |
| 7 | +library used by other languages. This library has now been deprecated in favor of the shared library. |
| 8 | + |
| 9 | +This article explains how to migrate JavaScript data flow queries to use the shared data flow library, |
| 10 | +and some important differences to be aware of. Note that the article on :ref:`analyzing data flow in JavaScript and TypeScript <analyzing-data-flow-in-javascript-and-typescript>` |
| 11 | +provides a general guide to the new data flow library, whereas this article aims to help with migrating existing queries from the old data flow library. |
| 12 | + |
| 13 | +Note that the ``DataFlow::Configuration`` class is still backed by the original data flow library, but has been marked as deprecated. |
| 14 | +This means data flow queries using this class will continue to work, albeit with deprecation warnings, until the 1-year deprecation period expires in early 2026. |
| 15 | +It is recommended that all custom queries are migrated before this time, to ensure they continue to work in the future. |
| 16 | + |
| 17 | +Data flow queries should be migrated to use ``DataFlow::ConfigSig``-style modules instead of the ``DataFlow::Configuration`` class. |
| 18 | +This is identical to the interface found in other languages. |
| 19 | +When making this switch, the query will become backed by the shared data flow library instead. That is, a query is backed by the shared |
| 20 | +data flow library only once it has been migrated to ``ConfigSig``-style, as shown in the following table: |
| 21 | + |
| 22 | +.. list-table:: Data flow libraries |
| 23 | + :widths: 20 80 |
| 24 | + :header-rows: 1 |
| 25 | + |
| 26 | + * - API |
| 27 | + - Implementation |
| 28 | + * - ``DataFlow::Configuration`` |
| 29 | + - Old library (deprecated, to be removed in early 2026) |
| 30 | + * - ``DataFlow::ConfigSig`` |
| 31 | + - Shared library |
| 32 | + |
| 33 | +A straightforward translation to ``DataFlow::ConfigSig``-style is usually possible, although there are some complications |
| 34 | +that may cause the query to behave differently. |
| 35 | +We'll first cover some straightforward migration examples, and then go over some of the complications that may arise. |
| 36 | + |
| 37 | +Simple migration example |
| 38 | +------------------------ |
| 39 | + |
| 40 | +A simple example of a query using the old data flow library is shown below: |
| 41 | + |
| 42 | +.. code-block:: ql |
| 43 | +
|
| 44 | + /** @kind path-problem */ |
| 45 | + import javascript |
| 46 | + import DataFlow::PathGraph |
| 47 | +
|
| 48 | + class MyConfig extends DataFlow::Configuration { |
| 49 | + MyConfig() { this = "MyConfig" } |
| 50 | +
|
| 51 | + override predicate isSource(DataFlow::Node node) { ... } |
| 52 | +
|
| 53 | + override predicate isSink(DataFlow::Node node) { ... } |
| 54 | + } |
| 55 | +
|
| 56 | + from MyConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink |
| 57 | + where cfg.hasFlowPath(source, sink) |
| 58 | + select sink, source, sink, "Flow found" |
| 59 | +
|
| 60 | +With the new style this would look like this: |
| 61 | + |
| 62 | +.. code-block:: ql |
| 63 | +
|
| 64 | + /** @kind path-problem */ |
| 65 | + import javascript |
| 66 | +
|
| 67 | + module MyConfig implements DataFlow::ConfigSig { |
| 68 | + predicate isSource(DataFlow::Node node) { ... } |
| 69 | +
|
| 70 | + predicate isSink(DataFlow::Node node) { ... } |
| 71 | + } |
| 72 | +
|
| 73 | + module MyFlow = DataFlow::Global<MyConfig>; |
| 74 | +
|
| 75 | + import MyFlow::PathGraph |
| 76 | +
|
| 77 | + from MyFlow::PathNode source, MyFlow::PathNode sink |
| 78 | + where MyFlow::flowPath(source, sink) |
| 79 | + select sink, source, sink, "Flow found" |
| 80 | +
|
| 81 | +The changes can be summarized as: |
| 82 | + |
| 83 | +- The ``DataFlow::Configuration`` class was replaced with a module implementing ``DataFlow::ConfigSig``. |
| 84 | +- The characteristic predicate was removed (modules have no characteristic predicates). |
| 85 | +- Predicates such as ``isSource`` no longer have the ``override`` keyword (as they are defined in a module now). |
| 86 | +- The configuration module is passed to ``DataFlow::Global``, resulting in a new module, called ``MyFlow`` in this example. |
| 87 | +- The query imports ``MyFlow::PathGraph`` instead of ``DataFlow::PathGraph``. |
| 88 | +- The ``MyConfig cfg`` variable was removed from the ``from`` clause. |
| 89 | +- The ``hasFlowPath`` call was replaced with ``MyFlow::flowPath``. |
| 90 | +- The type ``DataFlow::PathNode`` was replaced with ``MyFlow::PathNode``. |
| 91 | + |
| 92 | +With these changes, we have produced an equivalent query that is backed by the new data flow library. |
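| | + |
| | +Queries that do not report paths can be migrated in the same way. As a minimal sketch, assuming the old query called ``cfg.hasFlow(source, sink)`` on plain ``DataFlow::Node`` values, |
| | +the migrated query could call ``MyFlow::flow`` instead. The ``@kind problem`` metadata and the alert message below are illustrative assumptions, not taken from an existing query: |
| | + |
| | +.. code-block:: ql |
| | + |
| | +   /** @kind problem */ |
| | +   import javascript |
| | + |
| | +   module MyConfig implements DataFlow::ConfigSig { |
| | +     predicate isSource(DataFlow::Node node) { ... } |
| | + |
| | +     predicate isSink(DataFlow::Node node) { ... } |
| | +   } |
| | + |
| | +   module MyFlow = DataFlow::Global<MyConfig>; |
| | + |
| | +   // No PathGraph import and no PathNode types: only the end points are reported |
| | +   from DataFlow::Node source, DataFlow::Node sink |
| | +   where MyFlow::flow(source, sink) |
| | +   select sink, "Flow found from $@.", source, "this source" |
| | + |
| | +In this variant the ``from`` clause keeps using ``DataFlow::Node``, since ``MyFlow::PathNode`` is only needed for path queries. |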
| 93 | + |
| 94 | +Taint tracking |
| 95 | +-------------- |
| 96 | + |
| 97 | +For configuration classes extending ``TaintTracking::Configuration``, the migration is similar but with a few differences: |
| 98 | + |
| 99 | +- The ``TaintTracking::Global`` module should be used instead of ``DataFlow::Global``. |
| 100 | +- Some predicates originating from ``TaintTracking::Configuration`` should be renamed to match the ``DataFlow::ConfigSig`` interface: |
| 101 | + - ``isSanitizer`` should be renamed to ``isBarrier``. |
| 102 | + - ``isAdditionalTaintStep`` should be renamed to ``isAdditionalFlowStep``. |
| 103 | + |
| 104 | +Note that there is no such thing as ``TaintTracking::ConfigSig``. The ``DataFlow::ConfigSig`` interface is used for both data flow and taint tracking. |
| 105 | + |
| 106 | +For example: |
| 107 | + |
| 108 | +.. code-block:: ql |
| 109 | +
|
| 110 | + class MyConfig extends TaintTracking::Configuration { |
| 111 | + MyConfig() { this = "MyConfig" } |
| 112 | +
|
| 113 | + override predicate isSanitizer(DataFlow::Node node) { ... } |
| 114 | + override predicate isAdditionalTaintStep(DataFlow::Node node1, DataFlow::Node node2) { ... } |
| 115 | + ... |
| 116 | + } |
| 117 | +
|
| 118 | +The above configuration can be migrated to the shared data flow library as follows: |
| 119 | + |
| 120 | +.. code-block:: ql |
| 121 | +
|
| 122 | + module MyConfig implements DataFlow::ConfigSig { |
| 123 | + predicate isBarrier(DataFlow::Node node) { ... } |
| 124 | + predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) { ... } |
| 125 | + ... |
| 126 | + } |
| 127 | +
|
| 128 | + module MyFlow = TaintTracking::Global<MyConfig>; |
| 129 | +
|
| 130 | +
|
| 131 | +Flow labels and flow states |
| 132 | +--------------------------- |
| 133 | + |
| 134 | +The ``DataFlow::FlowLabel`` class has been deprecated. Queries that relied on flow labels should use the new `flow state` concept instead. |
| 135 | +This is done by implementing ``DataFlow::StateConfigSig`` instead of ``DataFlow::ConfigSig``, and passing the module to ``DataFlow::GlobalWithState`` |
| 136 | +or ``TaintTracking::GlobalWithState``. See :ref:`using flow state <using-flow-labels-for-precise-data-flow-analysis>` for more details about flow state. |
| 137 | + |
| 138 | +Some changes to be aware of: |
| 139 | + |
| 140 | +- The 4-argument version of ``isAdditionalFlowStep`` now takes parameters in a different order. |
| 141 | + It now takes ``node1, state1, node2, state2`` instead of ``node1, node2, state1, state2``. |
| 142 | +- Taint steps apply to all flow states, not just the ``taint`` flow label. See more details further down in this article. |
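| | + |
| | +The shape of a state-based configuration can be sketched as follows. This is only an outline under stated assumptions: the state names ``"raw"`` and ``"encoded"`` are hypothetical, |
| | +and the predicate bodies are left elided. It mainly illustrates where the state parameters go, including the new parameter order of ``isAdditionalFlowStep``: |
| | + |
| | +.. code-block:: ql |
| | + |
| | +   import javascript |
| | + |
| | +   module MyConfig implements DataFlow::StateConfigSig { |
| | +     // Hypothetical states; replace these with the states your query tracks |
| | +     class FlowState extends string { |
| | +       FlowState() { this = ["raw", "encoded"] } |
| | +     } |
| | + |
| | +     predicate isSource(DataFlow::Node node, FlowState state) { ... } |
| | + |
| | +     predicate isSink(DataFlow::Node node, FlowState state) { ... } |
| | + |
| | +     // New parameter order: node1, state1, node2, state2 |
| | +     predicate isAdditionalFlowStep( |
| | +       DataFlow::Node node1, FlowState state1, DataFlow::Node node2, FlowState state2 |
| | +     ) { |
| | +       ... |
| | +     } |
| | +   } |
| | + |
| | +   module MyFlow = TaintTracking::GlobalWithState<MyConfig>; |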
| 143 | + |
| 144 | +Barrier guards |
| 145 | +-------------- |
| 146 | + |
| 147 | +The predicates ``isBarrierGuard`` and ``isSanitizerGuard`` have been removed. |
| 148 | + |
| 149 | +Instead, the ``isBarrier`` predicate must be used to define all barriers. To do this, barrier guards can be reduced to a set of barrier nodes using the ``DataFlow::MakeBarrierGuard`` module. |
| 150 | + |
| 151 | +For example, consider this data flow configuration using a barrier guard: |
| 152 | + |
| 153 | +.. code-block:: ql |
| 154 | +
|
| 155 | + class MyConfig extends DataFlow::Configuration { |
| 156 | + override predicate isBarrierGuard(DataFlow::BarrierGuardNode node) { |
| 157 | + node instanceof MyBarrierGuard |
| 158 | + } |
| 159 | + ... |
| 160 | + } |
| 161 | +
|
| 162 | + class MyBarrierGuard extends DataFlow::BarrierGuardNode { |
| 163 | + MyBarrierGuard() { ... } |
| 164 | +
|
| 165 | + override predicate blocks(boolean outcome, Expr e) { ... } |
| 166 | + } |
| 167 | +
|
| 168 | +This can be migrated to the shared data flow library as follows: |
| 169 | + |
| 170 | +.. code-block:: ql |
| 171 | +
|
| 172 | + module MyConfig implements DataFlow::ConfigSig { |
| 173 | + predicate isBarrier(DataFlow::Node node) { |
| 174 | + node = DataFlow::MakeBarrierGuard<MyBarrierGuard>::getABarrierNode() |
| 175 | + } |
| 176 | + ... |
| 177 | + } |
| 178 | +
|
| 179 | + class MyBarrierGuard extends DataFlow::Node { |
| 180 | + MyBarrierGuard() { ... } |
| 181 | +
|
| 182 | + predicate blocksExpr(boolean outcome, Expr e) { ... } |
| 183 | + } |
| 184 | +
|
| 185 | +The changes can be summarized as: |
| | + |
| 186 | +- The contents of ``isBarrierGuard`` have been moved to ``isBarrier``. |
| 187 | +- The ``node instanceof MyBarrierGuard`` check was replaced with ``node = DataFlow::MakeBarrierGuard<MyBarrierGuard>::getABarrierNode()``. |
| 188 | +- The ``MyBarrierGuard`` class no longer has ``DataFlow::BarrierGuardNode`` as a base class. We simply use ``DataFlow::Node`` instead. |
| 189 | +- The ``blocks`` predicate has been renamed to ``blocksExpr`` and no longer has the ``override`` keyword. |
| 190 | + |
| 191 | +See :ref:`using flow state <using-flow-labels-for-precise-data-flow-analysis>` for examples of how to use barrier guards with flow state. |
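| | + |
| | +To make the pattern more concrete, here is a minimal sketch of a filled-in barrier guard. The guard class ``IsSafeCheck`` and the global function ``isSafe`` are hypothetical names |
| | +chosen for illustration; the sketch assumes that a call ``isSafe(x)`` returning ``true`` means ``x`` no longer needs to be tracked: |
| | + |
| | +.. code-block:: ql |
| | + |
| | +   import javascript |
| | + |
| | +   // Hypothetical guard: in `if (isSafe(x)) { ... }`, `x` is considered safe inside the branch |
| | +   class IsSafeCheck extends DataFlow::CallNode { |
| | +     IsSafeCheck() { this = DataFlow::globalVarRef("isSafe").getACall() } |
| | + |
| | +     predicate blocksExpr(boolean outcome, Expr e) { |
| | +       outcome = true and |
| | +       e = this.getArgument(0).asExpr() |
| | +     } |
| | +   } |
| | + |
| | +   module MyConfig implements DataFlow::ConfigSig { |
| | +     predicate isSource(DataFlow::Node node) { ... } |
| | + |
| | +     predicate isSink(DataFlow::Node node) { ... } |
| | + |
| | +     predicate isBarrier(DataFlow::Node node) { |
| | +       // Reduce the guard class to the set of nodes it protects |
| | +       node = DataFlow::MakeBarrierGuard<IsSafeCheck>::getABarrierNode() |
| | +     } |
| | +   } |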
| 192 | + |
| 193 | +Query-specific load and store steps |
| 194 | +----------------------------------- |
| 195 | + |
| 196 | +The predicates ``isAdditionalLoadStep``, ``isAdditionalStoreStep``, and ``isAdditionalLoadStoreStep`` have been removed. There is no way to emulate the original behavior. |
| 197 | + |
| 198 | +Library models can still contribute such steps, but they will be applicable to all queries. Also see the section on jump steps further down. |
| 199 | + |
| 200 | +Changes in behavior |
| 201 | +-------------------- |
| 202 | + |
| 203 | +When the query has been migrated to the new interface, it may seem to behave differently due to some technical differences in the internals of |
| 204 | +the two data flow libraries. The most significant changes are described below. |
| 205 | + |
| 206 | +Taint steps now propagate all flow states |
| 207 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 208 | + |
| 209 | +There's an important change from the old data flow library when using flow state and taint-tracking together. |
| 210 | + |
| 211 | +When using ``TaintTracking::GlobalWithState``, all flow states can propagate along taint steps. |
| 212 | +In the old data flow library, only the ``taint`` flow label could propagate along taint steps. |
| 213 | +A straightforward translation of such a query may therefore result in new flow paths being found, which might be unexpected. |
| 214 | + |
| 215 | +To emulate the old behavior, use ``DataFlow::GlobalWithState`` instead of ``TaintTracking::GlobalWithState``, |
| 216 | +and manually add taint steps using ``isAdditionalFlowStep``. The predicate ``TaintTracking::defaultTaintStep`` can be used to access the set of taint steps. |
| 217 | + |
| 218 | +For example: |
| 219 | + |
| 220 | +.. code-block:: ql |
| 221 | +
|
| 222 | + module MyConfig implements DataFlow::StateConfigSig { |
| 223 | + class FlowState extends string { |
| 224 | + FlowState() { this = ["taint", "foo"] } |
| 225 | + } |
| 226 | +
|
| 227 | + predicate isAdditionalFlowStep(DataFlow::Node node1, FlowState state1, DataFlow::Node node2, FlowState state2) { |
| 228 | + // Allow taint steps to propagate the "taint" flow state |
| 229 | + TaintTracking::defaultTaintStep(node1, node2) and |
| 230 | + state1 = "taint" and |
| 231 | + state2 = "taint" |
| 232 | + } |
| 233 | +
|
| 234 | + ... |
| 235 | + } |
| 236 | +
|
| 237 | + module MyFlow = DataFlow::GlobalWithState<MyConfig>; |
| 238 | +
|
| 239 | +
|
| 240 | +Jump steps across function boundaries |
| 241 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 242 | + |
| 243 | +When a flow step crosses a function boundary, that is, it starts and ends in two different functions, it will now be classified as a "jump" step. |
| 244 | + |
| 245 | +Jump steps can be problematic in some cases. Roughly speaking, the data flow library will "forget" which call site it came from when following a jump step. |
| 246 | +This can lead to spurious flow paths that go into a function through one call site, and back out of a different call site. |
| 247 | + |
| 248 | +If the step was generated by a library model, that is, the step is applicable to all queries, this is best mitigated by converting the step to a flow summary. |
| 249 | +For example, the following library model adds a taint step from ``x`` to ``y`` in ``foo.bar(x, y => {})``: |
| 250 | + |
| 251 | +.. code-block:: ql |
| 252 | +
|
| 253 | + class MyStep extends TaintTracking::SharedTaintStep { |
| 254 | + override predicate step(DataFlow::Node node1, DataFlow::Node node2) { |
| 255 | + exists(DataFlow::CallNode call | |
| 256 | + call = DataFlow::moduleMember("foo", "bar").getACall() and |
| 257 | + node1 = call.getArgument(0) and |
| 258 | + node2 = call.getCallback(1).getParameter(0) |
| 259 | + ) |
| 260 | + } |
| 261 | + } |
| 262 | +
|
| 263 | +Because this step crosses a function boundary, it becomes a jump step. This can be avoided by converting it to a flow summary as follows: |
| 264 | + |
| 265 | +.. code-block:: ql |
| 266 | +
|
| 267 | + class MySummary extends DataFlow::SummarizedCallable { |
| 268 | + MySummary() { this = "MySummary" } |
| 269 | +
|
| 270 | + override DataFlow::CallNode getACall() { result = DataFlow::moduleMember("foo", "bar").getACall() } |
| 271 | +
|
| 272 | + override predicate propagatesFlow(string input, string output, boolean preservesValue) { |
| 273 | + input = "Argument[0]" and |
| 274 | + output = "Argument[1].Parameter[0]" and |
| 275 | + preservesValue = false // taint step |
| 276 | + } |
| 277 | + } |
| 278 | +
|
| 279 | +See :ref:`customizing library models for JavaScript <customizing-library-models-for-javascript>` for details about the format of the ``input`` and ``output`` strings. |
| 280 | +The aforementioned article also provides guidance on how to store the flow summary in a data extension. |
| 281 | + |
| 282 | +For query-specific steps that cross function boundaries, that is, steps added with ``isAdditionalFlowStep``, there is currently no way to emulate the original behavior. |
| 283 | +A possible workaround is to convert the query-specific step to a flow summary. In this case it should be stored in a data extension to avoid performance issues, although this also means |
| 284 | +that all other queries will be able to use the flow summary. |
| 285 | + |
| 286 | +Barriers block all flows |
| 287 | +~~~~~~~~~~~~~~~~~~~~~~~~ |
| 288 | + |
| 289 | +In the shared data flow library, a barrier blocks all flows, even if the tracked value is inside a content. |
| 290 | + |
| 291 | +In the old data flow library, only barriers specific to the ``data`` flow label blocked flows when the tracked value was inside a content. |
| 292 | + |
| 293 | +This rarely has a significant impact, but some users may observe changes in their results because of this. |
| 294 | + |
| 295 | +There is currently no way to emulate the original behavior. |
| 296 | + |
| 297 | +Further reading |
| 298 | +--------------- |
| 299 | + |
| 300 | +- :ref:`Analyzing data flow in JavaScript and TypeScript <analyzing-data-flow-in-javascript-and-typescript>` provides a general guide to the new data flow library. |
| 301 | +- :ref:`Using flow state for precise data flow analysis <using-flow-labels-for-precise-data-flow-analysis>` provides a general guide on using flow state. |