Commit 81a9ce2

author
james
committed
polish text
1 parent 1844071 commit 81a9ce2

File tree

2 files changed

+22
-9
lines changed

docs/codeql/codeql-for-visual-studio-code/analyzing-your-projects.rst

Lines changed: 2 additions & 0 deletions
@@ -79,6 +79,8 @@ You can see all quick queries that you've run in the current session in the Quer
7979

8080
Once you're happy with your quick query, you should save it in a QL pack so you can access it later. For more information, see ":ref:`About QL packs <about-ql-packs>`."
8181

82+
.. _running-a-specific-part-of-a-query-or-library:
83+
8284
Running a specific part of a query or library
8385
----------------------------------------------
8486

docs/codeql/writing-codeql-queries/debugging-data-flow-queries-using-partial-flow.rst

Lines changed: 20 additions & 9 deletions
@@ -24,7 +24,7 @@ A typical data-flow query looks like this:
2424
where config.hasFlowPath(source, sink)
2525
select sink.getNode(), source, sink, "Sink is reached from $@.", source.getNode(), "here"
2626

27-
Or slightly simpler without path explanations:
27+
The same query can be slightly simplified by rewriting it without :ref:`path explanations <creating-path-queries>`:
2828

2929
.. code-block:: ql
3030
@@ -38,25 +38,26 @@ You can try to debug the potential problem by following the steps described belo
3838
Checking sources and sinks
3939
--------------------------
4040

41-
As a first step, make sure that the source and sink definitions contain what you expect. If one of these is empty then there can never be any data flow. The easiest way to verify this is using quick evaluation in CodeQL for VS Code: Select the text ``node instanceof MySource``, right-click, and choose "CodeQL: Quick Evaluation". This will evaluate the highlighted text, which in this case means the set of sources.
41+
Initially, you should make sure that the ``source`` and ``sink`` definitions contain what you expect. If either the ``source`` or ``sink`` is empty then there can never be any data flow. The easiest way to check this is to use quick evaluation in CodeQL for VS Code. Select the text ``node instanceof MySource``, right-click, and choose "CodeQL: Quick Evaluation". This evaluates the highlighted text, which in this case is the set of sources. For more information, see :ref:`Analyzing your projects <running-a-specific-part-of-a-query-or-library>` in the CodeQL for VS Code help.
4243

4344
If both source and sink definitions look good then we will need to look for missing flow steps.
4445

4546
``fieldFlowBranchLimit``
4647
------------------------
4748

48-
Data-flow configurations contain a parameter called ``fieldFlowBranchLimit``. This is a slightly unfortunate, but currently necessary, performance trade-off, and a too low value can cause false negatives. It is worth a quick check to set this to a high value and see whether this causes the query to yield result. Try, for example, to add the following to your configuration:
49+
Data-flow configurations contain a parameter called ``fieldFlowBranchLimit``. If the value is set too high, you may experience performance degradation, but if it's too low you may miss results. When debugging data flow, try setting ``fieldFlowBranchLimit`` to a high value and see whether your query generates more results. For example, try adding the following to your configuration:
4950

5051
.. code-block:: ql
5152
5253
override int fieldFlowBranchLimit() { result = 5000 }
5354
54-
If there are still no results and performance did not degrade to complete uselessness, then it is best to leave this set to a high value while doing further debugging.
55+
If there are still no results and performance is still usable, then it is best to leave this set to a high value while doing further debugging.
5556

5657
Partial flow
5758
------------
5859

59-
A naive next step could be to try changing the sink definition to ``any()``. This would mean that we would get a lot of flow to all the places that are reachable from the sources. While this approach makes sense and can work in some cases, you might find that it produces so many results that it's very hard to explore the findings, which can also dramatically affect query performance. More importantly, you might not even see all the partial flow paths. This is because the data-flow library tries very hard to prune impossible paths and, since field stores and reads must be evenly matched along a path, we will never see paths going through a store that fail to reach a corresponding read. This can make it hard to see where flow actually stops.
60+
A naive next step could be to change the ``sink`` definition to ``any()``. This would mean that we would get a lot of flow to all the places that are reachable from the ``source``\ s. While this approach may work in some cases, you might find that it produces so many results that it's very hard to explore the findings. It can also dramatically affect query performance. More importantly, you might not even see all the partial flow paths. This is because the data-flow library tries very hard to prune impossible paths and, since field stores and reads must be evenly matched along a path, we will never see paths going through a store that fail to reach a corresponding read. This can make it hard to see where flow actually stops.
6061

6162
To avoid these problems, a data-flow ``Configuration`` comes with a mechanism for exploring partial flow that tries to deal with these caveats. This is the ``Configuration.hasPartialFlow`` predicate:
6263

@@ -87,9 +88,9 @@ As noted in the documentation for ``hasPartialFlow`` (for example, in the `CodeQ
8788
8889
This defines the exploration radius within which ``hasPartialFlow`` returns results.
8990

90-
It is also generally useful to focus on a single source at a time as the starting point for the flow exploration. This is most easily done by adding some ad-hoc restriction in the ``isSource`` predicate.
91+
It is also useful to focus on a single ``source`` at a time as the starting point for the flow exploration. This is most easily done by adding a temporary restriction in the ``isSource`` predicate.
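
As a minimal sketch of such a temporary restriction, the ``isSource`` override below keeps only the sources that start on one line of one file. ``MySource`` is the class used in the quick evaluation example earlier; the file name ``Example.java`` and the line number ``42`` are hypothetical placeholders, and the sketch assumes that ``DataFlow::Node`` provides ``hasLocationInfo`` in the language library you are using:

.. code-block:: ql

    override predicate isSource(DataFlow::Node node) {
      node instanceof MySource and
      // Temporary debugging restriction (hypothetical file and line):
      // keep only sources that start on line 42 of a file named Example.java.
      exists(string path |
        node.hasLocationInfo(path, 42, _, _, _) and
        path.matches("%/Example.java")
      )
    }

Remove the extra conjunct again once you have finished debugging, so that the query goes back to considering every source.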
9192

92-
To do quick ad-hoc evaluations of partial flow it is often easiest to add a predicate to the query that is solely intended for quick evaluation (right-click the predicate name and choose "CodeQL: Quick Evaluation"). A good starting point is something like:
93+
To do quick evaluations of partial flow, it is often easiest to add a predicate to the query that is solely intended for quick evaluation (right-click the predicate name and choose "CodeQL: Quick Evaluation"). A good starting point is something like:
9394

9495
.. code-block:: ql
9596
@@ -101,6 +102,16 @@ To do quick ad-hoc evaluations of partial flow it is often easiest to add a pred
101102
)
102103
}
103104
104-
If you are focusing on a single source then the ``src`` column is of course superfluous, and you may of course also add other columns of interest based on ``n``, but including the enclosing callable and the distance to the source at the very least is generally recommended, as they can be useful columns to sort on to better inspect the results.
105+
If you are focusing on a single ``source`` then the ``src`` column is meaningless. You can also add other columns of interest based on ``n``, but it is generally recommended to include at least the enclosing callable and the distance to the source, because they are useful columns to sort on when inspecting the results.
105106

106-
A couple of advanced tips in order to focus the partial flow results: If flow travels a long distance following an expected path and the distance means that a lot of uninteresting flow gets included in the exploration radius then one can simply replace the source definition with a suitable node found along the way and restart the partial flow exploration from that point. Alternatively, creative use of barriers/sanitizers can be used to cut off flow paths that are uninteresting and thereby reduce the number of partial flow results to increase overview.
107+
108+
If you see a large number of partial flow results, you can focus them in a couple of ways:
109+
110+
- If flow travels a long distance following an expected path, that can result in a lot of uninteresting flow being included in the exploration radius. To reduce the amount of uninteresting flow, you can replace the ``source`` definition with a suitable ``node`` that appears along the path and restart the partial flow exploration from that point.
111+
- You can use barriers and sanitizers to cut off flow paths that are uninteresting. This also reduces the number of partial flow results to explore while debugging.
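
For instance, a barrier that cuts off flow through files you are not interested in might look like the following sketch. It assumes a ``DataFlow::Configuration`` subclass, where the predicate to override is ``isBarrier`` (taint-tracking configurations use ``isSanitizer`` instead). The ``/test/`` path fragment is a hypothetical placeholder, and ``hasLocationInfo`` is assumed to be available on ``DataFlow::Node`` in your language library:

.. code-block:: ql

    override predicate isBarrier(DataFlow::Node node) {
      // Hypothetical debugging barrier: block flow through any node
      // whose file path contains a "test" directory.
      exists(string path |
        node.hasLocationInfo(path, _, _, _, _) and
        path.matches("%/test/%")
      )
    }

As with a temporary source restriction, this barrier is only a debugging aid and should be removed before you use the query for real analysis.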
112+
113+
Further reading
114+
----------------
115+
116+
- :ref:`About data flow analysis <about-data-flow-analysis>`
117+
- :ref:`Creating path queries <creating-path-queries>`
