Commit cf791e8

Python: Describe Concepts and Attributes
1 parent 14dd708 commit cf791e8

1 file changed: +45 -25 lines changed

docs/codeql/codeql-language-guides/analyzing-data-flow-in-python.rst

Lines changed: 45 additions & 25 deletions
@@ -271,34 +271,16 @@ These predicates are defined in the configuration:

 Similar to global data flow, the characteristic predicate (``MyTaintTrackingConfiguration()``) defines the unique name of the configuration and the taint analysis is performed using the predicate ``hasFlow(DataFlow::Node source, DataFlow::Node sink)``.

-Flow sources
-~~~~~~~~~~~~
+Predefined sources and sinks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The data flow library contains some predefined flow sources. The class ``RemoteFlowSource`` (defined in module ``semmle.python.dataflow.new.RemoteFlowSources``) represents data flow from remote network inputs. This is useful for finding security problems in networked services.
+The data flow library contains a number of predefined sources and sinks, providing a good starting point for defining data flow based security queries.

-For global flow, it is also useful to restrict sources to instances of ``LocalSourceNode``. The predefined sources generally do that.
-
-Example
-~~~~~~~
+- The class ``RemoteFlowSource`` (defined in module ``semmle.python.dataflow.new.RemoteFlowSources``) represents data flow from remote network inputs. This is useful for finding security problems in networked services.
+- The library ``Concepts`` (defined in module ``semmle.python.Concepts``) contains several subclasses of ``DataFlow::Node`` that are security relevant, such as ``FileSystemAccess`` and ``SqlExecution``.
+- The module ``Attributes`` (defined in module ``semmle.python.dataflow.new.internal.Attributes``) defines ``AttrRead`` and ``AttrWrite`` which handle both ordinary and dynamic attribute access.

-This query shows a data flow configuration that uses all network input as data sources:
-
-.. code-block:: ql
-
-   import python
-   import semmle.python.dataflow.new.RemoteFlowSources
-
-   class MyDataFlowConfiguration extends DataFlow::Configuration {
-     MyDataFlowConfiguration() {
-       this = "..."
-     }
-
-     override predicate isSource(DataFlow::Node source) {
-       source instanceof RemoteFlowSource
-     }
-
-     ...
-   }
+For global flow, it is also useful to restrict sources to instances of ``LocalSourceNode``. The predefined sources generally do that.

 Class hierarchy
 ~~~~~~~~~~~~~~~
@@ -309,12 +291,50 @@ Class hierarchy
 - ``DataFlow::ExprNode`` - an expression behaving as a data flow node.
 - ``DataFlow::ParameterNode`` - a parameter data flow node representing the value of a parameter at function entry.
 - ``RemoteFlowSource`` - data flow from network/remote input.
+- ``Attributes::AttrRead`` - flow out of an attribute.
+- ``Attributes::AttrWrite`` - flow into an attribute.
+- ``Concepts::SystemCommandExecution`` - a data-flow node that executes an operating system command, for instance by spawning a new process.
+- ``Concepts::FileSystemAccess`` - a data flow node that performs a file system access, including reading and writing data, creating and deleting files and folders, checking and updating permissions, and so on.
+- ``Concepts::Path::PathNormalization`` - a data-flow node that performs path normalization. This is often needed in order to safely access paths.
+- ``Concepts::Decoding`` - a data-flow node that decodes data from a binary or textual format. A decoding (automatically) preserves taint from input to output. However, it can also be a problem in itself, for example if it allows code execution or could result in denial-of-service.
+- ``Concepts::Encoding`` - a data-flow node that encodes data to a binary or textual format. An encoding (automatically) preserves taint from input to output.
+- ``Concepts::CodeExecution`` - a data-flow node that dynamically executes Python code.
+- ``Concepts::SqlExecution`` - a data-flow node that executes SQL statements.
+- ``Concepts::HTTP::Server::RouteSetup`` - a data-flow node that sets up a route on a server.
+- ``Concepts::HTTP::Server::HttpResponse`` - a data-flow node that creates an HTTP response on a server.

 - ``TaintTracking::Configuration`` - base class for custom global taint tracking analysis.

 Examples
 ~~~~~~~~

+This query shows a data flow configuration that uses all network input as data sources:
+
+.. code-block:: ql
+
+   import python
+   import semmle.python.dataflow.new.DataFlow
+   import semmle.python.dataflow.new.TaintTracking
+   import semmle.python.dataflow.new.RemoteFlowSources
+   import semmle.python.Concepts
+
+   class RemoteToFileConfiguration extends TaintTracking::Configuration {
+     RemoteToFileConfiguration() { this = "RemoteToFileConfiguration" }
+
+     override predicate isSource(DataFlow::Node source) {
+       source instanceof RemoteFlowSource
+     }
+
+     override predicate isSink(DataFlow::Node sink) {
+       sink = any(FileSystemAccess fa).getAPathArgument()
+     }
+   }
+
+   from DataFlow::Node input, DataFlow::Node fileAccess, RemoteToFileConfiguration config
+   where config.hasFlow(input, fileAccess)
+   select fileAccess, "This file access uses data from $@.",
+     input, "user-controllable input."
+
 This data flow configuration tracks data flow from environment variables to opening files:

 .. code-block:: ql
