Commit 8f259d4

committed
Python: port API graph doc comment
1 parent 0912996 commit 8f259d4

File tree

1 file changed

+74
-3
lines changed

python/ql/lib/semmle/python/ApiGraphs.qll

Lines changed: 74 additions & 3 deletions
@@ -12,12 +12,83 @@ import semmle.python.dataflow.new.DataFlow
1212
private import semmle.python.internal.CachedStages
1313

1414
/**
15-
* Provides classes and predicates for working with APIs used in a database.
15+
* Provides classes and predicates for working with the API boundary between the current
16+
* codebase and external libraries.
17+
*
18+
* See `API::Node` for more in-depth documentation.
1619
*/
1720
module API {
1821
/**
19-
* An abstract representation of a definition or use of an API component such as a function
20-
* exported by a Python package, or its result.
22+
* A node in the API graph, representing a value that has crossed the boundary between this
23+
* codebase and an external library (or in general, any external codebase).
24+
*
25+
* ### Basic usage
26+
*
27+
* API graphs are typically used to identify "API calls", that is, calls to an external function
28+
* whose implementation is not necessarily part of the current codebase.
29+
*
30+
* The most basic use of API graphs is typically as follows:
31+
* 1. Start with `API::moduleImport` for the relevant library.
32+
* 2. Follow up with a chain of accessors such as `getMember` describing how to get to the relevant API function.
33+
* 3. Map the resulting API graph nodes to data-flow nodes, using `asSource` or `asSink`.
34+
*
35+
* For example, a simplified way to get arguments to `json.dumps` would be:
36+
* ```ql
37+
* API::moduleImport("json").getMember("dumps").getParameter(0).asSink()
38+
* ```
39+
*
40+
* The most commonly used accessors are `getMember`, `getParameter`, and `getReturn`.
41+
*
42+
* ### API graph nodes
43+
*
44+
* There are two kinds of nodes in the API graphs, distinguished by who is "holding" the value:
45+
* - **Use-nodes** represent values held by the current codebase, which came from an external library.
46+
* (The current codebase is "using" a value that came from the library).
47+
* - **Def-nodes** represent values held by the external library, which came from this codebase.
48+
* (The current codebase "defines" the value seen by the library).
49+
*
50+
* API graph nodes are associated with data-flow nodes in the current codebase.
51+
* (Since external libraries are not part of the database, there is no way to associate them with concrete
52+
* data-flow nodes from the external library).
53+
* - **Use-nodes** are associated with data-flow nodes where a value enters the current codebase,
54+
* such as the return value of a call to an external function.
55+
* - **Def-nodes** are associated with data-flow nodes where a value leaves the current codebase,
56+
* such as an argument passed in a call to an external function.
57+
*
58+
*
59+
* ### Access paths and edge labels
60+
*
61+
* Nodes in the API graph are associated with a set of access paths, describing a series of operations
62+
* that may be performed to obtain that value.
63+
*
64+
* For example, the access path `API::moduleImport("json").getMember("dumps")` represents the action of
65+
* importing `json` and then accessing the member `dumps` on the resulting object.
66+
*
67+
* Each edge in the graph is labelled by such an "operation". For an edge `A->B`, the type of the `A` node
68+
* determines who is performing the operation, and the type of the `B` node determines who ends up holding
69+
* the result:
70+
* - An edge starting from a use-node describes what the current codebase is doing to a value that
71+
* came from a library.
72+
* - An edge starting from a def-node describes what the external library might do to a value that
73+
* came from the current codebase.
74+
* - An edge ending in a use-node means the result ends up in the current codebase (at its associated data-flow node).
75+
* - An edge ending in a def-node means the result ends up in external code (its associated data-flow node is
76+
* the place where it was "last seen" in the current codebase before flowing out).
77+
*
78+
* Because the implementation of the external library is not visible, it is not known exactly what operations
79+
* it will perform on values that flow there. Instead, the edges starting from a def-node are operations that would
80+
* lead to an observable effect within the current codebase, without knowing for certain whether the library will actually perform
81+
* those operations. (When constructing these edges, we assume the library is somewhat well-behaved).
82+
*
83+
* For example, given this snippet:
84+
* ```python
85+
* import foo
86+
* foo.bar(lambda x: doSomething(x))
87+
* ```
88+
* A callback is passed to the external function `foo.bar`. We can't know if `foo.bar` will actually invoke this callback.
89+
* But _if_ the library should decide to invoke the callback, then a value will flow into the current codebase via the `x` parameter.
90+
* For that reason, an edge is generated representing the argument-passing operation that might be performed by `foo.bar`.
91+
* This edge goes from the def-node associated with the callback to the use-node associated with the parameter `x`.
2192
*/
2293
class Node extends Impl::TApiNode {
2394
/**
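
The callback example in the doc comment above can be made concrete with a small runnable sketch. Everything here is illustrative: `bar` is a hypothetical stand-in for the external function `foo.bar` (whose real implementation is unknown and not part of the analyzed codebase), and `do_something` and `seen` are made-up names. The sketch assumes the library invokes the callback exactly once, which is one possible "well-behaved" behavior and not something the analysis can know.

```python
# Hypothetical stand-in for the external function `foo.bar`.
# In a real analysis this code lives outside the database and is not visible.
def bar(callback):
    # The library *may* invoke the callback; this sketch assumes it does, once.
    return callback("value from library")


seen = []


def do_something(x):
    # Record what flowed into the current codebase through the parameter.
    seen.append(x)
    return x


# The lambda is a value leaving the current codebase (the def-node side).
# Its parameter `x` is where a value enters the current codebase if the
# library calls back (the use-node side).
result = bar(lambda x: do_something(x))
```

If `bar` never called its argument, `seen` would stay empty. The edge from the callback's def-node to the use-node for `x` therefore describes something the library might do, not something it is known to do.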
