Commit 07ca09e

Merge pull request github#5425 from yoff/tausbn-python-document-api-graphs
Python: document api graphs
2 parents 3415b64 + eae7bcc commit 07ca09e

4 files changed, with 172 additions and 4 deletions

docs/codeql/codeql-language-guides/analyzing-data-flow-in-python.rst

Lines changed: 2 additions & 0 deletions
@@ -99,6 +99,8 @@ Python has builtin functionality for reading and writing files, such as the func

 ➤ `See this in the query console on LGTM.com <https://lgtm.com/query/8635258505893505141/>`__. Two of the demo projects make use of this low-level API.

+Notice the use of the ``API`` module for referring to library functions. For more information, see ":doc:`Using API graphs in Python <using-api-graphs-in-python>`."
+
 Unfortunately this will only give the expression in the argument, not the values which could be passed to it. So we use local data flow to find all expressions that flow into the argument:

 .. code-block:: ql

docs/codeql/codeql-language-guides/codeql-for-python.rst

Lines changed: 3 additions & 0 deletions
@@ -11,6 +11,7 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat
    basic-query-for-python-code
    codeql-library-for-python
    analyzing-data-flow-in-python
+   using-api-graphs-in-python
    functions-in-python
    expressions-and-statements-in-python
    analyzing-control-flow-in-python
@@ -21,6 +22,8 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat

 - :doc:`Analyzing data flow in Python <analyzing-data-flow-in-python>`: You can use CodeQL to track the flow of data through a Python program to places where the data is used.

+- :doc:`Using API graphs in Python <using-api-graphs-in-python>`: API graphs are a uniform interface for referring to functions, classes, and methods defined in external libraries.
+
 - :doc:`Functions in Python <functions-in-python>`: You can use syntactic classes from the standard CodeQL library to find Python functions and identify calls to them.

 - :doc:`Expressions and statements in Python <expressions-and-statements-in-python>`: You can use syntactic classes from the CodeQL library to explore how Python expressions and statements are used in a codebase.

docs/codeql/codeql-language-guides/codeql-library-for-python.rst

Lines changed: 3 additions & 4 deletions
@@ -23,10 +23,9 @@ The CodeQL library for Python incorporates a large number of classes. Each class
 - **Data flow** - classes that represent entities from the data flow graphs.
 - **API graphs** - classes that represent entities from the API graphs.

-The first two categories are described below. See ":doc:`Analyzing data flow in Python <analyzing-data-flow-in-python>`" for a description of data flow and associated classes.
-
-..
-   and [TO COME IN FUTURE PR] for a description of API graphs and their use.
+The first two categories are described below.
+For a description of data flow and associated classes, see ":doc:`Analyzing data flow in Python <analyzing-data-flow-in-python>`".
+For a description of API graphs and their use, see ":doc:`Using API graphs in Python <using-api-graphs-in-python>`."

 Syntactic classes
 -----------------
Lines changed: 164 additions & 0 deletions
@@ -0,0 +1,164 @@
.. _using-api-graphs-in-python:

Using API graphs in Python
==========================

API graphs are a uniform interface for referring to functions, classes, and methods defined in
external libraries.

About this article
------------------

This article describes how to use API graphs to reference classes and functions defined in library
code. You can use API graphs to conveniently refer to external library functions when defining things like
remote flow sources.


Module imports
--------------

The most common entry point into the API graph will be the point where an external module or package is
imported. For example, you can access the API graph node corresponding to the ``re`` library
by using the ``API::moduleImport`` method defined in the ``semmle.python.ApiGraphs`` module, as the
following snippet demonstrates.

.. code-block:: ql

   import python
   import semmle.python.ApiGraphs

   select API::moduleImport("re")

➤ `See this in the query console on LGTM.com <https://lgtm.com/query/1876172022264324639/>`__.

This query only selects the API graph node corresponding to the ``re`` module. To find
where this module is referenced, you can use the ``getAUse`` method. The following query selects
all references to the ``re`` module in the current database.

.. code-block:: ql

   import python
   import semmle.python.ApiGraphs

   select API::moduleImport("re").getAUse()

➤ `See this in the query console on LGTM.com <https://lgtm.com/query/8072356519514905526/>`__.
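
To picture the kind of target code such a query inspects, the following Python sketch refers to the ``re`` module in three common ways. It is an assumed illustration, not output from the query console, and the function names are hypothetical. One would expect references like these to be reported by ``getAUse``, but that has not been verified here.

```python
# Hypothetical target code: three ways of referring to the ``re`` module.
import re                               # plain import
import re as regex                      # aliased import
from re import compile as re_compile    # import of a single member


def count_words(text):
    # Reference through the plain module name.
    return len(re.findall(r"\w+", text))


def count_digits(text):
    # Reference through the alias.
    return len(regex.findall(r"\d", text))


def count_vowels(text):
    # Reference through the imported member.
    return len(re_compile(r"[aeiou]").findall(text))
```

All three functions depend on the same external module, which is why a single API graph node for ``re`` is a convenient starting point.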

Note that the ``getAUse`` method accounts for local flow, so that ``my_re_compile``
in the following snippet is correctly recognized as a reference to the ``re.compile`` function.

.. code-block:: python

   from re import compile as re_compile

   my_re_compile = re_compile

   r = my_re_compile(".*")

If you only require immediate uses, without taking local flow into account, then you can use
the ``getAnImmediateUse`` method instead.

Note that the given module name *must not* contain any dots. Thus, something like
``API::moduleImport("flask.views")`` will not do what you expect. Instead, this should be decomposed
into an access of the ``views`` member of the API graph node for ``flask``, as described in the next
section.

Accessing attributes
--------------------

Given a node in the API graph, you can access its attributes by using the ``getMember`` method. Using
the above ``re.compile`` example, you can now find references to ``re.compile``.

.. code-block:: ql

   import python
   import semmle.python.ApiGraphs

   select API::moduleImport("re").getMember("compile").getAUse()

➤ `See this in the query console on LGTM.com <https://lgtm.com/query/7970570434725297676/>`__.

In addition to ``getMember``, you can use the ``getUnknownMember`` method to find references to API
components where the name is not known statically. You can use the ``getAMember`` method to
access all members, both known and unknown.
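
The difference between a statically known member name and an unknown one can be pictured with the Python sketch below. Both function names are hypothetical. Which dynamic accesses the analysis actually models as unknown members is not specified in this article, so the second function is only an assumed example of a name that cannot be read off the source text.

```python
import re


def compile_static(pattern):
    # The member name ``compile`` is written out in the source,
    # so it is known statically.
    return re.compile(pattern)


def call_dynamic(member_name, *args):
    # The member name is only known at run time; this line alone does not
    # reveal which member of ``re`` is being accessed.
    member = getattr(re, member_name)
    return member(*args)
```

A query written with ``getMember("compile")`` names the first kind of access explicitly, whereas the second kind has no fixed name to match on.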

Calls and class instantiations
------------------------------

To track instances of classes defined in external libraries, or the results of calling externally
defined functions, you can use the ``getReturn`` method. The following snippet finds all places
where the return value of ``re.compile`` is used:

.. code-block:: ql

   import python
   import semmle.python.ApiGraphs

   select API::moduleImport("re").getMember("compile").getReturn().getAUse()

➤ `See this in the query console on LGTM.com <https://lgtm.com/query/4346050399960356921/>`__.

Note that this includes all uses of the result of ``re.compile``, including those reachable via
local flow. To get just the *calls* to ``re.compile``, you can use ``getAnImmediateUse`` instead of
``getAUse``. As this is a common occurrence, you can use ``getACall`` instead of
``getReturn`` followed by ``getAnImmediateUse``.

➤ `See this in the query console on LGTM.com <https://lgtm.com/query/8143347716552092926/>`__.
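
The distinction between the call itself and later uses of its result can be seen in a small piece of target code. The sketch below is an assumed example with a hypothetical function name. The comments describe what the text above suggests each line corresponds to, not verified query output.

```python
import re


def extract_numbers(text):
    # The call to ``re.compile`` itself: the immediate use of its return value.
    pattern = re.compile(r"\d+")
    # Local flow: ``matcher`` refers to the same compiled pattern object.
    matcher = pattern
    # A use of the return value that is only reachable via local flow.
    return matcher.findall(text)
```

Here the first assignment contains the only call, while ``matcher.findall`` is a use of the returned object that is found only when local flow is taken into account.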

Note that the API graph does not distinguish between class instantiations and function calls. As far
as it's concerned, both are simply places where an API graph node is called.
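
Python's own syntax makes no such distinction either, which the following sketch illustrates with the standard ``ast`` module (``ast.unparse`` requires Python 3.9 or later). This uses Python's syntax tree as an analogy only; it is not how the CodeQL API graph is built, and ``call_targets`` is a hypothetical helper.

```python
import ast

# Hypothetical target code containing one function call and one
# class instantiation.
SOURCE = '''
import re
from fractions import Fraction
compiled = re.compile("[a-z]+")   # function call
half = Fraction(1, 2)             # class instantiation
'''


def call_targets(source):
    """Return the source text of the callee of every call expression."""
    tree = ast.parse(source)
    return [
        ast.unparse(node.func)
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
    ]
```

Both ``re.compile(...)`` and ``Fraction(...)`` appear as the same kind of call expression, so nothing in the syntax marks one as creating an instance.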

Subclasses
----------

For many libraries, the main mode of usage is to extend one or more library classes. To track this
in the API graph, you can use the ``getASubclass`` method to get the API graph nodes corresponding to
the immediate subclasses of a given node. To find *all* subclasses, use ``*`` or ``+`` to apply the
method repeatedly, as in ``getASubclass*``.

Note that ``getASubclass`` does not account for any subclassing that takes place in library code
that has not been extracted. Thus, it may be necessary to account for this in the models you write.
For example, the ``flask.views.View`` class has a predefined subclass ``MethodView``. To find
all subclasses of ``View``, you must explicitly include the subclasses of ``MethodView`` as well.

.. code-block:: ql

   import python
   import semmle.python.ApiGraphs

   API::Node viewClass() {
     result =
       API::moduleImport("flask").getMember("views").getMember(["View", "MethodView"]).getASubclass*()
   }

   select viewClass().getAUse()

➤ `See this in the query console on LGTM.com <https://lgtm.com/query/288293322319747121/>`__.

Note the use of the set literal ``["View", "MethodView"]`` to match both classes simultaneously.
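
The difference between immediate and transitive subclasses can be sketched in plain Python. The classes below are hypothetical stand-ins, not the real Flask classes, and ``all_subclasses`` is an illustrative helper that plays the role of applying the immediate-subclass step one or more times.

```python
# Hypothetical stand-ins for library classes (NOT the real Flask classes).
class View:
    pass


class MethodView(View):
    pass


# Classes that a user of the library might define.
class ReportView(View):
    pass


class ItemAPI(MethodView):
    pass


class AdminItemAPI(ItemAPI):
    pass


def all_subclasses(cls):
    """Collect subclasses reachable by one or more immediate-subclass steps."""
    found = set()
    for sub in cls.__subclasses__():
        found.add(sub)
        found |= all_subclasses(sub)
    return found
```

In this sketch ``View.__subclasses__()`` yields only ``MethodView`` and ``ReportView``, while ``all_subclasses(View)`` also reaches ``ItemAPI`` and ``AdminItemAPI``. If the definition of ``MethodView`` were hidden from view, as unextracted library code is, the last two classes could only be reached by starting from ``MethodView`` explicitly.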

Built-in functions and classes
------------------------------

You can access built-in functions and classes using the ``API::builtin`` method, giving the name of
the built-in as an argument.

For example, to find all calls to the built-in ``open`` function, you can use the following snippet.

.. code-block:: ql

   import python
   import semmle.python.ApiGraphs

   select API::builtin("open").getACall()
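
As an assumed example of target code for such a query, the following Python sketch calls the built-in ``open`` twice. The function name is hypothetical, and the comments mark the two call sites one would expect the query to report.

```python
import os
import tempfile


def write_then_read(message):
    # Create a scratch file path to work with.
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with open(path, "w") as handle:   # call to the built-in ``open`` (write)
            handle.write(message)
        with open(path) as handle:        # call to the built-in ``open`` (read)
            return handle.read()
    finally:
        # Remove the scratch file whether or not reading succeeded.
        os.remove(path)
```

No import is needed to reach ``open``, which is why it is addressed through ``API::builtin`` rather than ``API::moduleImport``.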


Further reading
---------------

.. include:: ../reusables/python-further-reading.rst
.. include:: ../reusables/codeql-ref-tools-further-reading.rst
