Commit 6f646be

Ruby: document API graphs
1 parent 0281bfe commit 6f646be

2 files changed

+186
-0
lines changed

docs/codeql/codeql-language-guides/codeql-for-ruby.rst

Lines changed: 3 additions & 0 deletions
@@ -10,9 +10,12 @@ Experiment and learn how to write effective and efficient queries for CodeQL dat
1010

1111
basic-query-for-ruby-code
1212
codeql-library-for-ruby
13+
using-api-graphs-in-ruby
1314

1415
- :doc:`Basic query for Ruby code <basic-query-for-ruby-code>`: Learn to write and run a simple CodeQL query using LGTM.
1516

1617
- :doc:`CodeQL library for Ruby <codeql-library-for-ruby>`: When you're analyzing a Ruby program, you can make use of the large collection of classes in the CodeQL library for Ruby.
1718

19+
- :doc:`Using API graphs in Ruby <using-api-graphs-in-ruby>`: API graphs are a uniform interface for referring to functions, classes, and methods defined in external libraries.
20+
1821
.. include:: ../reusables/ruby-beta-note.rst
Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
1+
.. _using-api-graphs-in-ruby:
2+
3+
Using API graphs in Ruby
4+
==========================
5+
6+
API graphs are a uniform interface for referring to functions, classes, and methods defined in
7+
external libraries.
8+
9+
About this article
10+
------------------
11+
12+
This article describes how to use API graphs to reference classes and functions defined in library
13+
code. You can use API graphs to conveniently refer to external library functions when defining things like
14+
remote flow sources.
15+
16+
17+
Module and class references
18+
---------------------------
19+
20+
The most common entry point into the API graph is the point where a top-level module or class is
21+
accessed. For example, you can access the API graph node corresponding to the ``::Regexp`` class
22+
by using the ``API::getTopLevelMember`` method defined in the ``codeql.ruby.ApiGraphs`` module, as the
23+
following snippet demonstrates.
24+
25+
.. code-block:: ql
26+
27+
import codeql.ruby.ApiGraphs
28+
29+
select API::getTopLevelMember("Regexp")
30+
31+
This query selects the API graph nodes corresponding to references to the ``Regexp`` class. For nested
32+
modules and classes, you can use the ``getMember`` method. For example, the following query selects
33+
references to the ``Net::HTTP`` class.
34+
35+
.. code-block:: ql
36+
37+
import codeql.ruby.ApiGraphs
38+
39+
select API::getTopLevelMember("Net").getMember("HTTP")
40+
41+
Note that the given module name *must not* contain any ``::`` symbols. Thus, something like
42+
``API::getTopLevelMember("Net::HTTP")`` will not do what you expect. Instead, this should be decomposed
43+
into an access of the ``HTTP`` member of the API graph node for ``Net``, as in the example above.
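
This decomposition resembles the way Ruby itself resolves a nested constant one name at a time. The following Ruby sketch is only an analogy for the ``getTopLevelMember("Net").getMember("HTTP")`` chain, not a description of how the API graph is implemented; ``resolve_stepwise`` is a hypothetical helper introduced for illustration.

```ruby
require "net/http"

# Resolve a nested constant one name at a time, starting from the top level.
# This is analogous to API::getTopLevelMember("Net").getMember("HTTP").
def resolve_stepwise(*names)
  names.reduce(Object) { |scope, name| scope.const_get(name) }
end

http_class = resolve_stepwise(:Net, :HTTP)
puts http_class.equal?(Net::HTTP)  # prints "true"
```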
44+
45+
Calls and class instantiations
46+
------------------------------
47+
48+
To track the calls of externally defined functions, you can use the ``getMethod`` method. The
49+
following snippet finds all calls of ``Regexp.compile``:
50+
51+
.. code-block:: ql
52+
53+
import codeql.ruby.ApiGraphs
54+
55+
select API::getTopLevelMember("Regexp").getMethod("compile")
56+
57+
The example above is for a call to a class method. Tracking calls to instance methods is a two-step
58+
process: first you need to find instances of the class before you can find the calls
59+
to methods on those instances. The following snippet finds instantiations of the ``Regexp`` class:
60+
61+
.. code-block:: ql
62+
63+
import codeql.ruby.ApiGraphs
64+
65+
select API::getTopLevelMember("Regexp").getInstance()
66+
67+
Note that the ``getInstance`` method also includes subclasses. For example, if there is a
68+
``class SpecialRegexp < Regexp`` then ``getInstance`` also finds ``SpecialRegexp.new``.
69+
70+
The following snippet builds on the above to find calls of the ``Regexp#match?`` instance method:
71+
72+
.. code-block:: ql
73+
74+
import codeql.ruby.ApiGraphs
75+
76+
select API::getTopLevelMember("Regexp").getInstance().getMethod("match?")
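
As a rough illustration, the Ruby code below shows the call shapes that the queries in this section are meant to describe. It is a sketch for orientation only: ``SpecialRegexp`` is the hypothetical subclass mentioned above, and whether a particular call is reported in practice depends on what the analysis can resolve.

```ruby
# Class-method call: the shape targeted by getMethod("compile").
compiled = Regexp.compile("a+")

# Instantiation followed by an instance-method call: the shape targeted by
# getInstance().getMethod("match?").
direct = Regexp.new("a+").match?("caaat")

# SpecialRegexp is a hypothetical subclass used only for illustration.
class SpecialRegexp < Regexp
end

# An instance of the subclass, which getInstance() is described as including.
via_subclass = SpecialRegexp.new("b+").match?("abba")

puts compiled.source  # prints "a+"
puts direct           # prints "true"
puts via_subclass     # prints "true"
```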
77+
78+
Subclasses
79+
----------
80+
81+
For many libraries, the main mode of usage is to extend one or more library classes. To track this
82+
in the API graph, you can use the ``getASubclass`` method to get the API graph nodes corresponding to
83+
all the immediate subclasses of this node. To find *all* subclasses, use ``*`` or ``+`` to apply the
84+
method repeatedly, as in ``getASubclass*``.
85+
86+
Note that ``getASubclass`` does not account for any subclassing that takes place in library code
87+
that has not been extracted. Thus, it may be necessary to account for this in the models you write.
88+
For example, the ``ActionController::Base`` class has a predefined subclass ``Rails::ApplicationController``. To find
89+
all subclasses of ``ActionController::Base``, you must explicitly include the subclasses of ``Rails::ApplicationController`` as well.
90+
91+
.. code-block:: ql
92+
93+
import codeql.ruby.ApiGraphs
94+
95+
96+
API::Node actionController() {
97+
result =
98+
[
99+
API::getTopLevelMember("ActionController").getMember("Base"),
100+
API::getTopLevelMember("Rails").getMember("ApplicationController")
101+
].getASubclass*()
102+
}
103+
104+
select actionController()
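
In QL, ``*`` applies a predicate zero or more times and ``+`` applies it one or more times, so ``getASubclass*()`` also includes the starting node itself. The Ruby sketch below shows the difference between an immediate and a transitive subclass; all class names are hypothetical stand-ins, and no Rails code is involved.

```ruby
# Stand-in for a library class such as ActionController::Base (hypothetical).
class LibraryBase; end

# Immediate subclass: one getASubclass() step away from LibraryBase.
class UsersController < LibraryBase; end

# Subclass of a subclass: two steps away, so it is reached only by the
# repeated forms getASubclass+() or getASubclass*().
class AdminUsersController < UsersController; end

puts AdminUsersController.superclass     # prints "UsersController"
puts AdminUsersController < LibraryBase  # prints "true"
puts LibraryBase <= LibraryBase          # prints "true", like the zero-step case of "*"
```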
105+
106+
107+
Using the API graph in dataflow queries
108+
---------------------------------------
109+
110+
Dataflow queries often search for points where data from external sources enters the code base
111+
as well as places where data leaves the code base. API graphs provide a convenient way to refer
112+
to external API components such as library functions and their inputs and outputs. API graph nodes
113+
cannot be used directly in dataflow queries because they model entities that are defined externally,
114+
while dataflow nodes correspond to entities defined in the current code base. To bridge this gap,
115+
the API node classes provide the ``asSource()`` and ``asSink()`` methods.
116+
117+
The ``asSource()`` method is used to select dataflow nodes where a value from an external source
118+
enters the current code base. A typical example is the return value of a library function such as
119+
``File.read(path)``:
120+
121+
.. code-block:: ql
122+
123+
import codeql.ruby.ApiGraphs
124+
125+
select API::getTopLevelMember("File").getMethod("read").getReturn().asSource()
126+
127+
128+
The ``asSink()`` method is used to select dataflow nodes where a value leaves the
129+
current code base and flows into an external library. An example is the second parameter
130+
of the ``File.write(path, value)`` method.
131+
132+
.. code-block:: ql
133+
134+
import codeql.ruby.ApiGraphs
135+
136+
select API::getTopLevelMember("File").getMethod("write").getParameter(1).asSink()
137+
138+
A more complex example is a call to ``File.open`` with a block argument. This function creates a ``File`` instance
139+
and passes it to the supplied block. In this case the first parameter of the block is the place where an
140+
externally created value enters the code base, i.e. the ``|file|`` in the example below:
141+
142+
.. code-block:: ruby
143+
144+
File.open("/my/file.txt", "w") { |file| file << "Hello world" }
145+
146+
The following snippet finds parameters of blocks of ``File.open`` method calls:
147+
148+
.. code-block:: ql
149+
150+
import codeql.ruby.ApiGraphs
151+
152+
select API::getTopLevelMember("File").getMethod("open").getBlock().getParameter(0).asSource()
153+
154+
The following example is a dataflow query that uses API graphs to find cases where data that
155+
is read flows into a call to ``File.write``.
156+
157+
.. code-block:: ql
158+
159+
import codeql.ruby.DataFlow
160+
import codeql.ruby.ApiGraphs
161+
162+
class Configuration extends DataFlow::Configuration {
163+
Configuration() { this = "File read/write Configuration" }
164+
165+
override predicate isSource(DataFlow::Node source) {
166+
source = API::getTopLevelMember("File").getMethod("read").getReturn().asSource()
167+
}
168+
169+
override predicate isSink(DataFlow::Node sink) {
170+
sink = API::getTopLevelMember("File").getMethod("write").getParameter(1).asSink()
171+
}
172+
}
173+
174+
from DataFlow::Node src, DataFlow::Node sink, Configuration config
175+
where config.hasFlow(src, sink)
176+
select src, "The data read here flows into a $@ call.", sink, "File.write"
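
For orientation, the Ruby sketch below shows the kind of code this query is designed to flag: the return value of ``File.read`` reaches the second argument of ``File.write``. The helper ``copy_contents`` and the file names are hypothetical and exist only for this illustration.

```ruby
require "tmpdir"

# copy_contents is a hypothetical helper. The value returned by File.read
# (the source) is passed as the second argument of File.write (the sink).
def copy_contents(from, to)
  data = File.read(from)
  File.write(to, data)
end

Dir.mktmpdir do |dir|
  input = File.join(dir, "in.txt")
  output = File.join(dir, "out.txt")
  File.write(input, "Hello world")
  copy_contents(input, output)
  puts File.read(output)  # prints "Hello world"
end
```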
177+
178+
Further reading
179+
---------------
180+
181+
182+
.. include:: ../reusables/ruby-further-reading.rst
183+
.. include:: ../reusables/codeql-ref-tools-further-reading.rst
