
Commit 5df328d

[docs][mlgo] Document MLModelRunner
1 parent 4bcc083 commit 5df328d

File tree

1 file changed: +157, -11 lines


llvm/docs/MLGO.rst

Lines changed: 157 additions & 11 deletions
=============================================
Machine Learning - Guided Optimization (MLGO)
=============================================

Introduction
============

MLGO refers to integrating ML techniques (primarily) to replace heuristics
within LLVM with machine learned models.

Currently the following heuristics feature such integration:

* Inlining for size
* Register allocation (LLVM greedy eviction heuristic) for performance

This document is an outline of the tooling and APIs facilitating MLGO.

Note that tools for orchestrating ML training are not part of LLVM, as they are
dependency-heavy - both on the ML infrastructure choice, as well as the choice
of distributed computing framework. For the training scenario, LLVM only
contains facilities enabling it, such as corpus extraction, training data
extraction, and evaluation of models during training.


.. contents::

Corpus Tooling
==============

..
  TODO(boomanaiden154): Write this section.

Interacting with ML models
==========================

We interact with ML models in two primary scenarios: one is training such a
model; the other, inference, is using a model during compilation to make
optimization decisions.

For a specific optimization problem - i.e. inlining, or regalloc eviction - we
first separate correctness-preserving decisions from optimization decisions.
For example, not inlining functions marked "no inline" is an instance of the
former; so is not evicting an unevictable live range. An example of the latter
is deciding to inline a function that will bloat the caller size, just because
we have reason to believe that later, the effect will be some constant
propagation that will actually reduce the size (or dynamic instruction count).

ML models can be understood as functions. Their inputs are tensors - buffers of
scalars. The output (in our case, singular) is a scalar. For example, for
inlining, the inputs are properties of the caller, callee, and the callsite
being analyzed for inlining. The output is a boolean.

Inputs and outputs are named, have a scalar type (e.g. int32_t) and a shape
(e.g. 3x4). These are the elements that we use to bind to an ML model.

In both training and inference, we want to expose to ML (training algorithms or
a trained model, respectively) the features we want to base optimization
decisions on. In that regard, the interface from the compiler side to the ML
side is the same: pass features, and get a decision. It's essentially a function
call, where the parameters and result are bound by name and are described by
name, scalar type, and shape tuples.

The main types in LLVM are:

- ``MLModelRunner`` - an abstraction for the decision making mechanism
- ``TensorSpec`` which describes a tensor.

TensorSpec
----------

See ``llvm/Analysis/TensorSpec.h``. This is a simple data bag, identifying a
tensor by name (a string), scalar type, and shape (a vector of ints). The
scalar type can only be int (8, 16, 32, or 64 bit), signed or unsigned; float;
or double.
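
For illustration, here is a sketch of how a couple of ``TensorSpec`` objects
might be created with the ``createSpec`` factory from that header (the feature
and output names below are hypothetical):

.. code-block:: c++

  #include "llvm/Analysis/TensorSpec.h"

  using namespace llvm;

  // A 3x4 tensor of int64 values - e.g. some hypothetical per-callsite features.
  TensorSpec Features =
      TensorSpec::createSpec<int64_t>("callsite_features", /*Shape=*/{3, 4});

  // A single int64 scalar (shape {1}) - e.g. the optimization decision.
  TensorSpec Decision =
      TensorSpec::createSpec<int64_t>("inlining_decision", /*Shape=*/{1});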

MLModelRunner
-------------

See ``llvm/Analysis/MLModelRunner.h``. The abstraction has a pure virtual
method, ``evaluateUntyped``, wrapped by a typed ``evaluate<T>()`` helper that
users call; the contract with implementers is a bit more involved and is
described next.
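
The typed entry point is, roughly, a thin wrapper over the pure virtual (a
sketch of the pattern, not a verbatim copy of the header):

.. code-block:: c++

  // Users call the typed helper; implementers only provide evaluateUntyped().
  template <typename T> T evaluate() {
    return *reinterpret_cast<T *>(evaluateUntyped());
  }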

Implementers
^^^^^^^^^^^^

At construction, the implementer is expected to receive a list of ``TensorSpec``
for input features and the ``TensorSpec`` of the output (e.g.
``std::vector<TensorSpec>``). The list type is not contractual, but it must be
an array-like container with 0-based indexing. In the order of appearance in
the input list, for a ``TensorSpec`` with a name "N", shape D1xD2x...Dn, and
scalar type "T", the implementer must set up a contiguous buffer sized
``sizeof(T) * D1 * D2 * ... * Dn``. This buffer's lifetime must be the same as
the lifetime of the implementer object; finally, for each given ``TensorSpec``,
the implementer must call ``MLModelRunner::setUpBufferForTensor``.

Internally, the expectation is that the implementer uses the name (and maybe
shape) of a ``TensorSpec`` for binding (e.g. lookup in an underlying ML model).

``MLModelRunner::setUpBufferForTensor`` stores each buffer at the corresponding
index (i.e. its position in the list used at construction). The expectation is
that the user will use that position when calling ``MLModelRunner::getTensor``
to retrieve the underlying buffer (more on that in a bit).

The implementation of ``evaluateUntyped`` is expected to use the values in the
buffers described above, carry out whatever computation is needed (e.g.
evaluate an ML model), and then place the outcome in an output buffer which is
returned to the caller. Importantly, ``evaluateUntyped`` must not reset the
input buffers. This is because during training we may want to log the features
and decisions, and since the data is already buffered, there's no reason to
force backing it up elsewhere.
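
To make the contract concrete, here is a minimal, hypothetical implementer that
owns its input buffers and always returns the same decision. This is only a
sketch: the class name and behavior are invented for illustration, and the
exact base-class constructor arguments, ``Kind`` values, and ``TensorSpec``
accessors should be checked against the headers.

.. code-block:: c++

  #include "llvm/Analysis/MLModelRunner.h"
  #include "llvm/Analysis/TensorSpec.h"
  #include "llvm/IR/LLVMContext.h"
  #include <cstdint>
  #include <vector>

  using namespace llvm;

  // Hypothetical runner: it ignores the feature values and always "decides" 1.
  class FixedDecisionRunner : public MLModelRunner {
  public:
    FixedDecisionRunner(LLVMContext &Ctx, const std::vector<TensorSpec> &Inputs)
        : MLModelRunner(Ctx, MLModelRunner::Kind::NoOp, Inputs.size()) {
      for (size_t I = 0; I < Inputs.size(); ++I) {
        // A contiguous buffer of sizeof(T) * D1 * ... * Dn bytes, owned by this
        // object (so it lives as long as the runner), registered at index I.
        Buffers.emplace_back(Inputs[I].getElementCount() *
                             Inputs[I].getElementByteSize());
        setUpBufferForTensor(I, Inputs[I], Buffers.back().data());
      }
    }

  private:
    // Produces the output without touching (resetting) the input buffers.
    void *evaluateUntyped() override {
      Decision = 1;
      return &Decision;
    }

    std::vector<std::vector<char>> Buffers;
    int64_t Decision = 0;
  };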

Users
^^^^^

The users must pass the input ``TensorSpec`` list at the construction of a
specific ``MLModelRunner`` object. After that, users can be agnostic of the
specific implementation, and would typically follow the workflow below (a
sketch follows the list):

- call ``getTensor`` or ``getTensorUntyped``, for each input tensor, identified
  by its index (i.e. the index of the corresponding ``TensorSpec`` in the list
  used at construction).
- populate the tensor buffer of each input tensor with values. Users can take
  advantage of the stability of the tensor buffers, e.g. set only once those
  that don't change, or cache the buffer address.
- call ``evaluate`` and use its result.
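
A minimal user-side sketch of that workflow, assuming two hypothetical int64
features at indices 0 and 1 and a runner constructed elsewhere (the feature
names, indices, and types are purely illustrative):

.. code-block:: c++

  #include "llvm/Analysis/MLModelRunner.h"
  #include <cstdint>

  // Indices mirror the order of the TensorSpec list used at construction.
  enum HypotheticalFeature : size_t {
    CalleeBasicBlockCount = 0,
    CallSiteHeight = 1
  };

  bool shouldOptimize(llvm::MLModelRunner &Runner) {
    // The buffers are stable across evaluations; populate them in place.
    *Runner.getTensor<int64_t>(CalleeBasicBlockCount) = 42;
    *Runner.getTensor<int64_t>(CallSiteHeight) = 3;
    // Evaluate and interpret the single scalar output as a boolean decision.
    return Runner.evaluate<int64_t>() != 0;
  }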

Versioning
^^^^^^^^^^

We support a model "knowing" fewer inputs than the compiler exposes. This is
supported by ``MLModelRunner::setUpBufferForTensor``. If a ``TensorSpec``
requested by the compiler is not supported by the underlying model, the
``MLModelRunner`` implementer must still call ``setUpBufferForTensor`` with a
``nullptr`` value for the buffer. In turn, ``MLModelRunner`` will allocate an
appropriately-sized buffer and track its lifetime. The user can safely populate
that buffer. Since the rest of the inputs are still provided, this allows an
evolution model where we first add features to the compiler and continue using
older models without regressing. Then, the new compiler can be used to train
new models. Deprecating features in the compiler then involves first training
a model without those features.
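
For instance, an implementer's setup loop could handle a ``TensorSpec`` the
underlying model does not know about along these lines (a sketch; ``I`` and
``Spec`` stand for whatever the loop has in hand, and the two helper calls are
hypothetical names for the model lookup an implementer would perform):

.. code-block:: c++

  // No model binding for this feature: pass nullptr so the MLModelRunner base
  // class allocates and owns an appropriately-sized buffer. The compiler-side
  // user can still populate it; the value is simply never seen by the model.
  if (!modelKnowsAbout(Spec)) // hypothetical check against the loaded model
    setUpBufferForTensor(I, Spec, /*Buffer=*/nullptr);
  else
    setUpBufferForTensor(I, Spec, bufferBoundToModel(Spec)); // hypothetical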

``MLModelRunner`` implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We currently feature 3 implementations:

- ``ModelUnderTrainingRunner``. This requires the compiler be built with TFLite
  support. It allows loading a TFLite model dynamically and is primarily
  intended for training scenarios, but it can be used relatively easily in
  production build environments, as it does not change how the compiler
  operates (why this remark is necessary will become clear in a few paragraphs).

- ``ReleaseModeModelRunner``. This is intended for inference scenarios. It
  uses the rules defined in ``llvm/cmake/modules/TensorFlowCompile.cmake`` to
  convert, at the time the compiler is built, TensorFlow Saved Models into a
  header (.h) and native object (.o). The latter is a CPU-based implementation
  of the neural network, together with its weights (essentially, loops
  performing matrix multiplications).

  NOTE: we are actively working on replacing this with an EmitC implementation
  requiring no out-of-tree build-time dependencies.

- ``InteractiveModelRunner``. This is intended for training scenarios where the
  training algorithm drives compilation. This model runner has no special
  dependencies, and relies on I/O pipes to communicate with a separate process
  - presumably a Python training algorithm. We do not envision using this in a
  production environment.

Note that training leaves it to the training infrastructure to handle
distributed computing. The assumed architecture has Python processes
communicating remotely among themselves, but managing local communication with
clang.

..
  TODO(mtrofin):
  - logging, and the use in interactive mode.
  - discuss an example (like the inliner)
