Commit 6e0385e

committed
Adding documentation
1 parent 784ee22 commit 6e0385e

1 file changed, +148 -0 lines changed

llvm/docs/MLGO.rst

Lines changed: 148 additions & 0 deletions
@@ -174,3 +174,151 @@ clang.
TODO(mtrofin):
- logging, and the use in interactive mode.
- discuss an example (like the inliner)

IR2Vec Embeddings
=================

IR2Vec is a program embedding approach designed specifically for LLVM IR. It
is implemented as a function analysis pass in LLVM. The IR2Vec embeddings
capture syntactic, semantic, and structural properties of the IR through
learned representations. These representations are provided as a JSON
vocabulary that maps the entities of the IR (opcodes, types, operands) to
n-dimensional floating point vectors (embeddings).

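As a rough illustration of the vocabulary described above, the following
standalone sketch models it as an in-memory map from entity names to vectors.
The names ``Vocab`` and ``lookupEntity``, the zero-vector fallback for unknown
entities, and any keys used with it are assumptions made for this example.
They are not LLVM APIs; in LLVM the vocabulary is read from JSON by the
analysis passes.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for this sketch; these are not LLVM types.
using Embedding = std::vector<double>;
using Vocab = std::map<std::string, Embedding>;

// Returns the vector stored for Entity. Unknown entities map to a zero
// vector of length Dim (an assumed fallback, chosen for this sketch).
Embedding lookupEntity(const Vocab &V, const std::string &Entity,
                       std::size_t Dim) {
  auto It = V.find(Entity);
  if (It == V.end())
    return Embedding(Dim, 0.0);
  return It->second;
}
```

A caller would populate a ``Vocab`` once and then query it per entity, which
mirrors the idea that the vocabulary is loaded once and reused.
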
IR2Vec can produce representations at different granularities of the IR:
instructions, basic blocks, and functions. Representations of loops and
regions can be derived from these. The representations are useful for various
downstream tasks, including ML-guided compiler optimizations.

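The text does not say how loop or region representations are derived. One
simple possibility, assumed here purely for illustration, is an element-wise
sum of the embeddings of the basic blocks that make up the loop or region.
The helper ``sumEmbeddings`` below is hypothetical and works on plain
``std::vector<double>`` values rather than on LLVM types.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical alias for this sketch; not an LLVM type.
using Embedding = std::vector<double>;

// Element-wise sum of the given embeddings. The result has the size of the
// first embedding; an empty input yields an empty embedding. Summation is an
// assumed aggregation strategy, not one prescribed by the documentation.
Embedding sumEmbeddings(const std::vector<Embedding> &Parts) {
  if (Parts.empty())
    return {};
  Embedding Sum(Parts.front().size(), 0.0);
  for (const Embedding &E : Parts)
    for (std::size_t I = 0; I < Sum.size() && I < E.size(); ++I)
      Sum[I] += E[I];
  return Sum;
}
```

Collecting the embeddings of the blocks in a loop and passing them to
``sumEmbeddings`` would then give one vector for the whole loop.
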
Currently, to use IR2Vec embeddings, the JSON vocabulary must first be read
to obtain the vocabulary mapping. This mapping is then used to derive the
representations. In LLVM, this process is implemented using two independent
passes: ``IR2VecVocabAnalysis`` and ``IR2VecAnalysis``. The former reads the
JSON vocabulary and populates ``IR2VecVocabResult``, which is then used by
``IR2VecAnalysis``.

It is recommended to run ``IR2VecVocabAnalysis`` only once, as the vocabulary
typically does not change. In the future, we plan to improve this process by
automatically generating the vocabulary mappings at build time, eliminating
the need for a separate file read.

IR2VecAnalysis Usage
--------------------

To use IR2Vec in an LLVM-based tool or pass, interact with the analysis
results through the following APIs:

1. **Including the Header:**

   First, include the necessary header file in the source code:

   .. code-block:: c++

      #include "llvm/Analysis/IR2VecAnalysis.h"

2. **Accessing the Analysis Results:**

   To access the IR2Vec embeddings, obtain the ``IR2VecAnalysis``
   result from the Function Analysis Manager (FAM).

   .. code-block:: c++

      llvm::FunctionAnalysisManager &FAM = ...; // The FAM instance
      llvm::Function &F = ...; // The function to analyze
      auto &IR2VecResult = FAM.getResult<llvm::IR2VecAnalysis>(F);

3. **Checking for Valid Results:**

   Ensure that the analysis result is valid before accessing the embeddings:

   .. code-block:: c++

      if (IR2VecResult.isValid()) {
        // Proceed to access embeddings
      }

4. **Retrieving Embeddings:**

   The ``IR2VecResult`` provides access to embeddings (currently) at three levels:

   - **Instruction Embeddings:**

     .. code-block:: c++

        const auto &instVecMap = IR2VecResult.getInstVecMap();
        // instVecMap is a SmallMapVector<const Instruction*, ir2vec::Embedding, 128>
        for (const auto &it : instVecMap) {
          const Instruction *I = it.first;
          const ir2vec::Embedding &embedding = it.second;
          // Use the instruction embedding
        }

   - **Basic Block Embeddings:**

     .. code-block:: c++

        const auto &bbVecMap = IR2VecResult.getBBVecMap();
        // bbVecMap is a SmallMapVector<const BasicBlock*, ir2vec::Embedding, 16>
        for (const auto &it : bbVecMap) {
          const BasicBlock *BB = it.first;
          const ir2vec::Embedding &embedding = it.second;
          // Use the basic block embedding
        }

   - **Function Embedding:**

     .. code-block:: c++

        const ir2vec::Embedding &funcEmbedding = IR2VecResult.getFunctionVector();
        // Use the function embedding

5. **Working with Embeddings:**

   Embeddings are represented as ``std::vector<double>``. These vectors can
   be used as features for machine learning models, to compute similarity
   scores between different code snippets, or for other analyses as needed.

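As one example of a similarity score, the sketch below computes the cosine
similarity of two embeddings. It operates on plain ``std::vector<double>``
values, matching the representation stated above. The function name
``cosineSimilarity`` and the choice to return ``0.0`` for mismatched sizes or
zero-length vectors are assumptions of this sketch, not part of the IR2Vec
API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Cosine similarity of two embeddings: dot(A, B) / (|A| * |B|).
// Returns 0.0 when the sizes differ, the vectors are empty, or either vector
// has zero norm (an assumed convention for this sketch).
double cosineSimilarity(const std::vector<double> &A,
                        const std::vector<double> &B) {
  if (A.size() != B.size() || A.empty())
    return 0.0;
  double Dot = 0.0, NormA = 0.0, NormB = 0.0;
  for (std::size_t I = 0; I < A.size(); ++I) {
    Dot += A[I] * B[I];
    NormA += A[I] * A[I];
    NormB += B[I] * B[I];
  }
  if (NormA == 0.0 || NormB == 0.0)
    return 0.0;
  return Dot / (std::sqrt(NormA) * std::sqrt(NormB));
}
```

Values close to 1 indicate embeddings that point in nearly the same
direction; whether that corresponds to similar code depends on the vocabulary
and on the task.
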
Example Usage
^^^^^^^^^^^^^

.. code-block:: c++

   #include "llvm/Analysis/IR2VecAnalysis.h"
   #include "llvm/IR/Function.h"
   #include "llvm/IR/Instructions.h"
   #include "llvm/Passes/PassBuilder.h"
   #include "llvm/Support/raw_ostream.h"

   // ... other includes and code ...

   void processFunction(llvm::Function &F, llvm::FunctionAnalysisManager &FAM) {
     auto &IR2VecResult = FAM.getResult<llvm::IR2VecAnalysis>(F);

     if (IR2VecResult.isValid()) {
       const auto &instVecMap = IR2VecResult.getInstVecMap();
       // Print each instruction followed by its embedding values.
       for (const auto &it : instVecMap) {
         const llvm::Instruction *I = it.first;
         const auto &embedding = it.second;
         llvm::errs() << "Instruction: " << *I << "\n";
         llvm::errs() << "Embedding: ";
         for (double val : embedding) {
           llvm::errs() << val << " ";
         }
         llvm::errs() << "\n";
       }
     } else {
       llvm::errs() << "IR2Vec analysis failed for function " << F.getName() << "\n";
     }
   }

   // ... rest of the pass ...

   // In the pass's run method:
   // processFunction(F, FAM);

Further Details
---------------

For more detailed information about the IR2Vec algorithm, its parameters, and
advanced usage, please refer to the original paper:
`IR2Vec: LLVM IR Based Scalable Program Embeddings <https://doi.org/10.1145/3418463>`_.
The LLVM source code for ``IR2VecAnalysis`` can also be explored to understand the
implementation details.
