@@ -174,3 +174,151 @@ clang.
174174 TODO(mtrofin):
175175 - logging, and the use in interactive mode.
176176 - discuss an example (like the inliner)
177+
178+ IR2Vec Embeddings
179+ =================
180+
181+ IR2Vec is a program embedding approach designed specifically for LLVM IR. It
182+ is implemented as a function analysis pass in LLVM. The IR2Vec embeddings
183+ capture syntactic, semantic, and structural properties of the IR through
184+ learned representations. These representations are obtained as a JSON
185+ vocabulary that maps the entities of the IR (opcodes, types, operands) to
186+ n-dimensional floating point vectors (embeddings).
187+
188+ IR2Vec provides representations at several granularities of the IR:
189+ instructions, basic blocks, and functions. Representations of loops and
190+ regions can be derived from these where a task needs them.
191+ The representations can serve as input features for various
192+ downstream tasks, including ML-guided compiler optimizations.
193+
194+ Currently, to use IR2Vec embeddings, the JSON vocabulary must first be read
195+ to obtain the vocabulary mapping. This mapping is then used to
196+ derive the representations. In LLVM, this process is implemented using two
197+ independent passes: ``IR2VecVocabAnalysis`` and ``IR2VecAnalysis``. The former
198+ reads the JSON vocabulary and populates ``IR2VecVocabResult``, which is then used
199+ by ``IR2VecAnalysis``.
200+
201+ It is recommended to run ``IR2VecVocabAnalysis`` only once, as the
202+ vocabulary typically does not change. In the future, we plan
203+ to improve this process by automatically generating the vocabulary mappings
204+ at build time, eliminating the need for a separate file read.
205+
206+ IR2VecAnalysis Usage
207+ --------------------
208+
209+ To use IR2Vec in an LLVM-based tool or pass, access the analysis
210+ results through the following steps:
211+
212+ 1. **Including the Header:**
213+
214+    First, include the necessary header file in the source code:
215+
216+    .. code-block:: c++
217+
218+       #include "llvm/Analysis/IR2VecAnalysis.h"
219+
220+ 2. **Accessing the Analysis Results:**
221+
222+    To access the IR2Vec embeddings, obtain the ``IR2VecAnalysis``
223+    result from the Function Analysis Manager (FAM).
224+
225+    .. code-block:: c++
226+
227+       llvm::FunctionAnalysisManager &FAM = ...; // The FAM instance
228+       llvm::Function &F = ...; // The function to analyze
229+       auto &IR2VecResult = FAM.getResult<llvm::IR2VecAnalysis>(F);
230+
231+ 3. **Checking for Valid Results:**
232+
233+    Ensure that the analysis result is valid before accessing the embeddings:
234+
235+    .. code-block:: c++
236+
237+       if (IR2VecResult.isValid()) {
238+         // Proceed to access embeddings
239+       }
240+
241+ 4. **Retrieving Embeddings:**
242+
243+    ``IR2VecResult`` currently provides access to embeddings at three levels:
244+
245+    - **Instruction Embeddings:**
246+
247+      .. code-block:: c++
248+
249+         const auto &instVecMap = IR2VecResult.getInstVecMap();
250+         // instVecMap is a SmallMapVector<const Instruction*, ir2vec::Embedding, 128>
251+         for (const auto &it : instVecMap) {
252+           const Instruction *I = it.first;
253+           const ir2vec::Embedding &embedding = it.second;
254+           // Use the instruction embedding
255+         }
256+    - **Basic Block Embeddings:**
257+
258+      .. code-block:: c++
259+
260+         const auto &bbVecMap = IR2VecResult.getBBVecMap();
261+         // bbVecMap is a SmallMapVector<const BasicBlock*, ir2vec::Embedding, 16>
262+         for (const auto &it : bbVecMap) {
263+           const BasicBlock *BB = it.first;
264+           const ir2vec::Embedding &embedding = it.second;
265+           // Use the basic block embedding
266+         }
267+    - **Function Embedding:**
268+
269+      .. code-block:: c++
270+
271+         const ir2vec::Embedding &funcEmbedding = IR2VecResult.getFunctionVector();
272+         // Use the function embedding
273+
274+ 5. **Working with Embeddings:**
275+
276+    Embeddings are represented as ``std::vector<double>``. Use these
277+    vectors as features for machine learning models, compute similarity scores
278+    between different code snippets, or perform other analyses as needed.
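
As an illustration of the similarity use case, here is a minimal sketch
that compares two embeddings by cosine similarity. It assumes only that an
embedding is a ``std::vector<double>``, as stated above. ``cosineSimilarity``
is a hypothetical helper written for this example and is not part of the
IR2Vec API.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not an LLVM API): cosine similarity of two embeddings
// stored as std::vector<double>. Returns 0.0 when the vectors are empty,
// differ in size, or either one has zero norm.
double cosineSimilarity(const std::vector<double> &A,
                        const std::vector<double> &B) {
  if (A.empty() || A.size() != B.size())
    return 0.0;

  double Dot = 0.0, NormA = 0.0, NormB = 0.0;
  for (std::size_t I = 0; I < A.size(); ++I) {
    Dot += A[I] * B[I];
    NormA += A[I] * A[I];
    NormB += B[I] * B[I];
  }

  if (NormA == 0.0 || NormB == 0.0)
    return 0.0;
  return Dot / (std::sqrt(NormA) * std::sqrt(NormB));
}
```

A value near 1 suggests the two embeddings point in a similar direction,
while a value near 0 suggests they are unrelated. Whether cosine similarity
is the right metric depends on the downstream task.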
279+
280+ Example Usage
281+ ^^^^^^^^^^^^^
282+
283+ .. code-block:: c++
284+
285+    #include "llvm/Analysis/IR2VecAnalysis.h"
286+    #include "llvm/IR/Function.h"
287+    #include "llvm/IR/Instructions.h"
288+    #include "llvm/Passes/PassBuilder.h"
289+
290+    // ... other includes and code ...
291+
292+    void processFunction(llvm::Function &F, llvm::FunctionAnalysisManager &FAM) {
293+      auto &IR2VecResult = FAM.getResult<llvm::IR2VecAnalysis>(F);
294+
295+      if (IR2VecResult.isValid()) {
296+        const auto &instVecMap = IR2VecResult.getInstVecMap();
297+        for (const auto &it : instVecMap) {
298+          const llvm::Instruction *I = it.first;
299+          const auto &embedding = it.second;
300+          llvm::errs() << "Instruction: " << *I << "\n";
301+          llvm::errs() << "Embedding: ";
302+          for (double val : embedding) {
303+            llvm::errs() << val << " ";
304+          }
305+          llvm::errs() << "\n";
306+        }
307+      } else {
308+        llvm::errs() << "IR2Vec analysis failed for function " << F.getName() << "\n";
309+      }
310+    }
311+
312+    // ... rest of the pass ...
313+
314+    // In the pass's run method:
315+    // processFunction(F, FAM);
316+
317+ Further Details
318+ ---------------
319+
320+ For more detailed information about the IR2Vec algorithm, its parameters, and
321+ advanced usage, please refer to the original paper:
322+ `IR2Vec: LLVM IR Based Scalable Program Embeddings <https://doi.org/10.1145/3418463>`_.
323+ The LLVM source code for ``IR2VecAnalysis`` can also be explored to understand the
324+ implementation details.