@@ -6,27 +6,27 @@ llvm-ir2vec - IR2Vec Embedding Generation Tool
SYNOPSIS
--------

- :program:`llvm-ir2vec` [*options*] *input-file*
+ :program:`llvm-ir2vec` [*subcommand*] [*options*]

DESCRIPTION
-----------

:program:`llvm-ir2vec` is a standalone command-line tool for IR2Vec. It
generates IR2Vec embeddings for LLVM IR and supports triplet generation
- for vocabulary training. It provides three main operation modes:
+ for vocabulary training. The tool provides three main subcommands:

- 1. **Triplet Mode**: Generates numeric triplets in train2id format for vocabulary
+ 1. **triplets**: Generates numeric triplets in train2id format for vocabulary
training from LLVM IR.

- 2. **Entity Mode**: Generates entity mapping files (entity2id.txt) for vocabulary
+ 2. **entities**: Generates entity mapping files (entity2id.txt) for vocabulary
training.

- 3. **Embedding Mode**: Generates IR2Vec embeddings using a trained vocabulary
+ 3. **embeddings**: Generates IR2Vec embeddings using a trained vocabulary
at different granularity levels (instruction, basic block, or function).

The tool is designed to facilitate machine learning applications that work with
LLVM IR by converting the IR into numerical representations that can be used by
- ML models. The triplet mode generates numeric IDs directly instead of string
+ ML models. The `triplets` subcommand generates numeric IDs directly instead of string
triplets, streamlining the training data preparation workflow.

.. note::
@@ -53,111 +53,115 @@ for details).
See `llvm/utils/mlgo-utils/IR2Vec/generateTriplets.py` for more details on how
these two modes are used to generate the triplets and entity mappings.

- Triplet Generation Mode
- ~~~~~~~~~~~~~~~~~~~~~~~
+ Triplet Generation
+ ~~~~~~~~~~~~~~~~~~

- In triplet mode, :program:`llvm-ir2vec` analyzes LLVM IR and extracts numeric
- triplets consisting of opcode IDs, type IDs, and operand IDs. These triplets
- are generated in the standard format used for knowledge graph embedding training.
- The tool outputs numeric IDs directly using the ir2vec::Vocabulary mapping
+ With the `triplets` subcommand, :program:`llvm-ir2vec` analyzes LLVM IR and extracts
+ numeric triplets consisting of opcode IDs, type IDs, and operand IDs. These triplets
+ are generated in the standard format used for knowledge graph embedding training.
+ The tool outputs numeric IDs directly using the ir2vec::Vocabulary mapping
infrastructure, eliminating the need for string-to-ID preprocessing.

Usage:

.. code-block:: bash

- llvm-ir2vec --mode=triplets input.bc -o triplets_train2id.txt
+ llvm-ir2vec triplets input.bc -o triplets_train2id.txt

- Entity Mapping Generation Mode
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Entity Mapping Generation
+ ~~~~~~~~~~~~~~~~~~~~~~~~~

- In entity mode, :program:`llvm-ir2vec` generates the entity mappings supported by
- IR2Vec in the standard format used for knowledge graph embedding training. This
- mode outputs all supported entities (opcodes, types, and operands) with their
- corresponding numeric IDs, and is not specific for an LLVM IR file.
+ With the `entities` subcommand, :program:`llvm-ir2vec` generates the entity mappings
+ supported by IR2Vec in the standard format used for knowledge graph embedding
+ training. This subcommand outputs all supported entities (opcodes, types, and
+ operands) with their corresponding numeric IDs, and is not specific for an
+ LLVM IR file.

Usage:

.. code-block:: bash

- llvm-ir2vec --mode=entities -o entity2id.txt
+ llvm-ir2vec entities -o entity2id.txt

- Embedding Generation Mode
- ~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Embedding Generation
+ ~~~~~~~~~~~~~~~~~~~~

- In embedding mode, :program:`llvm-ir2vec` uses a pre-trained vocabulary to
+ With the `embeddings` subcommand, :program:`llvm-ir2vec` uses a pre-trained vocabulary to
generate numerical embeddings for LLVM IR at different levels of granularity.

Example Usage:

.. code-block:: bash

- llvm-ir2vec --mode=embeddings --ir2vec-vocab-path=vocab.json --level=func input.bc -o embeddings.txt
+ llvm-ir2vec embeddings --ir2vec-vocab-path=vocab.json --level=func input.bc -o embeddings.txt

OPTIONS
-------

- .. option:: --mode=<mode>
+ Global options:
+
+ .. option:: -o <filename>
+
+ Specify the output filename. Use ``-`` to write to standard output (default).
+
+ .. option:: --help
+
+ Print a summary of command line options.

- Specify the operation mode. Valid values are:
+ Subcommand-specific options:

- * ``triplets`` - Generate triplets for vocabulary training
- * ``entities`` - Generate entity mappings for vocabulary training
- * ``embeddings`` - Generate embeddings using trained vocabulary (default)
+ **embeddings** subcommand:
+
+ .. option:: <input-file>
+
+ The input LLVM IR or bitcode file to process. This positional argument is
+ required for the `embeddings` subcommand.

.. option:: --level=<level>

- Specify the embedding generation level. Valid values are:
+ Specify the embedding generation level. Valid values are:

- * ``inst`` - Generate instruction-level embeddings
- * ``bb`` - Generate basic block-level embeddings
- * ``func`` - Generate function-level embeddings (default)
+ * ``inst`` - Generate instruction-level embeddings
+ * ``bb`` - Generate basic block-level embeddings
+ * ``func`` - Generate function-level embeddings (default)

.. option:: --function=<name>

- Process only the specified function instead of all functions in the module.
+ Process only the specified function instead of all functions in the module.

.. option:: --ir2vec-vocab-path=<path>

- Specify the path to the vocabulary file (required for embedding mode).
- The vocabulary file should be in JSON format and contain the trained
- vocabulary for embedding generation. See `llvm/lib/Analysis/models`
- for pre-trained vocabulary files.
+ Specify the path to the vocabulary file (required for embedding generation).
+ The vocabulary file should be in JSON format and contain the trained
+ vocabulary for embedding generation. See `llvm/lib/Analysis/models`
+ for pre-trained vocabulary files.

.. option:: --ir2vec-opc-weight=<weight>

- Specify the weight for opcode embeddings (default: 1.0). This controls
- the relative importance of instruction opcodes in the final embedding.
+ Specify the weight for opcode embeddings (default: 1.0). This controls
+ the relative importance of instruction opcodes in the final embedding.

.. option:: --ir2vec-type-weight=<weight>

- Specify the weight for type embeddings (default: 0.5). This controls
- the relative importance of type information in the final embedding.
+ Specify the weight for type embeddings (default: 0.5). This controls
+ the relative importance of type information in the final embedding.

.. option:: --ir2vec-arg-weight=<weight>

- Specify the weight for argument embeddings (default: 0.2). This controls
- the relative importance of operand information in the final embedding.
+ Specify the weight for argument embeddings (default: 0.2). This controls
+ the relative importance of operand information in the final embedding.

- .. option:: -o <filename>

- Specify the output filename. Use ``-`` to write to standard output (default).
+ **triplets** subcommand:

- .. option:: --help
-
- Print a summary of command line options.
-
- .. note::
+ .. option:: <input-file>

- ``--level``, ``--function``, ``--ir2vec-vocab-path``, ``--ir2vec-opc-weight``,
- ``--ir2vec-type-weight``, and ``--ir2vec-arg-weight`` are only used in embedding
- mode. These options are ignored in triplet and entity modes.
+ The input LLVM IR or bitcode file to process. This positional argument is
+ required for the `triplets` subcommand.

- INPUT FILE FORMAT
- -----------------
+ **entities** subcommand:

- :program:`llvm-ir2vec` accepts LLVM bitcode files (``.bc``) and LLVM IR files
- (``.ll``) as input. The input file should contain valid LLVM IR.
+ No subcommand-specific options.

OUTPUT FORMAT
-------------
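
For scripts that previously passed ``--mode=<mode>``, the subcommand-style invocation documented in this diff can be sketched as a small argument builder. This is only an illustration and not part of LLVM: `build_ir2vec_command` and `NEEDS_INPUT` are hypothetical names, the helper merely assembles an argument list without running the tool, and the rule that `entities` rejects an input file is an assumption based on the statement that it "is not specific for an LLVM IR file".

```python
import shlex

# Subcommands named in the updated documentation. Whether each one needs an
# input file follows the OPTIONS text in the diff; rejecting an input file
# for "entities" is an assumption, not documented behavior.
NEEDS_INPUT = {"triplets": True, "entities": False, "embeddings": True}


def build_ir2vec_command(subcommand, input_file=None, output="-", options=()):
    """Return an argv list for a subcommand-style llvm-ir2vec invocation.

    Hypothetical helper: it only assembles arguments and does not verify
    that a given LLVM build accepts them.
    """
    if subcommand not in NEEDS_INPUT:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    if NEEDS_INPUT[subcommand] and input_file is None:
        raise ValueError(f"'{subcommand}' requires an input file")
    if not NEEDS_INPUT[subcommand] and input_file is not None:
        raise ValueError(f"'{subcommand}' does not take an input file")

    # Order mirrors the usage examples: subcommand, options, input, output.
    argv = ["llvm-ir2vec", subcommand, *options]
    if input_file is not None:
        argv.append(input_file)
    argv += ["-o", output]
    return argv


if __name__ == "__main__":
    examples = [
        build_ir2vec_command("triplets", "input.bc", "triplets_train2id.txt"),
        build_ir2vec_command("entities", output="entity2id.txt"),
        build_ir2vec_command(
            "embeddings",
            "input.bc",
            "embeddings.txt",
            options=["--ir2vec-vocab-path=vocab.json", "--level=func"],
        ),
    ]
    for argv in examples:
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(argv))
```

The three printed lines correspond to the usage examples in the updated sections. Passing the returned list to `subprocess.run` would execute the tool, assuming an `llvm-ir2vec` binary with subcommand support is on the `PATH`.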