[llvm-ir2vec][MIR2Vec] Supporting MIR mode in triplet and entity generation #164329
base: users/svkeerthy/10-17-update_mlgo_doc
@llvm/pr-subscribers-mlgo
@llvm/pr-subscribers-llvm-binary-utilities

Author: S. VenkataKeerthy (svkeerthy)

Changes

Add support for Machine IR (MIR) triplet and entity generation in llvm-ir2vec.

This change extends llvm-ir2vec to support Machine IR (MIR) in addition to LLVM IR, enabling the generation of training data for MIR2Vec embeddings. MIR2Vec provides machine-level code embeddings that capture target-specific instruction semantics, complementing the target-independent IR2Vec embeddings.

(Partially addresses #162200; Tracking issue - #141817)

Patch is 150.71 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/164329.diff

8 Files Affected:
diff --git a/llvm/docs/CommandGuide/llvm-ir2vec.rst b/llvm/docs/CommandGuide/llvm-ir2vec.rst
index 55fe75d2084b1..f51da065b43d8 100644
--- a/llvm/docs/CommandGuide/llvm-ir2vec.rst
+++ b/llvm/docs/CommandGuide/llvm-ir2vec.rst
@@ -68,32 +68,52 @@ these two modes are used to generate the triplets and entity mappings.
Triplet Generation
~~~~~~~~~~~~~~~~~~
-With the `triplets` subcommand, :program:`llvm-ir2vec` analyzes LLVM IR and extracts
-numeric triplets consisting of opcode IDs, type IDs, and operand IDs. These triplets
+With the `triplets` subcommand, :program:`llvm-ir2vec` analyzes LLVM IR or Machine IR
+and extracts numeric triplets consisting of opcode IDs and operand IDs. These triplets
are generated in the standard format used for knowledge graph embedding training.
-The tool outputs numeric IDs directly using the ir2vec::Vocabulary mapping
-infrastructure, eliminating the need for string-to-ID preprocessing.
+The tool outputs numeric IDs directly using the vocabulary mapping infrastructure,
+eliminating the need for string-to-ID preprocessing.
-Usage:
+Usage for LLVM IR:
.. code-block:: bash
- llvm-ir2vec triplets input.bc -o triplets_train2id.txt
+ llvm-ir2vec triplets --mode=llvm input.bc -o triplets_train2id.txt
+
+Usage for Machine IR:
+
+.. code-block:: bash
+
+ llvm-ir2vec triplets --mode=mir input.mir -o triplets_train2id.txt
Entity Mapping Generation
~~~~~~~~~~~~~~~~~~~~~~~~~
With the `entities` subcommand, :program:`llvm-ir2vec` generates the entity mappings
-supported by IR2Vec in the standard format used for knowledge graph embedding
-training. This subcommand outputs all supported entities (opcodes, types, and
-operands) with their corresponding numeric IDs, and is not specific for an
-LLVM IR file.
+supported by IR2Vec or MIR2Vec in the standard format used for knowledge graph embedding
+training. This subcommand outputs all supported entities with their corresponding numeric IDs.
+
+For LLVM IR, entities include opcodes, types, and operands. For Machine IR, entities include
+machine opcodes, common operands, and register classes (both physical and virtual).
+
+Usage for LLVM IR:
-Usage:
+.. code-block:: bash
+
+ llvm-ir2vec entities --mode=llvm -o entity2id.txt
+
+Usage for Machine IR:
.. code-block:: bash
- llvm-ir2vec entities -o entity2id.txt
+ llvm-ir2vec entities --mode=mir input.mir -o entity2id.txt
+
+.. note::
+
+ For LLVM IR mode, the entity mapping is target-independent and does not require an input file.
+ For Machine IR mode, an input .mir file is required to determine the target architecture,
+ as entity mappings vary by target (different architectures have different instruction sets
+ and register classes).
Embedding Generation
~~~~~~~~~~~~~~~~~~~~
@@ -222,12 +242,17 @@ Subcommand-specific options:
.. option:: <input-file>
- The input LLVM IR or bitcode file to process. This positional argument is
- required for the `triplets` subcommand.
+ The input LLVM IR/bitcode file (.ll/.bc) or Machine IR file (.mir) to process.
+ This positional argument is required for the `triplets` subcommand.
**entities** subcommand:
- No subcommand-specific options.
+.. option:: <input-file>
+
+ The input Machine IR file (.mir) to process. This positional argument is required
+ for the `entities` subcommand when using ``--mode=mir``, as the entity mappings
+ are target-specific. For ``--mode=llvm``, no input file is required as IR2Vec
+ entity mappings are target-independent.
OUTPUT FORMAT
-------------
@@ -240,19 +265,37 @@ metadata headers. The format includes:
.. code-block:: text
- MAX_RELATIONS=<max_relations_count>
+ MAX_RELATION=<max_relation_count>
<head_entity_id> <tail_entity_id> <relation_id>
<head_entity_id> <tail_entity_id> <relation_id>
...
Each line after the metadata header represents one instruction relationship,
-with numeric IDs for head entity, relation, and tail entity. The metadata
-header (MAX_RELATIONS) provides counts for post-processing and training setup.
+with numeric IDs for head entity, tail entity, and relation type. The metadata
+header (MAX_RELATION) indicates the maximum relation ID used.
+
+**Relation Types:**
+
+For LLVM IR (IR2Vec):
+ * **0** = Type relationship (instruction to its type)
+ * **1** = Next relationship (sequential instructions)
+ * **2+** = Argument relationships (Arg0, Arg1, Arg2, ...)
+
+For Machine IR (MIR2Vec):
+ * **0** = Next relationship (sequential instructions)
+ * **1+** = Argument relationships (Arg0, Arg1, Arg2, ...)
+
+**Entity IDs:**
+
+For LLVM IR: Entity IDs represent opcodes, types, and operands as defined by the IR2Vec vocabulary.
+
+For Machine IR: Entity IDs represent machine opcodes, common operands (immediate, frame index, etc.),
+physical register classes, and virtual register classes as defined by the MIR2Vec vocabulary. The entity layout is target-specific.
Entity Mode Output
~~~~~~~~~~~~~~~~~~
-In entity mode, the output consists of entity mapping in the format:
+In entity mode, the output consists of entity mappings in the format:
.. code-block:: text
@@ -264,6 +307,13 @@ In entity mode, the output consists of entity mapping in the format:
The first line contains the total number of entities, followed by one entity
mapping per line with tab-separated entity string and numeric ID.
+For LLVM IR, entities include instruction opcodes (e.g., "Add", "Ret"), types
+(e.g., "INT", "PTR"), and operand kinds.
+
+For Machine IR, entities include machine opcodes (e.g., "COPY", "ADD"),
+common operands (e.g., "Immediate", "FrameIndex"), physical register classes
+(e.g., "PhyReg_GR32"), and virtual register classes (e.g., "VirtReg_GR32").
+
Embedding Mode Output
~~~~~~~~~~~~~~~~~~~~~
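The triplet format documented in the hunks above (a `MAX_RELATION` header followed by `head tail relation` rows) is simple enough to read from a training script. The following is a minimal Python sketch of such a reader, not part of the patch: `parse_triplets` and `relation_name` are hypothetical helper names, and the relation numbering is copied from the "Relation Types" text in this diff.

```python
from typing import List, Tuple

Triplet = Tuple[int, int, int]  # (head_entity_id, tail_entity_id, relation_id)


def parse_triplets(text: str) -> Tuple[int, List[Triplet]]:
    """Parse triplet output: a MAX_RELATION header, then 'head tail relation' rows."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("MAX_RELATION="):
        raise ValueError("expected a MAX_RELATION=<n> header on the first line")
    max_relation = int(lines[0].split("=", 1)[1])
    triplets: List[Triplet] = []
    for ln in lines[1:]:
        # Each row carries exactly three whitespace-separated integers.
        head, tail, relation = (int(tok) for tok in ln.split())
        triplets.append((head, tail, relation))
    return max_relation, triplets


def relation_name(relation_id: int, mode: str) -> str:
    """Label a relation ID using the per-mode numbering described in the docs."""
    if relation_id < 0:
        raise ValueError("relation IDs are non-negative")
    if mode == "llvm":
        fixed = ["Type", "Next"]  # 0 = Type, 1 = Next, 2+ = Arg0, Arg1, ...
    elif mode == "mir":
        fixed = ["Next"]          # 0 = Next, 1+ = Arg0, Arg1, ...
    else:
        raise ValueError(f"unknown mode: {mode}")
    if relation_id < len(fixed):
        return fixed[relation_id]
    return f"Arg{relation_id - len(fixed)}"
```

Because the same numeric relation ID means different things in the two modes (for example, `1` is "Next" for LLVM IR but "Arg0" for Machine IR), a consumer along these lines needs to know which `--mode` produced the file.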
diff --git a/llvm/include/llvm/CodeGen/MIR2Vec.h b/llvm/include/llvm/CodeGen/MIR2Vec.h
index f47d9abb042d8..696fe3957930e 100644
--- a/llvm/include/llvm/CodeGen/MIR2Vec.h
+++ b/llvm/include/llvm/CodeGen/MIR2Vec.h
@@ -111,6 +111,11 @@ class MIRVocabulary {
size_t TotalEntries = 0;
} Layout;
+ // ToDo: See if we can have only one reg classes section instead of physical
+ // and virtual separate sections in the vocabulary. This would reduce the
+ // number of vocabulary entities significantly.
+ // We can potentially distinguish physical and virtual registers by
+ // considering them as a separate feature.
enum class Section : unsigned {
Opcodes = 0,
CommonOperands = 1,
@@ -125,7 +130,7 @@ class MIRVocabulary {
// Some instructions have optional register operands that may be NoRegister.
// We return a zero vector in such cases.
- mutable Embedding ZeroEmbedding;
+ Embedding ZeroEmbedding;
// We have specialized MO_Register handling in the Register operand section,
// so we don't include it here. Also, no MO_DbgInstrRef for now.
@@ -185,6 +190,25 @@ class MIRVocabulary {
return Storage[static_cast<unsigned>(SectionID)][LocalIndex];
}
+ /// Get entity ID (flat index) for a common operand type
+ /// This is used for triplet generation
+ unsigned getEntityIDForCommonOperand(
+ MachineOperand::MachineOperandType OperandType) const {
+ return Layout.CommonOperandBase + getCommonOperandIndex(OperandType);
+ }
+
+ /// Get entity ID (flat index) for a register
+ /// This is used for triplet generation
+ unsigned getEntityIDForRegister(Register Reg) const {
+ if (!Reg.isValid() || Reg.isStack())
+ return Layout
+ .VirtRegBase; // Return VirtRegBase for invalid/stack registers
+ unsigned LocalIndex = getRegisterOperandIndex(Reg);
+ size_t BaseOffset =
+ Reg.isPhysical() ? Layout.PhyRegBase : Layout.VirtRegBase;
+ return BaseOffset + LocalIndex;
+ }
+
public:
/// Static method for extracting base opcode names (public for testing)
static std::string extractBaseOpcodeName(StringRef InstrName);
@@ -201,6 +225,20 @@ class MIRVocabulary {
unsigned getDimension() const { return Storage.getDimension(); }
+ /// Get entity ID (flat index) for an opcode
+ /// This is used for triplet generation
+ unsigned getEntityIDForOpcode(unsigned Opcode) const {
+ return Layout.OpcodeBase + getCanonicalOpcodeIndex(Opcode);
+ }
+
+ /// Get entity ID (flat index) for a machine operand
+ /// This is used for triplet generation
+ unsigned getEntityIDForMachineOperand(const MachineOperand &MO) const {
+ if (MO.getType() == MachineOperand::MO_Register)
+ return getEntityIDForRegister(MO.getReg());
+ return getEntityIDForCommonOperand(MO.getType());
+ }
+
// Accessor methods
const Embedding &operator[](unsigned Opcode) const {
unsigned LocalIndex = getCanonicalOpcodeIndex(Opcode);
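The `getEntityIDFor*` helpers added in this header all compute a flat index as "section base + local index". The Python model below is only an illustration of that arithmetic, not the LLVM API: the `Layout` fields mirror the base offsets referenced in the patch, while `make_layout`, the back-to-back section ordering (opcodes, common operands, physical register classes, virtual register classes), and the section sizes are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical mirror of the base offsets used by the C++ helpers."""
    opcode_base: int
    common_operand_base: int
    phy_reg_base: int
    virt_reg_base: int
    total_entries: int


def make_layout(n_opcodes: int, n_common: int, n_phy: int, n_virt: int) -> Layout:
    # Assumption: the four sections are stored back to back in this order.
    opcode_base = 0
    common_base = opcode_base + n_opcodes
    phy_base = common_base + n_common
    virt_base = phy_base + n_phy
    return Layout(opcode_base, common_base, phy_base, virt_base, virt_base + n_virt)


def entity_id_for_opcode(layout: Layout, local_index: int) -> int:
    return layout.opcode_base + local_index


def entity_id_for_common_operand(layout: Layout, local_index: int) -> int:
    return layout.common_operand_base + local_index


def entity_id_for_register(layout: Layout, local_index: int, is_physical: bool,
                           is_valid: bool = True, is_stack: bool = False) -> int:
    # As in the patch, invalid and stack registers map to the start of the
    # virtual register section.
    if not is_valid or is_stack:
        return layout.virt_reg_base
    base = layout.phy_reg_base if is_physical else layout.virt_reg_base
    return base + local_index
```

With made-up section sizes of 100, 10, 20, and 20, a physical register class with local index 3 would get entity ID 113 and the corresponding virtual class 133. One consequence of the fallback is that an invalid register and the first virtual register class share an entity ID in the triplet output.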
diff --git a/llvm/test/tools/llvm-ir2vec/entities.mir b/llvm/test/tools/llvm-ir2vec/entities.mir
new file mode 100644
index 0000000000000..60d9c7a783c4c
--- /dev/null
+++ b/llvm/test/tools/llvm-ir2vec/entities.mir
@@ -0,0 +1,28 @@
+# REQUIRES: x86_64-linux
+# RUN: llvm-ir2vec entities --mode=mir %s -o 2>&1 %t1.log
+# RUN: diff %S/output/reference_x86_entities.txt %t1.log
+
+--- |
+ target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+ target triple = "x86_64-unknown-linux-gnu"
+
+ define dso_local noundef i32 @test_function(i32 noundef %a) {
+ entry:
+ ret i32 %a
+ }
+...
+---
+name: test_function
+alignment: 16
+tracksRegLiveness: true
+registers:
+ - { id: 0, class: gr32 }
+liveins:
+ - { reg: '$edi', virtual-reg: '%0' }
+body: |
+ bb.0.entry:
+ liveins: $edi
+
+ %0:gr32 = COPY $edi
+ $eax = COPY %0
+ RET 0, $eax
diff --git a/llvm/test/tools/llvm-ir2vec/output/lit.local.cfg b/llvm/test/tools/llvm-ir2vec/output/lit.local.cfg
new file mode 100644
index 0000000000000..2406f19eebcdd
--- /dev/null
+++ b/llvm/test/tools/llvm-ir2vec/output/lit.local.cfg
@@ -0,0 +1,3 @@
+# Don't treat files in this directory as tests
+# These are reference data files, not test scripts
+config.suffixes = []
diff --git a/llvm/test/tools/llvm-ir2vec/output/reference_triplets.txt b/llvm/test/tools/llvm-ir2vec/output/reference_triplets.txt
new file mode 100644
index 0000000000000..dfbac4ce0c4d3
--- /dev/null
+++ b/llvm/test/tools/llvm-ir2vec/output/reference_triplets.txt
@@ -0,0 +1,33 @@
+MAX_RELATION=4
+187 7072 1
+187 6968 2
+187 187 0
+187 7072 1
+187 6969 2
+187 10 0
+10 7072 1
+10 7072 2
+10 7072 3
+10 6961 4
+10 187 0
+187 6952 1
+187 7072 2
+187 1555 0
+1555 6882 1
+1555 6952 2
+187 7072 1
+187 6968 2
+187 187 0
+187 7072 1
+187 6969 2
+187 601 0
+601 7072 1
+601 7072 2
+601 7072 3
+601 6961 4
+601 187 0
+187 6952 1
+187 7072 2
+187 1555 0
+1555 6882 1
+1555 6952 2
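As a sanity check on the reference data, the sketch below joins the first rows of `reference_triplets.txt` with entity names. It rests on two assumptions that the truncated diff does not confirm: that these triplets come from Machine IR mode (so relation `0` means "Next"), and that the IDs refer to the x86 vocabulary in `reference_x86_entities.txt`, where 10, 187, and 601 appear as `ADD`, `COPY`, and `IMUL`. The `summarize` helper is hypothetical.

```python
from typing import List, Tuple

# First rows of reference_triplets.txt, copied from the diff (without the "+").
REFERENCE_EXCERPT = """\
MAX_RELATION=4
187 7072 1
187 6968 2
187 187 0
187 7072 1
187 6969 2
187 10 0
10 7072 1
10 7072 2
10 7072 3
10 6961 4
10 187 0
"""

# Names taken from the visible part of reference_x86_entities.txt.
KNOWN_ENTITIES = {10: "ADD", 187: "COPY", 601: "IMUL"}


def name_of(entity_id: int) -> str:
    # Fall back to the raw ID for entities outside the visible excerpt.
    return KNOWN_ENTITIES.get(entity_id, f"#{entity_id}")


def summarize(text: str) -> Tuple[int, int, List[Tuple[str, str]]]:
    """Return (declared max relation, observed max relation, Next edges by name)."""
    lines = text.strip().splitlines()
    declared = int(lines[0].split("=", 1)[1])
    observed = 0
    next_edges: List[Tuple[str, str]] = []
    for ln in lines[1:]:
        head, tail, relation = (int(tok) for tok in ln.split())
        observed = max(observed, relation)
        if relation == 0:  # "Next" under the MIR numbering assumed here
            next_edges.append((name_of(head), name_of(tail)))
    return declared, observed, next_edges
```

On this excerpt the "Next" edges read `COPY -> COPY`, `COPY -> ADD`, `ADD -> COPY`, and the largest relation ID seen (4, i.e. Arg3 under the MIR numbering) matches the `MAX_RELATION` header.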
diff --git a/llvm/test/tools/llvm-ir2vec/output/reference_x86_entities.txt b/llvm/test/tools/llvm-ir2vec/output/reference_x86_entities.txt
new file mode 100644
index 0000000000000..dc436d123fd35
--- /dev/null
+++ b/llvm/test/tools/llvm-ir2vec/output/reference_x86_entities.txt
@@ -0,0 +1,7174 @@
+7173
+AAA 0
+AAD 1
+AADD 2
+AAM 3
+AAND 4
+AAS 5
+ABS_F 6
+ABS_Fp 7
+ADC 8
+ADCX 9
+ADD 10
+ADDPDrm 11
+ADDPDrr 12
+ADDPSrm 13
+ADDPSrr 14
+ADDR 15
+ADDSDrm 16
+ADDSDrm_Int 17
+ADDSDrr 18
+ADDSDrr_Int 19
+ADDSSrm 20
+ADDSSrm_Int 21
+ADDSSrr 22
+ADDSSrr_Int 23
+ADDSUBPDrm 24
+ADDSUBPDrr 25
+ADDSUBPSrm 26
+ADDSUBPSrr 27
+ADD_F 28
+ADD_FI 29
+ADD_FPrST 30
+ADD_FST 31
+ADD_Fp 32
+ADD_FpI 33
+ADD_FrST 34
+ADJCALLSTACKDOWN 35
+ADJCALLSTACKUP 36
+ADOX 37
+AESDEC 38
+AESDECLASTrm 39
+AESDECLASTrr 40
+AESDECWIDE 41
+AESDECrm 42
+AESDECrr 43
+AESENC 44
+AESENCLASTrm 45
+AESENCLASTrr 46
+AESENCWIDE 47
+AESENCrm 48
+AESENCrr 49
+AESIMCrm 50
+AESIMCrr 51
+AESKEYGENASSISTrmi 52
+AESKEYGENASSISTrri 53
+AND 54
+ANDN 55
+ANDNPDrm 56
+ANDNPDrr 57
+ANDNPSrm 58
+ANDNPSrr 59
+ANDPDrm 60
+ANDPDrr 61
+ANDPSrm 62
+ANDPSrr 63
+ANNOTATION_LABEL 64
+AOR 65
+ARITH_FENCE 66
+ARPL 67
+ASAN_CHECK_MEMACCESS 68
+AVX 69
+AVX_SET 70
+AXOR 71
+BEXTR 72
+BEXTRI 73
+BLCFILL 74
+BLCI 75
+BLCIC 76
+BLCMSK 77
+BLCS 78
+BLENDPDrmi 79
+BLENDPDrri 80
+BLENDPSrmi 81
+BLENDPSrri 82
+BLENDVPDrm 83
+BLENDVPDrr 84
+BLENDVPSrm 85
+BLENDVPSrr 86
+BLSFILL 87
+BLSI 88
+BLSIC 89
+BLSMSK 90
+BLSR 91
+BOUNDS 92
+BSF 93
+BSR 94
+BSWAP 95
+BT 96
+BTC 97
+BTR 98
+BTS 99
+BUNDLE 100
+BZHI 101
+CALL 102
+CALLpcrel 103
+CATCHRET 104
+CBW 105
+CCMP 106
+CDQ 107
+CDQE 108
+CFCMOV 109
+CFI_INSTRUCTION 110
+CHS_F 111
+CHS_Fp 112
+CLAC 113
+CLC 114
+CLD 115
+CLDEMOTE 116
+CLEANUPRET 117
+CLFLUSH 118
+CLFLUSHOPT 119
+CLGI 120
+CLI 121
+CLRSSBSY 122
+CLTS 123
+CLUI 124
+CLWB 125
+CLZERO 126
+CMC 127
+CMOV 128
+CMOVBE_F 129
+CMOVBE_Fp 130
+CMOVB_F 131
+CMOVB_Fp 132
+CMOVE_F 133
+CMOVE_Fp 134
+CMOVNBE_F 135
+CMOVNBE_Fp 136
+CMOVNB_F 137
+CMOVNB_Fp 138
+CMOVNE_F 139
+CMOVNE_Fp 140
+CMOVNP_F 141
+CMOVNP_Fp 142
+CMOVP_F 143
+CMOVP_Fp 144
+CMOV_FR 145
+CMOV_GR 146
+CMOV_RFP 147
+CMOV_VK 148
+CMOV_VR 149
+CMP 150
+CMPCCXADDmr 151
+CMPPDrmi 152
+CMPPDrri 153
+CMPPSrmi 154
+CMPPSrri 155
+CMPSB 156
+CMPSDrmi 157
+CMPSDrmi_Int 158
+CMPSDrri 159
+CMPSDrri_Int 160
+CMPSL 161
+CMPSQ 162
+CMPSSrmi 163
+CMPSSrmi_Int 164
+CMPSSrri 165
+CMPSSrri_Int 166
+CMPSW 167
+CMPXCHG 168
+COMISDrm 169
+COMISDrm_Int 170
+COMISDrr 171
+COMISDrr_Int 172
+COMISSrm 173
+COMISSrm_Int 174
+COMISSrr 175
+COMISSrr_Int 176
+COMP_FST 177
+COM_FIPr 178
+COM_FIr 179
+COM_FST 180
+COM_FpIr 181
+COM_Fpr 182
+CONVERGENCECTRL_ANCHOR 183
+CONVERGENCECTRL_ENTRY 184
+CONVERGENCECTRL_GLUE 185
+CONVERGENCECTRL_LOOP 186
+COPY 187
+COPY_TO_REGCLASS 188
+CPUID 189
+CQO 190
+CRC 191
+CS_PREFIX 192
+CTEST 193
+CVTDQ 194
+CVTPD 195
+CVTPS 196
+CVTSD 197
+CVTSI 198
+CVTSS 199
+CVTTPD 200
+CVTTPS 201
+CVTTSD 202
+CVTTSS 203
+CWD 204
+CWDE 205
+DAA 206
+DAS 207
+DATA 208
+DBG_INSTR_REF 209
+DBG_LABEL 210
+DBG_PHI 211
+DBG_VALUE 212
+DBG_VALUE_LIST 213
+DEC 214
+DIV 215
+DIVPDrm 216
+DIVPDrr 217
+DIVPSrm 218
+DIVPSrr 219
+DIVR_F 220
+DIVR_FI 221
+DIVR_FPrST 222
+DIVR_FST 223
+DIVR_Fp 224
+DIVR_FpI 225
+DIVR_FrST 226
+DIVSDrm 227
+DIVSDrm_Int 228
+DIVSDrr 229
+DIVSDrr_Int 230
+DIVSSrm 231
+DIVSSrm_Int 232
+DIVSSrr 233
+DIVSSrr_Int 234
+DIV_F 235
+DIV_FI 236
+DIV_FPrST 237
+DIV_FST 238
+DIV_Fp 239
+DIV_FpI 240
+DIV_FrST 241
+DPPDrmi 242
+DPPDrri 243
+DPPSrmi 244
+DPPSrri 245
+DS_PREFIX 246
+DYN_ALLOCA 247
+EH_LABEL 248
+EH_RETURN 249
+EH_SjLj_LongJmp 250
+EH_SjLj_SetJmp 251
+EH_SjLj_Setup 252
+ENCLS 253
+ENCLU 254
+ENCLV 255
+ENCODEKEY 256
+ENDBR 257
+ENQCMD 258
+ENQCMDS 259
+ENTER 260
+ERETS 261
+ERETU 262
+ES_PREFIX 263
+EXTRACTPSmri 264
+EXTRACTPSrri 265
+EXTRACT_SUBREG 266
+EXTRQ 267
+EXTRQI 268
+F 269
+FAKE_USE 270
+FARCALL 271
+FARJMP 272
+FAULTING_OP 273
+FBLDm 274
+FBSTPm 275
+FCOM 276
+FCOMP 277
+FCOMPP 278
+FCOS 279
+FDECSTP 280
+FEMMS 281
+FENTRY_CALL 282
+FFREE 283
+FFREEP 284
+FICOM 285
+FICOMP 286
+FINCSTP 287
+FLDCW 288
+FLDENVm 289
+FLDL 290
+FLDLG 291
+FLDLN 292
+FLDPI 293
+FNCLEX 294
+FNINIT 295
+FNOP 296
+FNSTCW 297
+FNSTSW 298
+FNSTSWm 299
+FP 300
+FPATAN 301
+FPREM 302
+FPTAN 303
+FRNDINT 304
+FRSTORm 305
+FSAVEm 306
+FSCALE 307
+FSIN 308
+FSINCOS 309
+FSTENVm 310
+FS_PREFIX 311
+FXRSTOR 312
+FXSAVE 313
+FXTRACT 314
+FYL 315
+FsFLD 316
+GC_LABEL 317
+GETSEC 318
+GF 319
+GS_PREFIX 320
+G_ABDS 321
+G_ABDU 322
+G_ABS 323
+G_ADD 324
+G_ADDRSPACE_CAST 325
+G_AND 326
+G_ANYEXT 327
+G_ASHR 328
+G_ASSERT_ALIGN 329
+G_ASSERT_SEXT 330
+G_ASSERT_ZEXT 331
+G_ATOMICRMW_ADD 332
+G_ATOMICRMW_AND 333
+G_ATOMICRMW_FADD 334
+G_ATOMICRMW_FMAX 335
+G_ATOMICRMW_FMAXIMUM 336
+G_ATOMICRMW_FMIN 337
+G_ATOMICRMW_FMINIMUM 338
+G_ATOMICRMW_FSUB 339
+G_ATOMICRMW_MAX 340
+G_ATOMICRMW_MIN 341
+G_ATOMICRMW_NAND 342
+G_ATOMICRMW_OR 343
+G_ATOMICRMW_SUB 344
+G_ATOMICRMW_UDEC_WRAP 345
+G_ATOMICRMW_UINC_WRAP 346
+G_ATOMICRMW_UMAX 347
+G_ATOMICRMW_UMIN 348
+G_ATOMICRMW_USUB_COND 349
+G_ATOMICRMW_USUB_SAT 350
+G_ATOMICRMW_XCHG 351
+G_ATOMICRMW_XOR 352
+G_ATOMIC_CMPXCHG 353
+G_ATOMIC_CMPXCHG_WITH_SUCCESS 354
+G_BITCAST 355
+G_BITREVERSE 356
+G_BLOCK_ADDR 357
+G_BR 358
+G_BRCOND 359
+G_BRINDIRECT 360
+G_BRJT 361
+G_BSWAP 362
+G_BUILD_VECTOR 363
+G_BUILD_VECTOR_TRUNC 364
+G_BZERO 365
+G_CONCAT_VECTORS 366
+G_CONSTANT 367
+G_CONSTANT_FOLD_BARRIER 368
+G_CONSTANT_POOL 369
+G_CTLZ 370
+G_CTLZ_ZERO_UNDEF 371
+G_CTPOP 372
+G_CTTZ 373
+G_CTTZ_ZERO_UNDEF 374
+G_DEBUGTRAP 375
+G_DYN_STACKALLOC 376
+G_EXTRACT 377
+G_EXTRACT_SUBVECTOR 378
+G_EXTRACT_VECTOR_ELT 379
+G_FABS 380
+G_FACOS 381
+G_FADD 382
+G_FASIN 383
+G_FATAN 384
+G_FCANONICALIZE 385
+G_FCEIL 386
+G_FCMP 387
+G_FCONSTANT 388
+G_FCOPYSIGN 389
+G_FCOS 390
+G_FCOSH 391
+G_FDIV 392
+G_FENCE 393
+G_FEXP 394
+G_FFLOOR 395
+G_FFREXP 396
+G_FILD 397
+G_FIST 398
+G_FLDCW 399
+G_FLDEXP 400
+G_FLOG 401
+G_FMA 402
+G_FMAD 403
+G_FMAXIMUM 404
+G_FMAXIMUMNUM 405
+G_FMAXNUM 406
+G_FMAXNUM_IEEE 407
+G_FMINIMUM 408
+G_FMINIMUMNUM 409
+G_FMINNUM 410
+G_FMINNUM_IEEE 411
+G_FMODF 412
+G_FMUL 413
+G_FNEARBYINT 414
+G_FNEG 415
+G_FNSTCW 416
+G_FPEXT 417
+G_FPOW 418
+G_FPOWI 419
+G_FPTOSI 420
+G_FPTOSI_SAT 421
+G_FPTOUI 422
+G_FPTOUI_SAT 423
+G_FPTRUNC 424
+G_FRAME_INDEX 425
+G_FREEZE 426
+G_FREM 427
+G_FRINT 428
+G_FSHL 429
+G_FSHR 430
+G_FSIN 431
+G_FSINCOS 432
+G_FSINH 433
+G_FSQRT 434
+G_FSUB 435
+G_FTAN 436
+G_FTANH 437
+G_GET_FPENV 438
+G_GET_FPMODE 439
+G_GET_ROUNDING 440
+G_GLOBAL_VALUE 441
+G_ICMP 442
+G_IMPLICIT_DEF 443
+G_INDEXED_LOAD 444
+G_INDEXED_SEXTLOAD 445
+G_INDEXED_STORE 446
+G_INDEXED_ZEXTLOAD 447
+G_INSERT 448
+G_INSERT_SUBVECTOR 449
+G_INSERT_VECTOR_ELT 450
+G_INTRINSIC 451
+G_INTRINSIC_CONVERGENT 452
+G_INTRINSIC_CONVERGENT_W_SIDE_EFFECTS 453
+G_INTRINSIC_FPTRUNC_ROUND 454
+G_INTRINSIC_LLRINT 455
+G_INTRINSIC_LRINT 456
+G_INTRINSIC_ROUND 457
+G_INTRINSIC_ROUNDEVEN 458
+G_INTRINSIC_TRUNC 459
+G_INTRINSIC_W_SIDE_EFFECTS 460
+G_INTTOPTR 461
+G_INVOKE_REGION_START 462
+G_IS_FPCLASS 463
+G_JUMP_TABLE 464
+G_LLROUND 465
+G_LOAD 466
+G_LROUND 467
+G_LSHR 468
+G_MEMCPY 469
+G_MEMCPY_INLINE 470
+G_MEMMOVE 471
+G_MEMSET 472
+G_MERGE_VALUES 473
+G_MUL 474
+G_OR 475
+G_PHI 476
+G_PREFETCH 477
+G_PTRAUTH_GLOBAL_VALUE 478
+G_PTRMASK 479
+G_PTRTOINT 480
+G_PTR_ADD 481
+G_READCYCLECOUNTER 482
+G_READSTEADYCOUNTER 483
+G_READ_REGISTER 484
+G_RESET_FPENV 485
+G_RESET_FPMODE 486
+G_ROTL 487
+G_ROTR 488
+G_SADDE 489
+G_SADDO 490
+G_SADDSAT 491
+G_SBFX 492
+G_SCMP 493
+G_SDIV 494
+G_SDIVFIX 495
+G_SDIVFIXSAT 496
+G_SDIVREM 497
+G_SELECT 498
+G_SET_FPENV 499
+G_SET_FPMODE 500
+G_SET_ROUNDING 501
+G_SEXT 502
+G_SEXTLOAD 503
+G_SEXT_INREG 504
+G_SHL 505
+G_SHUFFLE_VECTOR 506
+G_SITOFP 507
+G_SMAX 508
+G_SMIN 509
+G_SMULFIX 510
+G_SMULFIXSAT 511
+G_SMULH 512
+G_SMULO 513
+G_SPLAT_VECTOR 514
+G_SREM 515
+G_SSHLSAT 516
+G_SSUBE 517
+G_SSUBO 518
+G_SSUBSAT 519
+G_STACKRESTORE 520
+G_STACKSAVE 521
+G_STEP_VECTOR 522
+G_STORE 523
+G_STRICT_FADD 524
+G_STRICT_FDIV 525
+G_STRICT_FLDEXP 526
+G_STRICT_FMA 527
+G_STRICT_FMUL 528
+G_STRICT_FREM 529
+G_STRICT_FSQRT 530
+G_STRICT_FSUB 531
+G_SUB 532
+G_TRAP 533
+G_TRUNC 534
+G_TRUNC_SSAT_S 535
+G_TRUNC_SSAT_U 536
+G_TRUNC_USAT_U 537
+G_UADDE 538
+G_UADDO 539
+G_UADDSAT 540
+G_UBFX 541
+G_UBSANTRAP 542
+G_UCMP 543
+G_UDIV 544
+G_UDIVFIX 545
+G_UDIVFIXSAT 546
+G_UDIVREM 547
+G_UITOFP 548
+G_UMAX 549
+G_UMIN 550
+G_UMULFIX 551
+G_UMULFIXSAT 552
+G_UMULH 553
+G_UMULO 554
+G_UNMERGE_VALUES 555
+G_UREM 556
+G_USHLSAT 557
+G_USUBE 558
+G_USUBO 559
+G_USUBSAT 560
+G_VAARG 561
+G_VASTART 562
+G_VECREDUCE_ADD 563
+G_VECREDUCE_AND 564
+G_VECREDUCE_FADD 565
+G_VECREDUCE_FMAX 566
+G_VECREDUCE_FMAXIMUM 567
+G_VECREDUCE_FMIN 568
+G_VECREDUCE_FMINIMUM 569
+G_VECREDUCE_FMUL 570
+G_VECREDUCE_MUL 571
+G_VECREDUCE_OR 572
+G_VECREDUCE_SEQ_FADD 573
+G_VECREDUCE_SEQ_FMUL 574
+G_VECREDUCE_SMAX 575
+G_VECREDUCE_SMIN 576
+G_VECREDUCE_UMAX 577
+G_VECREDUCE_UMIN 578
+G_VECREDUCE_XOR 579
+G_VECTOR_COMPRESS 580
+G_VSCALE 581
+G_WRITE_REGISTER 582
+G_XOR 583
+G_ZEXT 584
+G_ZEXTLOAD 585
+HADDPDrm 586
+HADDPDrr 587
+HADDPSrm 588
+HADDPSrr 589
+HLT 590
+HRESET 591
+HSUBPDrm 592
+HSUBPDrr 593
+HSUBPSrm 594
+HSUBPSrr 595
+ICALL_BRANCH_FUNNEL 596
+IDIV 597
+ILD_F 598
+ILD_Fp 599
+IMPLICIT_DEF 600
+IMUL 601
+IMULZU 602
+IN 603
+INC 604
+INCSSPD 605
+INCSSPQ 606
+INDIRECT_THUNK_CALL 607
+INDIRECT_THUNK_TCRETURN 608
+INIT_UNDEF 609
+INLINEASM 610
+INLINEASM_BR 611
+INSB 612
+INSERTPSrmi 613
+INSERTPSrri 614
+INSERTQ 615
+INSERTQI 616
+INSERT_SUBREG 617
+INSL 618
+INSW 619
+INT 620
+INTO 621
+INVD 622
+INVEPT 623
+INVLPG 624
+INVLPGA 625
+INVLPGB 626
+INVPCID 627
+INVVPID 628
+IRET 629
+ISTT_FP 630
+ISTT_Fp 631
+IST_F 632
+IST_FP 633
+IST_Fp 634
+Int_eh_sjlj_setup_dispatch 635
+JCC 636
+JCXZ 637
+JEC...
[truncated]
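The entity file shown above follows the documented layout: a count on the first line, then one `name<TAB>id` mapping per line. A minimal Python reader might look like the sketch below; `parse_entities` and its `check_count` parameter are hypothetical names, and splitting on any trailing whitespace (rather than strictly on a tab) is a choice made here so that copies in which tabs were converted to spaces still parse.

```python
from typing import Dict


def parse_entities(text: str, check_count: bool = True) -> Dict[str, int]:
    """Parse entity output: a total count, then one 'name id' mapping per line."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty entity file")
    declared = int(lines[0])
    mapping: Dict[str, int] = {}
    for ln in lines[1:]:
        # Split on the last run of whitespace so the ID is always the final field.
        name, entity_id = ln.rsplit(None, 1)
        mapping[name] = int(entity_id)
    if check_count and len(mapping) != declared:
        raise ValueError(f"header declares {declared} entities, found {len(mapping)}")
    return mapping
```

For a truncated listing such as the one in this diff, the count check would fail by design, so `check_count=False` is the appropriate setting when working from an excerpt.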
Adds a `--mode=mir` option to specify MIR mode (vs. LLVM IR mode).

(Partially addresses #162200; Tracking issue - #141817)