Skip to content

Conversation

svkeerthy
Copy link
Contributor

@svkeerthy svkeerthy commented Oct 2, 2025

Refactor MIRVocabulary to improve opcode lookup and add Section enum for better organization. This is useful for embedder lookups (next patches)

(Tracking issue - #141817)

@svkeerthy svkeerthy changed the title MIRVocabulary changes [IR2Vec] Refactor MIR vocabulary to use opcode-based indexing Oct 2, 2025
@svkeerthy svkeerthy marked this pull request as ready for review October 2, 2025 18:17
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-02-mirvocabulary_changes branch from abf8984 to c37a901 Compare October 6, 2025 21:16
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-01-introducing_mir2vec branch from 39467a8 to 69a74b6 Compare October 6, 2025 21:16
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-01-introducing_mir2vec branch from 69a74b6 to 747fe0f Compare October 6, 2025 21:17
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-02-mirvocabulary_changes branch from c37a901 to 0ebc739 Compare October 6, 2025 21:17
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-01-introducing_mir2vec branch from 747fe0f to 049be7b Compare October 6, 2025 22:38
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-02-mirvocabulary_changes branch 2 times, most recently from 5778be1 to ffc59e7 Compare October 6, 2025 22:39
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-01-introducing_mir2vec branch from 049be7b to 3f44240 Compare October 6, 2025 22:39
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-02-mirvocabulary_changes branch from ffc59e7 to 8a80321 Compare October 6, 2025 23:29
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-01-introducing_mir2vec branch from 3f44240 to e0d0999 Compare October 6, 2025 23:35
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-02-mirvocabulary_changes branch from 8a80321 to 4864150 Compare October 6, 2025 23:35
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-01-introducing_mir2vec branch from e0d0999 to 9565a4d Compare October 7, 2025 20:01
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-02-mirvocabulary_changes branch from 4864150 to 671c686 Compare October 7, 2025 20:01
Base automatically changed from users/svkeerthy/10-01-introducing_mir2vec to main October 7, 2025 20:45
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-02-mirvocabulary_changes branch 3 times, most recently from 8e371bd to 4d87a59 Compare October 7, 2025 22:29
@svkeerthy svkeerthy force-pushed the users/svkeerthy/10-02-mirvocabulary_changes branch from 4d87a59 to 0afa19d Compare October 7, 2025 23:07
@svkeerthy svkeerthy changed the title [IR2Vec] Refactor MIR vocabulary to use opcode-based indexing [MIR2Vec] Refactor MIR vocabulary to use opcode-based indexing Oct 7, 2025
Copy link
Contributor Author

svkeerthy commented Oct 7, 2025

Merge activity

  • Oct 7, 11:43 PM UTC: A user started a stack merge that includes this pull request via Graphite.
  • Oct 7, 11:44 PM UTC: @svkeerthy merged this pull request with Graphite.

@svkeerthy svkeerthy merged commit 566040e into main Oct 7, 2025
9 checks passed
@svkeerthy svkeerthy deleted the users/svkeerthy/10-02-mirvocabulary_changes branch October 7, 2025 23:44
@jyknight
Copy link
Member

jyknight commented Oct 8, 2025

This introduces UB. In the error case, the nullptr in:

frame #1: 0x000055555ad9f2ee llc`::getMIR2VecVocabulary() at MIR2Vec.cpp:244:14
   241    if (StrVocabMap.empty()) {
   242      if (Error Err = readVocabulary()) {
   243        emitError(std::move(Err), M.getContext());
-> 244        return mir2vec::MIRVocabulary(std::move(StrVocabMap), nullptr);
   245      }
   246    }
   247 

is passed to the constructor which immediately dereferences it and stores it into a reference (invalid for nullptr):

   frame #0: 0x000055555ad9e01d llc`::MIRVocabulary() at MIR2Vec.cpp:0:1
   50  
   51   MIRVocabulary::MIRVocabulary(VocabMap &&OpcodeEntries,
   52                                const TargetInstrInfo *TII)
-> 53       : TII(*TII) {
   54     // Fixme: Use static factory methods for creating vocabularies instead of
   55     // public constructors
   56     // Early return for invalid inputs - creates empty/invalid vocabulary

Either the TII member needs to be a pointer, or else the constructor can't be called with nullptr.

@svkeerthy
Copy link
Contributor Author

Thanks @jyknight. I will put up a patch shortly.

svkeerthy added a commit that referenced this pull request Oct 9, 2025
Added factory methods for vocabulary creation. This also would fix UB
issue introduced by #161713
svkeerthy added a commit that referenced this pull request Oct 9, 2025
Refactor MIRVocabulary to improve opcode lookup and add Section enum for better organization. This is useful for embedder lookups (next patches)

(Tracking issue - #141817)
svkeerthy added a commit that referenced this pull request Oct 9, 2025
Added factory methods for vocabulary creation. This also would fix UB
issue introduced by #161713
clingfei pushed a commit to clingfei/llvm-project that referenced this pull request Oct 10, 2025
Added factory methods for vocabulary creation. This also would fix UB
issue introduced by llvm#161713
DharuniRAcharya pushed a commit to DharuniRAcharya/llvm-project that referenced this pull request Oct 13, 2025
Added factory methods for vocabulary creation. This also would fix UB
issue introduced by llvm#161713
akadutta pushed a commit to akadutta/llvm-project that referenced this pull request Oct 14, 2025
Added factory methods for vocabulary creation. This also would fix UB
issue introduced by llvm#161713
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants