Skip to content

Conversation

r-belenov
Copy link
Contributor

Currently MCA takes instruction properties from scheduling model. However, some instructions may execute differently depending on external factors - for example, latency of memory instructions may vary differently depending on whether the load comes from L1 cache, L2 or DRAM. While MCA as a static analysis tool cannot model such differences (and currently takes some static decision, e.g. all memory ops are treated as L1 accesses), it makes sense to allow manual modification of instruction properties to model different behavior (e.g. sensitivity of code performance to cache misses in particular load instruction). This patch addresses this need.

The library modification is intentionally generic - arbitrary modifications to InstrDesc are allowed. The tool support is currently limited to changing instruction latencies (single number applies to all output arguments and MaxLatency) via coments in the input assembler code; the format is the like this:

add (%eax), eax // LLVM-MCA-LATENCY:100

Users of MCA library can already make additional customizations; command line tool can be extended in the future.

Note that InstructionView currently shows per-instruction information according to scheduling model and is not affected by this change.

See #133429 for additional clarifications (including explanation why existing customization mechanisms do not provide required functionality)

…to change latency via special comments

* Added customizations to header

* Added customization to implementation

* Use default small vector size for custom instructions

* Typo fixed

* Use std::vector for custom descriptors

* Revert to SmallVector

* Erroneous dereference removed

* Introduced annotated instructions

* Updated getter method

* Updated getter usage

* Fixed type usage

* Revert to MCInst for regions

* Keep annotations in separate map

* Removed usage of annotated instruction

* Use std::optional

* Add setter/getter for explicit latency

* Set instruction latency according to annotation

* Simplify default return

* Revert to explicit default value

* Add missing brace

* Added missing field name

* Typo fixed

* Added missing namespace tags

* Whitespace removed

* Index annotations by smloc

* Search latency by SMLoc

* Use pointer as a map key

* Missing const added

* Added optional annotation to MCStreamerWrapper

* Added interface to annotate instructions

* Move annotations to CodeRegons

* Get explicit latency from Regions

* Erroneous dereference removed

* Add interface to pass latency to MCStreamerWrapper

* Fixed incorrect code placement

* Make Streamer available to CommentConsumer

* Move MCStreamerWrapper definition to make it visible in comment consumers

* Parse latency comment

* Fix drop_front usage

* Fix whitespace skipping on latency annotation

* Added test for latency annotation

* Additional test for latency annotations

* Use load in test instruction

* Use load in test instruction

* Added instruction builder customization

* Added customized instruction creation

* Pass builder to custom creation functor

* Added header with InstrBuilder declaration

* Typo fixed

* Added base for instruction customizations test

* Added missing result object

* Hardcode name for summary view

* Explicitly model single instruction

* Add actual custom latency test

* Typo fixed

* Pass latency to lambda body

* Adding missing const

* Specify lambda return type

* Erroneous argument removed

* Removed unnecessary lambda return type

* Code cleanup

* Code cleanup
@llvmbot
Copy link
Member

llvmbot commented Aug 26, 2025

@llvm/pr-subscribers-tools-llvm-mca

@llvm/pr-subscribers-backend-x86

Author: Roman Belenov (r-belenov)

Changes

Currently MCA takes instruction properties from scheduling model. However, some instructions may execute differently depending on external factors - for example, latency of memory instructions may vary differently depending on whether the load comes from L1 cache, L2 or DRAM. While MCA as a static analysis tool cannot model such differences (and currently takes some static decision, e.g. all memory ops are treated as L1 accesses), it makes sense to allow manual modification of instruction properties to model different behavior (e.g. sensitivity of code performance to cache misses in particular load instruction). This patch addresses this need.

The library modification is intentionally generic - arbitrary modifications to InstrDesc are allowed. The tool support is currently limited to changing instruction latencies (single number applies to all output arguments and MaxLatency) via coments in the input assembler code; the format is the like this:

add (%eax), eax // LLVM-MCA-LATENCY:100

Users of MCA library can already make additional customizations; command line tool can be extended in the future.

Note that InstructionView currently shows per-instruction information according to scheduling model and is not affected by this change.

See #133429 for additional clarifications (including explanation why existing customization mechanisms do not provide required functionality)


Full diff: https://github.com/llvm/llvm-project/pull/155420.diff

11 Files Affected:

  • (modified) llvm/include/llvm/MCA/InstrBuilder.h (+9-2)
  • (modified) llvm/lib/MCA/InstrBuilder.cpp (+12-3)
  • (added) llvm/test/tools/llvm-mca/X86/llvm-mca-markers-13.s (+10)
  • (added) llvm/test/tools/llvm-mca/X86/llvm-mca-markers-14.s (+10)
  • (modified) llvm/tools/llvm-mca/CodeRegion.h (+18)
  • (modified) llvm/tools/llvm-mca/CodeRegionGenerator.cpp (+12)
  • (modified) llvm/tools/llvm-mca/CodeRegionGenerator.h (+48-37)
  • (modified) llvm/tools/llvm-mca/llvm-mca.cpp (+5-1)
  • (modified) llvm/unittests/tools/llvm-mca/MCATestBase.cpp (+3-2)
  • (modified) llvm/unittests/tools/llvm-mca/MCATestBase.h (+6-1)
  • (modified) llvm/unittests/tools/llvm-mca/X86/TestIncrementalMCA.cpp (+26)
diff --git a/llvm/include/llvm/MCA/InstrBuilder.h b/llvm/include/llvm/MCA/InstrBuilder.h
index e0949d975fa99..9bb8c5b28fdad 100644
--- a/llvm/include/llvm/MCA/InstrBuilder.h
+++ b/llvm/include/llvm/MCA/InstrBuilder.h
@@ -25,6 +25,7 @@
 #include "llvm/MCA/Support.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/Error.h"
+#include <functional>
 
 namespace llvm {
 namespace mca {
@@ -78,6 +79,10 @@ class InstrBuilder {
   DenseMap<std::pair<hash_code, unsigned>, std::unique_ptr<const InstrDesc>>
       VariantDescriptors;
 
+  // These descriptors are customized for particular instructions and cannot
+  // be reused
+  SmallVector<std::unique_ptr<const InstrDesc>> CustomDescriptors;
+
   bool FirstCallInst;
   bool FirstReturnInst;
   unsigned CallLatency;
@@ -87,7 +92,8 @@ class InstrBuilder {
 
   Expected<unsigned> getVariantSchedClassID(const MCInst &MCI, unsigned SchedClassID);
   Expected<const InstrDesc &>
-  createInstrDescImpl(const MCInst &MCI, const SmallVector<Instrument *> &IVec);
+  createInstrDescImpl(const MCInst &MCI, const SmallVector<Instrument *> &IVec,
+                      std::function<void(InstrDesc&)> Customizer = {});
   Expected<const InstrDesc &>
   getOrCreateInstrDesc(const MCInst &MCI,
                        const SmallVector<Instrument *> &IVec);
@@ -116,7 +122,8 @@ class InstrBuilder {
   void setInstRecycleCallback(InstRecycleCallback CB) { InstRecycleCB = CB; }
 
   LLVM_ABI Expected<std::unique_ptr<Instruction>>
-  createInstruction(const MCInst &MCI, const SmallVector<Instrument *> &IVec);
+  createInstruction(const MCInst &MCI, const SmallVector<Instrument *> &IVec,
+                    std::function<void(InstrDesc&)> Customizer = {});
 };
 } // namespace mca
 } // namespace llvm
diff --git a/llvm/lib/MCA/InstrBuilder.cpp b/llvm/lib/MCA/InstrBuilder.cpp
index cad25a6ddd3f5..4a0f09c4160d9 100644
--- a/llvm/lib/MCA/InstrBuilder.cpp
+++ b/llvm/lib/MCA/InstrBuilder.cpp
@@ -558,7 +558,8 @@ Expected<unsigned> InstrBuilder::getVariantSchedClassID(const MCInst &MCI,
 
 Expected<const InstrDesc &>
 InstrBuilder::createInstrDescImpl(const MCInst &MCI,
-                                  const SmallVector<Instrument *> &IVec) {
+                                  const SmallVector<Instrument *> &IVec,
+                                  std::function<void(InstrDesc&)> Customizer) {
   assert(STI.getSchedModel().hasInstrSchedModel() &&
          "Itineraries are not yet supported!");
 
@@ -632,6 +633,12 @@ InstrBuilder::createInstrDescImpl(const MCInst &MCI,
     return std::move(Err);
 
   // Now add the new descriptor.
+
+  if (Customizer) {
+    Customizer(*ID);
+    return *CustomDescriptors.emplace_back(std::move(ID));
+  }
+  
   bool IsVariadic = MCDesc.isVariadic();
   if ((ID->IsRecyclable = !IsVariadic && !IsVariant)) {
     auto DKey = std::make_pair(MCI.getOpcode(), SchedClassID);
@@ -675,8 +682,10 @@ STATISTIC(NumVariantInst, "Number of MCInsts that doesn't have static Desc");
 
 Expected<std::unique_ptr<Instruction>>
 InstrBuilder::createInstruction(const MCInst &MCI,
-                                const SmallVector<Instrument *> &IVec) {
-  Expected<const InstrDesc &> DescOrErr = getOrCreateInstrDesc(MCI, IVec);
+                                const SmallVector<Instrument *> &IVec,
+                                std::function<void(InstrDesc&)> Customizer) {
+  Expected<const InstrDesc &> DescOrErr = Customizer? createInstrDescImpl(MCI, IVec, Customizer) :
+      getOrCreateInstrDesc(MCI, IVec);
   if (!DescOrErr)
     return DescOrErr.takeError();
   const InstrDesc &D = *DescOrErr;
diff --git a/llvm/test/tools/llvm-mca/X86/llvm-mca-markers-13.s b/llvm/test/tools/llvm-mca/X86/llvm-mca-markers-13.s
new file mode 100644
index 0000000000000..9115ffadbdaee
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/X86/llvm-mca-markers-13.s
@@ -0,0 +1,10 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=10 %s 2>&1 | FileCheck %s
+
+add (%eax), %eax // LLVM-MCA-LATENCY : 100
+mov %eax, (%ebx)
+
+# CHECK:      Iterations:        10
+# CHECK-NEXT: Instructions:      20
+# CHECK-NEXT: Total Cycles:      1004
+# CHECK-NEXT: Total uOps:        20
diff --git a/llvm/test/tools/llvm-mca/X86/llvm-mca-markers-14.s b/llvm/test/tools/llvm-mca/X86/llvm-mca-markers-14.s
new file mode 100644
index 0000000000000..41da9f4b4cef1
--- /dev/null
+++ b/llvm/test/tools/llvm-mca/X86/llvm-mca-markers-14.s
@@ -0,0 +1,10 @@
+# NOTE: Assertions have been autogenerated by utils/update_mca_test_checks.py
+# RUN: llvm-mca -mtriple=x86_64-unknown-unknown -mcpu=btver2 -iterations=10 %s 2>&1 | FileCheck %s
+
+add (%eax), %eax // LLVM-MCA-LATENCY : 100
+mov %eax, (%ebx) // LLVM-MCA-LATENCY : 100
+
+# CHECK:      Iterations:        10
+# CHECK-NEXT: Instructions:      20
+# CHECK-NEXT: Total Cycles:      1103
+# CHECK-NEXT: Total uOps:        20
diff --git a/llvm/tools/llvm-mca/CodeRegion.h b/llvm/tools/llvm-mca/CodeRegion.h
index c04ea51169cfb..d16f976fe101e 100644
--- a/llvm/tools/llvm-mca/CodeRegion.h
+++ b/llvm/tools/llvm-mca/CodeRegion.h
@@ -69,10 +69,15 @@
 #include "llvm/Support/SMLoc.h"
 #include "llvm/Support/SourceMgr.h"
 #include <vector>
+#include <utility>
 
 namespace llvm {
 namespace mca {
 
+struct InstAnnotation {
+  std::optional<unsigned> Latency;
+};
+
 /// A region of assembly code.
 ///
 /// It identifies a sequence of machine instructions.
@@ -155,6 +160,9 @@ class CodeRegions {
   llvm::StringMap<unsigned> ActiveRegions;
   bool FoundErrors;
 
+  // Annotations specified in comments, indexed by SMLoc value
+  llvm::DenseMap<const char*, InstAnnotation> Annotations;
+
 public:
   CodeRegions(llvm::SourceMgr &S) : SM(S), FoundErrors(false) {}
   virtual ~CodeRegions() = default;
@@ -170,6 +178,16 @@ class CodeRegions {
   void addInstruction(const llvm::MCInst &Instruction);
   llvm::SourceMgr &getSourceMgr() const { return SM; }
 
+  void Annotate(llvm::SMLoc Loc, const InstAnnotation& A) { Annotations[Loc.getPointer()] = A; }
+  std::optional<unsigned> getExplicitLatency(llvm::SMLoc Loc) const {
+    const auto It = Annotations.find(Loc.getPointer());
+    if (It != Annotations.end()) {
+      return It->second.Latency;
+    } else {
+      return {};
+    }
+  }
+
   llvm::ArrayRef<llvm::MCInst> getInstructionSequence(unsigned Idx) const {
     return Regions[Idx]->getInstructions();
   }
diff --git a/llvm/tools/llvm-mca/CodeRegionGenerator.cpp b/llvm/tools/llvm-mca/CodeRegionGenerator.cpp
index f7f929e49ead9..767f92bf6f935 100644
--- a/llvm/tools/llvm-mca/CodeRegionGenerator.cpp
+++ b/llvm/tools/llvm-mca/CodeRegionGenerator.cpp
@@ -95,6 +95,18 @@ void AnalysisRegionCommentConsumer::HandleComment(SMLoc Loc,
     return;
 
   Comment = Comment.drop_front(Position);
+  if (Comment.starts_with("LLVM-MCA-LATENCY")) {
+      auto Parts = Comment.split(':');
+      Position = Parts.second.find_first_not_of(" \t");
+      if (Position >= Parts.second.size())
+        return;
+      auto LatStr = Parts.second.drop_front(Position);
+      unsigned Latency = 0;
+      if (!LatStr.getAsInteger(10, Latency))
+        Streamer.AddLatencyAnnotation(Latency);
+      return;
+  }
+
   if (Comment.consume_front("LLVM-MCA-END")) {
     // Skip spaces and tabs.
     Position = Comment.find_first_not_of(" \t");
diff --git a/llvm/tools/llvm-mca/CodeRegionGenerator.h b/llvm/tools/llvm-mca/CodeRegionGenerator.h
index a48c67a22f27b..2c632be5e7f53 100644
--- a/llvm/tools/llvm-mca/CodeRegionGenerator.h
+++ b/llvm/tools/llvm-mca/CodeRegionGenerator.h
@@ -31,6 +31,50 @@
 namespace llvm {
 namespace mca {
 
+// This class provides the callbacks that occur when parsing input assembly.
+class MCStreamerWrapper : public MCStreamer {
+protected:
+  CodeRegions &Regions;
+  std::optional<InstAnnotation> CurrentAnnotation;
+
+public:
+  MCStreamerWrapper(MCContext &Context, mca::CodeRegions &R)
+      : MCStreamer(Context), Regions(R) {}
+
+  // We only want to intercept the emission of new instructions.
+  void emitInstruction(const MCInst &Inst,
+                       const MCSubtargetInfo & /* unused */) override {
+    Regions.addInstruction(Inst);
+    if (CurrentAnnotation) {
+      Regions.Annotate(Inst.getLoc(), *CurrentAnnotation);
+      CurrentAnnotation = {};
+    }
+  }
+
+  void AddLatencyAnnotation(unsigned Lat) {
+    if (!CurrentAnnotation) CurrentAnnotation = InstAnnotation();
+    CurrentAnnotation->Latency = Lat;
+  }
+
+  bool emitSymbolAttribute(MCSymbol *Symbol, MCSymbolAttr Attribute) override {
+    return true;
+  }
+
+  void emitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
+                        Align ByteAlignment) override {}
+  void emitZerofill(MCSection *Section, MCSymbol *Symbol = nullptr,
+                    uint64_t Size = 0, Align ByteAlignment = Align(1),
+                    SMLoc Loc = SMLoc()) override {}
+  void beginCOFFSymbolDef(const MCSymbol *Symbol) override {}
+  void emitCOFFSymbolStorageClass(int StorageClass) override {}
+  void emitCOFFSymbolType(int Type) override {}
+  void endCOFFSymbolDef() override {}
+
+  ArrayRef<MCInst> GetInstructionSequence(unsigned Index) const {
+    return Regions.getInstructionSequence(Index);
+  }
+};
+
 class MCACommentConsumer : public AsmCommentConsumer {
 protected:
   bool FoundError = false;
@@ -44,9 +88,10 @@ class MCACommentConsumer : public AsmCommentConsumer {
 /// A comment consumer that parses strings.  The only valid tokens are strings.
 class AnalysisRegionCommentConsumer : public MCACommentConsumer {
   AnalysisRegions &Regions;
+  MCStreamerWrapper &Streamer;
 
 public:
-  AnalysisRegionCommentConsumer(AnalysisRegions &R) : Regions(R) {}
+  AnalysisRegionCommentConsumer(AnalysisRegions &R, MCStreamerWrapper &S) : Regions(R), Streamer(S) {}
 
   /// Parses a comment. It begins a new region if it is of the form
   /// LLVM-MCA-BEGIN. It ends a region if it is of the form LLVM-MCA-END.
@@ -82,40 +127,6 @@ class InstrumentRegionCommentConsumer : public MCACommentConsumer {
   InstrumentManager &getInstrumentManager() { return IM; }
 };
 
-// This class provides the callbacks that occur when parsing input assembly.
-class MCStreamerWrapper : public MCStreamer {
-protected:
-  CodeRegions &Regions;
-
-public:
-  MCStreamerWrapper(MCContext &Context, mca::CodeRegions &R)
-      : MCStreamer(Context), Regions(R) {}
-
-  // We only want to intercept the emission of new instructions.
-  void emitInstruction(const MCInst &Inst,
-                       const MCSubtargetInfo & /* unused */) override {
-    Regions.addInstruction(Inst);
-  }
-
-  bool emitSymbolAttribute(MCSymbol *Symbol, MCSymbolAttr Attribute) override {
-    return true;
-  }
-
-  void emitCommonSymbol(MCSymbol *Symbol, uint64_t Size,
-                        Align ByteAlignment) override {}
-  void emitZerofill(MCSection *Section, MCSymbol *Symbol = nullptr,
-                    uint64_t Size = 0, Align ByteAlignment = Align(1),
-                    SMLoc Loc = SMLoc()) override {}
-  void beginCOFFSymbolDef(const MCSymbol *Symbol) override {}
-  void emitCOFFSymbolStorageClass(int StorageClass) override {}
-  void emitCOFFSymbolType(int Type) override {}
-  void endCOFFSymbolDef() override {}
-
-  ArrayRef<MCInst> GetInstructionSequence(unsigned Index) const {
-    return Regions.getInstructionSequence(Index);
-  }
-};
-
 class InstrumentMCStreamer : public MCStreamerWrapper {
   InstrumentManager &IM;
 
@@ -210,15 +221,15 @@ class AsmCodeRegionGenerator : public virtual CodeRegionGenerator {
 
 class AsmAnalysisRegionGenerator final : public AnalysisRegionGenerator,
                                          public AsmCodeRegionGenerator {
-  AnalysisRegionCommentConsumer CC;
   MCStreamerWrapper Streamer;
+  AnalysisRegionCommentConsumer CC;
 
 public:
   AsmAnalysisRegionGenerator(const Target &T, llvm::SourceMgr &SM, MCContext &C,
                              const MCAsmInfo &A, const MCSubtargetInfo &S,
                              const MCInstrInfo &I)
       : AnalysisRegionGenerator(SM), AsmCodeRegionGenerator(T, C, A, S, I),
-        CC(Regions), Streamer(Ctx, Regions) {}
+        Streamer(Ctx, Regions), CC(Regions, Streamer) {}
 
   MCACommentConsumer *getCommentConsumer() override { return &CC; };
   CodeRegions &getRegions() override { return Regions; };
diff --git a/llvm/tools/llvm-mca/llvm-mca.cpp b/llvm/tools/llvm-mca/llvm-mca.cpp
index 4b0611186d21f..5f653ffcd83ed 100644
--- a/llvm/tools/llvm-mca/llvm-mca.cpp
+++ b/llvm/tools/llvm-mca/llvm-mca.cpp
@@ -633,7 +633,11 @@ int main(int argc, char **argv) {
       const SmallVector<mca::Instrument *> Instruments =
           InstrumentRegions.getActiveInstruments(Loc);
 
-      Expected<std::unique_ptr<mca::Instruction>> Inst =
+      auto Latency = Regions.getExplicitLatency(Loc);
+      Expected<std::unique_ptr<mca::Instruction>> Inst = Latency ?
+          IB.createInstruction(MCI, Instruments, [=](llvm::mca::InstrDesc& ID) {
+            for (auto& W : ID.Writes) W.Latency = *Latency;
+            ID.MaxLatency = *Latency; }) :
           IB.createInstruction(MCI, Instruments);
       if (!Inst) {
         if (auto NewE = handleErrors(
diff --git a/llvm/unittests/tools/llvm-mca/MCATestBase.cpp b/llvm/unittests/tools/llvm-mca/MCATestBase.cpp
index cd3258ef96d64..96413168b8f5e 100644
--- a/llvm/unittests/tools/llvm-mca/MCATestBase.cpp
+++ b/llvm/unittests/tools/llvm-mca/MCATestBase.cpp
@@ -61,7 +61,8 @@ void MCATestBase::SetUp() {
 
 Error MCATestBase::runBaselineMCA(json::Object &Result, ArrayRef<MCInst> Insts,
                                   ArrayRef<mca::View *> Views,
-                                  const mca::PipelineOptions *PO) {
+                                  const mca::PipelineOptions *PO,
+                                  Builder B) {
   mca::Context MCA(*MRI, *STI);
 
   // Default InstrumentManager
@@ -72,7 +73,7 @@ Error MCATestBase::runBaselineMCA(json::Object &Result, ArrayRef<MCInst> Insts,
   SmallVector<std::unique_ptr<mca::Instruction>> LoweredInsts;
   for (const auto &MCI : Insts) {
     Expected<std::unique_ptr<mca::Instruction>> Inst =
-        IB.createInstruction(MCI, Instruments);
+        B ? B(IB, MCI, Instruments) : IB.createInstruction(MCI, Instruments);
     if (!Inst) {
       if (auto NewE =
               handleErrors(Inst.takeError(),
diff --git a/llvm/unittests/tools/llvm-mca/MCATestBase.h b/llvm/unittests/tools/llvm-mca/MCATestBase.h
index 66e20a45c96ce..399a8e892bc28 100644
--- a/llvm/unittests/tools/llvm-mca/MCATestBase.h
+++ b/llvm/unittests/tools/llvm-mca/MCATestBase.h
@@ -24,6 +24,7 @@
 #include "llvm/MC/MCTargetOptions.h"
 #include "llvm/MC/TargetRegistry.h"
 #include "llvm/MCA/Context.h"
+#include "llvm/MCA/InstrBuilder.h"
 #include "llvm/TargetParser/SubtargetFeature.h"
 #include "llvm/TargetParser/Triple.h"
 
@@ -70,12 +71,16 @@ class MCATestBase : public ::testing::Test {
 
   void SetUp() override;
 
+  using Builder = std::function<Expected<std::unique_ptr<mca::Instruction>>
+      (mca::InstrBuilder&, const MCInst&, const SmallVector<mca::Instrument *>&)>;
+
   /// Utility function to run MCA with (nearly) the same configuration as the
   /// `llvm-mca` tool to verify result correctness.
   /// This function only displays on SummaryView by default.
   virtual Error runBaselineMCA(json::Object &Result, ArrayRef<MCInst> Insts,
                                ArrayRef<mca::View *> Views = {},
-                               const mca::PipelineOptions *PO = nullptr);
+                               const mca::PipelineOptions *PO = nullptr,
+                               Builder B = {});
 };
 
 } // end namespace mca
diff --git a/llvm/unittests/tools/llvm-mca/X86/TestIncrementalMCA.cpp b/llvm/unittests/tools/llvm-mca/X86/TestIncrementalMCA.cpp
index 1a14c687295ca..cf66460b11fc8 100644
--- a/llvm/unittests/tools/llvm-mca/X86/TestIncrementalMCA.cpp
+++ b/llvm/unittests/tools/llvm-mca/X86/TestIncrementalMCA.cpp
@@ -234,3 +234,29 @@ TEST_F(X86TestBase, TestVariantInstructionsSameAddress) {
   Expected<unsigned> Cycles = P->run();
   ASSERT_TRUE(static_cast<bool>(Cycles));
 }
+
+TEST_F(X86TestBase, TestInstructionCustomization) {
+  const unsigned ExplicitLatency = 100;  
+  SmallVector<MCInst> MCIs;
+  MCInst InstructionToAdd = MCInstBuilder(X86::XOR64rr)
+                                .addReg(X86::RAX)
+                                .addReg(X86::RAX)
+                                .addReg(X86::RAX);
+  MCIs.push_back(InstructionToAdd);
+
+  // Run the baseline.
+  json::Object BaselineResult;
+  auto E = runBaselineMCA(BaselineResult, MCIs, {}, nullptr,
+      [=](InstrBuilder& IB, const MCInst& MCI, const SmallVector<Instrument*>& Instruments) {
+        return IB.createInstruction(MCI, Instruments,
+          [=](InstrDesc& ID) {
+            for (auto& W : ID.Writes) W.Latency = ExplicitLatency;
+            ID.MaxLatency = ExplicitLatency;
+          });
+      });
+  auto *BaselineObj = BaselineResult.getObject("SummaryView");
+  auto V = BaselineObj->getInteger("TotalCycles");
+  ASSERT_TRUE(V);
+  // Additional 3 cycles for Dispatch, Executed and Retired states
+  ASSERT_EQ(unsigned(*V), ExplicitLatency + 3) << "Total cycles do not match";
+}

Copy link

⚠️ We detected that you are using a GitHub private e-mail address to contribute to the repo.
Please turn off Keep my email addresses private setting in your account.
See LLVM Developer Policy and LLVM Discourse for more information.

Copy link

github-actions bot commented Aug 26, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

Copy link
Member

@mshockwave mshockwave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add another new callback in InstrumentManager to implement the customizer, and move the parsing logics of LLVM-MCA-LATENCY from CodeRegionGenerator.* to InstrumentManager (either putting it in X86's custom behavior or in InstrumentManager's base implementation) because InstrumentManager was created exactly for supporting custom annotations like LLVM-MCA-LATENCY.

Expected<const InstrDesc &>
createInstrDescImpl(const MCInst &MCI, const SmallVector<Instrument *> &IVec);
createInstrDescImpl(const MCInst &MCI, const SmallVector<Instrument *> &IVec,
std::function<void(InstrDesc &)> Customizer = {});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

LLVM_ABI Expected<std::unique_ptr<Instruction>>
createInstruction(const MCInst &MCI, const SmallVector<Instrument *> &IVec);
createInstruction(const MCInst &MCI, const SmallVector<Instrument *> &IVec,
std::function<void(InstrDesc &)> Customizer = {});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* Switch to function_ref

* Switch to function_ref

* Check pipeline execution status

* Switch to function_ref
Extended InstrumentManager to allow InstrDesc modification with instruments; added common instrument that can modify instruction latencies.
@r-belenov
Copy link
Contributor Author

Refactored code to use InstrumentManager (not quite polished yet).

Fixed formatting issues
Copy link
Member

@mshockwave mshockwave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your current solution could be (greatly) simplified, please see my inline comment

const MCSubtargetInfo &STI;
const MCInstrInfo &MCII;
bool EnableDefaults;
std::unique_ptr<InstrumentManager> TargetIM;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InstrumentManager is already target-specific, why do you need TargetIM here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to have InstrDesc customization generic (e.g. latency change code is generic, other things I do locally also do not require target details) and coexist with target-specific instrumentation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to have InstrDesc customization generic (e.g. latency change code is generic, other things I do locally also do not require target details) and coexist with target-specific instrumentation.

I think my concern here is that this additional layer of pointer indirection could be avoid if we could simply put this latency updating logics in the base IM and called it from the target-specific, derived IM when needed, as I described in the sibling comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now the customization in InstrumentManager class and can be reused.

return std::vector<std::unique_ptr<View>>();
}

class CustomInstrument : public Instrument {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be solved in a much simpler way: given a line "LLVM-MCA-LATENCY: 100", the default Instrument could already provide you Desc and Data of "LLVM-MCA-LATENCY:" and "100" respectively. So I don't think you need the CustomInstrument here.
And then for supportsInstrumentType, canCustomize and customize -- you could just put the implementation directly in the function, and call this base implementation for the target-specific override, if a target overrides it. Such that you don't need the slightly confusing TargetIM indirection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given a line "LLVM-MCA-LATENCY: 100", the default Instrument could already provide you Desc and Data of "LLVM-MCA-LATENCY:" and "100" respectively. So I don't think you need the CustomInstrument here.
Ok, I can move the logic to base Instrument implementation
Such that you don't need the slightly confusing TargetIM indirection.
Without TargetIM generic customization will be disabled in case target-specific IM is used (since it will override createInstrument), unless all target IMs call base class createInstrument for unrecognized types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, supportsIntrumentType() is normally overridden in target-specific IMs, while we need to check for both common and target specific instruments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added target IM creation into default IM constructor to simplify the logic in the tool code.

Copy link
Member

@mshockwave mshockwave Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless all target IMs call base class createInstrument for unrecognized types.

yes, thats what i meant when I said "call this base implementation for the target-specific override, if a target overrides it". Calling parent class's base implementation in an override function is a pretty common pattern in LLVM, and I would prefer that pattern to be used here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll change the code to leave the logic in the base IM only without TargetIM indirection.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PS Pushed the changes that remove targetIM

@mshockwave
Copy link
Member

We cant just keep the strings and do the parsing in the callbacks called from InstrBuilder, but that looks too expensive for me (especially in case MCA is used as a library and latency already comes in numeric form from some external source, so no parsing is required).

I was indeed thinking about re-parsing every time, but you have a point here and I agree, so maybe we can restore to one of your previous approaches which create a new class derived from Instrument (it was called CustomInstrument IIRC, but we can have some more specific name like LatencyInstrument). That being said, I still don't think we should have the Instrument::customize and Instrument::canCustomize -- I believe we can conditionally static_cast to this new LatencyInstrument by the getDesc it returns.

@r-belenov
Copy link
Contributor Author

r-belenov commented Sep 11, 2025

That being said, I still don't think we should have the Instrument::customize and Instrument::canCustomize -- I believe we can conditionally static_cast to this new LatencyInstrument by the getDesc it returns.

We still need callbacks in InstrBuilder to check whether some instrument can customize InstrDesc (so we cannot reuse the description) and to do the customization, and latency change is not the only possible customization (though it is the only one in the current patch), so I don't like having the code in InstrumentManager that checks for specific instruments.

Latency customization logic is moved to dedicated instrument.
* Update CustomBehaviour.h

* Update TestIncrementalMCA.cpp
<functional> was required for initial approach with customization passed via std::function, now we use IM methods.
Ensure the same value is used to set the latency and test the result
@mshockwave
Copy link
Member

That being said, I still don't think we should have the Instrument::customize and Instrument::canCustomize -- I believe we can conditionally static_cast to this new LatencyInstrument by the getDesc it returns.

We still need callbacks in InstrBuilder to check whether some instrument can customize InstrDesc (so we cannot reuse the description) and to do the customization, and latency change is not the only possible customization (though it is the only one in the current patch), so I don't like having the code in InstrumentManager that checks for specific instruments.

Instrument was originally invented as an opaque "handle" pass to InstrumentManager, and InstrumentManager does the actual works. Otherwise we would have a Instrument::getSchedClassID virtual function (because some instruments are used by InstrumentManager::getSchedClassID). For every new features added into InstrumentManager in the future, for instance InstrumentManager::doFoo, if we also need to add a new virtual function counterpart into Instrument, like Instrument::doFoo, I don't think it'll scale well.

Other than that, I think most parts of this patch are coming into shape, thanks for pushing this work!

Copy link
Member

@mshockwave mshockwave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM w/ the condition that the last batch of minor comments are addressed.
Cheers!

}

bool hasValue() const { return bool(Latency); }
unsigned getLatency() { return *Latency; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this should also be const

if (I->getDesc() == LatencyInstrument::DESC_NAME) {
auto LatInst = static_cast<LatencyInstrument *>(I);
if (LatInst->hasValue()) {
auto Latency = LatInst->getLatency();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please spell out the type

runBaselineMCA(json::Object &Result, ArrayRef<MCInst> Insts,
ArrayRef<mca::View *> Views = {},
const mca::PipelineOptions *PO = nullptr,
SmallVector<std::pair<StringRef, StringRef>> Descs = {});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use ArrayRef here: https://llvm.org/docs/ProgrammersManual.html#llvm-adt-smallvector-h (see the "Note")

Comment on lines 143 to 148
// Skip spaces and tabs.
unsigned Position = Data.find_first_not_of(" \t");
if (Position >= Data.size())
// We reached the end of the comment. Bail out.
return;
Data = Data.drop_front(Position);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we use StringRef::trim to replace this part of the code?

std::optional<unsigned> Latency;

public:
static const llvm::StringRef DESC_NAME;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop "llvm::"

const SmallVector<Instrument *> &IVec) const;

// Return true if instruments can modify instruction description
virtual bool canCustomize(const SmallVector<Instrument *> &IVec) const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use ArrayRef here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Makes sense for getSchedClassID too; didn't touch it since it would break existing API.

virtual bool canCustomize(const SmallVector<Instrument *> &IVec) const;

// Customize instruction description
virtual void customize(const SmallVector<Instrument *> &IVec,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto ArrayRef for IVec

@mshockwave
Copy link
Member

I guess you need help to merge this PR?

@r-belenov
Copy link
Contributor Author

Yes, I don't have merge permissions.

@mshockwave mshockwave merged commit 4a9fdda into llvm:main Sep 19, 2025
10 checks passed
mshockwave pushed a commit that referenced this pull request Sep 23, 2025
Recently added latency customization
([PR](#155420)) does not work
on RISCV since it has target-specific InstrumentManager that overrides
default functionality. Added calls to base class to ensure that common
instruments (including latency customizer) are available.
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 23, 2025
Recently added latency customization
([PR](llvm/llvm-project#155420)) does not work
on RISCV since it has target-specific InstrumentManager that overrides
default functionality. Added calls to base class to ensure that common
instruments (including latency customizer) are available.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants