Skip to content

Conversation

@asi-sc
Copy link
Contributor

@asi-sc asi-sc commented Jan 29, 2025

ResourceUsage is a very sparse table. On large input asm sequences it consumes a lot of memory utilizing only a few percents of it (~4% on my benchmark). Reorganization of ResourceUsage to keep only used fields allows saving up to 18% of total memory use by mca or ~850% of input file size (~1.1GB in absolute values in my case).

NB: I use mca rather in unusual way by providing really big input asm sequences and I don't think there would be any noticeable change for small input files, but I still think it worth fixing.

ResourceUsage is a very sparse table. On large input asm sequences it consumes
a lot of memory utilizing only a few percents of it (~4% on my benchmark).
Reorganization of ResourceUsage to keep only used fields allows saving up to
18% of total memory use by mca or ~850% of input file size (~1.1GB in absolute
values in my case).
@llvmbot
Copy link
Member

llvmbot commented Jan 29, 2025

@llvm/pr-subscribers-tools-llvm-mca

Author: Anton Sidorenko (asi-sc)

Changes

ResourceUsage is a very sparse table. On large input asm sequences it consumes a lot of memory utilizing only a few percents of it (~4% on my benchmark). Reorganization of ResourceUsage to keep only used fields allows saving up to 18% of total memory use by mca or ~850% of input file size (~1.1GB in absolute values in my case).

NB: I use mca rather in unusual way by providing really big input asm sequences and I don't think there would be any noticeable change for small input files, but I still think it worth fixing.


Full diff: https://github.com/llvm/llvm-project/pull/124904.diff

2 Files Affected:

  • (modified) llvm/tools/llvm-mca/Views/ResourcePressureView.cpp (+57-18)
  • (modified) llvm/tools/llvm-mca/Views/ResourcePressureView.h (+7-2)
diff --git a/llvm/tools/llvm-mca/Views/ResourcePressureView.cpp b/llvm/tools/llvm-mca/Views/ResourcePressureView.cpp
index f39350f3b45852..26150849243764 100644
--- a/llvm/tools/llvm-mca/Views/ResourcePressureView.cpp
+++ b/llvm/tools/llvm-mca/Views/ResourcePressureView.cpp
@@ -37,8 +37,16 @@ ResourcePressureView::ResourcePressureView(const llvm::MCSubtargetInfo &sti,
   }
 
   NumResourceUnits = R2VIndex;
-  ResourceUsage.resize(NumResourceUnits * (getSource().size() + 1));
-  std::fill(ResourceUsage.begin(), ResourceUsage.end(), 0.0);
+  ResourceUsage.resize(getSource().size());
+
+  ResourceReleaseAtCycles InitValue{0, 0};
+  auto Generator = [&InitValue]() {
+    auto Old = InitValue;
+    ++InitValue.ResourceIdx;
+    return Old;
+  };
+  std::generate_n(std::back_inserter(CommonResourceUsage), NumResourceUnits,
+                  Generator);
 }
 
 void ResourcePressureView::onEvent(const HWInstructionEvent &Event) {
@@ -60,8 +68,19 @@ void ResourcePressureView::onEvent(const HWInstructionEvent &Event) {
     assert(Resource2VecIndex.contains(RR.first));
     unsigned R2VIndex = Resource2VecIndex[RR.first];
     R2VIndex += llvm::countr_zero(RR.second);
-    ResourceUsage[R2VIndex + NumResourceUnits * SourceIdx] += Use.second;
-    ResourceUsage[R2VIndex + NumResourceUnits * Source.size()] += Use.second;
+
+    InstResourceUsage &RU = ResourceUsage[SourceIdx];
+    ResourceReleaseAtCycles NewUsage{R2VIndex, Use.second};
+    auto ResCyclesIt =
+        lower_bound(RU, NewUsage, [](const auto &L, const auto &R) {
+          return L.ResourceIdx < R.ResourceIdx;
+        });
+    if (ResCyclesIt != RU.end() && ResCyclesIt->ResourceIdx == R2VIndex)
+      ResCyclesIt->Cycles += NewUsage.Cycles;
+    else
+      RU.insert(ResCyclesIt, std::move(NewUsage));
+
+    CommonResourceUsage[R2VIndex].Cycles += NewUsage.Cycles;
   }
 }
 
@@ -135,10 +154,17 @@ void ResourcePressureView::printResourcePressurePerIter(raw_ostream &OS) const {
 
   ArrayRef<llvm::MCInst> Source = getSource();
   const unsigned Executions = LastInstructionIdx / Source.size() + 1;
+  auto UsageEntryEnd = CommonResourceUsage.end();
+  auto UsageEntryIt = CommonResourceUsage.begin();
   for (unsigned I = 0, E = NumResourceUnits; I < E; ++I) {
-    double Usage = ResourceUsage[I + Source.size() * E];
-    printResourcePressure(FOS, Usage / Executions, (I + 1) * 7);
+    double Pressure = 0.0;
+    if (UsageEntryIt != UsageEntryEnd && UsageEntryIt->ResourceIdx == I) {
+      Pressure = UsageEntryIt->Cycles / Executions;
+      ++UsageEntryIt;
+    }
+    printResourcePressure(FOS, Pressure, (I + 1) * 7);
   }
+  assert(UsageEntryIt == UsageEntryEnd);
 
   FOS.flush();
   OS << Buffer;
@@ -157,11 +183,17 @@ void ResourcePressureView::printResourcePressurePerInst(raw_ostream &OS) const {
   ArrayRef<llvm::MCInst> Source = getSource();
   const unsigned Executions = LastInstructionIdx / Source.size() + 1;
   for (const MCInst &MCI : Source) {
-    unsigned BaseEltIdx = InstrIndex * NumResourceUnits;
+    auto UsageEntryEnd = ResourceUsage[InstrIndex].end();
+    auto UsageEntryIt = ResourceUsage[InstrIndex].begin();
     for (unsigned J = 0; J < NumResourceUnits; ++J) {
-      double Usage = ResourceUsage[J + BaseEltIdx];
-      printResourcePressure(FOS, Usage / Executions, (J + 1) * 7);
+      double Pressure = 0.0;
+      if (UsageEntryIt != UsageEntryEnd && UsageEntryIt->ResourceIdx == J) {
+        Pressure = UsageEntryIt->Cycles / Executions;
+        ++UsageEntryIt;
+      }
+      printResourcePressure(FOS, Pressure, (J + 1) * 7);
     }
+    assert(UsageEntryIt == UsageEntryEnd);
 
     FOS << printInstructionString(MCI) << '\n';
     FOS.flush();
@@ -180,17 +212,24 @@ json::Value ResourcePressureView::toJSON() const {
   // non-zero values.
   ArrayRef<llvm::MCInst> Source = getSource();
   const unsigned Executions = LastInstructionIdx / Source.size() + 1;
-  for (const auto &R : enumerate(ResourceUsage)) {
-    const ReleaseAtCycles &RU = R.value();
-    if (RU.getNumerator() == 0)
-      continue;
-    unsigned InstructionIndex = R.index() / NumResourceUnits;
-    unsigned ResourceIndex = R.index() % NumResourceUnits;
-    double Usage = RU / Executions;
+
+  auto AddToJSON = [&ResourcePressureInfo, Executions](
+                       const ResourceReleaseAtCycles &RU, unsigned InstIndex) {
+    assert(RU.Cycles.getNumerator() != 0);
+    double Usage = RU.Cycles / Executions;
     ResourcePressureInfo.push_back(
-        json::Object({{"InstructionIndex", InstructionIndex},
-                      {"ResourceIndex", ResourceIndex},
+        json::Object({{"InstructionIndex", InstIndex},
+                      {"ResourceIndex", RU.ResourceIdx},
                       {"ResourceUsage", Usage}}));
+  };
+  for (const auto &R : enumerate(ResourceUsage)) {
+    unsigned InstructionIndex = R.index();
+    for (const auto &RU : R.value())
+      AddToJSON(RU, InstructionIndex);
+  }
+  for (const auto &RU : CommonResourceUsage) {
+    if (RU.Cycles.getNumerator() != 0)
+      AddToJSON(RU, Source.size());
   }
 
   json::Object JO({{"ResourcePressureInfo", std::move(ResourcePressureInfo)}});
diff --git a/llvm/tools/llvm-mca/Views/ResourcePressureView.h b/llvm/tools/llvm-mca/Views/ResourcePressureView.h
index be8ad04102fd05..3deb9e8a21d5ea 100644
--- a/llvm/tools/llvm-mca/Views/ResourcePressureView.h
+++ b/llvm/tools/llvm-mca/Views/ResourcePressureView.h
@@ -77,8 +77,13 @@ class ResourcePressureView : public InstructionView {
   // resource ID.
   llvm::DenseMap<unsigned, unsigned> Resource2VecIndex;
 
-  // Table of resources used by instructions.
-  std::vector<ReleaseAtCycles> ResourceUsage;
+  struct ResourceReleaseAtCycles {
+    unsigned ResourceIdx;
+    ReleaseAtCycles Cycles;
+  };
+  using InstResourceUsage = std::vector<ResourceReleaseAtCycles>;
+  std::vector<InstResourceUsage> ResourceUsage;
+  InstResourceUsage CommonResourceUsage;
   unsigned NumResourceUnits;
 
   void printResourcePressurePerIter(llvm::raw_ostream &OS) const;

{"ResourceIndex", RU.ResourceIdx},
{"ResourceUsage", Usage}}));
};
for (const auto &R : enumerate(ResourceUsage)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We can use const auto &[..., ...] : enumerate(ResourceUsage) now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, thanks for noticing


ResourceReleaseAtCycles InitValue{0, 0};
auto Generator = [&InitValue]() {
auto Old = InitValue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please spell out the type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

@adibiagio
Copy link
Collaborator

NB: I use mca rather in unusual way by providing really big input asm sequences and I don't think there would be any noticeable change for small input files, but I still think it worth fixing.

I also don't think that this will have any significant impact on small input files.
However, it is still worth improving the memory usage for your particular case.

I am happy for this change to go in once Min's comments have been addressed.
Thanks!

Copy link
Collaborator

@adibiagio adibiagio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks

@asi-sc asi-sc merged commit a5f237f into llvm:main Jan 31, 2025
6 of 8 checks passed
asi-sc added a commit that referenced this pull request Jan 31, 2025
…cyGraphNode (NFC) (#125080)

For each instruction from the input assembly sequence, DependencyGraph
has a dedicated node (DGNode). Outgoing edges (data, resource and memory
dependencies) are tracked as SmallVector<..., 8> for each DGNode in the
graph. However, it's unlikely that a usual input instruction will have
approximately eight dependent instructions. Below is my statistics for
several RISC-V input sequences:

```
Number of  | Number of nodes with
edges      | this # of edges
---------------------------------
         0 | 8239447
         1 | 464252
         2 | 6164
         3 | 6783
         4 | 939
         5 | 500
         6 | 545
         7 | 116
         8 | 2
         9 | 1
        10 | 1
```

Approximately the same distribution is produced by llvm-mca lit tests
for X86, AArch and RISC-V (even modified ones with extra dependencies
added).
On a rather big input asm sequences, the use of SmallVector<..., 8>
dramatically increases memory consumption without any need for it. In my
case, replacing it with SmallVector<...,0> reduces memory usage by ~28%
or ~1700% of input file size (2.2GB in absolute values).

There is no change in execution time, I verified it on mca lit-tests and
on my big test (execution time is ~30s in both cases).

This change was made with the same intention as #124904 and optimizes I
believe quite an unusual scenario. However, if there is no negative
impact on other known scenarios, I'd like to have the change in
llvm-project.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants