-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[MCA] Optimize memory consumption in resource pressure view (NFC) #124904
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ResourceUsage is a very sparse table. On large input asm sequences it consumes a lot of memory utilizing only a few percents of it (~4% on my benchmark). Reorganization of ResourceUsage to keep only used fields allows saving up to 18% of total memory use by mca or ~850% of input file size (~1.1GB in absolute values in my case).
|
@llvm/pr-subscribers-tools-llvm-mca Author: Anton Sidorenko (asi-sc) ChangesResourceUsage is a very sparse table. On large input asm sequences it consumes a lot of memory utilizing only a few percents of it (~4% on my benchmark). Reorganization of ResourceUsage to keep only used fields allows saving up to 18% of total memory use by mca or ~850% of input file size (~1.1GB in absolute values in my case). NB: I use mca rather in unusual way by providing really big input asm sequences and I don't think there would be any noticeable change for small input files, but I still think it worth fixing. Full diff: https://github.com/llvm/llvm-project/pull/124904.diff 2 Files Affected:
diff --git a/llvm/tools/llvm-mca/Views/ResourcePressureView.cpp b/llvm/tools/llvm-mca/Views/ResourcePressureView.cpp
index f39350f3b45852..26150849243764 100644
--- a/llvm/tools/llvm-mca/Views/ResourcePressureView.cpp
+++ b/llvm/tools/llvm-mca/Views/ResourcePressureView.cpp
@@ -37,8 +37,16 @@ ResourcePressureView::ResourcePressureView(const llvm::MCSubtargetInfo &sti,
}
NumResourceUnits = R2VIndex;
- ResourceUsage.resize(NumResourceUnits * (getSource().size() + 1));
- std::fill(ResourceUsage.begin(), ResourceUsage.end(), 0.0);
+ ResourceUsage.resize(getSource().size());
+
+ ResourceReleaseAtCycles InitValue{0, 0};
+ auto Generator = [&InitValue]() {
+ auto Old = InitValue;
+ ++InitValue.ResourceIdx;
+ return Old;
+ };
+ std::generate_n(std::back_inserter(CommonResourceUsage), NumResourceUnits,
+ Generator);
}
void ResourcePressureView::onEvent(const HWInstructionEvent &Event) {
@@ -60,8 +68,19 @@ void ResourcePressureView::onEvent(const HWInstructionEvent &Event) {
assert(Resource2VecIndex.contains(RR.first));
unsigned R2VIndex = Resource2VecIndex[RR.first];
R2VIndex += llvm::countr_zero(RR.second);
- ResourceUsage[R2VIndex + NumResourceUnits * SourceIdx] += Use.second;
- ResourceUsage[R2VIndex + NumResourceUnits * Source.size()] += Use.second;
+
+ InstResourceUsage &RU = ResourceUsage[SourceIdx];
+ ResourceReleaseAtCycles NewUsage{R2VIndex, Use.second};
+ auto ResCyclesIt =
+ lower_bound(RU, NewUsage, [](const auto &L, const auto &R) {
+ return L.ResourceIdx < R.ResourceIdx;
+ });
+ if (ResCyclesIt != RU.end() && ResCyclesIt->ResourceIdx == R2VIndex)
+ ResCyclesIt->Cycles += NewUsage.Cycles;
+ else
+ RU.insert(ResCyclesIt, std::move(NewUsage));
+
+ CommonResourceUsage[R2VIndex].Cycles += NewUsage.Cycles;
}
}
@@ -135,10 +154,17 @@ void ResourcePressureView::printResourcePressurePerIter(raw_ostream &OS) const {
ArrayRef<llvm::MCInst> Source = getSource();
const unsigned Executions = LastInstructionIdx / Source.size() + 1;
+ auto UsageEntryEnd = CommonResourceUsage.end();
+ auto UsageEntryIt = CommonResourceUsage.begin();
for (unsigned I = 0, E = NumResourceUnits; I < E; ++I) {
- double Usage = ResourceUsage[I + Source.size() * E];
- printResourcePressure(FOS, Usage / Executions, (I + 1) * 7);
+ double Pressure = 0.0;
+ if (UsageEntryIt != UsageEntryEnd && UsageEntryIt->ResourceIdx == I) {
+ Pressure = UsageEntryIt->Cycles / Executions;
+ ++UsageEntryIt;
+ }
+ printResourcePressure(FOS, Pressure, (I + 1) * 7);
}
+ assert(UsageEntryIt == UsageEntryEnd);
FOS.flush();
OS << Buffer;
@@ -157,11 +183,17 @@ void ResourcePressureView::printResourcePressurePerInst(raw_ostream &OS) const {
ArrayRef<llvm::MCInst> Source = getSource();
const unsigned Executions = LastInstructionIdx / Source.size() + 1;
for (const MCInst &MCI : Source) {
- unsigned BaseEltIdx = InstrIndex * NumResourceUnits;
+ auto UsageEntryEnd = ResourceUsage[InstrIndex].end();
+ auto UsageEntryIt = ResourceUsage[InstrIndex].begin();
for (unsigned J = 0; J < NumResourceUnits; ++J) {
- double Usage = ResourceUsage[J + BaseEltIdx];
- printResourcePressure(FOS, Usage / Executions, (J + 1) * 7);
+ double Pressure = 0.0;
+ if (UsageEntryIt != UsageEntryEnd && UsageEntryIt->ResourceIdx == J) {
+ Pressure = UsageEntryIt->Cycles / Executions;
+ ++UsageEntryIt;
+ }
+ printResourcePressure(FOS, Pressure, (J + 1) * 7);
}
+ assert(UsageEntryIt == UsageEntryEnd);
FOS << printInstructionString(MCI) << '\n';
FOS.flush();
@@ -180,17 +212,24 @@ json::Value ResourcePressureView::toJSON() const {
// non-zero values.
ArrayRef<llvm::MCInst> Source = getSource();
const unsigned Executions = LastInstructionIdx / Source.size() + 1;
- for (const auto &R : enumerate(ResourceUsage)) {
- const ReleaseAtCycles &RU = R.value();
- if (RU.getNumerator() == 0)
- continue;
- unsigned InstructionIndex = R.index() / NumResourceUnits;
- unsigned ResourceIndex = R.index() % NumResourceUnits;
- double Usage = RU / Executions;
+
+ auto AddToJSON = [&ResourcePressureInfo, Executions](
+ const ResourceReleaseAtCycles &RU, unsigned InstIndex) {
+ assert(RU.Cycles.getNumerator() != 0);
+ double Usage = RU.Cycles / Executions;
ResourcePressureInfo.push_back(
- json::Object({{"InstructionIndex", InstructionIndex},
- {"ResourceIndex", ResourceIndex},
+ json::Object({{"InstructionIndex", InstIndex},
+ {"ResourceIndex", RU.ResourceIdx},
{"ResourceUsage", Usage}}));
+ };
+ for (const auto &R : enumerate(ResourceUsage)) {
+ unsigned InstructionIndex = R.index();
+ for (const auto &RU : R.value())
+ AddToJSON(RU, InstructionIndex);
+ }
+ for (const auto &RU : CommonResourceUsage) {
+ if (RU.Cycles.getNumerator() != 0)
+ AddToJSON(RU, Source.size());
}
json::Object JO({{"ResourcePressureInfo", std::move(ResourcePressureInfo)}});
diff --git a/llvm/tools/llvm-mca/Views/ResourcePressureView.h b/llvm/tools/llvm-mca/Views/ResourcePressureView.h
index be8ad04102fd05..3deb9e8a21d5ea 100644
--- a/llvm/tools/llvm-mca/Views/ResourcePressureView.h
+++ b/llvm/tools/llvm-mca/Views/ResourcePressureView.h
@@ -77,8 +77,13 @@ class ResourcePressureView : public InstructionView {
// resource ID.
llvm::DenseMap<unsigned, unsigned> Resource2VecIndex;
- // Table of resources used by instructions.
- std::vector<ReleaseAtCycles> ResourceUsage;
+ struct ResourceReleaseAtCycles {
+ unsigned ResourceIdx;
+ ReleaseAtCycles Cycles;
+ };
+ using InstResourceUsage = std::vector<ResourceReleaseAtCycles>;
+ std::vector<InstResourceUsage> ResourceUsage;
+ InstResourceUsage CommonResourceUsage;
unsigned NumResourceUnits;
void printResourcePressurePerIter(llvm::raw_ostream &OS) const;
|
| {"ResourceIndex", RU.ResourceIdx}, | ||
| {"ResourceUsage", Usage}})); | ||
| }; | ||
| for (const auto &R : enumerate(ResourceUsage)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: We can use const auto &[..., ...] : enumerate(ResourceUsage) now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, thanks for noticing
|
|
||
| ResourceReleaseAtCycles InitValue{0, 0}; | ||
| auto Generator = [&InitValue]() { | ||
| auto Old = InitValue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please spell out the type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed
I also don't think that this will have any significant impact on small input files. I am happy for this change to go in once Min's comments have been addressed. |
adibiagio
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks
…cyGraphNode (NFC) (#125080) For each instruction from the input assembly sequence, DependencyGraph has a dedicated node (DGNode). Outgoing edges (data, resource and memory dependencies) are tracked as SmallVector<..., 8> for each DGNode in the graph. However, it's unlikely that a usual input instruction will have approximately eight dependent instructions. Below is my statistics for several RISC-V input sequences: ``` Number of | Number of nodes with edges | this # of edges --------------------------------- 0 | 8239447 1 | 464252 2 | 6164 3 | 6783 4 | 939 5 | 500 6 | 545 7 | 116 8 | 2 9 | 1 10 | 1 ``` Approximately the same distribution is produced by llvm-mca lit tests for X86, AArch and RISC-V (even modified ones with extra dependencies added). On a rather big input asm sequences, the use of SmallVector<..., 8> dramatically increases memory consumption without any need for it. In my case, replacing it with SmallVector<...,0> reduces memory usage by ~28% or ~1700% of input file size (2.2GB in absolute values). There is no change in execution time, I verified it on mca lit-tests and on my big test (execution time is ~30s in both cases). This change was made with the same intention as #124904 and optimizes I believe quite an unusual scenario. However, if there is no negative impact on other known scenarios, I'd like to have the change in llvm-project.
ResourceUsage is a very sparse table. On large input asm sequences it consumes a lot of memory utilizing only a few percents of it (~4% on my benchmark). Reorganization of ResourceUsage to keep only used fields allows saving up to 18% of total memory use by mca or ~850% of input file size (~1.1GB in absolute values in my case).
NB: I use mca rather in unusual way by providing really big input asm sequences and I don't think there would be any noticeable change for small input files, but I still think it worth fixing.