Skip to content

Conversation

@asi-sc
Copy link
Contributor

@asi-sc asi-sc commented Jan 30, 2025

For each instruction from the input assembly sequence, DependencyGraph has a dedicated node (DGNode). Outgoing edges (data, resource and memory dependencies) are tracked as SmallVector<..., 8> for each DGNode in the graph. However, it's unlikely that a usual input instruction will have approximately eight dependent instructions. Below is my statistics for several RISC-V input sequences:

Number of  | Number of nodes with
edges      | this # of edges
---------------------------------
         0 | 8239447
         1 | 464252
         2 | 6164
         3 | 6783
         4 | 939
         5 | 500
         6 | 545
         7 | 116
         8 | 2
         9 | 1
        10 | 1

Approximately the same distribution is produced by llvm-mca lit tests for X86, AArch and RISC-V (even modified ones with extra dependencies added).
On a rather big input asm sequences, the use of SmallVector<..., 8> dramatically increases memory consumption without any need for it. In my case, replacing it with SmallVector<...,0> reduces memory usage by ~28% or ~1700% of input file size (2.2GB in absolute values).

There is no change in execution time, I verified it on mca lit-tests and on my big test (execution time is ~30s in both cases).

This change was made with the same intention as #124904 and optimizes I believe quite an unusual scenario. However, if there is no negative impact on other known scenarios, I'd like to have the change in llvm-project.

…cyGraphNode (NFC)

For each instruction from the input assembly sequence, DependencyGraph has a
dedicated node (DGNode). Outgoing edges (data, resource and memory dependencies)
are tracked as SmallVector<..., 8> for each DGNode in the graph. However, it's
rather unlikely that a usual input instruction will have approximately eight
dependent instructions. Below is my statistics for several RISC-V input
sequences:

Number of  | Number of nodes with
edges      | this # of edges
--------------------------------------------
         0 | 8239447
         1 | 464252
         2 | 6164
         3 | 6783
         4 | 939
         5 | 500
         6 | 545
         7 | 116
         8 | 2
         9 | 1
        10 | 1

Approximately the same distribution is produced by llvm-mca lit tests (even
modified ones with extra dependencies added).
On a rather big input asm sequences, the use of SmallVector<..., 8> dramatically
increases memory consumption without any need for it. In my case, replacing it
with SmallVector<...,0> reduces memory usage by ~28% or ~1700% of input file size
(2.2GB in absolute values).

There is no change in execution time, I verified it on mca lit-tests and on my big
test (execution time is ~30s in both cases).
@llvmbot
Copy link
Member

llvmbot commented Jan 30, 2025

@llvm/pr-subscribers-tools-llvm-mca

Author: Anton Sidorenko (asi-sc)

Changes

For each instruction from the input assembly sequence, DependencyGraph has a dedicated node (DGNode). Outgoing edges (data, resource and memory dependencies) are tracked as SmallVector<..., 8> for each DGNode in the graph. However, it's unlikely that a usual input instruction will have approximately eight dependent instructions. Below is my statistics for several RISC-V input sequences:

Number of  | Number of nodes with
edges      | this # of edges
---------------------------------
         0 | 8239447
         1 | 464252
         2 | 6164
         3 | 6783
         4 | 939
         5 | 500
         6 | 545
         7 | 116
         8 | 2
         9 | 1
        10 | 1

Approximately the same distribution is produced by llvm-mca lit tests for X86, AArch and RISC-V (even modified ones with extra dependencies added).
On a rather big input asm sequences, the use of SmallVector<..., 8> dramatically increases memory consumption without any need for it. In my case, replacing it with SmallVector<...,0> reduces memory usage by ~28% or ~1700% of input file size (2.2GB in absolute values).

There is no change in execution time, I verified it on mca lit-tests and on my big test (execution time is ~30s in both cases).

This change was made with the same intention as #124904 and optimizes I believe quite an unusual scenario. However, if there is no negative impact on other known scenarios, I'd like to have the change in llvm-project.


Full diff: https://github.com/llvm/llvm-project/pull/125080.diff

1 Files Affected:

  • (modified) llvm/tools/llvm-mca/Views/BottleneckAnalysis.h (+1-1)
diff --git a/llvm/tools/llvm-mca/Views/BottleneckAnalysis.h b/llvm/tools/llvm-mca/Views/BottleneckAnalysis.h
index 529090cf543fc4..2621efb0413ae2 100644
--- a/llvm/tools/llvm-mca/Views/BottleneckAnalysis.h
+++ b/llvm/tools/llvm-mca/Views/BottleneckAnalysis.h
@@ -228,7 +228,7 @@ class DependencyGraph {
     unsigned Depth;
 
     DependencyEdge CriticalPredecessor;
-    SmallVector<DependencyEdge, 8> OutgoingEdges;
+    SmallVector<DependencyEdge, 0> OutgoingEdges;
   };
   SmallVector<DGNode, 16> Nodes;
 

Copy link
Member

@mshockwave mshockwave left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Number of | Number of nodes with
edges | this # of edges

     0 | 8239447
     1 | 464252
     2 | 6164
     3 | 6783
     4 | 939
     5 | 500
     6 | 545
     7 | 116
     8 | 2
     9 | 1
    10 | 1

This is an interesting distribution. If my math is correct ~95% of the nodes have no out-going edges. Thus I think this is a reasonable change.

@adibiagio
Copy link
Collaborator

Thanks for this patch!

I left a minor comment. Otherwise, the change looks good to me.

@mshockwave
Copy link
Member

Thanks for this patch!

I left a minor comment. Otherwise, the change looks good to me.

I don't see any comment, I think you might have forgot to press the submit button :-)


DependencyEdge CriticalPredecessor;
SmallVector<DependencyEdge, 8> OutgoingEdges;
SmallVector<DependencyEdge, 0> OutgoingEdges;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is useful to have a comment (not more than a couple of lines) explaining why zero was used for the number of inline elements. Otherwise it would be unclear for the reader why we chose that number in this case.

@asi-sc asi-sc merged commit 9cf8ee9 into llvm:main Jan 31, 2025
6 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants