[AArch64] Update IssueWidth for Neoverse V1, N1, N3 #154495

simonwallis2 · 2025-08-20T09:30:35Z

Recently the IssueWidth in the scheduling model was reduced for Neoverse-V2 and N2. This patch does the same for Neoverse-V1, N1 and N3.

On Neoverse-V1, various values of IssueWidth (15, 8, 7, 6, 5) were tried with runs of various workloads.
The highest overall geomean score was achieved with an issue width of 8. No significant regressions were noted.

On Neoverse-N1, various values of IssueWidth (8, 6, 5, 4, 3) were tried with runs of various workloads.
The highest overall geomean score was achieved with an issue width of 3. No significant regressions were noted.

On Neoverse-N3, it makes sense to do exactly the same as was done for N2. It is proposed to use an issue width of 5.

Related V2 PR: #142565
Related N2 PR: #145717

llvmbot · 2025-08-20T09:31:08Z

@llvm/pr-subscribers-backend-aarch64

Author: Simon Wallis (simonwallis2)

Changes

Patch is 964.04 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/154495.diff

12 Files Affected:

(modified) llvm/lib/Target/AArch64/AArch64SchedNeoverseN1.td (+1-1)
(modified) llvm/lib/Target/AArch64/AArch64SchedNeoverseN3.td (+1-1)
(modified) llvm/lib/Target/AArch64/AArch64SchedNeoverseV1.td (+1-1)
(modified) llvm/test/tools/llvm-mca/AArch64/Neoverse/N1-writeback.s (+2214-2203)
(modified) llvm/test/tools/llvm-mca/AArch64/Neoverse/N3-writeback.s (+1916-1906)
(modified) llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-basic-instructions.s (+1-1)
(modified) llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-clear-upper-regs.s (+16-16)
(modified) llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-forwarding.s (+123-123)
(modified) llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-misc-instructions.s (+15-15)
(modified) llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-sve-instructions.s (+9-9)
(modified) llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-writeback.s (+1457-1456)
(modified) llvm/test/tools/llvm-mca/AArch64/Neoverse/V1-zero-dependency.s (+1-1)

diff --git a/llvm/lib/Target/AArch64/AArch64SchedNeoverseN1.td b/llvm/lib/Target/AArch64/AArch64SchedNeoverseN1.td
index 524fa33f498bb..9da42322dd10d 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedNeoverseN1.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedNeoverseN1.td
@@ -15,7 +15,7 @@
 //===----------------------------------------------------------------------===//
 
 def NeoverseN1Model : SchedMachineModel {
-  let IssueWidth            =   8; // Maximum micro-ops dispatch rate.
+  let IssueWidth            =   3; // Maximum micro-ops dispatch rate.
   let MicroOpBufferSize     = 128; // NOTE: Copied from Cortex-A76.
   let LoadLatency           =   4; // Optimistic load latency.
   let MispredictPenalty     =  11; // Cycles cost of branch mispredicted.
diff --git a/llvm/lib/Target/AArch64/AArch64SchedNeoverseN3.td b/llvm/lib/Target/AArch64/AArch64SchedNeoverseN3.td
index e44d40f8d7020..cd0d8a9186d5b 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedNeoverseN3.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedNeoverseN3.td
@@ -11,7 +11,7 @@
 //===----------------------------------------------------------------------===//
 
 def NeoverseN3Model : SchedMachineModel {
-    let IssueWidth            =  10; // Micro-ops dispatched at a time.
+    let IssueWidth            =   5; // Micro-ops dispatched at a time.
     let MicroOpBufferSize     = 160; // Entries in micro-op re-order buffer. NOTE: Copied from N2.
     let LoadLatency           =   4; // Optimistic load latency.
     let MispredictPenalty     =  10; // Extra cycles for mispredicted branch. NOTE: Copied from N2.
diff --git a/llvm/lib/Target/AArch64/AArch64SchedNeoverseV1.td b/llvm/lib/Target/AArch64/AArch64SchedNeoverseV1.td
index 44625a2034d9d..b78c5e90d6338 100644
--- a/llvm/lib/Target/AArch64/AArch64SchedNeoverseV1.td
+++ b/llvm/lib/Target/AArch64/AArch64SchedNeoverseV1.td
@@ -19,7 +19,7 @@
 //===----------------------------------------------------------------------===//
 
 def NeoverseV1Model : SchedMachineModel {
-  let IssueWidth            =  15; // Maximum micro-ops dispatch rate.
+  let IssueWidth            =   8; // Maximum micro-ops dispatch rate.
   let MicroOpBufferSize     = 256; // Micro-op re-order buffer.
   let LoadLatency           =   4; // Optimistic load latency.
   let MispredictPenalty     =  11; // Cycles cost of branch mispredicted.
diff --git a/llvm/test/tools/llvm-mca/AArch64/Neoverse/N1-writeback.s b/llvm/test/tools/llvm-mca/AArch64/Neoverse/N1-writeback.s
index 8fe21167a5bd3..127c8c30fc2c6 100644
--- a/llvm/test/tools/llvm-mca/AArch64/Neoverse/N1-writeback.s
+++ b/llvm/test/tools/llvm-mca/AArch64/Neoverse/N1-writeback.s
@@ -1165,10 +1165,10 @@ add x0, x27, 1
 # CHECK-NEXT: Total Cycles:      507
 # CHECK-NEXT: Total uOps:        1500
 
-# CHECK:      Dispatch Width:    8
+# CHECK:      Dispatch Width:    3
 # CHECK-NEXT: uOps Per Cycle:    2.96
 # CHECK-NEXT: IPC:               1.97
-# CHECK-NEXT: Block RThroughput: 3.3
+# CHECK-NEXT: Block RThroughput: 5.0
 
 # CHECK:      Timeline view:
 # CHECK-NEXT:                     01
@@ -1176,14 +1176,14 @@ add x0, x27, 1
 
 # CHECK:      [0,0]     DeeeeeER  ..   ld1	{ v1.1d }, [x27], #8
 # CHECK-NEXT: [0,1]     D=eE---R  ..   add	x0, x27, #1
-# CHECK-NEXT: [0,2]     D=eeeeeER ..   ld1	{ v1.2d }, [x27], #16
-# CHECK-NEXT: [0,3]     D==eE---R ..   add	x0, x27, #1
-# CHECK-NEXT: [0,4]     D==eeeeeER..   ld1	{ v1.2s }, [x27], #8
-# CHECK-NEXT: [0,5]     .D==eE---R..   add	x0, x27, #1
-# CHECK-NEXT: [0,6]     .D==eeeeeER.   ld1	{ v1.4h }, [x27], #8
-# CHECK-NEXT: [0,7]     .D===eE---R.   add	x0, x27, #1
-# CHECK-NEXT: [0,8]     .D===eeeeeER   ld1	{ v1.4s }, [x27], #16
-# CHECK-NEXT: [0,9]     .D====eE---R   add	x0, x27, #1
+# CHECK-NEXT: [0,2]     .DeeeeeER ..   ld1	{ v1.2d }, [x27], #16
+# CHECK-NEXT: [0,3]     .D=eE---R ..   add	x0, x27, #1
+# CHECK-NEXT: [0,4]     . DeeeeeER..   ld1	{ v1.2s }, [x27], #8
+# CHECK-NEXT: [0,5]     . D=eE---R..   add	x0, x27, #1
+# CHECK-NEXT: [0,6]     .  DeeeeeER.   ld1	{ v1.4h }, [x27], #8
+# CHECK-NEXT: [0,7]     .  D=eE---R.   add	x0, x27, #1
+# CHECK-NEXT: [0,8]     .   DeeeeeER   ld1	{ v1.4s }, [x27], #16
+# CHECK-NEXT: [0,9]     .   D=eE---R   add	x0, x27, #1
 
 # CHECK:      Average Wait times (based on the timeline view):
 # CHECK-NEXT: [0]: Executions
@@ -1194,15 +1194,15 @@ add x0, x27, 1
 # CHECK:            [0]    [1]    [2]    [3]
 # CHECK-NEXT: 0.     1     1.0    1.0    0.0       ld1	{ v1.1d }, [x27], #8
 # CHECK-NEXT: 1.     1     2.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 2.     1     2.0    0.0    0.0       ld1	{ v1.2d }, [x27], #16
-# CHECK-NEXT: 3.     1     3.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 4.     1     3.0    0.0    0.0       ld1	{ v1.2s }, [x27], #8
-# CHECK-NEXT: 5.     1     3.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 6.     1     3.0    0.0    0.0       ld1	{ v1.4h }, [x27], #8
-# CHECK-NEXT: 7.     1     4.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 8.     1     4.0    0.0    0.0       ld1	{ v1.4s }, [x27], #16
-# CHECK-NEXT: 9.     1     5.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT:        1     3.0    0.1    1.5       <total>
+# CHECK-NEXT: 2.     1     1.0    0.0    0.0       ld1	{ v1.2d }, [x27], #16
+# CHECK-NEXT: 3.     1     2.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 4.     1     1.0    0.0    0.0       ld1	{ v1.2s }, [x27], #8
+# CHECK-NEXT: 5.     1     2.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 6.     1     1.0    0.0    0.0       ld1	{ v1.4h }, [x27], #8
+# CHECK-NEXT: 7.     1     2.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 8.     1     1.0    0.0    0.0       ld1	{ v1.4s }, [x27], #16
+# CHECK-NEXT: 9.     1     2.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT:        1     1.5    0.1    1.5       <total>
 
 # CHECK:      [1] Code Region - G02
 
@@ -1211,10 +1211,10 @@ add x0, x27, 1
 # CHECK-NEXT: Total Cycles:      507
 # CHECK-NEXT: Total uOps:        1500
 
-# CHECK:      Dispatch Width:    8
+# CHECK:      Dispatch Width:    3
 # CHECK-NEXT: uOps Per Cycle:    2.96
 # CHECK-NEXT: IPC:               1.97
-# CHECK-NEXT: Block RThroughput: 3.3
+# CHECK-NEXT: Block RThroughput: 5.0
 
 # CHECK:      Timeline view:
 # CHECK-NEXT:                     01
@@ -1222,14 +1222,14 @@ add x0, x27, 1
 
 # CHECK:      [0,0]     DeeeeeER  ..   ld1	{ v1.8b }, [x27], #8
 # CHECK-NEXT: [0,1]     D=eE---R  ..   add	x0, x27, #1
-# CHECK-NEXT: [0,2]     D=eeeeeER ..   ld1	{ v1.8h }, [x27], #16
-# CHECK-NEXT: [0,3]     D==eE---R ..   add	x0, x27, #1
-# CHECK-NEXT: [0,4]     D==eeeeeER..   ld1	{ v1.16b }, [x27], #16
-# CHECK-NEXT: [0,5]     .D==eE---R..   add	x0, x27, #1
-# CHECK-NEXT: [0,6]     .D==eeeeeER.   ld1	{ v1.1d }, [x27], x28
-# CHECK-NEXT: [0,7]     .D===eE---R.   add	x0, x27, #1
-# CHECK-NEXT: [0,8]     .D===eeeeeER   ld1	{ v1.2d }, [x27], x28
-# CHECK-NEXT: [0,9]     .D====eE---R   add	x0, x27, #1
+# CHECK-NEXT: [0,2]     .DeeeeeER ..   ld1	{ v1.8h }, [x27], #16
+# CHECK-NEXT: [0,3]     .D=eE---R ..   add	x0, x27, #1
+# CHECK-NEXT: [0,4]     . DeeeeeER..   ld1	{ v1.16b }, [x27], #16
+# CHECK-NEXT: [0,5]     . D=eE---R..   add	x0, x27, #1
+# CHECK-NEXT: [0,6]     .  DeeeeeER.   ld1	{ v1.1d }, [x27], x28
+# CHECK-NEXT: [0,7]     .  D=eE---R.   add	x0, x27, #1
+# CHECK-NEXT: [0,8]     .   DeeeeeER   ld1	{ v1.2d }, [x27], x28
+# CHECK-NEXT: [0,9]     .   D=eE---R   add	x0, x27, #1
 
 # CHECK:      Average Wait times (based on the timeline view):
 # CHECK-NEXT: [0]: Executions
@@ -1240,15 +1240,15 @@ add x0, x27, 1
 # CHECK:            [0]    [1]    [2]    [3]
 # CHECK-NEXT: 0.     1     1.0    1.0    0.0       ld1	{ v1.8b }, [x27], #8
 # CHECK-NEXT: 1.     1     2.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 2.     1     2.0    0.0    0.0       ld1	{ v1.8h }, [x27], #16
-# CHECK-NEXT: 3.     1     3.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 4.     1     3.0    0.0    0.0       ld1	{ v1.16b }, [x27], #16
-# CHECK-NEXT: 5.     1     3.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 6.     1     3.0    0.0    0.0       ld1	{ v1.1d }, [x27], x28
-# CHECK-NEXT: 7.     1     4.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 8.     1     4.0    0.0    0.0       ld1	{ v1.2d }, [x27], x28
-# CHECK-NEXT: 9.     1     5.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT:        1     3.0    0.1    1.5       <total>
+# CHECK-NEXT: 2.     1     1.0    0.0    0.0       ld1	{ v1.8h }, [x27], #16
+# CHECK-NEXT: 3.     1     2.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 4.     1     1.0    0.0    0.0       ld1	{ v1.16b }, [x27], #16
+# CHECK-NEXT: 5.     1     2.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 6.     1     1.0    0.0    0.0       ld1	{ v1.1d }, [x27], x28
+# CHECK-NEXT: 7.     1     2.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 8.     1     1.0    0.0    0.0       ld1	{ v1.2d }, [x27], x28
+# CHECK-NEXT: 9.     1     2.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT:        1     1.5    0.1    1.5       <total>
 
 # CHECK:      [2] Code Region - G03
 
@@ -1257,10 +1257,10 @@ add x0, x27, 1
 # CHECK-NEXT: Total Cycles:      507
 # CHECK-NEXT: Total uOps:        1500
 
-# CHECK:      Dispatch Width:    8
+# CHECK:      Dispatch Width:    3
 # CHECK-NEXT: uOps Per Cycle:    2.96
 # CHECK-NEXT: IPC:               1.97
-# CHECK-NEXT: Block RThroughput: 3.3
+# CHECK-NEXT: Block RThroughput: 5.0
 
 # CHECK:      Timeline view:
 # CHECK-NEXT:                     01
@@ -1268,14 +1268,14 @@ add x0, x27, 1
 
 # CHECK:      [0,0]     DeeeeeER  ..   ld1	{ v1.2s }, [x27], x28
 # CHECK-NEXT: [0,1]     D=eE---R  ..   add	x0, x27, #1
-# CHECK-NEXT: [0,2]     D=eeeeeER ..   ld1	{ v1.4h }, [x27], x28
-# CHECK-NEXT: [0,3]     D==eE---R ..   add	x0, x27, #1
-# CHECK-NEXT: [0,4]     D==eeeeeER..   ld1	{ v1.4s }, [x27], x28
-# CHECK-NEXT: [0,5]     .D==eE---R..   add	x0, x27, #1
-# CHECK-NEXT: [0,6]     .D==eeeeeER.   ld1	{ v1.8b }, [x27], x28
-# CHECK-NEXT: [0,7]     .D===eE---R.   add	x0, x27, #1
-# CHECK-NEXT: [0,8]     .D===eeeeeER   ld1	{ v1.8h }, [x27], x28
-# CHECK-NEXT: [0,9]     .D====eE---R   add	x0, x27, #1
+# CHECK-NEXT: [0,2]     .DeeeeeER ..   ld1	{ v1.4h }, [x27], x28
+# CHECK-NEXT: [0,3]     .D=eE---R ..   add	x0, x27, #1
+# CHECK-NEXT: [0,4]     . DeeeeeER..   ld1	{ v1.4s }, [x27], x28
+# CHECK-NEXT: [0,5]     . D=eE---R..   add	x0, x27, #1
+# CHECK-NEXT: [0,6]     .  DeeeeeER.   ld1	{ v1.8b }, [x27], x28
+# CHECK-NEXT: [0,7]     .  D=eE---R.   add	x0, x27, #1
+# CHECK-NEXT: [0,8]     .   DeeeeeER   ld1	{ v1.8h }, [x27], x28
+# CHECK-NEXT: [0,9]     .   D=eE---R   add	x0, x27, #1
 
 # CHECK:      Average Wait times (based on the timeline view):
 # CHECK-NEXT: [0]: Executions
@@ -1286,42 +1286,42 @@ add x0, x27, 1
 # CHECK:            [0]    [1]    [2]    [3]
 # CHECK-NEXT: 0.     1     1.0    1.0    0.0       ld1	{ v1.2s }, [x27], x28
 # CHECK-NEXT: 1.     1     2.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 2.     1     2.0    0.0    0.0       ld1	{ v1.4h }, [x27], x28
-# CHECK-NEXT: 3.     1     3.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 4.     1     3.0    0.0    0.0       ld1	{ v1.4s }, [x27], x28
-# CHECK-NEXT: 5.     1     3.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 6.     1     3.0    0.0    0.0       ld1	{ v1.8b }, [x27], x28
-# CHECK-NEXT: 7.     1     4.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 8.     1     4.0    0.0    0.0       ld1	{ v1.8h }, [x27], x28
-# CHECK-NEXT: 9.     1     5.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT:        1     3.0    0.1    1.5       <total>
+# CHECK-NEXT: 2.     1     1.0    0.0    0.0       ld1	{ v1.4h }, [x27], x28
+# CHECK-NEXT: 3.     1     2.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 4.     1     1.0    0.0    0.0       ld1	{ v1.4s }, [x27], x28
+# CHECK-NEXT: 5.     1     2.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 6.     1     1.0    0.0    0.0       ld1	{ v1.8b }, [x27], x28
+# CHECK-NEXT: 7.     1     2.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 8.     1     1.0    0.0    0.0       ld1	{ v1.8h }, [x27], x28
+# CHECK-NEXT: 9.     1     2.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT:        1     1.5    0.1    1.5       <total>
 
 # CHECK:      [3] Code Region - G04
 
 # CHECK:      Iterations:        100
 # CHECK-NEXT: Instructions:      1000
-# CHECK-NEXT: Total Cycles:      507
+# CHECK-NEXT: Total Cycles:      906
 # CHECK-NEXT: Total uOps:        1900
 
-# CHECK:      Dispatch Width:    8
-# CHECK-NEXT: uOps Per Cycle:    3.75
-# CHECK-NEXT: IPC:               1.97
-# CHECK-NEXT: Block RThroughput: 4.5
+# CHECK:      Dispatch Width:    3
+# CHECK-NEXT: uOps Per Cycle:    2.10
+# CHECK-NEXT: IPC:               1.10
+# CHECK-NEXT: Block RThroughput: 6.3
 
 # CHECK:      Timeline view:
-# CHECK-NEXT:                     01
+# CHECK-NEXT:                     01234
 # CHECK-NEXT: Index     0123456789
 
-# CHECK:      [0,0]     DeeeeeER  ..   ld1	{ v1.16b }, [x27], x28
-# CHECK-NEXT: [0,1]     D=eE---R  ..   add	x0, x27, #1
-# CHECK-NEXT: [0,2]     D=eeeeeER ..   ld1	{ v1.1d, v2.1d }, [x27], #16
-# CHECK-NEXT: [0,3]     D==eE---R ..   add	x0, x27, #1
-# CHECK-NEXT: [0,4]     .D=eeeeeER..   ld1	{ v1.2d, v2.2d }, [x27], #32
-# CHECK-NEXT: [0,5]     .D==eE---R..   add	x0, x27, #1
-# CHECK-NEXT: [0,6]     .D==eeeeeER.   ld1	{ v1.2s, v2.2s }, [x27], #16
-# CHECK-NEXT: [0,7]     .D===eE---R.   add	x0, x27, #1
-# CHECK-NEXT: [0,8]     . D==eeeeeER   ld1	{ v1.4h, v2.4h }, [x27], #16
-# CHECK-NEXT: [0,9]     . D===eE---R   add	x0, x27, #1
+# CHECK:      [0,0]     DeeeeeER  .   .   ld1	{ v1.16b }, [x27], x28
+# CHECK-NEXT: [0,1]     D=eE---R  .   .   add	x0, x27, #1
+# CHECK-NEXT: [0,2]     .DeeeeeER .   .   ld1	{ v1.1d, v2.1d }, [x27], #16
+# CHECK-NEXT: [0,3]     . DeE---R .   .   add	x0, x27, #1
+# CHECK-NEXT: [0,4]     .  DeeeeeER   .   ld1	{ v1.2d, v2.2d }, [x27], #32
+# CHECK-NEXT: [0,5]     .   DeE---R   .   add	x0, x27, #1
+# CHECK-NEXT: [0,6]     .    DeeeeeER .   ld1	{ v1.2s, v2.2s }, [x27], #16
+# CHECK-NEXT: [0,7]     .    .DeE---R .   add	x0, x27, #1
+# CHECK-NEXT: [0,8]     .    . DeeeeeER   ld1	{ v1.4h, v2.4h }, [x27], #16
+# CHECK-NEXT: [0,9]     .    .  DeE---R   add	x0, x27, #1
 
 # CHECK:      Average Wait times (based on the timeline view):
 # CHECK-NEXT: [0]: Executions
@@ -1332,42 +1332,42 @@ add x0, x27, 1
 # CHECK:            [0]    [1]    [2]    [3]
 # CHECK-NEXT: 0.     1     1.0    1.0    0.0       ld1	{ v1.16b }, [x27], x28
 # CHECK-NEXT: 1.     1     2.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 2.     1     2.0    0.0    0.0       ld1	{ v1.1d, v2.1d }, [x27], #16
-# CHECK-NEXT: 3.     1     3.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 4.     1     2.0    0.0    0.0       ld1	{ v1.2d, v2.2d }, [x27], #32
-# CHECK-NEXT: 5.     1     3.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 6.     1     3.0    0.0    0.0       ld1	{ v1.2s, v2.2s }, [x27], #16
-# CHECK-NEXT: 7.     1     4.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 8.     1     3.0    0.0    0.0       ld1	{ v1.4h, v2.4h }, [x27], #16
-# CHECK-NEXT: 9.     1     4.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT:        1     2.7    0.1    1.5       <total>
+# CHECK-NEXT: 2.     1     1.0    0.0    0.0       ld1	{ v1.1d, v2.1d }, [x27], #16
+# CHECK-NEXT: 3.     1     1.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 4.     1     1.0    1.0    0.0       ld1	{ v1.2d, v2.2d }, [x27], #32
+# CHECK-NEXT: 5.     1     1.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 6.     1     1.0    1.0    0.0       ld1	{ v1.2s, v2.2s }, [x27], #16
+# CHECK-NEXT: 7.     1     1.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 8.     1     1.0    1.0    0.0       ld1	{ v1.4h, v2.4h }, [x27], #16
+# CHECK-NEXT: 9.     1     1.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT:        1     1.1    0.4    1.5       <total>
 
 # CHECK:      [4] Code Region - G05
 
 # CHECK:      Iterations:        100
 # CHECK-NEXT: Instructions:      1000
-# CHECK-NEXT: Total Cycles:      507
+# CHECK-NEXT: Total Cycles:      1006
 # CHECK-NEXT: Total uOps:        2000
 
-# CHECK:      Dispatch Width:    8
-# CHECK-NEXT: uOps Per Cycle:    3.94
-# CHECK-NEXT: IPC:               1.97
-# CHECK-NEXT: Block RThroughput: 5.0
+# CHECK:      Dispatch Width:    3
+# CHECK-NEXT: uOps Per Cycle:    1.99
+# CHECK-NEXT: IPC:               0.99
+# CHECK-NEXT: Block RThroughput: 6.7
 
 # CHECK:      Timeline view:
-# CHECK-NEXT:                     01
+# CHECK-NEXT:                     012345
 # CHECK-NEXT: Index     0123456789
 
-# CHECK:      [0,0]     DeeeeeER  ..   ld1	{ v1.4s, v2.4s }, [x27], #32
-# CHECK-NEXT: [0,1]     D=eE---R  ..   add	x0, x27, #1
-# CHECK-NEXT: [0,2]     D=eeeeeER ..   ld1	{ v1.8b, v2.8b }, [x27], #16
-# CHECK-NEXT: [0,3]     D==eE---R ..   add	x0, x27, #1
-# CHECK-NEXT: [0,4]     .D=eeeeeER..   ld1	{ v1.8h, v2.8h }, [x27], #32
-# CHECK-NEXT: [0,5]     .D==eE---R..   add	x0, x27, #1
-# CHECK-NEXT: [0,6]     .D==eeeeeER.   ld1	{ v1.16b, v2.16b }, [x27], #32
-# CHECK-NEXT: [0,7]     .D===eE---R.   add	x0, x27, #1
-# CHECK-NEXT: [0,8]     . D==eeeeeER   ld1	{ v1.1d, v2.1d }, [x27], x28
-# CHECK-NEXT: [0,9]     . D===eE---R   add	x0, x27, #1
+# CHECK:      [0,0]     DeeeeeER  .    .   ld1	{ v1.4s, v2.4s }, [x27], #32
+# CHECK-NEXT: [0,1]     .DeE---R  .    .   add	x0, x27, #1
+# CHECK-NEXT: [0,2]     . DeeeeeER.    .   ld1	{ v1.8b, v2.8b }, [x27], #16
+# CHECK-NEXT: [0,3]     .  DeE---R.    .   add	x0, x27, #1
+# CHECK-NEXT: [0,4]     .   DeeeeeER   .   ld1	{ v1.8h, v2.8h }, [x27], #32
+# CHECK-NEXT: [0,5]     .    DeE---R   .   add	x0, x27, #1
+# CHECK-NEXT: [0,6]     .    .DeeeeeER .   ld1	{ v1.16b, v2.16b }, [x27], #32
+# CHECK-NEXT: [0,7]     .    . DeE---R .   add	x0, x27, #1
+# CHECK-NEXT: [0,8]     .    .  DeeeeeER   ld1	{ v1.1d, v2.1d }, [x27], x28
+# CHECK-NEXT: [0,9]     .    .   DeE---R   add	x0, x27, #1
 
 # CHECK:      Average Wait times (based on the timeline view):
 # CHECK-NEXT: [0]: Executions
@@ -1377,43 +1377,43 @@ add x0, x27, 1
 
 # CHECK:            [0]    [1]    [2]    [3]
 # CHECK-NEXT: 0.     1     1.0    1.0    0.0       ld1	{ v1.4s, v2.4s }, [x27], #32
-# CHECK-NEXT: 1.     1     2.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 2.     1     2.0    0.0    0.0       ld1	{ v1.8b, v2.8b }, [x27], #16
-# CHECK-NEXT: 3.     1     3.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 4.     1     2.0    0.0    0.0       ld1	{ v1.8h, v2.8h }, [x27], #32
-# CHECK-NEXT: 5.     1     3.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 6.     1     3.0    0.0    0.0       ld1	{ v1.16b, v2.16b }, [x27], #32
-# CHECK-NEXT: 7.     1     4.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT: 8.     1     3.0    0.0    0.0       ld1	{ v1.1d, v2.1d }, [x27], x28
-# CHECK-NEXT: 9.     1     4.0    0.0    3.0       add	x0, x27, #1
-# CHECK-NEXT:        1     2.7    0.1    1.5       <total>
+# CHECK-NEXT: 1.     1     1.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 2.     1     1.0    1.0    0.0       ld1	{ v1.8b, v2.8b }, [x27], #16
+# CHECK-NEXT: 3.     1     1.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 4.     1     1.0    1.0    0.0       ld1	{ v1.8h, v2.8h }, [x27], #32
+# CHECK-NEXT: 5.     1     1.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 6.     1     1.0    1.0    0.0       ld1	{ v1.16b, v2.16b }, [x27], #32
+# CHECK-NEXT: 7.     1     1.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT: 8.     1     1.0    1.0    0.0       ld1	{ v1.1d, v2.1d }, [x27], x28
+# CHECK-NEXT: 9.     1     1.0    0.0    3.0       add	x0, x27, #1
+# CHECK-NEXT:        1     1.0    0.5    1.5       <total>
 
 # CHECK:      [5] Code Region - G06
 
 # CHECK:      Iterations:        100
 # CHECK-NEXT: Instructions:      1000
-# CHECK-NEXT: Total Cycles:      507
+# CHECK-NEXT: Total Cycles:      1006
 # CHECK-NEXT: Total uOps:        2000
 
-# CHECK:      Dispatch Width:    8
-# CHECK-NEXT: uOps Per Cycle:    3.94
-# CHECK-NEXT: IPC:               1.97
-# CHECK-NEXT: Block RThroughput: 5.0
+# CHECK:      Dispatch Width:    3
+# CHECK-NEXT: uOps Per Cycle:    1.99
+# CHECK-NEXT: IPC:               0.99
+# CHECK-NEXT: Block RThroughput: 6.7
 
 # CHECK:      Timeline view:
-# CHECK-NEXT:                     01
+# CHECK-NEXT:                     012345
 # CHECK-NEXT: Index     0123456789
 
-# CHECK:      [0,0]     DeeeeeER  ..   ld1	{ v1.2d, v2.2d }, [x27], x28
-# CHECK-NEXT: [0,1]     D=eE---R  .. ...
[truncated]
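
The changed `Block RThroughput`, `uOps Per Cycle` and `IPC` lines in the N1 test above can be checked by hand. The following is a minimal sketch, assuming (as the llvm-mca documentation describes) that block reciprocal throughput is bounded by both the dispatch rate and hardware resource availability. The helper names `block_rthroughput` and `per_cycle` are hypothetical, and the old value reported at width 8 is used as a stand-in for the resource-limited bound, which is an assumption rather than something stated in the patch.

```python
def block_rthroughput(uops_per_iter, dispatch_width, resource_bound):
    """Cycles per iteration: the larger of the dispatch-limited and
    resource-limited estimates (hypothetical helper, not llvm-mca code)."""
    return max(uops_per_iter / dispatch_width, resource_bound)


def per_cycle(count, total_cycles):
    """Average count per simulated cycle, used for 'uOps Per Cycle' and 'IPC'."""
    return count / total_cycles


# (region, uOps per iteration, Block RThroughput reported at width 8)
# Values are taken from the CHECK lines in the diff above (100 iterations each).
regions = [("G01", 15, 3.3), ("G04", 19, 4.5), ("G05", 20, 5.0)]

for name, uops, old_bound in regions:
    new = block_rthroughput(uops, 3, old_bound)
    print(f"{name}: Block RThroughput {new:.1f}")
# G01: Block RThroughput 5.0
# G04: Block RThroughput 6.3
# G05: Block RThroughput 6.7

# G04 after the change: 1900 uOps and 1000 instructions in 906 cycles.
print(f"{per_cycle(1900, 906):.2f} {per_cycle(1000, 906):.2f}")  # 2.10 1.10
```

With a width of 3 the dispatch-limited term dominates in all three regions, which matches the new expected values in the test.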

davemgreen · 2025-08-20T11:35:34Z

This sounds sensible. What was the difference between 3 and 4 for neoverse-n1? A value of 4 would help match the SWOG (and #136374), and if there isn't much in it would make a better value I believe.

simonwallis2 · 2025-08-20T12:42:54Z

> This sounds sensible. What was the difference between 3 and 4 for neoverse-n1? A value of 4 would help match the SWOG (and #136374), and if there isn't much in it would make a better value I believe.

For neoverse-n1, with a value of 4 I saw some small amount of noise but no overall improvement over the original value of 8.

c-rhodes · 2025-08-20T14:54:41Z

llvm/lib/Target/AArch64/AArch64SchedNeoverseN1.td


 def NeoverseN1Model : SchedMachineModel {
-  let IssueWidth            =   8; // Maximum micro-ops dispatch rate.
+  let IssueWidth            =   3; // Maximum micro-ops dispatch rate.


Can we update these comments to match the V2 scheduler, following this discussion: #142565 (comment)?

simonwallis2 · 2025-08-21

Sure. Done for AArch64SchedNeoverseN1.td and AArch64SchedNeoverseV1.td.

david-arm · 2025-08-21T09:10:37Z

Thanks for this! The commit message says "The highest overall geomean score was achieved", but I think it would help anyone attempting to reproduce this to know which score or benchmarks were analysed.
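
For readers unfamiliar with the term, a geomean score is the geometric mean of per-workload ratios. The sketch below is illustrative only: the function name `geomean` is hypothetical and the ratios are made-up placeholders, not measurements from this PR, whose workloads are not named in the thread.

```python
import math


def geomean(ratios):
    """Geometric mean of positive ratios (e.g. new_score / baseline_score)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Placeholder ratios, NOT data from this PR: one workload doubles in score,
# another halves. The geometric mean treats these as cancelling out, while
# the arithmetic mean would report a 25% gain.
ratios = [2.0, 0.5]
print(round(geomean(ratios), 6))          # 1.0
print(sum(ratios) / len(ratios))          # 1.25
```

This is why ratio-based benchmark summaries usually use the geometric rather than the arithmetic mean.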

davemgreen

I would go with an IssueWidth of 4 for NeoverseN1, as I believe that matches reality better, but either way this LGTM.

c-rhodes

Also LGTM, cheers

vvereschaka · 2025-09-17T17:39:11Z

@simonwallis2,

https://lab.llvm.org/buildbot/#/builders/193/builds/10666

broken tests on the cross builder:

******************** TEST 'LLVM :: tools/llvm-mca/AArch64/Neoverse/V1-misc-instructions.s' FAILED ********************
Exit Code: 1
Command Output (stdout):
--
# RUN: at line 2
c:\buildbot\as-builder-2\x-aarch64\build\bin\llvm-mca.exe -mtriple=aarch64 -mcpu=neoverse-v1 -instruction-tables < C:\buildbot\as-builder-2\x-aarch64\llvm-project\llvm\test\tools\llvm-mca\AArch64\Neoverse\V1-misc-instructions.s | c:\buildbot\as-builder-2\x-aarch64\build\bin\filecheck.exe C:\buildbot\as-builder-2\x-aarch64\llvm-project\llvm\test\tools\llvm-mca\AArch64\Neoverse\V1-misc-instructions.s
# executed command: 'c:\buildbot\as-builder-2\x-aarch64\build\bin\llvm-mca.exe' -mtriple=aarch64 -mcpu=neoverse-v1 -instruction-tables
# executed command: 'c:\buildbot\as-builder-2\x-aarch64\build\bin\filecheck.exe' 'C:\buildbot\as-builder-2\x-aarch64\llvm-project\llvm\test\tools\llvm-mca\AArch64\Neoverse\V1-misc-instructions.s'
# .---command stderr------------
# | C:\buildbot\as-builder-2\x-aarch64\llvm-project\llvm\test\tools\llvm-mca\AArch64\Neoverse\V1-misc-instructions.s:33:15: error: CHECK-NEXT: expected string not found in input
# | # CHECK-NEXT: 1 1 0.12 U at s12e1r, x28
# |               ^
# | <stdin>:11:38: note: scanning from here
# | [1] [2] [3] [4] [5] [6] Instructions:
# |                                      ^
# | <stdin>:12:2: note: possible intended match here
# |  1 1 0.13 U at s12e1r, x28
# |  ^
# | 
# | Input file: <stdin>
# | Check file: C:\buildbot\as-builder-2\x-aarch64\llvm-project\llvm\test\tools\llvm-mca\AArch64\Neoverse\V1-misc-instructions.s
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |            .
# |            .
# |            .
# |            6: [3]: RThroughput 
# |            7: [4]: MayLoad 
# |            8: [5]: MayStore 
# |            9: [6]: HasSideEffects (U) 
# |           10:  
# |           11: [1] [2] [3] [4] [5] [6] Instructions: 
# | next:33'0                                          X error: no match found
# |           12:  1 1 0.13 U at s12e1r, x28 
# | next:33'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~
# | next:33'1      ?                          possible intended match
# |           13:  1 1 0.13 U brk #0x8415 
# | next:33'0     ~~~~~~~~~~~~~~~~~~~~~~~~
# |           14:  1 1 0.13 * * U clrex 
# | next:33'0     ~~~~~~~~~~~~~~~~~~~~~~
# |           15:  1 1 0.13 * * U csdb 
# | next:33'0     ~~~~~~~~~~~~~~~~~~~~~
# |           16:  1 1 0.13 U dcps1 
# | next:33'0     ~~~~~~~~~~~~~~~~~~
# |           17:  1 1 0.13 U dcps2 
# | next:33'0     ~~~~~~~~~~~~~~~~~~
# |            .
# |            .
# |            .
# | >>>>>>
# `-----------------------------
# error: command failed with exit status: 1
--
********************

would you take care of it?

simonwallis2 · 2025-09-18T08:57:21Z

@vvereschaka Thanks for highlighting this.

All 3 of the failures reported by buildbot 193 in builds 10666 onwards are similar:
#CHECK-NEXT: 1 1 0.12 U 1 MRS mrs x3, ID_AA64ZFR0_EL1
Note: possible intended match here
#CHECK-NEXT: 1 1 0.13 U 1 MRS mrs x3, ID_AA64ZFR0_EL1

For builds 10665 and earlier, the IssueWidth for V1 was 15, so the reciprocal throughput 1/15 was 0.07 to 2 decimal places.
From build 10666, after PR154495 changed the IssueWidth to 8, the reciprocal throughput is 1/8 or 0.125.

The test case looks for 0.12. It has rounded 0.125 down. This is what I see on the various Linux hosts I tried.
The llvm-clang-win-x-aarch64 buildbot sees 0.13. It has rounded 0.125 up.

So we have inconsistent llvm-mca behaviour between this buildbot and other machines. I did not expect that.

As a short-term hack to get this buildbot green again, it would be possible to modify these 3 test cases to expect rounding up when the host OS is Windows, if that is the root of the differences.
But it would be better to make llvm-mca behave consistently.
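
The value 0.125 is an exact tie between 0.12 and 0.13, so the printed result depends on which tie-breaking rule the formatter applies. The sketch below uses Python's `decimal` module to show both rules. Attributing one rule to Linux hosts and the other to the Windows builder is an assumption consistent with the outputs reported above; the thread does not confirm the underlying cause.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

rt = 1 / 8  # reciprocal throughput for IssueWidth = 8

# 0.125 is exactly representable in binary floating point, so the tie is
# real and not an artifact of representation error.
assert Decimal(rt) == Decimal("0.125")

two_places = Decimal("0.01")
half_even = Decimal(rt).quantize(two_places, rounding=ROUND_HALF_EVEN)
half_up = Decimal(rt).quantize(two_places, rounding=ROUND_HALF_UP)
print(half_even, half_up)  # 0.12 0.13

# With the old IssueWidth of 15 there is no tie, so both rules agree.
old = Decimal(1 / 15)
print(old.quantize(two_places, rounding=ROUND_HALF_EVEN),
      old.quantize(two_places, rounding=ROUND_HALF_UP))  # 0.07 0.07
```

This also explains why the discrepancy only surfaced after the V1 IssueWidth changed from 15 to 8.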

simonwallis2 · 2025-09-18T09:16:25Z

In llvm/tools/llvm-mca, there are 10 instances of
format("%.2f", …)

9 of these instances explicitly round up, using
floor((X * 100) + 0.5) / 100

This RT output is the 10th instance.
llvm-mca/InstructionInfoView.cpp should be modified to match the other 9, and the test cases updated to match the new rounded-up value.
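
The effect of the explicit rounding expression can be sketched outside llvm-mca. This is a Python model of the arithmetic only, not the actual C++ change; the helper name `round_up_2dp` is hypothetical.

```python
import math


def round_up_2dp(x):
    """Round half up to two decimal places before formatting, mirroring
    floor((X * 100) + 0.5) / 100."""
    return math.floor(x * 100 + 0.5) / 100


rt = 1 / 8  # reciprocal throughput for IssueWidth = 8

# Python's own "%.2f" resolves the exact tie 0.125 to the even digit.
print("%.2f" % rt)                # 0.12

# After explicit rounding the value is no longer a tie, so the formatted
# result does not depend on the formatter's tie-breaking rule.
print("%.2f" % round_up_2dp(rt))  # 0.13

# Non-tie values are unaffected, e.g. the old IssueWidth of 15.
print("%.2f" % round_up_2dp(1 / 15))  # 0.07
```

Because the pre-rounded value sits well away from a two-decimal tie, every conforming formatter should print the same digits for it.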

Explicitly round up the reciprocal calculation, so that .125 is displayed as 0.13 consistently across all hosts. Fix buildbot failure https://lab.llvm.org/buildbot/#/builders/193/builds/10666 since #154495

llvmbot added the backend:AArch64 label Aug 20, 2025

simonwallis2 requested review from c-rhodes, davemgreen, david-arm, ebahapo and sjoerdmeijer August 20, 2025 10:20

c-rhodes reviewed Aug 20, 2025

davemgreen approved these changes Sep 3, 2025

c-rhodes approved these changes Sep 3, 2025

simonwallis2 merged commit a044d61 into llvm:main Sep 17, 2025
9 checks passed

simonwallis2 mentioned this pull request Sep 18, 2025

[llvm-mca] Round UP when formatting Reciprocal Throughput #159544

Merged
