[ARM] Test unroll behaviour on machines with low overhead branching #118692

VladiKrapp-Arm · 2024-12-04T20:10:19Z

Add test for existing loop unroll behaviour.

Current behaviour is the single loop with fmul gets runtime unrolled by
count of 4, with the loop remainder unrolled as the 3 for.body9.us.prol
sections. This is quite a lot of compare and branch, negating the
benefits of the low overhead loop mechanism.

llvmbot · 2024-12-04T20:10:57Z

@llvm/pr-subscribers-llvm-transforms

Author: None (VladiKrapp-Arm)

Changes

Add test for existing loop unroll behaviour.

Full diff: https://github.com/llvm/llvm-project/pull/118692.diff

1 Files Affected:

(added) llvm/test/Transforms/LoopUnroll/ARM/lob-unroll.ll (+63)

diff --git a/llvm/test/Transforms/LoopUnroll/ARM/lob-unroll.ll b/llvm/test/Transforms/LoopUnroll/ARM/lob-unroll.ll
new file mode 100644
index 00000000000000..b155f5d31045f9
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/ARM/lob-unroll.ll
@@ -0,0 +1,63 @@
+; RUN: opt -mcpu=cortex-m55 -mtriple=thumbv8.1m.main -passes=loop-unroll -S  %s -o - | FileCheck %s --check-prefix=LOB
+
+; This test checks behaviour of loop unrolling on processors with low overhead branching available 
+
+; LOB-CHECK-LABEL: for.body{{.*}}.prol
+; LOB-COUNT-1:     fmul fast float 
+; LOB-CHECK-LABEL: for.body{{.*}}.prol.1
+; LOB-COUNT-1:     fmul fast float 
+; LOB-CHECK-LABEL: for.body{{.*}}.prol.2
+; LOB-COUNT-1:     fmul fast float 
+; LOB-CHECK-LABEL: for.body{{.*}}
+; LOB-COUNT-4:     fmul fast float 
+; LOB-NOT:     fmul fast float 
+
+; Function Attrs: nofree norecurse nosync nounwind memory(argmem: readwrite)
+define dso_local void @test(i32 noundef %n, ptr nocapture noundef %pA) local_unnamed_addr #0 {
+entry:
+  %cmp46 = icmp sgt i32 %n, 0
+  br i1 %cmp46, label %for.body, label %for.cond.cleanup
+
+for.cond.loopexit:                                ; preds = %for.cond6.for.cond.cleanup8_crit_edge.us, %for.body
+  %exitcond49.not = icmp eq i32 %add, %n
+  br i1 %exitcond49.not, label %for.cond.cleanup, label %for.body
+
+for.cond.cleanup:                                 ; preds = %for.cond.loopexit, %entry
+  ret void
+
+for.body:                                         ; preds = %entry, %for.cond.loopexit
+  %k.047 = phi i32 [ %add, %for.cond.loopexit ], [ 0, %entry ]
+  %add = add nuw nsw i32 %k.047, 1
+  %cmp244 = icmp slt i32 %add, %n
+  br i1 %cmp244, label %for.cond6.preheader.lr.ph, label %for.cond.loopexit
+
+for.cond6.preheader.lr.ph:                        ; preds = %for.body
+  %invariant.gep = getelementptr float, ptr %pA, i32 %k.047
+  br label %for.cond6.preheader.us
+
+for.cond6.preheader.us:                           ; preds = %for.cond6.for.cond.cleanup8_crit_edge.us, %for.cond6.preheader.lr.ph
+  %w.045.us = phi i32 [ %add, %for.cond6.preheader.lr.ph ], [ %inc19.us, %for.cond6.for.cond.cleanup8_crit_edge.us ]
+  %mul.us = mul nuw nsw i32 %w.045.us, %n
+  %0 = getelementptr float, ptr %pA, i32 %mul.us
+  %arrayidx.us = getelementptr float, ptr %0, i32 %k.047
+  br label %for.body9.us
+
+for.body9.us:                                     ; preds = %for.cond6.preheader.us, %for.body9.us
+  %x.043.us = phi i32 [ %add, %for.cond6.preheader.us ], [ %inc.us, %for.body9.us ]
+  %1 = load float, ptr %arrayidx.us, align 4
+  %mul11.us = mul nuw nsw i32 %x.043.us, %n
+  %gep.us = getelementptr float, ptr %invariant.gep, i32 %mul11.us
+  %2 = load float, ptr %gep.us, align 4
+  %mul14.us = fmul fast float %2, %1
+  %arrayidx17.us = getelementptr float, ptr %0, i32 %x.043.us
+  store float %mul14.us, ptr %arrayidx17.us, align 4
+  %inc.us = add nuw nsw i32 %x.043.us, 1
+  %exitcond.not = icmp eq i32 %inc.us, %n
+  br i1 %exitcond.not, label %for.cond6.for.cond.cleanup8_crit_edge.us, label %for.body9.us
+
+for.cond6.for.cond.cleanup8_crit_edge.us:         ; preds = %for.body9.us
+  %inc19.us = add nuw nsw i32 %w.045.us, 1
+  %exitcond48.not = icmp eq i32 %inc19.us, %n
+  br i1 %exitcond48.not, label %for.cond.loopexit, label %for.cond6.preheader.us
+}
+

nasherm

LGTM. You might someone else to also look at it though before merging

stuij

I'd be a bit more descriptive on why you're putting this test in: you're planning to put up a patch for X that will change this behaviour in what way.

Current behaviour is the single loop with fmul gets runtime unrolled by count of 4, with the loop remainder unrolled as the 3 for.body9.us.prol sections. This is quite a lot of compare and branch, negating the benefits of the low overhead loop mechanism.

VladiKrapp-Arm · 2024-12-05T11:21:32Z

@stuij : I've updated the commit message (and the first comment on the PR)

llvmbot added the llvm:transforms label Dec 4, 2024

nasherm approved these changes Dec 5, 2024

View reviewed changes

stuij reviewed Dec 5, 2024

View reviewed changes

VladiKrapp-Arm force-pushed the add_lob_unroll_test branch from 44d5944 to e518dde Compare December 5, 2024 11:19

stuij approved these changes Dec 5, 2024

View reviewed changes

jthackray merged commit bb3eb0c into llvm:main Dec 6, 2024
8 checks passed

VladiKrapp-Arm mentioned this pull request Dec 6, 2024

Request Commit Access For VladiKrapp-Arm #118998

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[ARM] Test unroll behaviour on machines with low overhead branching #118692

[ARM] Test unroll behaviour on machines with low overhead branching #118692

Uh oh!

VladiKrapp-Arm commented Dec 4, 2024 •

edited

Loading

Uh oh!

llvmbot commented Dec 4, 2024

Uh oh!

nasherm left a comment

Uh oh!

stuij left a comment

Uh oh!

VladiKrapp-Arm commented Dec 5, 2024

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

[ARM] Test unroll behaviour on machines with low overhead branching #118692

[ARM] Test unroll behaviour on machines with low overhead branching #118692

Uh oh!

Conversation

VladiKrapp-Arm commented Dec 4, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Dec 4, 2024

Uh oh!

nasherm left a comment

Choose a reason for hiding this comment

Uh oh!

stuij left a comment

Choose a reason for hiding this comment

Uh oh!

VladiKrapp-Arm commented Dec 5, 2024

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

VladiKrapp-Arm commented Dec 4, 2024 •

edited

Loading