Skip to content

[AVR] Fix Avr indvar detection and strength reduction (missed optimization) #152028

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Aug 9, 2025

Conversation

tomtor
Copy link
Contributor

@tomtor tomtor commented Aug 4, 2025

Fix #151080

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Aug 4, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 4, 2025

@llvm/pr-subscribers-clang

Author: Tom Vijlbrief (tomtor)

Changes

Fix #151080


Full diff: https://github.com/llvm/llvm-project/pull/152028.diff

7 Files Affected:

  • (modified) clang/lib/Basic/Targets/AVR.h (+2-1)
  • (modified) llvm/lib/Target/AVR/AVRTargetMachine.cpp (+9-2)
  • (modified) llvm/lib/Target/AVR/AVRTargetMachine.h (+2)
  • (added) llvm/lib/Target/AVR/AVRTargetTransformInfo.h (+68)
  • (modified) llvm/test/CodeGen/AVR/bug-143247.ll (+2-2)
  • (modified) llvm/test/CodeGen/AVR/load.ll (+17-17)
  • (modified) llvm/test/CodeGen/AVR/shift.ll (+1-1)
diff --git a/clang/lib/Basic/Targets/AVR.h b/clang/lib/Basic/Targets/AVR.h
index 75c969fd59dc9..11aa844f2ce55 100644
--- a/clang/lib/Basic/Targets/AVR.h
+++ b/clang/lib/Basic/Targets/AVR.h
@@ -57,7 +57,8 @@ class LLVM_LIBRARY_VISIBILITY AVRTargetInfo : public TargetInfo {
     Int16Type = SignedInt;
     Char32Type = UnsignedLong;
     SigAtomicType = SignedChar;
-    resetDataLayout("e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8");
+    resetDataLayout(
+        "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-n16:8-a:8");
   }
 
   void getTargetDefines(const LangOptions &Opts,
diff --git a/llvm/lib/Target/AVR/AVRTargetMachine.cpp b/llvm/lib/Target/AVR/AVRTargetMachine.cpp
index b75417a0896a5..8d05005e287f4 100644
--- a/llvm/lib/Target/AVR/AVRTargetMachine.cpp
+++ b/llvm/lib/Target/AVR/AVRTargetMachine.cpp
@@ -19,6 +19,7 @@
 
 #include "AVR.h"
 #include "AVRMachineFunctionInfo.h"
+#include "AVRTargetTransformInfo.h"
 #include "AVRTargetObjectFile.h"
 #include "MCTargetDesc/AVRMCTargetDesc.h"
 #include "TargetInfo/AVRTargetInfo.h"
@@ -28,7 +29,7 @@
 namespace llvm {
 
 static const char *AVRDataLayout =
-    "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8";
+    "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-n16:8-a:8";
 
 /// Processes a CPU name.
 static StringRef getCPU(StringRef CPU) {
@@ -62,7 +63,9 @@ namespace {
 class AVRPassConfig : public TargetPassConfig {
 public:
   AVRPassConfig(AVRTargetMachine &TM, PassManagerBase &PM)
-      : TargetPassConfig(TM, PM) {}
+      : TargetPassConfig(TM, PM) {
+    EnableLoopTermFold = true;
+  }
 
   AVRTargetMachine &getAVRTargetMachine() const {
     return getTM<AVRTargetMachine>();
@@ -107,6 +110,10 @@ const AVRSubtarget *AVRTargetMachine::getSubtargetImpl(const Function &) const {
   return &SubTarget;
 }
 
+TargetTransformInfo AVRTargetMachine::getTargetTransformInfo(const Function &F) const {
+  return TargetTransformInfo(std::make_unique<AVRTTIImpl>(this, F));
+}
+
 MachineFunctionInfo *AVRTargetMachine::createMachineFunctionInfo(
     BumpPtrAllocator &Allocator, const Function &F,
     const TargetSubtargetInfo *STI) const {
diff --git a/llvm/lib/Target/AVR/AVRTargetMachine.h b/llvm/lib/Target/AVR/AVRTargetMachine.h
index 167d0076e9581..9452b3d8cd8a5 100644
--- a/llvm/lib/Target/AVR/AVRTargetMachine.h
+++ b/llvm/lib/Target/AVR/AVRTargetMachine.h
@@ -48,6 +48,8 @@ class AVRTargetMachine : public CodeGenTargetMachineImpl {
   createMachineFunctionInfo(BumpPtrAllocator &Allocator, const Function &F,
                             const TargetSubtargetInfo *STI) const override;
 
+  TargetTransformInfo getTargetTransformInfo(const Function &F) const override;
+
   bool isNoopAddrSpaceCast(unsigned SrcAs, unsigned DestAs) const override {
     // While AVR has different address spaces, they are all represented by
     // 16-bit pointers that can be freely casted between (of course, a pointer
diff --git a/llvm/lib/Target/AVR/AVRTargetTransformInfo.h b/llvm/lib/Target/AVR/AVRTargetTransformInfo.h
new file mode 100644
index 0000000000000..b877918a18151
--- /dev/null
+++ b/llvm/lib/Target/AVR/AVRTargetTransformInfo.h
@@ -0,0 +1,68 @@
+//===- AVRTargetTransformInfo.h - AVR specific TTI ---------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+/// \file
+/// This file defines a TargetTransformInfoImplBase conforming object specific
+/// to the AVR target machine. It uses the target's detailed information to
+/// provide more precise answers to certain TTI queries, while letting the
+/// target independent and default TTI implementations handle the rest.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIB_TARGET_AVR_AVRTARGETTRANSFORMINFO_H
+#define LLVM_LIB_TARGET_AVR_AVRTARGETTRANSFORMINFO_H
+
+#include "AVRSubtarget.h"
+#include "AVRTargetMachine.h"
+#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/CodeGen/BasicTTIImpl.h"
+#include "llvm/IR/Function.h"
+#include <optional>
+
+namespace llvm {
+
+class AVRTTIImpl final : public BasicTTIImplBase<AVRTTIImpl> {
+  using BaseT = BasicTTIImplBase<AVRTTIImpl>;
+  using TTI = TargetTransformInfo;
+
+  friend BaseT;
+
+  const AVRSubtarget *ST;
+  const AVRTargetLowering *TLI;
+  const Function *currentF;
+
+  const AVRSubtarget *getST() const { return ST; }
+  const AVRTargetLowering *getTLI() const { return TLI; }
+
+public:
+  explicit AVRTTIImpl(const AVRTargetMachine *TM, const Function &F)
+      : BaseT(TM, F.getDataLayout()), ST(TM->getSubtargetImpl(F)),
+        TLI(ST->getTargetLowering()), currentF(&F) {}
+
+  bool isLSRCostLess(const TargetTransformInfo::LSRCost &C1,
+                     const TargetTransformInfo::LSRCost &C2) const override {
+    // Detect %incdec.ptr because loop-reduce loses them
+    for (const BasicBlock &BB : *currentF) {
+      if (BB.getName().find("while.body") != std::string::npos) {
+        for (const Instruction &I : BB) {
+          std::string str;
+          llvm::raw_string_ostream(str) << I;
+          if (str.find("%incdec.ptr") != std::string::npos)
+            return false;
+        }
+      }
+    }
+    if (C2.Insns == ~0u)
+      return true;
+    return 2 * C1.Insns + C1.AddRecCost + C1.SetupCost + C1.NumRegs <
+           2 * C2.Insns + C2.AddRecCost + C2.SetupCost + C2.NumRegs;
+  }
+};
+
+} // end namespace llvm
+
+#endif // LLVM_LIB_TARGET_AVR_AVRTARGETTRANSFORMINFO_H
diff --git a/llvm/test/CodeGen/AVR/bug-143247.ll b/llvm/test/CodeGen/AVR/bug-143247.ll
index 07c4c6562c950..d4493272af76d 100644
--- a/llvm/test/CodeGen/AVR/bug-143247.ll
+++ b/llvm/test/CodeGen/AVR/bug-143247.ll
@@ -8,18 +8,18 @@ define void @complex_sbi() {
 ; CHECK:       ; %bb.0: ; %entry
 ; CHECK-NEXT:    push r16
 ; CHECK-NEXT:    push r17
-; CHECK-NEXT:    ldi r24, 0
+; CHECK-NEXT:    ldi r24, 1
 ; CHECK-NEXT:    ldi r25, 0
 ; CHECK-NEXT:  .LBB0_1: ; %while.cond
 ; CHECK-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; CHECK-NEXT:    sbi 1, 7
-; CHECK-NEXT:    adiw r24, 1
 ; CHECK-NEXT:    movw r16, r24
 ; CHECK-NEXT:    andi r24, 15
 ; CHECK-NEXT:    andi r25, 0
 ; CHECK-NEXT:    adiw r24, 1
 ; CHECK-NEXT:    call nil
 ; CHECK-NEXT:    movw r24, r16
+; CHECK-NEXT:    adiw r24, 1
 ; CHECK-NEXT:    rjmp .LBB0_1
 entry:
   br label %while.cond
diff --git a/llvm/test/CodeGen/AVR/load.ll b/llvm/test/CodeGen/AVR/load.ll
index 5de6b48652914..e2e7844b49c02 100644
--- a/llvm/test/CodeGen/AVR/load.ll
+++ b/llvm/test/CodeGen/AVR/load.ll
@@ -1,4 +1,4 @@
-; RUN: llc -mattr=avr6,sram < %s -mtriple=avr -verify-machineinstrs | FileCheck %s
+; RUN: llc -mattr=avr6,sram < %s -mtriple=avr-none -verify-machineinstrs | FileCheck %s
 
 define i8 @load8(ptr %x) {
 ; CHECK-LABEL: load8:
@@ -76,26 +76,26 @@ while.end:                                        ; preds = %while.body, %entry
   ret i8 %r.0.lcssa
 }
 
-define i16 @load16postinc(ptr %x, i16 %y) {
+define i16 @load16postinc(ptr %p, i16 %cnt) {
 ; CHECK-LABEL: load16postinc:
 ; CHECK: ld {{.*}}, {{[XYZ]}}+
 ; CHECK: ld {{.*}}, {{[XYZ]}}+
 entry:
-  %tobool2 = icmp eq i16 %y, 0
-  br i1 %tobool2, label %while.end, label %while.body
-while.body:                                       ; preds = %entry, %while.body
-  %r.05 = phi i16 [ %add, %while.body ], [ 0, %entry ]
-  %y.addr.04 = phi i16 [ %dec, %while.body ], [ %y, %entry ]
-  %x.addr.03 = phi ptr [ %incdec.ptr, %while.body ], [ %x, %entry ]
-  %dec = add nsw i16 %y.addr.04, -1
-  %incdec.ptr = getelementptr inbounds i16, ptr %x.addr.03, i16 1
-  %0 = load i16, ptr %x.addr.03
-  %add = add nsw i16 %0, %r.05
-  %tobool = icmp eq i16 %dec, 0
-  br i1 %tobool, label %while.end, label %while.body
-while.end:                                        ; preds = %while.body, %entry
-  %r.0.lcssa = phi i16 [ 0, %entry ], [ %add, %while.body ]
-  ret i16 %r.0.lcssa
+  %cmp3 = icmp sgt i16 %cnt, 0
+  br i1 %cmp3, label %for.body, label %for.cond.cleanup
+for.cond.cleanup:                                 ; preds = %for.body, %entry
+  %sum.0.lcssa = phi i16 [ 0, %entry ], [ %add, %for.body ]
+  ret i16 %sum.0.lcssa
+for.body:                                         ; preds = %entry, %for.body
+  %i.06 = phi i16 [ %inc, %for.body ], [ 0, %entry ]
+  %sum.05 = phi i16 [ %add, %for.body ], [ 0, %entry ]
+  %p.addr.04 = phi ptr [ %incdec.ptr, %for.body ], [ %p, %entry ]
+  %incdec.ptr = getelementptr inbounds nuw i8, ptr %p.addr.04, i16 2
+  %0 = load i16, ptr %p.addr.04, align 1
+  %add = add nsw i16 %0, %sum.05
+  %inc = add nuw nsw i16 %i.06, 1
+  %exitcond.not = icmp eq i16 %inc, %cnt
+  br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
 }
 
 define i8 @load8predec(ptr %x, i8 %y) {
diff --git a/llvm/test/CodeGen/AVR/shift.ll b/llvm/test/CodeGen/AVR/shift.ll
index 9836f93527b3c..1bd9b45999d7b 100644
--- a/llvm/test/CodeGen/AVR/shift.ll
+++ b/llvm/test/CodeGen/AVR/shift.ll
@@ -1,5 +1,5 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc < %s -mtriple=avr -mtriple=avr -verify-machineinstrs | FileCheck %s
+; RUN: llc < %s -mtriple=avr-none -verify-machineinstrs | FileCheck %s
 
 ; Optimize for speed.
 define i8 @shift_i8_i8_speed(i8 %a, i8 %b) {

Copy link

github-actions bot commented Aug 4, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@tomtor
Copy link
Contributor Author

tomtor commented Aug 4, 2025

loop-reduce does not preserve predecrement operations in loops which it cannot optimize.
This is a non AVR specific bug/issue, but on architectures with rich addressing modes it is hidden (a scaled index is used on Intel/ARM) and RISCV has no post/pre addressing.

In functions which have an incdec ptr I disable the loop-reduce. That is a bit of overkill, this could/should be done on an individual loop instead of the whole function, but in practice the difference will be small, because incdec ptrs are rare in generated AVR code.

I see no easy way to do this on individual loops. @Patryk27 suggestions?

@tomtor
Copy link
Contributor Author

tomtor commented Aug 5, 2025

char test(char);
char asd(char *p) {
        char s = 0;
        while (s < 100)
                s += test(*--p);
        return s;
}
000000b4 <asd>:
  b4:   ef 92           push    r14
  b6:   ff 92           push    r15
  b8:   1f 93           push    r17
  ba:   dc 01           movw    r26, r24
  bc:   11 2d           mov     r17, r1
  be:   8e 91           ld      r24, -X
  c0:   7d 01           movw    r14, r26
  c2:   18 d0           rcall   .+48            ; 0xf4 <test>
  c4:   d7 01           movw    r26, r14
  c6:   18 0f           add     r17, r24
  c8:   14 36           cpi     r17, 0x64       ; 100
  ca:   cc f3           brlt    .-14            ; 0xbe <.Lname25+0x7>
  cc:   81 2f           mov     r24, r17
  ce:   1f 91           pop     r17
  d0:   ff 90           pop     r15
  d2:   ef 90           pop     r14
  d4:   08 95           ret
000000b4 <asd>:
  b4:   ef 92           push    r14
  b6:   ff 92           push    r15
  b8:   1f 93           push    r17
  ba:   dc 01           movw    r26, r24
  bc:   11 97           sbiw    r26, 0x01       ; 1
  be:   11 2d           mov     r17, r1
  c0:   8c 91           ld      r24, X
  c2:   7d 01           movw    r14, r26
  c4:   19 d0           rcall   .+50            ; 0xf8 <test>
  c6:   d7 01           movw    r26, r14
  c8:   18 0f           add     r17, r24
  ca:   11 97           sbiw    r26, 0x01       ; 1
  cc:   14 36           cpi     r17, 0x64       ; 100
  ce:   c4 f3           brlt    .-16            ; 0xc0 <.Lname26+0x1>
  d0:   81 2f           mov     r24, r17
  d2:   1f 91           pop     r17
  d4:   ff 90           pop     r15
  d6:   ef 90           pop     r14
  d8:   08 95           ret

So the test for incdec saves 2 instructions (1 in the loop and 1 before the loop) because the -X is not lost. I propose to accept that we do prefer working loop optimization in general and cleaner code and disable the existing predec test cases with an TODO.

@tomtor tomtor requested a review from benshi001 August 5, 2025 08:41
@benshi001
Copy link
Member

Alao a new test (maybe issue-151080.ll) is needed to show what is fixed by this PR.

Copy link
Contributor

@Patryk27 Patryk27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a very nice catch! I think the code is good (modulo the comments already left by @benshi001) and it'd be nice to get this merged 🙂

@tomtor
Copy link
Contributor Author

tomtor commented Aug 6, 2025

I am currently testing this PR in a Rust build and I am seeing some changes in generated code which I want to study before a merge of this PR.

Rust normally uses a patched llvm and I am directly using this PR so that could be the cause but I want to check that first. @Patryk27

@Patryk27
Copy link
Contributor

Patryk27 commented Aug 6, 2025

Rust normally uses a patched llvm and I am directly using this PR so that could be the cause but I want to check that first.

Yeah, I usually just cherry-pick my changes on top of Rust's fork of LLVM when testing, to make sure I don't bring too many unrelated changes there 👀

@tomtor
Copy link
Contributor Author

tomtor commented Aug 7, 2025

Building RUST without this PR resulted in the same issues, so it is not related to this PR. I am now rebuilding with this PR on top of the patched Rust llvm.

In file included from /media/scratch/rust/rust/src/llvm-project/llvm/lib/Target/AVR/AVRTargetTransformInfo.h:
21,
                 from /media/scratch/rust/rust/src/llvm-project/llvm/lib/Target/AVR/AVRTargetMachine.cpp:22:
/media/scratch/rust/rust/src/llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h: In instantiation
of 'llvm::TargetTransformInfo::TargetTransformInfo(T) [with T = std::unique_ptr<llvm::AVRTTIImpl, std::defaul
t_delete<llvm::AVRTTIImpl> >]':
/media/scratch/rust/rust/src/llvm-project/llvm/lib/Target/AVR/AVRTargetMachine.cpp:113:67:   required from he
re
/media/scratch/rust/rust/src/llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h:3181:15: error: us
e of deleted function 'std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = ll
vm::AVRTTIImpl; _Dp = std::default_delete<llvm::AVRTTIImpl>]'
 3181 |     : TTIImpl(new Model<T>(Impl)) {}
      |               ^~~~~~~~~~~~~~~~~~
In file included from /usr/include/c++/13/memory:78,
                 from /media/scratch/rust/rust/src/llvm-project/llvm/include/llvm/ADT/SmallVector.h:28,
                 from /media/scratch/rust/rust/src/llvm-project/llvm/include/llvm/ADT/ArrayRef.h:13,
                 from /media/scratch/rust/rust/src/llvm-project/llvm/include/llvm/IR/DataLayout.h:23,
                 from /media/scratch/rust/rust/src/llvm-project/llvm/include/llvm/Target/TargetMachine.h:17,
                 from /media/scratch/rust/rust/src/llvm-project/llvm/include/llvm/CodeGen/CodeGenTargetMachin
eImpl.h:15,
                 from /media/scratch/rust/rust/src/llvm-project/llvm/lib/Target/AVR/AVRTargetMachine.h:16,
                 from /media/scratch/rust/rust/src/llvm-project/llvm/lib/Target/AVR/AVRTargetMachine.cpp:13:
/usr/include/c++/13/bits/unique_ptr.h:522:7: note: declared here
  522 |       unique_ptr(const unique_ptr&) = delete;
      |       ^~~~~~~~~~
/media/scratch/rust/rust/src/llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h:2339:11: note:   i
nitializing argument 1 of 'llvm::TargetTransformInfo::Model<T>::Model(T) [with T = std::unique_ptr<llvm::AVRT
TIImpl, std::default_delete<llvm::AVRTTIImpl> >]'
 2339 |   Model(T Impl) : Impl(std::move(Impl)) {}
      |         ~~^~~~
ninja: build stopped: subcommand failed.

I expect that this is related to having just the TransformInfo.h without a .cpp.
Edit: that is not an issue.

Hmm: RISCVTargetMachine.h has:

  const RISCVSubtarget *getSubtargetImpl(const Function &F) const override;
  // DO NOT IMPLEMENT: There is no such thing as a valid default subtarget,
  // subtargets are per-function entities based on the target-specific
  // attributes of each function.
  const RISCVSubtarget *getSubtargetImpl() const = delete;

But AVR DOES have an implementation.

Edit: It DOES build now, somehow an unclean Rust building environment.

@tomtor
Copy link
Contributor Author

tomtor commented Aug 7, 2025

000001f6 <avr_modern_demo::serial::Serial::write_int>:
 1f6:   ef 92           push    r14
 1f8:   0f 93           push    r16
 1fa:   1f 93           push    r17
 1fc:   8a 30           cpi     r24, 0x0A       ; 10
 1fe:   91 05           cpc     r25, r1
 200:   c0 f0           brcs    .+48            ; 0x232 <.Lname69+0x8>
 202:   2d ec           ldi     r18, 0xCD       ; 205
 204:   3c ec           ldi     r19, 0xCC       ; 204
 206:   40 e0           ldi     r20, 0x00       ; 0
 208:   50 e0           ldi     r21, 0x00       ; 0
 20a:   bc 01           movw    r22, r24
 20c:   e8 2e           mov     r14, r24
 20e:   ca 01           movw    r24, r20
 210:   0e 94 cd 02     call    0x59a   ; 0x59a <__mulsi3>
 214:   8c 01           movw    r16, r24
 216:   16 95           lsr     r17
 218:   07 95           ror     r16
 21a:   16 95           lsr     r17
 21c:   07 95           ror     r16
 21e:   16 95           lsr     r17
 220:   07 95           ror     r16
 222:   c8 01           movw    r24, r16
 224:   0e 94 fb 00     call    0x1f6   ; 0x1f6 <avr_modern_demo::serial::Serial::write_int>
 228:   86 ef           ldi     r24, 0xF6       ; 246
 22a:   08 02           muls    r16, r24
 22c:   11 24           eor     r1, r1
 22e:   80 2d           mov     r24, r0
 230:   8e 0d           add     r24, r14
 232:   90 91 04 08     lds     r25, 0x0804     ; 0x800804 <anon.0e9a9fafcd14a44665a97d696f684c31.7+0x7f7a23>
 236:   90 72           andi    r25, 0x20       ; 32
 238:   90 30           cpi     r25, 0x00       ; 0
 23a:   d9 f3           breq    .-10            ; 0x232 <.Lname69+0x8>
 23c:   80 63           ori     r24, 0x30       ; 48
 23e:   80 93 02 08     sts     0x0802, r24     ; 0x800802 <anon.0e9a9fafcd14a44665a97d696f684c31.7+0x7f7a21>
 242:   1f 91           pop     r17
 244:   0f 91           pop     r16
 246:   ef 90           pop     r14
 248:   08 95           ret
000001f8 <avr_modern_demo::serial::Serial::write_int>:
 1f8:   ef 92           push    r14
 1fa:   ff 92           push    r15
 1fc:   0f 93           push    r16
 1fe:   1f 93           push    r17
 200:   8a 30           cpi     r24, 0x0A       ; 10
 202:   91 05           cpc     r25, r1
 204:   08 f4           brcc    .+2             ; 0x208 <.Lname65+0x7>
 206:   2c c0           rjmp    .+88            ; 0x260 <.Lname73+0xa>
 208:   2d ec           ldi     r18, 0xCD       ; 205
 20a:   3c ec           ldi     r19, 0xCC       ; 204
 20c:   40 e0           ldi     r20, 0x00       ; 0
 20e:   50 e0           ldi     r21, 0x00       ; 0
 210:   7c 01           movw    r14, r24
 212:   bc 01           movw    r22, r24
 214:   ca 01           movw    r24, r20
 216:   0e 94 e3 02     call    0x5c6   ; 0x5c6 <__mulsi3>
 21a:   8c 01           movw    r16, r24
 21c:   16 95           lsr     r17
 21e:   07 95           ror     r16
 220:   16 95           lsr     r17
 222:   07 95           ror     r16
 224:   16 95           lsr     r17
 226:   07 95           ror     r16
 228:   c8 01           movw    r24, r16
 22a:   0e 94 fc 00     call    0x1f8   ; 0x1f8 <avr_modern_demo::serial::Serial::write_int>
 22e:   80 91 04 08     lds     r24, 0x0804     ; 0x800804 <anon.b5803c54398eee9364225baa7d2d2398.7+0x7f79f7>
 232:   80 72           andi    r24, 0x20       ; 32
 234:   80 30           cpi     r24, 0x00       ; 0
 236:   d9 f3           breq    .-10            ; 0x22e <.Lname69+0x4>
 238:   86 ef           ldi     r24, 0xF6       ; 246
 23a:   18 02           muls    r17, r24
 23c:   90 2d           mov     r25, r0
 23e:   11 24           eor     r1, r1
 240:   08 9f           mul     r16, r24
 242:   81 2d           mov     r24, r1
 244:   11 24           eor     r1, r1
 246:   80 1b           sub     r24, r16
 248:   89 0f           add     r24, r25
 24a:   28 2f           mov     r18, r24
 24c:   33 27           eor     r19, r19
 24e:   32 2f           mov     r19, r18
 250:   22 27           eor     r18, r18
 252:   80 2d           mov     r24, r0
 254:   99 27           eor     r25, r25
 256:   82 2b           or      r24, r18
 258:   93 2b           or      r25, r19
 25a:   8e 0d           add     r24, r14
 25c:   9f 1d           adc     r25, r15
 25e:   05 c0           rjmp    .+10            ; 0x26a <.Lname74+0x9>
 260:   20 91 04 08     lds     r18, 0x0804     ; 0x800804 <anon.b5803c54398eee9364225baa7d2d2398.7+0x7f79f7>
 264:   20 72           andi    r18, 0x20       ; 32
 266:   20 30           cpi     r18, 0x00       ; 0
 268:   d9 f3           breq    .-10            ; 0x260 <.Lname73+0xa>
 26a:   80 63           ori     r24, 0x30       ; 48
 26c:   80 93 02 08     sts     0x0802, r24     ; 0x800802 <anon.b5803c54398eee9364225baa7d2d2398.7+0x7f79f5>
 270:   1f 91           pop     r17
 272:   0f 91           pop     r16
 274:   ff 90           pop     r15
 276:   ef 90           pop     r14
 278:   08 95           ret
            pub fn write_int(&self, i: u16) {
                if i > 9 {
                    self.write_int(i / 10);
                    self.write_int(i % 10);
                } else {
                    self.write_c(b'0' + i as u8);
                }
            }

            pub fn write_c(&self, b: u8) {
                {
                    while self.p.$usartnr.status().read().dreif() == false {}
                    unsafe {
                        self.p.$usartnr.txdatal().write(|w| w.bits(b));
                    }
                }
            }

The long version is with the PR, the short from nightly.
I do not think there is an issue, but my own build probably differs in other build parameters from nightly... EDIT: yes, without the PR the same longer code.
Would be nice if we could compare with nightly.

@tomtor
Copy link
Contributor Author

tomtor commented Aug 8, 2025

I just checked the latest Rust nightly. It produces now also the longer version! So it is a Rust regression. This PR is fine.

@benshi001 benshi001 self-requested a review August 9, 2025 04:46
Copy link
Member

@benshi001 benshi001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@benshi001 benshi001 merged commit 97f0ff0 into llvm:main Aug 9, 2025
8 of 9 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 9, 2025

LLVM Buildbot has detected a new failure on builder clang-ppc64le-rhel running on ppc64le-clang-rhel-test while building clang,llvm at step 5 "build-unified-tree".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/145/builds/8944

Here is the relevant piece of the build log for the reference
Step 5 (build-unified-tree) failure: build (failure)
...
24.572 [103/23/396] Linking CXX shared library lib/libLLVMDWARFLinkerClassic.so.22.0git
24.578 [102/23/397] Linking CXX shared library lib/libLLVMLanaiCodeGen.so.22.0git
24.584 [101/23/398] Creating library symlink lib/libLLVMDWARFLinkerClassic.so
24.594 [101/22/399] Creating library symlink lib/libLLVMLanaiCodeGen.so
24.601 [101/21/400] Linking CXX shared library lib/libLLVMBPFCodeGen.so.22.0git
24.609 [100/21/401] Linking CXX shared library lib/libLLVMDWARFLinkerParallel.so.22.0git
24.617 [99/21/402] Creating library symlink lib/libLLVMBPFCodeGen.so
24.624 [99/20/403] Linking CXX shared library lib/libLLVMSparcCodeGen.so.22.0git
24.626 [98/20/404] Creating library symlink lib/libLLVMDWARFLinkerParallel.so
24.628 [98/19/405] Linking CXX shared library lib/libLLVMAVRCodeGen.so.22.0git
FAILED: lib/libLLVMAVRCodeGen.so.22.0git 
: && /home/buildbots/llvm-external-buildbots/clang.19.1.7/bin/clang++ --gcc-toolchain=/gcc-toolchain/usr -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete -Wl,--color-diagnostics   -Wl,--gc-sections  -Xlinker --dependency-file=lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/link.d -shared -Wl,-soname,libLLVMAVRCodeGen.so.22.0git -o lib/libLLVMAVRCodeGen.so.22.0git lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRAsmPrinter.cpp.o lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRExpandPseudoInsts.cpp.o lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRFrameLowering.cpp.o lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRInstrInfo.cpp.o lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRISelDAGToDAG.cpp.o lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRISelLowering.cpp.o lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRMCInstLower.cpp.o lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRRegisterInfo.cpp.o lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRSelectionDAGInfo.cpp.o lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRShiftExpand.cpp.o lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRSubtarget.cpp.o lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetMachine.cpp.o lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetObjectFile.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/build/lib:"  lib/libLLVMAVRDesc.so.22.0git  lib/libLLVMAVRInfo.so.22.0git  lib/libLLVMAsmPrinter.so.22.0git  lib/libLLVMSelectionDAG.so.22.0git  lib/libLLVMCodeGen.so.22.0git  lib/libLLVMCodeGenTypes.so.22.0git  lib/libLLVMTarget.so.22.0git  lib/libLLVMMC.so.22.0git  lib/libLLVMCore.so.22.0git  lib/libLLVMSupport.so.22.0git  -Wl,-rpath-link,/home/buildbots/llvm-external-buildbots/workers/ppc64le-clang-rhel-test/clang-ppc64le-rhel/build/lib && :
ld.lld: error: undefined symbol: llvm::TargetTransformInfo::TargetTransformInfo(std::unique_ptr<llvm::TargetTransformInfoImplBase const, std::default_delete<llvm::TargetTransformInfoImplBase const>>)
>>> referenced by AVRTargetMachine.cpp
>>>               lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetMachine.cpp.o:(llvm::AVRTargetMachine::getTargetTransformInfo(llvm::Function const&) const)

ld.lld: error: undefined symbol: llvm::TargetTransformInfoImplBase::~TargetTransformInfoImplBase()
>>> referenced by AVRTargetMachine.cpp
>>>               lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetMachine.cpp.o:(llvm::AVRTTIImpl::~AVRTTIImpl())
>>> referenced by AVRTargetMachine.cpp
>>>               lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetMachine.cpp.o:(vtable for llvm::AVRTTIImpl)

ld.lld: error: undefined symbol: llvm::TargetTransformInfo::getOperandInfo(llvm::Value const*)
>>> referenced by AVRTargetMachine.cpp
>>>               lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetMachine.cpp.o:(llvm::TargetTransformInfoImplCRTPBase<llvm::AVRTTIImpl>::getInstructionCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) const)
>>> referenced by AVRTargetMachine.cpp
>>>               lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetMachine.cpp.o:(llvm::TargetTransformInfoImplCRTPBase<llvm::AVRTTIImpl>::getInstructionCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) const)
>>> referenced by AVRTargetMachine.cpp
>>>               lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetMachine.cpp.o:(llvm::TargetTransformInfoImplCRTPBase<llvm::AVRTTIImpl>::getInstructionCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) const)
>>> referenced 11 more times

ld.lld: error: undefined symbol: llvm::TargetTransformInfo::getCastContextHint(llvm::Instruction const*)
>>> referenced by AVRTargetMachine.cpp
>>>               lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetMachine.cpp.o:(llvm::TargetTransformInfoImplCRTPBase<llvm::AVRTTIImpl>::getInstructionCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) const)

ld.lld: error: undefined symbol: llvm::IntrinsicCostAttributes::IntrinsicCostAttributes(unsigned int, llvm::CallBase const&, llvm::InstructionCost, bool, llvm::TargetLibraryInfo const*)
>>> referenced by AVRTargetMachine.cpp
>>>               lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetMachine.cpp.o:(llvm::TargetTransformInfoImplCRTPBase<llvm::AVRTTIImpl>::getInstructionCost(llvm::User const*, llvm::ArrayRef<llvm::Value const*>, llvm::TargetTransformInfo::TargetCostKind) const)

ld.lld: error: undefined symbol: llvm::OptimizationRemarkEmitter::emit(llvm::DiagnosticInfoOptimizationBase&)
>>> referenced by AVRTargetMachine.cpp
>>>               lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetMachine.cpp.o:(llvm::BasicTTIImplBase<llvm::AVRTTIImpl>::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) const)

ld.lld: error: undefined symbol: llvm::Triple::isArch64Bit() const
>>> referenced by AVRTargetMachine.cpp
>>>               lib/Target/AVR/CMakeFiles/LLVMAVRCodeGen.dir/AVRTargetMachine.cpp.o:(llvm::BasicTTIImplBase<llvm::AVRTTIImpl>::shouldBuildRelLookupTables() const)

ld.lld: error: undefined symbol: llvm::IntrinsicCostAttributes::IntrinsicCostAttributes(unsigned int, llvm::Type*, llvm::ArrayRef<llvm::Value const*>, llvm::ArrayRef<llvm::Type*>, llvm::FastMathFlags, llvm::IntrinsicInst const*, llvm::InstructionCost, llvm::TargetLibraryInfo const*)
>>> referenced by AVRTargetMachine.cpp

@tomtor
Copy link
Contributor Author

tomtor commented Aug 9, 2025

We get a build error with lld @benshi001
The windows build went fine.

I think I have to add a .cpp definition

@tomtor
Copy link
Contributor Author

tomtor commented Aug 10, 2025

FYI, the code generation regression is no longer in current nigthly, it was only in 2025-08-07.

The day I was testing this PR :( Bad timing...

searched toolchains ec7c02612527d185c379900b613311bc1dcbf7dc through 7d82b83ed57d188ab3f2441a765a6419685a88a3


********************************************************************************
Regression in dc0bae1db725fbba8524f195f74f680995fd549e
********************************************************************************

Attempting to search unrolled perf builds
ERROR: couldn't find perf build comment
==================================================================================
= Please file this regression report on the rust-lang/rust GitHub repository     =
=        New issue: https://github.com/rust-lang/rust/issues/new                 =
=     Known issues: https://github.com/rust-lang/rust/issues                     =
= Copy and paste the text below into the issue report thread.  Thanks!           =
==================================================================================

searched nightlies: from nightly-2025-08-01 to nightly-2025-08-08
regressed nightly: nightly-2025-08-07
searched commit range: https://github.com/rust-lang/rust/compare/ec7c02612527d185c379900b613311bc1dcbf7dc...7
d82b83ed57d188ab3f2441a765a6419685a88a3
regressed commit: https://github.com/rust-lang/rust/commit/dc0bae1db725fbba8524f195f74f680995fd549e

<details>
<summary>bisected with <a href='https://github.com/rust-lang/cargo-bisect-rustc'>cargo-bisect-rustc</a> v0.6.
10</summary>


Host triple: x86_64-unknown-linux-gnu
Reproduce with:
```bash
cargo bisect-rustc -vv --with-src --start=2025-08-01 --end=2025-08-08 --script ./test.sh
```

@tomtor
Copy link
Contributor Author

tomtor commented Aug 10, 2025

62735d2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[AVR] Induction variable not found (missed optimization)
5 participants