Skip to content

Conversation

@vporpo
Copy link
Contributor

@vporpo vporpo commented Feb 11, 2025

This patch fixes the way the top-of-schedule variable gets set and updated. Before this patch it used to get updated whenever we scheduled a bundle, which is wrong, as the top-of-schedule needs to be maintained across scheduling attempts.

It should get reset only when we clear the schedule or when we destroy the current schedule and re-schedule.

This patch fixes the way the top-of-schedule variable gets set and updated.
Before this patch it used to get updated whenever we scheduled a bundle,
which is wrong, as the top-of-schedule needs to be maintained across
scheduling attempts.

It should get reset only when we clear the schedule or when we destroy the
current schedule and re-schedule.
@llvmbot
Copy link
Member

llvmbot commented Feb 11, 2025

@llvm/pr-subscribers-vectorizers

Author: vporpo (vporpo)

Changes

This patch fixes the way the top-of-schedule variable gets set and updated. Before this patch it used to get updated whenever we scheduled a bundle, which is wrong, as the top-of-schedule needs to be maintained across scheduling attempts.

It should get reset only when we clear the schedule or when we destroy the current schedule and re-schedule.


Full diff: https://github.com/llvm/llvm-project/pull/126820.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SandboxVectorizer/Scheduler.cpp (+10-2)
  • (modified) llvm/test/Transforms/SandboxVectorizer/bottomup_basic.ll (+2-2)
  • (modified) llvm/test/Transforms/SandboxVectorizer/bottomup_seed_slice_pow2.ll (+1-1)
  • (modified) llvm/test/Transforms/SandboxVectorizer/repeated_instrs.ll (+2-2)
  • (modified) llvm/test/Transforms/SandboxVectorizer/scheduler.ll (+25)
diff --git a/llvm/lib/Transforms/Vectorize/SandboxVectorizer/Scheduler.cpp b/llvm/lib/Transforms/Vectorize/SandboxVectorizer/Scheduler.cpp
index dd24cc3d98cf8..2f7d7087ca880 100644
--- a/llvm/lib/Transforms/Vectorize/SandboxVectorizer/Scheduler.cpp
+++ b/llvm/lib/Transforms/Vectorize/SandboxVectorizer/Scheduler.cpp
@@ -230,11 +230,13 @@ bool Scheduler::trySchedule(ArrayRef<Instruction *> Instrs) {
     // top-most part of the schedule that includes the instrs in the bundle and
     // re-schedule.
     trimSchedule(Instrs);
+    ScheduleTopItOpt = std::nullopt;
     [[fallthrough]];
   case BndlSchedState::NoneScheduled: {
     // TODO: Set the window of the DAG that we are interested in.
-    // We start scheduling at the bottom instr of Instrs.
-    ScheduleTopItOpt = std::next(VecUtils::getLowest(Instrs)->getIterator());
+    if (!ScheduleTopItOpt)
+      // We start scheduling at the bottom instr of Instrs.
+      ScheduleTopItOpt = std::next(VecUtils::getLowest(Instrs)->getIterator());
 
     // TODO: For now don't cross BBs.
     if (!DAG.getInterval().empty()) {
@@ -262,6 +264,12 @@ bool Scheduler::trySchedule(ArrayRef<Instruction *> Instrs) {
 void Scheduler::dump(raw_ostream &OS) const {
   OS << "ReadyList:\n";
   ReadyList.dump(OS);
+  OS << "Top of schedule: ";
+  if (ScheduleTopItOpt)
+    OS << **ScheduleTopItOpt;
+  else
+    OS << "Empty";
+  OS << "\n";
 }
 void Scheduler::dump() const { dump(dbgs()); }
 #endif // NDEBUG
diff --git a/llvm/test/Transforms/SandboxVectorizer/bottomup_basic.ll b/llvm/test/Transforms/SandboxVectorizer/bottomup_basic.ll
index ee8592c04b62c..45b937dc1b1b6 100644
--- a/llvm/test/Transforms/SandboxVectorizer/bottomup_basic.ll
+++ b/llvm/test/Transforms/SandboxVectorizer/bottomup_basic.ll
@@ -77,7 +77,7 @@ define void @store_fadd_load(ptr %ptr) {
 ; CHECK-NEXT:    [[PTR0:%.*]] = getelementptr float, ptr [[PTR]], i32 0
 ; CHECK-NEXT:    [[VECL:%.*]] = load <2 x float>, ptr [[PTR0]], align 4
 ; CHECK-NEXT:    [[VECL1:%.*]] = load <2 x float>, ptr [[PTR0]], align 4
-; CHECK-NEXT:    [[VEC:%.*]] = fadd <2 x float> [[VECL]], [[VECL1]]
+; CHECK-NEXT:    [[VEC:%.*]] = fadd <2 x float> [[VECL1]], [[VECL]]
 ; CHECK-NEXT:    store <2 x float> [[VEC]], ptr [[PTR0]], align 4
 ; CHECK-NEXT:    ret void
 ;
@@ -247,8 +247,8 @@ define void @diamondMultiInput(ptr %ptr, ptr %ptrX) {
 ; CHECK-LABEL: define void @diamondMultiInput(
 ; CHECK-SAME: ptr [[PTR:%.*]], ptr [[PTRX:%.*]]) {
 ; CHECK-NEXT:    [[PTR0:%.*]] = getelementptr float, ptr [[PTR]], i32 0
-; CHECK-NEXT:    [[VECL:%.*]] = load <2 x float>, ptr [[PTR0]], align 4
 ; CHECK-NEXT:    [[LDX:%.*]] = load float, ptr [[PTRX]], align 4
+; CHECK-NEXT:    [[VECL:%.*]] = load <2 x float>, ptr [[PTR0]], align 4
 ; CHECK-NEXT:    [[VINS:%.*]] = insertelement <2 x float> poison, float [[LDX]], i32 0
 ; CHECK-NEXT:    [[VEXT:%.*]] = extractelement <2 x float> [[VECL]], i32 0
 ; CHECK-NEXT:    [[VINS1:%.*]] = insertelement <2 x float> [[VINS]], float [[VEXT]], i32 1
diff --git a/llvm/test/Transforms/SandboxVectorizer/bottomup_seed_slice_pow2.ll b/llvm/test/Transforms/SandboxVectorizer/bottomup_seed_slice_pow2.ll
index f1c6e3297d79c..1b189831569f5 100644
--- a/llvm/test/Transforms/SandboxVectorizer/bottomup_seed_slice_pow2.ll
+++ b/llvm/test/Transforms/SandboxVectorizer/bottomup_seed_slice_pow2.ll
@@ -7,8 +7,8 @@ define void @pow2(ptr %ptr, float %val) {
 ; POW2-SAME: ptr [[PTR:%.*]], float [[VAL:%.*]]) {
 ; POW2-NEXT:    [[PTR0:%.*]] = getelementptr float, ptr [[PTR]], i32 0
 ; POW2-NEXT:    [[PTR2:%.*]] = getelementptr float, ptr [[PTR]], i32 2
-; POW2-NEXT:    [[VECL:%.*]] = load <2 x float>, ptr [[PTR0]], align 4
 ; POW2-NEXT:    [[LD2:%.*]] = load float, ptr [[PTR2]], align 4
+; POW2-NEXT:    [[VECL:%.*]] = load <2 x float>, ptr [[PTR0]], align 4
 ; POW2-NEXT:    store <2 x float> [[VECL]], ptr [[PTR0]], align 4
 ; POW2-NEXT:    store float [[LD2]], ptr [[PTR2]], align 4
 ; POW2-NEXT:    ret void
diff --git a/llvm/test/Transforms/SandboxVectorizer/repeated_instrs.ll b/llvm/test/Transforms/SandboxVectorizer/repeated_instrs.ll
index 25d9d79154d35..add762ac2d894 100644
--- a/llvm/test/Transforms/SandboxVectorizer/repeated_instrs.ll
+++ b/llvm/test/Transforms/SandboxVectorizer/repeated_instrs.ll
@@ -5,10 +5,10 @@ define i32 @repeated_splat(ptr %ptr, i32 %v) #0 {
 ; CHECK-LABEL: define i32 @repeated_splat(
 ; CHECK-SAME: ptr [[PTR:%.*]], i32 [[V:%.*]]) {
 ; CHECK-NEXT:    [[GEP0:%.*]] = getelementptr inbounds i32, ptr [[PTR]], i64 0
-; CHECK-NEXT:    [[VECL:%.*]] = load <2 x i32>, ptr [[GEP0]], align 4
 ; CHECK-NEXT:    [[SPLAT:%.*]] = add i32 [[V]], 0
 ; CHECK-NEXT:    [[PACK:%.*]] = insertelement <2 x i32> poison, i32 [[SPLAT]], i32 0
 ; CHECK-NEXT:    [[PACK1:%.*]] = insertelement <2 x i32> [[PACK]], i32 [[SPLAT]], i32 1
+; CHECK-NEXT:    [[VECL:%.*]] = load <2 x i32>, ptr [[GEP0]], align 4
 ; CHECK-NEXT:    [[VEC:%.*]] = mul <2 x i32> [[VECL]], [[PACK1]]
 ; CHECK-NEXT:    store <2 x i32> [[VEC]], ptr [[GEP0]], align 4
 ; CHECK-NEXT:    ret i32 0
@@ -31,6 +31,7 @@ define i32 @repeated_partial(ptr %ptr, i32 %v) #0 {
 ; CHECK-NEXT:    [[GEP0:%.*]] = getelementptr inbounds i32, ptr [[PTR]], i64 0
 ; CHECK-NEXT:    [[GEP1:%.*]] = getelementptr inbounds i32, ptr [[PTR]], i64 1
 ; CHECK-NEXT:    [[GEP3:%.*]] = getelementptr inbounds i32, ptr [[PTR]], i64 3
+; CHECK-NEXT:    [[SPLAT:%.*]] = add i32 [[V]], 0
 ; CHECK-NEXT:    [[LD0:%.*]] = load i32, ptr [[GEP0]], align 4
 ; CHECK-NEXT:    [[LD1:%.*]] = load i32, ptr [[GEP1]], align 4
 ; CHECK-NEXT:    [[LD3:%.*]] = load i32, ptr [[GEP3]], align 4
@@ -39,7 +40,6 @@ define i32 @repeated_partial(ptr %ptr, i32 %v) #0 {
 ; CHECK-NEXT:    [[PACK2:%.*]] = insertelement <4 x i32> [[PACK1]], i32 [[LD1]], i32 2
 ; CHECK-NEXT:    [[PACK3:%.*]] = insertelement <4 x i32> [[PACK2]], i32 [[LD3]], i32 3
 ; CHECK-NEXT:    [[VECL:%.*]] = load <4 x i32>, ptr [[GEP0]], align 4
-; CHECK-NEXT:    [[SPLAT:%.*]] = add i32 [[V]], 0
 ; CHECK-NEXT:    [[VEC:%.*]] = mul <4 x i32> [[VECL]], [[PACK3]]
 ; CHECK-NEXT:    store <4 x i32> [[VEC]], ptr [[GEP0]], align 4
 ; CHECK-NEXT:    ret i32 0
diff --git a/llvm/test/Transforms/SandboxVectorizer/scheduler.ll b/llvm/test/Transforms/SandboxVectorizer/scheduler.ll
index 92a78a979192b..acbec80db6b06 100644
--- a/llvm/test/Transforms/SandboxVectorizer/scheduler.ll
+++ b/llvm/test/Transforms/SandboxVectorizer/scheduler.ll
@@ -49,3 +49,28 @@ define void @check_dag_scheduler_update(ptr noalias %p, ptr noalias %p1) {
   store i32 %add21, ptr %arrayidx23
   ret void
 }
+
+; This used to generate use-before-def because of a buggy update of the
+; top-of-schedule variable.
+define <4 x float> @check_top_of_schedule(ptr %0) {
+; CHECK-LABEL: define <4 x float> @check_top_of_schedule(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT:    [[INS_1:%.*]] = insertelement <4 x float> zeroinitializer, float poison, i64 0
+; CHECK-NEXT:    [[TRUNC_1:%.*]] = fptrunc double 0.000000e+00 to float
+; CHECK-NEXT:    [[INS_2:%.*]] = insertelement <4 x float> [[INS_1]], float [[TRUNC_1]], i64 0
+; CHECK-NEXT:    [[GEP_1:%.*]] = getelementptr double, ptr [[TMP0]], i64 1
+; CHECK-NEXT:    store <2 x double> <double 0.000000e+00, double 1.000000e+00>, ptr [[GEP_1]], align 8
+; CHECK-NEXT:    ret <4 x float> [[INS_2]]
+;
+  %trunc.1 = fptrunc double 0.000000e+00 to float
+  %trunc.2 = fptrunc double 1.000000e+00 to float
+  %ins.1 = insertelement <4 x float> zeroinitializer, float poison, i64 0
+  %ins.2 = insertelement <4 x float> %ins.1, float %trunc.1, i64 0
+  %ext.1 = fpext float %trunc.1 to double
+  %gep.1  = getelementptr double, ptr %0, i64 1
+  store double %ext.1, ptr %gep.1, align 8
+  %ext.2 = fpext float %trunc.2 to double
+  %gep.2 = getelementptr double, ptr %0, i64 2
+  store double %ext.2, ptr %gep.2, align 8
+  ret <4 x float> %ins.2
+}

@llvmbot
Copy link
Member

llvmbot commented Feb 11, 2025

@llvm/pr-subscribers-llvm-transforms

Author: vporpo (vporpo)

Changes

This patch fixes the way the top-of-schedule variable gets set and updated. Before this patch it used to get updated whenever we scheduled a bundle, which is wrong, as the top-of-schedule needs to be maintained across scheduling attempts.

It should get reset only when we clear the schedule or when we destroy the current schedule and re-schedule.


Full diff: https://github.com/llvm/llvm-project/pull/126820.diff

5 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/SandboxVectorizer/Scheduler.cpp (+10-2)
  • (modified) llvm/test/Transforms/SandboxVectorizer/bottomup_basic.ll (+2-2)
  • (modified) llvm/test/Transforms/SandboxVectorizer/bottomup_seed_slice_pow2.ll (+1-1)
  • (modified) llvm/test/Transforms/SandboxVectorizer/repeated_instrs.ll (+2-2)
  • (modified) llvm/test/Transforms/SandboxVectorizer/scheduler.ll (+25)
diff --git a/llvm/lib/Transforms/Vectorize/SandboxVectorizer/Scheduler.cpp b/llvm/lib/Transforms/Vectorize/SandboxVectorizer/Scheduler.cpp
index dd24cc3d98cf8..2f7d7087ca880 100644
--- a/llvm/lib/Transforms/Vectorize/SandboxVectorizer/Scheduler.cpp
+++ b/llvm/lib/Transforms/Vectorize/SandboxVectorizer/Scheduler.cpp
@@ -230,11 +230,13 @@ bool Scheduler::trySchedule(ArrayRef<Instruction *> Instrs) {
     // top-most part of the schedule that includes the instrs in the bundle and
     // re-schedule.
     trimSchedule(Instrs);
+    ScheduleTopItOpt = std::nullopt;
     [[fallthrough]];
   case BndlSchedState::NoneScheduled: {
     // TODO: Set the window of the DAG that we are interested in.
-    // We start scheduling at the bottom instr of Instrs.
-    ScheduleTopItOpt = std::next(VecUtils::getLowest(Instrs)->getIterator());
+    if (!ScheduleTopItOpt)
+      // We start scheduling at the bottom instr of Instrs.
+      ScheduleTopItOpt = std::next(VecUtils::getLowest(Instrs)->getIterator());
 
     // TODO: For now don't cross BBs.
     if (!DAG.getInterval().empty()) {
@@ -262,6 +264,12 @@ bool Scheduler::trySchedule(ArrayRef<Instruction *> Instrs) {
 void Scheduler::dump(raw_ostream &OS) const {
   OS << "ReadyList:\n";
   ReadyList.dump(OS);
+  OS << "Top of schedule: ";
+  if (ScheduleTopItOpt)
+    OS << **ScheduleTopItOpt;
+  else
+    OS << "Empty";
+  OS << "\n";
 }
 void Scheduler::dump() const { dump(dbgs()); }
 #endif // NDEBUG
diff --git a/llvm/test/Transforms/SandboxVectorizer/bottomup_basic.ll b/llvm/test/Transforms/SandboxVectorizer/bottomup_basic.ll
index ee8592c04b62c..45b937dc1b1b6 100644
--- a/llvm/test/Transforms/SandboxVectorizer/bottomup_basic.ll
+++ b/llvm/test/Transforms/SandboxVectorizer/bottomup_basic.ll
@@ -77,7 +77,7 @@ define void @store_fadd_load(ptr %ptr) {
 ; CHECK-NEXT:    [[PTR0:%.*]] = getelementptr float, ptr [[PTR]], i32 0
 ; CHECK-NEXT:    [[VECL:%.*]] = load <2 x float>, ptr [[PTR0]], align 4
 ; CHECK-NEXT:    [[VECL1:%.*]] = load <2 x float>, ptr [[PTR0]], align 4
-; CHECK-NEXT:    [[VEC:%.*]] = fadd <2 x float> [[VECL]], [[VECL1]]
+; CHECK-NEXT:    [[VEC:%.*]] = fadd <2 x float> [[VECL1]], [[VECL]]
 ; CHECK-NEXT:    store <2 x float> [[VEC]], ptr [[PTR0]], align 4
 ; CHECK-NEXT:    ret void
 ;
@@ -247,8 +247,8 @@ define void @diamondMultiInput(ptr %ptr, ptr %ptrX) {
 ; CHECK-LABEL: define void @diamondMultiInput(
 ; CHECK-SAME: ptr [[PTR:%.*]], ptr [[PTRX:%.*]]) {
 ; CHECK-NEXT:    [[PTR0:%.*]] = getelementptr float, ptr [[PTR]], i32 0
-; CHECK-NEXT:    [[VECL:%.*]] = load <2 x float>, ptr [[PTR0]], align 4
 ; CHECK-NEXT:    [[LDX:%.*]] = load float, ptr [[PTRX]], align 4
+; CHECK-NEXT:    [[VECL:%.*]] = load <2 x float>, ptr [[PTR0]], align 4
 ; CHECK-NEXT:    [[VINS:%.*]] = insertelement <2 x float> poison, float [[LDX]], i32 0
 ; CHECK-NEXT:    [[VEXT:%.*]] = extractelement <2 x float> [[VECL]], i32 0
 ; CHECK-NEXT:    [[VINS1:%.*]] = insertelement <2 x float> [[VINS]], float [[VEXT]], i32 1
diff --git a/llvm/test/Transforms/SandboxVectorizer/bottomup_seed_slice_pow2.ll b/llvm/test/Transforms/SandboxVectorizer/bottomup_seed_slice_pow2.ll
index f1c6e3297d79c..1b189831569f5 100644
--- a/llvm/test/Transforms/SandboxVectorizer/bottomup_seed_slice_pow2.ll
+++ b/llvm/test/Transforms/SandboxVectorizer/bottomup_seed_slice_pow2.ll
@@ -7,8 +7,8 @@ define void @pow2(ptr %ptr, float %val) {
 ; POW2-SAME: ptr [[PTR:%.*]], float [[VAL:%.*]]) {
 ; POW2-NEXT:    [[PTR0:%.*]] = getelementptr float, ptr [[PTR]], i32 0
 ; POW2-NEXT:    [[PTR2:%.*]] = getelementptr float, ptr [[PTR]], i32 2
-; POW2-NEXT:    [[VECL:%.*]] = load <2 x float>, ptr [[PTR0]], align 4
 ; POW2-NEXT:    [[LD2:%.*]] = load float, ptr [[PTR2]], align 4
+; POW2-NEXT:    [[VECL:%.*]] = load <2 x float>, ptr [[PTR0]], align 4
 ; POW2-NEXT:    store <2 x float> [[VECL]], ptr [[PTR0]], align 4
 ; POW2-NEXT:    store float [[LD2]], ptr [[PTR2]], align 4
 ; POW2-NEXT:    ret void
diff --git a/llvm/test/Transforms/SandboxVectorizer/repeated_instrs.ll b/llvm/test/Transforms/SandboxVectorizer/repeated_instrs.ll
index 25d9d79154d35..add762ac2d894 100644
--- a/llvm/test/Transforms/SandboxVectorizer/repeated_instrs.ll
+++ b/llvm/test/Transforms/SandboxVectorizer/repeated_instrs.ll
@@ -5,10 +5,10 @@ define i32 @repeated_splat(ptr %ptr, i32 %v) #0 {
 ; CHECK-LABEL: define i32 @repeated_splat(
 ; CHECK-SAME: ptr [[PTR:%.*]], i32 [[V:%.*]]) {
 ; CHECK-NEXT:    [[GEP0:%.*]] = getelementptr inbounds i32, ptr [[PTR]], i64 0
-; CHECK-NEXT:    [[VECL:%.*]] = load <2 x i32>, ptr [[GEP0]], align 4
 ; CHECK-NEXT:    [[SPLAT:%.*]] = add i32 [[V]], 0
 ; CHECK-NEXT:    [[PACK:%.*]] = insertelement <2 x i32> poison, i32 [[SPLAT]], i32 0
 ; CHECK-NEXT:    [[PACK1:%.*]] = insertelement <2 x i32> [[PACK]], i32 [[SPLAT]], i32 1
+; CHECK-NEXT:    [[VECL:%.*]] = load <2 x i32>, ptr [[GEP0]], align 4
 ; CHECK-NEXT:    [[VEC:%.*]] = mul <2 x i32> [[VECL]], [[PACK1]]
 ; CHECK-NEXT:    store <2 x i32> [[VEC]], ptr [[GEP0]], align 4
 ; CHECK-NEXT:    ret i32 0
@@ -31,6 +31,7 @@ define i32 @repeated_partial(ptr %ptr, i32 %v) #0 {
 ; CHECK-NEXT:    [[GEP0:%.*]] = getelementptr inbounds i32, ptr [[PTR]], i64 0
 ; CHECK-NEXT:    [[GEP1:%.*]] = getelementptr inbounds i32, ptr [[PTR]], i64 1
 ; CHECK-NEXT:    [[GEP3:%.*]] = getelementptr inbounds i32, ptr [[PTR]], i64 3
+; CHECK-NEXT:    [[SPLAT:%.*]] = add i32 [[V]], 0
 ; CHECK-NEXT:    [[LD0:%.*]] = load i32, ptr [[GEP0]], align 4
 ; CHECK-NEXT:    [[LD1:%.*]] = load i32, ptr [[GEP1]], align 4
 ; CHECK-NEXT:    [[LD3:%.*]] = load i32, ptr [[GEP3]], align 4
@@ -39,7 +40,6 @@ define i32 @repeated_partial(ptr %ptr, i32 %v) #0 {
 ; CHECK-NEXT:    [[PACK2:%.*]] = insertelement <4 x i32> [[PACK1]], i32 [[LD1]], i32 2
 ; CHECK-NEXT:    [[PACK3:%.*]] = insertelement <4 x i32> [[PACK2]], i32 [[LD3]], i32 3
 ; CHECK-NEXT:    [[VECL:%.*]] = load <4 x i32>, ptr [[GEP0]], align 4
-; CHECK-NEXT:    [[SPLAT:%.*]] = add i32 [[V]], 0
 ; CHECK-NEXT:    [[VEC:%.*]] = mul <4 x i32> [[VECL]], [[PACK3]]
 ; CHECK-NEXT:    store <4 x i32> [[VEC]], ptr [[GEP0]], align 4
 ; CHECK-NEXT:    ret i32 0
diff --git a/llvm/test/Transforms/SandboxVectorizer/scheduler.ll b/llvm/test/Transforms/SandboxVectorizer/scheduler.ll
index 92a78a979192b..acbec80db6b06 100644
--- a/llvm/test/Transforms/SandboxVectorizer/scheduler.ll
+++ b/llvm/test/Transforms/SandboxVectorizer/scheduler.ll
@@ -49,3 +49,28 @@ define void @check_dag_scheduler_update(ptr noalias %p, ptr noalias %p1) {
   store i32 %add21, ptr %arrayidx23
   ret void
 }
+
+; This used to generate use-before-def because of a buggy update of the
+; top-of-schedule variable.
+define <4 x float> @check_top_of_schedule(ptr %0) {
+; CHECK-LABEL: define <4 x float> @check_top_of_schedule(
+; CHECK-SAME: ptr [[TMP0:%.*]]) {
+; CHECK-NEXT:    [[INS_1:%.*]] = insertelement <4 x float> zeroinitializer, float poison, i64 0
+; CHECK-NEXT:    [[TRUNC_1:%.*]] = fptrunc double 0.000000e+00 to float
+; CHECK-NEXT:    [[INS_2:%.*]] = insertelement <4 x float> [[INS_1]], float [[TRUNC_1]], i64 0
+; CHECK-NEXT:    [[GEP_1:%.*]] = getelementptr double, ptr [[TMP0]], i64 1
+; CHECK-NEXT:    store <2 x double> <double 0.000000e+00, double 1.000000e+00>, ptr [[GEP_1]], align 8
+; CHECK-NEXT:    ret <4 x float> [[INS_2]]
+;
+  %trunc.1 = fptrunc double 0.000000e+00 to float
+  %trunc.2 = fptrunc double 1.000000e+00 to float
+  %ins.1 = insertelement <4 x float> zeroinitializer, float poison, i64 0
+  %ins.2 = insertelement <4 x float> %ins.1, float %trunc.1, i64 0
+  %ext.1 = fpext float %trunc.1 to double
+  %gep.1  = getelementptr double, ptr %0, i64 1
+  store double %ext.1, ptr %gep.1, align 8
+  %ext.2 = fpext float %trunc.2 to double
+  %gep.2 = getelementptr double, ptr %0, i64 2
+  store double %ext.2, ptr %gep.2, align 8
+  ret <4 x float> %ins.2
+}

@vporpo vporpo merged commit 6d7a84d into llvm:main Feb 12, 2025
11 checks passed
@llvm-ci
Copy link
Collaborator

llvm-ci commented Feb 12, 2025

LLVM Buildbot has detected a new failure on builder openmp-offload-amdgpu-runtime running on omp-vega20-0 while building llvm at step 7 "Add check check-offload".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/30/builds/15678

Here is the relevant piece of the build log for the reference
Step 7 (Add check check-offload) failure: test (failure)
...
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug53727.cpp (999 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49779.cpp (1000 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/test_libc.cpp (1001 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/wtime.c (1002 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/std_complex_arithmetic.cpp (1003 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu :: offloading/bug49021.cpp (1004 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/complex_reduction.cpp (1005 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49021.cpp (1006 of 1008)
PASS: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/std_complex_arithmetic.cpp (1007 of 1008)
TIMEOUT: libomptarget :: amdgcn-amd-amdhsa :: offloading/ctor_dtor.cpp (1008 of 1008)
******************** TEST 'libomptarget :: amdgcn-amd-amdhsa :: offloading/ctor_dtor.cpp' FAILED ********************
Exit Code: -9
Timeout: Reached timeout of 100 seconds

Command Output (stdout):
--
# RUN: at line 1
/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang++ -fopenmp    -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib  -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/ctor_dtor.cpp -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/ctor_dtor.cpp.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a && /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/ctor_dtor.cpp.tmp | /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/ctor_dtor.cpp
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/clang++ -fopenmp -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test -I /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -L /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -nogpulib -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -Wl,-rpath,/home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib -fopenmp-targets=amdgcn-amd-amdhsa /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/ctor_dtor.cpp -o /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/ctor_dtor.cpp.tmp /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./lib/libomptarget.devicertl.a
# note: command had no output on stdout or stderr
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/runtimes/runtimes-bins/offload/test/amdgcn-amd-amdhsa/offloading/Output/ctor_dtor.cpp.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -9
# error: command reached timeout: True
# executed command: /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.build/./bin/FileCheck /home/ompworker/bbot/openmp-offload-amdgpu-runtime/llvm.src/offload/test/offloading/ctor_dtor.cpp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -9
# error: command reached timeout: True

--

********************
Slowest Tests:
--------------------------------------------------------------------------
100.05s: libomptarget :: amdgcn-amd-amdhsa :: offloading/ctor_dtor.cpp
15.54s: libomptarget :: amdgcn-amd-amdhsa :: offloading/bug49021.cpp
12.43s: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_target_teams_reduction_min.cpp
12.22s: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_target_teams_reduction_max.cpp
10.64s: libomptarget :: amdgcn-amd-amdhsa :: offloading/complex_reduction.cpp
9.33s: libomptarget :: amdgcn-amd-amdhsa :: jit/empty_kernel_lvl2.c
9.10s: libomptarget :: x86_64-unknown-linux-gnu :: offloading/bug49021.cpp
7.43s: libomptarget :: amdgcn-amd-amdhsa :: offloading/ompx_saxpy_mixed.c
7.34s: libomptarget :: amdgcn-amd-amdhsa :: offloading/barrier_fence.c
7.16s: libomptarget :: x86_64-unknown-linux-gnu :: offloading/std_complex_arithmetic.cpp
7.07s: libomptarget :: x86_64-unknown-linux-gnu :: offloading/complex_reduction.cpp
6.55s: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/bug49021.cpp
6.02s: libomptarget :: amdgcn-amd-amdhsa :: offloading/parallel_target_teams_reduction.cpp
5.42s: libomptarget :: amdgcn-amd-amdhsa :: sanitizer/kernel_trap_many.c
5.10s: libomptarget :: x86_64-unknown-linux-gnu-LTO :: offloading/std_complex_arithmetic.cpp

flovent pushed a commit to flovent/llvm-project that referenced this pull request Feb 13, 2025
This patch fixes the way the top-of-schedule variable gets set and
updated. Before this patch it used to get updated whenever we scheduled
a bundle, which is wrong, as the top-of-schedule needs to be maintained
across scheduling attempts.

It should get reset only when we clear the schedule or when we destroy
the current schedule and re-schedule.
joaosaffran pushed a commit to joaosaffran/llvm-project that referenced this pull request Feb 14, 2025
This patch fixes the way the top-of-schedule variable gets set and
updated. Before this patch it used to get updated whenever we scheduled
a bundle, which is wrong, as the top-of-schedule needs to be maintained
across scheduling attempts.

It should get reset only when we clear the schedule or when we destroy
the current schedule and re-schedule.
sivan-shani pushed a commit to sivan-shani/llvm-project that referenced this pull request Feb 24, 2025
This patch fixes the way the top-of-schedule variable gets set and
updated. Before this patch it used to get updated whenever we scheduled
a bundle, which is wrong, as the top-of-schedule needs to be maintained
across scheduling attempts.

It should get reset only when we clear the schedule or when we destroy
the current schedule and re-schedule.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants