71 changes: 62 additions & 9 deletions llvm/lib/CodeGen/MachineTraceMetrics.cpp
@@ -20,6 +20,7 @@
 #include "llvm/CodeGen/MachineLoopInfo.h"
 #include "llvm/CodeGen/MachineOperand.h"
 #include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
 #include "llvm/CodeGen/TargetRegisterInfo.h"
 #include "llvm/CodeGen/TargetSchedule.h"
 #include "llvm/CodeGen/TargetSubtargetInfo.h"
@@ -761,6 +762,59 @@ static void updatePhysDepsDownwards(const MachineInstr *UseMI,
}
}

/// Estimates the number of cycles elapsed between DefMI and UseMI, DefMI
/// inclusive and UseMI exclusive, if they're in the same MBB. Returns
/// std::nullopt if they're in different MBBs, and 0 if UseMI is null.
static std::optional<unsigned>
estimateDefUseCycles(const TargetSchedModel &Sched, const MachineInstr *DefMI,
                     const MachineInstr *UseMI) {
  if (!UseMI)
    return 0;
  if (DefMI->getParent() != UseMI->getParent())
    return std::nullopt;

  const auto DefIt = DefMI->getIterator();
  const auto UseIt = UseMI->getIterator();

  unsigned NumMicroOps = 0;
  for (auto It = DefIt; It != UseIt; ++It) {
    // In cases where the UseMI is a PHI at the beginning of the MBB, compute
    // MicroOps until the end of the MBB.
    if (It.isEnd())
      break;

    NumMicroOps += Sched.getNumMicroOps(&*It);
  }
  return NumMicroOps / Sched.getIssueWidth();
Contributor:

I wonder if it would be useful to use defaultDefLatency for the instructions we iterate over here, in case some of them depend on each other. Imagine a scenario:

defmi = ...
a = ...
b = use a
usemi = ...

In this case, what if we checked the defaultDefLatency of a and used it to understand the number of cycles elapsed between a and b? For example, if a has a default latency of 10, then b can't really start in the next NumMicroOps / IssueWidth cycles, since it has to wait 10 additional cycles.

Contributor (Author):

Sorry, I'm having trouble understanding your example. Here's what I think I understand:

c = ...
a = ...
b = use a
d = use c

Here, if c has a default def latency of N cycles and a has one of M cycles, then b can start after M - 1 cycles and d after N - 3 cycles, assuming issue-width = num-micro-ops = 1. What do you mean by "instructions that depend on each other"? How can one instruction depend on another, which in turn depends on the first? Wouldn't this break basic dominance criteria? Also, if I understand correctly, I think we're in SSA form at this point, so we don't have to worry about re-definitions.

Contributor (@michaelmaitland, Aug 8, 2024):

We have a loop that iterates over instructions between DefMI and UseMI:

  for (auto It = DefIt; It != UseIt; ++It) {
    // In cases where the UseMI is a PHI at the beginning of the MBB, compute
    // MicroOps until the end of the MBB.
    if (It.isEnd())
      break;

    NumMicroOps += Sched.getNumMicroOps(&*It);
  }

I suggest the following scenario:

defmi = ...
a = ...
b = use a
usemi = ...

In this case, we will be looping over instructions a and b and adding their micro-op counts to calculate the number of cycles elapsed between defmi and usemi.

Let's assume that a has a default def latency of N cycles and b has a default def latency of M cycles.

What do you mean by "instructions that depend on each other"?

In my scenario, b uses the result of a. It cannot start until that result is ready (an extra N cycles). If we want to make it concrete, we could imagine that it looks like this:

a = add 3, 2
b = sub a, 2

We cannot start the subtraction until a has finished calculating. This is what I am calling a dependency. For the sake of simplicity, we can assume that a and b are independent of defmi and usemi.

In this scenario, if the default latency of a is larger than the number of micro-ops, then we must wait at least the default latency of a before starting b. I suggest we incorporate this into the estimation, as in the sketch below.
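
A minimal sketch of what that adjustment might look like inside the patch's loop, reusing the Sched, DefIt, and UseIt names from estimateDefUseCycles; the max-based clamp is one possible reading of the suggestion, not code from this PR:

// Hypothetical variant of the estimateDefUseCycles loop (needs <algorithm>):
// besides counting micro-ops, track the largest default def latency among the
// intermediate instructions, and never report fewer elapsed cycles than that,
// as a lower bound for a serialized chain such as a -> b.
unsigned NumMicroOps = 0, MaxInterLatency = 0;
for (auto It = DefIt; It != UseIt; ++It) {
  if (It.isEnd())
    break;
  NumMicroOps += Sched.getNumMicroOps(&*It);
  // Only the intermediate instructions (a, b, ...) gate the window this way;
  // DefMI's own latency is what the caller is already accounting for.
  if (It != DefIt)
    MaxInterLatency = std::max(
        MaxInterLatency,
        Sched.getInstrInfo()->defaultDefLatency(*Sched.getMCSchedModel(), *It));
}
return std::max(NumMicroOps / Sched.getIssueWidth(), MaxInterLatency);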

Contributor (Author):

Ah, thanks for the explanation! However, I think we're missing the fact that CPUs are usually pipelined: the defmi-usemi dependency will be in one pipeline, while the a-b dependency will be in another. Hence, the defmi-usemi dependency should be independent of the a-b dependency. When the code is called with DefMI = a and UseMI = b, it will return the correct answer for that dependency.

Now, we haven't actually modeled any pipelines, but do you think this is feasible?

Contributor (@michaelmaitland, Aug 8, 2024):

It is a good point that if there are multiple pipelines and the defmi-usemi pair goes down one while the a-b pair goes down another, then what you have is correct. There is definitely also a scenario where both dependency pairs go down the same pipeline, and in that case what I am suggesting is probably the better model.

Unfortunately we don't have any pipeline information because we don't have the scheduler model, so we don't actually know what we should do.

One argument is to keep what we have now, because it's simple and less expensive to compute.
Another argument is to pick the "more common" approach. I'd prefer not to make a blanket statement like "independent instructions are more likely to go down different pipelines", although I wouldn't be surprised if it were true.

For these reasons, I am content with the approach you are proposing. Happy to see if anyone else has thoughts on this.

Collaborator:

I may have missed something, but the use of micro-ops worries me: many architectures don't guarantee that uop count and latency are a close match (e.g., Alder Lake divpd ymm has uops=1 but latency=15). And dividing by issue width makes this feel more like a throughput estimate than a latency estimate?

Contributor (Author):

I see. Will Alder Lake's divpd not dispatch in one cycle, though? We're offsetting DefaultDefLatency by the number of cycles elapsed between DefMI and UseMI. If there's an Alder Lake divpd between DefMI and UseMI, should we be subtracting 15 cycles for it?

Contributor:

@RKSimon do you still have concerns about the use of micro-ops and issue width here?

I think it makes sense to use issue width in determining "the number of cycles between two instructions" in the calculation here. For example:

a = def
b = ...
c = use

If the issue width is 1, then a is issued in one cycle and b in the next. But if the issue width is 2, then a and b are issued in the same cycle. In the former case we should estimate 2 cycles between [a, c), and in the latter, 1 cycle.
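
A toy check of that arithmetic; the helper name and numbers are mine, purely to illustrate the NumMicroOps / IssueWidth division the patch uses:

#include <cassert>

// Cycles elapsed across an instruction window [Def, Use), given the window's
// total micro-op count and the machine's issue width.
static unsigned estimateCycles(unsigned NumMicroOps, unsigned IssueWidth) {
  return NumMicroOps / IssueWidth;
}

int main() {
  // The [a, c) window above contains a and b, assumed one micro-op each.
  assert(estimateCycles(2, 1) == 2); // 1-wide: a issues, then b: 2 cycles
  assert(estimateCycles(2, 2) == 1); // 2-wide: a and b issue together
  return 0;
}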

Collaborator:

Doesn't this make the assumption that the ops can be issued on any pipe? I'm still not convinced your approach makes sense, tbh.

Contributor (Author):

Since we don't have the scheduling model, we could either assume that all ops are issued on the same pipeline, or that independent ops are always issued on different pipelines. I think the latter is more common. The patch is a first-order improvement over DefaultDefLatency: it is by no means an accurate representation of how the machine functions, but wouldn't you agree that it is an improvement?

Of course, if you have a better idea for improving the latency computation in the fallback case, please do suggest it.
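
To make the "first-order improvement" concrete, here is a small self-contained model of the fallback path being debated; the function name and the numbers are illustrative, but the clamped subtraction mirrors the patch's DefaultDefLatency - DefUseCycles logic below:

#include <cassert>

// Fallback operand latency: the instruction's default def latency minus the
// cycles already elapsed between the def and the use, clamped at zero.
static unsigned fallbackLatency(unsigned DefaultDefLatency,
                                unsigned DefUseCycles) {
  if (DefaultDefLatency <= DefUseCycles)
    return 0;
  return DefaultDefLatency - DefUseCycles;
}

int main() {
  // Def latency 4; three single-uop instructions sit between def and use on a
  // 1-wide machine, so 3 cycles have already elapsed: 1 cycle remains exposed.
  // (The old behavior would have charged the full 4 cycles here.)
  assert(fallbackLatency(4, 3) == 1);
  // If the window is longer than the latency, no latency remains exposed.
  assert(fallbackLatency(2, 5) == 0);
  return 0;
}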

}

/// Wraps Sched.computeOperandLatency, accounting for the case when
/// InstrSchedModel and InstrItineraries are not available: in this case,
/// Sched.computeOperandLatency returns DefaultDefLatency, which is a very
/// rough approximation; to improve it, offset it by the estimated number of
/// cycles elapsed from DefMI to UseMI (since the MIs could be reordered by
/// the scheduler, and we don't have that information, this cannot be known
/// exactly). When scheduling information is available,
/// Sched.computeOperandLatency returns a much better estimate (especially if
/// UseMI is non-null), so we just return that.
static unsigned computeOperandLatency(const TargetSchedModel &Sched,
                                      const MachineInstr *DefMI,
                                      unsigned DefOperIdx,
                                      const MachineInstr *UseMI,
                                      unsigned UseOperIdx) {
  assert(DefMI && "Non-null DefMI expected");
  if (!Sched.hasInstrSchedModel() && !Sched.hasInstrItineraries()) {
Contributor:

Instead of placing this logic in this wrapper, could the default implementation of computeOperandLatency handle it?

Contributor (Author):

I initially tried this in #74088, but @jayfoad said that I should consider doing it in callers. Perhaps it's time to revisit that?

    unsigned DefaultDefLatency = Sched.getInstrInfo()->defaultDefLatency(
        *Sched.getMCSchedModel(), *DefMI);
    std::optional<unsigned> DefUseCycles =
        estimateDefUseCycles(Sched, DefMI, UseMI);
    if (!DefUseCycles || DefaultDefLatency <= DefUseCycles)
      return 0;
    return DefaultDefLatency - *DefUseCycles;
  }
  return Sched.computeOperandLatency(DefMI, DefOperIdx, UseMI, UseOperIdx);
}

/// The length of the critical path through a trace is the maximum of two path
/// lengths:
///
@@ -813,8 +867,8 @@ updateDepth(MachineTraceMetrics::TraceBlockInfo &TBI, const MachineInstr &UseMI,
    unsigned DepCycle = Cycles.lookup(Dep.DefMI).Depth;
    // Add latency if DefMI is a real instruction. Transients get latency 0.
    if (!Dep.DefMI->isTransient())
-     DepCycle += MTM.SchedModel
-                     .computeOperandLatency(Dep.DefMI, Dep.DefOp, &UseMI, Dep.UseOp);
+     DepCycle += computeOperandLatency(MTM.SchedModel, Dep.DefMI, Dep.DefOp,
+                                       &UseMI, Dep.UseOp);
    Cycle = std::max(Cycle, DepCycle);
  }
// Remember the instruction depth.
@@ -929,8 +983,8 @@ static unsigned updatePhysDepsUpwards(const MachineInstr &MI, unsigned Height,
    if (!MI.isTransient()) {
      // We may not know the UseMI of this dependency, if it came from the
      // live-in list. SchedModel can handle a NULL UseMI.
-     DepHeight += SchedModel.computeOperandLatency(&MI, MO.getOperandNo(),
-                                                   I->MI, I->Op);
+     DepHeight += computeOperandLatency(SchedModel, &MI, MO.getOperandNo(),
+                                        I->MI, I->Op);
    }
    Height = std::max(Height, DepHeight);
    // This regunit is dead above MI.
@@ -963,10 +1017,9 @@ static bool pushDepHeight(const DataDep &Dep, const MachineInstr &UseMI,
                          unsigned UseHeight, MIHeightMap &Heights,
                          const TargetSchedModel &SchedModel,
                          const TargetInstrInfo *TII) {
-  // Adjust height by Dep.DefMI latency.
  if (!Dep.DefMI->isTransient())
-   UseHeight += SchedModel.computeOperandLatency(Dep.DefMI, Dep.DefOp, &UseMI,
-                                                 Dep.UseOp);
+   UseHeight += computeOperandLatency(SchedModel, Dep.DefMI, Dep.DefOp, &UseMI,
+                                      Dep.UseOp);

  // Update Heights[DefMI] to be the maximum height seen.
  MIHeightMap::iterator I;
@@ -1192,8 +1245,8 @@ MachineTraceMetrics::Trace::getPHIDepth(const MachineInstr &PHI) const {
  unsigned DepCycle = getInstrCycles(*Dep.DefMI).Depth;
  // Add latency if DefMI is a real instruction. Transients get latency 0.
  if (!Dep.DefMI->isTransient())
-   DepCycle += TE.MTM.SchedModel.computeOperandLatency(Dep.DefMI, Dep.DefOp,
-                                                       &PHI, Dep.UseOp);
+   DepCycle += computeOperandLatency(TE.MTM.SchedModel, Dep.DefMI, Dep.DefOp,
+                                     &PHI, Dep.UseOp);
  return DepCycle;

44 changes: 22 additions & 22 deletions llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll
@@ -94,15 +94,15 @@ define i7 @bitreverse_i7(i7 %x) {
; RV32-NEXT: or a1, a1, a2
; RV32-NEXT: slli a2, a0, 2
; RV32-NEXT: andi a2, a2, 16
; RV32-NEXT: or a1, a1, a2
; RV32-NEXT: andi a0, a0, 127
; RV32-NEXT: andi a3, a0, 8
; RV32-NEXT: or a2, a2, a3
; RV32-NEXT: andi a2, a0, 8
; RV32-NEXT: or a1, a1, a2
; RV32-NEXT: srli a2, a0, 2
; RV32-NEXT: andi a2, a2, 4
; RV32-NEXT: srli a3, a0, 4
; RV32-NEXT: andi a3, a3, 2
; RV32-NEXT: or a2, a2, a3
; RV32-NEXT: or a1, a1, a2
; RV32-NEXT: srli a2, a0, 4
; RV32-NEXT: andi a2, a2, 2
; RV32-NEXT: or a1, a1, a2
; RV32-NEXT: srli a0, a0, 6
; RV32-NEXT: or a0, a1, a0
@@ -117,15 +117,15 @@ define i7 @bitreverse_i7(i7 %x) {
; RV64-NEXT: or a1, a1, a2
; RV64-NEXT: slli a2, a0, 2
; RV64-NEXT: andi a2, a2, 16
; RV64-NEXT: or a1, a1, a2
; RV64-NEXT: andi a0, a0, 127
; RV64-NEXT: andi a3, a0, 8
; RV64-NEXT: or a2, a2, a3
; RV64-NEXT: andi a2, a0, 8
; RV64-NEXT: or a1, a1, a2
; RV64-NEXT: srliw a2, a0, 2
; RV64-NEXT: andi a2, a2, 4
; RV64-NEXT: srliw a3, a0, 4
; RV64-NEXT: andi a3, a3, 2
; RV64-NEXT: or a2, a2, a3
; RV64-NEXT: or a1, a1, a2
; RV64-NEXT: srliw a2, a0, 4
; RV64-NEXT: andi a2, a2, 2
; RV64-NEXT: or a1, a1, a2
; RV64-NEXT: srliw a0, a0, 6
; RV64-NEXT: or a0, a1, a0
@@ -145,24 +145,24 @@ define i24 @bitreverse_i24(i24 %x) {
; RV32-NEXT: or a0, a0, a1
; RV32-NEXT: lui a1, 1048335
; RV32-NEXT: addi a1, a1, 240
; RV32-NEXT: and a3, a1, a2
; RV32-NEXT: and a3, a0, a3
; RV32-NEXT: and a3, a0, a1
; RV32-NEXT: and a3, a3, a2
; RV32-NEXT: srli a3, a3, 4
; RV32-NEXT: slli a0, a0, 4
; RV32-NEXT: and a0, a0, a1
; RV32-NEXT: or a0, a3, a0
; RV32-NEXT: lui a1, 1047757
; RV32-NEXT: addi a1, a1, -820
; RV32-NEXT: and a3, a1, a2
; RV32-NEXT: and a3, a0, a3
; RV32-NEXT: and a3, a0, a1
; RV32-NEXT: and a3, a3, a2
; RV32-NEXT: srli a3, a3, 2
; RV32-NEXT: slli a0, a0, 2
; RV32-NEXT: and a0, a0, a1
; RV32-NEXT: or a0, a3, a0
; RV32-NEXT: lui a1, 1047211
; RV32-NEXT: addi a1, a1, -1366
; RV32-NEXT: and a2, a1, a2
; RV32-NEXT: and a2, a0, a2
; RV32-NEXT: and a3, a0, a1
; RV32-NEXT: and a2, a3, a2
; RV32-NEXT: srli a2, a2, 1
; RV32-NEXT: slli a0, a0, 1
; RV32-NEXT: and a0, a0, a1
@@ -179,24 +179,24 @@ define i24 @bitreverse_i24(i24 %x) {
; RV64-NEXT: or a0, a0, a1
; RV64-NEXT: lui a1, 1048335
; RV64-NEXT: addi a1, a1, 240
; RV64-NEXT: and a3, a1, a2
; RV64-NEXT: and a3, a0, a3
; RV64-NEXT: and a3, a0, a1
; RV64-NEXT: and a3, a3, a2
; RV64-NEXT: srliw a3, a3, 4
; RV64-NEXT: slli a0, a0, 4
; RV64-NEXT: and a0, a0, a1
; RV64-NEXT: or a0, a3, a0
; RV64-NEXT: lui a1, 1047757
; RV64-NEXT: addi a1, a1, -820
; RV64-NEXT: and a3, a1, a2
; RV64-NEXT: and a3, a0, a3
; RV64-NEXT: and a3, a0, a1
; RV64-NEXT: and a3, a3, a2
; RV64-NEXT: srliw a3, a3, 2
; RV64-NEXT: slli a0, a0, 2
; RV64-NEXT: and a0, a0, a1
; RV64-NEXT: or a0, a3, a0
; RV64-NEXT: lui a1, 1047211
; RV64-NEXT: addiw a1, a1, -1366
; RV64-NEXT: and a2, a1, a2
; RV64-NEXT: and a2, a0, a2
; RV64-NEXT: and a3, a0, a1
; RV64-NEXT: and a2, a3, a2
; RV64-NEXT: srliw a2, a2, 1
; RV64-NEXT: slliw a0, a0, 1
; RV64-NEXT: and a0, a0, a1
8 changes: 4 additions & 4 deletions llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll
@@ -1252,8 +1252,8 @@ define iXLen @va4_va_copy(i32 %argno, ...) nounwind {
; RV32-NEXT: sw a3, 16(sp)
; RV32-NEXT: lw a2, 0(a2)
; RV32-NEXT: add a0, a0, s1
; RV32-NEXT: add a1, a1, a2
; RV32-NEXT: add a0, a0, a1
; RV32-NEXT: add a0, a0, a2
; RV32-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
; RV32-NEXT: lw s0, 24(sp) # 4-byte Folded Reload
; RV32-NEXT: lw s1, 20(sp) # 4-byte Folded Reload
@@ -1308,8 +1308,8 @@ define iXLen @va4_va_copy(i32 %argno, ...) nounwind {
; RV64-NEXT: sd a3, 16(sp)
; RV64-NEXT: ld a2, 0(a2)
; RV64-NEXT: add a0, a0, s1
; RV64-NEXT: add a1, a1, a2
; RV64-NEXT: add a0, a0, a1
; RV64-NEXT: add a0, a0, a2
; RV64-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
; RV64-NEXT: ld s0, 32(sp) # 8-byte Folded Reload
; RV64-NEXT: ld s1, 24(sp) # 8-byte Folded Reload
@@ -1363,8 +1363,8 @@ define iXLen @va4_va_copy(i32 %argno, ...) nounwind {
; RV32-WITHFP-NEXT: sw a3, -20(s0)
; RV32-WITHFP-NEXT: lw a2, 0(a2)
; RV32-WITHFP-NEXT: add a0, a0, s2
; RV32-WITHFP-NEXT: add a1, a1, a2
; RV32-WITHFP-NEXT: add a0, a0, a1
; RV32-WITHFP-NEXT: add a0, a0, a2
; RV32-WITHFP-NEXT: lw ra, 28(sp) # 4-byte Folded Reload
; RV32-WITHFP-NEXT: lw s0, 24(sp) # 4-byte Folded Reload
; RV32-WITHFP-NEXT: lw s1, 20(sp) # 4-byte Folded Reload
@@ -1422,8 +1422,8 @@ define iXLen @va4_va_copy(i32 %argno, ...) nounwind {
; RV64-WITHFP-NEXT: sd a3, -40(s0)
; RV64-WITHFP-NEXT: ld a2, 0(a2)
; RV64-WITHFP-NEXT: add a0, a0, s2
; RV64-WITHFP-NEXT: add a1, a1, a2
; RV64-WITHFP-NEXT: add a0, a0, a1
; RV64-WITHFP-NEXT: add a0, a0, a2
; RV64-WITHFP-NEXT: ld ra, 40(sp) # 8-byte Folded Reload
; RV64-WITHFP-NEXT: ld s0, 32(sp) # 8-byte Folded Reload
; RV64-WITHFP-NEXT: ld s1, 24(sp) # 8-byte Folded Reload
16 changes: 8 additions & 8 deletions llvm/test/CodeGen/RISCV/abds-neg.ll
@@ -697,8 +697,8 @@ define i128 @abd_ext_i128(i128 %a, i128 %b) nounwind {
; RV32I-NEXT: snez a3, a3
; RV32I-NEXT: neg a4, a6
; RV32I-NEXT: sltu a5, a4, a3
; RV32I-NEXT: neg a6, a7
; RV32I-NEXT: sub a5, a6, a5
; RV32I-NEXT: add a5, a7, a5
; RV32I-NEXT: neg a5, a5
; RV32I-NEXT: snez a6, a1
; RV32I-NEXT: add a2, a2, a6
; RV32I-NEXT: neg a2, a2
@@ -816,8 +816,8 @@ define i128 @abd_ext_i128(i128 %a, i128 %b) nounwind {
; RV32ZBB-NEXT: snez a3, a3
; RV32ZBB-NEXT: neg a4, a6
; RV32ZBB-NEXT: sltu a5, a4, a3
; RV32ZBB-NEXT: neg a6, a7
; RV32ZBB-NEXT: sub a5, a6, a5
; RV32ZBB-NEXT: add a5, a7, a5
; RV32ZBB-NEXT: neg a5, a5
; RV32ZBB-NEXT: snez a6, a1
; RV32ZBB-NEXT: add a2, a2, a6
; RV32ZBB-NEXT: neg a2, a2
@@ -944,8 +944,8 @@ define i128 @abd_ext_i128_undef(i128 %a, i128 %b) nounwind {
; RV32I-NEXT: snez a3, a3
; RV32I-NEXT: neg a4, a6
; RV32I-NEXT: sltu a5, a4, a3
; RV32I-NEXT: neg a6, a7
; RV32I-NEXT: sub a5, a6, a5
; RV32I-NEXT: add a5, a7, a5
; RV32I-NEXT: neg a5, a5
; RV32I-NEXT: snez a6, a1
; RV32I-NEXT: add a2, a2, a6
; RV32I-NEXT: neg a2, a2
@@ -1063,8 +1063,8 @@ define i128 @abd_ext_i128_undef(i128 %a, i128 %b) nounwind {
; RV32ZBB-NEXT: snez a3, a3
; RV32ZBB-NEXT: neg a4, a6
; RV32ZBB-NEXT: sltu a5, a4, a3
; RV32ZBB-NEXT: neg a6, a7
; RV32ZBB-NEXT: sub a5, a6, a5
; RV32ZBB-NEXT: add a5, a7, a5
; RV32ZBB-NEXT: neg a5, a5
; RV32ZBB-NEXT: snez a6, a1
; RV32ZBB-NEXT: add a2, a2, a6
; RV32ZBB-NEXT: neg a2, a2
8 changes: 4 additions & 4 deletions llvm/test/CodeGen/RISCV/abds.ll
@@ -2076,8 +2076,8 @@ define i128 @abd_subnsw_i128(i128 %a, i128 %b) nounwind {
; RV32I-NEXT: sltu t0, a7, a5
; RV32I-NEXT: snez a2, a2
; RV32I-NEXT: add a1, a1, a2
; RV32I-NEXT: add a1, a1, t0
; RV32I-NEXT: neg a1, a1
; RV32I-NEXT: sub a1, a1, t0
; RV32I-NEXT: sub a2, a7, a5
; RV32I-NEXT: neg a3, a3
; RV32I-NEXT: add a4, a4, a6
@@ -2139,8 +2139,8 @@ define i128 @abd_subnsw_i128(i128 %a, i128 %b) nounwind {
; RV32ZBB-NEXT: sltu t0, a7, a5
; RV32ZBB-NEXT: snez a2, a2
; RV32ZBB-NEXT: add a1, a1, a2
; RV32ZBB-NEXT: add a1, a1, t0
; RV32ZBB-NEXT: neg a1, a1
; RV32ZBB-NEXT: sub a1, a1, t0
; RV32ZBB-NEXT: sub a2, a7, a5
; RV32ZBB-NEXT: neg a3, a3
; RV32ZBB-NEXT: add a4, a4, a6
@@ -2207,8 +2207,8 @@ define i128 @abd_subnsw_i128_undef(i128 %a, i128 %b) nounwind {
; RV32I-NEXT: sltu t0, a7, a5
; RV32I-NEXT: snez a2, a2
; RV32I-NEXT: add a1, a1, a2
; RV32I-NEXT: add a1, a1, t0
; RV32I-NEXT: neg a1, a1
; RV32I-NEXT: sub a1, a1, t0
; RV32I-NEXT: sub a2, a7, a5
; RV32I-NEXT: neg a3, a3
; RV32I-NEXT: add a4, a4, a6
@@ -2270,8 +2270,8 @@ define i128 @abd_subnsw_i128_undef(i128 %a, i128 %b) nounwind {
; RV32ZBB-NEXT: sltu t0, a7, a5
; RV32ZBB-NEXT: snez a2, a2
; RV32ZBB-NEXT: add a1, a1, a2
; RV32ZBB-NEXT: add a1, a1, t0
; RV32ZBB-NEXT: neg a1, a1
; RV32ZBB-NEXT: sub a1, a1, t0
; RV32ZBB-NEXT: sub a2, a7, a5
; RV32ZBB-NEXT: neg a3, a3
; RV32ZBB-NEXT: add a4, a4, a6
8 changes: 4 additions & 4 deletions llvm/test/CodeGen/RISCV/abdu-neg.ll
@@ -696,8 +696,8 @@ define i128 @abd_ext_i128(i128 %a, i128 %b) nounwind {
; RV32I-NEXT: sub a1, a1, a2
; RV32I-NEXT: snez a2, t3
; RV32I-NEXT: add a1, a1, a2
; RV32I-NEXT: add a1, a1, t5
; RV32I-NEXT: neg a1, a1
; RV32I-NEXT: sub a1, a1, t5
; RV32I-NEXT: sub a2, t4, t1
; RV32I-NEXT: add a3, a3, a7
; RV32I-NEXT: neg a3, a3
@@ -808,8 +808,8 @@ define i128 @abd_ext_i128(i128 %a, i128 %b) nounwind {
; RV32ZBB-NEXT: sub a1, a1, a2
; RV32ZBB-NEXT: snez a2, t3
; RV32ZBB-NEXT: add a1, a1, a2
; RV32ZBB-NEXT: add a1, a1, t5
; RV32ZBB-NEXT: neg a1, a1
; RV32ZBB-NEXT: sub a1, a1, t5
; RV32ZBB-NEXT: sub a2, t4, t1
; RV32ZBB-NEXT: add a3, a3, a7
; RV32ZBB-NEXT: neg a3, a3
@@ -929,8 +929,8 @@ define i128 @abd_ext_i128_undef(i128 %a, i128 %b) nounwind {
; RV32I-NEXT: sub a1, a1, a2
; RV32I-NEXT: snez a2, t3
; RV32I-NEXT: add a1, a1, a2
; RV32I-NEXT: add a1, a1, t5
; RV32I-NEXT: neg a1, a1
; RV32I-NEXT: sub a1, a1, t5
; RV32I-NEXT: sub a2, t4, t1
; RV32I-NEXT: add a3, a3, a7
; RV32I-NEXT: neg a3, a3
@@ -1041,8 +1041,8 @@ define i128 @abd_ext_i128_undef(i128 %a, i128 %b) nounwind {
; RV32ZBB-NEXT: sub a1, a1, a2
; RV32ZBB-NEXT: snez a2, t3
; RV32ZBB-NEXT: add a1, a1, a2
; RV32ZBB-NEXT: add a1, a1, t5
; RV32ZBB-NEXT: neg a1, a1
; RV32ZBB-NEXT: sub a1, a1, t5
; RV32ZBB-NEXT: sub a2, t4, t1
; RV32ZBB-NEXT: add a3, a3, a7
; RV32ZBB-NEXT: neg a3, a3
4 changes: 2 additions & 2 deletions llvm/test/CodeGen/RISCV/addcarry.ll
@@ -18,9 +18,9 @@ define i64 @addcarry(i64 %x, i64 %y) nounwind {
; RISCV32-NEXT: sltu a7, a4, a6
; RISCV32-NEXT: sltu a5, a6, a5
; RISCV32-NEXT: mulhu a6, a0, a3
; RISCV32-NEXT: mulhu t0, a1, a2
; RISCV32-NEXT: add a6, a6, t0
; RISCV32-NEXT: add a5, a6, a5
; RISCV32-NEXT: mulhu a6, a1, a2
; RISCV32-NEXT: add a5, a5, a6
; RISCV32-NEXT: add a5, a5, a7
; RISCV32-NEXT: mul a6, a1, a3
; RISCV32-NEXT: add a5, a5, a6