[RISCV] Expand divisions larger than 64 bits on RV32. #163688
Merged
+2,307
−6
The __(u)divti3, __(u)modti3 functions don't exist in libgcc for RV32.
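The effect of the one-line limit in this patch can be modelled as a width check. The sketch below is a hypothetical model, not LLVM code: `planUnsignedDiv`, `DivPlan`, and `Lowering` are names invented for illustration, and it covers only the widths the test exercises (64 to 128 bits) in the configuration of the RUN lines, where i64 division is itself a libcall.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the lowering choice; none of these names exist in LLVM.
enum class Lowering { Libcall, InlineExpansion };

struct DivPlan {
  Lowering kind;
  std::string callee;  // empty when the division is expanded inline
};

// Mirrors the value passed to setMaxDivRemBitWidthSupported in the patch.
inline unsigned maxDivRemBitWidth(bool is64Bit) { return is64Bit ? 128 : 64; }

// Assumes 64 <= bits <= 128, the range covered by idiv_large.ll.
inline DivPlan planUnsignedDiv(unsigned bits, bool is64Bit) {
  assert(bits >= 64 && bits <= 128);
  if (bits > maxDivRemBitWidth(is64Bit))
    return {Lowering::InlineExpansion, ""};  // RV32 has no __udivti3 to call
  return {Lowering::Libcall, bits <= 64 ? "__udivdi3" : "__udivti3"};
}
```

Under this model, `planUnsignedDiv(65, false)` selects inline expansion while `planUnsignedDiv(65, true)` selects `__udivti3`, which matches the RV32 and RV64 check lines for `udiv_i65`.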
@llvm/pr-subscribers-backend-risc-v
Author: Craig Topper (topperc)
Changes
The __(u)divti3, __(u)modti3 functions don't exist in libgcc for RV32.
Patch is 77.29 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/163688.diff
2 Files Affected:
diff --git a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
index 7123a2d706787..eb875583ffca4 100644
--- a/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelLowering.cpp
@@ -1672,6 +1672,8 @@ RISCVTargetLowering::RISCVTargetLowering(const TargetMachine &TM,
   if (Subtarget.useRVVForFixedLengthVectors())
     setTargetDAGCombine(ISD::BITCAST);
 
+  setMaxDivRemBitWidthSupported(Subtarget.is64Bit() ? 128 : 64);
+
   // Disable strict node mutation.
   IsStrictFPEnabled = true;
   EnableExtLdPromotion = true;
diff --git a/llvm/test/CodeGen/RISCV/idiv_large.ll b/llvm/test/CodeGen/RISCV/idiv_large.ll
index 9937627962208..d7b00f61a50b9 100644
--- a/llvm/test/CodeGen/RISCV/idiv_large.ll
+++ b/llvm/test/CodeGen/RISCV/idiv_large.ll
@@ -1,16 +1,2315 @@
-; RUN: llc -mtriple=riscv32 < %s | FileCheck %s
-; RUN: llc -mtriple=riscv64 < %s | FileCheck %s
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 6
+; RUN: llc -mtriple=riscv32 < %s | FileCheck %s --check-prefix=RV32
+; RUN: llc -mtriple=riscv64 < %s | FileCheck %s --check-prefix=RV64
+
+define i64 @udiv_i64(i64 %x, i64 %y) nounwind {
+; RV32-LABEL: udiv_i64:
+; RV32: # %bb.0:
+; RV32-NEXT: addi sp, sp, -16
+; RV32-NEXT: sw ra, 12(sp) # 4-byte Folded Spill
+; RV32-NEXT: call __udivdi3
+; RV32-NEXT: lw ra, 12(sp) # 4-byte Folded Reload
+; RV32-NEXT: addi sp, sp, 16
+; RV32-NEXT: ret
+;
+; RV64-LABEL: udiv_i64:
+; RV64: # %bb.0:
+; RV64-NEXT: tail __udivdi3
+ %res = udiv i64 %x, %y
+ ret i64 %res
+}
+
+define i65 @udiv_i65(i65 %x, i65 %y) nounwind {
+; RV32-LABEL: udiv_i65:
+; RV32: # %bb.0: # %_udiv-special-cases
+; RV32-NEXT: lw a3, 0(a2)
+; RV32-NEXT: lw a4, 4(a2)
+; RV32-NEXT: lw t1, 8(a2)
+; RV32-NEXT: lui a2, 349525
+; RV32-NEXT: lui a5, 209715
+; RV32-NEXT: lui a6, 61681
+; RV32-NEXT: addi t0, a2, 1365
+; RV32-NEXT: addi a7, a5, 819
+; RV32-NEXT: addi a6, a6, -241
+; RV32-NEXT: srli a2, a4, 1
+; RV32-NEXT: slli a5, t1, 31
+; RV32-NEXT: slli t3, a4, 31
+; RV32-NEXT: or t2, a5, a2
+; RV32-NEXT: srli a2, a3, 1
+; RV32-NEXT: or t4, a2, t3
+; RV32-NEXT: bnez t2, .LBB1_2
+; RV32-NEXT: # %bb.1: # %_udiv-special-cases
+; RV32-NEXT: srli a2, t4, 1
+; RV32-NEXT: or a2, t4, a2
+; RV32-NEXT: srli a5, a2, 2
+; RV32-NEXT: or a2, a2, a5
+; RV32-NEXT: srli a5, a2, 4
+; RV32-NEXT: or a2, a2, a5
+; RV32-NEXT: srli a5, a2, 8
+; RV32-NEXT: or a2, a2, a5
+; RV32-NEXT: srli a5, a2, 16
+; RV32-NEXT: or a2, a2, a5
+; RV32-NEXT: not a2, a2
+; RV32-NEXT: srli a5, a2, 1
+; RV32-NEXT: and a5, a5, t0
+; RV32-NEXT: sub a2, a2, a5
+; RV32-NEXT: and a5, a2, a7
+; RV32-NEXT: srli a2, a2, 2
+; RV32-NEXT: and a2, a2, a7
+; RV32-NEXT: add a2, a5, a2
+; RV32-NEXT: srli a5, a2, 4
+; RV32-NEXT: add a2, a2, a5
+; RV32-NEXT: and a2, a2, a6
+; RV32-NEXT: slli a5, a2, 8
+; RV32-NEXT: add a2, a2, a5
+; RV32-NEXT: slli a5, a2, 16
+; RV32-NEXT: add a2, a2, a5
+; RV32-NEXT: srli a2, a2, 24
+; RV32-NEXT: addi t3, a2, 32
+; RV32-NEXT: j .LBB1_3
+; RV32-NEXT: .LBB1_2:
+; RV32-NEXT: srli a2, t2, 1
+; RV32-NEXT: or a2, t2, a2
+; RV32-NEXT: srli a5, a2, 2
+; RV32-NEXT: or a2, a2, a5
+; RV32-NEXT: srli a5, a2, 4
+; RV32-NEXT: or a2, a2, a5
+; RV32-NEXT: srli a5, a2, 8
+; RV32-NEXT: or a2, a2, a5
+; RV32-NEXT: srli a5, a2, 16
+; RV32-NEXT: or a2, a2, a5
+; RV32-NEXT: not a2, a2
+; RV32-NEXT: srli a5, a2, 1
+; RV32-NEXT: and a5, a5, t0
+; RV32-NEXT: sub a2, a2, a5
+; RV32-NEXT: and a5, a2, a7
+; RV32-NEXT: srli a2, a2, 2
+; RV32-NEXT: and a2, a2, a7
+; RV32-NEXT: add a2, a5, a2
+; RV32-NEXT: srli a5, a2, 4
+; RV32-NEXT: add a2, a2, a5
+; RV32-NEXT: and a2, a2, a6
+; RV32-NEXT: slli a5, a2, 8
+; RV32-NEXT: add a2, a2, a5
+; RV32-NEXT: slli a5, a2, 16
+; RV32-NEXT: add a2, a2, a5
+; RV32-NEXT: srli t3, a2, 24
+; RV32-NEXT: .LBB1_3: # %_udiv-special-cases
+; RV32-NEXT: addi sp, sp, -96
+; RV32-NEXT: sw s0, 92(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s1, 88(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s2, 84(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s3, 80(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s4, 76(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s5, 72(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s6, 68(sp) # 4-byte Folded Spill
+; RV32-NEXT: slli a2, a3, 31
+; RV32-NEXT: li t5, 64
+; RV32-NEXT: bnez a2, .LBB1_5
+; RV32-NEXT: # %bb.4: # %_udiv-special-cases
+; RV32-NEXT: li s0, 64
+; RV32-NEXT: j .LBB1_6
+; RV32-NEXT: .LBB1_5:
+; RV32-NEXT: srli a5, a2, 1
+; RV32-NEXT: or a2, a2, a5
+; RV32-NEXT: srli a5, a2, 2
+; RV32-NEXT: or a2, a2, a5
+; RV32-NEXT: srli a5, a2, 4
+; RV32-NEXT: or a2, a2, a5
+; RV32-NEXT: srli a5, a2, 8
+; RV32-NEXT: or a2, a2, a5
+; RV32-NEXT: srli a5, a2, 16
+; RV32-NEXT: or a2, a2, a5
+; RV32-NEXT: not a2, a2
+; RV32-NEXT: srli a5, a2, 1
+; RV32-NEXT: and a5, a5, t0
+; RV32-NEXT: sub a2, a2, a5
+; RV32-NEXT: and a5, a2, a7
+; RV32-NEXT: srli a2, a2, 2
+; RV32-NEXT: and a2, a2, a7
+; RV32-NEXT: add a2, a5, a2
+; RV32-NEXT: srli a5, a2, 4
+; RV32-NEXT: add a2, a2, a5
+; RV32-NEXT: and a2, a2, a6
+; RV32-NEXT: slli a5, a2, 8
+; RV32-NEXT: add a2, a2, a5
+; RV32-NEXT: slli a5, a2, 16
+; RV32-NEXT: add a2, a2, a5
+; RV32-NEXT: srli s0, a2, 24
+; RV32-NEXT: .LBB1_6: # %_udiv-special-cases
+; RV32-NEXT: lw a5, 0(a1)
+; RV32-NEXT: lw a2, 4(a1)
+; RV32-NEXT: lw s2, 8(a1)
+; RV32-NEXT: or a1, t4, t2
+; RV32-NEXT: addi s1, s0, 64
+; RV32-NEXT: bnez a1, .LBB1_8
+; RV32-NEXT: # %bb.7: # %_udiv-special-cases
+; RV32-NEXT: mv t3, s1
+; RV32-NEXT: .LBB1_8: # %_udiv-special-cases
+; RV32-NEXT: snez s4, a1
+; RV32-NEXT: srli a1, a2, 1
+; RV32-NEXT: slli t2, s2, 31
+; RV32-NEXT: slli t4, a2, 31
+; RV32-NEXT: or a1, t2, a1
+; RV32-NEXT: srli t2, a5, 1
+; RV32-NEXT: or t6, t2, t4
+; RV32-NEXT: bnez a1, .LBB1_10
+; RV32-NEXT: # %bb.9: # %_udiv-special-cases
+; RV32-NEXT: srli t2, t6, 1
+; RV32-NEXT: or t2, t6, t2
+; RV32-NEXT: srli t4, t2, 2
+; RV32-NEXT: or t2, t2, t4
+; RV32-NEXT: srli t4, t2, 4
+; RV32-NEXT: or t2, t2, t4
+; RV32-NEXT: srli t4, t2, 8
+; RV32-NEXT: or t2, t2, t4
+; RV32-NEXT: srli t4, t2, 16
+; RV32-NEXT: or t2, t2, t4
+; RV32-NEXT: not t2, t2
+; RV32-NEXT: srli t4, t2, 1
+; RV32-NEXT: and t4, t4, t0
+; RV32-NEXT: sub t2, t2, t4
+; RV32-NEXT: and t4, t2, a7
+; RV32-NEXT: srli t2, t2, 2
+; RV32-NEXT: and t2, t2, a7
+; RV32-NEXT: add t2, t4, t2
+; RV32-NEXT: srli t4, t2, 4
+; RV32-NEXT: add t2, t2, t4
+; RV32-NEXT: and t2, t2, a6
+; RV32-NEXT: slli t4, t2, 8
+; RV32-NEXT: add t2, t2, t4
+; RV32-NEXT: slli t4, t2, 16
+; RV32-NEXT: add t2, t2, t4
+; RV32-NEXT: srli t2, t2, 24
+; RV32-NEXT: addi s3, t2, 32
+; RV32-NEXT: j .LBB1_11
+; RV32-NEXT: .LBB1_10:
+; RV32-NEXT: srli t2, a1, 1
+; RV32-NEXT: or t2, a1, t2
+; RV32-NEXT: srli t4, t2, 2
+; RV32-NEXT: or t2, t2, t4
+; RV32-NEXT: srli t4, t2, 4
+; RV32-NEXT: or t2, t2, t4
+; RV32-NEXT: srli t4, t2, 8
+; RV32-NEXT: or t2, t2, t4
+; RV32-NEXT: srli t4, t2, 16
+; RV32-NEXT: or t2, t2, t4
+; RV32-NEXT: not t2, t2
+; RV32-NEXT: srli t4, t2, 1
+; RV32-NEXT: and t4, t4, t0
+; RV32-NEXT: sub t2, t2, t4
+; RV32-NEXT: and t4, t2, a7
+; RV32-NEXT: srli t2, t2, 2
+; RV32-NEXT: and t2, t2, a7
+; RV32-NEXT: add t2, t4, t2
+; RV32-NEXT: srli t4, t2, 4
+; RV32-NEXT: add t2, t2, t4
+; RV32-NEXT: and t2, t2, a6
+; RV32-NEXT: slli t4, t2, 8
+; RV32-NEXT: add t2, t2, t4
+; RV32-NEXT: slli t4, t2, 16
+; RV32-NEXT: add t2, t2, t4
+; RV32-NEXT: srli s3, t2, 24
+; RV32-NEXT: .LBB1_11: # %_udiv-special-cases
+; RV32-NEXT: andi t4, s2, 1
+; RV32-NEXT: andi t1, t1, 1
+; RV32-NEXT: or t2, a3, a4
+; RV32-NEXT: or s2, a5, a2
+; RV32-NEXT: sltu s0, s1, s0
+; RV32-NEXT: slli s1, a5, 31
+; RV32-NEXT: addi s4, s4, -1
+; RV32-NEXT: beqz s1, .LBB1_13
+; RV32-NEXT: # %bb.12:
+; RV32-NEXT: srli t5, s1, 1
+; RV32-NEXT: or t5, s1, t5
+; RV32-NEXT: srli s1, t5, 2
+; RV32-NEXT: or t5, t5, s1
+; RV32-NEXT: srli s1, t5, 4
+; RV32-NEXT: or t5, t5, s1
+; RV32-NEXT: srli s1, t5, 8
+; RV32-NEXT: or t5, t5, s1
+; RV32-NEXT: srli s1, t5, 16
+; RV32-NEXT: or t5, t5, s1
+; RV32-NEXT: not t5, t5
+; RV32-NEXT: srli s1, t5, 1
+; RV32-NEXT: and t0, s1, t0
+; RV32-NEXT: sub t0, t5, t0
+; RV32-NEXT: and t5, t0, a7
+; RV32-NEXT: srli t0, t0, 2
+; RV32-NEXT: and a7, t0, a7
+; RV32-NEXT: add a7, t5, a7
+; RV32-NEXT: srli t0, a7, 4
+; RV32-NEXT: add a7, a7, t0
+; RV32-NEXT: and a6, a7, a6
+; RV32-NEXT: slli a7, a6, 8
+; RV32-NEXT: add a6, a6, a7
+; RV32-NEXT: slli a7, a6, 16
+; RV32-NEXT: add a6, a6, a7
+; RV32-NEXT: srli t5, a6, 24
+; RV32-NEXT: .LBB1_13: # %_udiv-special-cases
+; RV32-NEXT: or t0, t2, t1
+; RV32-NEXT: or a6, s2, t4
+; RV32-NEXT: and a7, s4, s0
+; RV32-NEXT: or t6, t6, a1
+; RV32-NEXT: addi s0, t5, 64
+; RV32-NEXT: bnez t6, .LBB1_15
+; RV32-NEXT: # %bb.14: # %_udiv-special-cases
+; RV32-NEXT: mv s3, s0
+; RV32-NEXT: .LBB1_15: # %_udiv-special-cases
+; RV32-NEXT: seqz a1, t0
+; RV32-NEXT: sltu t0, s0, t5
+; RV32-NEXT: snez t5, t6
+; RV32-NEXT: addi t5, t5, -1
+; RV32-NEXT: and t0, t5, t0
+; RV32-NEXT: sltu t5, t3, s3
+; RV32-NEXT: seqz a6, a6
+; RV32-NEXT: mv t6, t5
+; RV32-NEXT: beq a7, t0, .LBB1_17
+; RV32-NEXT: # %bb.16: # %_udiv-special-cases
+; RV32-NEXT: sltu t6, a7, t0
+; RV32-NEXT: .LBB1_17: # %_udiv-special-cases
+; RV32-NEXT: or a1, a1, a6
+; RV32-NEXT: andi a6, t6, 1
+; RV32-NEXT: sub a7, a7, t0
+; RV32-NEXT: sub t5, a7, t5
+; RV32-NEXT: sub a7, t3, s3
+; RV32-NEXT: beqz a6, .LBB1_19
+; RV32-NEXT: # %bb.18: # %_udiv-special-cases
+; RV32-NEXT: mv t0, a6
+; RV32-NEXT: j .LBB1_20
+; RV32-NEXT: .LBB1_19:
+; RV32-NEXT: sltiu t0, a7, 65
+; RV32-NEXT: xori t0, t0, 1
+; RV32-NEXT: snez t3, t5
+; RV32-NEXT: or t0, t0, t3
+; RV32-NEXT: .LBB1_20: # %_udiv-special-cases
+; RV32-NEXT: or t6, a1, t0
+; RV32-NEXT: addi a1, t6, -1
+; RV32-NEXT: and t3, t4, a1
+; RV32-NEXT: and t0, a1, a2
+; RV32-NEXT: and a1, a1, a5
+; RV32-NEXT: bnez t6, .LBB1_30
+; RV32-NEXT: # %bb.21: # %_udiv-special-cases
+; RV32-NEXT: xori t6, a7, 64
+; RV32-NEXT: or t6, t6, a6
+; RV32-NEXT: or t6, t6, t5
+; RV32-NEXT: beqz t6, .LBB1_30
+; RV32-NEXT: # %bb.22: # %udiv-bb1
+; RV32-NEXT: addi a1, a7, 1
+; RV32-NEXT: sw zero, 32(sp)
+; RV32-NEXT: sw zero, 36(sp)
+; RV32-NEXT: sw zero, 40(sp)
+; RV32-NEXT: sw zero, 44(sp)
+; RV32-NEXT: sw a5, 48(sp)
+; RV32-NEXT: sw a2, 52(sp)
+; RV32-NEXT: sw t4, 56(sp)
+; RV32-NEXT: li t0, 64
+; RV32-NEXT: addi t3, sp, 48
+; RV32-NEXT: neg s1, a7
+; RV32-NEXT: seqz t6, a1
+; RV32-NEXT: sub a7, t0, a7
+; RV32-NEXT: add t5, t5, t6
+; RV32-NEXT: andi t0, a7, 31
+; RV32-NEXT: srli a7, a7, 3
+; RV32-NEXT: or t6, a1, t5
+; RV32-NEXT: xori s2, t0, 31
+; RV32-NEXT: andi a7, a7, 12
+; RV32-NEXT: seqz t0, t6
+; RV32-NEXT: sub s3, t3, a7
+; RV32-NEXT: add a6, a6, t0
+; RV32-NEXT: lw t3, 0(s3)
+; RV32-NEXT: lw s4, 4(s3)
+; RV32-NEXT: andi a7, a6, 1
+; RV32-NEXT: or t6, t6, a7
+; RV32-NEXT: srli a6, t3, 1
+; RV32-NEXT: sll t0, s4, s1
+; RV32-NEXT: srl a6, a6, s2
+; RV32-NEXT: or t0, t0, a6
+; RV32-NEXT: sll a6, t3, s1
+; RV32-NEXT: li t3, 0
+; RV32-NEXT: beqz t6, .LBB1_28
+; RV32-NEXT: # %bb.23: # %udiv-preheader
+; RV32-NEXT: li t6, 0
+; RV32-NEXT: li s0, 0
+; RV32-NEXT: srli s4, s4, 1
+; RV32-NEXT: lw s3, 8(s3)
+; RV32-NEXT: sw zero, 16(sp)
+; RV32-NEXT: sw zero, 20(sp)
+; RV32-NEXT: sw zero, 24(sp)
+; RV32-NEXT: sw zero, 28(sp)
+; RV32-NEXT: sw a5, 0(sp)
+; RV32-NEXT: sw a2, 4(sp)
+; RV32-NEXT: sw t4, 8(sp)
+; RV32-NEXT: sw zero, 12(sp)
+; RV32-NEXT: srli a2, a1, 3
+; RV32-NEXT: srl a5, s4, s2
+; RV32-NEXT: mv t4, sp
+; RV32-NEXT: snez t2, t2
+; RV32-NEXT: andi a2, a2, 12
+; RV32-NEXT: add t1, t1, t2
+; RV32-NEXT: add a2, t4, a2
+; RV32-NEXT: lw t2, 0(a2)
+; RV32-NEXT: lw t4, 4(a2)
+; RV32-NEXT: lw a2, 8(a2)
+; RV32-NEXT: sll s1, s3, s1
+; RV32-NEXT: andi s2, a1, 31
+; RV32-NEXT: xori s2, s2, 31
+; RV32-NEXT: or s3, s1, a5
+; RV32-NEXT: slli a2, a2, 1
+; RV32-NEXT: slli a5, t4, 1
+; RV32-NEXT: sll a2, a2, s2
+; RV32-NEXT: sll s2, a5, s2
+; RV32-NEXT: srl s1, t4, a1
+; RV32-NEXT: or s1, s1, a2
+; RV32-NEXT: seqz a2, a3
+; RV32-NEXT: sub a2, a4, a2
+; RV32-NEXT: addi a5, t1, 1
+; RV32-NEXT: andi a5, a5, 1
+; RV32-NEXT: andi s3, s3, 1
+; RV32-NEXT: srl t1, t2, a1
+; RV32-NEXT: or s2, t1, s2
+; RV32-NEXT: addi t1, a3, -1
+; RV32-NEXT: j .LBB1_26
+; RV32-NEXT: .LBB1_24: # %udiv-do-while
+; RV32-NEXT: # in Loop: Header=BB1_26 Depth=1
+; RV32-NEXT: sltu t2, a2, s4
+; RV32-NEXT: .LBB1_25: # %udiv-do-while
+; RV32-NEXT: # in Loop: Header=BB1_26 Depth=1
+; RV32-NEXT: srli s1, s1, 31
+; RV32-NEXT: sub t4, a5, s1
+; RV32-NEXT: sub t2, t4, t2
+; RV32-NEXT: slli t2, t2, 31
+; RV32-NEXT: srai s1, t2, 31
+; RV32-NEXT: and s3, s1, a4
+; RV32-NEXT: li t2, 0
+; RV32-NEXT: li t4, 0
+; RV32-NEXT: srli s5, a6, 31
+; RV32-NEXT: sub s4, s4, s3
+; RV32-NEXT: slli s3, t0, 1
+; RV32-NEXT: or s3, s3, s5
+; RV32-NEXT: srli t0, t0, 31
+; RV32-NEXT: slli a6, a6, 1
+; RV32-NEXT: or a6, t3, a6
+; RV32-NEXT: seqz t3, a1
+; RV32-NEXT: or s0, s0, t0
+; RV32-NEXT: or s5, a1, t5
+; RV32-NEXT: sub t5, t5, t3
+; RV32-NEXT: and s6, s1, a3
+; RV32-NEXT: addi a1, a1, -1
+; RV32-NEXT: andi t3, s1, 1
+; RV32-NEXT: or t0, t6, s3
+; RV32-NEXT: sltu t6, s2, s6
+; RV32-NEXT: snez s5, s5
+; RV32-NEXT: andi s3, s0, 1
+; RV32-NEXT: sub s1, s4, t6
+; RV32-NEXT: add a7, a7, s5
+; RV32-NEXT: addi a7, a7, 1
+; RV32-NEXT: andi a7, a7, 1
+; RV32-NEXT: or t6, a1, t5
+; RV32-NEXT: or s4, t6, a7
+; RV32-NEXT: sub s2, s2, s6
+; RV32-NEXT: li t6, 0
+; RV32-NEXT: li s0, 0
+; RV32-NEXT: beqz s4, .LBB1_29
+; RV32-NEXT: .LBB1_26: # %udiv-do-while
+; RV32-NEXT: # =>This Inner Loop Header: Depth=1
+; RV32-NEXT: srli t2, s2, 31
+; RV32-NEXT: slli t4, s1, 1
+; RV32-NEXT: slli s2, s2, 1
+; RV32-NEXT: or s4, t4, t2
+; RV32-NEXT: andi t2, s3, 1
+; RV32-NEXT: or s2, s2, t2
+; RV32-NEXT: bne a2, s4, .LBB1_24
+; RV32-NEXT: # %bb.27: # in Loop: Header=BB1_26 Depth=1
+; RV32-NEXT: sltu t2, t1, s2
+; RV32-NEXT: j .LBB1_25
+; RV32-NEXT: .LBB1_28:
+; RV32-NEXT: li t2, 0
+; RV32-NEXT: li t4, 0
+; RV32-NEXT: .LBB1_29: # %udiv-loop-exit
+; RV32-NEXT: srli a2, a6, 31
+; RV32-NEXT: slli a3, t0, 1
+; RV32-NEXT: srli a4, t0, 31
+; RV32-NEXT: slli a6, a6, 1
+; RV32-NEXT: or a1, t3, a6
+; RV32-NEXT: or a2, t2, a2
+; RV32-NEXT: or a4, t4, a4
+; RV32-NEXT: or t0, a2, a3
+; RV32-NEXT: andi t3, a4, 1
+; RV32-NEXT: .LBB1_30: # %udiv-end
+; RV32-NEXT: andi a2, t3, 1
+; RV32-NEXT: sw a1, 0(a0)
+; RV32-NEXT: sw t0, 4(a0)
+; RV32-NEXT: sb a2, 8(a0)
+; RV32-NEXT: lw s0, 92(sp) # 4-byte Folded Reload
+; RV32-NEXT: lw s1, 88(sp) # 4-byte Folded Reload
+; RV32-NEXT: lw s2, 84(sp) # 4-byte Folded Reload
+; RV32-NEXT: lw s3, 80(sp) # 4-byte Folded Reload
+; RV32-NEXT: lw s4, 76(sp) # 4-byte Folded Reload
+; RV32-NEXT: lw s5, 72(sp) # 4-byte Folded Reload
+; RV32-NEXT: lw s6, 68(sp) # 4-byte Folded Reload
+; RV32-NEXT: addi sp, sp, 96
+; RV32-NEXT: ret
+;
+; RV64-LABEL: udiv_i65:
+; RV64: # %bb.0:
+; RV64-NEXT: addi sp, sp, -16
+; RV64-NEXT: sd ra, 8(sp) # 8-byte Folded Spill
+; RV64-NEXT: andi a1, a1, 1
+; RV64-NEXT: andi a3, a3, 1
+; RV64-NEXT: call __udivti3
+; RV64-NEXT: ld ra, 8(sp) # 8-byte Folded Reload
+; RV64-NEXT: addi sp, sp, 16
+; RV64-NEXT: ret
+ %res = udiv i65 %x, %y
+ ret i65 %res
+}
define i128 @udiv_i128(i128 %x, i128 %y) nounwind {
-; CHECK-LABEL: udiv_i128:
-; CHECK: call __udivti3
+; RV32-LABEL: udiv_i128:
+; RV32: # %bb.0: # %_udiv-special-cases
+; RV32-NEXT: addi sp, sp, -160
+; RV32-NEXT: sw ra, 156(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s0, 152(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s1, 148(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s2, 144(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s3, 140(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s4, 136(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s5, 132(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s6, 128(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s7, 124(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s8, 120(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s9, 116(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s10, 112(sp) # 4-byte Folded Spill
+; RV32-NEXT: sw s11, 108(sp) # 4-byte Folded Spill
+; RV32-NEXT: mv s7, a0
+; RV32-NEXT: lw s8, 0(a2)
+; RV32-NEXT: lw s9, 4(a2)
+; RV32-NEXT: lw s11, 8(a2)
+; RV32-NEXT: lw ra, 12(a2)
+; RV32-NEXT: lui t4, 349525
+; RV32-NEXT: addi t4, t4, 1365
+; RV32-NEXT: lui t3, 209715
+; RV32-NEXT: addi t3, t3, 819
+; RV32-NEXT: lui t2, 61681
+; RV32-NEXT: addi t2, t2, -241
+; RV32-NEXT: bnez s9, .LBB2_2
+; RV32-NEXT: # %bb.1: # %_udiv-special-cases
+; RV32-NEXT: srli a0, s8, 1
+; RV32-NEXT: or a0, s8, a0
+; RV32-NEXT: srli a3, a0, 2
+; RV32-NEXT: or a0, a0, a3
+; RV32-NEXT: srli a3, a0, 4
+; RV32-NEXT: or a0, a0, a3
+; RV32-NEXT: srli a3, a0, 8
+; RV32-NEXT: or a0, a0, a3
+; RV32-NEXT: srli a3, a0, 16
+; RV32-NEXT: or a0, a0, a3
+; RV32-NEXT: not a0, a0
+; RV32-NEXT: srli a3, a0, 1
+; RV32-NEXT: and a3, a3, t4
+; RV32-NEXT: sub a0, a0, a3
+; RV32-NEXT: and a3, a0, t3
+; RV32-NEXT: srli a0, a0, 2
+; RV32-NEXT: and a0, a0, t3
+; RV32-NEXT: add a0, a3, a0
+; RV32-NEXT: srli a3, a0, 4
+; RV32-NEXT: add a0, a0, a3
+; RV32-NEXT: and a0, a0, t2
+; RV32-NEXT: slli a3, a0, 8
+; RV32-NEXT: add a0, a0, a3
+; RV32-NEXT: slli a3, a0, 16
+; RV32-NEXT: add a0, a0, a3
+; RV32-NEXT: srli a0, a0, 24
+; RV32-NEXT: addi t6, a0, 32
+; RV32-NEXT: j .LBB2_3
+; RV32-NEXT: .LBB2_2:
+; RV32-NEXT: srli a0, s9, 1
+; RV32-NEXT: or a0, s9, a0
+; RV32-NEXT: srli a3, a0, 2
+; RV32-NEXT: or a0, a0, a3
+; RV32-NEXT: srli a3, a0, 4
+; RV32-NEXT: or a0, a0, a3
+; RV32-NEXT: srli a3, a0, 8
+; RV32-NEXT: or a0, a0, a3
+; RV32-NEXT: srli a3, a0, 16
+; RV32-NEXT: or a0, a0, a3
+; RV32-NEXT: not a0, a0
+; RV32-NEXT: srli a3, a0, 1
+; RV32-NEXT: and a3, a3, t4
+; RV32-NEXT: sub a0, a0, a3
+; RV32-NEXT: and a3, a0, t3
+; RV32-NEXT: srli a0, a0, 2
+; RV32-NEXT: and a0, a0, t3
+; RV32-NEXT: add a0, a3, a0
+; RV32-NEXT: srli a3, a0, 4
+; RV32-NEXT: add a0, a0, a3
+; RV32-NEXT: and a0, a0, t2
+; RV32-NEXT: slli a3, a0, 8
+; RV32-NEXT: add a0, a0, a3
+; RV32-NEXT: slli a3, a0, 16
+; RV32-NEXT: add a0, a0, a3
+; RV32-NEXT: srli t6, a0, 24
+; RV32-NEXT: .LBB2_3: # %_udiv-special-cases
+; RV32-NEXT: lw a6, 4(a1)
+; RV32-NEXT: or s0, s11, ra
+; RV32-NEXT: bnez ra, .LBB2_5
+; RV32-NEXT: # %bb.4: # %_udiv-special-cases
+; RV32-NEXT: srli a0, s11, 1
+; RV32-NEXT: or a0, s11, a0
+; RV32-NEXT: srli a3, a0, 2
+; RV32-NEXT: or a0, a0, a3
+; RV32-NEXT: srli a3, a0, 4
+; RV32-NEXT: or a0, a0, a3
+; RV32-NEXT: srli a3, a0, 8
+; RV32-NEXT: or a0, a0, a3
+; RV32-NEXT: srli a3, a0, 16
+; RV32-NEXT: or a0, a0, a3
+; RV32-NEXT: not a0, a0
+; RV32-NEXT: srli a3, a0, 1
+; RV32-NEXT: and a3, a3, t4
+; RV32-NEXT: sub a0, a0, a3
+; RV32-NEXT: and a3, a0, t3
+; RV32-NEXT: srli a0, a0, 2
+; RV32-NEXT: and a0, a0, t3
+; RV32-NEXT: add a0, a3, a0
+; RV32-NEXT: srli a3, a0, 4
+; RV32-NEXT: add a0, a0, a3
+; RV32-NEXT: and a0, a0, t2
+; RV32-NEXT: slli a3, a0, 8
+; RV32-NEXT: add a0, a0, a3
+; RV32-NEXT: slli a3, a0, 16
+; RV32-NEXT: add a0, a0, a3
+; RV32-NEXT: srli a0, a0, 24
+; RV32-NEXT: addi t5, a0, 32
+; RV32-NEXT: j .LBB2_6
+; RV32-NEXT:...
[truncated]
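The RV32 check lines follow the block labels `%_udiv-special-cases`, `%udiv-bb1`, `%udiv-preheader`, `%udiv-do-while`, `%udiv-loop-exit`, and `%udiv-end`. The sketch below is one reading of that control flow as a shift-subtract division, narrowed to 64-bit operands so it can be compiled and checked against native division. The function names are hypothetical, and the mapping of statements to labels is an interpretation of the assembly above rather than code taken from LLVM. The leading-zero helper uses the same bit-smear and popcount masks that the `lui`/`addi` pairs build (0x55555555, 0x33333333, 0x0F0F0F0F).

```cpp
#include <cassert>
#include <cstdint>

// Leading-zero count from a bit smear plus a popcount; the RV32 output above
// uses shifts and adds only, with no clz or mul instruction.
inline unsigned clz32(uint32_t x) {
  x |= x >> 1;
  x |= x >> 2;
  x |= x >> 4;
  x |= x >> 8;
  x |= x >> 16;                 // every bit below the highest set bit is now 1
  x = ~x;                       // the remaining 1 bits are the leading zeros
  x -= (x >> 1) & 0x55555555u;  // lui 349525 / addi 1365
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);  // lui 209715 / addi 819
  x = (x + (x >> 4)) & 0x0F0F0F0Fu;                  // lui 61681 / addi -241
  x += x << 8;                  // shift-and-add in place of a multiply
  x += x << 16;
  return x >> 24;
}

inline unsigned clz64(uint64_t x) {
  const uint32_t hi = static_cast<uint32_t>(x >> 32);
  return hi != 0 ? clz32(hi) : 32 + clz32(static_cast<uint32_t>(x));
}

// Hypothetical 64-bit stand-in for the expanded i65/i128 unsigned division.
inline uint64_t udivExpanded(uint64_t n, uint64_t d) {
  constexpr unsigned Bits = 64;
  // _udiv-special-cases: a zero operand, or a quotient known to be 0 or n.
  if (n == 0 || d == 0)
    return 0;  // udiv by zero is undefined in LLVM IR; this sketch returns 0
  const unsigned sr = clz64(d) - clz64(n);  // wraps to a large value when d has more bits than n
  if (sr > Bits - 1)
    return 0;
  if (sr == Bits - 1)
    return n;  // d == 1 and the top bit of n is set
  // udiv-bb1: align the dividend so the loop runs sr + 1 times.
  unsigned count = sr + 1;
  assert(count >= 1 && count < Bits);
  uint64_t q = n << (Bits - 1 - sr);
  // udiv-preheader: the high part of the dividend seeds the remainder.
  uint64_t r = n >> count;
  uint64_t carry = 0;
  // udiv-do-while: produce one quotient bit per iteration.
  do {
    r = (r << 1) | (q >> (Bits - 1));
    q = (q << 1) | carry;
    // All ones when the divisor fits; stands in for the slli/srai sign smear.
    const uint64_t mask = 0 - static_cast<uint64_t>(r >= d);
    carry = mask & 1;
    r -= d & mask;
  } while (--count != 0);
  // udiv-loop-exit / udiv-end: shift in the last quotient bit.
  return (q << 1) | carry;
}
```

For example, `udivExpanded(100, 7)` runs five loop iterations and returns 14. The i65 and i128 expansions in the test carry each value across several 32-bit registers, which is why the generated code is so much longer than this sketch.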
lenary approved these changes on Oct 16, 2025
LGTM