
Conversation

@wangpc-pp
Contributor

@wangpc-pp wangpc-pp commented Nov 12, 2024

The results differ across platforms, so it is hard to determine a common
default value.

This patch adds tune info for the post-RA scheduling direction, so each
CPU can set its own preferred post-RA scheduling direction.
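The per-CPU selection described above can be sketched as a lookup with a fallback default. This is purely illustrative: the CPU names and the `Direction` enum below are hypothetical, not LLVM's actual TuneInfo fields (the real values would live in the RISC-V processor definitions).

```python
from enum import Enum

class Direction(Enum):
    TOPDOWN = "topdown"
    BOTTOMUP = "bottomup"
    BIDIRECTIONAL = "bidirectional"

# Hypothetical per-CPU tune table mapping a CPU to its preferred
# post-RA scheduling direction.
POSTRA_DIRECTION = {
    "example-inorder-core": Direction.TOPDOWN,
    "example-ooo-core": Direction.BIDIRECTIONAL,
}

def postra_direction(cpu: str) -> Direction:
    # Fall back to a common default when the CPU sets no preference.
    return POSTRA_DIRECTION.get(cpu, Direction.BIDIRECTIONAL)
```

The point of the design is that the default stays in one place while any CPU can override it, which avoids having to pick a single direction that is best on every microarchitecture.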

@llvmbot
Member

llvmbot commented Nov 12, 2024

@llvm/pr-subscribers-tablegen
@llvm/pr-subscribers-backend-risc-v

@llvm/pr-subscribers-llvm-globalisel

Author: Pengcheng Wang (wangpc-pp)

Changes

This helps improve scheduling results (more pipeline bubbles can be filled).

There are two commits in this PR:

  1. Enable PostRA scheduling by default. This is just for testing.
  2. Enable bidirectional scheduling.

Compared to topdown: 1011 files changed, 60562 insertions(+), 60682 deletions(-)
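The "bubbles" being filled can be seen in the test diffs below, where independent instructions are interleaved (e.g. the two `slli`/`srai` sign-extension pairs in `alu-roundtrip.ll`). A toy single-issue, in-order stall counter makes the effect concrete. The latencies here are hypothetical (2-cycle shifts, 1-cycle add), chosen only to illustrate the idea; real cycle counts depend on the target pipeline.

```python
def count_stalls(instrs, lat):
    """Count stall cycles for a single-issue, in-order pipeline.

    instrs: list of (opcode, dest_reg, [src_regs]); lat: opcode -> latency.
    """
    ready = {}   # register -> cycle its value becomes available
    cycle = 0    # earliest cycle the next instruction could issue
    stalls = 0
    for op, dst, srcs in instrs:
        issue = cycle
        for s in srcs:
            # Wait until every source operand is ready.
            issue = max(issue, ready.get(s, 0))
        stalls += issue - cycle
        ready[dst] = issue + lat[op]
        cycle = issue + 1
    return stalls

lat = {"slli": 2, "srai": 2, "add": 1}

# Back-to-back dependent shifts: each srai waits on the slli before it.
before = [("slli", "a0", ["a0"]), ("srai", "a0", ["a0"]),
          ("slli", "a1", ["a1"]), ("srai", "a1", ["a1"]),
          ("add", "a0", ["a0", "a1"])]

# Interleaved, as in the rescheduled output: independent work hides latency.
after = [("slli", "a0", ["a0"]), ("slli", "a1", ["a1"]),
         ("srai", "a1", ["a1"]), ("srai", "a0", ["a0"]),
         ("add", "a0", ["a0", "a1"])]
```

Under this model the original order stalls 3 cycles and the rescheduled order 2, which is the kind of improvement the diff statistics above are measuring in aggregate.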


Patch is 15.68 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/115864.diff

1029 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVSubtarget.cpp (+9)
  • (modified) llvm/lib/Target/RISCV/RISCVSubtarget.h (+7-1)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/fpr-gpr-copy-rv32.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/fpr-gpr-copy-rv64.ll (+8-1)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/jumptable.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb-zbkb.ll (+19-19)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv32zbb.ll (+33-33)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv32zbkb.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb-zbkb.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll (+56-56)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/shift.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/stacksave-stackrestore.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/vararg.ll (+362-362)
  • (modified) llvm/test/CodeGen/RISCV/abds-neg.ll (+214-222)
  • (modified) llvm/test/CodeGen/RISCV/abds.ll (+253-263)
  • (modified) llvm/test/CodeGen/RISCV/abdu-neg.ll (+170-170)
  • (modified) llvm/test/CodeGen/RISCV/abdu.ll (+157-157)
  • (modified) llvm/test/CodeGen/RISCV/add-before-shl.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/add-imm.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/addc-adde-sube-subc.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/addcarry.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/addimm-mulimm.ll (+34-34)
  • (modified) llvm/test/CodeGen/RISCV/aext-to-sext.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/alloca.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/alu16.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/alu32.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/alu64.ll (+25-27)
  • (modified) llvm/test/CodeGen/RISCV/alu8.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/and-add-lsr.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/and.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/arith-with-overflow.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/atomic-cmpxchg-branch-on-result.ll (+21-21)
  • (modified) llvm/test/CodeGen/RISCV/atomic-cmpxchg.ll (+464-482)
  • (modified) llvm/test/CodeGen/RISCV/atomic-rmw-discard.ll (+60-60)
  • (modified) llvm/test/CodeGen/RISCV/atomic-rmw-sub.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/atomic-rmw.ll (+3175-3175)
  • (modified) llvm/test/CodeGen/RISCV/atomic-signext.ll (+686-686)
  • (modified) llvm/test/CodeGen/RISCV/atomicrmw-cond-sub-clamp.ll (+226-226)
  • (modified) llvm/test/CodeGen/RISCV/atomicrmw-uinc-udec-wrap.ll (+242-242)
  • (modified) llvm/test/CodeGen/RISCV/avgceils.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/avgceilu.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/avgfloors.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/avgflooru.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/bf16-promote.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-arith.ll (+31-31)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-br-fcmp.ll (+26-26)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-convert.ll (+133-133)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-fcmp.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-frem.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-imm.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-mem.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-select-fcmp.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/bfloat.ll (+53-53)
  • (modified) llvm/test/CodeGen/RISCV/bitextract-mac.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/bittest.ll (+91-91)
  • (modified) llvm/test/CodeGen/RISCV/blockaddress.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/branch-on-zero.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/branch-relaxation.ll (+255-255)
  • (modified) llvm/test/CodeGen/RISCV/bswap-bitreverse.ll (+51-51)
  • (modified) llvm/test/CodeGen/RISCV/byval.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/callee-saved-fpr32s.ll (+397-397)
  • (modified) llvm/test/CodeGen/RISCV/callee-saved-fpr64s.ll (+234-234)
  • (modified) llvm/test/CodeGen/RISCV/callee-saved-gprs.ll (+709-709)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-half.ll (+90-90)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-common.ll (+34-34)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32-ilp32f-ilp32d-common.ll (+176-176)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32.ll (+32-32)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32d.ll (+24-24)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32e.ll (+319-319)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-ilp32f-ilp32d-common.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-lp64-lp64f-common.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-lp64-lp64f-lp64d-common.ll (+63-63)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-lp64.ll (+33-33)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-lp64e.ll (+35-35)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-rv32f-ilp32.ll (+14-15)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-rv32f-ilp32e.ll (+13-14)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-sext-zext.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-vector-float.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/calling-conv-vector-on-stack.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/calls.ll (+126-126)
  • (modified) llvm/test/CodeGen/RISCV/cm_mvas_mvsa.ll (+28-28)
  • (modified) llvm/test/CodeGen/RISCV/cmov-branch-opt.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/codemodel-lowering.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/compress-opt-select.ll (+32-32)
  • (modified) llvm/test/CodeGen/RISCV/condbinops.ll (+25-25)
  • (modified) llvm/test/CodeGen/RISCV/condops.ll (+278-284)
  • (modified) llvm/test/CodeGen/RISCV/convert-highly-predictable-select-to-branch.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/copyprop.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/copysign-casts.ll (+62-62)
  • (modified) llvm/test/CodeGen/RISCV/ctlz-cttz-ctpop.ll (+118-120)
  • (modified) llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll (+100-100)
  • (modified) llvm/test/CodeGen/RISCV/div-by-constant.ll (+53-53)
  • (modified) llvm/test/CodeGen/RISCV/div-pow2.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/div.ll (+58-58)
  • (modified) llvm/test/CodeGen/RISCV/double-arith-strict.ll (+120-120)
  • (modified) llvm/test/CodeGen/RISCV/double-arith.ll (+240-240)
  • (modified) llvm/test/CodeGen/RISCV/double-bitmanip-dagcombines.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/double-br-fcmp.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/double-calling-conv.ll (+33-33)
  • (modified) llvm/test/CodeGen/RISCV/double-convert-strict.ll (+22-22)
  • (modified) llvm/test/CodeGen/RISCV/double-convert.ll (+344-344)
  • (modified) llvm/test/CodeGen/RISCV/double-fcmp-strict.ll (+173-185)
  • (modified) llvm/test/CodeGen/RISCV/double-fcmp.ll (+80-80)
  • (modified) llvm/test/CodeGen/RISCV/double-imm.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/double-intrinsics-strict.ll (+45-45)
  • (modified) llvm/test/CodeGen/RISCV/double-intrinsics.ll (+55-55)
  • (modified) llvm/test/CodeGen/RISCV/double-maximum-minimum.ll (+19-19)
  • (modified) llvm/test/CodeGen/RISCV/double-mem.ll (+23-23)
  • (modified) llvm/test/CodeGen/RISCV/double-previous-failure.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/double-round-conv-sat.ll (+180-180)
  • (modified) llvm/test/CodeGen/RISCV/double-round-conv.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/double-select-fcmp.ll (+28-28)
  • (modified) llvm/test/CodeGen/RISCV/double-select-icmp.ll (+20-20)
  • (modified) llvm/test/CodeGen/RISCV/double-stack-spill-restore.ll (+20-20)
  • (modified) llvm/test/CodeGen/RISCV/double-zfa.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/double_reduct.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/early-clobber-tied-def-subreg-liveness.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/exception-pointer-register.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/fastcc-bf16.ll (+11-11)
  • (modified) llvm/test/CodeGen/RISCV/fastcc-float.ll (+11-11)
  • (modified) llvm/test/CodeGen/RISCV/fastcc-half.ll (+11-11)
  • (modified) llvm/test/CodeGen/RISCV/fastcc-int.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/fastcc-without-f-reg.ll (+621-621)
  • (modified) llvm/test/CodeGen/RISCV/fli-licm.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/float-arith-strict.ll (+70-70)
  • (modified) llvm/test/CodeGen/RISCV/float-arith.ll (+142-142)
  • (modified) llvm/test/CodeGen/RISCV/float-bit-preserving-dagcombines.ll (+66-66)
  • (modified) llvm/test/CodeGen/RISCV/float-bitmanip-dagcombines.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/float-br-fcmp.ll (+20-20)
  • (modified) llvm/test/CodeGen/RISCV/float-convert-strict.ll (+22-22)
  • (modified) llvm/test/CodeGen/RISCV/float-convert.ll (+274-274)
  • (modified) llvm/test/CodeGen/RISCV/float-fcmp-strict.ll (+131-137)
  • (modified) llvm/test/CodeGen/RISCV/float-fcmp.ll (+62-62)
  • (modified) llvm/test/CodeGen/RISCV/float-intrinsics-strict.ll (+28-28)
  • (modified) llvm/test/CodeGen/RISCV/float-intrinsics.ll (+70-70)
  • (modified) llvm/test/CodeGen/RISCV/float-maximum-minimum.ll (+32-32)
  • (modified) llvm/test/CodeGen/RISCV/float-mem.ll (+17-17)
  • (modified) llvm/test/CodeGen/RISCV/float-round-conv-sat.ll (+222-222)
  • (modified) llvm/test/CodeGen/RISCV/float-round-conv.ll (+40-40)
  • (modified) llvm/test/CodeGen/RISCV/float-select-fcmp.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/float-zfa.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/fmax-fmin.ll (+56-56)
  • (modified) llvm/test/CodeGen/RISCV/fold-addi-loadstore.ll (+90-90)
  • (modified) llvm/test/CodeGen/RISCV/fold-binop-into-select.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/forced-atomics.ll (+725-725)
  • (modified) llvm/test/CodeGen/RISCV/fp-fcanonicalize.ll (+327-327)
  • (modified) llvm/test/CodeGen/RISCV/fp128.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/fp16-promote.ll (+21-21)
  • (modified) llvm/test/CodeGen/RISCV/fpclamptosat.ll (+228-228)
  • (modified) llvm/test/CodeGen/RISCV/fpenv.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/frame-info.ll (+34-34)
  • (modified) llvm/test/CodeGen/RISCV/frame.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/frameaddr-returnaddr.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/get-setcc-result-type.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-nonzero.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/global-merge-minsize-smalldata-zero.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/global-merge-minsize.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/global-merge-offset.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/global-merge.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/half-arith-strict.ll (+34-34)
  • (modified) llvm/test/CodeGen/RISCV/half-arith.ll (+446-446)
  • (modified) llvm/test/CodeGen/RISCV/half-bitmanip-dagcombines.ll (+11-11)
  • (modified) llvm/test/CodeGen/RISCV/half-br-fcmp.ll (+68-68)
  • (modified) llvm/test/CodeGen/RISCV/half-convert-strict.ll (+40-40)
  • (modified) llvm/test/CodeGen/RISCV/half-convert.ll (+730-730)
  • (modified) llvm/test/CodeGen/RISCV/half-fcmp-strict.ll (+56-62)
  • (modified) llvm/test/CodeGen/RISCV/half-fcmp.ll (+74-74)
  • (modified) llvm/test/CodeGen/RISCV/half-frem.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/half-imm.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/half-intrinsics.ll (+322-322)
  • (modified) llvm/test/CodeGen/RISCV/half-maximum-minimum.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/half-mem.ll (+39-39)
  • (modified) llvm/test/CodeGen/RISCV/half-round-conv-sat.ll (+504-504)
  • (modified) llvm/test/CodeGen/RISCV/half-round-conv.ll (+225-225)
  • (modified) llvm/test/CodeGen/RISCV/half-select-fcmp.ll (+43-43)
  • (modified) llvm/test/CodeGen/RISCV/half-zfa.ll (+17-17)
  • (modified) llvm/test/CodeGen/RISCV/hoist-global-addr-base.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/i64-icmp.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/iabs.ll (+36-36)
  • (modified) llvm/test/CodeGen/RISCV/imm-cse.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/imm.ll (+104-104)
  • (modified) llvm/test/CodeGen/RISCV/inline-asm-d-constraint-f.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/inline-asm-d-modifier-N.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/inline-asm-f-constraint-f.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/inline-asm-f-modifier-N.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/inline-asm-zdinx-constraint-r.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/inline-asm-zfh-constraint-f.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/inline-asm-zfh-modifier-N.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/inline-asm-zfinx-constraint-r.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/inline-asm-zhinx-constraint-r.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/inline-asm.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/interrupt-attr-callee.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/interrupt-attr-nocall.ll (+345-345)
  • (modified) llvm/test/CodeGen/RISCV/interrupt-attr.ll (+2698-2698)
  • (modified) llvm/test/CodeGen/RISCV/intrinsic-cttz-elts-vscale.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/intrinsic-cttz-elts.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/jumptable-swguarded.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/jumptable.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/lack-of-signed-truncation-check.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/large-stack.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/legalize-fneg.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/libcall-tail-calls.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/llvm.exp10.ll (+190-190)
  • (modified) llvm/test/CodeGen/RISCV/llvm.frexp.ll (+374-382)
  • (modified) llvm/test/CodeGen/RISCV/local-stack-slot-allocation.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/loop-strength-reduce-add-cheaper-than-mul.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/loop-strength-reduce-loop-invar.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/lpad.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/lsr-legaladdimm.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/machine-combiner-strategies.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/machine-combiner.ll (+78-78)
  • (modified) llvm/test/CodeGen/RISCV/machine-cse.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/machine-outliner-and-machine-copy-propagation.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/machine-outliner-throw.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/machine-sink-load-immediate.ll (+15-15)
  • (modified) llvm/test/CodeGen/RISCV/machinelicm-address-pseudos.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/machinelicm-constant-phys-reg.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/macro-fusion-lui-addi.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/mem.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/mem64.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/memcmp-optsize.ll (+127-127)
  • (modified) llvm/test/CodeGen/RISCV/memcmp.ll (+143-143)
  • (modified) llvm/test/CodeGen/RISCV/memcpy.ll (+38-38)
  • (modified) llvm/test/CodeGen/RISCV/memset-inline.ll (+656-656)
  • (modified) llvm/test/CodeGen/RISCV/min-max.ll (+11-11)
  • (modified) llvm/test/CodeGen/RISCV/miss-sp-restore-eh.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/mul.ll (+108-108)
  • (modified) llvm/test/CodeGen/RISCV/naked-fn-with-frame-pointer.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/narrow-shl-cst.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/neg-abs.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/nomerge.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/nontemporal.ll (+610-610)
  • (modified) llvm/test/CodeGen/RISCV/or-is-add.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/orc-b-patterns.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/out-of-reach-emergency-slot.mir (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/overflow-intrinsics.ll (+93-98)
  • (modified) llvm/test/CodeGen/RISCV/pr51206.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/pr56110.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/pr56457.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/pr58511.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/pr63816.ll (+31-31)
  • (modified) llvm/test/CodeGen/RISCV/pr64645.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/pr65025.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/pr68855.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/pr69586.ll (+95-95)
  • (modified) llvm/test/CodeGen/RISCV/pr84200.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/pr84653_pr85190.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/pr90652.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/pr94145.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/pr95271.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/pr95284.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/pr96366.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/push-pop-popret.ll (+894-894)
  • (modified) llvm/test/CodeGen/RISCV/reduce-unnecessary-extension.ll (+11-11)
  • (modified) llvm/test/CodeGen/RISCV/redundant-copy-from-tail-duplicate.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/regalloc-last-chance-recoloring-failure.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rem.ll (+34-34)
  • (modified) llvm/test/CodeGen/RISCV/remat.ll (+50-50)
  • (modified) llvm/test/CodeGen/RISCV/repeated-fp-divisors.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/riscv-codegenprepare-asm.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/riscv-shifted-extend.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/rotl-rotr.ll (+270-270)
  • (modified) llvm/test/CodeGen/RISCV/rv32e.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rv32i-rv64i-float-double.ll (+12-12)
  • (modified) llvm/test/CodeGen/RISCV/rv32i-rv64i-half.ll (+14-14)
  • (modified) llvm/test/CodeGen/RISCV/rv32xtheadba.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rv32xtheadbb.ll (+28-29)
  • (modified) llvm/test/CodeGen/RISCV/rv32xtheadbs.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rv32zba.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rv32zbb-zbkb.ll (+21-24)
  • (modified) llvm/test/CodeGen/RISCV/rv32zbb.ll (+70-70)
  • (modified) llvm/test/CodeGen/RISCV/rv32zbkb.ll (+11-11)
  • (modified) llvm/test/CodeGen/RISCV/rv32zbs.ll (+19-19)
  • (modified) llvm/test/CodeGen/RISCV/rv64-double-convert.ll (+57-57)
  • (modified) llvm/test/CodeGen/RISCV/rv64-float-convert.ll (+55-55)
  • (modified) llvm/test/CodeGen/RISCV/rv64-half-convert.ll (+52-52)
  • (modified) llvm/test/CodeGen/RISCV/rv64-patchpoint.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rv64-statepoint-call-lowering.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rv64-trampoline.ll (+16-16)
  • (modified) llvm/test/CodeGen/RISCV/rv64e.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rv64f-float-convert.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rv64i-complex-float.ll (+10-10)
  • (modified) llvm/test/CodeGen/RISCV/rv64i-demanded-bits.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv64i-shift-sext.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv64i-w-insts-legalization.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv64xtheadbb.ll (+33-33)
  • (modified) llvm/test/CodeGen/RISCV/rv64zba.ll (+43-43)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbb-zbkb.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbb.ll (+50-50)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbc-intrinsic.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbc-zbkc-intrinsic.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbkb.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/rv64zfh-half-convert.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rv64zfhmin-half-convert.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv-cfi-info.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/65704-illegal-instruction.ll (+9-9)
  • (modified) llvm/test/CodeGen/RISCV/rvv/abs-vp.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rvv/access-fixed-objects-by-rvv.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/active_lane_mask.ll (+7-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/alloca-load-store-scalable-array.ll (+1-1)
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
index e7db1ededf383b..3fb756b0fab170 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.cpp
@@ -16,6 +16,7 @@
 #include "RISCV.h"
 #include "RISCVFrameLowering.h"
 #include "RISCVTargetMachine.h"
+#include "llvm/CodeGen/MachineScheduler.h"
 #include "llvm/CodeGen/MacroFusion.h"
 #include "llvm/CodeGen/ScheduleDAGMutation.h"
 #include "llvm/MC/TargetRegistry.h"
@@ -199,3 +200,11 @@ unsigned RISCVSubtarget::getMinimumJumpTableEntries() const {
              ? RISCVMinimumJumpTableEntries
              : TuneInfo->MinimumJumpTableEntries;
 }
+
+void RISCVSubtarget::overridePostRASchedPolicy(MachineSchedPolicy &Policy,
+                                               unsigned NumRegionInstrs) const {
+  // Do bidirectional scheduling since it produces a more balanced schedule,
+  // leading to better performance. Note that this increases compile time.
+  Policy.OnlyTopDown = false;
+  Policy.OnlyBottomUp = false;
+}
diff --git a/llvm/lib/Target/RISCV/RISCVSubtarget.h b/llvm/lib/Target/RISCV/RISCVSubtarget.h
index f59a3737ae76f9..5d1d64f5694243 100644
--- a/llvm/lib/Target/RISCV/RISCVSubtarget.h
+++ b/llvm/lib/Target/RISCV/RISCVSubtarget.h
@@ -124,7 +124,10 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
   }
   bool enableMachineScheduler() const override { return true; }
 
-  bool enablePostRAScheduler() const override { return UsePostRAScheduler; }
+  bool enablePostRAScheduler() const override {
+    // FIXME: Just for tests, will revert this change when landing.
+    return true;
+  }
 
   Align getPrefFunctionAlignment() const {
     return Align(TuneInfo->PrefFunctionAlignment);
@@ -327,6 +330,9 @@ class RISCVSubtarget : public RISCVGenSubtargetInfo {
   unsigned getTailDupAggressiveThreshold() const {
     return TuneInfo->TailDupAggressiveThreshold;
   }
+
+  void overridePostRASchedPolicy(MachineSchedPolicy &Policy,
+                                 unsigned NumRegionInstrs) const override;
 };
 } // End llvm namespace
 
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll b/llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll
index 330f8b16065f13..45eb3478eef739 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/alu-roundtrip.ll
@@ -25,18 +25,18 @@ define i32 @add_i8_signext_i32(i8 %a, i8 %b) {
 ; RV32IM-LABEL: add_i8_signext_i32:
 ; RV32IM:       # %bb.0: # %entry
 ; RV32IM-NEXT:    slli a0, a0, 24
-; RV32IM-NEXT:    srai a0, a0, 24
 ; RV32IM-NEXT:    slli a1, a1, 24
 ; RV32IM-NEXT:    srai a1, a1, 24
+; RV32IM-NEXT:    srai a0, a0, 24
 ; RV32IM-NEXT:    add a0, a0, a1
 ; RV32IM-NEXT:    ret
 ;
 ; RV64IM-LABEL: add_i8_signext_i32:
 ; RV64IM:       # %bb.0: # %entry
 ; RV64IM-NEXT:    slli a0, a0, 56
-; RV64IM-NEXT:    srai a0, a0, 56
 ; RV64IM-NEXT:    slli a1, a1, 56
 ; RV64IM-NEXT:    srai a1, a1, 56
+; RV64IM-NEXT:    srai a0, a0, 56
 ; RV64IM-NEXT:    add a0, a0, a1
 ; RV64IM-NEXT:    ret
 entry:
@@ -49,15 +49,15 @@ entry:
 define i32 @add_i8_zeroext_i32(i8 %a, i8 %b) {
 ; RV32IM-LABEL: add_i8_zeroext_i32:
 ; RV32IM:       # %bb.0: # %entry
-; RV32IM-NEXT:    andi a0, a0, 255
 ; RV32IM-NEXT:    andi a1, a1, 255
+; RV32IM-NEXT:    andi a0, a0, 255
 ; RV32IM-NEXT:    add a0, a0, a1
 ; RV32IM-NEXT:    ret
 ;
 ; RV64IM-LABEL: add_i8_zeroext_i32:
 ; RV64IM:       # %bb.0: # %entry
-; RV64IM-NEXT:    andi a0, a0, 255
 ; RV64IM-NEXT:    andi a1, a1, 255
+; RV64IM-NEXT:    andi a0, a0, 255
 ; RV64IM-NEXT:    add a0, a0, a1
 ; RV64IM-NEXT:    ret
 entry:
@@ -404,8 +404,8 @@ define i64 @add_i64(i64 %a, i64 %b) {
 ; RV32IM-LABEL: add_i64:
 ; RV32IM:       # %bb.0: # %entry
 ; RV32IM-NEXT:    add a0, a0, a2
-; RV32IM-NEXT:    sltu a2, a0, a2
 ; RV32IM-NEXT:    add a1, a1, a3
+; RV32IM-NEXT:    sltu a2, a0, a2
 ; RV32IM-NEXT:    add a1, a1, a2
 ; RV32IM-NEXT:    ret
 ;
@@ -439,8 +439,8 @@ define i64 @sub_i64(i64 %a, i64 %b) {
 ; RV32IM-LABEL: sub_i64:
 ; RV32IM:       # %bb.0: # %entry
 ; RV32IM-NEXT:    sub a4, a0, a2
-; RV32IM-NEXT:    sltu a0, a0, a2
 ; RV32IM-NEXT:    sub a1, a1, a3
+; RV32IM-NEXT:    sltu a0, a0, a2
 ; RV32IM-NEXT:    sub a1, a1, a0
 ; RV32IM-NEXT:    mv a0, a4
 ; RV32IM-NEXT:    ret
@@ -460,8 +460,8 @@ define i64 @subi_i64(i64 %a) {
 ; RV32IM-NEXT:    lui a2, 1048275
 ; RV32IM-NEXT:    addi a2, a2, -1548
 ; RV32IM-NEXT:    add a0, a0, a2
-; RV32IM-NEXT:    sltu a2, a0, a2
 ; RV32IM-NEXT:    addi a1, a1, -1
+; RV32IM-NEXT:    sltu a2, a0, a2
 ; RV32IM-NEXT:    add a1, a1, a2
 ; RV32IM-NEXT:    ret
 ;
@@ -480,8 +480,8 @@ define i64 @neg_i64(i64 %a) {
 ; RV32IM-LABEL: neg_i64:
 ; RV32IM:       # %bb.0: # %entry
 ; RV32IM-NEXT:    neg a2, a0
-; RV32IM-NEXT:    snez a0, a0
 ; RV32IM-NEXT:    neg a1, a1
+; RV32IM-NEXT:    snez a0, a0
 ; RV32IM-NEXT:    sub a1, a1, a0
 ; RV32IM-NEXT:    mv a0, a2
 ; RV32IM-NEXT:    ret
@@ -500,8 +500,8 @@ entry:
 define i64 @and_i64(i64 %a, i64 %b) {
 ; RV32IM-LABEL: and_i64:
 ; RV32IM:       # %bb.0: # %entry
-; RV32IM-NEXT:    and a0, a0, a2
 ; RV32IM-NEXT:    and a1, a1, a3
+; RV32IM-NEXT:    and a0, a0, a2
 ; RV32IM-NEXT:    ret
 ;
 ; RV64IM-LABEL: and_i64:
@@ -516,8 +516,8 @@ entry:
 define i64 @andi_i64(i64 %a) {
 ; RV32IM-LABEL: andi_i64:
 ; RV32IM:       # %bb.0: # %entry
-; RV32IM-NEXT:    andi a0, a0, 1234
 ; RV32IM-NEXT:    li a1, 0
+; RV32IM-NEXT:    andi a0, a0, 1234
 ; RV32IM-NEXT:    ret
 ;
 ; RV64IM-LABEL: andi_i64:
@@ -532,8 +532,8 @@ entry:
 define i64 @or_i64(i64 %a, i64 %b) {
 ; RV32IM-LABEL: or_i64:
 ; RV32IM:       # %bb.0: # %entry
-; RV32IM-NEXT:    or a0, a0, a2
 ; RV32IM-NEXT:    or a1, a1, a3
+; RV32IM-NEXT:    or a0, a0, a2
 ; RV32IM-NEXT:    ret
 ;
 ; RV64IM-LABEL: or_i64:
@@ -563,8 +563,8 @@ entry:
 define i64 @xor_i64(i64 %a, i64 %b) {
 ; RV32IM-LABEL: xor_i64:
 ; RV32IM:       # %bb.0: # %entry
-; RV32IM-NEXT:    xor a0, a0, a2
 ; RV32IM-NEXT:    xor a1, a1, a3
+; RV32IM-NEXT:    xor a0, a0, a2
 ; RV32IM-NEXT:    ret
 ;
 ; RV64IM-LABEL: xor_i64:
@@ -597,8 +597,8 @@ define i64 @mul_i64(i64 %a, i64 %b) {
 ; RV32IM-NEXT:    mul a4, a0, a2
 ; RV32IM-NEXT:    mul a1, a1, a2
 ; RV32IM-NEXT:    mul a3, a0, a3
-; RV32IM-NEXT:    mulhu a0, a0, a2
 ; RV32IM-NEXT:    add a1, a1, a3
+; RV32IM-NEXT:    mulhu a0, a0, a2
 ; RV32IM-NEXT:    add a1, a1, a0
 ; RV32IM-NEXT:    mv a0, a4
 ; RV32IM-NEXT:    ret
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll b/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll
index f33ba1d7a302ef..acd32cff21cad3 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/bitmanip.ll
@@ -6,18 +6,18 @@ define i2 @bitreverse_i2(i2 %x) {
 ; RV32-LABEL: bitreverse_i2:
 ; RV32:       # %bb.0:
 ; RV32-NEXT:    slli a1, a0, 1
-; RV32-NEXT:    andi a1, a1, 2
 ; RV32-NEXT:    andi a0, a0, 3
 ; RV32-NEXT:    srli a0, a0, 1
+; RV32-NEXT:    andi a1, a1, 2
 ; RV32-NEXT:    or a0, a1, a0
 ; RV32-NEXT:    ret
 ;
 ; RV64-LABEL: bitreverse_i2:
 ; RV64:       # %bb.0:
 ; RV64-NEXT:    slli a1, a0, 1
-; RV64-NEXT:    andi a1, a1, 2
 ; RV64-NEXT:    andi a0, a0, 3
 ; RV64-NEXT:    srli a0, a0, 1
+; RV64-NEXT:    andi a1, a1, 2
 ; RV64-NEXT:    or a0, a1, a0
 ; RV64-NEXT:    ret
   %rev = call i2 @llvm.bitreverse.i2(i2 %x)
@@ -31,8 +31,8 @@ define i3 @bitreverse_i3(i3 %x) {
 ; RV32-NEXT:    andi a1, a1, 4
 ; RV32-NEXT:    andi a0, a0, 7
 ; RV32-NEXT:    andi a2, a0, 2
-; RV32-NEXT:    or a1, a1, a2
 ; RV32-NEXT:    srli a0, a0, 2
+; RV32-NEXT:    or a1, a1, a2
 ; RV32-NEXT:    or a0, a1, a0
 ; RV32-NEXT:    ret
 ;
@@ -42,8 +42,8 @@ define i3 @bitreverse_i3(i3 %x) {
 ; RV64-NEXT:    andi a1, a1, 4
 ; RV64-NEXT:    andi a0, a0, 7
 ; RV64-NEXT:    andi a2, a0, 2
-; RV64-NEXT:    or a1, a1, a2
 ; RV64-NEXT:    srli a0, a0, 2
+; RV64-NEXT:    or a1, a1, a2
 ; RV64-NEXT:    or a0, a1, a0
 ; RV64-NEXT:    ret
   %rev = call i3 @llvm.bitreverse.i3(i3 %x)
@@ -61,8 +61,8 @@ define i4 @bitreverse_i4(i4 %x) {
 ; RV32-NEXT:    andi a0, a0, 15
 ; RV32-NEXT:    srli a2, a0, 1
 ; RV32-NEXT:    andi a2, a2, 2
-; RV32-NEXT:    or a1, a1, a2
 ; RV32-NEXT:    srli a0, a0, 3
+; RV32-NEXT:    or a1, a1, a2
 ; RV32-NEXT:    or a0, a1, a0
 ; RV32-NEXT:    ret
 ;
@@ -76,8 +76,8 @@ define i4 @bitreverse_i4(i4 %x) {
 ; RV64-NEXT:    andi a0, a0, 15
 ; RV64-NEXT:    srli a2, a0, 1
 ; RV64-NEXT:    andi a2, a2, 2
-; RV64-NEXT:    or a1, a1, a2
 ; RV64-NEXT:    srli a0, a0, 3
+; RV64-NEXT:    or a1, a1, a2
 ; RV64-NEXT:    or a0, a1, a0
 ; RV64-NEXT:    ret
   %rev = call i4 @llvm.bitreverse.i4(i4 %x)
@@ -103,8 +103,8 @@ define i7 @bitreverse_i7(i7 %x) {
 ; RV32-NEXT:    srli a3, a0, 4
 ; RV32-NEXT:    andi a3, a3, 2
 ; RV32-NEXT:    or a2, a2, a3
-; RV32-NEXT:    or a1, a1, a2
 ; RV32-NEXT:    srli a0, a0, 6
+; RV32-NEXT:    or a1, a1, a2
 ; RV32-NEXT:    or a0, a1, a0
 ; RV32-NEXT:    ret
 ;
@@ -126,8 +126,8 @@ define i7 @bitreverse_i7(i7 %x) {
 ; RV64-NEXT:    srli a3, a0, 4
 ; RV64-NEXT:    andi a3, a3, 2
 ; RV64-NEXT:    or a2, a2, a3
-; RV64-NEXT:    or a1, a1, a2
 ; RV64-NEXT:    srli a0, a0, 6
+; RV64-NEXT:    or a1, a1, a2
 ; RV64-NEXT:    or a0, a1, a0
 ; RV64-NEXT:    ret
   %rev = call i7 @llvm.bitreverse.i7(i7 %x)
@@ -163,9 +163,9 @@ define i24 @bitreverse_i24(i24 %x) {
 ; RV32-NEXT:    addi a1, a1, -1366
 ; RV32-NEXT:    and a2, a1, a2
 ; RV32-NEXT:    and a2, a0, a2
-; RV32-NEXT:    srli a2, a2, 1
 ; RV32-NEXT:    slli a0, a0, 1
 ; RV32-NEXT:    and a0, a0, a1
+; RV32-NEXT:    srli a2, a2, 1
 ; RV32-NEXT:    or a0, a2, a0
 ; RV32-NEXT:    ret
 ;
@@ -197,9 +197,9 @@ define i24 @bitreverse_i24(i24 %x) {
 ; RV64-NEXT:    addiw a1, a1, -1366
 ; RV64-NEXT:    and a2, a1, a2
 ; RV64-NEXT:    and a2, a0, a2
-; RV64-NEXT:    srli a2, a2, 1
 ; RV64-NEXT:    slli a0, a0, 1
 ; RV64-NEXT:    and a0, a0, a1
+; RV64-NEXT:    srli a2, a2, 1
 ; RV64-NEXT:    or a0, a2, a0
 ; RV64-NEXT:    ret
   %rev = call i24 @llvm.bitreverse.i24(i24 %x)
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
index 70d1b25309c844..9bea20efb3eccd 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv32.ll
@@ -46,10 +46,10 @@ define void @constant_fold_barrier_i128(ptr %p) {
 ; RV32-NEXT:    or a1, a4, a1
 ; RV32-NEXT:    add a5, a5, zero
 ; RV32-NEXT:    add a1, a5, a1
-; RV32-NEXT:    sw a2, 0(a0)
-; RV32-NEXT:    sw a6, 4(a0)
-; RV32-NEXT:    sw a3, 8(a0)
 ; RV32-NEXT:    sw a1, 12(a0)
+; RV32-NEXT:    sw a3, 8(a0)
+; RV32-NEXT:    sw a6, 4(a0)
+; RV32-NEXT:    sw a2, 0(a0)
 ; RV32-NEXT:    ret
 entry:
   %x = load i128, ptr %p
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
index 51e8b6da39d099..be4ade025b413f 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/constbarrier-rv64.ll
@@ -25,8 +25,8 @@ define i128 @constant_fold_barrier_i128(i128 %x) {
 ; RV64-NEXT:    and a0, a0, a2
 ; RV64-NEXT:    and a1, a1, zero
 ; RV64-NEXT:    add a0, a0, a2
-; RV64-NEXT:    sltu a2, a0, a2
 ; RV64-NEXT:    add a1, a1, zero
+; RV64-NEXT:    sltu a2, a0, a2
 ; RV64-NEXT:    add a1, a1, a2
 ; RV64-NEXT:    ret
 entry:
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll b/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
index a4f92640697bc7..33ac5cc5a07443 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
@@ -43,8 +43,8 @@ define i32 @fcvt_wu_d(double %a) nounwind {
 define i32 @fcvt_wu_d_multiple_use(double %x, ptr %y) nounwind {
 ; RV32IFD-LABEL: fcvt_wu_d_multiple_use:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    fcvt.wu.d a1, fa0, rtz
 ; RV32IFD-NEXT:    li a0, 1
+; RV32IFD-NEXT:    fcvt.wu.d a1, fa0, rtz
 ; RV32IFD-NEXT:    beqz a1, .LBB4_2
 ; RV32IFD-NEXT:  # %bb.1:
 ; RV32IFD-NEXT:    mv a0, a1
@@ -156,8 +156,8 @@ define i64 @fmv_x_d(double %a, double %b) nounwind {
 ; RV32IFD-NEXT:    addi sp, sp, -16
 ; RV32IFD-NEXT:    fadd.d fa5, fa0, fa1
 ; RV32IFD-NEXT:    fsd fa5, 8(sp)
-; RV32IFD-NEXT:    lw a0, 8(sp)
 ; RV32IFD-NEXT:    lw a1, 12(sp)
+; RV32IFD-NEXT:    lw a0, 8(sp)
 ; RV32IFD-NEXT:    addi sp, sp, 16
 ; RV32IFD-NEXT:    ret
 ;
@@ -214,8 +214,8 @@ define double @fmv_d_x(i64 %a, i64 %b) nounwind {
 ; RV32IFD-NEXT:    sw a0, 8(sp)
 ; RV32IFD-NEXT:    sw a1, 12(sp)
 ; RV32IFD-NEXT:    fld fa5, 8(sp)
-; RV32IFD-NEXT:    sw a2, 8(sp)
 ; RV32IFD-NEXT:    sw a3, 12(sp)
+; RV32IFD-NEXT:    sw a2, 8(sp)
 ; RV32IFD-NEXT:    fld fa4, 8(sp)
 ; RV32IFD-NEXT:    fadd.d fa0, fa5, fa4
 ; RV32IFD-NEXT:    addi sp, sp, 16
@@ -223,8 +223,8 @@ define double @fmv_d_x(i64 %a, i64 %b) nounwind {
 ;
 ; RV64IFD-LABEL: fmv_d_x:
 ; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    fmv.d.x fa5, a0
 ; RV64IFD-NEXT:    fmv.d.x fa4, a1
+; RV64IFD-NEXT:    fmv.d.x fa5, a0
 ; RV64IFD-NEXT:    fadd.d fa0, fa5, fa4
 ; RV64IFD-NEXT:    ret
   %1 = bitcast i64 %a to double
@@ -330,17 +330,17 @@ define signext i16 @fcvt_w_s_i16(double %a) nounwind {
 define zeroext i16 @fcvt_wu_s_i16(double %a) nounwind {
 ; RV32IFD-LABEL: fcvt_wu_s_i16:
 ; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    fcvt.wu.d a0, fa0, rtz
 ; RV32IFD-NEXT:    lui a1, 16
 ; RV32IFD-NEXT:    addi a1, a1, -1
+; RV32IFD-NEXT:    fcvt.wu.d a0, fa0, rtz
 ; RV32IFD-NEXT:    and a0, a0, a1
 ; RV32IFD-NEXT:    ret
 ;
 ; RV64IFD-LABEL: fcvt_wu_s_i16:
 ; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    fcvt.wu.d a0, fa0, rtz
 ; RV64IFD-NEXT:    lui a1, 16
 ; RV64IFD-NEXT:    addiw a1, a1, -1
+; RV64IFD-NEXT:    fcvt.wu.d a0, fa0, rtz
 ; RV64IFD-NEXT:    and a0, a0, a1
 ; RV64IFD-NEXT:    ret
   %1 = fptoui double %a to i16
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll b/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
index 7e96d529af36ff..6ccef58d488108 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
@@ -27,8 +27,8 @@ define i32 @fcvt_wu_s(float %a) nounwind {
 define i32 @fcvt_wu_s_multiple_use(float %x, ptr %y) nounwind {
 ; RV32IF-LABEL: fcvt_wu_s_multiple_use:
 ; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    fcvt.wu.s a1, fa0, rtz
 ; RV32IF-NEXT:    li a0, 1
+; RV32IF-NEXT:    fcvt.wu.s a1, fa0, rtz
 ; RV32IF-NEXT:    beqz a1, .LBB2_2
 ; RV32IF-NEXT:  # %bb.1:
 ; RV32IF-NEXT:    mv a0, a1
@@ -120,8 +120,8 @@ define float @fcvt_s_wu_load(ptr %p) nounwind {
 define float @fmv_w_x(i32 %a, i32 %b) nounwind {
 ; CHECKIF-LABEL: fmv_w_x:
 ; CHECKIF:       # %bb.0:
-; CHECKIF-NEXT:    fmv.w.x fa5, a0
 ; CHECKIF-NEXT:    fmv.w.x fa4, a1
+; CHECKIF-NEXT:    fmv.w.x fa5, a0
 ; CHECKIF-NEXT:    fadd.s fa0, fa5, fa4
 ; CHECKIF-NEXT:    ret
 ; Ensure fmv.w.x is generated even for a soft float calling convention
@@ -302,17 +302,17 @@ define signext i16 @fcvt_w_s_i16(float %a) nounwind {
 define zeroext i16 @fcvt_wu_s_i16(float %a) nounwind {
 ; RV32IF-LABEL: fcvt_wu_s_i16:
 ; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    fcvt.wu.s a0, fa0, rtz
 ; RV32IF-NEXT:    lui a1, 16
 ; RV32IF-NEXT:    addi a1, a1, -1
+; RV32IF-NEXT:    fcvt.wu.s a0, fa0, rtz
 ; RV32IF-NEXT:    and a0, a0, a1
 ; RV32IF-NEXT:    ret
 ;
 ; RV64IF-LABEL: fcvt_wu_s_i16:
 ; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    fcvt.wu.s a0, fa0, rtz
 ; RV64IF-NEXT:    lui a1, 16
 ; RV64IF-NEXT:    addiw a1, a1, -1
+; RV64IF-NEXT:    fcvt.wu.s a0, fa0, rtz
 ; RV64IF-NEXT:    and a0, a0, a1
 ; RV64IF-NEXT:    ret
   %1 = fptoui float %a to i16
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/fpr-gpr-copy-rv32.ll b/llvm/test/CodeGen/RISCV/GlobalISel/fpr-gpr-copy-rv32.ll
index 1757e5550f81ae..250e8edafa836f 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/fpr-gpr-copy-rv32.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/fpr-gpr-copy-rv32.ll
@@ -9,8 +9,8 @@
 define float @fadd(float %x, float %y) {
 ; RV32I-LABEL: fadd:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    fmv.w.x fa5, a0
 ; RV32I-NEXT:    fmv.w.x fa4, a1
+; RV32I-NEXT:    fmv.w.x fa5, a0
 ; RV32I-NEXT:    fadd.s fa5, fa5, fa4
 ; RV32I-NEXT:    fmv.x.w a0, fa5
 ; RV32I-NEXT:    ret
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/fpr-gpr-copy-rv64.ll b/llvm/test/CodeGen/RISCV/GlobalISel/fpr-gpr-copy-rv64.ll
index 287bbbad6d52d7..717ecac7300b1b 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/fpr-gpr-copy-rv64.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/fpr-gpr-copy-rv64.ll
@@ -9,8 +9,8 @@
 define double @fadd_f64(double %x, double %y) {
 ; RV64I-LABEL: fadd_f64:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    fmv.d.x fa5, a0
 ; RV64I-NEXT:    fmv.d.x fa4, a1
+; RV64I-NEXT:    fmv.d.x fa5, a0
 ; RV64I-NEXT:    fadd.d fa5, fa5, fa4
 ; RV64I-NEXT:    fmv.x.d a0, fa5
 ; RV64I-NEXT:    ret
@@ -30,6 +30,13 @@ define float @fadd_f32(float %x, float %y) {
 ; RV32I-NEXT:    fadd.d fa5, fa5, fa4
 ; RV32I-NEXT:    fmv.x.d a0, fa5
 ; RV32I-NEXT:    ret
+; RV64I-LABEL: fadd_f32:
+; RV64I:       # %bb.0:
+; RV64I-NEXT:    fmv.w.x fa4, a1
+; RV64I-NEXT:    fmv.w.x fa5, a0
+; RV64I-NEXT:    fadd.s fa5, fa5, fa4
+; RV64I-NEXT:    fmv.x.w a0, fa5
+; RV64I-NEXT:    ret
   %a = fadd float %x, %y
   ret float %a
 }
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll b/llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll
index 05989c310541b8..82540a3976f357 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/iabs.ll
@@ -120,8 +120,8 @@ define i64 @abs64(i64 %x) {
 ; RV32I-NEXT:    sltu a3, a0, a2
 ; RV32I-NEXT:    add a1, a1, a2
 ; RV32I-NEXT:    add a1, a1, a3
-; RV32I-NEXT:    xor a0, a0, a2
 ; RV32I-NEXT:    xor a1, a1, a2
+; RV32I-NEXT:    xor a0, a0, a2
 ; RV32I-NEXT:    ret
 ;
 ; RV32ZBB-LABEL: abs64:
@@ -131,8 +131,8 @@ define i64 @abs64(i64 %x) {
 ; RV32ZBB-NEXT:    sltu a3, a0, a2
 ; RV32ZBB-NEXT:    add a1, a1, a2
 ; RV32ZBB-NEXT:    add a1, a1, a3
-; RV32ZBB-NEXT:    xor a0, a0, a2
 ; RV32ZBB-NEXT:    xor a1, a1, a2
+; RV32ZBB-NEXT:    xor a0, a0, a2
 ; RV32ZBB-NEXT:    ret
 ;
 ; RV64I-LABEL: abs64:
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/jumptable.ll b/llvm/test/CodeGen/RISCV/GlobalISel/jumptable.ll
index 9dda1a241e042b..018c135cc8626c 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/jumptable.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/jumptable.ll
@@ -15,13 +15,13 @@
 define void @above_threshold(i32 signext %in, ptr %out) nounwind {
 ; RV32I-SMALL-LABEL: above_threshold:
 ; RV32I-SMALL:       # %bb.0: # %entry
-; RV32I-SMALL-NEXT:    li a2, 5
 ; RV32I-SMALL-NEXT:    addi a0, a0, -1
+; RV32I-SMALL-NEXT:    li a2, 5
 ; RV32I-SMALL-NEXT:    bltu a2, a0, .LBB0_9
 ; RV32I-SMALL-NEXT:  # %bb.1: # %entry
 ; RV32I-SMALL-NEXT:    lui a2, %hi(.LJTI0_0)
-; RV32I-SMALL-NEXT:    addi a2, a2, %lo(.LJTI0_0)
 ; RV32I-SMALL-NEXT:    slli a0, a0, 2
+; RV32I-SMALL-NEXT:    addi a2, a2, %lo(.LJTI0_0)
 ; RV32I-SMALL-NEXT:    add a0, a2, a0
 ; RV32I-SMALL-NEXT:    lw a0, 0(a0)
 ; RV32I-SMALL-NEXT:    jr a0
@@ -49,14 +49,14 @@ define void @above_threshold(i32 signext %in, ptr %out) nounwind {
 ;
 ; RV32I-MEDIUM-LABEL: above_threshold:
 ; RV32I-MEDIUM:       # %bb.0: # %entry
-; RV32I-MEDIUM-NEXT:    li a2, 5
 ; RV32I-MEDIUM-NEXT:    addi a0, a0, -1
+; RV32I-MEDIUM-NEXT:    li a2, 5
 ; RV32I-MEDIUM-NEXT:    bltu a2, a0, .LBB0_9
 ; RV32I-MEDIUM-NEXT:  # %bb.1: # %entry
 ; RV32I-MEDIUM-NEXT:  .Lpcrel_hi0:
 ; RV32I-MEDIUM-NEXT:    auipc a2, %pcrel_hi(.LJTI0_0)
-; RV32I-MEDIUM-NEXT:    addi a2, a2, %pcrel_lo(.Lpcrel_hi0)
 ; RV32I-MEDIUM-NEXT:    slli a0, a0, 2
+; RV32I-MEDIUM-NEXT:    addi a2, a2, %pcrel_lo(.Lpcrel_hi0)
 ; RV32I-MEDIUM-NEXT:    add a0, a2, a0
 ; RV32I-MEDIUM-NEXT:    lw a0, 0(a0)
 ; RV32I-MEDIUM-NEXT:    jr a0
@@ -84,14 +84,14 @@ define void @above_threshold(i32 signext %in, ptr %out) nounwind {
 ;
 ; RV32I-PIC-LABEL: above_threshold:
 ; RV32I-PIC:       # %bb.0: # %entry
-; RV32I-PIC-NEXT:    li a2, 5
 ; RV32I-PIC-NEXT:    addi a0, a0, -1
+; RV32I-PIC-NEXT:    li a2, 5
 ; RV32I-PIC-NEXT:    bltu a2, a0, .LBB0_9
 ; RV32I-PIC-NEXT:  # %bb.1: # %entry
 ; RV32I-PIC-NEXT:  .Lpcrel_hi0:
 ; RV32I-PIC-NEXT:    auipc a2, %pcrel_hi(.LJTI0_0)
-; RV32I-PIC-NEXT:    addi a2, a2, %pcrel_lo(.Lpcrel_hi0)
 ; RV32I-PIC-NEXT:    slli a0, a0, 2
+; RV32I-PIC-NEXT:    addi a2, a2, %pcrel_lo(.Lpcrel_hi0)
 ; RV32I-PIC-NEXT:    add a0, a2, a0
 ; RV32I-PIC-NEXT:    lw a0, 0(a0)
 ; RV32I-PIC-NEXT:    add a0, a0, a2
@@ -120,13 +120,13 @@ define void @above_threshold(i32 signext %in, ptr %out) nounwind {
 ;
 ; RV64I-SMALL-LABEL: above_threshold:
 ; RV64...
[truncated]

Collaborator

@preames preames left a comment
At least for me, the diff here is too large to meaningfully review. I think we need an alternative form of justification that this change is profitable. Can you present either some performance numbers or, at a minimum, some statistic that demonstrates value?

For context, I'm not particularly skeptical of the patch - it seems to make sense - the diffs are just way too big to be meaningfully skimmed.

@michaelmaitland
Contributor

How are we evaluating this change? On spills? On impact to dynamic IC? On impact to runtime on real hardware?

@wangpc-pp
Contributor Author

wangpc-pp commented Nov 13, 2024

> How are we evaluating this change? On spills? On impact to dynamic IC? On impact to runtime on real hardware?

IIUC, PostRA scheduling basically won't impact spills or dynamic instruction count. I will show some cycle/IPC numbers based on GEM5.
I do agree this should be tunable per CPU; I will make it a tuning feature later, but we still need a default setting (topdown, bottomup, or bidirectional; this can be debated).

@wangpc-pp
Contributor Author

I got some results on different platforms.

1. Coremark with 60000 iterations on GEM5: `-O3 -march=rv64gc` (this configuration gives more instructions to schedule, and GEM5 supports it better)

   - RiscvO3CPU

     |        | Baseline    | TopDown     | BottomUp    | Bidirectional |
     |--------|-------------|-------------|-------------|---------------|
     | Cycles | 33140248616 | 33031782179 | 33173323700 | 33063777692   |
     | IPC    | 1.381871    | 1.386409    | 1.380822    | 1.383095      |

   - RiscvAtomicSimpleCPU

     |        | Baseline    | TopDown     | BottomUp    | Bidirectional |
     |--------|-------------|-------------|-------------|---------------|
     | Cycles | 30563561357 | 30935303916 | 30898754397 | 30563777692   |
     | IPC    | 0.879774    | 0.869409    | 0.870437    | 0.879768      |

2. Coremark on Spacemit-X60 (`-O3 -mcpu=spacemit-x60`):

   |                | Baseline | TopDown | BottomUp | Bidirectional |
   |----------------|----------|---------|----------|---------------|
   | Iterations/Sec | 4763.71  | 4993.34 | 4730.50  | 4853.86       |

The results differ on different platforms, so it is really hard to determine a common default value. I will make it a target feature and leave the default value as Bidirectional (this can be debated).
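As a quick sanity check on the RiscvO3CPU numbers above (an illustrative script, not part of this patch): post-RA scheduling only reorders existing instructions, so the dynamic instruction count recovered as cycles × IPC should come out nearly identical for all four directions.

```python
# Back-compute dynamic instruction counts from the reported GEM5
# RiscvO3CPU results (cycles, IPC). Post-RA scheduling only reorders
# instructions, so these counts should agree to within measurement noise.
results = {
    "Baseline":      (33_140_248_616, 1.381871),
    "TopDown":       (33_031_782_179, 1.386409),
    "BottomUp":      (33_173_323_700, 1.380822),
    "Bidirectional": (33_063_777_692, 1.383095),
}

insts = {name: cycles * ipc for name, (cycles, ipc) in results.items()}
spread = (max(insts.values()) - min(insts.values())) / min(insts.values())

for name, count in insts.items():
    print(f"{name:13s} ~{count:.4e} dynamic instructions")
print(f"relative spread: {spread:.3%}")
```

Running this gives a spread well under 1%, consistent with the point that the scheduling direction changes only the cycle count, not the length of the instruction stream.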

@mshockwave
Member

> The results differ on different platforms, so it is really hard to determine a common default value. I will make it a target feature and leave the default value as Bidirectional (this can be debated).

I have no objection to making it into a tuning feature.

@wangpc-pp wangpc-pp force-pushed the main-riscv-override-postra-sched-policy branch from 4237baf to ae40c24 Compare November 19, 2024 05:21
@wangpc-pp wangpc-pp changed the title [RISCV] Enable bidirectional postra scheduling [RISCV] Add tune info for postra scheduling direction Nov 19, 2024
@topperc
Collaborator

topperc commented Nov 19, 2024

Let me make sure I understand. BottomUp and bi-directional RA scheduling were added last year by @michaelmaitland in 9106b58. I don't think we ended up enabling it inside SiFive. Have any other targets adopted it yet?

@wangpc-pp
Contributor Author

wangpc-pp commented Nov 19, 2024

> Let me make sure I understand. BottomUp and bi-directional RA scheduling were added last year by @michaelmaitland in 9106b58. I don't think we ended up enabling it inside SiFive. Have any other targets adopted it yet?

Not yet. And I just found some issues when enabling post-RA bidirectional scheduling: #116592 and #116584.
Alternatively, we can set it to topdown by default: almost all the data so far show it gives the best results, and it was the previous default direction.

@wangpc-pp wangpc-pp force-pushed the main-riscv-override-postra-sched-policy branch from ae40c24 to c3841ef Compare December 10, 2024 07:53
@topperc
Collaborator

topperc commented Dec 10, 2024


Let's go with top down by default since that's what other targets use.

The results differ on different platforms so it is really hard to
determine a common default value.

Tune info for postra scheduling direction is added and CPUs can
set their own preferable postra scheduling direction.
@wangpc-pp wangpc-pp force-pushed the main-riscv-override-postra-sched-policy branch from c3841ef to 4989eb0 Compare December 13, 2024 04:24
@wangpc-pp
Contributor Author

> Let's go with top down by default since that's what other targets use.

Done.

Contributor

@michaelmaitland michaelmaitland left a comment
LGTM

Collaborator

@topperc topperc left a comment

LGTM

@wangpc-pp wangpc-pp merged commit 9571d20 into llvm:main Dec 16, 2024
8 checks passed
@wangpc-pp wangpc-pp deleted the main-riscv-override-postra-sched-policy branch December 16, 2024 04:19