Skip to content

Conversation

@futog
Copy link
Contributor

@futog futog commented May 14, 2025

By adding FRM/FFLAGS as implicit defs, ReadFRM is not optimized out.

@llvmbot
Copy link
Member

llvmbot commented May 14, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Gergely Futo (futog)

Changes

ReadFRM should not be optimized out.


Full diff: https://github.com/llvm/llvm-project/pull/139864.diff

1 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVInstrInfo.td (+2)
diff --git a/llvm/lib/Target/RISCV/RISCVInstrInfo.td b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
index e9bdeb88e4ca8..88ee311b51543 100644
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -2006,7 +2006,9 @@ class SwapSysRegImm<SysReg SR, list<Register> Regs>
   let Defs = Regs;
 }
 
+let hasSideEffects = true in
 def ReadFRM : ReadSysReg<SysRegFRM, [FRM]>;
+
 let hasPostISelHook = 1 in {
 def WriteFRM : WriteSysReg<SysRegFRM, [FRM]>;
 def WriteFRMImm : WriteSysRegImm<SysRegFRM, [FRM]>;

@futog futog requested a review from topperc May 14, 2025 09:22
Copy link
Contributor

@wangpc-pp wangpc-pp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a case to show the miscompile?

@futog
Copy link
Contributor Author

futog commented May 14, 2025

Do you have a case to show the miscompile?

Sure, here you go: https://godbolt.org/z/Kx1qsTeGY

@wangpc-pp
Copy link
Contributor

Do you have a case to show the miscompile?

Sure, here you go: https://godbolt.org/z/Kx1qsTeGY

Then please add a .ll test and precommit it.

@futog futog force-pushed the futog/riscv-readfrm branch from 240ec09 to d2515dd Compare May 14, 2025 15:07
@topperc
Copy link
Collaborator

topperc commented May 14, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

@futog
Copy link
Contributor Author

futog commented May 15, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

No. As far as I understand, this is used in case of inline asm.

@tclin914
Copy link
Contributor

Should we mark frm not preserved across call?

@topperc
Copy link
Collaborator

topperc commented May 16, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

No. As far as I understand, this is used in case of inline asm.

It's used to add FRM as an implicit def to calls.

  // Add rounding control registers as implicit def for function call.
  if (II.isCall() && MF->getFunction().hasFnAttribute(Attribute::StrictFP)) {
    ArrayRef<MCPhysReg> RCRegs = TLI->getRoundingControlRegisters();
    for (MCPhysReg Reg : RCRegs)
      UsedRegs.push_back(Reg);
  } 

I've tested locally that it does fix your test case.

@futog
Copy link
Contributor Author

futog commented May 16, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

No. As far as I understand, this is used in case of inline asm.

It's used to add FRM as an implicit def to calls.

  // Add rounding control registers as implicit def for function call.
  if (II.isCall() && MF->getFunction().hasFnAttribute(Attribute::StrictFP)) {
    ArrayRef<MCPhysReg> RCRegs = TLI->getRoundingControlRegisters();
    for (MCPhysReg Reg : RCRegs)
      UsedRegs.push_back(Reg);
  } 

I've tested locally that it does fix your test case.

Oops I missed that if, sorry, but yes I also implemented RISCVTargetLowering::getRoundingControlRegisters() and machine-cse still reordered things.

@topperc
Copy link
Collaborator

topperc commented May 16, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

No. As far as I understand, this is used in case of inline asm.

It's used to add FRM as an implicit def to calls.

  // Add rounding control registers as implicit def for function call.
  if (II.isCall() && MF->getFunction().hasFnAttribute(Attribute::StrictFP)) {
    ArrayRef<MCPhysReg> RCRegs = TLI->getRoundingControlRegisters();
    for (MCPhysReg Reg : RCRegs)
      UsedRegs.push_back(Reg);
  } 

I've tested locally that it does fix your test case.

Oops I missed that if, sorry, but yes I also implemented RISCVTargetLowering::getRoundingControlRegisters() and machine-cse still reordered things.

Do you have a test case that shows machine-cse reordering things?

@futog
Copy link
Contributor Author

futog commented May 17, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

No. As far as I understand, this is used in case of inline asm.

It's used to add FRM as an implicit def to calls.

  // Add rounding control registers as implicit def for function call.
  if (II.isCall() && MF->getFunction().hasFnAttribute(Attribute::StrictFP)) {
    ArrayRef<MCPhysReg> RCRegs = TLI->getRoundingControlRegisters();
    for (MCPhysReg Reg : RCRegs)
      UsedRegs.push_back(Reg);
  } 

I've tested locally that it does fix your test case.

Oops I missed that if, sorry, but yes I also implemented RISCVTargetLowering::getRoundingControlRegisters() and machine-cse still reordered things.

Do you have a test case that shows machine-cse reordering things?

Sure, the test in the precommit have this (#139921).

This is the behavior without the patch:

# *** IR Dump After Verify generated machine code (machineverifier) ***:
# Machine code for function test_get_rounding_sideeffect: IsSSA, TracksLiveness

bb.0.entry:
  successors: %bb.1(0x30000000), %bb.2(0x50000000); %bb.1(37.50%), %bb.2(62.50%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %3:gpr = ADDI $x0, 1
  $x10 = COPY %3:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %5:gpr = ReadFRM implicit $frm
  %6:gpr = SLLIW killed %5:gpr, 2
  %7:gpr = LUI 66
  %8:gpr = ADDIW killed %7:gpr, 769
  %9:gpr = SRL killed %8:gpr, killed %6:gpr
  %10:gpr = ANDI killed %9:gpr, 7
  %11:gpr = COPY $x0
  %2:gpr = COPY %11:gpr
  BNE killed %10:gpr, $x0, %bb.2
  PseudoBR %bb.1

bb.1.if.end:
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %12:gpr = COPY $x0
  $x10 = COPY %12:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %14:gpr = ReadFRM implicit $frm
  %15:gpr = SLLIW killed %14:gpr, 2
  %16:gpr = LUI 66
  %17:gpr = ADDIW killed %16:gpr, 769
  %18:gpr = SRL killed %17:gpr, killed %15:gpr
  %19:gpr = ANDI killed %18:gpr, 7
  %20:gpr = ADDI killed %19:gpr, -1
  %0:gpr = SLTIU killed %20:gpr, 1

bb.2.return:
; predecessors: %bb.0, %bb.1

  %1:gpr = PHI %2:gpr, %bb.0, %0:gpr, %bb.1
  $x10 = COPY %1:gpr
  PseudoRET implicit $x10

# End machine code for function test_get_rounding_sideeffect.
# *** IR Dump After Machine Common Subexpression Elimination (machine-cse) ***:
# Machine code for function test_get_rounding_sideeffect: IsSSA, TracksLiveness

bb.0.entry:
  successors: %bb.1(0x30000000), %bb.2(0x50000000); %bb.1(37.50%), %bb.2(62.50%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %3:gpr = ADDI $x0, 1
  $x10 = COPY %3:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %5:gpr = ReadFRM implicit $frm
  %6:gpr = SLLIW %5:gpr, 2
  %7:gpr = LUI 66
  %8:gpr = ADDIW %7:gpr, 769
  %9:gpr = SRL %8:gpr, %6:gpr
  %10:gpr = ANDI %9:gpr, 7
  %11:gpr = COPY $x0
  %2:gpr = COPY %11:gpr
  BNE %10:gpr, $x0, %bb.2
  PseudoBR %bb.1

bb.1.if.end:
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %12:gpr = COPY $x0
  $x10 = COPY %12:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %20:gpr = ADDI %10:gpr, -1
  %0:gpr = SLTIU killed %20:gpr, 1

bb.2.return:
; predecessors: %bb.0, %bb.1

  %1:gpr = PHI %2:gpr, %bb.0, %0:gpr, %bb.1
  $x10 = COPY %1:gpr
  PseudoRET implicit $x10

# End machine code for function test_get_rounding_sideeffect.

@topperc
Copy link
Collaborator

topperc commented May 17, 2025

Does implementing RISCVTargetLowering::getRoundingControlRegisters() fix this?

No. As far as I understand, this is used in case of inline asm.

It's used to add FRM as an implicit def to calls.

  // Add rounding control registers as implicit def for function call.
  if (II.isCall() && MF->getFunction().hasFnAttribute(Attribute::StrictFP)) {
    ArrayRef<MCPhysReg> RCRegs = TLI->getRoundingControlRegisters();
    for (MCPhysReg Reg : RCRegs)
      UsedRegs.push_back(Reg);
  } 

I've tested locally that it does fix your test case.

Oops I missed that if, sorry, but yes I also implemented RISCVTargetLowering::getRoundingControlRegisters() and machine-cse still reordered things.

Do you have a test case that shows machine-cse reordering things?

Sure, the test in the precommit have this (#139921).

This is the behavior without the patch:

# *** IR Dump After Verify generated machine code (machineverifier) ***:
# Machine code for function test_get_rounding_sideeffect: IsSSA, TracksLiveness

bb.0.entry:
  successors: %bb.1(0x30000000), %bb.2(0x50000000); %bb.1(37.50%), %bb.2(62.50%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %3:gpr = ADDI $x0, 1
  $x10 = COPY %3:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %5:gpr = ReadFRM implicit $frm
  %6:gpr = SLLIW killed %5:gpr, 2
  %7:gpr = LUI 66
  %8:gpr = ADDIW killed %7:gpr, 769
  %9:gpr = SRL killed %8:gpr, killed %6:gpr
  %10:gpr = ANDI killed %9:gpr, 7
  %11:gpr = COPY $x0
  %2:gpr = COPY %11:gpr
  BNE killed %10:gpr, $x0, %bb.2
  PseudoBR %bb.1

bb.1.if.end:
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %12:gpr = COPY $x0
  $x10 = COPY %12:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %14:gpr = ReadFRM implicit $frm
  %15:gpr = SLLIW killed %14:gpr, 2
  %16:gpr = LUI 66
  %17:gpr = ADDIW killed %16:gpr, 769
  %18:gpr = SRL killed %17:gpr, killed %15:gpr
  %19:gpr = ANDI killed %18:gpr, 7
  %20:gpr = ADDI killed %19:gpr, -1
  %0:gpr = SLTIU killed %20:gpr, 1

bb.2.return:
; predecessors: %bb.0, %bb.1

  %1:gpr = PHI %2:gpr, %bb.0, %0:gpr, %bb.1
  $x10 = COPY %1:gpr
  PseudoRET implicit $x10

# End machine code for function test_get_rounding_sideeffect.
# *** IR Dump After Machine Common Subexpression Elimination (machine-cse) ***:
# Machine code for function test_get_rounding_sideeffect: IsSSA, TracksLiveness

bb.0.entry:
  successors: %bb.1(0x30000000), %bb.2(0x50000000); %bb.1(37.50%), %bb.2(62.50%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %3:gpr = ADDI $x0, 1
  $x10 = COPY %3:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %5:gpr = ReadFRM implicit $frm
  %6:gpr = SLLIW %5:gpr, 2
  %7:gpr = LUI 66
  %8:gpr = ADDIW %7:gpr, 769
  %9:gpr = SRL %8:gpr, %6:gpr
  %10:gpr = ANDI %9:gpr, 7
  %11:gpr = COPY $x0
  %2:gpr = COPY %11:gpr
  BNE %10:gpr, $x0, %bb.2
  PseudoBR %bb.1

bb.1.if.end:
; predecessors: %bb.0
  successors: %bb.2(0x80000000); %bb.2(100.00%)

  ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
  %12:gpr = COPY $x0
  $x10 = COPY %12:gpr
  PseudoCALL target-flags(riscv-call) @fesetround, <regmask $vlenb $x0 $x1 $x8 $x9 $x18 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $f8_f $f9_f $f18_f $f19_f $f20_f $f21_f $f22_f $f23_f $f24_f $f25_f $f26_f $f27_f $f8_h $f9_h $f18_h $f19_h $f20_h $f21_h and 40 more...>, implicit-def dead $x1, implicit $x10, implicit-def $x2, implicit-def $x10
  ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
  %20:gpr = ADDI %10:gpr, -1
  %0:gpr = SLTIU killed %20:gpr, 1

bb.2.return:
; predecessors: %bb.0, %bb.1

  %1:gpr = PHI %2:gpr, %bb.0, %0:gpr, %bb.1
  $x10 = COPY %1:gpr
  PseudoRET implicit $x10

# End machine code for function test_get_rounding_sideeffect.

The strictfp attribute is missing from the fesetround calls so the code I pasted from InstrEmitter.cpp that uses getRoundingControlRegisters doesn't fire.

@futog futog force-pushed the futog/riscv-readfrm branch from d2515dd to 73b1089 Compare May 17, 2025 07:57
@futog futog changed the title [RISCV] Add hasSideEffects = true to ReadFRM [RISCV] Implement RISCVTargetLowering::getRoundingControlRegisters May 17, 2025
@futog
Copy link
Contributor Author

futog commented May 17, 2025

updated the patch, thanks @topperc for your time and patience.

@futog futog force-pushed the futog/riscv-readfrm branch from 73b1089 to 3d808a0 Compare May 18, 2025 19:03
Copy link
Collaborator

@topperc topperc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

By adding FRM/FFLAGS as implicit defs, ReadFRM is not optimized out.
@futog futog force-pushed the futog/riscv-readfrm branch from 3d808a0 to bc2e640 Compare May 19, 2025 19:30
@futog futog merged commit e3b167c into llvm:main May 19, 2025
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants