Commit 1ba6abc

arsenm authored and tru committed
AMDGPU: Fix fast math log2 f32
Apparently afn doesn't allow you to drop the denormal handling
according to OpenCL conformance. This was hidden by losing the flags
during the library linking process. Fast log is still broken and needs
more work.

https://reviews.llvm.org/D157936

(cherry picked from commit e09b359)
1 parent 45d5dfb commit 1ba6abc

4 files changed: +317 -65 lines changed


llvm/docs/AMDGPUUsage.rst

Lines changed: 1 addition & 2 deletions
@@ -980,8 +980,7 @@ The AMDGPU backend implements the following LLVM IR intrinsics.
 half). Not implemented for double. Hardware provides
 1ULP accuracy for float, and 0.51ULP for half. Float
 instruction does not natively support denormal
-inputs. Backend will optimize out denormal scaling if
-marked with the :ref:`afn <fastmath_afn>` flag.
+inputs.
 
 :ref:`llvm.sqrt <int_sqrt>` Implemented for double, float and half (and vectors).
 
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp

Lines changed: 1 addition & 1 deletion
@@ -2528,7 +2528,7 @@ SDValue AMDGPUTargetLowering::getIsFinite(SelectionDAG &DAG, SDValue Src,
 std::pair<SDValue, SDValue>
 AMDGPUTargetLowering::getScaledLogInput(SelectionDAG &DAG, const SDLoc SL,
                                         SDValue Src, SDNodeFlags Flags) const {
-  if (allowApproxFunc(DAG, Flags) || !needsDenormHandlingF32(DAG, Src, Flags))
+  if (!needsDenormHandlingF32(DAG, Src, Flags))
     return {};
 
   MVT VT = MVT::f32;

llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp

Lines changed: 1 addition & 2 deletions
@@ -3037,8 +3037,7 @@ static bool needsDenormHandlingF32(const MachineFunction &MF, Register Src,
 std::pair<Register, Register>
 AMDGPULegalizerInfo::getScaledLogInput(MachineIRBuilder &B, Register Src,
                                        unsigned Flags) const {
-  if (allowApproxFunc(B.getMF(), Flags) ||
-      !needsDenormHandlingF32(B.getMF(), Src, Flags))
+  if (!needsDenormHandlingF32(B.getMF(), Src, Flags))
     return {};
 
   const LLT F32 = LLT::scalar(32);
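
For readers unfamiliar with why the scaling matters, here is a minimal standalone sketch of the idea, not the backend's code. It assumes a log2 primitive that treats denormal inputs as zero and a scale factor of 2^32. The names `flushingLog2` and `scaledLog2` are hypothetical and exist only for this illustration. The sketch relies on the identity log2(x * 2^32) = log2(x) + 32.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical model of a log2 primitive that treats denormal inputs as zero.
// Illustration only: this is not the hardware instruction or LLVM's lowering.
float flushingLog2(float x) {
  if (std::fpclassify(x) == FP_SUBNORMAL)
    x = std::copysign(0.0f, x); // flush the denormal to a signed zero
  return std::log2(x);
}

// Denormal-safe wrapper (assumed scale of 2^32): move tiny inputs into the
// normal range, take the log, then subtract the exponent that was added.
float scaledLog2(float x) {
  const float smallestNormal = std::numeric_limits<float>::min();
  const float scale = 4294967296.0f; // 2^32
  const bool needsScale = x < smallestNormal; // false for NaN inputs
  const float scaled = x * (needsScale ? scale : 1.0f);
  // Any denormal multiplied by 2^32 lands in the normal range.
  assert(std::fpclassify(scaled) != FP_SUBNORMAL);
  return flushingLog2(scaled) - (needsScale ? 32.0f : 0.0f);
}
```

Under these assumptions, `flushingLog2` returns negative infinity for the smallest positive denormal (2^-149), while `scaledLog2` returns approximately -149. Skipping the scaling step whenever `afn` is set would expose the first behavior, which the commit message says is not acceptable for OpenCL conformance.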
