Skip to content

Conversation

@Varnike
Copy link
Contributor

@Varnike Varnike commented Apr 24, 2025

Added support for FENV_ACCESS pragma on hard-float ARM platforms. Also changes were made to clang/test/Parser/pragma-fp-warn.c so that for thumbv7a only the soft-float-abi target case is checked.

@github-actions
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:ARM clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Apr 24, 2025
@llvmbot
Copy link
Member

llvmbot commented Apr 24, 2025

@llvm/pr-subscribers-llvm-globalisel
@llvm/pr-subscribers-backend-arm

@llvm/pr-subscribers-clang

Author: Erik Enikeev (Varnike)

Changes

Added support for FENV_ACCESS pragma on hard-float ARM platforms. Also changes were made to clang/test/Parser/pragma-fp-warn.c so that for thumbv7a only the soft-float-abi target case is checked.


Full diff: https://github.com/llvm/llvm-project/pull/137101.diff

2 Files Affected:

  • (modified) clang/lib/Basic/Targets/ARM.cpp (+2)
  • (modified) clang/test/Parser/pragma-fp-warn.c (+1-1)
diff --git a/clang/lib/Basic/Targets/ARM.cpp b/clang/lib/Basic/Targets/ARM.cpp
index ca2c1ffbb0eb7..e3ab6e9abf78a 100644
--- a/clang/lib/Basic/Targets/ARM.cpp
+++ b/clang/lib/Basic/Targets/ARM.cpp
@@ -363,6 +363,8 @@ ARMTargetInfo::ARMTargetInfo(const llvm::Triple &Triple,
                            : "\01mcount";
 
   SoftFloatABI = llvm::is_contained(Opts.FeaturesAsWritten, "+soft-float-abi");
+  if (!SoftFloatABI)
+    HasStrictFP = true;
 }
 
 StringRef ARMTargetInfo::getABI() const { return ABI; }
diff --git a/clang/test/Parser/pragma-fp-warn.c b/clang/test/Parser/pragma-fp-warn.c
index c52bd4e4805ab..f743cb87997dc 100644
--- a/clang/test/Parser/pragma-fp-warn.c
+++ b/clang/test/Parser/pragma-fp-warn.c
@@ -1,6 +1,6 @@
 
 // RUN: %clang_cc1 -triple wasm32 -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s
-// RUN: %clang_cc1 -triple thumbv7 -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s
+// RUN: %clang_cc1 -triple thumbv7 -fsyntax-only -target-feature +soft-float-abi  -Wno-unknown-pragmas -Wignored-pragmas -verify %s
 // RUN: %clang_cc1 -DEXPOK -triple aarch64 -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s
 // RUN: %clang_cc1 -DEXPOK -triple x86_64 -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s
 // RUN: %clang_cc1 -DEXPOK -triple systemz -fsyntax-only -Wno-unknown-pragmas -Wignored-pragmas -verify %s

@davemgreen
Copy link
Collaborator

I believe the backend would still need work to make sure this is supported, which has not been done yet. I was expecting it to fail more noisily, but it appears the strict nodes are lowered to generic nodes. That doesn't mean that strict-fp is supported by the Arm backend, as it might not handle the instructions correctly at the moment (moving instructions past one another, for example).

@davemgreen
Copy link
Collaborator

@john-brawn-arm did the work for AArch64, I'm not sure if he would have an idea how much work it would involve for the Arm backend.

@john-brawn-arm
Copy link
Collaborator

It's been several years since I looked at this, but from my notes what's needed before setting HasStrictFP=true is:

  • Instruction selection patterns for strict fp ops. This is probably just changing e.g. "fadd" to "any_fadd" as I expect for most or all of the instruction selection patterns work fine with strict fp.
  • Instructions that can raise fp exceptions need to have mayRaiseFPException=1. Without this transformations can move around or just remove instructions that raise fp exceptions.
  • We need proper modelling of reads/write of the FPSCR rounding control bits for controllable rounding modes to be handled correctly.

@john-brawn-arm
Copy link
Collaborator

Simple example where setting HasStrictFP=true without doing the above gives wrong results:

int fetestexcept( int excepts );
#define FE_DIVBYZERO 0x04

int fn(float x) {
#pragma STDC FENV_ACCESS ON
  x / 0;
  return fetestexcept(FE_DIVBYZERO);
}

With --target=aarch64-none-elf we get a divide instruction and so return FE_DIVBYZERO, with --target=arm-non-eabi we get no divide instruction and return 0.

@Varnike Varnike force-pushed the main branch 2 times, most recently from f57085e to 1b9b201 Compare June 9, 2025 16:08
@Varnike Varnike force-pushed the main branch 2 times, most recently from 0231231 to 6010639 Compare July 16, 2025 11:18
@Varnike
Copy link
Contributor Author

Varnike commented Jul 16, 2025

Hello!

Thank you for the comments and sorry for the delayed response.

I've now returned to this task and plan to work towards full support for strict fp on hard-float ARM targets.

Based on @john-brawn-arm's example and suggestions, I was able to add support for STRICT_FDIV. With such changes the example is compiled correctly (with divide instruction). Could you please review these changes and let me know if you spot any issues? Are there any additional cases that require special handling? Also, is there a test suite that would be sufficient to cover most strict fp usage scenarios?

If these changes are acceptable, I plan to proceed with support for the remaining strict fp ops.

@davemgreen
Copy link
Collaborator

Hi - I went looking at out internal tracker for what happened the last time we enabled strictfp for AArch64. This was the list of patches mentioned, there might have been some more either before these or some minor fixups.

94843ea7d7e5 [AArch64] Make machine combiner patterns preserve MIFlags
6f53960d6416 [AArch64] Adjust machine-combiner-reassociate.mir test
bca998ed3c9a [AArch64] Generate fcmps when appropriate for neon intrinsics
0d8092dd485a [AArch64] Fix legalization of v1f64 strict_fsetcc and strict_fsetccs
d4342efb6959 [AArch64] Add instruction selection for strict FP
9d68ed08178d [AArch64] Allow strict opcodes in fp->int->fp patterns
b670da798d35 [AArch64] Allow strict opcodes in indexed fmul and fma patterns
d916856bee11 [AArch64] Allow strict opcodes in faddp patterns
8e17c9613f36 [AArch64] Add some missing strict FP vector lowering
bbd7eac800e6 [AArch64] Remove an unused variable in my previous patch
12c1022679d4 [AArch64] Lowering and legalization of strict FP16
1b1466c34669 [AArch64] Adjust aarch64 constrained intrinsics tests and un-XFAIL
27a8735a444f [AArch64] Add mayRaiseFPException to appropriate instructions
88ac25b357aa [MachineCSE] Allow PRE of instructions that read physical registers
2d8c1597e51c [MIRVRegNamer] Avoid opcode hash collision
49510c50200c [AArch64] Mark all instructions that read/write FPCR as doing so
9e3264ab2021 [FPEnv] Enable strict fp for AArch64 in clang

We would probably need something similar for Arm - it is worth not underestimating the amount of work we would need to do, but if you are happy to try and push it through we would need to handle all the strict variants of the instructions and add mayRaiseFPException/FPCR to the relevant places.

@Varnike
Copy link
Contributor Author

Varnike commented Jul 18, 2025

Now I have an approximate volume of work and plan to try to do it. I have provided these edits rather as an example of correction for a particular case (not taking tests into account), by analogy with which other cases can be considered. Therefore, I would like to know how correct and complete they are.

@davemgreen
Copy link
Collaborator

@john-brawn-arm any ideas? It looks OK to me. (It would need to handle all instructions before HasStrictFP was added though).

@Varnike
Copy link
Contributor Author

Varnike commented Aug 1, 2025

Hi,

I added the processing of remaining strict fp ops for VFP and adjusted the tests. Could you please review these changes and tell what else needs to be done? I also have a question about whether it's necessary to change the instruction selection patterns for MVE and NEON (as was done for VFP), since, as far as I understand, they do not provide full ieee compliance on all ARM platforms.

Regarding HasStrictFP setting, I have left this commit for now; however, it can only be considered once the conditions you specified are met.

@Varnike
Copy link
Contributor Author

Varnike commented Aug 6, 2025

@john-brawn-arm, @davemgreen can you please take a look at the patch?

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it looks sensible. We will need strict-fp tests for the new operations now supported, and there is quite a bit going on in this patch. Neon and MVE sometimes don't implement ieee-fp exactly, I would need to remind myself of the details.

I would not expect the tests to change (like llvm/test/CodeGen/Thumb2/mve-fmas.ll) - strictfp should not make the normal case worse. Do you know what might be causing those to be different?

@Varnike
Copy link
Contributor Author

Varnike commented Aug 13, 2025

We will need strict-fp tests for the new operations now supported, and there is quite a bit going on in this patch.

Will it be enough to add strict fp tests similar to those made for AArch64?

Regarding the changes in some tests (e.g. Thumb2/mve-fmas.ll): they arise after updating patterns in ARMInstrVFP.td, which in turn reorders some instructions. For VFP tests, as far as I understand, such strict fp behavior can be expected, but I’m unsure to what extent the MVE tests should be affected.

@github-actions
Copy link

github-actions bot commented Aug 14, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@davemgreen
Copy link
Collaborator

Will it be enough to add strict fp tests similar to those made for AArch64?

Yeah I think so for the most part. There is the added complication of soft-fp but it should mostly be a case of testing all the strict-fp operations under a couple of configs to make sure everything works OK.
Note that there is talk of changing how the constrained-fp intrinsics are represented in the IR https://discourse.llvm.org/t/rfc-change-of-strict-fp-operation-representation-in-ir/85021, but my understanding is that they will work the same for codegen.

Regarding the changes in some tests (e.g. Thumb2/mve-fmas.ll): they arise after updating patterns in ARMInstrVFP.td, which in turn reorders some instructions. For VFP tests, as far as I understand, such strict fp behavior can be expected, but I’m unsure to what extent the MVE tests should be affected.

I think we need to teach it that the fpscr bits of nofpexcept instructions and fpscr_nzcv do not interact and can be treated separately for scheduling. In general we need to be careful not to make perf worse for all the people who do not use strict-fp. Maybe in this case it is caused by the vmrs APSR_nzcv, fpscr instructions?

…s. Also changes were made to clang/test/Parser/pragma-fp-warn.c so that for thumbv7a only the soft-float-abi target case is checked.
…mayRaiseFPException to appropriate instructions
@Varnike
Copy link
Contributor Author

Varnike commented Aug 27, 2025

  • Managed to remove the connection between fpscr bits of nofpexcept instructions and fpscr_nzcv, now the tests like Thumb2/mve-fmas.ll do not change.
  • Added strict fp tests.
  • Function calls marked as possibly changing FPSCR (by analogy with https://reviews.llvm.org/D143001).

@Varnike
Copy link
Contributor Author

Varnike commented Aug 28, 2025

Fix test failures reported by CI.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is progressing nicely. It's quite a large change and if there are any parts we can pull out and commit separately, that would help.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we really say this doesn't alias FPSCR? If we set fpscr_nzcv it will set bits of fpscr and vice-versa. The exception status bits (and the rounding controls) are separate from nzcv, but that might not be the only use of the register.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in general they should be aliases. However, this behavior will lead to the test changes discussed above (e.g. Thumb2/mve-fmas.ll): they occur because some of the instructions begin to change the FPSCR (that is, FPSCR_NZCV too because they are aliases), so during scheduling it is no longer possible to leave everything as is.

I think we need to teach it that the fpscr bits of nofpexcept instructions and fpscr_nzcv do not interact and can be treated separately for scheduling.

My understanding is that while they alias, this approach won’t work (please correct me if I'm wrong). To keep the existing tests unchanged, it might be necessary to change the patterns for most instructions which can raise fp exception and change FPSCR: instead of using any_* add individual patterns fot strict_*. This can lead to significant complications in writing patterns for fp.

Can you tell if you see a simpler way to solve this? Also how critical is the fact that some tests (e.g. Thumb2/mve-fmas.ll) may change?

@@ -0,0 +1,202 @@
; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference when they are needed, this was a list of strict-fp ops that we would need to make sure are tested. Hopefully a lot of them are simple expanded into libcalls.

llvm.experimental.constrained.acos
llvm.experimental.constrained.asin
llvm.experimental.constrained.atan2
llvm.experimental.constrained.atan
llvm.experimental.constrained.ceil
llvm.experimental.constrained.cosh
llvm.experimental.constrained.cos
llvm.experimental.constrained.exp2
llvm.experimental.constrained.exp
llvm.experimental.constrained.fadd
llvm.experimental.constrained.fcmp
llvm.experimental.constrained.fcmp
llvm.experimental.constrained.fdiv
llvm.experimental.constrained.floor
llvm.experimental.constrained.fma
llvm.experimental.constrained.fmuladd
llvm.experimental.constrained.fmul
llvm.experimental.constrained.fpext
llvm.experimental.constrained.fptosi
llvm.experimental.constrained.fptoui
llvm.experimental.constrained.fptrunc
llvm.experimental.constrained.frem
llvm.experimental.constrained.fsub
llvm.experimental.constrained.ldexp
llvm.experimental.constrained.llrint
llvm.experimental.constrained.llround
llvm.experimental.constrained.log10
llvm.experimental.constrained.log2
llvm.experimental.constrained.log
llvm.experimental.constrained.lrint
llvm.experimental.constrained.lround
llvm.experimental.constrained.maximum
llvm.experimental.constrained.maxnum
llvm.experimental.constrained.minimum
llvm.experimental.constrained.minnum
llvm.experimental.constrained.nearbyint
llvm.experimental.constrained.powi
llvm.experimental.constrained.pow
llvm.experimental.constrained.rint
llvm.experimental.constrained.roundeven
llvm.experimental.constrained.round
llvm.experimental.constrained.sinh
llvm.experimental.constrained.sin
llvm.experimental.constrained.sitofp
llvm.experimental.constrained.sqrt
llvm.experimental.constrained.tanh
llvm.experimental.constrained.tan
llvm.experimental.constrained.trunc
llvm.experimental.constrained.uitofp

There are double. float and fp16 types that are of interest, both soft-fp and hard. I hope the others are handled automatically. I'm not sure what happens with vectors, both neon and mve. The frontend intrinsics I think should be considered as being in the current fp environment, strict-fp and all. Some do not do denormal flushing or set flags though.

@Varnike
Copy link
Contributor Author

Varnike commented Sep 4, 2025

  • Added remaining tests for f16/32/64 intrinsic.
  • Fixed lowering for several f16 ops, which leaded to errors at selection stage (https://godbolt.org/z/G8E9Ksv4P).
  • The only untested intrinsics are maximum and minimum. At the moment they also lead to compilation fail (https://godbolt.org/z/cbdx5bj9v, same for llvm.maximum and minimum). Is there support for these intrinsics in the backend and should it be addressed within this patch?

I haven’t addressed FPSCR yet; before proceeding I'd like to hear your opinion on how should it be done.

@Varnike
Copy link
Contributor Author

Varnike commented Sep 15, 2025

@davemgreen @john-brawn-arm, ping.

@davemgreen
Copy link
Collaborator

I will try and find some time to take a look at this, I haven't managed to yet. Are there any smaller parts we can split away into their own reviews to start moving it forward?

@Varnike
Copy link
Contributor Author

Varnike commented Sep 16, 2025

Are there any smaller parts we can split away into their own reviews to start moving it forward?

Yes, I think it will be possible to divide the patch into several separate ones. However, they will depend on the patch with changes to intruction selection patterns (commit 9e5fbe0), which should be processed first. This patch affects changes in tests related to FPSCR (e.g. Thumb2/mve-fmas.ll). Therefore, at first it would be better to decide on the position regarding this issue, after which it will be possible to split the patch.

As for how to organize the patches after separation: should each individual patch have a separate PR?

@davemgreen
Copy link
Collaborator

Am I right that the bits of FPSCR that are read by the instruction are just the rounding mode? Can we add a new register for just those bits, like FPSCR_NZCV. FPSCR_RM would alias with FPSCR but not FPSCR_NZCV.

def FPSCR_RM   : ARMReg<3,  "fpscr_rm"> {
  let Aliases = [FPSCR];
}

I didn't try it past a few of the basic tests in llvm. The important part is that we don't hoist the instructions that use the rounding mode past an instructions that sets it. (And an instruction that sets FPSCR doesn't get moved past one that sets FPSCR_NZCV).

The alternative might be to try and make fpscr_rm and fpscr_nzcv SubRegs of fpscr, which might work but ran into other issues when I tried it.

As for how to organize the patches after separation: should each individual patch have a separate PR?

Ideally yeah, So long as they are individually testable. It is more difficult to review larger changes and get them to stick in llvm without causing issues and needing to be reverted.

@Varnike
Copy link
Contributor Author

Varnike commented Sep 25, 2025

Can we add a new register for just those bits, like FPSCR_NZCV. FPSCR_RM would alias with FPSCR but not FPSCR_NZCV.

Thanks for the comment! I took this approach and it seems that everything works with it.

I managed to split the patch into 4 parts that can be tested independently:

However, this split leaves three tests, added by analogy with AArch64, that I haven’t yet included in any patch. One of them (strict-fp-ops.ll) depends on both patches, which I still wanted to keep separate for review convenience. The other two (strict-fp-int-promote.ll, strictfp_f16_abi_promote.ll) are already supported correctly without my changes. Perhaps it would be better to add them after the other patches have been committed to LLVM? If this approach is not appropriate and the tests should be included in the patches above, please let me know and I will update them.

davemgreen added a commit to davemgreen/llvm-project that referenced this pull request Oct 2, 2025
As shown in llvm#137101, fp16 lrint are not handled correctly on Arm. This adds
soft-half promotion for them, reusing the function that promotes a value with
operands (and can handle strict fp once that is added).
davemgreen added a commit that referenced this pull request Oct 7, 2025
As shown in #137101, fp16 lrint are not handled correctly on Arm. This
adds soft-half promotion for them, reusing the function that promotes a
value with operands (and can handle strict fp once that is added).
davemgreen pushed a commit that referenced this pull request Oct 7, 2025
…l instructions that read/write fpscr rounding bits as doing so (#160698)

Added new register FPSCR_RM to correctly model interactions with
rounding mode control bits of fpscr and to avoid performance regressions
in normal non-strictfp case

This PR is part of the work on adding strict FP support in ARM, which
was previously discussed in #137101.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:ARM clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category llvm:globalisel

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants