-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[compiler-rt][ARM] Optimized mulsf3 and divsf3 #161546
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 8 commits
Commits
Show all changes
10 commits
Select commit
Hold shift + click to select a range
6e3b7ab
[compiler-rt][ARM] Optimized mulsf3 and divsf3
statham-arm 8c7228f
Fix the Thumb1 build which I forgot to test
statham-arm 4edb28b
Use DEFINE_COMPILERRT_THUMB_FUNCTION in Thumb1
statham-arm c73dfea
Tweak final return in fnan2 as suggested
statham-arm b236372
Clarify comment about 4-byte boundary
statham-arm 7a24535
Lowercase instruction mnemonics and shifter operands
statham-arm 40e3621
Mention size/speed tradeoff in the cmake option help
statham-arm 66a3bcb
Update build and test setup
statham-arm 33232d8
Check for the right spelling of the implicit-it option
statham-arm ecfa7fc
Fix build failure on every non-Arm platform
statham-arm File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,42 @@ | ||
| //===-- fnan2.c - Handle single-precision NaN inputs to binary operation --===// | ||
| // | ||
| // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
| // See https://llvm.org/LICENSE.txt for license information. | ||
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
| // | ||
| // This helper function is available for use by single-precision float | ||
| // arithmetic implementations to handle propagating NaNs from the input | ||
| // operands to the output, in a way that matches Arm hardware FP. | ||
| // | ||
| // On input, a and b are floating-point numbers in IEEE 754 encoding, and at | ||
| // least one of them must be a NaN. The return value is the correct output NaN. | ||
| // | ||
| // A signalling NaN in the input (with bit 22 clear) takes priority over any | ||
| // quiet NaN, and is adjusted on return by setting bit 22 to make it quiet. If | ||
| // both inputs are the same type of NaN then the first input takes priority: | ||
| // the input a is used instead of b. | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
|
|
||
| #include <stdint.h> | ||
|
|
||
| uint32_t __compiler_rt_fnan2(uint32_t a, uint32_t b) { | ||
| // Make shifted-left copies of a and b to discard the sign bit. Then add 1 at | ||
| // the bit position where the quiet vs signalling bit ended up. This squashes | ||
| // all the signalling NaNs to the top of the range of 32-bit values, from | ||
| // 0xff800001 to 0xffffffff inclusive; meanwhile, all the quiet NaN values | ||
| // wrap round to the bottom, from 0 to 0x007fffff inclusive. So we can detect | ||
| // a signalling NaN by asking if it's greater than 0xff800000, and a quiet | ||
| // one by asking if it's less than 0x00800000. | ||
| uint32_t aadj = (a << 1) + 0x00800000; | ||
| uint32_t badj = (b << 1) + 0x00800000; | ||
| if (aadj > 0xff800000) // a is a signalling NaN? | ||
| return a | 0x00400000; // if so, return it with the quiet bit set | ||
| if (badj > 0xff800000) // b is a signalling NaN? | ||
| return b | 0x00400000; // if so, return it with the quiet bit set | ||
| if (aadj < 0x00800000) // a is a quiet NaN? | ||
| return a; // if so, return it | ||
| return b; // otherwise we expect b must be a quiet NaN | ||
| } |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,62 @@ | ||
| //===-- fnorm2.c - Handle single-precision denormal inputs to binary op ---===// | ||
| // | ||
| // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
| // See https://llvm.org/LICENSE.txt for license information. | ||
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
| // | ||
| // This helper function is available for use by single-precision float | ||
| // arithmetic implementations, to handle denormal inputs on entry by | ||
| // renormalizing the mantissa and modifying the exponent to match. | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
|
|
||
| #include <stdint.h> | ||
|
|
||
| // Structure containing the function's inputs and outputs. | ||
| // | ||
| // On entry: a, b are two input floating-point numbers, still in IEEE 754 | ||
| // encoding. expa and expb are the 8-bit exponents of those numbers, extracted | ||
| // and shifted down to the low 8 bits of the word, with no other change. | ||
| // Neither value should be zero, or have the maximum exponent (indicating an | ||
| // infinity or NaN). | ||
| // | ||
| // On exit: each of a and b contains the mantissa of the input value, with the | ||
| // leading 1 bit made explicit, and shifted up to the top of the word. If expa | ||
| // was zero (indicating that a was denormal) then it is now represented as a | ||
| // normalized number with an out-of-range exponent (zero or negative). The same | ||
| // applies to expb and b. | ||
| struct fnorm2 { | ||
| uint32_t a, b, expa, expb; | ||
| }; | ||
|
|
||
| void __compiler_rt_fnorm2(struct fnorm2 *values) { | ||
| // Shift the mantissas of a and b to the right place to follow a leading 1 in | ||
| // the top bit, if there is one. | ||
| values->a <<= 8; | ||
| values->b <<= 8; | ||
|
|
||
| // Test if a is denormal. | ||
| if (values->expa == 0) { | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Future enhancement idea: extract the adjustment into a helper and share across the two values. |
||
| // If so, decide how much further up to shift its mantissa, and adjust its | ||
| // exponent to match. This brings the leading 1 of the denormal mantissa to | ||
| // the top of values->a. | ||
| uint32_t shift = __builtin_clz(values->a); | ||
| values->a <<= shift; | ||
| values->expa = 1 - shift; | ||
| } else { | ||
| // Otherwise, leave the mantissa of a in its current position, and OR in | ||
| // the explicit leading 1. | ||
| values->a |= 0x80000000; | ||
| } | ||
|
|
||
| // Do the same operation on b. | ||
| if (values->expb == 0) { | ||
| uint32_t shift = __builtin_clz(values->b); | ||
| values->b <<= shift; | ||
| values->expb = 1 - shift; | ||
| } else { | ||
| values->b |= 0x80000000; | ||
| } | ||
| } | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,78 @@ | ||
| //===-- funder.c - Handle single-precision floating-point underflow -------===// | ||
| // | ||
| // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. | ||
| // See https://llvm.org/LICENSE.txt for license information. | ||
| // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
| // | ||
| // This helper function is available for use by single-precision float | ||
| // arithmetic implementations to handle underflowed output values, if they were | ||
| // computed in the form of a normalized mantissa and an out-of-range exponent. | ||
| // | ||
| // On input: x should be a complete IEEE 754 floating-point value representing | ||
| // the desired output scaled up by 2^192 (the same value that would have been | ||
| // passed to an underflow trap handler in IEEE 754:1985). | ||
| // | ||
| // This isn't enough information to re-round to the correct output denormal | ||
| // without also knowing whether x itself has already been rounded, and which | ||
| // way. 'errsign' gives this information, by indicating the sign of the value | ||
| // (true result - x). That is, if errsign > 0 it means the true value was | ||
| // larger (x was rounded down); if errsign < 0 then x was rounded up; if | ||
| // errsign == 0 then x represents the _exact_ desired output value. | ||
| // | ||
| //===----------------------------------------------------------------------===// | ||
|
|
||
| #include <stdint.h> | ||
|
|
||
| #define SIGNBIT 0x80000000 | ||
| #define MANTSIZE 23 | ||
| #define BIAS 0xc0 | ||
|
|
||
| uint32_t __compiler_rt_funder(uint32_t x, uint32_t errsign) { | ||
| uint32_t sign = x & SIGNBIT; | ||
| uint32_t exponent = (x << 1) >> 24; | ||
|
|
||
| // Rule out exponents so small (or large!) that no denormalisation | ||
| // is needed. | ||
| if (exponent > BIAS) { | ||
| // Exponent 0xc1 or above means a normalised number got here by | ||
| // mistake, so we just remove the 0xc0 exponent bias and go | ||
| // straight home. | ||
| return x - (BIAS << MANTSIZE); | ||
| } | ||
| uint32_t bits_lost = BIAS + 1 - exponent; | ||
| if (bits_lost > MANTSIZE + 1) { | ||
| // The implicit leading 1 of the intermediate value's mantissa is | ||
| // below the lowest mantissa bit of a denormal by at least 2 bits. | ||
| // Round down to 0 unconditionally. | ||
| return sign; | ||
| } | ||
|
|
||
| // Make the full mantissa (with leading bit) at the top of the word. | ||
| uint32_t mantissa = 0x80000000 | (x << 8); | ||
| // Adjust by 1 depending on the sign of the error. | ||
| mantissa -= errsign >> 31; | ||
| mantissa += (-errsign) >> 31; | ||
|
|
||
| // Shift down to the output position, keeping the bits shifted off. | ||
| uint32_t outmant, shifted_off; | ||
| if (bits_lost == MANTSIZE + 1) { | ||
| // Special case for the exponent where we have to shift the whole | ||
| // of 'mantissa' off the bottom of the word. | ||
| outmant = 0; | ||
| shifted_off = mantissa; | ||
| } else { | ||
| outmant = mantissa >> (8 + bits_lost); | ||
| shifted_off = mantissa << (32 - (8 + bits_lost)); | ||
| } | ||
|
|
||
| // Re-round. | ||
| if (shifted_off >> 31) { | ||
| outmant++; | ||
| if (!(shifted_off << 1)) | ||
| outmant &= ~1; // halfway case: round to even | ||
| } | ||
|
|
||
| return sign | outmant; | ||
| } |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be nice to limit this to when the driver supports it. This is important if the build is using the raw assembler vs the compiler driver (i.e.
-implicit-it=alwaysvs -Xa-implicit-it=always)Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I hadn't thought of that. (I didn't know it was even possible to use
llvm-mcdirectly in a cmake build.)Do you happen to know how to check which assembler command-line syntax is in use? It looks to me as if cmake doesn't have
check_asm_compiler_flagorCMAKE_ASM_COMPILER_FRONTEND_VARIANT(whereas it has both of those forcxx). I'm sure I can make up my own system if I have to, but if there's an existing one I haven't found, I'd prefer to use it, and surely you would too 🙂There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use GNU as by passing CMAKE_ASM_COMPILER. I think that check_compiler_flag is the right tool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was delayed getting back to this, sorry. But I think the answer to this is that if you're trying to build the Arm assembler builtins with bare GNU
as, you're doomed anyway.The most immediate problem, when I tried it just now, was that invocations of
aswould fail with error messages of the form "Fatal error: bad defsym; format is --defsym name=value". That arises because cmake has put things like--defsym VISIBILITY_HIDDENand--defsym _LARGEFILE_SOURCEon the as command line, which indeed violates the syntax of--defsym, which requires an explicit value in every definition.But worse, those
--defsymsymbols are clearly the ones that would have been used with-Dif cmake had thought it was using a compiler driver as its assembler. So it's expecting them to turn into macros that can be tested using#ifdef. And even if there hadn't been a command-line syntax error, that won't work –--defsymand-Ddon't mean the same thing, and in any case, bareasdoesn't run cpp at all, so the#ifdefs in the source files will be syntax errors regardless of what was defined on the command line. Not to mention the#includes.This is all true of the existing assembly language source files in
lib/builtins, not just my new ones. I think there's no hope of getting any of them to build with any kind of bare assembler likeasorllvm-mc; they need a compiler driver which supports-Dand will run the preprocessor.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@compnerd, ping?
In case that comment wasn't clear enough, what I'm saying is: I don't think there is any Arm assembler which can consume these source files at all and does not support
-Wa,-mimplicit-it=always, because bare GNU as or bare LLVM mc won't be able to handle the use of the C preprocessor. Therefore I don't think there's any need to add a check for whether that command-line option is accepted.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cpp+as doesn't guarantee that
-Wais supported IMO. In fact, I think that-mimplicit-it=alwayscan just be passed without the-Wa,if you are claiming that no assembler could actually support this.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still very confused about how to configure a test build of compiler-rt which will successfully assemble the Arm builtins without being a compiler driver that speaks
-Wa,. You seem to be saying here that there's some other way to configure cmake to invoke both cpp and as? But I don't know what it is; surely CMAKE_ASM_COMPILER would need to be set to a single command, and I can't think what command would work.Huh, apparently that works in clang! TIL. Doesn't work in my nearest gcc, though – that still demands
-Wa,. But I suppose at least that gives me a way to test a piece of cmake that checks which flag works.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, now I've put in a check. It turned out that no cmake command higher-level than the general-purpose
try_compilewould handle assembly language -- all the things likecheck_compiler_flag()had a list of supported languages not including ASM. So I had to write my own helper function.I still haven't figured out how to test it against anything that isn't a compiler driver. But I'm testing for
-mimplicit-itfirst, which means that compiling with clang the first test passes, and with gcc the first test fails and we fall back to-Wa,-mimplicit-it, so I've at least been able to check that these tests are doing something nontrivial.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@compnerd , ping?