-
-
Notifications
You must be signed in to change notification settings - Fork 923
core: Add ARM64 FJCVTZS
instruction optimization for f64
to i32
#21780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Did you do any (reproducible) benchmarks to compare performance between using a generic conversion vs checking runtime feature and using the native conversion? |
There are some data from here Sorry I don't have a mac to test but there is a benchamrk setup. |
Why is the generic f64 to i32 implementation in oxc different than ours? Out of those 3 conversions:
Either some has to differ or Ruffle's and oxc's has to be equivalent, in which case it doesn't make sense why oxc's is so complex compared to ours. (Just quickly skimming though it, didn't analyze the differences) |
As a side note: Flash conversions don't always have to match the spec, plus the behavior might be different between 32 and 64 bit runtimes. Do we have extensive testing of this conversion on both runtimes? I'm not sure actually |
It's originally copied from boa engine, and since both implementations pass the test, this shouldn't be a problem. |
Since this a aarch64 instruction, 32 bit target shouldn't be affected. |
So if both impls are equivalent, we cannot use the benchmarks done on oxc's impl, because it's (seemingly) less efficient, so it's expected we'd see better performance of |
I'm talking about Flash's 32/64 bit runtimes, because there are slight differences in behavior between them, and I recall stumbling upon those differences in conversions. Ruffle tries to behave as the 32-bit runtime, but we're thinking about letting the user choose which runtime to emulate, so at least we have to be aware of those differences. |
We could just replace the implementation with our own, and someone with mac to test. Assembly reference: https://godbolt.org/z/Me1TaP79e (pure fjcvtzs function)
Runtime behaviour should depend on how this conversion is used. |
Also why doesn’t Rust’s |
Can't figure out why one scrolling test fails on firefox, need help here. |
That's flaky, nothing to do with this PR. |
261a837
to
8a28f71
Compare
8a28f71
to
8f75b24
Compare
This instruction is only available when the jsconv target_feature is available, so on ARMv8.3 or higher. It is used e.g. by Ruffle[0] to speed up its conversion from f64 to i32, or by any JS engine probably. [0] ruffle-rs/ruffle#21780
This instruction is only available when the jsconv target_feature is available, so on ARMv8.3 or higher. It is used e.g. by Ruffle[0] to speed up its conversion from f64 to i32, or by any JS engine probably. I’ve picked the stdarch_aarch64_jscvt feature because it’s the name of the FEAT_JSCVT, but hesitated with naming it stdarch_aarch64_jsconv (the name of the target_feature) or stdarch_aarch64_jcvt (the name of the C intrinsic) or stdarch_aarch64_fjcvtzs (the name of the instruction), this choice is purely arbitrary and I guess it could be argued one way or another. I wouldn’t expect it to stay unstable for too long, so ultimately this shouldn’t matter much. [0] ruffle-rs/ruffle#21780
This instruction is only available when the jsconv target_feature is available, so on ARMv8.3 or higher. It is used e.g. by Ruffle[0] to speed up its conversion from f64 to i32, or by any JS engine probably. I’ve picked the stdarch_aarch64_jscvt feature because it’s the name of the FEAT_JSCVT, but hesitated with naming it stdarch_aarch64_jsconv (the name of the target_feature) or stdarch_aarch64_jcvt (the name of the C intrinsic) or stdarch_aarch64_fjcvtzs (the name of the instruction), this choice is purely arbitrary and I guess it could be argued one way or another. I wouldn’t expect it to stay unstable for too long, so ultimately this shouldn’t matter much. This feature is now tracked in this issue[1]. [0] ruffle-rs/ruffle#21780 [1] rust-lang/rust#147555
This instruction is only available when the jsconv target_feature is available, so on ARMv8.3 or higher. It is used e.g. by Ruffle[0] to speed up its conversion from f64 to i32, or by any JS engine probably. I’ve picked the stdarch_aarch64_jscvt feature because it’s the name of the FEAT_JSCVT, but hesitated with naming it stdarch_aarch64_jsconv (the name of the target_feature) or stdarch_aarch64_jcvt (the name of the C intrinsic) or stdarch_aarch64_fjcvtzs (the name of the instruction), this choice is purely arbitrary and I guess it could be argued one way or another. I wouldn’t expect it to stay unstable for too long, so ultimately this shouldn’t matter much. This feature is now tracked in this issue[1]. [0] ruffle-rs/ruffle#21780 [1] rust-lang/rust#147555
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reran those benchmarks with new code: https://github.com/kjarosh/jsconv-benchmark/actions/runs/18427684643
platform | generic | fjcvtzs | fallback |
---|---|---|---|
ubuntu | 29 ns | 10 ns | 33 ns |
macos | 17 ns | 4.8 ns | 21 ns |
windows | 29 ns | 25 ns | 34 ns |
It shows ~3x improvement for ubuntu and macos, 16% improvement for windows. For fallback (CPUs that don't support fjcvtzs
) it's 14%–24% slower.
Do you happen to know what's the popularity of jsconv in ARM processors? I guess if it's reasonable (>50% per platform) we can merge this PR. I think we're pretty safe with Apple hardware here, but what about ARM PCs? What about Android phones? |
I didn't find precise data on these, but as an early instruction set extension of ARM64, ARMv8.3 is supported as built-in functions by both GCC and Clang. Additionally, V8, JSC, and SpiderMonkey also use this instruction. I believe the adoption rate is likely quite high. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
Closes #21773.