Conversation

CrazyboyQCD
Contributor

Closes #21773.

@kjarosh kjarosh added the T-perf (Type: Performance Improvements), macos (Issues related to MacOS), and A-core (Area: Core player, where no other category fits) labels Sep 26, 2025
@kjarosh
Member

kjarosh commented Sep 26, 2025

Did you do any (reproducible) benchmarks to compare performance between using a generic conversion vs checking runtime feature and using the native conversion?

@CrazyboyQCD
Contributor Author

CrazyboyQCD commented Sep 26, 2025

Did you do any (reproducible) benchmarks to compare performance between using a generic conversion vs checking runtime feature and using the native conversion?

There is some data here.

Sorry, I don't have a Mac to test on, but there is a benchmark setup.

@kjarosh kjarosh removed the macos Issues related to MacOS label Sep 26, 2025
@kjarosh
Member

kjarosh commented Sep 26, 2025

Why is the generic f64 to i32 implementation in oxc different than ours?

Out of those 3 conversions:

  • Ruffle's generic
  • oxc generic
  • fjcvtzs

Either some of them differ in behavior, or Ruffle's and oxc's are equivalent, in which case it doesn't make sense that oxc's is so complex compared to ours.

(I only skimmed through it quickly and didn't analyze the differences.)
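
For readers following the comparison, here is a minimal sketch of what a generic f64 to i32 conversion of this kind can look like. It assumes ECMAScript ToInt32 semantics (truncate toward zero, wrap modulo 2^32, NaN and infinities become 0), which is the behavior fjcvtzs is designed to provide. The name `to_int32_generic` is hypothetical, and this is not a copy of Ruffle's or oxc's code; Flash's actual behavior may differ in edge cases.

```rust
/// Hypothetical helper: ToInt32-style conversion using only portable Rust.
pub fn to_int32_generic(n: f64) -> i32 {
    // NaN and +/-infinity convert to 0.
    if !n.is_finite() {
        return 0;
    }
    const TWO_POW_32: f64 = 4294967296.0;
    // Truncate toward zero, then reduce into (-2^32, 2^32). `%` on f64 is an
    // fmod-style remainder: exact, and it keeps the sign of the dividend.
    let m = n.trunc() % TWO_POW_32;
    // `m` is integral and fits in an i64 exactly; the i64 -> i32 cast keeps
    // only the low 32 bits, which performs the final wrap into the i32 range.
    (m as i64) as i32
}
```

With this sketch, `to_int32_generic(2147483648.0)` wraps to `i32::MIN`, and `to_int32_generic(-1.9)` truncates to `-1`.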

@kjarosh
Member

kjarosh commented Sep 26, 2025

As a side note: Flash conversions don't always have to match the spec, plus the behavior might be different between 32 and 64 bit runtimes. Do we have extensive testing of this conversion on both runtimes? I'm not sure actually

@CrazyboyQCD
Contributor Author

Why is the generic f64 to i32 implementation in oxc different than ours?

It was originally copied from the Boa engine, and since both implementations pass the test, this shouldn't be a problem.

@CrazyboyQCD
Contributor Author

CrazyboyQCD commented Sep 26, 2025

As a side note: Flash conversions don't always have to match the spec, plus the behavior might be different between 32 and 64 bit runtimes.

Since this is an aarch64 instruction, 32-bit targets shouldn't be affected.

@kjarosh
Member

kjarosh commented Sep 26, 2025

since both implementations pass the test, this shouldn't be a problem.

So if both impls are equivalent, we cannot use the benchmarks done on oxc's impl, because it's (seemingly) less efficient, so it's expected we'd see better performance of fjcvtzs with the feature check there.

@kjarosh
Member

kjarosh commented Sep 26, 2025

Since this is an aarch64 instruction, 32-bit targets shouldn't be affected.

I'm talking about Flash's 32/64 bit runtimes, because there are slight differences in behavior between them, and I recall stumbling upon those differences in conversions.

Ruffle tries to behave as the 32-bit runtime, but we're thinking about letting the user choose which runtime to emulate, so at least we have to be aware of those differences.

@CrazyboyQCD
Contributor Author

CrazyboyQCD commented Sep 26, 2025

So if both impls are equivalent, we cannot use the benchmarks done on oxc's impl, because it's (seemingly) less efficient, so it's expected we'd see better performance of fjcvtzs with the feature check there.

We could just replace the implementation in the benchmark with our own and have someone with a Mac run it.

Assembly reference:
https://godbolt.org/z/9TqEsWjcW (oxc and Ruffle implementations; note that the Ruffle version contains an fmod call)

https://godbolt.org/z/Me1TaP79e (pure fjcvtzs function)

Ruffle tries to behave as the 32-bit runtime, but we're thinking about letting the user choose which runtime to emulate, so at least we have to be aware of those differences.

Runtime behaviour should depend on how this conversion is used.
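
The approach being benchmarked in this thread (check for the jsconv feature at runtime, use fjcvtzs when it is present, otherwise fall back to the generic path) can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the function names are hypothetical, the fallback assumes ToInt32-style wrapping, and the instruction is reached through inline assembly.

```rust
/// Portable fallback (ToInt32-style wrap); stands in for the generic path.
fn to_int32_generic(n: f64) -> i32 {
    if !n.is_finite() {
        return 0;
    }
    ((n.trunc() % 4294967296.0) as i64) as i32
}

/// Hypothetical wrapper around the `fjcvtzs` instruction.
///
/// Safety: the caller must ensure the CPU supports `jsconv` (FEAT_JSCVT).
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "jsconv")]
unsafe fn to_int32_fjcvtzs(n: f64) -> i32 {
    let ret: i32;
    unsafe {
        // Converts the double in a D register to a 32-bit integer in a W
        // register. The instruction writes the condition flags, so
        // `preserves_flags` is deliberately not specified.
        core::arch::asm!(
            "fjcvtzs {dst:w}, {src:d}",
            src = in(vreg) n,
            dst = out(reg) ret,
            options(pure, nomem, nostack),
        );
    }
    ret
}

/// Dispatches to `fjcvtzs` when available, otherwise to the generic path.
pub fn to_int32(n: f64) -> i32 {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("jsconv") {
            // SAFETY: the feature was detected at runtime just above.
            return unsafe { to_int32_fjcvtzs(n) };
        }
    }
    to_int32_generic(n)
}
```

On non-aarch64 targets the `cfg` blocks compile away and `to_int32` is just the generic path. On aarch64, each call includes a feature check on top of the conversion itself, which is the cost the "fallback" measurements in this thread are about.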

@linkmauve

Also, why doesn’t Rust’s core::arch::aarch64 module support fjcvtzs already, behind the same feature flag? It would make sense to open an issue with the stdlib.

@kjarosh kjarosh added the waiting-on-author Waiting on the PR author to make the requested changes label Oct 4, 2025
@CrazyboyQCD
Contributor Author

I can't figure out why one scrolling test fails on Firefox; I need help here.

@torokati44
Member

That's flaky, nothing to do with this PR.

@CrazyboyQCD CrazyboyQCD force-pushed the master branch 3 times, most recently from 261a837 to 8a28f71 Compare October 9, 2025 09:16
@CrazyboyQCD CrazyboyQCD force-pushed the master branch 2 times, most recently from 8a28f71 to 8f75b24 Compare October 9, 2025 15:24
linkmauve added a commit to linkmauve/stdarch that referenced this pull request Oct 9, 2025
This instruction is only available when the jsconv target_feature is available,
so on ARMv8.3 or higher.

It is used e.g. by Ruffle[0] to speed up its conversion from f64 to i32, or by
any JS engine probably.

[0] ruffle-rs/ruffle#21780
linkmauve added a commit to linkmauve/stdarch that referenced this pull request Oct 10, 2025
This instruction is only available when the jsconv target_feature is available,
so on ARMv8.3 or higher.

It is used e.g. by Ruffle[0] to speed up its conversion from f64 to i32, or by
any JS engine probably.

I’ve picked the stdarch_aarch64_jscvt feature because it’s the name of the
FEAT_JSCVT, but hesitated with naming it stdarch_aarch64_jsconv (the name of
the target_feature) or stdarch_aarch64_jcvt (the name of the C intrinsic) or
stdarch_aarch64_fjcvtzs (the name of the instruction), this choice is purely
arbitrary and I guess it could be argued one way or another.  I wouldn’t expect
it to stay unstable for too long, so ultimately this shouldn’t matter much.

[0] ruffle-rs/ruffle#21780
linkmauve added a commit to linkmauve/stdarch that referenced this pull request Oct 10, 2025
This instruction is only available when the jsconv target_feature is available,
so on ARMv8.3 or higher.

It is used e.g. by Ruffle[0] to speed up its conversion from f64 to i32, or by
any JS engine probably.

I’ve picked the stdarch_aarch64_jscvt feature because it’s the name of the
FEAT_JSCVT, but hesitated with naming it stdarch_aarch64_jsconv (the name of
the target_feature) or stdarch_aarch64_jcvt (the name of the C intrinsic) or
stdarch_aarch64_fjcvtzs (the name of the instruction), this choice is purely
arbitrary and I guess it could be argued one way or another.  I wouldn’t expect
it to stay unstable for too long, so ultimately this shouldn’t matter much.

This feature is now tracked in this issue[1].

[0] ruffle-rs/ruffle#21780
[1] rust-lang/rust#147555
Member

@kjarosh kjarosh left a comment

I reran those benchmarks with new code: https://github.com/kjarosh/jsconv-benchmark/actions/runs/18427684643

| platform | generic | fjcvtzs | fallback |
|----------|---------|---------|----------|
| ubuntu   | 29 ns   | 10 ns   | 33 ns    |
| macos    | 17 ns   | 4.8 ns  | 21 ns    |
| windows  | 29 ns   | 25 ns   | 34 ns    |

It shows a ~3x improvement on ubuntu (2.9x) and macos (3.5x), and a 16% improvement on windows (29 ns vs 25 ns). For the fallback (CPUs that don't support fjcvtzs), it's 14%–24% slower than the generic path.
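
The figures quoted here can be reproduced from the benchmark numbers in this comment. A small sketch, with the timings copied from the results and hypothetical helper names:

```rust
/// (platform, generic ns, fjcvtzs ns, fallback ns), copied from the benchmark comment.
pub const RESULTS: [(&str, f64, f64, f64); 3] = [
    ("ubuntu", 29.0, 10.0, 33.0),
    ("macos", 17.0, 4.8, 21.0),
    ("windows", 29.0, 25.0, 34.0),
];

/// Ratio of times: how many times faster `new_ns` is than `base_ns`.
pub fn speedup(base_ns: f64, new_ns: f64) -> f64 {
    base_ns / new_ns
}

/// Extra time of `new_ns` relative to `base_ns`, in percent.
pub fn slowdown_percent(base_ns: f64, new_ns: f64) -> f64 {
    (new_ns / base_ns - 1.0) * 100.0
}

/// One summary line per platform.
pub fn summarize() -> Vec<String> {
    RESULTS
        .iter()
        .map(|&(platform, generic, fjcvtzs, fallback)| {
            format!(
                "{platform}: fjcvtzs {:.2}x faster, fallback {:.0}% slower",
                speedup(generic, fjcvtzs),
                slowdown_percent(generic, fallback)
            )
        })
        .collect()
}
```

Note that the 16% for windows is the ratio 29/25 = 1.16; measured as time saved, 25 ns is about 14% less than 29 ns.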

@kjarosh kjarosh removed the waiting-on-author Waiting on the PR author to make the requested changes label Oct 11, 2025
@kjarosh
Member

kjarosh commented Oct 11, 2025

Do you happen to know what's the popularity of jsconv in ARM processors? I guess if it's reasonable (>50% per platform) we can merge this PR.

I think we're pretty safe with Apple hardware here, but what about ARM PCs? What about Android phones?
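
One way to put a number on the adoption threshold is a back-of-the-envelope break-even estimate: if a fraction `share` of devices supports jsconv, the expected cost of the dispatching version is `share * fjcvtzs + (1 - share) * fallback`, and it beats the generic path once that drops below the generic time. The sketch below assumes the microbenchmark timings above are representative of real workloads, which is a strong assumption; the function names are hypothetical.

```rust
/// Expected time per conversion when a fraction `share` (0.0..=1.0) of
/// devices supports jsconv and the rest take the fallback path.
pub fn expected_ns(share: f64, fjcvtzs_ns: f64, fallback_ns: f64) -> f64 {
    share * fjcvtzs_ns + (1.0 - share) * fallback_ns
}

/// Share of jsconv-capable devices at which the dispatching version costs
/// the same as always using the generic path. Above this share it wins.
pub fn break_even_share(generic_ns: f64, fjcvtzs_ns: f64, fallback_ns: f64) -> f64 {
    (fallback_ns - generic_ns) / (fallback_ns - fjcvtzs_ns)
}
```

Plugging in the measured timings gives a break-even share of about 17% for the ubuntu numbers (4/23), about 25% for macos (4/16.2), and about 56% for windows (5/9).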

@CrazyboyQCD
Contributor Author

@kjarosh

Do you happen to know what's the popularity of jsconv in ARM processors? I guess if it's reasonable (>50% per platform) we can merge this PR.

I think we're pretty safe with Apple hardware here, but what about ARM PCs? What about Android phones?

I didn't find precise data on this, but ARMv8.3 is an early extension of ARM64, and both GCC and Clang expose the instruction as a built-in. V8, JSC, and SpiderMonkey also use this instruction, so I believe the adoption rate is likely quite high.

Member

@kjarosh kjarosh left a comment

LGTM, thank you!

@kjarosh kjarosh enabled auto-merge (rebase) October 12, 2025 08:45
@kjarosh kjarosh merged commit 9f57c21 into ruffle-rs:master Oct 12, 2025
25 checks passed

Labels

A-core (Area: Core player, where no other category fits)
T-perf (Type: Performance Improvements)

Development

Successfully merging this pull request may close these issues.

Add ARM64 FJCVTZS instruction optimization for f64 to i32

6 participants