core: Add ARM64 `FJCVTZS` instruction optimization for `f64` to `i32` #21780

CrazyboyQCD · 2025-09-26T08:04:13Z

kjarosh · 2025-09-26T08:11:23Z

Did you do any (reproducible) benchmarks to compare performance between using a generic conversion vs checking runtime feature and using the native conversion?

CrazyboyQCD · 2025-09-26T08:15:33Z

> Did you do any (reproducible) benchmarks to compare performance between using a generic conversion vs checking runtime feature and using the native conversion?

There is some data here.

Sorry, I don't have a Mac to test on, but there is a benchmark setup.

kjarosh · 2025-09-26T08:27:51Z

Why is the generic f64 to i32 implementation in oxc different than ours?

Out of those 3 conversions:

1. Ruffle's generic
2. oxc generic
3. fjcvtzs

Either some of them have to differ, or Ruffle's and oxc's have to be equivalent, in which case it doesn't make sense why oxc's is so complex compared to ours.

(Just quickly skimming through it, didn't analyze the differences)

kjarosh · 2025-09-26T08:33:59Z

As a side note: Flash conversions don't always have to match the spec, plus the behavior might be different between 32 and 64 bit runtimes. Do we have extensive testing of this conversion on both runtimes? I'm not sure actually

CrazyboyQCD · 2025-09-26T08:38:26Z

> Why is the generic f64 to i32 implementation in oxc different than ours?

It's originally copied from boa engine, and since both implementations pass the test, this shouldn't be a problem.
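One way to check the equivalence claim is a differential test that runs two implementations over the same edge cases. The sketch below is illustrative only: `to_int32_fmod`, `to_int32_bits`, `edge_cases`, and `first_mismatch` are hypothetical names, and neither function is the actual Ruffle or oxc/boa code. They stand in for an fmod-style and a bit-manipulation-style ToInt32 respectively.

```rust
/// Hypothetical fmod-style ToInt32 (illustrative, not the Ruffle source).
fn to_int32_fmod(n: f64) -> i32 {
    // NaN and infinities map to 0.
    if !n.is_finite() {
        return 0;
    }
    // Truncate toward zero, reduce into [0, 2^32), then reinterpret as signed.
    n.trunc().rem_euclid(4294967296.0) as u32 as i32
}

/// Hypothetical bit-manipulation ToInt32 (illustrative, not the oxc/boa source).
fn to_int32_bits(n: f64) -> i32 {
    let bits = n.to_bits();
    // Unbiased exponent; 1024 encodes NaN and infinities.
    let exp = ((bits >> 52) & 0x7ff) as i32 - 1023;
    // |n| < 1 truncates to 0; for exp >= 84 (including NaN/inf) the low 32 bits are zero.
    if !(0..=83).contains(&exp) {
        return 0;
    }
    // 53-bit significand with the implicit leading one restored.
    let significand = (bits & ((1u64 << 52) - 1)) | (1u64 << 52);
    // Align the significand so bit 0 has weight 2^0, keeping only the low 32 bits.
    let low = if exp <= 52 {
        (significand >> (52 - exp)) as u32
    } else {
        (significand << (exp - 52)) as u32
    };
    // Apply the sign in two's complement.
    if n.is_sign_negative() {
        low.wrapping_neg() as i32
    } else {
        low as i32
    }
}

/// Edge cases around zero, the i32/u32 boundaries, and the limits of f64 precision.
fn edge_cases() -> Vec<f64> {
    vec![
        0.0, -0.0, 0.5, -0.5, 1.0, -1.5,
        2147483647.0, 2147483648.0, -2147483648.0, -2147483649.0,
        4294967295.0, 4294967296.0, 4294967297.0,
        9007199254740992.0, 1e21, -1e21, 1e300,
        f64::MIN_POSITIVE, f64::MAX,
        f64::INFINITY, f64::NEG_INFINITY, f64::NAN,
    ]
}

/// Returns the first sample on which the two implementations disagree, if any.
fn first_mismatch(samples: &[f64]) -> Option<f64> {
    samples
        .iter()
        .copied()
        .find(|&x| to_int32_fmod(x) != to_int32_bits(x))
}
```

Calling `first_mismatch(&edge_cases())` returns `None` when the two functions agree on every sample. Agreement on a finite sample does not prove equivalence, and it says nothing about whether either one matches Flash's behaviour.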

CrazyboyQCD · 2025-09-26T08:41:26Z

> As a side note: Flash conversions don't always have to match the spec, plus the behavior might be different between 32 and 64 bit runtimes.

Since this is an aarch64 instruction, 32-bit targets shouldn't be affected.

kjarosh · 2025-09-26T08:42:20Z

> since both implementations pass the test, this shouldn't be a problem.

So if both impls are equivalent, we cannot use the benchmarks done on oxc's impl, because it's (seemingly) less efficient, so it's expected we'd see better performance of fjcvtzs with the feature check there.

kjarosh · 2025-09-26T08:45:22Z

> Since this is an aarch64 instruction, 32-bit targets shouldn't be affected.

I'm talking about Flash's 32/64 bit runtimes, because there are slight differences in behavior between them, and I recall stumbling upon those differences in conversions.

Ruffle tries to behave as the 32-bit runtime, but we're thinking about letting the user choose which runtime to emulate, so at least we have to be aware of those differences.

CrazyboyQCD · 2025-09-26T09:21:31Z

> So if both impls are equivalent, we cannot use the benchmarks done on oxc's impl, because it's (seemingly) less efficient, so it's expected we'd see better performance of fjcvtzs with the feature check there.

We could just replace the implementation with our own, and have someone with a Mac test it.

Assembly reference:
https://godbolt.org/z/9TqEsWjcW (oxc and ruffle implementation, notice that ruffle version contains a fmod call)

https://godbolt.org/z/Me1TaP79e (pure fjcvtzs function)

> Ruffle tries to behave as the 32-bit runtime, but we're thinking about letting the user choose which runtime to emulate, so at least we have to be aware of those differences.

Runtime behaviour should depend on how this conversion is used.

core/src/ecma_conversions.rs

linkmauve · 2025-10-01T14:29:45Z

Also why doesn’t Rust’s core::arch::aarch64 module support fjcvtzs already, behind the same feature flag? It would make sense to open an issue to the stdlib.
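Without a stdlib intrinsic, the instruction can be reached through inline assembly behind a runtime feature check. The following is a minimal sketch of that pattern, not the code from this PR: the function names are hypothetical, and the fallback is a plain truncate-and-wrap ToInt32 rather than Ruffle's actual generic conversion. It relies on the `jsconv` target feature name and on `std::arch::is_aarch64_feature_detected!`.

```rust
/// Portable fallback (hypothetical, not Ruffle's actual generic code).
fn to_int32_generic(n: f64) -> i32 {
    // NaN and infinities map to 0.
    if !n.is_finite() {
        return 0;
    }
    // Truncate toward zero, reduce into [0, 2^32), then reinterpret as signed.
    n.trunc().rem_euclid(4294967296.0) as u32 as i32
}

/// Executes FJCVTZS through inline assembly.
///
/// # Safety
/// The caller must ensure the running CPU supports the `jsconv` feature.
#[cfg(target_arch = "aarch64")]
#[target_feature(enable = "jsconv")]
unsafe fn to_int32_fjcvtzs(n: f64) -> i32 {
    let out: i32;
    // FJCVTZS writes the condition flags, so `preserves_flags` is deliberately not set.
    unsafe {
        core::arch::asm!(
            "fjcvtzs {dst:w}, {src:d}",
            dst = out(reg) out,
            src = in(vreg) n,
            options(pure, nomem, nostack),
        );
    }
    out
}

/// Uses FJCVTZS when the CPU reports `jsconv`, otherwise the portable fallback.
pub fn to_int32(n: f64) -> i32 {
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("jsconv") {
            // SAFETY: the runtime check above confirmed `jsconv` support.
            return unsafe { to_int32_fjcvtzs(n) };
        }
    }
    to_int32_generic(n)
}
```

On non-aarch64 targets the `cfg` block compiles away and `to_int32` is just the fallback. On aarch64 every call pays for the feature check, which is the cost the benchmarks in this thread try to measure for CPUs without `jsconv`.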

core/src/ecma_conversions.rs

CrazyboyQCD · 2025-10-06T06:12:41Z

Can't figure out why one scrolling test fails on firefox, need help here.

torokati44 · 2025-10-06T06:16:17Z

That's flaky, nothing to do with this PR.

core/src/ecma_conversions.rs

This instruction is only available when the jsconv target_feature is available, so on ARMv8.3 or higher. It is used e.g. by Ruffle[0] to speed up its conversion from f64 to i32, or by any JS engine probably.

I’ve picked the stdarch_aarch64_jscvt feature because it’s the name of the FEAT_JSCVT, but hesitated with naming it stdarch_aarch64_jsconv (the name of the target_feature) or stdarch_aarch64_jcvt (the name of the C intrinsic) or stdarch_aarch64_fjcvtzs (the name of the instruction). This choice is purely arbitrary and I guess it could be argued one way or another. I wouldn’t expect it to stay unstable for too long, so ultimately this shouldn’t matter much.

This feature is now tracked in this issue[1].

[0] ruffle-rs/ruffle#21780
[1] rust-lang/rust#147555

kjarosh

I reran those benchmarks with new code: https://github.com/kjarosh/jsconv-benchmark/actions/runs/18427684643

| platform | generic | fjcvtzs | fallback |
|----------|---------|---------|----------|
| ubuntu   | 29 ns   | 10 ns   | 33 ns    |
| macos    | 17 ns   | 4.8 ns  | 21 ns    |
| windows  | 29 ns   | 25 ns   | 34 ns    |

It shows ~3x improvement for ubuntu and macos, 16% improvement for windows. For fallback (CPUs that don't support fjcvtzs) it's 14%–24% slower.
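The ratios quoted above can be recomputed from the table. The sketch below hard-codes the reported timings; the `Row` struct and the function names are hypothetical, introduced only for this illustration. Speedup is taken as generic time divided by fjcvtzs time, and fallback slowdown as the relative increase over generic.

```rust
/// One row of the benchmark table above (times in nanoseconds).
struct Row {
    platform: &'static str,
    generic: f64,
    fjcvtzs: f64,
    fallback: f64,
}

/// Timings copied from the table in this comment.
const ROWS: [Row; 3] = [
    Row { platform: "ubuntu", generic: 29.0, fjcvtzs: 10.0, fallback: 33.0 },
    Row { platform: "macos", generic: 17.0, fjcvtzs: 4.8, fallback: 21.0 },
    Row { platform: "windows", generic: 29.0, fjcvtzs: 25.0, fallback: 34.0 },
];

/// How many times faster the fjcvtzs path is than the generic one.
fn speedup(row: &Row) -> f64 {
    row.generic / row.fjcvtzs
}

/// Relative slowdown of the fallback path versus generic, in percent.
fn fallback_slowdown_percent(row: &Row) -> f64 {
    (row.fallback / row.generic - 1.0) * 100.0
}

/// Formats one summary line per platform.
fn report() -> Vec<String> {
    ROWS.iter()
        .map(|r| {
            format!(
                "{}: {:.2}x faster, fallback {:.0}% slower",
                r.platform,
                speedup(r),
                fallback_slowdown_percent(r)
            )
        })
        .collect()
}
```

With these inputs the speedups come out at roughly 2.9x (ubuntu), 3.5x (macos), and 1.16x (windows), and the fallback slowdowns at roughly 14%, 24%, and 17%, which is consistent with the summary above.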

core/src/ecma_conversions.rs

kjarosh · 2025-10-11T10:23:04Z

Do you happen to know what's the popularity of jsconv in ARM processors? I guess if it's reasonable (>50% per platform) we can merge this PR.

I think we're pretty safe with Apple hardware here, but what about ARM PCs? What about Android phones?

core/src/ecma_conversions.rs

CrazyboyQCD · 2025-10-12T02:33:35Z

@kjarosh

> Do you happen to know what's the popularity of jsconv in ARM processors? I guess if it's reasonable (>50% per platform) we can merge this PR.
>
> I think we're pretty safe with Apple hardware here, but what about ARM PCs? What about Android phones?

I didn't find precise data on this, but the instruction belongs to ARMv8.3, an early extension of ARM64, and both GCC and Clang expose it as a built-in function. V8, JSC, and SpiderMonkey also use this instruction, so I believe the adoption rate is likely quite high.

kjarosh

LGTM, thank you!

CrazyboyQCD force-pushed the master branch from ceecf8d to b3bcaa0 Compare September 26, 2025 08:07

kjarosh added the T-perf, macos, and A-core labels Sep 26, 2025

CrazyboyQCD force-pushed the master branch from b3bcaa0 to 9c9c959 Compare September 26, 2025 08:11

kjarosh removed the macos label Sep 26, 2025

linkmauve suggested changes Oct 1, 2025

moulins reviewed Oct 4, 2025

kjarosh added the waiting-on-author label Oct 4, 2025

CrazyboyQCD force-pushed the master branch from 9c9c959 to d72276f Compare October 5, 2025 10:02

linkmauve reviewed Oct 5, 2025

CrazyboyQCD force-pushed the master branch from d72276f to bfbe896 Compare October 5, 2025 10:15

kjarosh reviewed Oct 9, 2025

CrazyboyQCD force-pushed the master branch 3 times, most recently from 261a837 to 8a28f71 Compare October 9, 2025 09:16

linkmauve reviewed Oct 9, 2025

linkmauve mentioned this pull request Oct 9, 2025

Add fjcvtzs instruction to core::arch::aarch64 rust-lang/rust#147517

Closed

folkertdev reviewed Oct 9, 2025

CrazyboyQCD force-pushed the master branch 2 times, most recently from 8a28f71 to 8f75b24 Compare October 9, 2025 15:24

linkmauve mentioned this pull request Oct 9, 2025

Implement fjcvtzs under the name __jcvt like the C intrinsic rust-lang/stdarch#1938

Merged

CrazyboyQCD requested review from kjarosh and moulins October 10, 2025 00:42

CrazyboyQCD force-pushed the master branch from 65a4dc5 to ed984e4 Compare October 11, 2025 00:44

kjarosh reviewed Oct 11, 2025

CrazyboyQCD force-pushed the master branch from ed984e4 to 2a29337 Compare October 11, 2025 10:04

kjarosh removed the waiting-on-author label Oct 11, 2025

folkertdev reviewed Oct 11, 2025

CrazyboyQCD force-pushed the master branch from 2a29337 to efb25c8 Compare October 12, 2025 01:52

core: Add ARM64 FJCVTZS instruction optimization for f64 to i32

127eef1

kjarosh force-pushed the master branch from efb25c8 to 127eef1 Compare October 12, 2025 08:45

kjarosh approved these changes Oct 12, 2025

kjarosh enabled auto-merge (rebase) October 12, 2025 08:45

kjarosh merged commit 9f57c21 into ruffle-rs:master Oct 12, 2025
25 checks passed
