Conversation

@hhugo
Member

@hhugo hhugo commented Jul 26, 2024

No description provided.

@hhugo
Member Author

hhugo commented Jul 26, 2024

I'd like to benchmark this change

@vouillon
Member

vouillon commented Aug 1, 2024

One should create the encoder and the decoder only once and reuse it, and use a fixed buffer for caml_jsstring_of_string.
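A minimal sketch of that suggestion, assuming module-level codec instances and a reusable scratch buffer; the names below are illustrative, not the actual patch:

```js
// Illustrative sketch only: create the codecs once at module load and reuse
// them across calls instead of allocating a new TextDecoder/TextEncoder each time.
const jsoo_decoder = new TextDecoder("utf-8");
const jsoo_encoder = new TextEncoder();

// Hypothetical reusable buffer for caml_jsstring_of_string, grown only when a
// larger string needs to be converted.
let jsoo_scratch = new Uint8Array(1024);
function jsoo_scratch_buffer(len) {
  if (jsoo_scratch.length < len) jsoo_scratch = new Uint8Array(len);
  return jsoo_scratch.subarray(0, len);
}
```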

@hhugo
Member Author

hhugo commented Aug 2, 2024

One should create the encoder and the decoder only once and reuse it, and use a fixed buffer for caml_jsstring_of_string.

Done

@hhugo
Member Author

hhugo commented Aug 30, 2024

@vouillon, any idea on how to benchmark this other than micro benchmarks?

@hhugo hhugo force-pushed the text-encode-decode branch 2 times, most recently from 9fea72f to 5499d77 on October 9, 2024 19:47
@hhugo hhugo force-pushed the text-encode-decode branch from 5499d77 to 0e9dcba on October 16, 2024 08:10
@hhugo hhugo force-pushed the text-encode-decode branch from 0e9dcba to 7375672 on October 23, 2024 21:12
@hhugo
Member Author

hhugo commented Oct 24, 2024

A quick micro benchmark shows a significant slowdown.
master

n: 3, 0.167000
n: 7, 0.268000
n: 10, 0.371000
n: 20, 0.811000
n: 50, 1.989000
n: 100, 3.618000

this PR

n: 3, 1.209000
n: 7, 1.269000
n: 10, 1.354000
n: 20, 1.660000
n: 50, 2.632000
n: 100, 4.578000
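For reference, a micro benchmark of this kind would typically time repeated conversions of strings of length n. The exact benchmark used above is not shown in the thread, so the sketch below is purely illustrative: the conversion functions are the runtime's public entry points, but the iteration count, string contents, and units are assumptions.

```js
// Hypothetical micro benchmark: round-trip a non-ASCII string of length n
// through the ML/JS conversion functions and report elapsed seconds.
function bench(n, iterations) {
  const js = "é".repeat(n); // non-ASCII so the conversion path is exercised
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    const ml = caml_string_of_jsstring(js);
    caml_jsstring_of_string(ml);
  }
  return (performance.now() - start) / 1000;
}

for (const n of [3, 7, 10, 20, 50, 100]) {
  console.log(`n: ${n}, ${bench(n, 100000).toFixed(6)}`);
}
```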

@hhugo hhugo marked this pull request as draft October 24, 2024 07:19
@hhugo hhugo added the runtime label Nov 25, 2024
@hhugo hhugo force-pushed the text-encode-decode branch from cae4138 to 9724287 on December 4, 2024 21:09
@smorimoto smorimoto requested a review from Copilot December 28, 2024 17:39

Copilot AI left a comment

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

runtime/js/mlBytes.js:642

  • The function caml_jsstring_of_string should handle non-ASCII strings correctly. The previous implementation used caml_utf16_of_utf8, which may handle edge cases differently than TextDecoder.
if (jsoo_is_ascii(s)) return s;

runtime/js/mlBytes.js:659

  • The function caml_string_of_jsstring should handle non-ASCII strings correctly. The previous implementation used caml_utf8_of_utf16, which may handle edge cases differently than TextEncoder.
if (jsoo_is_ascii(s)) return caml_string_of_jsbytes(s);
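For context, the overall shape of the change under review is roughly the following. This is a simplified sketch, not the exact patch, and the byte-array helpers are hypothetical stand-ins for whatever the runtime actually uses:

```js
// Simplified sketch of the TextDecoder/TextEncoder path with an ASCII fast path.
// caml_uint8_array_of_string and caml_string_of_uint8_array are illustrative
// helper names standing in for the runtime's real byte-array conversions.
const decoder = new TextDecoder("utf-8");
const encoder = new TextEncoder();

function caml_jsstring_of_string(s) {
  if (jsoo_is_ascii(s)) return s; // pure ASCII: already a valid JS string
  return decoder.decode(caml_uint8_array_of_string(s)); // non-ASCII slow path
}

function caml_string_of_jsstring(s) {
  if (jsoo_is_ascii(s)) return caml_string_of_jsbytes(s); // ASCII fast path
  return caml_string_of_uint8_array(encoder.encode(s)); // non-ASCII slow path
}
```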

@hhugo hhugo force-pushed the text-encode-decode branch from 9724287 to 85a321a on March 27, 2025 08:35
@hhugo hhugo force-pushed the text-encode-decode branch 2 times, most recently from a4cf2ee to 55d1854 on May 14, 2025 14:24
@hhugo
Member Author

hhugo commented May 14, 2025

@vouillon, do you have any opinion on this?

@hhugo hhugo force-pushed the text-encode-decode branch from 55d1854 to 5fcbeec on May 21, 2025 11:09
@hhugo hhugo force-pushed the text-encode-decode branch 2 times, most recently from f7e0d1e to 0d39b69 on June 12, 2025 13:34
@hhugo hhugo marked this pull request as ready for review June 12, 2025 13:34
@hhugo hhugo added this to the 6.1 milestone Jun 12, 2025
@hhugo hhugo force-pushed the text-encode-decode branch from 0d39b69 to 406d45d on June 12, 2025 13:45
@hhugo hhugo merged commit f7ea981 into master Jun 12, 2025
26 checks passed
@hhugo hhugo deleted the text-encode-decode branch June 12, 2025 18:40
@OlivierNicole
Contributor

A quick micro benchmark shows a significant slowdown.
master

n: 3, 0.167000
n: 7, 0.268000
n: 10, 0.371000
n: 20, 0.811000
n: 50, 1.989000
n: 100, 3.618000

this PR

n: 3, 1.209000
n: 7, 1.269000
n: 10, 1.354000
n: 20, 1.660000
n: 50, 2.632000
n: 100, 4.578000

Was this slowdown mitigated, or was the final decision to go with it?

@hhugo
Member Author

hhugo commented Jul 3, 2025

Was this slowdown mitigated, or was the final decision to go with it?

The slowdown should only affect non-ASCII strings.
Also, my assumption was that conversions between JS and ML strings should be negligible compared to other computations.
Recent bugs found in our previous implementation made me decide to ignore the performance impact and merge this PR.
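Concretely, the ASCII fast path means a string such as "hello" never reaches TextDecoder or TextEncoder at all; only strings containing code points above 0x7f pay the new cost. A minimal illustration of that kind of check (the real jsoo_is_ascii may differ):

```js
// Illustrative ASCII check: any character above 0x7f forces the
// TextDecoder/TextEncoder path; pure-ASCII strings keep the cheap path.
function is_ascii(s) {
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0x7f) return false;
  }
  return true;
}
```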

@OlivierNicole
Contributor

I see. Thanks!
