perf: detect Latin1-encodable strings at intern time instead of per-call in ToJsString (boa-dev#4881) #4896
Test262 conformance changes
Fixed tests (2):
Broken tests (183):
Tested main commit:
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4896       +/-   ##
===========================================
+ Coverage   47.24%   58.51%   +11.27%
===========================================
  Files         476      559       +83
  Lines       46892    61473    +14581
===========================================
+ Hits        22154    35972    +13818
- Misses      24738    25501      +763
|
Hi @linisha15, thanks for working on this! I have also been exploring this issue since I opened it. I noticed that the formatting and lint checks are currently failing. You may want to run them locally and push a fix. Looking forward to the updated changes.
|
please fix CI first |
|
@rajat552 Thanks for the feedback! I clicked “Update branch” to sync with the latest |
|
Hi @rajat552, thanks for the feedback. CI has been fixed; the formatting and lint issues are resolved in the latest commit.
|
Hi! It looks like the CI workflows are currently awaiting maintainer approval before they can run. Once they are approved, the checks should start automatically. Please let me know if anything else is needed from my side. Thanks! |
This Pull Request closes boa-dev#4881

Background-
When the bytecode compiler converts an interned string (`Sym`) to a `JsString`, it needs to decide whether to store it as Latin1 (1 byte per character) or UTF-16 (2 bytes per character). Previously, this was done by scanning every character of the string on each call, even if the same string was used many times.

What changed-
- The `Interner` now checks once, at the moment a string is first stored, whether all its characters fit in Latin1 (code point ≤ U+00FF). The result is saved in a new `latin1_flags` field.
- A new `is_latin1(sym)` method lets callers read that saved result instantly, without re-scanning the string.
- `ToJsString for Sym` in both `boa_ast` and `boa_engine::bytecompiler` now calls `is_latin1()` instead of scanning the string's characters every time.
- `From<&str> for JsString` was also fixed to correctly produce a Latin1 string for characters in the U+0080–U+00FF range, not just plain ASCII.
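
For readers who want to see the caching idea in isolation, here is a minimal sketch of an interner that computes the Latin1 flag once, when a string is first stored. It is not Boa's actual implementation: the `Sym` and `Interner` types below are simplified stand-ins, and everything except the names `latin1_flags` and `is_latin1` is an assumption made for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a symbol handle (not Boa's real `Sym`).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct Sym(usize);

/// Toy interner that records, once per string, whether it fits in Latin1.
#[derive(Default)]
pub struct Interner {
    lookup: HashMap<String, Sym>,
    strings: Vec<String>,
    /// One flag per interned string, indexed the same way as `strings`.
    latin1_flags: Vec<bool>,
}

impl Interner {
    /// Interns `s`, scanning its characters only if it has not been seen before.
    pub fn get_or_intern(&mut self, s: &str) -> Sym {
        if let Some(&sym) = self.lookup.get(s) {
            return sym; // already interned: no re-scan
        }
        let sym = Sym(self.strings.len());
        // The only full scan of the string happens here.
        self.latin1_flags
            .push(s.chars().all(|c| u32::from(c) <= 0xFF));
        self.strings.push(s.to_owned());
        self.lookup.insert(s.to_owned(), sym);
        sym
    }

    /// Returns the text behind a symbol.
    pub fn resolve(&self, sym: Sym) -> &str {
        &self.strings[sym.0]
    }

    /// Constant-time lookup of the flag computed at intern time.
    pub fn is_latin1(&self, sym: Sym) -> bool {
        self.latin1_flags[sym.0]
    }
}
```

In this sketch a caller converting a symbol to a string would branch on `is_latin1(sym)` and pick the 1-byte or 2-byte representation without iterating over the characters again.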
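
The U+0080–U+00FF point in the description is easy to miss, because an ASCII-only check rejects characters such as `é` that still fit in a single Latin1 byte. The following sketch shows the distinction with a hypothetical helper, `encode_latin1`; it is an illustration under that assumption and does not reproduce the `JsString` conversion code.

```rust
/// Hypothetical helper: returns the Latin1 bytes of `s` if every character
/// has a code point <= U+00FF, or `None` if UTF-16 storage would be needed.
pub fn encode_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| {
            if u32::from(c) <= 0xFF {
                // Latin1 maps code points 0x00..=0xFF directly to byte values.
                Some(c as u8)
            } else {
                None
            }
        })
        .collect()
}

/// Fallback representation: UTF-16 code units.
pub fn encode_utf16(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}
```

Note that `"é".is_ascii()` is `false` even though `encode_latin1("é")` succeeds, which is why a check based only on ASCII would fall back to UTF-16 for strings in this range.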