In the context of the WebAssembly specification/CG, a discussion is ongoing about whether the envisioned UTF-8 `string` type in the Interface Types proposal (basically interop between WebAssembly modules and/or JavaScript) would be minimum viable or not.

Since C# represents and exposes strings as potentially ill-formed UTF-16 (sometimes simply called WTF-16), an Interface Types `string` type that enforces well-formedness would make it impossible to roundtrip a string containing an isolated surrogate (e.g. half of a musical symbol or emoji) through it, and the loss would be unrecoverable.

The current direction of travel in the Wasm spec is to not account for this case and to consider isolated surrogates a "bug" that must be "fixed". I have grown very worried about potential detrimental effects on WTF-16 languages that have deliberately chosen to allow isolated surrogates for backwards-compatibility reasons of their string APIs, say when a requirement is to [...] (`substring(0, 1)`), which currently magically works but, when using the Interface Types `string` type, could introduce anything from annoyances to hazards.

Once the decision is made, and UTF-8/USVs (Unicode scalar values) are chosen, WTF-16 languages will have to live with it and document that potential data corruption can occur when using the Interface Types `string` type. Potential alternatives to avoid this outcome are to specify a suitable escape hatch or to "relax" the `string` type to match WTF-8 semantics (basically, no longer disallowing the encoding of isolated surrogates), which unlike UTF-8 can roundtrip any WTF-16 string. However, it currently appears that these alternatives will not find consensus when it comes to a vote.

There are a few more unpleasant side-effects, like having to unnecessarily re-encode twice, from WTF-16 to UTF-8 (lossy) and back to UTF-16, for each `string` parameter/return in an Interface Types call. This could put affected languages at a performance and/or code-size disadvantage, but I am not sure how important these are in comparison.

If you have an opinion on the matter, I would be happy if you could share it within the respective WebAssembly CG discussion thread before it is too late. The thread over there also has a presentation video for those who are interested in more detailed background on the concepts involved, and there will be a discussion slot on June 22nd in the WebAssembly CG video meeting.
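
To make the roundtrip concern a bit more concrete, here is a minimal sketch in TypeScript (JavaScript strings are WTF-16 as well, so the behavior mirrors what a C# string would see). All helper names (`sanitize`, `encodeWtf8`, `decodeWtf8`) are hypothetical and not taken from the proposal. `sanitize` assumes that a well-formedness-enforcing boundary replaces each isolated surrogate with U+FFFD; an actual implementation might trap instead. The WTF-8 functions are simplified and do not validate their input.

```typescript
const isHigh = (c: number): boolean => c >= 0xd800 && c <= 0xdbff;
const isLow = (c: number): boolean => c >= 0xdc00 && c <= 0xdfff;

// Assumed model of a boundary that enforces well-formedness: every isolated
// surrogate becomes U+FFFD, well-formed pairs pass through unchanged.
function sanitize(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    if (isHigh(c) && i + 1 < s.length && isLow(s.charCodeAt(i + 1))) {
      out += s.substring(i, i + 2); // well-formed pair, keep both code units
      i++;
    } else if (isHigh(c) || isLow(c)) {
      out += "\uFFFD"; // isolated surrogate, information is lost here
    } else {
      out += s.charAt(i);
    }
  }
  return out;
}

// Simplified WTF-8 style encoder: like UTF-8, except that an isolated
// surrogate is encoded as a 3-byte sequence instead of being rejected.
function encodeWtf8(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    if (isHigh(cp) && i + 1 < s.length && isLow(s.charCodeAt(i + 1))) {
      // Combine a well-formed pair into one supplementary code point.
      cp = 0x10000 + ((cp - 0xd800) << 10) + (s.charCodeAt(i + 1) - 0xdc00);
      i++;
    }
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f),
               0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    }
  }
  return out;
}

// Matching decoder; assumes the bytes were produced by encodeWtf8 (no validation).
function decodeWtf8(bytes: number[]): string {
  let s = "";
  let i = 0;
  while (i < bytes.length) {
    const b = bytes[i];
    let cp: number;
    if (b < 0x80) {
      cp = b; i += 1;
    } else if (b < 0xe0) {
      cp = ((b & 0x1f) << 6) | (bytes[i + 1] & 0x3f); i += 2;
    } else if (b < 0xf0) {
      cp = ((b & 0x0f) << 12) | ((bytes[i + 1] & 0x3f) << 6) | (bytes[i + 2] & 0x3f); i += 3;
    } else {
      cp = ((b & 0x07) << 18) | ((bytes[i + 1] & 0x3f) << 12) |
           ((bytes[i + 2] & 0x3f) << 6) | (bytes[i + 3] & 0x3f); i += 4;
    }
    if (cp >= 0x10000) {
      cp -= 0x10000; // split back into a surrogate pair
      s += String.fromCharCode(0xd800 + (cp >> 10), 0xdc00 + (cp & 0x3ff));
    } else {
      s += String.fromCharCode(cp);
    }
  }
  return s;
}

const clef = "\uD834\uDD1E";       // U+1D11E MUSICAL SYMBOL G CLEF as a surrogate pair
const head = clef.substring(0, 1); // isolated high surrogate
const tail = clef.substring(1);    // isolated low surrogate

const direct = head + tail;
const viaUsv = sanitize(head) + sanitize(tail);
const viaWtf8 = decodeWtf8(encodeWtf8(head)) + decodeWtf8(encodeWtf8(tail));

console.log(direct === clef);  // true: works within the language today
console.log(viaUsv === clef);  // false: both halves became U+FFFD
console.log(viaWtf8 === clef); // true: the relaxed encoding roundtrips
```

Splitting and rejoining works as long as both halves stay inside the WTF-16 language. Under the assumed replacement behavior, each half is irreversibly turned into U+FFFD once it crosses the boundary on its own, while the WTF-8 style encoding preserves it.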
Please share this discussion with those you think should be aware. Happy coding! :)