Support UTF-16 as an additional encoding #136
Description
From WebAssembly/design#1419 (comment):
@lukewagner: I think it would make sense to talk about supporting UTF-16 as an additional encoding in the canonical ABI of string. But that's a whole separate topic with a few options, so I don't want to mix that up with the abstract string semantics which need to be understood first.
I am very interested in making this happen, as it would be a considerable improvement for languages that use a 16-bit Unicode representation. What I can currently imagine is either separate instructions, an immediate (though that may as well be separate instructions, I guess), or a parameter. For example:
list.lift_utf8 [...]
list.lift_utf16 [...]
list.is_utf8 [...]
list.is_utf16 [...]
list.lower_utf8 [...]
list.lower_utf16 [...]
Is that what you had in mind? If not, I am of course very interested in the other options :)
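To make the separate-instructions option a bit more concrete, here is a rough Python sketch of what such a family could do at the memory level. Everything in it is an assumption for illustration: the function names only mirror the instruction names above, the `(memory, ptr, length)` signatures are hypothetical and are not what the elided `[...]` stands for, and little-endian UTF-16 with a length counted in 16-bit code units is just one possible choice.

```python
# Hypothetical sketch only: names and signatures are illustrative assumptions,
# not part of any specification.

def lift_utf8(memory: bytearray, ptr: int, byte_len: int) -> str:
    """Read byte_len bytes at ptr and decode them as UTF-8."""
    return bytes(memory[ptr:ptr + byte_len]).decode("utf-8")

def lift_utf16(memory: bytearray, ptr: int, code_units: int) -> str:
    """Read code_units 16-bit units at ptr and decode them as UTF-16LE.

    Assumes the length is given in code units, so the byte length is doubled.
    """
    return bytes(memory[ptr:ptr + 2 * code_units]).decode("utf-16-le")

def is_utf8(memory: bytearray, ptr: int, byte_len: int) -> bool:
    """Return True if the bytes are well-formed UTF-8."""
    try:
        lift_utf8(memory, ptr, byte_len)
        return True
    except UnicodeDecodeError:
        return False

def is_utf16(memory: bytearray, ptr: int, code_units: int) -> bool:
    """Return True if the code units are well-formed UTF-16 (no lone surrogates)."""
    try:
        lift_utf16(memory, ptr, code_units)
        return True
    except UnicodeDecodeError:
        return False

def lower_utf8(s: str) -> bytes:
    """Encode a string as UTF-8 bytes, ready to be copied into memory."""
    return s.encode("utf-8")

def lower_utf16(s: str) -> bytes:
    """Encode a string as UTF-16LE bytes (no byte order mark)."""
    return s.encode("utf-16-le")
```

The point of the sketch is that a UTF-16 language could hand its string buffer to `lift_utf16` as-is, instead of first re-encoding it to UTF-8 on its own side of the boundary.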
It may also be worthwhile to consider list.lift_latin1, which corresponds to narrow UTF-16 (every code unit fits in one byte, so the high zero bytes are left out), as it is a common optimization strategy in UTF-16 languages (to save memory and make better use of the CPU cache when possible). I do not feel strongly about whether the latter is needed in an MVP, though.
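The relationship between Latin-1 and "narrow" UTF-16 can be illustrated with a small Python sketch. The helper names are hypothetical and little-endian UTF-16 is assumed; the sketch only shows that narrowing drops the zero high byte of each code unit and widening puts it back.

```python
# Hypothetical helpers illustrating Latin-1 as "narrow" UTF-16 (UTF-16LE assumed).

def can_narrow(s: str) -> bool:
    """A string can be stored narrow if every code point fits in one byte."""
    return all(ord(c) <= 0xFF for c in s)

def narrow_utf16le(data: bytes) -> bytes:
    """Drop the high byte of each UTF-16LE code unit; all high bytes must be zero."""
    if len(data) % 2 != 0:
        raise ValueError("UTF-16 data must have an even number of bytes")
    if any(data[1::2]):
        raise ValueError("a code unit does not fit in Latin-1")
    return data[0::2]

def widen_latin1(data: bytes) -> bytes:
    """Re-insert a zero high byte after every Latin-1 byte to get UTF-16LE."""
    out = bytearray(2 * len(data))  # zero-filled, so high bytes are already 0
    out[0::2] = data                # low bytes go to the even positions
    return bytes(out)
```

For a string such as "café", the narrowed UTF-16LE bytes are identical to its Latin-1 encoding, which is why a UTF-16 language that stores such strings narrow could pass them through a Latin-1 variant without widening them first.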