Description
The UTF-8 validation work (#69, #76) added a MessagePackValidateUTF8 primitive that walks a string using String.utf32 to detect invalid byte sequences. This works, but it is tangential to a MessagePack library: UTF-8 validation is a general-purpose string operation that belongs in the Pony standard library's String type.
Currently, Pony's String makes no guarantees about UTF-8 validity. There's no String.is_valid_utf8() or equivalent. Our validator depends on String.utf32's error-reporting convention ((0xFFFD, 1) for invalid sequences), which is an indirect and fragile way to check validity.
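For readers who have not looked at the validator, the sentinel-based walk described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the primitive name `SketchValidateUTF8` is hypothetical, and the sketch assumes `String.utf32` keeps returning `(0xFFFD, 1)` for an invalid sequence, which is exactly the convention this issue calls fragile.

```pony
// Hypothetical sketch; MessagePackValidateUTF8 itself may differ in detail.
primitive SketchValidateUTF8
  fun apply(s: String box): Bool =>
    var i: USize = 0
    while i < s.size() do
      try
        // utf32 returns (code point, number of bytes consumed).
        (let cp, let len) = s.utf32(i.isize())?
        // Assumed convention: an invalid sequence yields (0xFFFD, 1).
        // A genuine U+FFFD in the input is encoded in three bytes,
        // so it comes back with a length of 3 and is not rejected here.
        if (cp == 0xFFFD) and (len == 1) then
          return false
        end
        i = i + len.usize()
      else
        // utf32 is partial; an out-of-bounds offset is treated as invalid.
        return false
      end
    end
    true

actor Main
  new create(env: Env) =>
    // A well-formed string.
    env.out.print(SketchValidateUTF8("héllo").string())
    // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
    let bad = String.from_array(recover val [as U8: 0xC3; 0x28] end)
    env.out.print(SketchValidateUTF8(bad).string())
```

The whole check hinges on telling the sentinel `(0xFFFD, 1)` apart from a real replacement character by its byte length, which is why a dedicated stdlib method would be a sturdier foundation.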
Once this library stabilizes its validation approach, an RFC should be proposed to add native UTF-8 validation to String in the stdlib. This would:
- Give all Pony libraries a standard, efficient way to validate UTF-8
- Remove the dependency on String.utf32's undocumented error-reporting convention
- Allow this library to replace MessagePackValidateUTF8 with a stdlib call
Relevant code: MessagePackValidateUTF8
Design discussion: #69