Design RFC for stdlib String UTF-8 validation #77

@SeanTAllen

The UTF-8 validation work (#69, #76) added a MessagePackValidateUTF8 primitive that walks a string using String.utf32 to detect invalid byte sequences. This works, but it's tangential to a MessagePack library: UTF-8 validation is a general-purpose string operation that belongs in the Pony standard library's String type.

Currently, Pony's String makes no guarantees about UTF-8 validity. There's no String.is_valid_utf8() or equivalent. Our validator depends on String.utf32's error-reporting convention (it returns (0xFFFD, 1) for an invalid sequence), which is an indirect and fragile way to check validity.
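
For readers who haven't looked at the validator, here is a minimal sketch of the technique described above. It is not the actual MessagePackValidateUTF8 source: the primitive name ValidateUTF8Sketch and the Main actor are made up for illustration, and it assumes String.utf32 behaves as described in this issue, returning (0xFFFD, 1) for an invalid sequence. A genuine U+FFFD in the input is encoded as three bytes, so its reported length is 3 and it is not mistaken for an error.

```pony
// Hypothetical sketch, not the library's real code.
primitive ValidateUTF8Sketch
  fun apply(s: String box): Bool =>
    var i: USize = 0
    try
      while i < s.size() do
        // utf32 returns the decoded code point and the bytes consumed.
        (let cp, let len) = s.utf32(i.isize())?
        // Assumed convention: (0xFFFD, 1) marks an invalid sequence.
        if (cp == 0xFFFD) and (len == 1) then
          return false
        end
        i = i + len.usize()
      end
      true
    else
      // utf32 raised an error (offset out of bounds); treat as invalid.
      false
    end

actor Main
  new create(env: Env) =>
    // A well-formed string.
    env.out.print(ValidateUTF8Sketch("héllo").string())
    // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
    let bad = String.from_array(recover val [as U8: 0xC3; 0x28] end)
    env.out.print(ValidateUTF8Sketch(bad).string())
```

The whole check rests on the (0xFFFD, 1) return value staying stable, which is the fragility a native stdlib method would remove.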

Once this library stabilizes its validation approach, an RFC should be proposed to add native UTF-8 validation to String in the stdlib. This would:

  • Give all Pony libraries a standard, efficient way to validate UTF-8
  • Remove the dependency on String.utf32's undocumented error-reporting convention
  • Allow this library to replace MessagePackValidateUTF8 with a stdlib call

Relevant code: MessagePackValidateUTF8
Design discussion: #69
