(I have the impression this should be an FAQ, but I could not find any discussion of it.)
Atdgen maps ATD “strings” to JSON strings which are supposed to be valid Unicode (UTF-8 in practice), and also directly to OCaml string values which can be arbitrary byte-arrays.
- This makes it very easy to generate invalid JSON, which then fails with other parsers. E.g., this Gist shows Jsonm failing with `"illegal bytes in character stream"` while `J.string_of_t0 |> J.t0_of_string` succeeds.
- The “data-encoding” world often uses this as its default solution for byte-arrays: https://gitlab.com/nomadic-labs/data-encoding/-/blob/master/src/json.ml#L125-L145 → if a string is not valid UTF-8 it becomes an array of ints.
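
To make the second point concrete, here is a minimal sketch of that fallback scheme. It is not the data-encoding code itself: the `json` type and the names `json_of_bytes` and `bytes_of_json` are hypothetical, and it assumes OCaml 4.14 or later for `String.is_valid_utf_8`.

```ocaml
(* Hypothetical minimal JSON value type, just enough for this sketch. *)
type json = [ `String of string | `List of json list | `Int of int ]

(* Encode an OCaml string: valid UTF-8 stays a JSON string, anything
   else becomes an array of byte values in 0..255. *)
let json_of_bytes (s : string) : json =
  if String.is_valid_utf_8 s then `String s
  else
    `List
      (List.map (fun c -> `Int (Char.code c)) (List.of_seq (String.to_seq s)))

(* Decode either representation back to an OCaml string.
   Returns None for anything that is neither a string nor a byte array. *)
let bytes_of_json : json -> string option = function
  | `String s -> Some s
  | `List l ->
    let buf = Buffer.create (List.length l) in
    (try
       List.iter
         (function
           | `Int i when i >= 0 && i <= 255 -> Buffer.add_char buf (Char.chr i)
           | _ -> raise Exit)
         l;
       Some (Buffer.contents buf)
     with Exit -> None)
  | `Int _ -> None
```

The output is always valid JSON, at the cost of readers having to accept two shapes for the same field.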
Should `Mod_j` functions have the option of failing earlier if an input string is not valid UTF-8? (I guess that would mean having default or first-class-citizen validator entries? `-j-pp` seems to only work in one direction.)
Does it make sense to add a byte-array core type to ATD?
Many tools already just don't care; should this just be documented properly somewhere?
Right now the ATD definition doc just says “Sequence of bytes or characters” …
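
For the "fail earlier" question, a rough sketch of what I have in mind is below. It is a hand-written wrapper, not something atdgen generates: `Invalid_utf8`, `check_utf8` and `checked_serialize` are hypothetical names, and it again assumes OCaml 4.14 or later for `String.is_valid_utf_8`.

```ocaml
(* Raised when a string that is about to be serialized is not valid UTF-8. *)
exception Invalid_utf8 of string

(* Return the string unchanged if it is valid UTF-8, raise otherwise. *)
let check_utf8 (s : string) : string =
  if String.is_valid_utf_8 s then s else raise (Invalid_utf8 s)

(* Hypothetical wrapper: validate first, then hand the value to any
   string serializer (a generated string_of_* function, for example). *)
let checked_serialize (serialize : string -> string) (s : string) : string =
  serialize (check_utf8 s)
```

With something like this, the invalid value is rejected on the OCaml side instead of surfacing later as a parse error in another JSON library.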