Clarify intended semantics of isize, usize, and regex types

Other than the `isize`, `usize`, and `regex` types, all of the reserved types in the spec have a simple consumer-independent procedure for checking whether a node is of that type: checking whether it falls in a certain fixed range for integer types, or whether it matches some regex for string types.

Because of this, it seems very sensible for a parser to reject a node like `(u8)300` or `(ipv4)"example.com"`, and for editor tooling to warn people writing KDL documents that such nodes are malformed.
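As a rough illustration of what "consumer-independent" means here, a checker for these types needs nothing but a fixed table. This is only a sketch: `check_reserved_type` is a hypothetical helper, the table covers a small subset of the reserved types, and the `ipv4` pattern is a simplified dotted-quad regex that I am assuming for the example, not something taken from the spec.

```python
import re

# Fixed ranges for a few integer annotations (illustrative subset only).
INT_RANGES = {
    "u8": (0, 2**8 - 1),
    "i8": (-(2**7), 2**7 - 1),
    "u32": (0, 2**32 - 1),
    "u64": (0, 2**64 - 1),
}

# Simplified dotted-quad pattern; an assumption made for this example.
_OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
STRING_PATTERNS = {
    "ipv4": re.compile(rf"{_OCTET}(\.{_OCTET}){{3}}"),
}


def check_reserved_type(annotation, value):
    """Return True or False when the annotation can be checked from the
    table alone, and None when it cannot (hypothetical helper)."""
    if annotation in INT_RANGES:
        lo, hi = INT_RANGES[annotation]
        is_int = isinstance(value, int) and not isinstance(value, bool)
        return is_int and lo <= value <= hi
    if annotation in STRING_PATTERNS:
        if not isinstance(value, str):
            return False
        return STRING_PATTERNS[annotation].fullmatch(value) is not None
    # isize, usize, and regex end up here: no fixed table entry decides them.
    return None
```

With this table, `check_reserved_type("u8", 300)` and `check_reserved_type("ipv4", "example.com")` are both `False` regardless of who runs the check, while `check_reserved_type("usize", 100)` has no table-only answer.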

These three reserved types lack that property:

- The range for `isize` and `usize` depends on whether the producer's or the consumer's platform was intended, and on the details of that platform.
- Whether a string is a valid regex or not depends on the actual regex syntax being used.
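Both points can be seen in a few lines. The sketch below uses Python's `re` module as a stand-in for "one particular flavor" and takes the pointer width as an explicit parameter; `is_valid_python_regex` and `usize_max` are hypothetical names introduced only for this example.

```python
import re


def is_valid_python_regex(pattern):
    """True if the pattern compiles under Python's `re` flavor specifically."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def usize_max(pointer_bits):
    """Largest value of a pointer-sized unsigned integer for a given width."""
    return (1 << pointer_bits) - 1


# Python spells named groups (?P<name>...); ECMAScript spells them (?<name>...).
PYTHON_STYLE = r"(?P<word>a+)"
ECMA_STYLE = r"(?<word>a+)"

# The same value is in range or out of range depending on the assumed platform.
BIG = 1_000_000_000_000
FITS_ON_64_BIT = BIG <= usize_max(64)
FITS_ON_32_BIT = BIG <= usize_max(32)
```

Here `is_valid_python_regex(PYTHON_STYLE)` succeeds and `is_valid_python_regex(ECMA_STYLE)` fails, even though the second string is a perfectly good regex in another flavor. Likewise `FITS_ON_64_BIT` is true and `FITS_ON_32_BIT` is false for the same number.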

---

It feels like there should be multiple regex types for the different regex standards; e.g. `ere-regex` for POSIX extended regular expressions or `pcre-regex` for the PCRE library's flavor.

Unfortunately, there are so many flavors that most probably shouldn't get reserved names. Still, giving distinct names to the ones that _do_ get reserved makes the intended semantics clearer, and makes it less likely that a consumer of a KDL document accidentally interprets a regex as being of the wrong flavor.

(If I had to pick and choose, I'd reserve/standardize `ecma-regex` and `ere-regex` today and leave the others, but this is a weakly held opinion.)

---

For `isize` and `usize`, I think the question comes down to the semantics of type annotations in general; I don't think I understand how they are supposed to interact with the intended data model.

In particular, are the following pairs of values treated as equivalent in the intended data model, and which of them, if any, should (in the SHOULD sense) be errors?

- `(non-reserved-type)#true` and `#true`
- `(u8)300` and `300`
- `(u8)300` and `(u8)44`
- `(usize)100` and `(u32)100`
- `(usize)100` and `(u64)100`
- `(usize)1000000000000` and `(usize)3567587328`
- `(usize)1000000000000` and `(u32)1000000000000`
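To spell out why some of these pairs are interesting: under a "wrap on overflow" reading, the out-of-range values reduce to the in-range ones they are paired with, whereas under a "range check" reading they are simply errors. The sketch below shows both readings; `wrap_unsigned` and `fits_unsigned` are hypothetical helpers, and neither reading is claimed to be what the spec intends.

```python
def wrap_unsigned(value, bits):
    """Reduce value modulo 2**bits, as a fixed-width unsigned cast would."""
    return value % (1 << bits)


def fits_unsigned(value, bits):
    """True if value is representable as an unsigned integer of that width."""
    return 0 <= value < (1 << bits)


# "Wrap" reading: (u8)300 collapses to 44, and 10**12 collapses to
# 3567587328 if usize is assumed to be 32 bits wide.
U8_WRAPPED = wrap_unsigned(300, 8)
USIZE32_WRAPPED = wrap_unsigned(1_000_000_000_000, 32)

# "Range check" reading: the same values are rejected outright, unless
# usize is assumed to be 64 bits wide.
U8_OK = fits_unsigned(300, 8)
USIZE32_OK = fits_unsigned(1_000_000_000_000, 32)
USIZE64_OK = fits_unsigned(1_000_000_000_000, 64)
```

So whether `(usize)1000000000000` is an error, equal to `(usize)3567587328`, or just an ordinary in-range value depends on both the chosen reading and the assumed platform width.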

Personally, I think standardizing the `isize` and `usize` types would be a mistake. Depending on the specifics of either the producer's or the consumer's platform doesn't seem like a good choice, and on many platforms (for example Common Lisp, JavaScript, OCaml, or Python, all for different reasons) the nearest "natural" semantics don't necessarily correspond to what one would expect from the Rust `isize` and `usize` types, or from the C `size_t`.
