Character unescaping improvements

Some issues with the current code in `Cesium.CodeGen.Ir.Expressions.Constants.CharConstant.UnescapeCharacter` and `Cesium.Parser.TokenExtensions.UnwrapStringLiteral`:
- [ ] There are two of them, with different implementations. There should be only one.
- [ ] `UnescapeCharacter` doesn't support `\u` and `\U` aka `universal-character-name` from the standard.
- [ ] `UnescapeCharacter` also has a bug in handling octal and hex sequences: both are assumed to have exactly two digits, with special treatment of `\0`. The standard defines octal sequences as one, two, or three digits long, while hex escapes are of arbitrary length.
- [ ] `\0` should not be a special case in either of the methods; it is just an octal number.
- [ ] `UnwrapStringLiteral` also seems to handle octal sequences incorrectly: I only see support for octal numbers starting with `0`, but any octal digit may start the sequence (`UnescapeCharacter` handles these better).
- [ ] Normal compiler behavior is to report a warning on an invalid sequence (e.g. `\m`) and treat it as the character itself. We don't do this: we either silently accept or break on such sequences.
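
To make the target behavior concrete, here is a minimal sketch of a single unescaping routine that covers the points above. It is not Cesium's code: `EscapeSketch`, `Unescape`, and the `warnings` list are hypothetical names, and the sketch assumes the input is the literal body without its surrounding quotes and that the result is a plain .NET string. How out-of-range values should map to narrow or wide `char` is left open.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of Cesium's current API.
public static class EscapeSketch
{
    // Unescapes the body of a C character or string literal (quotes already removed).
    // Problems are reported through `warnings` instead of throwing.
    public static string Unescape(string body, List<string> warnings)
    {
        var sb = new StringBuilder();
        var i = 0;
        while (i < body.Length)
        {
            var c = body[i++];
            // Ordinary character, or a lone trailing backslash: copy as is.
            if (c != '\\' || i == body.Length) { sb.Append(c); continue; }

            var e = body[i++];
            if (e >= '0' && e <= '7')
            {
                // Octal: one to three digits; `\0` needs no special case.
                i--;
                AppendValue(sb, ReadDigits(body, ref i, 8, 3, out _), warnings);
            }
            else if (e == 'x')
            {
                // Hex: as many digits as follow.
                var value = ReadDigits(body, ref i, 16, int.MaxValue, out var count);
                if (count == 0) { warnings.Add("\\x without hex digits"); sb.Append('x'); }
                else AppendValue(sb, value, warnings);
            }
            else if (e == 'u' || e == 'U')
            {
                // universal-character-name: exactly 4 (\u) or 8 (\U) hex digits.
                var expected = e == 'u' ? 4 : 8;
                var start = i;
                var value = ReadDigits(body, ref i, 16, expected, out var count);
                if (count == expected) AppendValue(sb, value, warnings);
                else { i = start; warnings.Add($"incomplete \\{e} sequence"); sb.Append(e); }
            }
            else if (Simple(e) is char simple) sb.Append(simple);
            else
            {
                // Invalid sequence such as `\m`: warn and keep the character itself.
                warnings.Add($"unknown escape sequence \\{e}");
                sb.Append(e);
            }
        }
        return sb.ToString();
    }

    // Reads up to maxDigits digits of the given radix, saturating at int.MaxValue.
    private static int ReadDigits(string s, ref int i, int radix, int maxDigits, out int count)
    {
        long value = 0;
        count = 0;
        while (i < s.Length && count < maxDigits)
        {
            var d = DigitValue(s[i]);
            if (d < 0 || d >= radix) break;
            value = Math.Min(value * radix + d, int.MaxValue);
            i++;
            count++;
        }
        return (int)value;
    }

    private static int DigitValue(char c) =>
        c >= '0' && c <= '9' ? c - '0'
        : c >= 'a' && c <= 'f' ? c - 'a' + 10
        : c >= 'A' && c <= 'F' ? c - 'A' + 10
        : -1;

    private static void AppendValue(StringBuilder sb, int value, List<string> warnings)
    {
        if (value <= char.MaxValue) sb.Append((char)value);
        else if (value <= 0x10FFFF) sb.Append(char.ConvertFromUtf32(value));
        else warnings.Add("escape value out of range");
    }

    private static char? Simple(char e) => e switch
    {
        'a' => '\a', 'b' => '\b', 'f' => '\f', 'n' => '\n', 'r' => '\r',
        't' => '\t', 'v' => '\v', '\'' => '\'', '"' => '"', '?' => '?', '\\' => '\\',
        _ => (char?)null
    };
}
```

With this sketch, the C literal body `\101\x41\u0041\m\0` would unescape to `AAAm` followed by a NUL character, with a single warning for `\m`. Both the character-constant and string-literal paths could call one such routine, which would address the first item in the list.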

See also #295.

Character unescaping improvements #699
