Commit fb4fe38
feat: implement mapping language lexer with full token set (#43)
- Implement tokenizer for the morph mapping DSL with complete token coverage:
  - 18 keywords (rename, select, drop, set, default, cast, as, where, sort,
    each, when, not, and, or, flatten, nest, asc, desc)
  - 13 operators (-> = == != > >= < <= + - * / %)
  - 8 delimiters ({ } ( ) [ ] , .)
  - String literals with escape sequences (\" \\ \n \t \r \uXXXX)
    and direct UTF-8 support
  - Number literals: integers, floats, scientific notation, negative numbers
    with context-aware unary minus vs subtraction operator disambiguation
  - Boolean (true/false) and null literals
  - Identifiers for function names and field references
  - Newlines as statement separators (collapsed)
  - Line comments (# ...)
  - Span tracking (line:column) for error reporting
- Add 100+ tests covering: every keyword, operator, and delimiter individually,
  string escapes (quote, backslash, newline, tab, CR, unicode BMP, UTF-8),
  number formats (int, float, negative, scientific, large), paths (.a.b.c,
  .[0], .[*], .["key"]), comments, whitespace handling, newline collapsing,
  full statements (rename, set, select, cast, sort, each blocks, function
  calls), subtraction vs negative number disambiguation, error cases
  (unterminated strings, invalid characters, invalid escapes, invalid numbers),
  span correctness, and edge cases
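
The commit message does not say what language the lexer is written in or how it is structured, so the following is only an illustrative Python sketch of three behaviours listed above: unary minus vs subtraction disambiguation, newline collapsing, and line:column span tracking. The names `Token`, `tokenize`, `scan`, and `ends_operand` are hypothetical and not taken from the commit. The rule "a minus is subtraction only after a number, identifier, `)` or `]`" is an assumed heuristic. String literals and escape handling are left out to keep the sketch short.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str    # "KEYWORD", "IDENT", "NUMBER", "OP", "DELIM" or "NEWLINE"
    text: str
    line: int    # 1-based line of the first character
    column: int  # 1-based column of the first character


KEYWORDS = {
    "rename", "select", "drop", "set", "default", "cast", "as", "where", "sort",
    "each", "when", "not", "and", "or", "flatten", "nest", "asc", "desc",
}
# Two-character operators come first so that "->" wins over "-".
OPERATORS = ("->", "==", "!=", ">=", "<=", "=", ">", "<", "+", "-", "*", "/", "%")
DELIMITERS = "{}()[],."
NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def ends_operand(prev):
    """Assumed rule: '-' is subtraction only if the previous token ends a value."""
    if prev is None:
        return False
    return prev.kind in ("NUMBER", "IDENT") or prev.text in (")", "]")


def scan(src, i, prev):
    """Return (kind, text) for the token starting at index i."""
    ch = src[i]
    if "0" <= ch <= "9" or (ch == "-" and not ends_operand(prev)):
        digits_at = i + 1 if ch == "-" else i
        m = NUMBER.match(src, digits_at)
        if m:
            return "NUMBER", src[i:m.end()]
        # A '-' with no digits after it falls through to the operator table.
    m = IDENT.match(src, i)
    if m:
        word = m.group()
        return ("KEYWORD" if word in KEYWORDS else "IDENT"), word
    for op in OPERATORS:
        if src.startswith(op, i):
            return "OP", op
    if ch in DELIMITERS:
        return "DELIM", ch
    raise ValueError(f"invalid character {ch!r}")


def tokenize(src):
    tokens = []
    i, line, col = 0, 1, 1
    while i < len(src):
        ch = src[i]
        prev = tokens[-1] if tokens else None
        if ch == "\n":
            # Collapse runs of newlines (and leading ones) into at most one token.
            if prev is not None and prev.kind != "NEWLINE":
                tokens.append(Token("NEWLINE", "\n", line, col))
            i, line, col = i + 1, line + 1, 1
            continue
        if ch in " \t\r":
            i, col = i + 1, col + 1
            continue
        if ch == "#":
            # Line comment: skip up to, but not including, the newline.
            end = src.find("\n", i)
            end = len(src) if end == -1 else end
            i, col = end, col + (end - i)
            continue
        kind, text = scan(src, i, prev)
        tokens.append(Token(kind, text, line, col))
        i, col = i + len(text), col + len(text)
    return tokens
```

Under these assumptions, `tokenize("a - 1")` produces an identifier, a `-` operator, and the number `1`, while `tokenize("a = -1")` produces an identifier, an `=` operator, and the single number token `-1`. Treating keywords as a separate kind matters here: after `where`, a leading minus is read as part of the number rather than as subtraction.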
Fixes #151

Parent: 10b944d
1 file changed, 1658 insertions(+), 1 deletion(-)