Add initial grammar documentation for the parser

mkpro118 · mkpro118 · commit 0c76adbb1a95 · 2025-09-08T01:05:42.000-07:00
Adds the initial documentation for the Prisma schema grammar (v1).
This serves as a reference for the parser's implementation and
provides a formal definition of the language being parsed.
diff --git a/src/core/parser/grammar/mod.rs b/src/core/parser/grammar/mod.rs
@@ -0,0 +1,8 @@
+// This module exists primarily to increase the visibility of the
+// grammar and its documentation,
+// and to provide a place for grammar-related utilities if needed.
+
+// The v1.md file contains the formal EBNF grammar specification
+// and is included in the crate for documentation purposes.
+
+#![doc = include_str!("./v1.md")]
diff --git a/src/core/parser/grammar/v1.md b/src/core/parser/grammar/v1.md
@@ -0,0 +1,239 @@
+# Prisma Schema Grammar v1
+
+This document defines the formal grammar for Prisma schema files, conforming to the parser implementation plan. The grammar is expressed in EBNF notation and serves as the authoritative specification for the parser implementation.
+
+## Grammar Version
+
+**Version:** `grammar_v1`\
+**Target Parser:** Prisma Types Generator Parser\
+**Compatibility:** This grammar defines the AST structure and parsing rules for the first version of the parser.
+
+## Notation
+
+- `*` - zero or more repetitions
+- `+` - one or more repetitions  
+- `?` - optional (zero or one)
+- `|` - alternative
+- `()` - grouping
+- `'...'` - literal token text (for example `'true'`)
+- `/* ... */` - comments in grammar
+- Whitespace and comments are implicitly allowed between all tokens
+
+## Top-Level Grammar
+
+```ebnf
+schema        := item*
+
+item          := model_decl 
+               | enum_decl 
+               | datasource_decl 
+               | generator_decl 
+               | type_decl        /* experimental, gated */
+```
+
+## Declaration Grammar
+
+### Model Declaration
+
+```ebnf
+model_decl    := MODEL ident LEFT_BRACE model_member* RIGHT_BRACE
+
+model_member  := field_decl 
+               | block_attribute
+
+field_decl    := ident type_ref opt_marker? field_attribute*
+
+opt_marker    := OPTIONAL
+```
+
+### Enum Declaration
+
+```ebnf
+enum_decl     := ENUM ident LEFT_BRACE enum_member* RIGHT_BRACE
+
+enum_member   := enum_value 
+               | block_attribute
+
+enum_value    := ident field_attribute*
+```
+
+### Configuration Declarations
+
+```ebnf
+datasource_decl := DATASOURCE ident LEFT_BRACE assignment* RIGHT_BRACE
+
+generator_decl  := GENERATOR ident LEFT_BRACE assignment* RIGHT_BRACE
+
+assignment      := ident ASSIGN expr
+```
+
+### Experimental Declarations
+
+```ebnf
+/* Gated by ParserOptions.experimental_blocks containing "type" */
+type_decl       := TYPE ident ASSIGN type_ref
+```
+
+## Type System Grammar
+
+```ebnf
+type_ref      := base_type (LIST)*
+
+base_type     := scalar_type
+               | qualified_ident
+
+scalar_type   := STRING | INT | FLOAT | BOOLEAN | DATETIME | JSON | BYTES | DECIMAL
+
+qualified_ident := ident (DOT ident)*
+```
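As an illustration of the `type_ref` production, here is a minimal sketch of a hand-written parser for it. The `Tok` enum is a hypothetical, trimmed-down stand-in for the scanner's `TokenType`, and `TypeRef` is an assumed AST shape; neither is taken from the actual implementation.

```rust
/// Hypothetical subset of the scanner's `TokenType` (assumption, for illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    String,
    Int,
    Identifier(std::string::String),
    Dot,
    List,
    Optional,
}

/// Assumed AST shape for a parsed `type_ref`.
#[derive(Debug, PartialEq)]
struct TypeRef {
    /// Scalar keyword name, or the dotted path of a `qualified_ident`.
    base: Vec<std::string::String>,
    /// Number of trailing `LIST` markers.
    list_depth: usize,
}

/// type_ref := base_type (LIST)*
fn parse_type_ref(toks: &[Tok], pos: &mut usize) -> Result<TypeRef, std::string::String> {
    let base = match toks.get(*pos) {
        Some(Tok::String) => {
            *pos += 1;
            vec!["String".to_string()]
        }
        Some(Tok::Int) => {
            *pos += 1;
            vec!["Int".to_string()]
        }
        Some(Tok::Identifier(name)) => {
            // qualified_ident := ident (DOT ident)*
            let mut path = vec![name.clone()];
            *pos += 1;
            while toks.get(*pos) == Some(&Tok::Dot) {
                match toks.get(*pos + 1) {
                    Some(Tok::Identifier(next)) => {
                        path.push(next.clone());
                        *pos += 2;
                    }
                    _ => return Err("expected identifier after '.'".to_string()),
                }
            }
            path
        }
        other => return Err(format!("expected a type, found {:?}", other)),
    };
    // Count the trailing list markers; the optional marker is left for `field_decl`.
    let mut list_depth = 0;
    while toks.get(*pos) == Some(&Tok::List) {
        list_depth += 1;
        *pos += 1;
    }
    Ok(TypeRef { base, list_depth })
}
```

The function stops before any `OPTIONAL` token, since `opt_marker` belongs to `field_decl` rather than to `type_ref`.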
+
+## Attribute Grammar
+
+```ebnf
+field_attribute := AT qualified_ident arglist?
+
+block_attribute := DOUBLE_AT qualified_ident arglist?
+
+arglist         := LEFT_PAREN (arg (COMMA arg)* COMMA?)? RIGHT_PAREN
+
+arg             := expr                    /* positional argument */
+                 | ident COLON expr        /* named argument */
+```
+
+## Expression Grammar
+
+```ebnf
+expr          := literal
+               | identref
+               | func_call
+               | array
+               | object
+
+literal       := LITERAL                  /* string, int, float literals */
+               | boolean_lit
+               | null_lit
+
+boolean_lit   := 'true' | 'false'         /* scanned as LITERAL tokens */
+
+null_lit      := 'null'                   /* scanned as a LITERAL token */
+
+identref      := qualified_ident
+
+func_call     := qualified_ident LEFT_PAREN (expr (COMMA expr)* COMMA?)? RIGHT_PAREN
+
+array         := LEFT_BRACKET (expr (COMMA expr)* COMMA?)? RIGHT_BRACKET
+
+object        := LEFT_BRACE (object_entry (COMMA object_entry)* COMMA?)? RIGHT_BRACE
+
+object_entry  := (ident | string_literal) COLON expr
+
+string_literal := LITERAL                 /* string literal variant */
+```
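Because `identref` and `func_call` both begin with `qualified_ident`, a recursive-descent parser would typically parse the shared prefix once and then branch on whether `LEFT_PAREN` follows. The sketch below shows that approach for a subset of `expr` (literals, identifier references, and calls). `Tok` and `Expr` are hypothetical stand-ins, not the project's real types.

```rust
/// Hypothetical subset of the scanner's `TokenType` (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Identifier(String),
    Literal(String),
    Dot,
    Comma,
    LeftParen,
    RightParen,
}

/// Assumed AST shape; the real node types are not defined in this document.
#[derive(Debug, PartialEq)]
enum Expr {
    Literal(String),
    IdentRef(Vec<String>),
    FuncCall { name: Vec<String>, args: Vec<Expr> },
}

fn parse_expr(toks: &[Tok], pos: &mut usize) -> Result<Expr, String> {
    match toks.get(*pos) {
        Some(Tok::Literal(text)) => {
            *pos += 1;
            Ok(Expr::Literal(text.clone()))
        }
        Some(Tok::Identifier(_)) => {
            // Shared prefix of `identref` and `func_call`.
            let name = parse_qualified_ident(toks, pos)?;
            if toks.get(*pos) == Some(&Tok::LeftParen) {
                *pos += 1;
                let args = parse_call_args(toks, pos)?;
                Ok(Expr::FuncCall { name, args })
            } else {
                Ok(Expr::IdentRef(name))
            }
        }
        other => Err(format!("expected expression, found {:?}", other)),
    }
}

/// qualified_ident := ident (DOT ident)*
fn parse_qualified_ident(toks: &[Tok], pos: &mut usize) -> Result<Vec<String>, String> {
    let mut path = Vec::new();
    loop {
        match toks.get(*pos) {
            Some(Tok::Identifier(name)) => {
                path.push(name.clone());
                *pos += 1;
            }
            other => return Err(format!("expected identifier, found {:?}", other)),
        }
        if toks.get(*pos) == Some(&Tok::Dot) {
            *pos += 1;
        } else {
            return Ok(path);
        }
    }
}

/// (expr (COMMA expr)* COMMA?)? RIGHT_PAREN, with LEFT_PAREN already consumed.
fn parse_call_args(toks: &[Tok], pos: &mut usize) -> Result<Vec<Expr>, String> {
    let mut args = Vec::new();
    loop {
        if toks.get(*pos) == Some(&Tok::RightParen) {
            *pos += 1;
            return Ok(args);
        }
        args.push(parse_expr(toks, pos)?);
        match toks.get(*pos) {
            Some(Tok::Comma) => *pos += 1,
            Some(Tok::RightParen) => {}
            other => return Err(format!("expected ',' or ')', found {:?}", other)),
        }
    }
}
```

Arrays and objects are omitted for brevity; they would follow the same comma-separated pattern as `parse_call_args`.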
+
+## Lexical Elements
+
+```ebnf
+ident         := IDENTIFIER               /* TokenType::Identifier(String) */
+```
+
+## Complete Token Mapping
+
+This grammar accounts for every variant of the `TokenType` enum in `src/core/scanner/tokens.rs`:
+
+### Keywords -> `TokenType` mapping
+- `MODEL` -> `TokenType::Model`
+- `ENUM` -> `TokenType::Enum` 
+- `DATASOURCE` -> `TokenType::DataSource`
+- `GENERATOR` -> `TokenType::Generator`
+- `TYPE` -> `TokenType::Type`
+
+### Type Keywords -> `TokenType` mapping
+- `STRING` -> `TokenType::String`
+- `INT` -> `TokenType::Int`
+- `FLOAT` -> `TokenType::Float`
+- `BOOLEAN` -> `TokenType::Boolean`
+- `DATETIME` -> `TokenType::DateTime`
+- `JSON` -> `TokenType::Json`
+- `BYTES` -> `TokenType::Bytes`
+- `DECIMAL` -> `TokenType::Decimal`
+
+### Literals -> `TokenType` mapping
+- `LITERAL` -> `TokenType::Literal(String)` (covers string, int, float, boolean, null)
+- `IDENTIFIER` -> `TokenType::Identifier(String)`
+
+### Operators -> `TokenType` mapping
+- `ASSIGN` -> `TokenType::Assign` ('=')
+- `OPTIONAL` -> `TokenType::Optional` ('?')
+- `LIST` -> `TokenType::List` ('[]' - list-type marker)
+- `DOT` -> `TokenType::Dot` ('.')
+
+### Punctuation -> `TokenType` mapping
+- `LEFT_BRACE` -> `TokenType::LeftBrace` ('{')
+- `RIGHT_BRACE` -> `TokenType::RightBrace` ('}')
+- `LEFT_BRACKET` -> `TokenType::LeftBracket` ('[')
+- `RIGHT_BRACKET` -> `TokenType::RightBracket` (']')
+- `LEFT_PAREN` -> `TokenType::LeftParen` ('(')
+- `RIGHT_PAREN` -> `TokenType::RightParen` (')')
+- `COMMA` -> `TokenType::Comma` (',')
+- `COLON` -> `TokenType::Colon` (':')
+- `AT` -> `TokenType::At` ('@')
+- `DOUBLE_AT` -> `TokenType::DoubleAt` ('@@')
+
+### Comments -> `TokenType` mapping (handled by parser)
+- `TokenType::Comment(String)` - Regular comments, preserved for spans
+- `TokenType::DocComment(String)` - Documentation comments, attached to AST nodes
+
+### Special Tokens -> `TokenType` mapping
+- `TokenType::Unsupported(String)` - Handled as parse errors
+- `TokenType::EOF` - End of input marker
+
+## Grammar Properties
+
+### Determinism
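+- The grammar is designed to be parsed with at most two tokens of lookahead (LL(2)) once shared prefixes are left-factored
+- No left recursion or ambiguous productions
+- Alternatives are distinguished by their first token, with two exceptions: `arg` (an `ident` followed by `COLON` selects the named form) and `identref` / `func_call` (both begin with `qualified_ident`; a following `LEFT_PAREN` selects the call)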
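The `arg` production is one place where a single token of lookahead is not enough: an `IDENTIFIER` can start either a positional `expr` or a named argument. A minimal sketch of the two-token decision follows, using a hypothetical `Tok` subset and an assumed `ArgKind` enum. It covers only identifier and literal starts.

```rust
/// Hypothetical subset of the scanner's `TokenType` (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Identifier(String),
    Literal(String),
    Colon,
    Comma,
}

/// Assumed classification of the two `arg` alternatives.
#[derive(Debug, PartialEq)]
enum ArgKind {
    Positional,
    Named,
}

/// Decide which `arg` alternative applies by peeking at no more than two tokens.
fn classify_arg(toks: &[Tok], pos: usize) -> Option<ArgKind> {
    match (toks.get(pos), toks.get(pos + 1)) {
        // ident COLON expr
        (Some(Tok::Identifier(_)), Some(Tok::Colon)) => Some(ArgKind::Named),
        // Any other identifier or literal start is a positional expr.
        (Some(Tok::Identifier(_)), _) | (Some(Tok::Literal(_)), _) => Some(ArgKind::Positional),
        _ => None,
    }
}
```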
+
+### Error Recovery
+- Synchronization points: `RIGHT_BRACE`, top-level keywords (`MODEL`, `ENUM`, `DATASOURCE`, `GENERATOR`)
+- Newline-sensitive recovery available within blocks (configurable)
+- Panic-mode recovery with meaningful error messages
+- `TokenType::Unsupported` tokens trigger parse errors with recovery
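Panic-mode recovery over the synchronization points listed above could look roughly like the sketch below. The `Tok` enum is a hypothetical subset of `TokenType`. Consuming the closing brace while stopping in front of a keyword is an assumption about the intended behavior, not something this document specifies.

```rust
/// Hypothetical subset of the scanner's `TokenType` (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Model,
    Enum,
    DataSource,
    Generator,
    Identifier(String),
    LeftBrace,
    RightBrace,
    Unsupported(String),
    Eof,
}

/// Skip tokens starting at `pos` until a synchronization point is reached.
/// Returns the index at which parsing may resume.
fn synchronize(toks: &[Tok], mut pos: usize) -> usize {
    while let Some(tok) = toks.get(pos) {
        match tok {
            // Consume the closing brace so parsing resumes after the block.
            Tok::RightBrace => return pos + 1,
            // Stop at a top-level keyword (or end of input) without consuming it.
            Tok::Model | Tok::Enum | Tok::DataSource | Tok::Generator | Tok::Eof => return pos,
            _ => pos += 1,
        }
    }
    pos
}
```

A caller would record the parse error first and then call `synchronize`. It must also make sure at least one token has been consumed beforehand, so that recovery always makes progress.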
+
+### Trailing Commas
+- Trailing commas are permitted in:
+  - Argument lists `(...COMMA?)`
+  - Array literals `[...COMMA?]`
+  - Object literals `{...COMMA?}`
+- Behavior configurable via `ParserOptions.trailing_comma_policy`
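The interaction between the `COMMA?` tail and a configurable policy can be sketched as follows. The document names `ParserOptions.trailing_comma_policy` but not its values, so the `TrailingCommaPolicy` variants here are hypothetical. The example also restricts array elements to literals for brevity.

```rust
/// Hypothetical policy values; the real variants are not given in this document.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TrailingCommaPolicy {
    Allow,
    Deny,
}

/// Hypothetical subset of the scanner's `TokenType` (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Literal(String),
    Comma,
    LeftBracket,
    RightBracket,
}

/// LEFT_BRACKET (LITERAL (COMMA LITERAL)* COMMA?)? RIGHT_BRACKET
fn parse_array(toks: &[Tok], policy: TrailingCommaPolicy) -> Result<Vec<String>, String> {
    let mut pos = 0;
    if toks.get(pos) != Some(&Tok::LeftBracket) {
        return Err("expected '['".to_string());
    }
    pos += 1;
    let mut items = Vec::new();
    // True only when the most recently consumed token was a comma.
    let mut after_comma = false;
    loop {
        match toks.get(pos) {
            Some(Tok::RightBracket) => {
                if after_comma && policy == TrailingCommaPolicy::Deny {
                    return Err("trailing comma not allowed".to_string());
                }
                return Ok(items);
            }
            Some(Tok::Literal(text)) => {
                items.push(text.clone());
                pos += 1;
            }
            other => return Err(format!("expected element or ']', found {:?}", other)),
        }
        match toks.get(pos) {
            Some(Tok::Comma) => {
                pos += 1;
                after_comma = true;
            }
            Some(Tok::RightBracket) => after_comma = false,
            other => return Err(format!("expected ',' or ']', found {:?}", other)),
        }
    }
}
```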
+
+### Documentation Comments
+- `TokenType::DocComment` preceding declarations attach as `Docs`
+- `TokenType::Comment` preserved for span accounting only
+- Association rules defined in parser implementation
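One plausible association rule is sketched below: gather every `DocComment` that directly precedes a declaration, skipping regular comments in between. This is only an assumption for illustration, since the actual rules live in the parser implementation. `Tok` is again a hypothetical subset of `TokenType`.

```rust
/// Hypothetical subset of the scanner's `TokenType` (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Comment(String),
    DocComment(String),
    Model,
    Identifier(String),
}

/// Collect doc-comment texts starting at `pos`.
/// Returns the collected docs and the index of the first non-comment token.
fn collect_docs(toks: &[Tok], mut pos: usize) -> (Vec<String>, usize) {
    let mut docs = Vec::new();
    while let Some(tok) = toks.get(pos) {
        match tok {
            // Doc comments become part of the declaration's `Docs`.
            Tok::DocComment(text) => docs.push(text.clone()),
            // Regular comments are skipped here; a real parser would still track their spans.
            Tok::Comment(_) => {}
            _ => break,
        }
        pos += 1;
    }
    (docs, pos)
}
```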
+
+## Semantic Notes
+
+1. **Qualification**: `qualified_ident` allows namespaced references like `db.VarChar`
+2. **Type References**: All type references are symbolic; resolution happens in semantic analysis
+3. **Attributes**: Unknown attribute names are accepted; validation deferred to semantic phase
+4. **Expressions**: Full expression support for generator/datasource values and attribute arguments
+5. **Ordering**: Source order preserved for all declaration sequences
+6. **List Types**: `TokenType::List` represents the special `[]` marker for list types in Prisma
+
+## Feature Gates
+
+The following constructs require explicit enabling via `ParserOptions.experimental_blocks`:
+
+- `type_decl` - requires `"type"` in `experimental_blocks` set
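A gate check at the point where the parser dispatches on the first token of an `item` might look like this sketch. The `ParserOptions` struct shown here is an assumed shape containing only the `experimental_blocks` field named above. `Tok` and `check_item_gate` are hypothetical names.

```rust
use std::collections::HashSet;

/// Assumed shape; this document only names the `experimental_blocks` field.
#[derive(Debug, Default)]
struct ParserOptions {
    experimental_blocks: HashSet<String>,
}

/// Hypothetical subset of the scanner's `TokenType` (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Model,
    Type,
}

/// Reject gated declarations unless their name is present in `experimental_blocks`.
fn check_item_gate(first: &Tok, opts: &ParserOptions) -> Result<(), String> {
    match first {
        Tok::Type if !opts.experimental_blocks.contains("type") => Err(
            "`type` declarations are experimental; add \"type\" to experimental_blocks".to_string(),
        ),
        _ => Ok(()),
    }
}
```

With default options a `TYPE` token is reported as an error, while ungated keywords such as `MODEL` pass through unchanged.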
+
+## Conformance
+
+This grammar serves as the canonical specification for:
+- AST node structure validation
+- Parser conformance testing
+- Golden test generation
+- Error recovery behavior verification
+- Complete `TokenType` enum coverage
+
+All parser implementations must produce equivalent AST structures for valid inputs conforming to this grammar.