Commit 0c76adb

Add initial grammar documentation for the parser
Adds the initial documentation for the Prisma schema grammar (v1). This serves as a reference for the parser's implementation and provides a formal definition of the language being parsed.
1 parent 82f953a commit 0c76adb

2 files changed, +247 -0 lines changed

src/core/parser/grammar/mod.rs

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
// This module exists primarily to increase the visibility of the
// grammar and its documentation, and to provide a place for
// grammar-related utilities if needed.

// The v1.md file contains the formal EBNF grammar specification
// and is included in the crate for documentation purposes.

#![doc = include_str!("./v1.md")]

src/core/parser/grammar/v1.md

Lines changed: 239 additions & 0 deletions
@@ -0,0 +1,239 @@
# Prisma Schema Grammar v1

This document defines the formal grammar for Prisma schema files, following the parser implementation plan. The grammar is written in EBNF notation and is the authoritative specification for the parser implementation.

## Grammar Version

**Version:** `grammar_v1`\
**Target Parser:** Prisma Types Generator Parser\
**Compatibility:** This grammar defines the AST structure and parsing rules for the first version of the parser.

## Notation

- `:=` - defines a production
- `*` - zero or more repetitions
- `+` - one or more repetitions
- `?` - optional (zero or one)
- `|` - alternative
- `()` - grouping
- `'...'` - literal token text
- `/* ... */` - comments within the grammar
- UPPERCASE names (e.g. `MODEL`, `LEFT_BRACE`) - terminal tokens, mapped to `TokenType` variants under Complete Token Mapping
- Whitespace and comments are implicitly allowed between all tokens

## Top-Level Grammar

```ebnf
schema := item*

item := model_decl
     | enum_decl
     | datasource_decl
     | generator_decl
     | type_decl /* experimental, gated */
```
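
Every `item` alternative starts with its own keyword, so a single token of lookahead selects the declaration kind. The following minimal Rust sketch shows that dispatch. The reduced `TokenType`, the `ItemKind` enum, and `item_kind` are hypothetical names used for illustration, not the parser's code, and the `type_decl` feature gate is left out:

```rust
// Reduced stand-in for the scanner's `TokenType`; only the variants
// needed for top-level dispatch are modeled.
#[derive(Debug, Clone, PartialEq)]
enum TokenType {
    Model,
    Enum,
    DataSource,
    Generator,
    Type,
    Identifier(String),
    EOF,
}

// Hypothetical label for the `item` alternative being parsed.
#[derive(Debug, PartialEq)]
enum ItemKind {
    Model,
    Enum,
    Datasource,
    Generator,
    TypeDecl,
}

// Chooses the `item` alternative from the first token alone.
// Returns `None` for tokens that cannot start an item.
fn item_kind(first: &TokenType) -> Option<ItemKind> {
    match first {
        TokenType::Model => Some(ItemKind::Model),
        TokenType::Enum => Some(ItemKind::Enum),
        TokenType::DataSource => Some(ItemKind::Datasource),
        TokenType::Generator => Some(ItemKind::Generator),
        TokenType::Type => Some(ItemKind::TypeDecl),
        _ => None,
    }
}
```

Where `item_kind` returns `None`, a real parser would report an error and enter recovery.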

## Declaration Grammar

### Model Declaration

```ebnf
model_decl := MODEL ident LEFT_BRACE model_member* RIGHT_BRACE

model_member := field_decl
             | block_attribute

field_decl := ident type_ref opt_marker? field_attribute*

opt_marker := OPTIONAL
```
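
As a rough illustration of how `field_decl` maps onto recursive descent, this sketch parses one field from a token slice. It is simplified under stated assumptions. `type_ref` is reduced to `ident LIST*`, so scalar keywords such as `String` arrive as plain identifiers. Attributes are reduced to `AT ident` without argument lists. `Tok`, `FieldDecl`, `Parser`, and `parse_field` are hypothetical names:

```rust
// Reduced token set for a single field; names mirror `TokenType`,
// but scalar keywords are folded into `Identifier` for brevity.
#[derive(Debug, Clone, PartialEq)]
enum Tok {
    Identifier(String),
    List,     // `[]`
    Optional, // `?`
    At,       // `@`
}

// Hypothetical AST node for `field_decl`.
#[derive(Debug, PartialEq)]
struct FieldDecl {
    name: String,
    type_name: String,
    list_depth: usize,       // number of LIST markers after the base type
    optional: bool,          // opt_marker present
    attributes: Vec<String>, // attribute names only; arglists omitted
}

struct Parser<'a> {
    tokens: &'a [Tok],
    pos: usize,
}

impl<'a> Parser<'a> {
    fn peek(&self) -> Option<&Tok> {
        self.tokens.get(self.pos)
    }

    // Consumes the next token if it equals `expected`.
    fn eat(&mut self, expected: &Tok) -> bool {
        if self.peek() == Some(expected) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    fn ident(&mut self) -> Result<String, String> {
        match self.peek() {
            Some(Tok::Identifier(s)) => {
                let s = s.clone();
                self.pos += 1;
                Ok(s)
            }
            other => Err(format!("expected identifier at token {}, found {:?}", self.pos, other)),
        }
    }

    // field_decl := ident type_ref opt_marker? field_attribute*
    fn field_decl(&mut self) -> Result<FieldDecl, String> {
        let name = self.ident()?;
        let type_name = self.ident()?;
        let mut list_depth = 0;
        while self.eat(&Tok::List) {
            list_depth += 1;
        }
        let optional = self.eat(&Tok::Optional);
        let mut attributes = Vec::new();
        while self.eat(&Tok::At) {
            attributes.push(self.ident()?);
        }
        Ok(FieldDecl { name, type_name, list_depth, optional, attributes })
    }
}

fn parse_field(tokens: &[Tok]) -> Result<FieldDecl, String> {
    let mut parser = Parser { tokens, pos: 0 };
    parser.field_decl()
}
```

A complete parser would also confirm that the field ends where the next `model_member` (an `ident` or `DOUBLE_AT`) or the closing `RIGHT_BRACE` begins.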

### Enum Declaration

```ebnf
enum_decl := ENUM ident LEFT_BRACE enum_member* RIGHT_BRACE

enum_member := enum_value
            | block_attribute

enum_value := ident field_attribute*
```

### Configuration Declarations

```ebnf
datasource_decl := DATASOURCE ident LEFT_BRACE assignment* RIGHT_BRACE

generator_decl := GENERATOR ident LEFT_BRACE assignment* RIGHT_BRACE

assignment := ident ASSIGN expr
```

### Experimental Declarations

```ebnf
/* Gated by ParserOptions.experimental_blocks containing "type" */
type_decl := TYPE ident ASSIGN type_ref
```

## Type System Grammar

```ebnf
type_ref := base_type (LIST)*

base_type := scalar_type
          | qualified_ident

scalar_type := STRING | INT | FLOAT | BOOLEAN | DATETIME | JSON | BYTES | DECIMAL

qualified_ident := ident (DOT ident)*
```
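
The type rules can be sketched the same way. In this hypothetical illustration (reduced `TokenType`; `BaseType`, `TypeRef`, `is_scalar`, and `parse_type_ref` are invented names), the two `base_type` alternatives are told apart by their first token, `qualified_ident` segments are collected across `DOT`, and trailing `LIST` markers are counted. The function returns the number of tokens consumed so a caller could continue with `opt_marker`:

```rust
// Reduced stand-in for `TokenType` with only the variants used by `type_ref`.
#[derive(Debug, Clone, PartialEq)]
enum TokenType {
    String,
    Int,
    Float,
    Boolean,
    DateTime,
    Json,
    Bytes,
    Decimal,
    Identifier(String),
    Dot,
    List,
}

#[derive(Debug, PartialEq)]
enum BaseType {
    Scalar(TokenType),  // one of the scalar_type keywords
    Named(Vec<String>), // qualified_ident segments, e.g. ["ns", "Address"]
}

#[derive(Debug, PartialEq)]
struct TypeRef {
    base: BaseType,
    list_depth: usize,
}

fn is_scalar(t: &TokenType) -> bool {
    matches!(
        t,
        TokenType::String
            | TokenType::Int
            | TokenType::Float
            | TokenType::Boolean
            | TokenType::DateTime
            | TokenType::Json
            | TokenType::Bytes
            | TokenType::Decimal
    )
}

// type_ref := base_type (LIST)*
// Returns the parsed type and the number of tokens consumed.
fn parse_type_ref(tokens: &[TokenType]) -> Result<(TypeRef, usize), String> {
    let (base, mut pos) = match tokens.first() {
        Some(t) if is_scalar(t) => (BaseType::Scalar(t.clone()), 1),
        // qualified_ident := ident (DOT ident)*
        Some(TokenType::Identifier(first)) => {
            let mut segments = vec![first.clone()];
            let mut pos = 1;
            while let (Some(TokenType::Dot), Some(TokenType::Identifier(next))) =
                (tokens.get(pos), tokens.get(pos + 1))
            {
                segments.push(next.clone());
                pos += 2;
            }
            (BaseType::Named(segments), pos)
        }
        other => return Err(format!("expected a type, found {:?}", other)),
    };
    // (LIST)*: each `[]` marker adds one list level.
    let mut list_depth = 0;
    while tokens.get(pos) == Some(&TokenType::List) {
        list_depth += 1;
        pos += 1;
    }
    Ok((TypeRef { base, list_depth }, pos))
}
```

As written, `(LIST)*` also admits repeated markers such as `Post[][]`; this sketch simply counts them and leaves any restriction to later phases.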

## Attribute Grammar

```ebnf
field_attribute := AT qualified_ident arglist?

block_attribute := DOUBLE_AT qualified_ident arglist?

arglist := LEFT_PAREN (arg (COMMA arg)* COMMA?)? RIGHT_PAREN

arg := expr                /* positional argument */
    | ident COLON expr     /* named argument */
```
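
The `arg` rule is where two tokens of lookahead matter. Both alternatives can begin with `ident`, and only a following `COLON` marks a named argument. This sketch parses an `arglist`, including an optional trailing comma. It uses hypothetical names (`Arg`, `simple_expr`, `parse_arglist`) and reduces `expr` to a single `IDENTIFIER` or `LITERAL` token:

```rust
// Reduced stand-in for `TokenType` with the variants `arglist` needs.
#[derive(Debug, Clone, PartialEq)]
enum TokenType {
    LeftParen,
    RightParen,
    Comma,
    Colon,
    Identifier(String),
    Literal(String),
}

#[derive(Debug, PartialEq)]
enum Arg {
    Positional(String),    // arg := expr
    Named(String, String), // arg := ident COLON expr
}

// Stand-in for `expr`: a single IDENTIFIER or LITERAL token.
fn simple_expr(tok: Option<&TokenType>) -> Result<String, String> {
    match tok {
        Some(TokenType::Identifier(s)) | Some(TokenType::Literal(s)) => Ok(s.clone()),
        other => Err(format!("expected expression, found {:?}", other)),
    }
}

// arglist := LEFT_PAREN (arg (COMMA arg)* COMMA?)? RIGHT_PAREN
fn parse_arglist(tokens: &[TokenType]) -> Result<Vec<Arg>, String> {
    if tokens.first() != Some(&TokenType::LeftParen) {
        return Err("expected '('".to_string());
    }
    let mut pos = 1;
    let mut args = Vec::new();
    loop {
        // Closing paren right after '(' or after a trailing comma.
        if tokens.get(pos) == Some(&TokenType::RightParen) {
            return Ok(args);
        }
        // Two-token lookahead: `ident COLON` starts a named argument.
        let arg = match (tokens.get(pos), tokens.get(pos + 1)) {
            (Some(TokenType::Identifier(name)), Some(TokenType::Colon)) => {
                let value = simple_expr(tokens.get(pos + 2))?;
                pos += 3;
                Arg::Named(name.clone(), value)
            }
            (first, _) => {
                let value = simple_expr(first)?;
                pos += 1;
                Arg::Positional(value)
            }
        };
        args.push(arg);
        match tokens.get(pos) {
            Some(TokenType::Comma) => pos += 1,
            Some(TokenType::RightParen) => return Ok(args),
            other => return Err(format!("expected ',' or ')', found {:?}", other)),
        }
    }
}
```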

## Expression Grammar

```ebnf
expr := literal
     | identref
     | func_call
     | array
     | object

literal := LITERAL         /* string, int, float literals */
        | boolean_lit
        | null_lit

boolean_lit := 'true' | 'false'   /* scanned as LITERAL tokens */

null_lit := 'null'                /* scanned as a LITERAL token */

identref := qualified_ident

func_call := qualified_ident LEFT_PAREN (expr (COMMA expr)* COMMA?)? RIGHT_PAREN

array := LEFT_BRACKET (expr (COMMA expr)* COMMA?)? RIGHT_BRACKET

object := LEFT_BRACE (object_entry (COMMA object_entry)* COMMA?)? RIGHT_BRACE

object_entry := (ident | string_literal) COLON expr

string_literal := LITERAL   /* string literal variant */
```
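
A compact recursive-descent sketch of `expr`, offered as an illustration rather than the parser's actual code. It assumes a reduced `TokenType`, and `Expr`, `Parser`, and `parse_expr` are hypothetical names. Two points from the grammar show up directly. First, `identref` and `func_call` are left-factored: the qualified identifier is read first, and a following `LEFT_PAREN` turns it into a call. Second, `true`, `false`, and `null` need no special case because they arrive as `LITERAL` tokens. One helper handles all three comma-separated forms, including the optional trailing comma:

```rust
// Reduced stand-in for `TokenType` with the variants used by `expr`.
#[derive(Debug, Clone, PartialEq)]
enum TokenType {
    Literal(String),
    Identifier(String),
    Dot, Comma, Colon,
    LeftParen, RightParen, LeftBracket, RightBracket, LeftBrace, RightBrace,
}

// Hypothetical AST for `expr`.
#[derive(Debug, PartialEq)]
enum Expr {
    Literal(String),       // LITERAL, including true/false/null
    IdentRef(Vec<String>), // qualified_ident segments
    Call(Vec<String>, Vec<Expr>),
    Array(Vec<Expr>),
    Object(Vec<(String, Expr)>),
}

struct Parser {
    tokens: Vec<TokenType>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<&TokenType> {
        self.tokens.get(self.pos)
    }

    // Returns the current token (if any) and advances past it.
    fn bump(&mut self) -> Option<TokenType> {
        self.pos += 1;
        self.tokens.get(self.pos - 1).cloned()
    }

    fn expr(&mut self) -> Result<Expr, String> {
        match self.bump() {
            Some(TokenType::Literal(s)) => Ok(Expr::Literal(s)),
            // Left-factored: read the whole qualified_ident, then let LEFT_PAREN decide.
            Some(TokenType::Identifier(first)) => {
                let mut path = vec![first];
                while self.peek() == Some(&TokenType::Dot) {
                    self.pos += 1;
                    match self.bump() {
                        Some(TokenType::Identifier(seg)) => path.push(seg),
                        other => return Err(format!("expected identifier after '.', found {:?}", other)),
                    }
                }
                if self.peek() == Some(&TokenType::LeftParen) {
                    self.pos += 1;
                    Ok(Expr::Call(path, self.list(TokenType::RightParen, Parser::expr)?))
                } else {
                    Ok(Expr::IdentRef(path))
                }
            }
            Some(TokenType::LeftBracket) => Ok(Expr::Array(self.list(TokenType::RightBracket, Parser::expr)?)),
            Some(TokenType::LeftBrace) => Ok(Expr::Object(self.list(TokenType::RightBrace, Parser::entry)?)),
            other => Err(format!("expected expression, found {:?}", other)),
        }
    }

    // object_entry := (ident | string_literal) COLON expr
    fn entry(&mut self) -> Result<(String, Expr), String> {
        let key = match self.bump() {
            Some(TokenType::Identifier(k)) | Some(TokenType::Literal(k)) => k,
            other => return Err(format!("expected object key, found {:?}", other)),
        };
        match self.bump() {
            Some(TokenType::Colon) => Ok((key, self.expr()?)),
            other => Err(format!("expected ':', found {:?}", other)),
        }
    }

    // (item (COMMA item)* COMMA?)? close, with the opening delimiter already consumed.
    fn list<T>(&mut self, close: TokenType, item: fn(&mut Parser) -> Result<T, String>) -> Result<Vec<T>, String> {
        let mut items = Vec::new();
        loop {
            if self.peek() == Some(&close) {
                self.pos += 1;
                return Ok(items);
            }
            items.push(item(self)?);
            match self.bump() {
                Some(TokenType::Comma) => {}
                Some(t) if t == close => return Ok(items),
                other => return Err(format!("expected ',' or {:?}, found {:?}", close, other)),
            }
        }
    }
}

fn parse_expr(tokens: Vec<TokenType>) -> Result<Expr, String> {
    let mut parser = Parser { tokens, pos: 0 };
    let expr = parser.expr()?;
    if parser.pos != parser.tokens.len() {
        return Err(format!("unexpected token at {}", parser.pos));
    }
    Ok(expr)
}
```

One caveat the sketch does not address: if the scanner emits `[]` as a single `LIST` token in every context, an empty array literal would never reach the `array` rule, and `expr` would need to accept `LIST` as well.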

## Lexical Elements

```ebnf
ident := IDENTIFIER   /* TokenType::Identifier(String) */
```

## Complete Token Mapping

Every variant of the `TokenType` enum in `src/core/scanner/tokens.rs` is accounted for below, either as a terminal of the grammar or as a token the parser handles outside the productions (comments, unsupported input, end of input):

### Keywords -> `TokenType` mapping
- `MODEL` -> `TokenType::Model`
- `ENUM` -> `TokenType::Enum`
- `DATASOURCE` -> `TokenType::DataSource`
- `GENERATOR` -> `TokenType::Generator`
- `TYPE` -> `TokenType::Type`

### Type Keywords -> `TokenType` mapping
- `STRING` -> `TokenType::String`
- `INT` -> `TokenType::Int`
- `FLOAT` -> `TokenType::Float`
- `BOOLEAN` -> `TokenType::Boolean`
- `DATETIME` -> `TokenType::DateTime`
- `JSON` -> `TokenType::Json`
- `BYTES` -> `TokenType::Bytes`
- `DECIMAL` -> `TokenType::Decimal`

### Literals -> `TokenType` mapping
- `LITERAL` -> `TokenType::Literal(String)` (covers string, int, float, boolean, null)
- `IDENTIFIER` -> `TokenType::Identifier(String)`

### Operators -> `TokenType` mapping
- `ASSIGN` -> `TokenType::Assign` ('=')
- `OPTIONAL` -> `TokenType::Optional` ('?')
- `LIST` -> `TokenType::List` ('[]' - list-type marker)
- `DOT` -> `TokenType::Dot` ('.')

### Punctuation -> `TokenType` mapping
- `LEFT_BRACE` -> `TokenType::LeftBrace` ('{')
- `RIGHT_BRACE` -> `TokenType::RightBrace` ('}')
- `LEFT_BRACKET` -> `TokenType::LeftBracket` ('[')
- `RIGHT_BRACKET` -> `TokenType::RightBracket` (']')
- `LEFT_PAREN` -> `TokenType::LeftParen` ('(')
- `RIGHT_PAREN` -> `TokenType::RightParen` (')')
- `COMMA` -> `TokenType::Comma` (',')
- `COLON` -> `TokenType::Colon` (':')
- `AT` -> `TokenType::At` ('@')
- `DOUBLE_AT` -> `TokenType::DoubleAt` ('@@')

### Comments -> `TokenType` mapping (handled by parser)
- `TokenType::Comment(String)` - Regular comments, preserved for spans
- `TokenType::DocComment(String)` - Documentation comments, attached to AST nodes

### Special Tokens -> `TokenType` mapping
- `TokenType::Unsupported(String)` - Handled as parse errors
- `TokenType::EOF` - End of input marker
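
Keyword and type-keyword tokens come from identifier-shaped words in the source. As a sketch only (the actual lookup in `src/core/scanner/tokens.rs` may differ, and `classify_word` is a hypothetical name), a scanner could classify each word as follows. It uses the spellings Prisma schema files use for these keywords and falls back to `TokenType::Identifier`:

```rust
// Reduced stand-in for `TokenType`: keywords, type keywords, identifiers.
#[derive(Debug, Clone, PartialEq)]
enum TokenType {
    Model,
    Enum,
    DataSource,
    Generator,
    Type,
    String,
    Int,
    Float,
    Boolean,
    DateTime,
    Json,
    Bytes,
    Decimal,
    Identifier(String),
}

// Maps a scanned word to a keyword token, or to IDENTIFIER otherwise.
// The match is case-sensitive, mirroring Prisma's source spellings.
fn classify_word(word: &str) -> TokenType {
    match word {
        "model" => TokenType::Model,
        "enum" => TokenType::Enum,
        "datasource" => TokenType::DataSource,
        "generator" => TokenType::Generator,
        "type" => TokenType::Type,
        "String" => TokenType::String,
        "Int" => TokenType::Int,
        "Float" => TokenType::Float,
        "Boolean" => TokenType::Boolean,
        "DateTime" => TokenType::DateTime,
        "Json" => TokenType::Json,
        "Bytes" => TokenType::Bytes,
        "Decimal" => TokenType::Decimal,
        other => TokenType::Identifier(other.to_string()),
    }
}
```

Under this approach a word like `type` always becomes `TokenType::Type`. Using it as a field name would therefore require the parser to accept that token in `ident` position.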

## Grammar Properties

### Determinism
- The grammar is designed for top-down parsing with at most two tokens of lookahead (LL(2))
- No left recursion or ambiguous productions
- Most alternatives are selected by their first token. Two cases need more: a named `arg` is recognized by the two-token prefix `ident COLON`, and `identref` and `func_call` share the prefix `qualified_ident`, so the parser reads the full qualified identifier and then checks for `LEFT_PAREN`

### Error Recovery
- Synchronization points: `RIGHT_BRACE`, top-level keywords (`MODEL`, `ENUM`, `DATASOURCE`, `GENERATOR`)
- Newline-sensitive recovery available within blocks (configurable)
- Panic-mode recovery with meaningful error messages
- `TokenType::Unsupported` tokens trigger parse errors with recovery
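
A minimal sketch of the panic-mode step, assuming a reduced `TokenType` and a hypothetical `synchronize` helper. After an error, tokens are skipped until a synchronization point. The sketch consumes a `RIGHT_BRACE`, since it closes the damaged block, and leaves a top-level keyword in place, since it starts the next `item`. That split is a design choice of the sketch, not something the grammar prescribes. `EOF` is treated as a stopping point as well:

```rust
// Reduced stand-in for `TokenType` with the variants recovery looks at.
#[derive(Debug, Clone, PartialEq)]
enum TokenType {
    Model,
    Enum,
    DataSource,
    Generator,
    RightBrace,
    Identifier(String),
    Unsupported(String),
    EOF,
}

// Skips tokens from `pos` (just after an error) to the next sync point
// and returns the position where parsing should resume.
fn synchronize(tokens: &[TokenType], mut pos: usize) -> usize {
    while let Some(tok) = tokens.get(pos) {
        match tok {
            // Consume the brace that closes the broken block.
            TokenType::RightBrace => return pos + 1,
            // Leave the keyword so the next item parses normally.
            TokenType::Model
            | TokenType::Enum
            | TokenType::DataSource
            | TokenType::Generator
            | TokenType::EOF => return pos,
            _ => pos += 1,
        }
    }
    pos
}
```

Because `RIGHT_BRACE` also closes `object` expressions, recovery that starts inside an object value can resynchronize too early. Tracking brace depth is one way to avoid that.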

### Trailing Commas
- Trailing commas are permitted in:
  - Argument lists `(...COMMA?)`
  - Array literals `[...COMMA?]`
  - Object literals `{...COMMA?}`
- Behavior configurable via `ParserOptions.trailing_comma_policy`
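
The values `ParserOptions.trailing_comma_policy` can take are not specified here. This sketch assumes a hypothetical two-variant `TrailingCommaPolicy`. Under that assumption, the check fits naturally into the list loop: a closing delimiter reached right after a comma is either accepted or reported. Items are reduced to single `LITERAL` tokens, the opening `LEFT_PAREN` is assumed to be consumed already, and `parse_list` is an invented name:

```rust
// Hypothetical policy values; the real option's shape is not specified here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TrailingCommaPolicy {
    Allow,
    Forbid,
}

// Reduced stand-in for `TokenType`.
#[derive(Debug, Clone, PartialEq)]
enum TokenType {
    Comma,
    RightParen,
    Literal(String),
}

// Parses `(LITERAL (COMMA LITERAL)* COMMA?)? RIGHT_PAREN` and enforces the policy.
fn parse_list(tokens: &[TokenType], policy: TrailingCommaPolicy) -> Result<Vec<String>, String> {
    let mut items = Vec::new();
    let mut pos = 0;
    loop {
        if tokens.get(pos) == Some(&TokenType::RightParen) {
            // With at least one item parsed, a close here must follow a comma.
            if !items.is_empty() && policy == TrailingCommaPolicy::Forbid {
                return Err(format!("trailing comma at token {} is not allowed", pos - 1));
            }
            return Ok(items);
        }
        match tokens.get(pos) {
            Some(TokenType::Literal(s)) => items.push(s.clone()),
            other => return Err(format!("expected literal, found {:?}", other)),
        }
        pos += 1;
        match tokens.get(pos) {
            Some(TokenType::Comma) => pos += 1,
            Some(TokenType::RightParen) => return Ok(items),
            other => return Err(format!("expected ',' or ')', found {:?}", other)),
        }
    }
}
```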

### Documentation Comments
- `TokenType::DocComment` tokens preceding declarations are attached to them as `Docs`
- `TokenType::Comment` tokens are preserved for span accounting only
- Association rules are defined in the parser implementation
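
The association rules are left to the parser implementation. The sketch below shows one possible rule, using a reduced `TokenType`, a stand-in `Docs` struct, and a hypothetical `attach_docs` helper. Doc comments accumulate until the next declaration keyword and attach to it. Regular comments are skipped without breaking the run, and any other token discards pending docs. To keep the example short, only `MODEL` counts as a declaration keyword:

```rust
// Reduced stand-in for `TokenType` with the variants this rule needs.
#[derive(Debug, Clone, PartialEq)]
enum TokenType {
    Model,
    Identifier(String),
    Comment(String),
    DocComment(String),
}

// Hypothetical stand-in for the AST's `Docs` node.
#[derive(Debug, Default, PartialEq)]
struct Docs {
    lines: Vec<String>,
}

// Pairs the index of each MODEL keyword with the doc comments collected
// since the last declaration or other significant token.
fn attach_docs(tokens: &[TokenType]) -> Vec<(usize, Docs)> {
    let mut pending = Docs::default();
    let mut attached = Vec::new();
    for (i, tok) in tokens.iter().enumerate() {
        match tok {
            TokenType::DocComment(text) => pending.lines.push(text.clone()),
            // Regular comments only matter for span accounting.
            TokenType::Comment(_) => {}
            // A declaration keyword takes all pending docs.
            TokenType::Model => attached.push((i, std::mem::take(&mut pending))),
            // Anything else means the docs did not directly precede a declaration.
            _ => pending.lines.clear(),
        }
    }
    attached
}
```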

## Semantic Notes

1. **Qualification**: `qualified_ident` allows namespaced references like `db.VarChar`
2. **Type References**: All type references are symbolic; resolution happens in semantic analysis
3. **Attributes**: Unknown attribute names are accepted; validation is deferred to the semantic phase
4. **Expressions**: Full expression support for generator/datasource values and attribute arguments
5. **Ordering**: Source order is preserved for all declaration sequences
6. **List Types**: `TokenType::List` represents the special `[]` marker for list types in Prisma

## Feature Gates

The following constructs require explicit enabling via `ParserOptions.experimental_blocks`:

- `type_decl` - requires `"type"` in the `experimental_blocks` set
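
A sketch of the gate check. It assumes `experimental_blocks` is a `HashSet<String>`, since the document calls it a set but does not give its type. It also assumes the check runs when the parser sees `TYPE` at the top level. `check_type_decl_gate` is a hypothetical name:

```rust
use std::collections::HashSet;

// Hypothetical subset of `ParserOptions`; only the field named in this
// document is modeled, and `HashSet<String>` is an assumed type.
#[derive(Debug, Default)]
struct ParserOptions {
    experimental_blocks: HashSet<String>,
}

// Called when a top-level TYPE token is seen: `type_decl` is only
// accepted when "type" has been enabled.
fn check_type_decl_gate(options: &ParserOptions) -> Result<(), String> {
    if options.experimental_blocks.contains("type") {
        Ok(())
    } else {
        Err("`type` declarations are experimental; add \"type\" to ParserOptions.experimental_blocks".to_string())
    }
}
```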

## Conformance

This grammar serves as the canonical specification for:
- AST node structure validation
- Parser conformance testing
- Golden test generation
- Error recovery behavior verification
- Complete `TokenType` enum coverage

All parser implementations must produce equivalent AST structures for valid inputs conforming to this grammar.
