|
| 1 | +--- |
| 2 | +id: cypher-dialect-parity-for-common |
| 3 | +level: task |
| 4 | +title: "Cypher dialect parity for common queries" |
| 5 | +short_code: "GQLITE-T-0096" |
| 6 | +created_at: 2026-02-07T02:09:56.178299+00:00 |
| 7 | +updated_at: 2026-02-07T16:08:59.284721+00:00 |
| 8 | +parent: |
| 9 | +blocked_by: [] |
| 10 | +archived: false |
| 11 | + |
| 12 | +tags: |
| 13 | + - "#task" |
| 14 | + - "#feature" |
| 15 | + - "#phase/completed" |
| 16 | + |
| 17 | + |
| 18 | +exit_criteria_met: false |
| 19 | +strategy_id: NULL |
| 20 | +initiative_id: NULL |
| 21 | +--- |
| 22 | + |
| 23 | +# Cypher dialect parity for common queries |
| 24 | + |
| 25 | +**GitHub Issue**: [#13](https://github.com/colliery-io/graphqlite/issues/13) |
| 26 | + |
| 27 | +## Objective |
| 28 | + |
| 29 | +Expand the Cypher parser to accept common Memgraph/Neo4j-style syntax, eliminating the need for client-side string rewrites for bracket access, backtick identifiers, nested dot access, and trailing semicolons. (IN list literals already work — verified with 47+ passing tests.) |
| 30 | + |
| 31 | +## Backlog Item Details |
| 32 | + |
| 33 | +### Type |
| 34 | +- [x] Feature - New functionality or enhancement |
| 35 | + |
| 36 | +### Priority |
| 37 | +- [ ] P1 - High (important for user experience) |
| 38 | + |
| 39 | +### Business Justification |
| 40 | +- **User Value**: Users can write natural Cypher queries without client-side rewriting hacks |
| 41 | +- **Business Value**: Reduces friction for adoption by Neo4j/Memgraph users; removes a class of client bugs |
| 42 | +- **Effort Estimate**: M - Parser/grammar changes (phases 1-2 trivial, phases 3-4 medium — shared transform work). IN lists already done. |
| 43 | + |
| 44 | +## Acceptance Criteria |
| 45 | + |
| 46 | +## Acceptance Criteria |
| 47 | + |
| 48 | +## Acceptance Criteria |
| 49 | + |
| 50 | +## Acceptance Criteria |
| 51 | + |
| 52 | +- [ ] Trailing semicolons are accepted without error |
| 53 | +- [ ] Backtick identifiers work in property access: `n.\`special-key\`` |
| 54 | +- [ ] Nested dot access parses and executes: `p.metadata.name` |
| 55 | +- [ ] Bracket property access parses and executes: `p['status']['phase']` |
| 56 | +- [x] ~~List literals in expressions work: `IN ['Failed','Unknown']`~~ (already implemented — 47+ tests passing in `23_in_operator.sql`) |
| 57 | +- [ ] Existing accepted syntax continues to work (no regressions) |
| 58 | + |
| 59 | +## Implementation Plan |
| 60 | + |
| 61 | +### Approach |
| 62 | + |
| 63 | +Two commits: |
| 64 | + |
| 65 | +1. **Commit A — Trailing semicolons** (grammar only, trivial) |
| 66 | +2. **Commit B — Backtick + nested dot + bracket chaining** (grammar refactor + transform layer, batched) |
| 67 | + |
| 68 | +--- |
| 69 | + |
| 70 | +### Commit A: Trailing Semicolons |
| 71 | + |
| 72 | +**File:** `src/backend/parser/cypher_gram.y` — `stmt` rule (line 159) |
| 73 | + |
| 74 | +Add two productions: |
| 75 | + |
| 76 | +``` |
| 77 | +| union_query ';' |
| 78 | + { $$ = $1; context->result = $1; } |
| 79 | +| EXPLAIN union_query ';' |
| 80 | + { if ($2->type == AST_NODE_QUERY) { ((cypher_query*)$2)->explain = true; } |
| 81 | + $$ = $2; context->result = $2; } |
| 82 | +``` |
| 83 | + |
| 84 | +Scanner already tokenizes `;` as `CYPHER_TOKEN_CHAR` (line 243 of `cypher_scanner.l`). |
| 85 | + |
| 86 | +**Test:** New `tests/functional/32_dialect_parity.sql` — Section 1 with `RETURN 1;`, `MATCH (n) RETURN n;`, `EXPLAIN MATCH (n) RETURN n;`. |
| 87 | + |
| 88 | +Build, test, commit. |
| 89 | + |
| 90 | +--- |
| 91 | + |
| 92 | +### Commit B: Backtick + Nested Dot + Bracket Chaining |
| 93 | + |
| 94 | +#### Grammar changes (`src/backend/parser/cypher_gram.y`) |
| 95 | + |
| 96 | +**B1. Add 3 productions to `expr` rule** (after line 987): |
| 97 | + |
| 98 | +``` |
| 99 | +| expr '.' IDENTIFIER |
| 100 | + { $$ = (ast_node*)make_property($1, $3, @3.first_line); free($3); } |
| 101 | +| expr '.' BQIDENT |
| 102 | + { $$ = (ast_node*)make_property($1, $3, @3.first_line); free($3); } |
| 103 | +| expr '[' expr ']' |
| 104 | + { $$ = (ast_node*)make_subscript($1, $3, @2.first_line); } |
| 105 | +``` |
| 106 | + |
| 107 | +Enables: `n.\`special-key\``, `p.metadata.name`, `p['status']['phase']`. |
| 108 | + |
| 109 | +**B2. Remove redundant rules from `primary_expr`:** |
| 110 | + |
| 111 | +- `IDENTIFIER '.' IDENTIFIER` (line 1002) |
| 112 | +- `END_P '.' IDENTIFIER` (line 1009) — replaced by adding END_P to `identifier` rule |
| 113 | +- `IDENTIFIER '[' expr ']'` (line 1023) |
| 114 | +- `'(' expr ')' '[' expr ']'` (line 1030) |
| 115 | +- `list_literal '[' expr ']'` (line 1035) |
| 116 | + |
| 117 | +All subsumed by the new `expr` productions. |
| 118 | + |
| 119 | +**B3. Add `END_P` to `identifier` rule:** |
| 120 | + |
| 121 | +``` |
| 122 | +| END_P { $$ = make_identifier(strdup("end"), @1.first_line); } |
| 123 | +``` |
| 124 | + |
| 125 | +**B4. Add BQIDENT variants to `remove_item`** (line 577): |
| 126 | + |
| 127 | +``` |
| 128 | +| IDENTIFIER '.' BQIDENT { ... same as IDENTIFIER '.' IDENTIFIER ... } |
| 129 | +| BQIDENT '.' IDENTIFIER { ... } |
| 130 | +| BQIDENT '.' BQIDENT { ... } |
| 131 | +``` |
| 132 | + |
| 133 | +(SET uses `expr '=' expr` so gets backtick support for free.) |
| 134 | + |
| 135 | +**B5. Update `%expect` / `%expect-rr`** (lines 35-36): |
| 136 | + |
| 137 | +Run `bison` and adjust conflict counts. |
| 138 | + |
| 139 | +#### Transform changes |
| 140 | + |
| 141 | +**B6. `transform_property_access()` in `transform_expr_ops.c`** (line 314): |
| 142 | + |
| 143 | +Replace hard rejection at line 319 with recursive handling: |
| 144 | + |
| 145 | +```c |
| 146 | +if (prop->expr->type == AST_NODE_PROPERTY) { |
| 147 | + /* Nested: p.metadata.name → json_extract(p.metadata_sql, '$.name') */ |
| 148 | + append_sql(ctx, "json_extract("); |
| 149 | + if (transform_property_access(ctx, (cypher_property*)prop->expr) < 0) return -1; |
| 150 | + append_sql(ctx, ", '$."); append_sql(ctx, prop->property_name); append_sql(ctx, "')"); |
| 151 | + return 0; |
| 152 | +} |
| 153 | +if (prop->expr->type == AST_NODE_SUBSCRIPT) { |
| 154 | + /* list[0].name → json_extract(list_subscript_sql, '$.name') */ |
| 155 | + append_sql(ctx, "json_extract("); |
| 156 | + if (transform_expression(ctx, prop->expr) < 0) return -1; |
| 157 | + append_sql(ctx, ", '$."); append_sql(ctx, prop->property_name); append_sql(ctx, "')"); |
| 158 | + return 0; |
| 159 | +} |
| 160 | +/* Existing AST_NODE_IDENTIFIER check remains below */ |
| 161 | +``` |
| 162 | + |
| 163 | +**B7. Subscript string-key normalization in `transform_return.c`** (line 661): |
| 164 | + |
| 165 | +At top of `AST_NODE_SUBSCRIPT` case, normalize `p['key']` → property access: |
| 166 | + |
| 167 | +```c |
| 168 | +cypher_subscript *subscript = (cypher_subscript*)expr; |
| 169 | +if (subscript->index->type == AST_NODE_LITERAL) { |
| 170 | + cypher_literal *idx_lit = (cypher_literal*)subscript->index; |
| 171 | + if (idx_lit->literal_type == LITERAL_STRING) { |
| 172 | + if (subscript->expr->type == AST_NODE_IDENTIFIER || |
| 173 | + subscript->expr->type == AST_NODE_PROPERTY) { |
| 174 | + cypher_property temp_prop; |
| 175 | + memset(&temp_prop, 0, sizeof(temp_prop)); |
| 176 | + temp_prop.base.type = AST_NODE_PROPERTY; |
| 177 | + temp_prop.expr = subscript->expr; |
| 178 | + temp_prop.property_name = idx_lit->value.string; |
| 179 | + return transform_property_access(ctx, &temp_prop); |
| 180 | + } |
| 181 | + } |
| 182 | +} |
| 183 | +/* ... existing json_extract logic below unchanged ... */ |
| 184 | +``` |
| 185 | + |
| 186 | +#### Tests |
| 187 | + |
| 188 | +New `tests/functional/32_dialect_parity.sql` sections: |
| 189 | +- **Section 2:** Backtick property access (`n.\`special-key\``) |
| 190 | +- **Section 3:** Nested dot access (JSON-valued property + `n.metadata.name`) |
| 191 | +- **Section 4:** Bracket chaining (`n['status']`, `n['status']['phase']`) |
| 192 | +- **Section 5:** Mixed (`n['metadata'].name`, `n.items[0]`) |
| 193 | + |
| 194 | +Also un-skip `tests/functional/09_edge_cases.sql` line 199-202. |
| 195 | + |
| 196 | +--- |
| 197 | + |
| 198 | +### Files Modified |
| 199 | + |
| 200 | +| File | Change | |
| 201 | +|------|--------| |
| 202 | +| `src/backend/parser/cypher_gram.y` | `stmt` + `;`, `expr '.' IDENT/BQIDENT`, `expr '[' expr ']'`, remove redundant `primary_expr` rules, END_P in `identifier`, BQIDENT in `remove_item`, `%expect` update | |
| 203 | +| `src/backend/transform/transform_expr_ops.c` | `transform_property_access()` handles nested `AST_NODE_PROPERTY` and `AST_NODE_SUBSCRIPT` base | |
| 204 | +| `src/backend/transform/transform_return.c` | String-key-to-property normalization in `AST_NODE_SUBSCRIPT` case | |
| 205 | +| `tests/functional/32_dialect_parity.sql` | **New** — tests for all 4 features | |
| 206 | +| `tests/functional/09_edge_cases.sql` | Un-skip nested property test (line 199) | |
| 207 | + |
| 208 | +### Out of Scope |
| 209 | + |
| 210 | +- SET/REMOVE of nested properties (`SET n.metadata.name = 'foo'`) — requires JSON path updates, much more complex |
| 211 | +- Nested/bracket access is read-only (RETURN, WHERE, WITH contexts) |
| 212 | + |
| 213 | +### Verification |
| 214 | + |
| 215 | +1. `angreal build extension` — bison/flex compile clean |
| 216 | +2. `angreal test unit` — CUnit tests pass |
| 217 | +3. `angreal test functional` — all SQL tests pass including new `32_dialect_parity.sql` |
| 218 | +4. `angreal test all` — no regressions |
| 219 | + |
| 220 | +## Status Updates |
| 221 | + |
| 222 | +### 2026-02-07: Code Review & Triage |
| 223 | + |
| 224 | +**Code areas reviewed:** |
| 225 | +- `src/backend/parser/cypher_gram.y` — `stmt` rule (line 159), property access (lines 1002-1015), subscript (lines 1023-1039), BQIDENT usage (25 sites), `%expect 4`/`%expect-rr 3` (line 35-36), `%left '.'` (line 152), `%dprec` on list_literal/list_comprehension (lines 996-997) |
| 226 | +- `src/backend/transform/transform_expr_ops.c` — `transform_property_access()` (line 314), hard rejects non-IDENTIFIER base at line 319 with "Complex property access not yet supported" |
| 227 | +- `src/backend/transform/transform_return.c` — `AST_NODE_SUBSCRIPT` handler (lines 661-691), generates `json_extract()` with negative-index support |
| 228 | + |
| 229 | +**Feature status after review:** |
| 230 | + |
| 231 | +| Feature | Status | Evidence | |
| 232 | +|---------|--------|----------| |
| 233 | +| Trailing semicolons | NOT SUPPORTED | `stmt` rule has no `;` production | |
| 234 | +| Backtick property access | NOT SUPPORTED | BQIDENT accepted in labels, variables, rel types, map keys — but NOT in property access (`IDENTIFIER '.' IDENTIFIER` only at line 1002) | |
| 235 | +| Nested dot access | NOT SUPPORTED | `transform_property_access()` rejects non-IDENTIFIER base at line 319 | |
| 236 | +| Bracket property access | PARTIAL | `IDENTIFIER '[' expr ']'` works (line 1023) but no chaining (`expr '[' expr ']'` missing) | |
| 237 | +| IN list literals | ALREADY WORKS | 47+ tests passing in `tests/functional/23_in_operator.sql` using `["Alice", "Bob"]` syntax | |
| 238 | + |
| 239 | +**Test harness findings:** |
| 240 | +- `tests/functional/09_edge_cases.sql:199-202` — Nested dot access test **explicitly SKIPPED** (`n.level1.level2.level3`) |
| 241 | +- `tests/functional/23_in_operator.sql` — IN with list literals extensively tested and **passing** (strings, ints, NOT IN, edge cases) |
| 242 | +- No existing tests for trailing semicolons, bracket access chaining, or backtick property access |
| 243 | +- No tests are erroneously failing for these features — they're either skipped or not tested |
| 244 | + |
| 245 | +**Revised scope — IN list literals should be removed from acceptance criteria (already done). Remaining work:** |
| 246 | +1. Trailing semicolons — grammar-only, trivial |
| 247 | +2. Backtick property access — add `BQIDENT` variants to property access rules in `primary_expr` |
| 248 | +3. Nested dot access — move property access to `expr` rule + update transform layer |
| 249 | +4. Bracket access chaining — move subscript to `expr` rule + transform normalization for string keys |
0 commit comments