Commit aae4e4e: Add AST translation strategy document (parent d2dafb5)

2 files changed, 163 additions, 0 deletions


`AST_RESEARCH.md` (101 additions, 0 deletions)

# PostgreSQL AST Research

This document summarizes differences between the AST type definitions shipped with the `libpg-query-node` project. Each PostgreSQL version directory under `types/` contains TypeScript definitions generated from libpg_query's protobuf description of the PostgreSQL parse tree.

## Versioned type files

| Version | Lines in `types.ts` | Lines in `enums.ts` |
|---------|---------------------|---------------------|
| 13 | 2098 | 61 |
| 14 | 2159 | 61 |
| 15 | 2219 | 63 |
| 16 | 2325 | 68 |
| 17 | 2484 | 75 |

Line counts grow steadily from release to release and serve as a rough proxy for increasing AST complexity.

## Diff sizes between versions

Measured using `diff -u` on the generated `types.ts` and `enums.ts` files.

| Versions compared | Diff lines `types.ts` | Diff lines `enums.ts` |
|-------------------|-----------------------|-----------------------|
| 13 → 14 | 287 | 55 |
| 14 → 15 | 304 | 47 |
| 15 → 16 | 2634 | 43 |
| 16 → 17 | 401 | 58 |
| 13 → 17 | 2911 | 91 |

Versions 13–15 show relatively small changes. The diff between PG15 and PG16 jumps to over 2,600 lines, pointing to large parser changes; note, however, that `types.ts` grows by only about 100 lines in that step (2219 → 2325), so part of this diff likely reflects reordered or regenerated definitions rather than new node types alone. PG17 differs from PG16 by about 400 lines.
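
A rough way to reproduce such counts without shelling out to `diff` is to count lines added plus lines removed using a longest-common-subsequence (LCS) table. `changedLineCount` below is a hypothetical helper, not the tool used to build the table, and its results may differ from the numbers above because `diff -u` output also includes hunk headers and context lines. Since a moved block counts as both a removal and an addition, reordering alone can inflate the result.

```typescript
// Count lines added plus lines removed between two texts, using the length of
// their longest common subsequence (LCS) of lines. O(n*m) time, O(m) memory,
// which is acceptable for files of a few thousand lines.
function changedLineCount(oldText: string, newText: string): number {
  const a = oldText.split("\n");
  const b = newText.split("\n");
  // prev[j] and curr[j] hold the LCS length of a[0..i) and b[0..j).
  let prev = new Uint32Array(b.length + 1);
  let curr = new Uint32Array(b.length + 1);
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      curr[j] =
        a[i - 1] === b[j - 1]
          ? prev[j - 1] + 1
          : Math.max(prev[j], curr[j - 1]);
    }
    // Reuse buffers: the row just computed becomes the previous row.
    [prev, curr] = [curr, prev];
  }
  const lcs = prev[b.length];
  // Every line outside the LCS was either removed from `a` or added in `b`.
  return a.length + b.length - 2 * lcs;
}
```

To compare two versions, read for example `types/15/types.ts` and `types/16/types.ts` with Node's `fs.readFileSync(path, "utf8")` and pass both strings in.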

## Observed differences

A brief inspection of the diffs highlights:

- New enum values and types across versions (e.g. additional `WCOKind` options in 15, the new `JsonEncoding` enum in 16).
- New node interfaces such as `ReturnStmt`, `PLAssignStmt`, and JSON-related constructs appearing in later versions.
- Renamed or moved fields in some existing structures (for example, `relkind` → `objtype` in `AlterTableStmt`).

### Enum differences by version

| Version | New enum types |
|---------|----------------|
| 14 | SetQuantifier |
| 15 | AlterPublicationAction, PublicationObjSpecType |
| 16 | JsonConstructorType, JsonEncoding, JsonFormatType, JsonValueType, PartitionStrategy |
| 17 | JsonBehaviorType, JsonExprOp, JsonQuotes, JsonTableColumnType, JsonWrapper, MergeMatchKind, TableFuncType |
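
Tables like this one can be regenerated by comparing the exported declaration names of two versions. This minimal sketch assumes each generated declaration begins at the start of a line as `export enum`, `export type`, or `export interface`; `declaredNames` and `addedNames` are hypothetical helpers, and the pattern may need adjusting to match the actual generated files.

```typescript
// Collect the names of exported enums, type aliases, and interfaces declared
// at the start of a line in a generated definitions file.
function declaredNames(source: string): Set<string> {
  const names = new Set<string>();
  const pattern = /^export\s+(?:declare\s+)?(?:enum|type|interface)\s+(\w+)/gm;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    names.add(match[1]);
  }
  return names;
}

// Names declared in the newer file but absent from the older one, sorted.
function addedNames(oldSource: string, newSource: string): string[] {
  const before = declaredNames(oldSource);
  return Array.from(declaredNames(newSource))
    .filter((name) => !before.has(name))
    .sort();
}
```

Passing the `enums.ts` contents of two versions lists the new enum types; passing `types.ts` contents lists new interfaces in the same way.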

### Scalar node changes

The basic scalar nodes were refactored in PG15. Before that release, the `String`, `BitString`, and `Float` nodes all stored their value in a generic `str` field. From PG15 onward, each uses a type-specific field, and a new `Boolean` node was added:

- `String` with field `sval`
- `BitString` with field `bsval`
- `Float` with field `fval`
- `Boolean` (new) with field `boolval`

| Version | String field | BitString field | Float field | Boolean field |
|---------|--------------|-----------------|-------------|---------------|
| 13–14 | `str` | `str` | `str` | n/a |
| 15+ | `sval` | `bsval` | `fval` | `boolval` |

These nodes keep the same role but use more explicit property names. Translating from PG13/14 to PG17 therefore requires renaming these fields when constructing the newer AST representation.
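
A per-node transform for this rename can be very small. The sketch below uses simplified, hypothetical shapes that assume the JSON form in which each node is wrapped in an object keyed by its type name (for example `{ String: { str: "abc" } }`); the real interfaces in `types/14/types.ts` and `types/15/types.ts` may carry more detail. `Boolean` has no PG13/14 counterpart in the table, so it is left out.

```typescript
// Simplified, hypothetical shapes of the wrapped scalar nodes before PG15...
type ScalarPg14 =
  | { String: { str: string } }
  | { BitString: { str: string } }
  | { Float: { str: string } };

// ...and from PG15 onward, with type-specific field names.
type ScalarPg15 =
  | { String: { sval: string } }
  | { BitString: { bsval: string } }
  | { Float: { fval: string } };

// Pure transform: same node type, same value, new field name.
function upgradeScalar(node: ScalarPg14): ScalarPg15 {
  if ("String" in node) return { String: { sval: node.String.str } };
  if ("BitString" in node) return { BitString: { bsval: node.BitString.str } };
  return { Float: { fval: node.Float.str } };
}
```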

These changes indicate incremental evolution in the ASTs, with PG16 introducing the most significant updates.

### Renamed fields

| From | To | Node type | Introduced in |
|------|----|-----------|---------------|
| `relkind` | `objtype` | AlterTableStmt / CreateTableAsStmt | PG14 |
| `tables` | `pubobjects` | CreatePublicationStmt / AlterPublicationStmt | PG15 |
| `tableAction` | `action` | AlterPublicationStmt | PG15 |
| `varnosyn` & `varattnosyn` | `varnullingrels` | Var | PG16 |
| `aggtranstype` | `aggtransno` | Aggref | PG16 |

## Version similarity

Based on diff sizes, PG13 and PG14 are close, as are PG14 and PG15. PG16 introduces major differences, likely due to language features such as the SQL/JSON enhancements. PG17 adjusts the AST again but retains most PG16 structures. Thus PG13–15 form one similar group and PG16–17 another.

## Viability of translation (PG13 → PG17)

Translating PG13 ASTs forward to PG17 (with no downgrade path) appears plausible. Many node types remain compatible, and the differences are largely additive. A translation layer would need to:

1. Map renamed fields (e.g. `relkind` to `objtype`).
2. Populate newly introduced fields with defaults or derived values.
3. Handle fields present in PG13 that were later removed or deprecated.

Because PG16 introduced large changes, translating from PG13 to PG17 may be easiest when bridged through PG16. Still, each version's ASTs are defined in TypeScript, so programmatic transforms are feasible.
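
The three steps above can be expressed as data-driven rules per node type. In this sketch, `NodeUpgradeSpec`, `applySpec`, and `DOCUMENTED_RENAMES` are hypothetical names; only the `relkind` → `objtype` and `tableAction` → `action` renames come from the "Renamed fields" table, and any defaults or removed fields would have to be filled in per release. Rows such as `tables` → `pubobjects` or `varnosyn`/`varattnosyn` → `varnullingrels` merge or restructure fields rather than simply renaming them, so they would need dedicated functions.

```typescript
// Hypothetical per-node upgrade rule covering the three steps listed above.
interface NodeUpgradeSpec {
  renames?: Record<string, string>;   // step 1: old field name -> new field name
  defaults?: Record<string, unknown>; // step 2: values for newly introduced fields
  removed?: string[];                 // step 3: fields that no longer exist
}

type NodeFields = Record<string, unknown>;

// Own-property check, so inherited names like "toString" are never matched.
const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

function applySpec(fields: NodeFields, spec: NodeUpgradeSpec): NodeFields {
  const renames: Record<string, string> = spec.renames ?? {};
  const removed = new Set(spec.removed ?? []);
  const out: NodeFields = {};
  for (const [name, value] of Object.entries(fields)) {
    if (removed.has(name)) continue; // drop fields the newer version lacks
    out[hasOwn(renames, name) ? renames[name] : name] = value;
  }
  for (const [name, value] of Object.entries(spec.defaults ?? {})) {
    if (!hasOwn(out, name)) out[name] = value; // only fill absent fields
  }
  return out;
}

// Pure renames taken from the "Renamed fields" table, keyed by node type.
const DOCUMENTED_RENAMES: Record<string, NodeUpgradeSpec> = {
  AlterTableStmt: { renames: { relkind: "objtype" } },
  CreateTableAsStmt: { renames: { relkind: "objtype" } },
  AlterPublicationStmt: { renames: { tableAction: "action" } },
};
```

For example, `applySpec({ relkind: "OBJECT_TABLE", cmds: [] }, DOCUMENTED_RENAMES.AlterTableStmt)` returns `{ objtype: "OBJECT_TABLE", cmds: [] }`.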

### New interface nodes

| Version | Interfaces added |
|---------|------------------|
| 14 | CTECycleClause, CTESearchClause, PLAssignStmt, ReturnStmt, StatsElem |
| 15 | AlterDatabaseRefreshCollStmt, Boolean, MergeAction, MergeStmt, MergeWhenClause, PublicationObjSpec, PublicationTable |
| 16 | JsonAggConstructor, JsonArrayAgg, JsonArrayConstructor, JsonArrayQueryConstructor, JsonConstructorExpr, JsonFormat, JsonIsPredicate, JsonKeyValue, JsonObjectAgg, JsonObjectConstructor, JsonOutput, JsonReturning, JsonValueExpr, RTEPermissionInfo |
| 17 | JsonArgument, JsonBehavior, JsonExpr, JsonFuncExpr, JsonParseExpr, JsonScalarExpr, JsonSerializeExpr, JsonTable, JsonTableColumn, JsonTablePath, JsonTablePathScan, JsonTablePathSpec, JsonTableSiblingJoin, MergeSupportFunc, SinglePartitionSpec, WindowFuncRunCondition |

## Conclusion

The repository already provides versioned definitions that can be compared programmatically. Diff metrics suggest PG13–15 are most similar, while PG16 marks a major jump and PG17 follows its design. Building an automated translation will require detailed mapping but appears viable, particularly when only upgrading ASTs.

`AST_TRANSLATION.md` (62 additions, 0 deletions)

# AST Translation Strategies

This document explores approaches for translating PostgreSQL ASTs between the versioned type definitions under `types/`.

## Goals

- Upgrade an AST produced by an older PostgreSQL release so that it conforms to the latest definitions
- Support forward translation only; no downgrade path is needed
- Keep the process transparent and manageable as new versions appear

## Design options

### Functional transforms

One model is a set of pure functions, each responsible for upgrading a single node type. Each function would:

1. Accept an instance of a node from the older version
2. Produce the equivalent structure for the newer version
3. Rename fields or populate new defaults as required

Benefits:

- Fine-grained and testable; each function does one thing well
- Easier to reason about nodes whose fields changed, such as `String` or `Var`
- Composable: an overall upgrade is just a pipeline of node-level transforms
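
The node-level model can be sketched as a registry of pure functions keyed by node type, plus a generic walker that applies them bottom-up. This assumes the JSON AST form in which each node is an object keyed by its type name; `NodeTransform`, `upgradeTree`, and `transforms14to15` are hypothetical names, and only the documented `String` rename (`str` → `sval`) is filled in. The walker treats any key matching a registered node type as a node wrapper, which relies on node type names not colliding with field names.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type Fields = { [key: string]: Json };

// A pure function that upgrades the fields of one node type across one release.
type NodeTransform = (fields: Fields) => Fields;

// Hypothetical registry for the 14 -> 15 boundary; only String is filled in.
const transforms14to15: Record<string, NodeTransform> = {
  String: (fields) => {
    if (!("str" in fields)) return fields;
    const { str, ...rest } = fields;
    return { ...rest, sval: str };
  },
};

// Walk the tree bottom-up. Any key that names a registered node type is
// treated as a node wrapper, and its (already upgraded) fields are transformed.
function upgradeTree(value: Json, transforms: Record<string, NodeTransform>): Json {
  if (Array.isArray(value)) return value.map((item) => upgradeTree(item, transforms));
  if (value === null || typeof value !== "object") return value;
  const out: Fields = {};
  for (const [key, child] of Object.entries(value)) {
    const upgraded = upgradeTree(child, transforms);
    const transform = Object.prototype.hasOwnProperty.call(transforms, key)
      ? transforms[key]
      : undefined;
    if (transform && upgraded !== null && typeof upgraded === "object" && !Array.isArray(upgraded)) {
      out[key] = transform(upgraded);
    } else {
      out[key] = upgraded;
    }
  }
  return out;
}
```

Because each transform only sees one node's fields, it can be unit-tested in isolation, and a full release boundary is just a larger registry passed to the same walker.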

### Nested deparser / reparser

Another idea is to build a deparser that understands multiple versions simultaneously. It would read an AST using the old types, emit SQL, and immediately reparse that SQL with the newest parser to obtain an AST in the latest format. This could be structured as a visitor that walks the old AST and writes out SQL.

Benefits:

- Eliminates manual field mapping by relying on the parser to create valid nodes
- Might automatically handle edge cases where semantics changed

Trade-offs:

- Performance cost of serializing and parsing again
- Potential loss of fidelity if certain node properties do not survive the round trip
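
The round trip can be sketched with the parser and deparsers injected as plain functions. `ReparseTools` and `upgradeViaReparse` are hypothetical; they stand in for whichever old-version deparser and PG17 parser and deparser are actually available and do not reflect a real library API. Comparing the re-deparsed SQL with the intermediate SQL is a cheap, strict fidelity heuristic: a mismatch flags a tree worth inspecting, though formatting differences between deparsers can also trigger it. Any `location` offsets in the reparsed AST point into the regenerated SQL rather than the original query text, which is one concrete way fidelity can be lost.

```typescript
// Placeholders for whatever old-version deparser and new-version parser and
// deparser are available; these are not a real library API.
interface ReparseTools<OldAst, NewAst> {
  deparseOld: (ast: OldAst) => string;
  parseNew: (sql: string) => NewAst;
  deparseNew: (ast: NewAst) => string;
}

interface ReparseResult<NewAst> {
  ast: NewAst;
  sql: string;
  // False when deparsing the new AST does not reproduce the intermediate SQL,
  // which flags trees worth inspecting for lost or reinterpreted properties.
  roundTripStable: boolean;
}

function upgradeViaReparse<OldAst, NewAst>(
  oldAst: OldAst,
  tools: ReparseTools<OldAst, NewAst>,
): ReparseResult<NewAst> {
  const sql = tools.deparseOld(oldAst); // serialize with the old-version deparser
  const ast = tools.parseNew(sql);      // reparse with the newest parser
  const roundTripStable = tools.deparseNew(ast) === sql;
  return { ast, sql, roundTripStable };
}
```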

## Translation step ordering

### Sequential upgrades

A straightforward approach is to perform sequential upgrades: 13 → 14 → 15 → 16 → 17. Each step focuses on the incremental changes in one release. This keeps functions small and lets existing transforms be reused when support for new versions is added.
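
The sequential chain can be driven by a small loop that keeps applying the step for the AST's current version until it reaches the latest. In this hypothetical sketch, each AST carries an explicit `version` tag and the steps are placeholders that only check and bump it; a real step would apply that release's node-level transforms. `VersionedAst`, `STEPS`, and `upgradeToLatest` are illustrative names.

```typescript
// Hypothetical AST envelope carrying the PostgreSQL major version it conforms to.
interface VersionedAst {
  version: number;
  stmts: unknown[];
}

type UpgradeStep = (ast: VersionedAst) => VersionedAst;

const LATEST = 17;

// Placeholder step: a real one would apply the node-level transforms for this
// release boundary to `stmts`; this one only checks and bumps the version tag.
function placeholderStep(from: number): UpgradeStep {
  return (ast) => {
    if (ast.version !== from) {
      throw new Error(`step expects PG${from}, got PG${ast.version}`);
    }
    return { ...ast, version: from + 1 };
  };
}

// One step per release boundary: 13 -> 14, 14 -> 15, 15 -> 16, 16 -> 17.
const STEPS: Record<number, UpgradeStep> = {
  13: placeholderStep(13),
  14: placeholderStep(14),
  15: placeholderStep(15),
  16: placeholderStep(16),
};

// Apply steps in order until the AST reaches the latest supported version.
function upgradeToLatest(ast: VersionedAst): VersionedAst {
  if (ast.version > LATEST) {
    throw new Error(`PG${ast.version} is newer than PG${LATEST}`);
  }
  let current = ast;
  while (current.version < LATEST) {
    const step = STEPS[current.version];
    if (!step) throw new Error(`no upgrade step from PG${current.version}`);
    current = step(current);
  }
  return current;
}
```

Supporting a new release would then mean adding one entry to `STEPS` and raising `LATEST`, leaving the existing steps untouched.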

### Direct upgrades

Alternatively, we could implement direct translations for each older version to the newest (13 → 17, 14 → 17, 15 → 17). This avoids running multiple steps but requires larger, more complex functions because they must handle every change introduced across several releases at once.

### Which is better?

Sequential upgrades favor simplicity and reuse. Most changes between 13 and 15 are minor, while 16 introduces significant restructuring (see `AST_RESEARCH.md`). Incremental steps let us address each release's differences in isolation. Direct upgrades may be feasible over short spans such as 16 → 17, but any direct path that crosses the 15 → 16 boundary (15 → 17, and especially 13 → 17) must absorb the largest set of changes at once.

## Recommended plan

1. Implement functional node-level transforms for each release boundary, starting with 13 → 14.
2. Compose those functions so that upgrading from any supported version to 17 is just a series of transformations.
3. Provide utilities for renaming fields (e.g. `str` → `sval` in `String` nodes) and filling defaults for new enum values or optional fields.
4. Optionally, develop a proof-of-concept reparse approach for comparison, but keep the functional pipeline as the core strategy.

By building translation functions per version, we keep the code maintainable and make it easy to add support for future releases.
