| 1 | +# PostgreSQL AST Research |
| 2 | + |
| 3 | +This document summarizes differences in the AST type definitions shipped with the `libpg-query-node` project. Each PostgreSQL version directory under `types/` contains TypeScript definitions generated from PostgreSQL's protobuf specification. |
| 4 | + |
| 5 | +## Versioned type files |
| 6 | + |
| 7 | +| Version | Lines in `types.ts` | Lines in `enums.ts` | |
| 8 | +|---------|--------------------|---------------------| |
| 9 | +| 13 | 2098 | 61 | |
| 10 | +| 14 | 2159 | 61 | |
| 11 | +| 15 | 2219 | 63 | |
| 12 | +| 16 | 2325 | 68 | |
| 13 | +| 17 | 2484 | 75 | |
| 14 | + |
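| 15 | +Line counts in `types.ts` grow with every release, while `enums.ts` stays flat from PG13 to PG14 and grows afterwards. These counts are only a rough proxy for growing AST complexity. |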
| 16 | + |
| 17 | +## Diff sizes between versions |
| 18 | + |
| 19 | +Measured using `diff -u` on the generated `types.ts` and `enums.ts` files. |
| 20 | + |
| 21 | +| Versions compared | Diff lines `types.ts` | Diff lines `enums.ts` | |
| 22 | +|-------------------|----------------------|----------------------| |
| 23 | +| 13 → 14 | 287 | 55 | |
| 24 | +| 14 → 15 | 304 | 47 | |
| 25 | +| 15 → 16 | 2634 | 43 | |
| 26 | +| 16 → 17 | 401 | 58 | |
| 27 | +| 13 → 17 | 2911 | 91 | |
| 28 | + |
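| 29 | +Versions 13–15 show relatively small changes, at about 300 diff lines per step in `types.ts`. The PG15 → PG16 step is far larger, with over 2,600 diff lines, even though `types.ts` itself grew by only 106 lines (2219 → 2325). This suggests that much of the diff comes from reordered or restructured definitions rather than from new nodes alone. PG17 differs from PG16 by about 400 lines. |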
| 30 | + |
| 31 | +## Observed differences |
| 32 | + |
| 33 | +A brief inspection of the diffs highlights: |
| 34 | + |
| 35 | +- New enum values across versions (e.g. additional `WCOKind` options in 15, `JsonEncoding` variants in 16). |
| 36 | +- New node interfaces such as `ReturnStmt`, `PLAssignStmt`, and JSON-related constructs appearing in later versions. |
| 37 | +- Some existing structures rename or move fields (for example `relkind` → `objtype` in `AlterTableStmt`). |
| 38 | +### Enum differences by version |
| 39 | + |
| 40 | +| Version | New enum types | |
| 41 | +|---------|----------------| |
| 42 | +| 14 | SetQuantifier | |
| 43 | +| 15 | AlterPublicationAction, PublicationObjSpecType | |
| 44 | +| 16 | JsonConstructorType, JsonEncoding, JsonFormatType, JsonValueType, PartitionStrategy | |
| 45 | +| 17 | JsonBehaviorType, JsonExprOp, JsonQuotes, JsonTableColumnType, JsonWrapper, MergeMatchKind, TableFuncType | |
| 46 | + |
| 47 | +### Enum value shifts |
| 48 | + |
| 49 | +The numeric assignments within several enums changed between releases. The table |
| 50 | +below lists notable examples. |
| 51 | + |
| 52 | +| Enum | Changed in | Notes | |
| 53 | +|------|------------|-------| |
| 54 | +| `A_Expr_Kind` | PG14 | Removed `AEXPR_OF` and `AEXPR_PAREN`, causing indices to shift | |
| 55 | +| `RoleSpecType` | PG14 | Added `ROLESPEC_CURRENT_ROLE` at position 1 | |
| 56 | +| `TableLikeOption` | PG14 | Added `CREATE_TABLE_LIKE_COMPRESSION` at position 1 | |
| 57 | +| `WCOKind` | PG15 | Added `WCO_RLS_MERGE_UPDATE_CHECK` and `WCO_RLS_MERGE_DELETE_CHECK` | |
| 58 | +| `ObjectType` | PG15 | Inserted `OBJECT_PUBLICATION_NAMESPACE` and `OBJECT_PUBLICATION_REL` before existing entries | |
| 59 | +| `JoinType` | PG16 | Added `JOIN_RIGHT_ANTI`, shifting subsequent values | |
| 60 | +| `AlterTableType` | PG16–17 | Many values renumbered; PG17 introduces `AT_SetExpression` | |
| 61 | +| `Token` | multiple | Token list grows each release, with new codes inserted | |
| 62 | + |
| 63 | +Counting all enums, roughly **11** changed between PG13 and PG14, **8** changed from PG14 to PG15, **8** changed from PG15 to PG16, and **10** changed from PG16 to PG17. |
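The index shifts described above can be detected mechanically by comparing the ordered member lists of an enum in two versions. The following is a minimal sketch, not part of the repository: `findShifts` and `EnumShift` are hypothetical names, the `RoleSpecType` lists are written out by hand for illustration (in practice they would be read from each version's `enums.ts`), and positions are assumed to be 0-based as in the table.

```typescript
// Ordered member lists, assumed to follow the declaration order (0-based).
// Hand-written for illustration; read them from each version's enums.ts in practice.
const roleSpecTypePg13: string[] = [
  "ROLESPEC_CSTRING",
  "ROLESPEC_CURRENT_USER",
  "ROLESPEC_SESSION_USER",
  "ROLESPEC_PUBLIC",
];
const roleSpecTypePg14: string[] = [
  "ROLESPEC_CSTRING",
  "ROLESPEC_CURRENT_ROLE",
  "ROLESPEC_CURRENT_USER",
  "ROLESPEC_SESSION_USER",
  "ROLESPEC_PUBLIC",
];

interface EnumShift {
  name: string;
  from: number;
  to: number | null; // null means the member no longer exists
}

// Report every member whose position differs between the two versions.
function findShifts(oldMembers: string[], newMembers: string[]): EnumShift[] {
  const shifts: EnumShift[] = [];
  oldMembers.forEach((name, from) => {
    const idx = newMembers.indexOf(name);
    const to = idx === -1 ? null : idx;
    if (to !== from) {
      shifts.push({ name, from, to });
    }
  });
  return shifts;
}

const roleSpecShifts = findShifts(roleSpecTypePg13, roleSpecTypePg14);
```

Here `roleSpecShifts` lists the three members that follow the inserted `ROLESPEC_CURRENT_ROLE`, each moved up by one position. Matching by name rather than by index is the design choice that keeps a translation stable when members are inserted or removed.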
| 64 | + |
| 65 | + |
| 66 | +### Scalar node changes |
| 67 | + |
| 68 | +The basic scalar nodes were refactored in PG15. Before that release, the `String`, `BitString`, and `Float` nodes all carried a generic `str` field. From PG15 onward each node uses a type-specific field, and a `Boolean` node was added: |
| 69 | + |
| 70 | +- `String` with field `sval` |
| 71 | +- `BitString` with field `bsval` |
| 72 | +- `Float` with field `fval` |
| 73 | +- A new `Boolean` node with field `boolval` |
| 74 | + |
| 75 | +| Version | String field | BitString field | Float field | Boolean field | |
| 76 | +|---------|--------------|-----------------|-------------|---------------| |
| 77 | +| 13–14 | `str` | `str` | `str` | n/a | |
| 78 | +| 15+ | `sval` | `bsval` | `fval` | `boolval` | |
| 79 | + |
| 80 | +These nodes keep the same role but use more explicit property names. Translating from PG13/14 to PG17 therefore requires renaming these fields when constructing the newer AST representation. |
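The scalar renames above could be applied with a small helper. This is a sketch under stated assumptions: it assumes each node arrives as a single-key wrapper object such as `{ String: { str: "foo" } }`, and `upgradeScalarNode` and `scalarFieldRenames` are hypothetical names rather than anything shipped with the project.

```typescript
type AstNode = Record<string, unknown>;

// Node tag -> field name used from PG15 onward (taken from the table above).
const scalarFieldRenames = new Map<string, string>([
  ["String", "sval"],
  ["BitString", "bsval"],
  ["Float", "fval"],
]);

// Return a copy of the node with `str` renamed, or the node itself if nothing applies.
function upgradeScalarNode(node: AstNode): AstNode {
  for (const [tag, body] of Object.entries(node)) {
    const newField = scalarFieldRenames.get(tag);
    if (newField === undefined || typeof body !== "object" || body === null) {
      continue;
    }
    const { str, ...rest } = body as Record<string, unknown>;
    if (str === undefined) {
      continue; // already in the newer shape, or an empty node
    }
    return { ...node, [tag]: { ...rest, [newField]: str } };
  }
  return node;
}

const upgradedString = upgradeScalarNode({ String: { str: "foo" } });
```

Nodes that are not scalar wrappers, or that already use the newer field names, pass through unchanged, so the helper can be applied while walking a whole tree.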
| 81 | + |
| 82 | +Taken together, the differences above indicate incremental evolution in the ASTs, with PG16 introducing the most significant updates. |
| 83 | +### Renamed fields |
| 84 | + |
| 85 | +| From | To | Node type | Introduced in | |
| 86 | +|------|----|-----------|--------------| |
| 87 | +| `relkind` | `objtype` | AlterTableStmt / CreateTableAsStmt | PG14 | |
| 88 | +| `tables` | `pubobjects` | CreatePublicationStmt / AlterPublicationStmt | PG15 | |
| 89 | +| `tableAction` | `action` | AlterPublicationStmt | PG15 | |
| 90 | +| `varnosyn` & `varattnosyn` | `varnullingrels` | Var | PG16 | |
| 91 | +| `aggtranstype` | `aggtransno` | Aggref | PG16 | |
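Renames like these lend themselves to a table-driven helper. The sketch below is illustrative only: `fieldRenames` and `renameFields` are hypothetical names, and it covers just the entries that look like plain key renames. Other rows may also require the field's value to be transformed, which this helper does not attempt.

```typescript
type Fields = Record<string, unknown>;

// Node type -> (old field name -> new field name), copied from the table above.
const fieldRenames = new Map<string, Map<string, string>>([
  ["AlterTableStmt", new Map([["relkind", "objtype"]])],
  ["CreateTableAsStmt", new Map([["relkind", "objtype"]])],
  ["AlterPublicationStmt", new Map([["tableAction", "action"]])],
]);

// Return a shallow copy of `fields` with any known old keys renamed.
function renameFields(nodeType: string, fields: Fields): Fields {
  const renames = fieldRenames.get(nodeType);
  if (renames === undefined) {
    return fields;
  }
  const out: Fields = {};
  for (const [key, value] of Object.entries(fields)) {
    out[renames.get(key) ?? key] = value;
  }
  return out;
}

const renamedStmt = renameFields("AlterTableStmt", {
  relkind: "OBJECT_TABLE",
  missing_ok: true,
});
```

Keeping the mapping as data means each new rename is a one-line addition, and the same table can double as documentation of what changed per node type.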
| 92 | + |
| 93 | +### Enum representation changes |
| 94 | + |
| 95 | +Historically libpg_query exposed enum fields in the JSON output as **numeric** |
| 96 | +codes. Starting with the PG15 bindings this switched to returning the **string** |
| 97 | +name of each enum value. The TypeScript type definitions reflect string literal |
| 98 | +unions across all versions, but the underlying JSON changed in PG15. |
| 99 | + |
| 100 | +| Version | Enum format | |
| 101 | +|---------|-------------| |
| 102 | +| 13–14 | integers | |
| 103 | +| 15–17 | strings | |
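A translation layer would likely want to normalize both formats to string names before doing anything else. The following sketch assumes that an integer code indexes directly into the enum's ordered member list for the same version; that assumption should be checked against real parser output, since an offset may be needed. `enumName` is a hypothetical helper, and the `SetQuantifier` list is written out by hand for illustration.

```typescript
// Ordered members of SetQuantifier (new in PG14), hand-written for illustration.
const setQuantifierPg14: readonly string[] = [
  "SET_QUANTIFIER_DEFAULT",
  "SET_QUANTIFIER_ALL",
  "SET_QUANTIFIER_DISTINCT",
];

// Accept either representation and always return the string name.
// Assumption: integer codes are direct indexes into `members`.
function enumName(value: number | string, members: readonly string[]): string {
  if (typeof value === "string") {
    return value;
  }
  const name = members[value];
  if (name === undefined) {
    throw new RangeError(`no enum member at index ${value}`);
  }
  return name;
}

const fromInteger = enumName(2, setQuantifierPg14);
const fromString = enumName("SET_QUANTIFIER_ALL", setQuantifierPg14);
```

Once every enum field is a string name, the numeric shifts between versions stop mattering and only added or removed names need handling.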
| 104 | + |
| 105 | + |
| 106 | +## Version similarity |
| 107 | + |
| 108 | +Based on diff sizes, PG13 and PG14 are close, as are PG14 and PG15. PG16 introduces major differences, likely due to language features such as the SQL/JSON enhancements. PG17 again adjusts the AST but retains most PG16 structures. Thus PG13–15 form one similar group and PG16–17 another. |
| 109 | + |
| 110 | +## Viability of translation (PG13 → PG17) |
| 111 | + |
| 112 | +Considering upgrades only (older to newer), translating PG13 ASTs to PG17 is plausible. Many node types remain compatible, and differences are largely additive. A translation layer would need to: |
| 113 | + |
| 114 | +1. Map renamed fields (e.g. `relkind` to `objtype`). |
| 115 | +2. Populate newly introduced fields with defaults or derived values. |
| 116 | +3. Handle removed or deprecated fields when present in PG13. |
| 117 | + |
| 118 | +Because PG16 introduced large changes, direct translation from PG13 to PG17 may require bridging PG16 first. Still, each version’s ASTs are defined in TypeScript, so programmatic transforms are feasible. |
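One way to structure such a layer is as a chain of single-version steps, so that the large PG15 → PG16 jump is isolated in one place. The sketch below is a possible shape rather than an existing implementation: `UpgradeStep`, `defaultSteps`, and `upgradeAst` are hypothetical names, and the step bodies are identity placeholders where the real field renames, defaults, and removals would go.

```typescript
type Ast = Record<string, unknown>;

interface UpgradeStep {
  from: number;
  to: number;
  upgrade: (ast: Ast) => Ast;
}

// Placeholder steps; each would hold the renames, defaults, and removals
// for exactly one version boundary.
const defaultSteps: UpgradeStep[] = [
  { from: 13, to: 14, upgrade: (ast) => ast },
  { from: 14, to: 15, upgrade: (ast) => ast },
  { from: 15, to: 16, upgrade: (ast) => ast },
  { from: 16, to: 17, upgrade: (ast) => ast },
];

// Apply the steps in order until the target version is reached.
function upgradeAst(
  ast: Ast,
  from: number,
  to: number,
  steps: UpgradeStep[] = defaultSteps,
): Ast {
  let current = ast;
  let version = from;
  while (version < to) {
    const step = steps.find((s) => s.from === version);
    if (step === undefined) {
      throw new Error(`no upgrade step starting at PG${version}`);
    }
    current = step.upgrade(current);
    version = step.to;
  }
  return current;
}

const upgraded = upgradeAst({ stmts: [] }, 13, 17);
```

Because each step only knows about one boundary, it can be tested against sample ASTs from the two adjacent versions without involving the rest of the chain.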
| 119 | +### New interface nodes |
| 120 | + |
| 121 | +| Version | Interfaces added | |
| 122 | +|---------|-----------------| |
| 123 | +| 14 | CTECycleClause, CTESearchClause, PLAssignStmt, ReturnStmt, StatsElem | |
| 124 | +| 15 | AlterDatabaseRefreshCollStmt, Boolean, MergeAction, MergeStmt, MergeWhenClause, PublicationObjSpec, PublicationTable | |
| 125 | +| 16 | JsonAggConstructor, JsonArrayAgg, JsonArrayConstructor, JsonArrayQueryConstructor, JsonConstructorExpr, JsonFormat, JsonIsPredicate, JsonKeyValue, JsonObjectAgg, JsonObjectConstructor, JsonOutput, JsonReturning, JsonValueExpr, RTEPermissionInfo | |
| 126 | +| 17 | JsonArgument, JsonBehavior, JsonExpr, JsonFuncExpr, JsonParseExpr, JsonScalarExpr, JsonSerializeExpr, JsonTable, JsonTableColumn, JsonTablePath, JsonTablePathScan, JsonTablePathSpec, JsonTableSiblingJoin, MergeSupportFunc, SinglePartitionSpec, WindowFuncRunCondition | |
| 127 | + |
| 128 | +## Generating AST Samples |
| 129 | + |
| 130 | +To fully understand structural differences we will compile **libpg-query** for |
| 131 | +each supported PostgreSQL version and capture JSON output for a library of |
| 132 | +representative queries. This multi-runtime parser setup lets us record actual |
| 133 | +ASTs from PG13 through PG17. These samples are essential for training upgrade |
| 134 | +logic and verifying enum representations: |
| 135 | + |
| 136 | +- PG13 and PG14 output enum values as integers |
| 137 | +- PG15+ output enums as their string names |
| 138 | + |
| 139 | +The generated samples will live under a dedicated directory and can be compared |
| 140 | +programmatically to spot changes beyond what the protobuf types reveal. |
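A programmatic comparison of two samples could start with something as simple as diffing the sets of key paths that occur in each JSON tree. This is a minimal sketch that works on already-parsed JSON and makes no assumption about how the samples are produced; `collectKeyPaths` and `diffKeyPaths` are hypothetical names, and the two inline objects stand in for real parser output.

```typescript
// Collect every object key path in a JSON value; array elements share the "[]" segment.
function collectKeyPaths(
  value: unknown,
  prefix = "",
  out: Set<string> = new Set<string>(),
): Set<string> {
  if (Array.isArray(value)) {
    for (const item of value) {
      collectKeyPaths(item, `${prefix}[]`, out);
    }
  } else if (typeof value === "object" && value !== null) {
    for (const [key, child] of Object.entries(value)) {
      const path = prefix === "" ? key : `${prefix}.${key}`;
      out.add(path);
      collectKeyPaths(child, path, out);
    }
  }
  return out;
}

// List key paths present in only one of the two samples.
function diffKeyPaths(older: unknown, newer: unknown): { removed: string[]; added: string[] } {
  const a = collectKeyPaths(older);
  const b = collectKeyPaths(newer);
  return {
    removed: Array.from(a).filter((p) => !b.has(p)).sort(),
    added: Array.from(b).filter((p) => !a.has(p)).sort(),
  };
}

// Stand-ins for the same literal parsed by an older and a newer version.
const sampleDiff = diffKeyPaths(
  { String: { str: "a" } },
  { String: { sval: "a" } },
);
```

This only compares structure, not values, so it would surface renamed or added fields but not changes such as integer enum codes becoming strings; a value-level comparison would be a natural second pass.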
| 141 | + |
| 142 | + |
| 143 | +## Conclusion |
| 144 | + |
| 145 | +The repository already provides versioned definitions which can be compared programmatically. Diff metrics suggest PG13–15 are most similar, while PG16 marks a major jump and PG17 follows that design. Building an automated translation will require detailed mapping but appears viable, particularly when only upgrading ASTs. |