Commit f701556

committed
setup
1 parent bf2e281 commit f701556

21 files changed

+7295
-12
lines changed

packages/transform/AST_RESEARCH.md

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
1+
# PostgreSQL AST Research
2+
3+
This document summarizes differences in the AST type definitions shipped with the `libpg-query-node` project. Each PostgreSQL version directory under `types/` contains TypeScript definitions generated from PostgreSQL's protobuf specification.
4+
5+
## Versioned type files
6+
7+
| Version | Lines in `types.ts` | Lines in `enums.ts` |
8+
|---------|--------------------|---------------------|
9+
| 13 | 2098 | 61 |
10+
| 14 | 2159 | 61 |
11+
| 15 | 2219 | 63 |
12+
| 16 | 2325 | 68 |
13+
| 17 | 2484 | 75 |
14+
15+
Line counts increase steadily and provide a proxy for growing AST complexity.
16+
17+
## Diff sizes between versions
18+
19+
Measured using `diff -u` on the generated `types.ts` and `enums.ts` files.
20+
21+
| Versions compared | Diff lines `types.ts` | Diff lines `enums.ts` |
22+
|-------------------|----------------------|----------------------|
23+
| 13 → 14 | 287 | 55 |
24+
| 14 → 15 | 304 | 47 |
25+
| 15 → 16 | 2634 | 43 |
26+
| 16 → 17 | 401 | 58 |
27+
| 13 → 17 | 2911 | 91 |
28+
29+
Versions 13–15 show relatively small changes. A dramatic increase occurs between PG15 and PG16 with over 2600 changed lines, reflecting large parser changes. PG17 differs from PG16 by about 400 lines.
30+
31+
## Observed differences
32+
33+
A brief inspection of the diffs highlights:
34+
35+
- New enum values across versions (e.g. additional `WCOKind` options in 15, `JsonEncoding` variants in 16).
36+
- New node interfaces such as `ReturnStmt`, `PLAssignStmt`, and JSON-related constructs appearing in later versions.
37+
- Some existing structures rename or move fields (for example `relkind` → `objtype` in `AlterTableStmt`).
38+
### Enum differences by version
39+
40+
| Version | New enum types |
41+
|---------|----------------|
42+
| 14 | SetQuantifier |
43+
| 15 | AlterPublicationAction, PublicationObjSpecType |
44+
| 16 | JsonConstructorType, JsonEncoding, JsonFormatType, JsonValueType, PartitionStrategy |
45+
| 17 | JsonBehaviorType, JsonExprOp, JsonQuotes, JsonTableColumnType, JsonWrapper, MergeMatchKind, TableFuncType |
46+
47+
### Enum value shifts
48+
49+
The numeric assignments within several enums changed between releases. The table
50+
below lists notable examples.
51+
52+
| Enum | Changed in | Notes |
53+
|------|------------|-------|
54+
| `A_Expr_Kind` | PG14 | Removed `AEXPR_OF` and `AEXPR_PAREN`, causing indices to shift |
55+
| `RoleSpecType` | PG14 | Added `ROLESPEC_CURRENT_ROLE` at position 1 |
56+
| `TableLikeOption` | PG14 | Added `CREATE_TABLE_LIKE_COMPRESSION` at position 1 |
57+
| `WCOKind` | PG15 | Added `WCO_RLS_MERGE_UPDATE_CHECK` and `WCO_RLS_MERGE_DELETE_CHECK` |
58+
| `ObjectType` | PG15 | Inserted `OBJECT_PUBLICATION_NAMESPACE` and `OBJECT_PUBLICATION_REL` before existing entries |
59+
| `JoinType` | PG16 | Added `JOIN_RIGHT_ANTI`, shifting subsequent values |
60+
| `AlterTableType` | PG16–17 | Many values renumbered; PG17 introduces `AT_SetExpression` |
61+
| `Token` | multiple | Token list grows each release, with new codes inserted |
62+
63+
Counting all enums, roughly **11** changed between PG13 and PG14, **8** changed from PG14 to PG15, **8** changed from PG15 to PG16, and **10** changed from PG16 to PG17.
64+
65+
66+
### Scalar node changes
67+
68+
The basic scalar nodes were refactored in PG15. Prior to that release the `String`, `BitString`, and `Float` nodes all carried a generic `str` field. From PG15 onward each node uses its own field name, and a `Boolean` node was added:
69+
70+
- `String` with field `sval`
71+
- `BitString` with field `bsval`
72+
- `Float` with field `fval`
73+
- A new `Boolean` node with field `boolval`
74+
75+
| Version | String field | BitString field | Float field | Boolean field |
76+
|---------|--------------|-----------------|-------------|---------------|
77+
| 13–14 | `str` | `str` | `str` | n/a |
78+
| 15+ | `sval` | `bsval` | `fval` | `boolval` |
79+
80+
These nodes keep the same role but use more explicit property names. Translating from PG13/14 to PG17 therefore requires renaming these fields when constructing the newer AST representation.
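
The rename described above can be sketched as a small recursive walk. This is a minimal illustration, not the project's implementation: `upgradeScalars` and `SCALAR_FIELD_RENAMES` are hypothetical names, and the sketch assumes nodes appear wrapped by type name (for example `{ String: { str: '...' } }`) in the JSON output.

```typescript
// Hypothetical helper: upgrades PG13/14 scalar nodes to the PG15+ field names.
// Assumption: nodes are wrapped by type name, e.g. { String: { str: 'abc' } }.
type AnyNode = { [key: string]: unknown };

// The old field is always `str`; the new name depends on the node type.
const SCALAR_FIELD_RENAMES: Record<string, string> = {
  String: 'sval',
  BitString: 'bsval',
  Float: 'fval',
};

function upgradeScalars(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(upgradeScalars);
  }
  if (value === null || typeof value !== 'object') {
    return value; // primitives pass through unchanged
  }
  const out: AnyNode = {};
  for (const [key, child] of Object.entries(value as AnyNode)) {
    const newField = SCALAR_FIELD_RENAMES[key];
    const isScalarNode =
      newField !== undefined &&
      child !== null &&
      typeof child === 'object' &&
      !Array.isArray(child) &&
      'str' in child;
    if (isScalarNode) {
      // Move the value from `str` to the version-specific field name.
      const { str, ...rest } = child as AnyNode;
      out[key] = { ...rest, [newField]: str };
    } else {
      out[key] = upgradeScalars(child);
    }
  }
  return out;
}
```

The function returns a new tree and leaves the input untouched, which keeps it easy to test against fixture files.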
81+
82+
These changes indicate incremental evolution in the ASTs, with PG16 introducing the most significant updates.
83+
### Renamed fields
84+
85+
| From | To | Node type | Introduced in |
86+
|------|----|-----------|--------------|
87+
| `relkind` | `objtype` | AlterTableStmt / CreateTableAsStmt | PG14 |
88+
| `tables` | `pubobjects` | CreatePublicationStmt / AlterPublicationStmt | PG15 |
89+
| `tableAction` | `action` | AlterPublicationStmt | PG15 |
90+
| `varnosyn` & `varattnosyn` | `varnullingrels` | Var | PG16 |
91+
| `aggtranstype` | `aggtransno` | Aggref | PG16 |
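
A field rename from the table above could be expressed with a small generic helper. The names `renameField` and `upgradeAlterTableStmt` are hypothetical and only illustrate the idea; the sketch assumes the statement body is a plain object and that the field value itself carries over unchanged.

```typescript
type PlainNode = Record<string, unknown>;

// Hypothetical helper: returns a copy of `node` with `from` renamed to `to`.
// If `from` is absent, the original node is returned as-is.
function renameField(node: PlainNode, from: string, to: string): PlainNode {
  if (!(from in node)) {
    return node;
  }
  const { [from]: value, ...rest } = node;
  return { ...rest, [to]: value };
}

// PG13 -> PG14: `relkind` becomes `objtype` on AlterTableStmt (per the table above).
function upgradeAlterTableStmt(stmt: PlainNode): PlainNode {
  return renameField(stmt, 'relkind', 'objtype');
}
```

For example, `upgradeAlterTableStmt({ relkind: 'OBJECT_TABLE' })` yields an object with `objtype: 'OBJECT_TABLE'` and no `relkind` key.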
92+
93+
### Enum representation changes
94+
95+
Historically libpg_query exposed enum fields in the JSON output as **numeric**
96+
codes. Starting with the PG15 bindings this switched to returning the **string**
97+
name of each enum value. The TypeScript type definitions reflect string literal
98+
unions across all versions, but the underlying JSON changed in PG15.
99+
100+
| Version | Enum format |
101+
|---------|-------------|
102+
| 13–14 | integers |
103+
| 15–17 | strings |
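
Because numeric codes shift between releases, an integer-to-name conversion needs a lookup table per source version. The sketch below uses `RoleSpecType` as an example; the numeric codes shown are illustrative assumptions (zero-based positions), and real tables would need to be generated from each version's enum definitions. `enumToString` and `ROLE_SPEC_TYPE` are hypothetical names.

```typescript
// Per-version lookup tables: numeric code -> enum name.
// The codes below are illustrative only; generate real tables from each
// version's enum definitions, since values shift between releases.
type EnumTable = Record<number, string>;

const ROLE_SPEC_TYPE: Record<13 | 14, EnumTable> = {
  13: {
    0: 'ROLESPEC_CSTRING',
    1: 'ROLESPEC_CURRENT_USER',
    2: 'ROLESPEC_SESSION_USER',
    3: 'ROLESPEC_PUBLIC',
  },
  // PG14 adds ROLESPEC_CURRENT_ROLE at position 1, shifting later codes.
  14: {
    0: 'ROLESPEC_CSTRING',
    1: 'ROLESPEC_CURRENT_ROLE',
    2: 'ROLESPEC_CURRENT_USER',
    3: 'ROLESPEC_SESSION_USER',
    4: 'ROLESPEC_PUBLIC',
  },
};

function enumToString(table: EnumTable, value: number | string): string {
  if (typeof value === 'string') {
    return value; // already a name, as in PG15+ output
  }
  const name = table[value];
  if (name === undefined) {
    throw new Error(`Unknown enum code: ${value}`);
  }
  return name;
}
```

The same code `1` resolves to different names depending on the source version, which is why the table must be chosen by the version that produced the AST.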
104+
105+
106+
## Version similarity
107+
108+
Based on diff sizes, PG13 and PG14 are close, as are PG14 and PG15. PG16 introduces major differences, likely due to language features such as the SQL/JSON enhancements. PG17 again adjusts the AST but retains most PG16 structures. Thus PG13–15 form one similar group and PG16–17 another.
109+
110+
## Viability of translation (PG13 → PG17)
111+
112+
Translating PG13 ASTs forward to PG17 is plausible; no downgrade path is considered. Many node types remain compatible, and differences are largely additive. A translation layer would need to:
113+
114+
1. Map renamed fields (e.g. `relkind` to `objtype`).
115+
2. Populate newly introduced fields with defaults or derived values.
116+
3. Handle removed or deprecated fields when present in PG13.
117+
118+
Because PG16 introduced large changes, direct translation from PG13 to PG17 may require bridging PG16 first. Still, each version’s ASTs are defined in TypeScript, so programmatic transforms are feasible.
119+
### New interface nodes
120+
121+
| Version | Interfaces added |
122+
|---------|-----------------|
123+
| 14 | CTECycleClause, CTESearchClause, PLAssignStmt, ReturnStmt, StatsElem |
124+
| 15 | AlterDatabaseRefreshCollStmt, Boolean, MergeAction, MergeStmt, MergeWhenClause, PublicationObjSpec, PublicationTable |
125+
| 16 | JsonAggConstructor, JsonArrayAgg, JsonArrayConstructor, JsonArrayQueryConstructor, JsonConstructorExpr, JsonFormat, JsonIsPredicate, JsonKeyValue, JsonObjectAgg, JsonObjectConstructor, JsonOutput, JsonReturning, JsonValueExpr, RTEPermissionInfo |
126+
| 17 | JsonArgument, JsonBehavior, JsonExpr, JsonFuncExpr, JsonParseExpr, JsonScalarExpr, JsonSerializeExpr, JsonTable, JsonTableColumn, JsonTablePath, JsonTablePathScan, JsonTablePathSpec, JsonTableSiblingJoin, MergeSupportFunc, SinglePartitionSpec, WindowFuncRunCondition |
127+
128+
## Generating AST Samples
129+
130+
To fully understand structural differences we will compile **libpg-query** for
131+
each supported PostgreSQL version and capture JSON output for a library of
132+
representative queries. This multi-runtime parser setup lets us record actual
133+
ASTs from PG13 through PG17. These samples are essential for developing upgrade
134+
logic and verifying enum representations:
135+
136+
- PG13 and PG14 output enum values as integers
137+
- PG15+ output enums as their string names
138+
139+
The generated samples will live under a dedicated directory and can be compared
140+
programmatically to spot changes beyond what the protobuf types reveal.
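
One way to structure the sample generation is sketched below. The parser is injected as a factory so the sketch does not depend on a particular package; `collectSamples`, `ParseFn`, and `ParserFactory` are hypothetical names, and the `<version>/<name>.json` layout is an assumption about how the samples directory might be organized.

```typescript
// A parser factory is injected so the sketch stays independent of any package.
type ParseFn = (sql: string) => Promise<unknown>;
type ParserFactory = (version: number) => ParseFn;

// Parses every query with every version and returns a map of
// relative file path -> pretty-printed JSON, ready to be written to disk.
async function collectSamples(
  versions: number[],
  queries: Record<string, string>,
  makeParser: ParserFactory,
): Promise<Record<string, string>> {
  const files: Record<string, string> = {};
  for (const version of versions) {
    const parse = makeParser(version);
    for (const [name, sql] of Object.entries(queries)) {
      const ast = await parse(sql);
      files[`${version}/${name}.json`] = JSON.stringify(ast, null, 2);
    }
  }
  return files;
}
```

With `@pgsql/parser`, the factory might wrap `new Parser(version)` and its `parse` method, as used in that package's test file included in this commit; the exact import path is not confirmed here.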
141+
142+
143+
## Conclusion
144+
145+
The repository already provides versioned definitions which can be compared programmatically. Diff metrics suggest PG13–15 are most similar, while PG16 marks a major jump and PG17 follows that design. Building an automated translation will require detailed mapping but appears viable, particularly when only upgrading ASTs.

packages/transform/AST_TESTS.md

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
1+
# AST Translation Test Plan
2+
3+
This document outlines a set of incremental tests for building and validating a PostgreSQL AST translation layer. The goal is to ensure ASTs parsed from older versions can be upgraded to the latest version without changing semantic meaning.
4+
5+
## 0. Generate Example ASTs from simple queries
6+
7+
Make a new directory called `transform/` inside the root `__fixtures__` directory.
8+
9+
In it, create a folder for each version: `13/`, `14/`, `15/`, `16/`, `17/`.
10+
11+
Then create a series of example `.json` files containing the parser output for a handful of SQL queries.
12+
13+
These fixtures will be the basis for testing and for understanding the actual differences between the ASTs.
14+
15+
16+
## 1. Baseline Parsing
17+
18+
1. **Parse basic queries for each version**
19+
- Verify `parseSync` and `parse` return a single statement for simple queries (`SELECT 1`, `SELECT NULL`).
20+
2. **Round-trip parsing**
21+
- Parse a query, deparse it back to SQL, and parse again. The ASTs should match after removing location data.
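
Comparing ASTs "after removing location data" could be done with a helper like the one below. `stripLocations` is a hypothetical name, and the set of keys treated as position data (`location`, `stmt_location`, `stmt_len`) is an assumption that may need extending for other position-carrying fields.

```typescript
// Assumption: these keys carry only source-position information.
const LOCATION_KEYS = new Set(['location', 'stmt_location', 'stmt_len']);

// Returns a deep copy of the AST with position-only fields removed,
// so two parses of equivalent SQL can be compared structurally.
function stripLocations(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(stripLocations);
  }
  if (value === null || typeof value !== 'object') {
    return value;
  }
  const out: Record<string, unknown> = {};
  for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
    if (LOCATION_KEYS.has(key)) {
      continue; // drop position data
    }
    out[key] = stripLocations(child);
  }
  return out;
}
```

A round-trip test can then compare the stripped ASTs from the first and second parse, for example by serializing both with `JSON.stringify`.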
22+
23+
## 2. Enum Handling
24+
25+
1. **Integer to string conversion (PG13/14)**
26+
- Feed known enum codes to the translation layer and assert the upgraded AST uses the correct enum names.
27+
2. **Preserve string enums (PG15+)**
28+
- Ensure enums already represented as strings remain unchanged after translation.
29+
30+
## 3. Scalar Node Changes
31+
32+
1. **Field rename checks**
33+
- Confirm that `String.str` becomes `String.sval`, `BitString.str` becomes `BitString.bsval`, and `Float.str` becomes `Float.fval` when translating from PG13/14.
34+
2. **Boolean node introduction**
35+
- Translating `A_Const` nodes containing boolean values should yield the new `Boolean` node starting in PG15.
36+
37+
## 4. Renamed Fields
38+
39+
Create fixtures demonstrating renamed fields such as `relkind` → `objtype` and `tables` → `pubobjects`. Tests should confirm the new field names and that values are correctly copied.
40+
41+
## 5. Sequential Upgrade Steps
42+
43+
For each release boundary (13→14, 14→15, 15→16, 16→17):
44+
1. Apply the specific upgrade function to a representative AST.
45+
2. Validate that required fields are present and obsolete ones removed.
46+
3. Verify that running all steps in sequence produces the same result as any direct upgrade path (once implemented).
47+
48+
## 6. Full Query Upgrade
49+
50+
1. Parse a library of real-world queries using the oldest supported version.
51+
2. Upgrade the resulting ASTs to the latest version.
52+
3. Deparse the upgraded ASTs and execute them against a running PostgreSQL instance to ensure semantics are preserved.
53+
54+
## 7. Future Regression Tests
55+
56+
As translation functions evolve, capture edge cases that previously failed and assert they remain fixed.
57+
58+
---
59+
60+
These tests build confidence incrementally: start with simple node transformations, then cover whole query upgrades. The plan emphasizes functional, deterministic checks that can run in CI.
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
1+
# AST Translation Strategies
2+
3+
This document explores approaches for translating PostgreSQL ASTs between the versioned type definitions under `types/`.
4+
5+
## Goals
6+
7+
- Upgrade an AST produced for an older PostgreSQL release so that it conforms to the latest definitions
8+
- Avoid a downgrade path; only translation forward is needed
9+
- Keep the process transparent and manageable as new versions appear
10+
11+
## Design options
12+
13+
### Functional transforms
14+
15+
One model is to create a set of pure functions, each responsible for upgrading a single node type. These functions would:
16+
17+
1. Accept an instance of a node from the older version
18+
2. Produce the equivalent structure in the newer version
19+
3. Rename fields or populate new defaults as required
20+
21+
Benefits:
22+
23+
- Fine grained and testable; each function does one thing well
24+
- Easier to reason about complex nodes such as `String` or `Var`
25+
- Composable: an overall upgrade is just a pipeline of node-level transforms
26+
27+
### Nested deparser / reparser
28+
29+
Another idea is to build a new deparser that understands multiple versions simultaneously. The deparser would parse using the old types and re-emit using the newest ones. This could be structured as a visitor that walks the AST, writing out SQL and immediately reparsing with the updated parser.
30+
31+
Benefits:
32+
33+
- Eliminates manual field mapping by relying on the parser to create valid nodes
34+
- Might handle edge cases automatically where semantics changed
35+
36+
Trade-offs:
37+
38+
- Performance hit due to serializing and parsing again
39+
- Potential loss of fidelity if certain node properties are not round-trippable
40+
41+
### Handling enums
42+
43+
Older versions of `libpg_query` (PG13 and PG14) emitted numeric codes for enum
44+
fields. From PG15 onward the JSON output uses the enum **name** as a string.
45+
Translation code must therefore convert numeric enums to their string
46+
equivalents when upgrading from PG13/14. When moving between PG15–17 the
47+
representations already match.
48+
49+
## Translation step ordering
50+
51+
### Sequential upgrades
52+
53+
A straightforward approach is to perform sequential upgrades: 13 → 14 → 15 → 16 → 17. Each step focuses on the incremental changes in that release. This keeps functions small and reuses existing transforms when supporting new versions.
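
The sequential model can be sketched as a chain of per-boundary steps. The names `Step` and `upgradeTo` are hypothetical, and each `upgrade` function is assumed to be a pure transform for a single release boundary.

```typescript
type Ast = Record<string, unknown>;

// One entry per release boundary, e.g. { from: 13, to: 14, upgrade: ... }.
type Step = { from: number; to: number; upgrade: (ast: Ast) => Ast };

// Applies the registered steps in order until `toVersion` is reached.
// Assumes every step moves forward (step.to > step.from).
function upgradeTo(ast: Ast, fromVersion: number, toVersion: number, steps: Step[]): Ast {
  let current = ast;
  let version = fromVersion;
  while (version < toVersion) {
    const step = steps.find((s) => s.from === version);
    if (!step) {
      throw new Error(`No upgrade step registered from PG${version}`);
    }
    current = step.upgrade(current);
    version = step.to;
  }
  return current;
}
```

Supporting a new release then only means registering one more step; an AST that starts at PG15 simply skips the earlier steps.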
54+
55+
### Direct upgrades
56+
57+
Alternatively, we could implement direct translations for each older version to the newest (13 → 17, 14 → 17, 15 → 17). This avoids running multiple steps but requires larger, more complex functions because they must handle every change introduced across several releases at once.
58+
59+
### Which is better?
60+
61+
Sequential upgrades favor simplicity and reuse. The majority of changes between 13 and 15 are minor, while 16 introduces significant restructuring (see `AST_RESEARCH.md`). Incremental steps allow us to focus on these differences in isolation. Direct upgrades may be feasible for the relatively small jumps (15 → 17), but are harder to implement for 13 → 17.
62+
63+
## Recommended plan
64+
65+
1. Implement functional node-level transforms for each release boundary starting with 13 → 14.
66+
2. Compose those functions so that upgrading from any supported version to 17 is just a series of transformations.
67+
3. Provide utilities for renaming fields (e.g. `str` → `sval` in `String` nodes) and filling defaults for new enum values or optional fields.
68+
4. Optionally develop a proof-of-concept reparse approach for comparison, but keep the functional pipeline as the core strategy.
69+
70+
By building translation functions per version we keep the code maintainable and make it easy to add support for future releases.
71+
72+
## Should we implement a deparser?
73+
74+
The nested deparser approach would walk the old AST, generate SQL, and parse it with the newer parser. This mirrors the visitor pattern and can adapt automatically to certain changes, but it adds overhead and may drop information that doesn't round-trip cleanly. Maintaining explicit transform functions keeps upgrades deterministic and easy to test. A deparser prototype might help with tricky cases, yet the primary strategy should be these per-version transform functions.

packages/transform/PQSQL_PARSER.md

Lines changed: 96 additions & 0 deletions
@@ -0,0 +1,96 @@
This is from the source of the `@pgsql/parser` repo, included here to show the API.

```ts
const { describe, it, before } = require('node:test');
const assert = require('node:assert/strict');
const { Parser } = require('../wasm/index.cjs');

describe('Parser', () => {
  describe('Dynamic API', () => {
    it('should parse SQL with default version', async () => {
      const parser = new Parser();
      const result = await parser.parse('SELECT 1+1 as sum');
      assert.ok(result);
      assert.ok(result.stmts);
      assert.equal(result.stmts.length, 1);
    });

    it('should parse SQL with specific version', async () => {
      // Get available versions from the Parser class
      const parser = new Parser();
      const defaultVersion = parser.version;

      // Test with a different version if available
      const testVersion = defaultVersion === 17 ? 16 : 15;
      try {
        const versionParser = new Parser(testVersion);
        const result = await versionParser.parse('SELECT 1+1 as sum');
        assert.equal(versionParser.version, testVersion);
        assert.ok(result);
      } catch (e) {
        // Version might not be available in this build
        console.log(`Version ${testVersion} not available in this build`);
      }
    });

    it('should handle parse errors', async () => {
      const parser = new Parser();
      try {
        await parser.parse('INVALID SQL');
        assert.fail('Should have thrown an error');
      } catch (error) {
        assert.ok(error);
        assert.ok(error.message.includes('syntax error'));
      }
    });

    it('should work with Parser class', async () => {
      const parser = new Parser();
      const result = await parser.parse('SELECT * FROM users');
      assert.ok(result);
      assert.ok(result.stmts);
    });

    it('should validate version in constructor', () => {
      // Test invalid version
      assert.throws(() => {
        new Parser(99);
      }, /Unsupported PostgreSQL version/);
    });

    it('should support parseSync after initial parse', async () => {
      const parser = new Parser();

      // First parse to initialize
      await parser.parse('SELECT 1');

      // Now parseSync should work
      const result = parser.parseSync('SELECT 2+2 as sum');
      assert.ok(result);
      assert.ok(result.stmts);
      assert.equal(result.stmts.length, 1);
    });
  });

  describe('Version-specific imports', () => {
    // Dynamically test available version imports
    const versions = [13, 14, 15, 16, 17];

    for (const version of versions) {
      it(`should parse with v${version} if available`, async () => {
        try {
          const versionModule = require(`../wasm/v${version}.cjs`);
          await versionModule.loadModule();
          const result = await versionModule.parse('SELECT 1');
          assert.ok(result);
          assert.equal(result.stmts.length, 1);
        } catch (e) {
          // Version not available in this build
          console.log(`Version ${version} not available in this build`);
        }
      });
    }
  });
});
```

packages/transform/package.json

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@
3131
"test:watch": "jest --watch"
3232
},
3333
"devDependencies": {
34+
"@pgsql/parser": "1.0.2",
3435
"pg-proto-parser": "^1.29.1"
3536
},
3637
"keywords": []

packages/transform/scripts/pg-proto-parser.ts

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
11
import { PgProtoParser, PgProtoParserOptions } from 'pg-proto-parser';
22
import { resolve, join } from 'path';
33

4-
const versions = ['13', '17'];
4+
const versions = ['13', '14', '15', '16', '17'];
55
const baseDir = resolve(join(__dirname, '../../../__fixtures__/proto'));
66

77
for (const version of versions) {
