|
| 1 | +# Deparser Entry Point Analysis |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +This document analyzes the entry points of the deparser system, focusing on how users interact with the top-level API and how we can refactor to make the system more consistent and intuitive. |
| 6 | + |
| 7 | +## Current Entry Points |
| 8 | + |
| 9 | +### 1. Constructor Entry Point |
| 10 | +```typescript |
| 11 | +constructor(tree: Node | Node[] | t.ParseResult, opts: DeparserOptions = {}) |
| 12 | +``` |
| 13 | + |
| 14 | +The constructor currently accepts three types of input: |
| 15 | +- Single `Node` object |
| 16 | +- Array of `Node` objects |
| 17 | +- `ParseResult` object |
| 18 | + |
| 19 | +### 2. Static Deparse Method |
| 20 | +```typescript |
| 21 | +static deparse(query: Node | Node[] | t.ParseResult, opts: DeparserOptions = {}): string |
| 22 | +``` |
| 23 | + |
| 24 | +This is the primary public API that users call. |
| 25 | + |
| 26 | +### 3. Instance Method |
| 27 | +```typescript |
| 28 | +deparseQuery(): string |
| 29 | +``` |
| 30 | + |
| 31 | +This method processes the internal tree array. |
| 32 | + |
| 33 | +## Type Analysis |
| 34 | + |
| 35 | +### ParseResult Type |
| 36 | +```typescript |
| 37 | +export interface ParseResult { |
| 38 | + version?: number; |
| 39 | + stmts?: RawStmt[]; |
| 40 | +} |
| 41 | +``` |
| 42 | + |
| 43 | +`ParseResult` is the top-level structure returned by the PostgreSQL parser. It contains: |
| 44 | +- `version`: The parser version |
| 45 | +- `stmts`: An array of `RawStmt` objects |
| 46 | + |
| 47 | +### RawStmt Type |
| 48 | +```typescript |
| 49 | +export interface RawStmt { |
| 50 | + stmt?: Node; |
| 51 | + stmt_location?: number; |
| 52 | + stmt_len?: number; |
| 53 | +} |
| 54 | +``` |
| 55 | + |
| 56 | +`RawStmt` wraps the actual statement node with metadata: |
| 57 | +- `stmt`: The actual statement node (e.g., `SelectStmt`, `InsertStmt`, etc.) |
| 58 | +- `stmt_location`: Character position in the original query |
| 59 | +- `stmt_len`: Length of the statement in the original query |
| 60 | + |
| 61 | +### Node Type |
| 62 | +The `Node` type is a discriminated union of all possible AST node types, including `ParseResult` itself: |
| 63 | +```typescript |
| 64 | +export type Node = { |
| 65 | + ParseResult: ParseResult; |
| 66 | +} | { |
| 67 | + RawStmt: RawStmt; |
| 68 | +} | { |
| 69 | + SelectStmt: SelectStmt; |
| 70 | +} | ... // many more node types |
| 71 | +``` |
| 72 | +
|
| 73 | +## Current Implementation Issues |
| 74 | +
|
| 75 | +### 1. Inconsistent Handling of ParseResult |
| 76 | +
|
| 77 | +In the constructor, `ParseResult` is handled specially: |
| 78 | +```typescript |
| 79 | +if (tree && typeof tree === 'object' && !Array.isArray(tree) && 'stmts' in tree) { |
| 80 | + // This is a ParseResult |
| 81 | + const parseResult = tree as t.ParseResult; |
| 82 | + // Extract the actual Node from each RawStmt |
| 83 | + this.tree = (parseResult.stmts || []).map(rawStmt => rawStmt.stmt).filter(stmt => stmt !== undefined) as Node[]; |
| 84 | +} |
| 85 | +``` |
| 86 | + |
| 87 | +This duck-typing approach checks for the presence of `stmts` property rather than properly checking the node type. |
| 88 | + |
| 89 | +### 2. Missing ParseResult Method |
| 90 | + |
| 91 | +There's no `ParseResult` method in the deparser, even though `ParseResult` is a valid `Node` type. This means if someone passes `{ ParseResult: parseResult }` (the wrapped form), it will fail. |
| 92 | + |
| 93 | +### 3. Inconsistent stmt Method |
| 94 | + |
| 95 | +The `stmt` method appears to handle wrapped statement nodes: |
| 96 | +```typescript |
| 97 | +stmt(node: any, context: DeparserContext = { parentNodeTypes: [] }): string { |
| 98 | + // Handle stmt wrapper nodes that contain the actual statement |
| 99 | + const keys = Object.keys(node); |
| 100 | + if (keys.length === 1) { |
| 101 | + const statementType = keys[0]; |
| 102 | + const methodName = statementType as keyof this; |
| 103 | + if (typeof this[methodName] === 'function') { |
| 104 | + return (this[methodName] as any)(node[statementType], context); |
| 105 | + } |
| 106 | + } |
| 107 | +} |
| 108 | +``` |
| 109 | + |
| 110 | +But this method is never called in the normal flow - statements are handled directly. |
| 111 | + |
| 112 | +### 4. Unused version Method |
| 113 | + |
| 114 | +The `version` method exists but is never called because version information is stripped during ParseResult processing. |
| 115 | + |
| 116 | +## Proposed Refactoring |
| 117 | + |
| 118 | +### 1. Add Proper ParseResult Method |
| 119 | + |
| 120 | +```typescript |
| 121 | +ParseResult(node: t.ParseResult, context: DeparserContext): string { |
| 122 | + if (!node.stmts || node.stmts.length === 0) { |
| 123 | + return ''; |
| 124 | + } |
| 125 | + |
| 126 | + return node.stmts |
| 127 | + .map(rawStmt => this.RawStmt(rawStmt, context)) |
| 128 | + .join(this.formatter.newline() + this.formatter.newline()); |
| 129 | +} |
| 130 | +``` |
| 131 | + |
| 132 | +### 2. Simplify Constructor |
| 133 | + |
| 134 | +The constructor should treat all inputs uniformly: |
| 135 | + |
| 136 | +```typescript |
| 137 | +constructor(tree: Node | Node[] | t.ParseResult, opts: DeparserOptions = {}) { |
| 138 | + this.formatter = new SqlFormatter(opts.newline, opts.tab); |
| 139 | + this.options = { |
| 140 | + functionDelimiter: '$$', |
| 141 | + functionDelimiterFallback: '$EOFCODE$', |
| 142 | + ...opts |
| 143 | + }; |
| 144 | + |
| 145 | + // Only allow duck-typing for ParseResult (not wrapped) |
| 146 | + if (tree && typeof tree === 'object' && !Array.isArray(tree) && 'stmts' in tree && !('ParseResult' in tree)) { |
| 147 | + // This is a bare ParseResult object |
| 148 | + this.tree = [{ ParseResult: tree as t.ParseResult }]; |
| 149 | + } else if (Array.isArray(tree)) { |
| 150 | + this.tree = tree; |
| 151 | + } else { |
| 152 | + this.tree = [tree as Node]; |
| 153 | + } |
| 154 | +} |
| 155 | +``` |
| 156 | + |
| 157 | +### 3. Update deparseQuery Method |
| 158 | + |
| 159 | +```typescript |
| 160 | +deparseQuery(): string { |
| 161 | + return this.tree |
| 162 | + .map(node => { |
| 163 | + // For top-level nodes, we want to handle them directly |
| 164 | + const nodeType = this.getNodeType(node); |
| 165 | + |
| 166 | + // Special handling for ParseResult at the top level |
| 167 | + if (nodeType === 'ParseResult') { |
| 168 | + return this.ParseResult((node as any).ParseResult, { parentNodeTypes: [] }); |
| 169 | + } |
| 170 | + |
| 171 | + return this.deparse(node); |
| 172 | + }) |
| 173 | + .filter(result => result !== null && result !== '') |
| 174 | + .join(this.formatter.newline() + this.formatter.newline()); |
| 175 | +} |
| 176 | +``` |
| 177 | + |
| 178 | +### 4. Remove Redundant Methods |
| 179 | + |
| 180 | +- The `stmt` method appears to be unused and can be removed |
| 181 | +- The `version` method is also unused in the current flow |
| 182 | + |
| 183 | +### 5. Improve Type Safety |
| 184 | + |
| 185 | +Instead of using `any` types and duck-typing, we should use proper type guards: |
| 186 | + |
| 187 | +```typescript |
| 188 | +function isParseResult(obj: any): obj is t.ParseResult { |
| 189 | + return obj && typeof obj === 'object' && 'stmts' in obj; |
| 190 | +} |
| 191 | + |
| 192 | +function isWrappedNode(obj: any): obj is Node { |
| 193 | + if (!obj || typeof obj !== 'object' || Array.isArray(obj)) return false; |
| 194 | + const keys = Object.keys(obj); |
| 195 | + return keys.length === 1 && typeof obj[keys[0]] === 'object'; |
| 196 | +} |
| 197 | +``` |
| 198 | + |
| 199 | +## Benefits of Refactoring |
| 200 | + |
| 201 | +1. **Consistency**: All node types, including `ParseResult`, would be handled uniformly through their respective methods |
| 202 | +2. **Type Safety**: Proper type guards instead of duck-typing |
| 203 | +3. **Clarity**: Clear separation between wrapped nodes (`{ ParseResult: ... }`) and bare objects |
| 204 | +4. **Extensibility**: Easy to add new top-level node types in the future |
| 205 | +5. **Debugging**: Each node type has its own method, making it easier to trace execution |
| 206 | + |
| 207 | +## Current Usage Patterns |
| 208 | + |
| 209 | +Based on the test suite analysis, the current usage pattern is: |
| 210 | +```typescript |
| 211 | +// Tests parse SQL and extract individual statements |
| 212 | +const tree = await parse(sql); |
| 213 | +tree.stmts.forEach(stmt => { |
| 214 | + // Pass the inner stmt directly, not the RawStmt wrapper |
| 215 | + const outSql = deparse(stmt.stmt); |
| 216 | +}); |
| 217 | +``` |
| 218 | + |
| 219 | +This shows that users are currently: |
| 220 | +1. Parsing SQL to get a `ParseResult` |
| 221 | +2. Iterating over `stmts` array to get `RawStmt` objects |
| 222 | +3. Extracting the `stmt` property from each `RawStmt` |
| 223 | +4. Passing the bare statement to `deparse()` |
| 224 | + |
| 225 | +This pattern bypasses the `RawStmt` wrapper entirely, losing the `stmt_location` and `stmt_len` metadata. |
| 226 | + |
| 227 | +## Migration Path |
| 228 | + |
| 229 | +1. **Phase 1: Add Missing Methods** (backward compatible) |
| 230 | + - Add `ParseResult` method to handle wrapped ParseResult nodes |
| 231 | + - Ensure `RawStmt` method properly handles semicolon addition based on `stmt_len` |
| 232 | + - Keep existing duck-typing for bare ParseResult objects |
| 233 | + |
| 234 | +2. **Phase 2: Encourage Best Practices** (minor version) |
| 235 | + - Add documentation showing preferred usage: `deparse(parseResult)` |
| 236 | + - Add optional warnings when duck-typing is used |
| 237 | + - Show how to preserve statement metadata by passing full structures |
| 238 | + |
| 239 | +3. **Phase 3: Deprecate Duck-Typing** (major version) |
| 240 | + - Remove duck-typing support for bare ParseResult objects |
| 241 | + - Require all inputs to be properly wrapped Node types |
| 242 | + - This ensures consistent handling and preserves all metadata |
| 243 | + |
| 244 | +## Example of Improved Usage |
| 245 | + |
| 246 | +```typescript |
| 247 | +// Current (loses metadata) |
| 248 | +const tree = await parse(sql); |
| 249 | +tree.stmts.forEach(stmt => { |
| 250 | + const outSql = deparse(stmt.stmt); |
| 251 | +}); |
| 252 | + |
| 253 | +// Improved (preserves all metadata) |
| 254 | +const tree = await parse(sql); |
| 255 | +const outSql = deparse(tree); // Pass the entire ParseResult |
| 256 | + |
| 257 | +// Or for individual statements with metadata |
| 258 | +tree.stmts.forEach(rawStmt => { |
| 259 | + const outSql = deparse({ RawStmt: rawStmt }); |
| 260 | +}); |
| 261 | +``` |
| 262 | + |
| 263 | +This refactoring would make the deparser more consistent with the AST structure and easier to understand and maintain. |
0 commit comments