| 1 | +# PostgreSQL 13->14 AST Transformer Notes |
| 2 | + |
| 3 | +## Current Status |
| 4 | +- **Pass Rate**: 124/258 tests (48%) |
| 5 | +- **Baseline**: Stable at 124/258 despite comprehensive transformations |
| 6 | +- **Branch**: devin/1750826349-v13-to-v14-transformer |
| 7 | + |
| 8 | +## Primary Challenge: funcformat Field Transformation |
| 9 | + |
| 10 | +### Problem Description |
| 11 | +The main blocker for improving beyond 124/258 is the `funcformat` field in `FuncCall` nodes. The current transformer adds `funcformat: "COERCE_EXPLICIT_CALL"` to every `FuncCall` node it does not exclude, but PG14 chooses the value (or omits the field) depending on how the call was written and where it appears. |
| 12 | + |
| 13 | +### Observed Patterns from Failing Tests |
| 14 | + |
| 15 | +#### 1. SQL Syntax Functions (should use COERCE_SQL_SYNTAX) |
| 16 | +- **TRIM functions**: `TRIM(BOTH FROM ' text ')` → `funcformat: "COERCE_SQL_SYNTAX"` |
| 17 | +- **String functions**: `SUBSTRING`, `POSITION`, `OVERLAY` |
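| 18 | +- **Date/time functions**: `EXTRACT` (note that `CURRENT_DATE` and `CURRENT_TIMESTAMP` parse to `SQLValueFunction` nodes rather than `FuncCall`, so they carry no `funcformat`) |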
| 19 | + |
| 20 | +**Example failure** (strings-41.sql): |
| 21 | +``` |
| 22 | +Expected: "funcformat": "COERCE_SQL_SYNTAX" |
| 23 | +Received: "funcformat": "COERCE_EXPLICIT_CALL" |
| 24 | +``` |
| 25 | + |
| 26 | +#### 2. Aggregate Functions in TypeCast (should have NO funcformat) |
| 27 | +- **Aggregate + TypeCast**: `CAST(AVG(column) AS NUMERIC(10,3))` → no funcformat field |
| 28 | +- **Mathematical functions in casts**: same pattern, no `funcformat` field expected |
| 29 | + |
| 30 | +**Example failure** (aggregates-3.sql): |
| 31 | +``` |
| 32 | +Expected: (no funcformat field) |
| 33 | +Received: "funcformat": "COERCE_EXPLICIT_CALL" |
| 34 | +``` |
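
The aggregate-in-TypeCast pattern above could be detected with an ancestor check along these lines. This is only a sketch: the `TransformerContext` shape with a `parentNodeTypes` array, the `AGGREGATE_NAMES` set, and both helper functions are assumptions made for illustration, not the transformer's actual API. Only `avg` and `sum` come from the failing tests; the other names are common aggregates added as guesses.

```typescript
// Hypothetical context shape: ancestor node types, outermost first.
interface TransformerContext {
  parentNodeTypes: string[];
}

// Assumed, non-exhaustive list of aggregate names (lower case).
const AGGREGATE_NAMES: ReadonlySet<string> = new Set([
  "avg", "sum", "count", "min", "max",
]);

// True if any ancestor, not only the direct parent, has the given type.
function hasAncestor(context: TransformerContext, nodeType: string): boolean {
  return context.parentNodeTypes.some((t) => t === nodeType);
}

// True when an aggregate call sits anywhere below a TypeCast node,
// in which case the notes above expect no funcformat field at all.
function isAggregateInTypeCast(
  funcName: string,
  context: TransformerContext
): boolean {
  return (
    AGGREGATE_NAMES.has(funcName.toLowerCase()) &&
    hasAncestor(context, "TypeCast")
  );
}
```

Checking the whole ancestor list rather than the immediate parent is a deliberate choice here, since the cast and the aggregate may be separated by intermediate nodes.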
| 35 | + |
| 36 | +#### 3. Context-Specific Exclusions (already implemented) |
| 37 | +Current exclusions working correctly: |
| 38 | +- CHECK constraints |
| 39 | +- COMMENT statements |
| 40 | +- TypeCast contexts (direct cases; aggregates nested inside casts still fail, see pattern 2 above) |
| 41 | +- XmlExpr contexts |
| 42 | +- INSERT statements |
| 43 | +- RangeFunction contexts |
| 44 | + |
| 45 | +### Technical Implementation Challenges |
| 46 | + |
| 47 | +#### Current Approach |
| 48 | +```typescript |
| 49 | +// Current: One-size-fits-all |
| 50 | +if (!this.shouldExcludeFuncformat(node, context)) { |
| 51 | + result.funcformat = "COERCE_EXPLICIT_CALL"; |
| 52 | +} |
| 53 | +``` |
| 54 | + |
| 55 | +#### Needed Approach |
| 56 | +```typescript |
| 57 | +// Needed: Function-specific logic |
| 58 | +if (!this.shouldExcludeFuncformat(node, context)) { |
| 59 | + result.funcformat = this.getFuncformatValue(node, context); |
| 60 | +} |
| 61 | + |
| 62 | +private getFuncformatValue(node: any, context: TransformerContext): string { |
| 63 | + const funcname = this.getFunctionName(node); |
| 64 | + |
| 65 | + // SQL syntax functions (sqlSyntaxFunctions: a lower-case name list that still has to be defined) |
| 66 | + if (sqlSyntaxFunctions.includes(funcname.toLowerCase())) { |
| 67 | + return "COERCE_SQL_SYNTAX"; |
| 68 | + } |
| 69 | + |
| 70 | + // Default to explicit call |
| 71 | + return "COERCE_EXPLICIT_CALL"; |
| 72 | +} |
| 73 | +``` |
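
A standalone version of the function-specific logic might look like the sketch below. The `FuncCallNode` and `StringNode` shapes, the `SQL_SYNTAX_FUNCTIONS` set, and the helper `getFunctionNameParts` are hypothetical names introduced for illustration. The sketch also assumes that SQL-syntax forms such as `TRIM(BOTH FROM ...)` arrive in the PG13 AST as `pg_catalog`-qualified names (for example `pg_catalog.btrim`), while an ordinary call like `substring('abc', 1, 2)` is unqualified; this should be confirmed against the failing fixtures before relying on it.

```typescript
// Hypothetical, simplified node shapes; the real PG13 AST types may differ.
interface StringNode {
  String: { str: string };
}
interface FuncCallNode {
  funcname?: StringNode[];
}

type Funcformat = "COERCE_SQL_SYNTAX" | "COERCE_EXPLICIT_CALL";

// Assumed catalog names behind TRIM, SUBSTRING, POSITION, OVERLAY and
// EXTRACT. Not exhaustive; verify each entry against a failing test.
const SQL_SYNTAX_FUNCTIONS: ReadonlySet<string> = new Set([
  "btrim", "ltrim", "rtrim", "substring", "position", "overlay", "date_part",
]);

// Returns the name parts in lower case, e.g. ["pg_catalog", "btrim"].
function getFunctionNameParts(node: FuncCallNode): string[] {
  return (node.funcname ?? []).map((part) => part.String.str.toLowerCase());
}

function getFuncformatValue(node: FuncCallNode): Funcformat {
  const parts = getFunctionNameParts(node);
  const name = parts[parts.length - 1];
  // Assumption: only parser-generated SQL-syntax calls are qualified
  // with pg_catalog; plain calls keep the name the user typed.
  const qualified = parts.length === 2 && parts[0] === "pg_catalog";
  if (name !== undefined && qualified && SQL_SYNTAX_FUNCTIONS.has(name)) {
    return "COERCE_SQL_SYNTAX";
  }
  // Everything else defaults to an explicit call.
  return "COERCE_EXPLICIT_CALL";
}
```

Using a `Set` keyed on the unqualified name keeps the lookup cheap, and the separate `pg_catalog` check avoids tagging user-written calls to functions that happen to share a name with a SQL-syntax form.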
| 74 | + |
| 75 | +### Analysis of Remaining 134 Failing Tests |
| 76 | + |
| 77 | +#### Test Categories with funcformat Issues |
| 78 | +1. **String manipulation**: TRIM, SUBSTRING, etc. (need COERCE_SQL_SYNTAX) |
| 79 | +2. **Aggregates in TypeCast**: AVG, SUM, etc. in CAST expressions (need exclusion) |
| 80 | +3. **Date/time functions**: EXTRACT, date arithmetic (need COERCE_SQL_SYNTAX) |
| 81 | +4. **Array operations**: Array functions and operators |
| 82 | +5. **Numeric operations**: Mathematical functions in various contexts |
| 83 | + |
| 84 | +#### Root Cause Analysis |
| 85 | +The 124/258 plateau suggests that: |
| 86 | +- Context-specific exclusions are working (no regressions) |
| 87 | +- Function-specific `funcformat` values are the missing piece |
| 88 | +- The transformer must distinguish SQL-syntax calls from explicit calls |
| 89 | +- Aggregate-in-TypeCast patterns need better detection |
| 90 | + |
| 91 | +### Next Steps to Break the Plateau |
| 92 | + |
| 93 | +1. **Implement function-specific funcformat logic** |
| 94 | + - Create mapping of SQL syntax functions |
| 95 | + - Add getFuncformatValue() method |
| 96 | + - Test with TRIM/string function failures |
| 97 | + |
| 98 | +2. **Enhance TypeCast + Aggregate detection** |
| 99 | + - Improve context detection for aggregates in casts |
| 100 | + - May need parent node analysis beyond current path checking |
| 101 | + |
| 102 | +3. **Systematic testing approach** |
| 103 | + - Target specific failing test categories |
| 104 | + - Verify each improvement maintains baseline |
| 105 | + - Focus on high-impact function types first |
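
To pick the high-impact categories first, the failing fixture names can be tallied by prefix. The sketch below assumes names follow the `<category>-<number>.sql` pattern seen in `strings-41.sql` and `aggregates-3.sql`; the function names and the `"uncategorized"` bucket are made up for illustration.

```typescript
// Counts failing fixtures per category, where the category is the part
// of the file name before the trailing "-<number>.sql".
function countFailuresByCategory(failing: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const file of failing) {
    // Assumption: names look like "strings-41.sql"; anything else
    // lands in a catch-all bucket.
    const match = /^(.+)-\d+\.sql$/.exec(file);
    const category = match?.[1] ?? "uncategorized";
    counts.set(category, (counts.get(category) ?? 0) + 1);
  }
  return counts;
}

// Returns [category, count] pairs, largest count first.
function rankCategories(counts: Map<string, number>): [string, number][] {
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```

Feeding the list of failing test files through `rankCategories(countFailuresByCategory(files))` gives an ordered worklist, so each change can be checked against the categories it is supposed to fix.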
| 106 | + |
| 107 | +### Key Insights |
| 108 | +- The transformer architecture appears sound: the 124/258 baseline holds with no regressions |
| 109 | +- Context-specific exclusions work correctly |
| 110 | +- The remaining challenge is function-type-specific behavior |
| 111 | +- PG14 parser behavior varies significantly by function category |
| 112 | +- `funcformat` assignment logic needs to be more granular |
| 113 | + |
| 114 | +## Implementation Strategy |
| 115 | +Focus on breaking the 124/258 plateau by implementing function-specific funcformat logic, starting with the most common failing patterns (TRIM, aggregates in TypeCast). |