This PR investigates and documents the challenges of adding native FOR clause support to Opteryx's SQL parser.
This PR includes:

- Comprehensive documentation (`docs/FOR_CLAUSE_PARSING.md`):
  - Explains Opteryx's temporal FOR clause syntax
  - Documents the current Python-based implementation
  - Analyzes why native parser support is challenging
  - Outlines four potential approaches with their trade-offs
- Proof-of-concept Rust module (`src/temporal_parser.rs`):
  - Skeleton implementation showing how temporal extraction could work in Rust
  - Exposed to Python via the `extract_temporal_filters` function
  - Clearly documented as a POC, not production-ready
  - Includes a basic test structure
- Updated Rust library (`src/lib.rs`):
  - Added the `extract_temporal_filters` function to the Python API
  - Maintained backward compatibility
  - Added
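The temporal forms a Rust port would need to recognize can be modelled as a small enum. The sketch below is a minimal illustration, assuming the forms documented for Opteryx (`FOR <date>`, `FOR DATES BETWEEN <a> AND <b>`, `FOR DATES IN <range>`, `FOR DATES SINCE <date>`). `TemporalClause` and `classify_for_clause` are hypothetical names, not the API of `src/temporal_parser.rs`, and because the sketch splits on whitespace it does not handle quoted operands that contain spaces.

```rust
/// Hypothetical model of the temporal forms, assuming Opteryx's documented
/// syntax: FOR <date>, FOR DATES BETWEEN <a> AND <b>,
/// FOR DATES IN <range>, FOR DATES SINCE <date>.
#[derive(Debug, PartialEq)]
enum TemporalClause {
    At(String),
    Between(String, String),
    In(String),
    Since(String),
}

/// Classify the text that follows `FOR`. Keywords match case-insensitively;
/// operands are returned exactly as written. Splitting on whitespace means
/// quoted operands containing spaces are not supported.
fn classify_for_clause(text: &str) -> Option<TemporalClause> {
    let tokens: Vec<&str> = text.split_whitespace().collect();
    // Upper-cased copies are used only for matching keywords.
    let upper: Vec<String> = tokens.iter().map(|t| t.to_ascii_uppercase()).collect();
    let keys: Vec<&str> = upper.iter().map(String::as_str).collect();

    match keys.as_slice() {
        ["DATES", "BETWEEN", _, "AND", _] => Some(TemporalClause::Between(
            tokens[2].to_string(),
            tokens[4].to_string(),
        )),
        ["DATES", "IN", _] => Some(TemporalClause::In(tokens[2].to_string())),
        ["DATES", "SINCE", _] => Some(TemporalClause::Since(tokens[2].to_string())),
        // A single operand such as TODAY or '2024-01-01' is a point in time.
        [single] if *single != "DATES" => Some(TemporalClause::At(tokens[0].to_string())),
        // Anything else is not a recognized temporal form.
        _ => None,
    }
}
```

For example, `classify_for_clause("DATES SINCE '2024-01-01'")` yields `Since("'2024-01-01'")`, leaving date parsing to a later step.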
After a detailed investigation of both the sqlparser-rs (v0.59.0) architecture and Opteryx's current implementation, the key findings are as follows.

sqlparser-rs provides limited extension points through its `Dialect` trait:

- `parse_infix`: for custom infix operators (e.g., `@>>` for ArrayContainsAll)
- `parse_prefix`: for custom prefix operators
- `parse_statement`: for custom statement types
- There is no hook for extending table-level syntax, which is where FOR clauses appear
The existing Python approach in `sql_rewriter.py`:

- ✅ Handles complex cases (quoted strings, comments, nested queries)
- ✅ Well-tested with a comprehensive test suite
- ✅ Proven in production
- ✅ Supports special cases (`b""` strings, `r""` strings, `EXTRACT`/`SUBSTRING`/`TRIM` functions)
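Quoted-string and comment handling is the core of the problem: a `FOR` inside a literal or comment must be ignored. A minimal Rust sketch of that kind of state machine is shown below, assuming single-, double- and backtick-quoted literals with doubled-quote escaping and `--`/`/* */` comments. `find_for_keywords` is a hypothetical helper that does not reproduce the `sql_rewriter.py` logic: it ignores backslash escapes, and it would still report the `FOR` inside `SUBSTRING(s FROM 1 FOR 3)`, which is exactly the kind of case the Python implementation special-cases.

```rust
/// Byte offsets of each standalone `FOR` keyword (case-insensitive) that
/// lies outside quoted literals and comments. Hypothetical helper.
fn find_for_keywords(sql: &str) -> Vec<usize> {
    // Lexical states of a small SQL scanner.
    #[derive(Clone, Copy)]
    enum State { Code, SingleQuoted, DoubleQuoted, Backticked, LineComment, BlockComment }

    let bytes = sql.as_bytes();
    let mut state = State::Code;
    let mut hits = Vec::new();
    let mut i = 0;

    while i < bytes.len() {
        let b = bytes[i];
        let next = bytes.get(i + 1).copied();
        match state {
            State::Code => match b {
                b'\'' => state = State::SingleQuoted,
                b'"' => state = State::DoubleQuoted,
                b'`' => state = State::Backticked,
                b'-' if next == Some(b'-') => { state = State::LineComment; i += 1; }
                b'/' if next == Some(b'*') => { state = State::BlockComment; i += 1; }
                // Record the keyword and skip its remaining two bytes.
                _ if is_keyword_at(bytes, i, b"FOR") => { hits.push(i); i += 2; }
                _ => {}
            },
            // A doubled quote ('') leaves and immediately re-enters the
            // literal, so escaped quotes need no special handling.
            State::SingleQuoted => { if b == b'\'' { state = State::Code; } }
            State::DoubleQuoted => { if b == b'"' { state = State::Code; } }
            State::Backticked => { if b == b'`' { state = State::Code; } }
            State::LineComment => { if b == b'\n' { state = State::Code; } }
            State::BlockComment => {
                if b == b'*' && next == Some(b'/') { state = State::Code; i += 1; }
            }
        }
        i += 1;
    }
    hits
}

/// True if `kw` (upper-case ASCII) appears at `i` as a whole word.
fn is_keyword_at(bytes: &[u8], i: usize, kw: &[u8]) -> bool {
    let end = i + kw.len();
    if end > bytes.len() {
        return false;
    }
    let starts_word = i == 0 || !is_ident_byte(bytes[i - 1]);
    let ends_word = end == bytes.len() || !is_ident_byte(bytes[end]);
    starts_word && ends_word && bytes[i..end].eq_ignore_ascii_case(kw)
}

/// Bytes that can continue an unquoted identifier.
fn is_ident_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}
```

Working on bytes is safe here because every byte the scanner compares is ASCII, and UTF-8 continuation bytes never equal an ASCII byte.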
Four approaches were considered:

- Option 1 (port to Rust, started in this PR): move the Python logic to Rust for performance
- Option 2 (fork sqlparser-rs): add native FOR support, at the cost of a maintenance burden
- Option 3 (use WITH hints): convert `FOR X` to `WITH(__TEMPORAL__='X')`; clever but awkward
- Option 4 (keep current): the Python implementation is good enough
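Option 3 amounts to a text-level rewrite before parsing. A minimal sketch of the encoding step follows, assuming the clause text that follows `FOR` has already been located. `for_clause_to_hint` is a hypothetical helper; doubling single quotes keeps clauses that contain date literals valid as a single SQL string.

```rust
/// Hypothetical Option 3 helper: wrap the text that followed `FOR` in a
/// table hint, e.g. "TODAY" -> "WITH(__TEMPORAL__='TODAY')".
fn for_clause_to_hint(clause: &str) -> String {
    // Double embedded single quotes so the whole clause stays one SQL
    // string literal, even when it contains quoted dates.
    let escaped = clause.trim().replace('\'', "''");
    format!("WITH(__TEMPORAL__='{}')", escaped)
}
```

With this, `FROM planets FOR TODAY` would become `FROM planets WITH(__TEMPORAL__='TODAY')`, and `FOR DATES SINCE '2024-01-01'` would become `WITH(__TEMPORAL__='DATES SINCE ''2024-01-01''')`. The awkward part is that the original clause now round-trips through a string literal and must be re-parsed after extraction.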
This PR does not:

- ❌ Replace the existing Python implementation
- ❌ Change any query execution behavior
- ❌ Modify the AST structure
- ❌ Add new SQL syntax support
The Python implementation remains the authoritative version.
For the current issue, the investigation shows that adding native parser support is more complex than initially expected. The current Python implementation should be kept because:
- It works reliably
- It's well-tested
- The complexity of alternatives outweighs benefits
- Performance is not a bottleneck here
If parser support is still desired, the recommended approach is:
- Start with Option 3 (WITH hints) as a low-risk experiment
- If successful, consider Option 2 (fork sqlparser-rs) for clean integration
Files changed:

- `src/lib.rs`: added the `extract_temporal_filters` function (POC)
- `src/temporal_parser.rs`: new module with a documented POC implementation
- `docs/FOR_CLAUSE_PARSING.md`: comprehensive documentation
Testing:

```bash
# Rust tests pass
cargo test --release temporal_parser
# Python tests unchanged (existing implementation still used)
python -m pytest tests/unit/planner/test_temporal_extraction.py
```

Next steps:

- Review `docs/FOR_CLAUSE_PARSING.md` and choose an approach
- If choosing the Rust port (Option 1):
  - Complete the `split_sql_parts` function
  - Port the state machine logic accurately
  - Add comprehensive tests matching the Python test suite
  - Benchmark against Python
  - Migrate gradually
- If choosing the fork (Option 2):
  - Fork sqlparser-rs
  - Add `TableFactor::Table` fields for temporal info
  - Modify the parser to recognize FOR clauses
  - Test with Opteryx
- If choosing hints (Option 3):
  - Modify `sql_rewriter` to convert FOR clauses to WITH hints
  - Add post-parsing extraction of the hints
  - Test thoroughly
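The post-parsing half of Option 3 would walk the parsed tables, pull out the `__TEMPORAL__` hint, and remove it so later planning stages never see it. In sqlparser-rs the hints live in the `with_hints: Vec<Expr>` field of `TableFactor::Table`; to keep the sketch self-contained, it substitutes a hypothetical, simplified `Hint` enum and `Table` struct for those types. `take_temporal_hints` is likewise illustrative, not an existing Opteryx function.

```rust
/// Hypothetical, simplified stand-in for a parsed hint. In sqlparser-rs the
/// hints are `Expr` values in `TableFactor::Table { with_hints, .. }`.
#[derive(Debug, Clone, PartialEq)]
enum Hint {
    /// `KEY='value'`; the value is assumed already unescaped by the parser.
    Assign { key: String, value: String },
    /// A bare flag-style hint.
    Flag(String),
}

/// Hypothetical, simplified stand-in for a parsed table reference.
#[derive(Debug)]
struct Table {
    name: String,
    with_hints: Vec<Hint>,
}

/// Remove every `__TEMPORAL__` hint and return (table name, FOR clause text)
/// pairs for the temporal-filter logic to consume.
fn take_temporal_hints(tables: &mut [Table]) -> Vec<(String, String)> {
    let mut found = Vec::new();
    for table in tables.iter_mut() {
        let name = table.name.clone();
        table.with_hints.retain(|hint| match hint {
            Hint::Assign { key, value } if key == "__TEMPORAL__" => {
                found.push((name.clone(), value.clone()));
                false // drop the hint so later stages never see it
            }
            _ => true, // keep all other hints unchanged
        });
    }
    found
}
```

The returned clause text (for example `TODAY`) would still need to be interpreted as a date or date range, so this step only relocates the FOR clause rather than replacing the existing extraction logic.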
This PR provides a thorough analysis and documentation of the problem space. The current Python implementation works well and should be kept. Native parser support is feasible but would require significant effort for unclear benefit.