-
Notifications
You must be signed in to change notification settings - Fork 26
fix: Capitalization for clickhouse types with sqlparser #3152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Capitalization for clickhouse types with sqlparser #3152
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Code Review Agent Run #ea5e5dActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughThis adds context-aware ClickHouse type-casing fixes and exposes Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
Changelist by BitoThis pull request implements the following key changes.
|
Interaction Diagram by BitosequenceDiagram
participant CLI as CLI Entry
participant InfraMap as InfrastructureMap<br/>🔄 Updated | ●●○ Medium
participant DiffStrat as ClickHouseDiffStrategy
participant Parser as SQLParser<br/>🟩 Added | ●●● High
participant TypeFixer as TypeCasing Fixer<br/>🟩 Added | ●●● High
participant Regex as Regex Engine
participant ClickHouse as ClickHouse Database
CLI->>InfraMap: diff_infrastructure_map()
InfraMap->>DiffStrat: check_materialized_view_population()
DiffStrat->>Parser: parse_create_materialized_view(sql)
Parser->>TypeFixer: query_to_sql_string(query)
TypeFixer->>Regex: CLICKHOUSE_TYPE_PATTERN.replace_all()
Regex-->>TypeFixer: Matched types in valid contexts
TypeFixer->>TypeFixer: fix_clickhouse_type_casing() loop
Note over TypeFixer: Convert UPPERCASE types to PascalCase<br/>only in valid type contexts
TypeFixer-->>Parser: Formatted SQL string
Parser-->>DiffStrat: MaterializedViewStatement
DiffStrat-->>InfraMap: OlapChange with fixed types
InfraMap-->>CLI: InfraChanges with corrected SQL
alt [Type conversion needed]
CLI->>ClickHouse: Deploy SQL with PascalCase types
ClickHouse-->>CLI: Schema created successfully
else [Types already correct]
CLI->>ClickHouse: Deploy SQL (no changes)
end
Critical path: CLI Entry->InfrastructureMap->ClickHouseDiffStrategy->SQLParser->TypeCasing Fixer->Regex Engine
If the interaction diagram doesn't appear, refresh the page to render it. You can disable interaction diagrams by customizing agent settings. Refer to documentation. |
|
The bug is valid: the apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs(5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run
cargo clippyto ensure Rust code passes Clippy's linting standards before each commit
**/*.rs: Usethiserrorwith#[derive(thiserror::Error)]for error handling in Rust; define errors near the fallibility unit with NO globalErrortype
NEVER useanyhow::Resultin Rust; define custom error types instead
Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Place constants inconstants.rsat the appropriate module level in Rust
Use tuple structs with validation constructors as newtypes in Rust (e.g.,struct UserId(String))
Write inline tests with#[cfg(test)]modules in Rust
Documentation is required for all public APIs in Rust
Files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
apps/framework-cli/**/*.rs
📄 CodeRabbit inference engine (apps/framework-cli/AGENTS.md)
apps/framework-cli/**/*.rs: Always runcargo clippy --all-targets -- -D warningsbefore commits; fix all warnings - no Clippy warnings may remain (treat warnings as errors)
Userustfmt --edition 2021for consistent formatting
Write meaningful names for functions, variables, and types
Document all public APIs and breaking changes
Usethiserrorcrate instead ofanyhow::Resultfor error handling
Define errors near their unit of fallibility (no global Error types)
Use#[derive(thiserror::Error)]with#[error()]messages for error definitions
Define newtypes as tuple structs:struct UserId(u64);
Add validation constructors for newtypes:UserId::new(id: u64) -> Result<Self, Error>
Derive standard traits for newtypes:#[derive(Debug, Clone, PartialEq)]
ImplementFrom/TryFromfor newtype conversions
Usederive_moreornutypeto reduce newtype boilerplate
Useconstfor static values (prefer overstatic)
UseUPPER_SNAKE_CASEnaming for constants
Scope constant visibility with preference:pub(crate)>pub(super)>pub
Group related constants together
Write unit tests for all public functions
Test error conditions and edge cases
Files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🧠 Learnings (6)
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Document all public APIs and breaking changes
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use tuple structs with validation constructors as newtypes in Rust (e.g., `struct UserId(String)`)
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Define newtypes as tuple structs: `struct UserId(u64);`
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Write meaningful names for functions, variables, and types
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Use `UPPER_SNAKE_CASE` naming for constants
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Test CLI (macos-latest)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (7)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (7)
18-78: LGTM! Good comprehensive type mapping.Now look, here's the thing about the comment on line 23 - "longer ones first to avoid partial matches" - that's actually not quite right! See, when you're using word boundaries
\bin your regex pattern, you're already preventing partial matches. The word boundary does the heavy lifting there. It's like wearing both a belt AND suspenders - nothing wrong with it, but the belt alone would've done the job!The mapping itself is thorough though. You've got your integers, floats, dates, strings, containers - the whole shebang. The categorization with comments is particularly nice for the next physicist who comes along wondering what all these types are about.
80-91: LGTM! Elegant pattern construction.This is a beautiful thing - you're building a regex pattern dynamically from your type list, so if you add a new type to the mapping, it automatically shows up in the pattern. That's the kind of thing I love to see - no redundancy, no chance of forgetting to update two places.
The word boundaries
\bare the key here - they're doing the real work of ensuring you only match complete type names and don't accidentally turn "UINT64_column" into "UInt64_column".
112-126: LGTM! Well-documented public API.This is exactly how you want to write a utility function - clear documentation explaining the "why" (sqlparser uppercases, ClickHouse wants PascalCase), a simple example showing the transformation, and a one-liner implementation that's almost boring in its simplicity. Boring is good in production code!
The
ignoremarker on the example is the right call since constructing aQueryAST just to show the function works would be more confusing than helpful.
778-779: LGTM!Clean integration - the comment explains why we're not just using
format!, which is exactly the kind of breadcrumb future maintainers need.
819-820: LGTM!Consistent with the materialized view usage - same pattern, same helpful comment. That's the kind of consistency that makes a codebase a pleasure to work with.
2275-2324: Good test coverage for the fix function.These tests are hitting all the important cases - basic types, CAST expressions, nested containers, and the crucial check that lowercase identifiers like
string_columndon't get mangled. That last test at line 2320-2323 is particularly important - it's testing what you DON'T want to happen.One edge case you might consider adding: what happens if someone has a string literal containing a type name, like
SELECT 'STRING' AS my_literal? The regex would convert it to'String'. In practice this is probably rare, but documenting the limitation in a test would be honest about it.
2326-2384: Solid integration tests.These integration tests are where the rubber meets the road - you're testing that the whole pipeline from SQL string → AST → SQL string produces ClickHouse-compatible output. The assertions checking for both presence of PascalCase AND absence of uppercase variants are exactly right.
The test structure is clear: parse it, check it contains what it should, check it doesn't contain what it shouldn't. Simple and effective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs(5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run
cargo clippyto ensure Rust code passes Clippy's linting standards before each commit
**/*.rs: Usethiserrorwith#[derive(thiserror::Error)]for error handling in Rust; define errors near the fallibility unit with NO globalErrortype
NEVER useanyhow::Resultin Rust; define custom error types instead
Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Place constants inconstants.rsat the appropriate module level in Rust
Use tuple structs with validation constructors as newtypes in Rust (e.g.,struct UserId(String))
Write inline tests with#[cfg(test)]modules in Rust
Documentation is required for all public APIs in Rust
Files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
apps/framework-cli/**/*.rs
📄 CodeRabbit inference engine (apps/framework-cli/AGENTS.md)
apps/framework-cli/**/*.rs: Always runcargo clippy --all-targets -- -D warningsbefore commits; fix all warnings - no Clippy warnings may remain (treat warnings as errors)
Userustfmt --edition 2021for consistent formatting
Write meaningful names for functions, variables, and types
Document all public APIs and breaking changes
Usethiserrorcrate instead ofanyhow::Resultfor error handling
Define errors near their unit of fallibility (no global Error types)
Use#[derive(thiserror::Error)]with#[error()]messages for error definitions
Define newtypes as tuple structs:struct UserId(u64);
Add validation constructors for newtypes:UserId::new(id: u64) -> Result<Self, Error>
Derive standard traits for newtypes:#[derive(Debug, Clone, PartialEq)]
ImplementFrom/TryFromfor newtype conversions
Usederive_moreornutypeto reduce newtype boilerplate
Useconstfor static values (prefer overstatic)
UseUPPER_SNAKE_CASEnaming for constants
Scope constant visibility with preference:pub(crate)>pub(super)>pub
Group related constants together
Write unit tests for all public functions
Test error conditions and edge cases
Files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🧠 Learnings (4)
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Document all public APIs and breaking changes
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use tuple structs with validation constructors as newtypes in Rust (e.g., `struct UserId(String)`)
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Define newtypes as tuple structs: `struct UserId(u64);`
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test E2E Python Tests Template (Python 3.12)
- GitHub Check: Test E2E TypeScript Tests Template (Node 22, npm)
- GitHub Check: Test E2E TypeScript Tests Template (Node 24, npm)
- GitHub Check: Test E2E Python Tests Template (Python 3.13)
- GitHub Check: Test E2E TypeScript Tests Template (Node 20, npm)
🔇 Additional comments (6)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (6)
7-7: Nice addition of the regex import.You're bringing in the regex crate to handle the type-casing transformation. This makes sense for the pattern-matching approach you're taking. The import is clean and straightforward.
148-180: The iterative regex approach is sound for handling nested types.The idea of applying the transformation repeatedly until it stabilizes is exactly right for nested structures like
Nullable(Array(String)). First pass converts the outermost wrapper, next pass goes deeper, and so on. Very neat!The HashMap lookup at lines 164-167 with the
unwrap_orfallback is defensive programming - if somehow a type slips through that's not in your map, you don't corrupt the SQL, you just leave it alone.One thing to think about: for deeply nested types or very long SQL strings, this loop could do a lot of work. Each iteration scans the entire string with the regex. In practice, CREATE VIEW and INSERT SELECT statements are probably not huge, so this should be fine. But if you ever see this showing up in profiling, you could consider a single-pass approach with a more sophisticated state machine.
For now, this is perfectly reasonable. The code is clear and correct.
182-196: Public API is well-designed and properly documented.You've created a clean abstraction here - take a Query AST, format it to SQL, fix the type casing, done. The function signature is straightforward, the documentation explains what it does and why, and you even included a usage example in the doc comment.
Following the coding guidelines to the letter - public APIs must be documented, and you've done that nicely. The function name
query_to_sql_stringis descriptive in good Rust style.As per coding guidelines, public APIs must be documented.
848-850: Clean integration of the type-casing fix.You're applying the transformation at exactly the right place - after sqlparser has parsed the query into an AST, but when you're converting it back to a string. The comment explains why you're doing this, which is helpful for future maintainers.
The change is minimal and doesn't disturb the surrounding logic. Beautiful.
889-890: Consistent application in INSERT SELECT parsing.Same pattern here as in the materialized view parsing - format the query with proper type casing. The symmetry is good for maintainability.
2345-2505: Comprehensive test coverage for the main use cases, but missing edge case tests.The test suite here is really quite thorough! You're checking:
- Basic type conversions in CAST expressions ✓
- Nested types like
Nullable(String)✓- Deeply nested types ✓
- Lowercase identifiers stay lowercase ✓
- Simple uppercase column names stay uppercase ✓
- Integration with materialized view parsing ✓
- Integration with INSERT SELECT parsing ✓
All of this is excellent. You're verifying the functionality that matters most - CAST expressions - works correctly.
However, there's a gap: you test
SELECT DATE FROM events(line 2402) where DATE is a column name, and that correctly stays uppercase. But you don't testSELECT a, DATE, b FROM eventswhere DATE appears after a comma. Based on the regex pattern, this would incorrectly convert DATE to Date because of the,\s*prefix matcher.I'd suggest adding a test that either:
- Documents this limitation explicitly as a known issue
- Demonstrates that it's not a problem in practice (maybe you can show it doesn't occur in typical sqlparser output?)
The tests you have are solid for the intended use cases. Just making sure the edge cases are understood and either fixed or documented.
Based on learnings, write unit tests for all public functions and test edge cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs(5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run
cargo clippyto ensure Rust code passes Clippy's linting standards before each commit
**/*.rs: Usethiserrorwith#[derive(thiserror::Error)]for error handling in Rust; define errors near the fallibility unit with NO globalErrortype
NEVER useanyhow::Resultin Rust; define custom error types instead
Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Place constants inconstants.rsat the appropriate module level in Rust
Use tuple structs with validation constructors as newtypes in Rust (e.g.,struct UserId(String))
Write inline tests with#[cfg(test)]modules in Rust
Documentation is required for all public APIs in Rust
Files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
apps/framework-cli/**/*.rs
📄 CodeRabbit inference engine (apps/framework-cli/AGENTS.md)
apps/framework-cli/**/*.rs: Always runcargo clippy --all-targets -- -D warningsbefore commits; fix all warnings - no Clippy warnings may remain (treat warnings as errors)
Userustfmt --edition 2021for consistent formatting
Write meaningful names for functions, variables, and types
Document all public APIs and breaking changes
Usethiserrorcrate instead ofanyhow::Resultfor error handling
Define errors near their unit of fallibility (no global Error types)
Use#[derive(thiserror::Error)]with#[error()]messages for error definitions
Define newtypes as tuple structs:struct UserId(u64);
Add validation constructors for newtypes:UserId::new(id: u64) -> Result<Self, Error>
Derive standard traits for newtypes:#[derive(Debug, Clone, PartialEq)]
ImplementFrom/TryFromfor newtype conversions
Usederive_moreornutypeto reduce newtype boilerplate
Useconstfor static values (prefer overstatic)
UseUPPER_SNAKE_CASEnaming for constants
Scope constant visibility with preference:pub(crate)>pub(super)>pub
Group related constants together
Write unit tests for all public functions
Test error conditions and edge cases
Files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🧠 Learnings (4)
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Document all public APIs and breaking changes
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use tuple structs with validation constructors as newtypes in Rust (e.g., `struct UserId(String)`)
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Define newtypes as tuple structs: `struct UserId(u64);`
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Test E2E Python Tests Template (Python 3.12)
- GitHub Check: Test E2E TypeScript Tests Template (Node 24, npm)
- GitHub Check: Test E2E Python Tests Template (Python 3.13)
- GitHub Check: Test E2E TypeScript Tests Template (Node 22, npm)
- GitHub Check: Test E2E TypeScript Tests Template (Node 20, npm)
🔇 Additional comments (8)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (8)
18-81: LGTM! Well-organized type mapping.You know, this is exactly how you solve a problem elegantly - just list out all the cases! The HashMap gives you O(1) lookup, which is the right choice here. The documentation is clear about why this exists (sqlparser's uppercase formatting vs ClickHouse's PascalCase requirement), and I appreciate the note about submitting an upstream fix. That's thinking about the whole system, not just your little corner of it!
83-105: Looks good - clear enumeration of type wrappers.See, this is what I call being explicit! You've got both the uppercase and PascalCase forms because during the iterative conversion, you might encounter either one. Some folks might say "derive this from the main list," but nah - being explicit here makes it obvious what's happening. Physics students make the same mistake, trying to be too clever when just writing out the cases works better!
158-190: The iterative approach is elegant!Now this is the kind of thing I love to see! Instead of trying to write one monster regex that handles arbitrary nesting (which, let me tell you, is a fool's errand with regex), you just keep applying the transformation until nothing changes. Each pass peels off one layer of the onion. Simple, correct, and you can actually understand what's happening!
The termination condition is sound — when
new_result == result, you're done. No infinite loops here.
192-206: Well-documented public API!Beautiful! You've got a doc comment explaining what it does, why it exists (sqlparser uppercases, ClickHouse wants PascalCase), and an example showing the transformation. This is exactly how you document a public API. Someone reading this in six months will know immediately what this function is for without having to read the implementation. As per coding guidelines, public APIs need documentation, and you've delivered!
858-859: LGTM! Clean integration.Simple substitution — the old
format!("{}", query)becomesquery_to_sql_string(query). The comment explains why. This is how you integrate a fix without making a mess of the existing code!
899-900: Consistent with the materialized view change!Same fix, same pattern. When you find a bug in one place, you look for it everywhere — and you did! Both
parse_create_materialized_viewandparse_insert_selectnow use the samequery_to_sql_stringfunction. That's consistency, and consistency is how you avoid the "fixed it here but not there" bug.
2355-2466: Comprehensive test suite - well done!Now this is how you test! You've got:
- Basic type conversions (
STRING→String,UINT64→UInt64)- Nested types (
NULLABLE(STRING)→Nullable(String))- Complex expressions with multiple casts
- Critical: Tests that identifiers are NOT converted (
SELECT DATE FROM eventsstays unchanged)- Critical: Comma-separated columns preserved (
SELECT a, DATE, b FROM t)The test at lines 2449-2453 is particularly nice — it shows that the column
DATEis preserved while the typeSTRINGafterASis converted. That's the whole point of context-aware matching!The only thing missing is the nested-parentheses-before-comma case I mentioned earlier, but that's a test for a bug, not a missing test for working functionality.
2468-2526: Round-trip tests verify the full integration!These are the tests that actually matter for users — does parsing a materialized view or insert-select statement preserve the PascalCase types? You parse, you format, you check. The
String,UInt64,DateTime,Int64types all survive the round-trip.This is what I call testing the behavior, not just the implementation. Good stuff!
Code Review Agent Run #106131Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (1)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (1)
107-147: Regex pattern breaks on nested type wrappers—needs recursive or stateful approach.Looking at this regex pattern, I can see a pretty fundamental problem. The
[^)]*in line 143 means "match any character except a close-paren"—but here's the thing: when you've got nested types likeMap(Nullable(String), UINT64), that pattern stops dead at the first)it sees. So it matchesMap(Nullable(Stringand completely misses that comma and theUINT64that follows.This is exactly what one of the past review comments flagged, and it's still sitting there in the code.
Now, regular expressions aren't really built for counting matched parentheses—that's inherently recursive, and regexes are fundamentally non-recursive (well, unless you use fancy PCRE recursion features, but those get messy fast). You've got a few options here:
State machine approach: Walk through the string character-by-character, counting paren depth, and only match types when you're at the right depth. This is what your
extract_engine_from_create_tablefunction already does around lines 395-414.Multi-pass with balanced replacement: Apply the regex repeatedly but track paren balance to ensure you're only replacing at depth 0, then depth 1, etc.
AST-aware approach: Since you're already parsing with sqlparser, maybe intercept during AST traversal instead of post-processing the string.
The iterative loop (lines 166-187) helps with nesting depth (innermost to outermost), but it doesn't fix the problem that
[^)]*stops at the wrong closing paren when there are multiple args with nested types.🔎 Example demonstrating the bug
// Current behavior: let input = "SELECT CAST(x AS Map(Nullable(String), UINT64))"; let result = fix_clickhouse_type_casing(input); // Bug: Won't convert UINT64 because [^)]* stops at ) after String // The pattern never sees the comma that precedes UINT64 // Expected: // "SELECT CAST(x AS Map(Nullable(String), UInt64))"To fix this properly, you'd want something like the paren-counting logic already used elsewhere in this file.
Based on past review comments and coding guidelines.
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs(5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs
📄 CodeRabbit inference engine (.github/copilot-instructions.md)
Run
cargo clippyto ensure Rust code passes Clippy's linting standards before each commit
**/*.rs: Usethiserrorwith#[derive(thiserror::Error)]for error handling in Rust; define errors near the fallibility unit with NO globalErrortype
NEVER useanyhow::Resultin Rust; define custom error types instead
Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Place constants inconstants.rsat the appropriate module level in Rust
Use tuple structs with validation constructors as newtypes in Rust (e.g.,struct UserId(String))
Write inline tests with#[cfg(test)]modules in Rust
Documentation is required for all public APIs in Rust
Files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
apps/framework-cli/**/*.rs
📄 CodeRabbit inference engine (apps/framework-cli/AGENTS.md)
apps/framework-cli/**/*.rs: Always runcargo clippy --all-targets -- -D warningsbefore commits; fix all warnings - no Clippy warnings may remain (treat warnings as errors)
Userustfmt --edition 2021for consistent formatting
Write meaningful names for functions, variables, and types
Document all public APIs and breaking changes
Usethiserrorcrate instead ofanyhow::Resultfor error handling
Define errors near their unit of fallibility (no global Error types)
Use#[derive(thiserror::Error)]with#[error()]messages for error definitions
Define newtypes as tuple structs:struct UserId(u64);
Add validation constructors for newtypes:UserId::new(id: u64) -> Result<Self, Error>
Derive standard traits for newtypes:#[derive(Debug, Clone, PartialEq)]
ImplementFrom/TryFromfor newtype conversions
Usederive_moreornutypeto reduce newtype boilerplate
Useconstfor static values (prefer overstatic)
UseUPPER_SNAKE_CASEnaming for constants
Scope constant visibility with preference:pub(crate)>pub(super)>pub
Group related constants together
Write unit tests for all public functions
Test error conditions and edge cases
Files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🧠 Learnings (4)
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Document all public APIs and breaking changes
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Define newtypes as tuple structs: `struct UserId(u64);`
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use tuple structs with validation constructors as newtypes in Rust (e.g., `struct UserId(String)`)
Applied to files:
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🔇 Additional comments (1)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (1)
2409-2446: Nice work protecting column identifiers from conversion.I really like what you've done here with these tests. The past reviews flagged a critical concern: what if someone has a column actually named
DATEorUUID? Your regex could incorrectly convert it toDateorUuid, breaking the query since ClickHouse identifiers are case-sensitive.These tests (especially lines 2437-2441 with the comma-separated SELECT list) directly address that concern and demonstrate the pattern correctly distinguishes between type contexts (
CAST(x AS DATE)) and identifier contexts (SELECT DATE FROM t).That's exactly the kind of test that gives confidence the feature won't break legitimate queries. Good defensive testing!
Based on past review comments requesting these specific tests.
| let pattern = format!( | ||
| r"(AS\s+|(?:{})\(|\),\s*|\(\w+,\s*|(?:{}),\s*)({})([(,)])", | ||
| wrappers_pattern, pascal_types_pattern, types_pattern | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Regex over-converts type-named identifiers in function arguments
The regex pattern \(\w+,\s* intended to match type arguments in constructs like AggregateFunction(sum, INT64) will also incorrectly match regular function calls. For example, argMax(value, DATE) where DATE is a column name would be converted to argMax(value, Date). The pattern cannot distinguish between type wrapper contexts (where conversion is needed) and regular function calls (where identifiers matching type names should remain unchanged). While ClickHouse identifier case-insensitivity may prevent runtime errors, the output SQL is semantically incorrect.
| let pattern = format!( | ||
| r"(AS\s+|(?:{})\(|\),\s*|\(\w+,\s*|(?:{}),\s*)({})([(,)])", | ||
| wrappers_pattern, pascal_types_pattern, types_pattern | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Regex over-converts column names after function calls in SELECT
The \),\s* pattern meant for type expressions like Map(Nullable(String), INT64) will also match columns in SELECT lists after function calls. For example, SELECT func(a), DATE, b FROM t where DATE is a column would match because ), precedes DATE and , follows it. The suffix check only prevents matches when the column is last (followed by space), but not when additional columns follow (separated by comma). This causes incorrect conversion of column names to PascalCase type names when they appear after function calls and before other columns.
Code Review Agent Run #9257a1Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
|
apache/datafusion-sqlparser-rs#2154 Made a real fix with the upstream library |
Note
Adds context-aware ClickHouse type casing fix and applies it to serialized SELECTs in materialized view and insert-select parsing, with comprehensive tests.
CLICKHOUSE_TYPE_FIXES,TYPE_WRAPPERS,CLICKHOUSE_TYPE_PATTERN, andfix_clickhouse_type_casing.query_to_sql_stringto formatQuerywhile preserving PascalCase types.parse_create_materialized_viewandparse_insert_selectto produce correctly casedselect_statement.Array,Nullable,Map,Tuple,AggregateFunction), and ensuring identifiers/SELECT lists are not altered.Written by Cursor Bugbot for commit 0b3405f. This will update automatically on new commits. Configure here.
Summary by Bito