Skip to content

Conversation

@callicles
Copy link
Collaborator

@callicles callicles commented Dec 18, 2025

Note

Adds context-aware ClickHouse type casing fix and applies it to serialized SELECTs in materialized view and insert-select parsing, with comprehensive tests.

  • Parser:
    • Add context-aware type casing fix for ClickHouse via CLICKHOUSE_TYPE_FIXES, TYPE_WRAPPERS, CLICKHOUSE_TYPE_PATTERN, and fix_clickhouse_type_casing.
    • Introduce query_to_sql_string to format Query while preserving PascalCase types.
    • Integrate casing fix into parse_create_materialized_view and parse_insert_select to produce correctly cased select_statement.
  • Tests:
    • Add extensive tests covering CAST, nested/container types (Array, Nullable, Map, Tuple, AggregateFunction), and ensuring identifiers/SELECT lists are not altered.
    • Verify materialized view and insert-select parsing preserve PascalCase types in outputs.

Written by Cursor Bugbot for commit 0b3405f. This will update automatically on new commits. Configure here.

Summary by Bito

  • Fixes a critical bug in the sqlparser library where ClickHouse type names were incorrectly uppercased by implementing and refining a regex-based solution to maintain PascalCase formatting.
  • Refines the regex pattern to better capture context-specific syntax in CAST expressions and nested types.
  • Introduces a mapping for type conversions with improved regex patterns and documentation to clarify the context in which type names should be converted.
  • Adds comprehensive tests to ensure that only valid type contexts trigger the conversion while preserving identifiers in SELECT lists.
  • Updates query formatting functions used in materialized view and insert-select parsing to improve consistency and accuracy of SQL output serialization.
  • Overall, the changes address SQL parsing issues in the sqlparser library by introducing type conversion mappings, refining regex patterns for CAST expressions, updating formatting functions, and adding comprehensive tests to ensure consistent and accurate casing in SQL query serialization for ClickHouse databases.

@linear
Copy link

linear bot commented Dec 18, 2025

@vercel
Copy link

vercel bot commented Dec 18, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
docs-v2 Ready Ready Preview, Comment Dec 18, 2025 9:16pm
framework-docs Ready Ready Preview, Comment Dec 18, 2025 9:16pm

@bito-code-review
Copy link

bito-code-review bot commented Dec 18, 2025

Code Review Agent Run #ea5e5d

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: db89b40..db89b40
    • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Default Agent You can customize the agent settings here or contact your Bito workspace admin at dave@fiveonefour.com.

Documentation & Help

AI Code Review powered by Bito Logo

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 18, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This adds context-aware ClickHouse type-casing fixes and exposes pub fn query_to_sql_string(&Query) -> String. SQL output from sqlparser ASTs is post-processed to convert ClickHouse type names (and nested wrappers) to PascalCase; CREATE MATERIALIZED VIEW and INSERT...SELECT now use this output.

Changes

Cohort / File(s) Change Summary
ClickHouse SQL parser + tests
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs, apps/framework-cli/src/infrastructure/olap/clickhouse/..._tests.rs
Added CLICKHOUSE_TYPE_FIXES mapping and TYPE_WRAPPERS, introduced CLICKHOUSE_TYPE_PATTERN regex and fix_clickhouse_type_casing that iteratively applies context-aware PascalCase fixes for types (handles CAST, wrappers, nested/multi-arg containers). Added pub fn query_to_sql_string(query: &Query) -> String that formats a Query then applies the fixer. Replaced direct format!("{}", query) uses in parse_create_materialized_view and parse_insert_select. Added regex::Regex import and comprehensive tests covering CAST, nested wrappers, multi-arg containers, and identifier-preservation edge cases.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Areas to inspect:
    • The CLICKHOUSE_TYPE_PATTERN regex correctness and anchors for all intended contexts.
    • Iterative replacement termination and handling of deeply nested types.
    • Tests covering edge cases where identifiers resemble type names.

Poem

I probe the code like curious light,
Turning LOUD TYPES into Proper flight.
Regex dances, nested shapes set straight,
Queries sing back—no more CAPS to hate. 🎩✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: fixing ClickHouse type capitalization when using sqlparser, which is the core objective of the changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description clearly articulates the purpose: fixing ClickHouse type casing in sqlparser output through context-aware regex-based post-processing, with concrete details about constants, functions, and testing coverage.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch nico/eng-1767-sql-parser-converts-clickhouse-types-to-uppercase-causing

Comment @coderabbitai help to get the list of available commands and usage tips.

@bito-code-review
Copy link

bito-code-review bot commented Dec 18, 2025

Changelist by Bito

This pull request implements the following key changes.

Key Change Files Impacted
Bug Fix - Fix Incorrect ClickHouse Type Casing

sql_parser.rs - Implemented a context-aware conversion mechanism using regex to correctly format ClickHouse type names to PascalCase, updating functions for materialized view and insert-select parsing and enhancing documentation through detailed inline comments.

Testing - Add Comprehensive Tests for Type Casing

sql_parser.rs - Introduced extensive tests that cover various use cases including CAST expressions, nested types, and container types, ensuring that only valid type contexts are converted while leaving column identifiers unchanged.

@bito-code-review
Copy link

bito-code-review bot commented Dec 18, 2025

Interaction Diagram by Bito
sequenceDiagram
participant CLI as CLI Entry
participant InfraMap as InfrastructureMap<br/>🔄 Updated | ●●○ Medium
participant DiffStrat as ClickHouseDiffStrategy
participant Parser as SQLParser<br/>🟩 Added | ●●● High
participant TypeFixer as TypeCasing Fixer<br/>🟩 Added | ●●● High
participant Regex as Regex Engine
participant ClickHouse as ClickHouse Database
CLI->>InfraMap: diff_infrastructure_map()
InfraMap->>DiffStrat: check_materialized_view_population()
DiffStrat->>Parser: parse_create_materialized_view(sql)
Parser->>TypeFixer: query_to_sql_string(query)
TypeFixer->>Regex: CLICKHOUSE_TYPE_PATTERN.replace_all()
Regex-->>TypeFixer: Matched types in valid contexts
TypeFixer->>TypeFixer: fix_clickhouse_type_casing() loop
Note over TypeFixer: Convert UPPERCASE types to PascalCase<br/>only in valid type contexts
TypeFixer-->>Parser: Formatted SQL string
Parser-->>DiffStrat: MaterializedViewStatement
DiffStrat-->>InfraMap: OlapChange with fixed types
InfraMap-->>CLI: InfraChanges with corrected SQL
alt [Type conversion needed]
CLI->>ClickHouse: Deploy SQL with PascalCase types
ClickHouse-->>CLI: Schema created successfully
else [Types already correct]
CLI->>ClickHouse: Deploy SQL (no changes)
    end
Loading

Critical path: CLI Entry->InfrastructureMap->ClickHouseDiffStrategy->SQLParser->TypeCasing Fixer->Regex Engine

Note: This MR adds context-aware ClickHouse type casing fixes to the SQL parser. When parsing CREATE MATERIALIZED VIEW and INSERT INTO...SELECT statements, types are now converted from uppercase (sqlparser default) to PascalCase (ClickHouse requirement). The regex-based fix only applies to valid type contexts (after AS, inside type wrappers like Nullable()), avoiding false conversions of column names. Upstream: CLI infrastructure diffing. Downstream: ClickHouse schema deployment with corrected type casing.

If the interaction diagram doesn't appear, refresh the page to render it.

You can disable interaction diagrams by customizing agent settings. Refer to documentation.

@bito-code-review
Copy link

The bug is valid: the fix_clickhouse_type_casing function uses a broad regex with word boundaries that matches type names anywhere in the SQL string, including column names, aliases, or literals. For example, SELECT DATE FROM events becomes SELECT Date FROM events, breaking case-sensitive identifiers in ClickHouse. The current implementation lacks context to distinguish types from identifiers.

apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs

fn fix_clickhouse_type_casing(sql: &str) -> String {
    CLICKHOUSE_TYPE_PATTERN
        .replace_all(sql, |caps: &regex::Captures| {
            let matched = &caps[1];
            // Find the correct PascalCase version and return as owned String
            CLICKHOUSE_TYPE_FIXES
                .iter()
                .find(|(upper, _)| *upper == matched)
                .map(|(_, pascal)| pascal.to_string())
                .unwrap_or_else(|| matched.to_string())
        })
        .into_owned()
}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f9f8959 and db89b40.

📒 Files selected for processing (1)
  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run cargo clippy to ensure Rust code passes Clippy's linting standards before each commit

**/*.rs: Use thiserror with #[derive(thiserror::Error)] for error handling in Rust; define errors near the fallibility unit with NO global Error type
NEVER use anyhow::Result in Rust; define custom error types instead
Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Place constants in constants.rs at the appropriate module level in Rust
Use tuple structs with validation constructors as newtypes in Rust (e.g., struct UserId(String))
Write inline tests with #[cfg(test)] modules in Rust
Documentation is required for all public APIs in Rust

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
apps/framework-cli/**/*.rs

📄 CodeRabbit inference engine (apps/framework-cli/AGENTS.md)

apps/framework-cli/**/*.rs: Always run cargo clippy --all-targets -- -D warnings before commits; fix all warnings - no Clippy warnings may remain (treat warnings as errors)
Use rustfmt --edition 2021 for consistent formatting
Write meaningful names for functions, variables, and types
Document all public APIs and breaking changes
Use thiserror crate instead of anyhow::Result for error handling
Define errors near their unit of fallibility (no global Error types)
Use #[derive(thiserror::Error)] with #[error()] messages for error definitions
Define newtypes as tuple structs: struct UserId(u64);
Add validation constructors for newtypes: UserId::new(id: u64) -> Result<Self, Error>
Derive standard traits for newtypes: #[derive(Debug, Clone, PartialEq)]
Implement From/TryFrom for newtype conversions
Use derive_more or nutype to reduce newtype boilerplate
Use const for static values (prefer over static)
Use UPPER_SNAKE_CASE naming for constants
Scope constant visibility with preference: pub(crate) > pub(super) > pub
Group related constants together
Write unit tests for all public functions
Test error conditions and edge cases

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🧠 Learnings (6)
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Document all public APIs and breaking changes

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use tuple structs with validation constructors as newtypes in Rust (e.g., `struct UserId(String)`)

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Define newtypes as tuple structs: `struct UserId(u64);`

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Write meaningful names for functions, variables, and types

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Use `UPPER_SNAKE_CASE` naming for constants

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Test CLI (macos-latest)
  • GitHub Check: Cursor Bugbot
🔇 Additional comments (7)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (7)

18-78: LGTM! Good comprehensive type mapping.

Now look, here's the thing about the comment on line 23 - "longer ones first to avoid partial matches" - that's actually not quite right! See, when you're using word boundaries \b in your regex pattern, you're already preventing partial matches. The word boundary does the heavy lifting there. It's like wearing both a belt AND suspenders - nothing wrong with it, but the belt alone would've done the job!

The mapping itself is thorough though. You've got your integers, floats, dates, strings, containers - the whole shebang. The categorization with comments is particularly nice for the next physicist who comes along wondering what all these types are about.


80-91: LGTM! Elegant pattern construction.

This is a beautiful thing - you're building a regex pattern dynamically from your type list, so if you add a new type to the mapping, it automatically shows up in the pattern. That's the kind of thing I love to see - no redundancy, no chance of forgetting to update two places.

The word boundaries \b are the key here - they're doing the real work of ensuring you only match complete type names and don't accidentally turn "UINT64_column" into "UInt64_column".


112-126: LGTM! Well-documented public API.

This is exactly how you want to write a utility function - clear documentation explaining the "why" (sqlparser uppercases, ClickHouse wants PascalCase), a simple example showing the transformation, and a one-liner implementation that's almost boring in its simplicity. Boring is good in production code!

The ignore marker on the example is the right call since constructing a Query AST just to show the function works would be more confusing than helpful.


778-779: LGTM!

Clean integration - the comment explains why we're not just using format!, which is exactly the kind of breadcrumb future maintainers need.


819-820: LGTM!

Consistent with the materialized view usage - same pattern, same helpful comment. That's the kind of consistency that makes a codebase a pleasure to work with.


2275-2324: Good test coverage for the fix function.

These tests are hitting all the important cases - basic types, CAST expressions, nested containers, and the crucial check that lowercase identifiers like string_column don't get mangled. That last test at line 2320-2323 is particularly important - it's testing what you DON'T want to happen.

One edge case you might consider adding: what happens if someone has a string literal containing a type name, like SELECT 'STRING' AS my_literal? The regex would convert it to 'String'. In practice this is probably rare, but documenting the limitation in a test would be honest about it.


2326-2384: Solid integration tests.

These integration tests are where the rubber meets the road - you're testing that the whole pipeline from SQL string → AST → SQL string produces ClickHouse-compatible output. The assertions checking for both presence of PascalCase AND absence of uppercase variants are exactly right.

The test structure is clear: parse it, check it contains what it should, check it doesn't contain what it shouldn't. Simple and effective.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between db89b40 and c5ae3d5.

📒 Files selected for processing (1)
  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run cargo clippy to ensure Rust code passes Clippy's linting standards before each commit

**/*.rs: Use thiserror with #[derive(thiserror::Error)] for error handling in Rust; define errors near the fallibility unit with NO global Error type
NEVER use anyhow::Result in Rust; define custom error types instead
Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Place constants in constants.rs at the appropriate module level in Rust
Use tuple structs with validation constructors as newtypes in Rust (e.g., struct UserId(String))
Write inline tests with #[cfg(test)] modules in Rust
Documentation is required for all public APIs in Rust

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
apps/framework-cli/**/*.rs

📄 CodeRabbit inference engine (apps/framework-cli/AGENTS.md)

apps/framework-cli/**/*.rs: Always run cargo clippy --all-targets -- -D warnings before commits; fix all warnings - no Clippy warnings may remain (treat warnings as errors)
Use rustfmt --edition 2021 for consistent formatting
Write meaningful names for functions, variables, and types
Document all public APIs and breaking changes
Use thiserror crate instead of anyhow::Result for error handling
Define errors near their unit of fallibility (no global Error types)
Use #[derive(thiserror::Error)] with #[error()] messages for error definitions
Define newtypes as tuple structs: struct UserId(u64);
Add validation constructors for newtypes: UserId::new(id: u64) -> Result<Self, Error>
Derive standard traits for newtypes: #[derive(Debug, Clone, PartialEq)]
Implement From/TryFrom for newtype conversions
Use derive_more or nutype to reduce newtype boilerplate
Use const for static values (prefer over static)
Use UPPER_SNAKE_CASE naming for constants
Scope constant visibility with preference: pub(crate) > pub(super) > pub
Group related constants together
Write unit tests for all public functions
Test error conditions and edge cases

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🧠 Learnings (4)
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Document all public APIs and breaking changes

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use tuple structs with validation constructors as newtypes in Rust (e.g., `struct UserId(String)`)

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Define newtypes as tuple structs: `struct UserId(u64);`

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test E2E Python Tests Template (Python 3.12)
  • GitHub Check: Test E2E TypeScript Tests Template (Node 22, npm)
  • GitHub Check: Test E2E TypeScript Tests Template (Node 24, npm)
  • GitHub Check: Test E2E Python Tests Template (Python 3.13)
  • GitHub Check: Test E2E TypeScript Tests Template (Node 20, npm)
🔇 Additional comments (6)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (6)

7-7: Nice addition of the regex import.

You're bringing in the regex crate to handle the type-casing transformation. This makes sense for the pattern-matching approach you're taking. The import is clean and straightforward.


148-180: The iterative regex approach is sound for handling nested types.

The idea of applying the transformation repeatedly until it stabilizes is exactly right for nested structures like Nullable(Array(String)). First pass converts the outermost wrapper, next pass goes deeper, and so on. Very neat!

The HashMap lookup at lines 164-167 with the unwrap_or fallback is defensive programming - if somehow a type slips through that's not in your map, you don't corrupt the SQL, you just leave it alone.

One thing to think about: for deeply nested types or very long SQL strings, this loop could do a lot of work. Each iteration scans the entire string with the regex. In practice, CREATE VIEW and INSERT SELECT statements are probably not huge, so this should be fine. But if you ever see this showing up in profiling, you could consider a single-pass approach with a more sophisticated state machine.

For now, this is perfectly reasonable. The code is clear and correct.


182-196: Public API is well-designed and properly documented.

You've created a clean abstraction here - take a Query AST, format it to SQL, fix the type casing, done. The function signature is straightforward, the documentation explains what it does and why, and you even included a usage example in the doc comment.

Following the coding guidelines to the letter - public APIs must be documented, and you've done that nicely. The function name query_to_sql_string is descriptive in good Rust style.

As per coding guidelines, public APIs must be documented.


848-850: Clean integration of the type-casing fix.

You're applying the transformation at exactly the right place - after sqlparser has parsed the query into an AST, but when you're converting it back to a string. The comment explains why you're doing this, which is helpful for future maintainers.

The change is minimal and doesn't disturb the surrounding logic. Beautiful.


889-890: Consistent application in INSERT SELECT parsing.

Same pattern here as in the materialized view parsing - format the query with proper type casing. The symmetry is good for maintainability.


2345-2505: Comprehensive test coverage for the main use cases, but missing edge case tests.

The test suite here is really quite thorough! You're checking:

  • Basic type conversions in CAST expressions ✓
  • Nested types like Nullable(String)
  • Deeply nested types ✓
  • Lowercase identifiers stay lowercase ✓
  • Simple uppercase column names stay uppercase ✓
  • Integration with materialized view parsing ✓
  • Integration with INSERT SELECT parsing ✓

All of this is excellent. You're verifying the functionality that matters most - CAST expressions - works correctly.

However, there's a gap: you test SELECT DATE FROM events (line 2402) where DATE is a column name, and that correctly stays uppercase. But you don't test SELECT a, DATE, b FROM events where DATE appears after a comma. Based on the regex pattern, this would incorrectly convert DATE to Date because of the ,\s* prefix matcher.

I'd suggest adding a test that either:

  1. Documents this limitation explicitly as a known issue
  2. Demonstrates that it's not a problem in practice (maybe you can show it doesn't occur in typical sqlparser output?)

The tests you have are solid for the intended use cases. Just making sure the edge cases are understood and either fixed or documented.

Based on learnings, write unit tests for all public functions and test edge cases.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c5ae3d5 and bede602.

📒 Files selected for processing (1)
  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run cargo clippy to ensure Rust code passes Clippy's linting standards before each commit

**/*.rs: Use thiserror with #[derive(thiserror::Error)] for error handling in Rust; define errors near the fallibility unit with NO global Error type
NEVER use anyhow::Result in Rust; define custom error types instead
Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Place constants in constants.rs at the appropriate module level in Rust
Use tuple structs with validation constructors as newtypes in Rust (e.g., struct UserId(String))
Write inline tests with #[cfg(test)] modules in Rust
Documentation is required for all public APIs in Rust

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
apps/framework-cli/**/*.rs

📄 CodeRabbit inference engine (apps/framework-cli/AGENTS.md)

apps/framework-cli/**/*.rs: Always run cargo clippy --all-targets -- -D warnings before commits; fix all warnings - no Clippy warnings may remain (treat warnings as errors)
Use rustfmt --edition 2021 for consistent formatting
Write meaningful names for functions, variables, and types
Document all public APIs and breaking changes
Use thiserror crate instead of anyhow::Result for error handling
Define errors near their unit of fallibility (no global Error types)
Use #[derive(thiserror::Error)] with #[error()] messages for error definitions
Define newtypes as tuple structs: struct UserId(u64);
Add validation constructors for newtypes: UserId::new(id: u64) -> Result<Self, Error>
Derive standard traits for newtypes: #[derive(Debug, Clone, PartialEq)]
Implement From/TryFrom for newtype conversions
Use derive_more or nutype to reduce newtype boilerplate
Use const for static values (prefer over static)
Use UPPER_SNAKE_CASE naming for constants
Scope constant visibility with preference: pub(crate) > pub(super) > pub
Group related constants together
Write unit tests for all public functions
Test error conditions and edge cases

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🧠 Learnings (4)
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Document all public APIs and breaking changes

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use tuple structs with validation constructors as newtypes in Rust (e.g., `struct UserId(String)`)

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Define newtypes as tuple structs: `struct UserId(u64);`

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Test E2E Python Tests Template (Python 3.12)
  • GitHub Check: Test E2E TypeScript Tests Template (Node 24, npm)
  • GitHub Check: Test E2E Python Tests Template (Python 3.13)
  • GitHub Check: Test E2E TypeScript Tests Template (Node 22, npm)
  • GitHub Check: Test E2E TypeScript Tests Template (Node 20, npm)
🔇 Additional comments (8)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (8)

18-81: LGTM! Well-organized type mapping.

You know, this is exactly how you solve a problem elegantly - just list out all the cases! The HashMap gives you O(1) lookup, which is the right choice here. The documentation is clear about why this exists (sqlparser's uppercase formatting vs ClickHouse's PascalCase requirement), and I appreciate the note about submitting an upstream fix. That's thinking about the whole system, not just your little corner of it!


83-105: Looks good - clear enumeration of type wrappers.

See, this is what I call being explicit! You've got both the uppercase and PascalCase forms because during the iterative conversion, you might encounter either one. Some folks might say "derive this from the main list," but nah - being explicit here makes it obvious what's happening. Physics students make the same mistake, trying to be too clever when just writing out the cases works better!


158-190: The iterative approach is elegant!

Now this is the kind of thing I love to see! Instead of trying to write one monster regex that handles arbitrary nesting (which, let me tell you, is a fool's errand with regex), you just keep applying the transformation until nothing changes. Each pass peels off one layer of the onion. Simple, correct, and you can actually understand what's happening!

The termination condition is sound — when new_result == result, you're done. No infinite loops here.


192-206: Well-documented public API!

Beautiful! You've got a doc comment explaining what it does, why it exists (sqlparser uppercases, ClickHouse wants PascalCase), and an example showing the transformation. This is exactly how you document a public API. Someone reading this in six months will know immediately what this function is for without having to read the implementation. As per coding guidelines, public APIs need documentation, and you've delivered!


858-859: LGTM! Clean integration.

Simple substitution — the old format!("{}", query) becomes query_to_sql_string(query). The comment explains why. This is how you integrate a fix without making a mess of the existing code!


899-900: Consistent with the materialized view change!

Same fix, same pattern. When you find a bug in one place, you look for it everywhere — and you did! Both parse_create_materialized_view and parse_insert_select now use the same query_to_sql_string function. That's consistency, and consistency is how you avoid the "fixed it here but not there" bug.


2355-2466: Comprehensive test suite - well done!

Now this is how you test! You've got:

  • Basic type conversions (STRINGString, UINT64UInt64)
  • Nested types (NULLABLE(STRING)Nullable(String))
  • Complex expressions with multiple casts
  • Critical: Tests that identifiers are NOT converted (SELECT DATE FROM events stays unchanged)
  • Critical: Comma-separated columns preserved (SELECT a, DATE, b FROM t)

The test at lines 2449-2453 is particularly nice — it shows that the column DATE is preserved while the type STRING after AS is converted. That's the whole point of context-aware matching!

The only thing missing is the nested-parentheses-before-comma case I mentioned earlier, but that's a test for a bug, not a missing test for working functionality.


2468-2526: Round-trip tests verify the full integration!

These are the tests that actually matter for users — does parsing a materialized view or insert-select statement preserve the PascalCase types? You parse, you format, you check. The String, UInt64, DateTime, Int64 types all survive the round-trip.

This is what I call testing the behavior, not just the implementation. Good stuff!

@bito-code-review
Copy link

bito-code-review bot commented Dec 18, 2025

Code Review Agent Run #106131

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: db89b40..bede602
    • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Default Agent You can customize the agent settings here or contact your Bito workspace admin at dave@fiveonefour.com.

Documentation & Help

AI Code Review powered by Bito Logo

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (1)

107-147: Regex pattern breaks on nested type wrappers—needs recursive or stateful approach.

Looking at this regex pattern, I can see a pretty fundamental problem. The [^)]* in line 143 means "match any character except a close-paren"—but here's the thing: when you've got nested types like Map(Nullable(String), UINT64), that pattern stops dead at the first ) it sees. So it matches Map(Nullable(String and completely misses that comma and the UINT64 that follows.

This is exactly what one of the past review comments flagged, and it's still sitting there in the code.

Now, regular expressions aren't really built for counting matched parentheses—that's inherently recursive, and regexes are fundamentally non-recursive (well, unless you use fancy PCRE recursion features, but those get messy fast). You've got a few options here:

  1. State machine approach: Walk through the string character-by-character, counting paren depth, and only match types when you're at the right depth. This is what your extract_engine_from_create_table function already does around lines 395-414.

  2. Multi-pass with balanced replacement: Apply the regex repeatedly but track paren balance to ensure you're only replacing at depth 0, then depth 1, etc.

  3. AST-aware approach: Since you're already parsing with sqlparser, maybe intercept during AST traversal instead of post-processing the string.

The iterative loop (lines 166-187) helps with nesting depth (innermost to outermost), but it doesn't fix the problem that [^)]* stops at the wrong closing paren when there are multiple args with nested types.

🔎 Example demonstrating the bug
// Current behavior:
let input = "SELECT CAST(x AS Map(Nullable(String), UINT64))";
let result = fix_clickhouse_type_casing(input);
// Bug: Won't convert UINT64 because [^)]* stops at ) after String
// The pattern never sees the comma that precedes UINT64

// Expected:
// "SELECT CAST(x AS Map(Nullable(String), UInt64))"

To fix this properly, you'd want something like the paren-counting logic already used elsewhere in this file.

Based on past review comments and coding guidelines.

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c5ae3d5 and bede602.

📒 Files selected for processing (1)
  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (5 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
**/*.rs

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Run cargo clippy to ensure Rust code passes Clippy's linting standards before each commit

**/*.rs: Use thiserror with #[derive(thiserror::Error)] for error handling in Rust; define errors near the fallibility unit with NO global Error type
NEVER use anyhow::Result in Rust; define custom error types instead
Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust
Place constants in constants.rs at the appropriate module level in Rust
Use tuple structs with validation constructors as newtypes in Rust (e.g., struct UserId(String))
Write inline tests with #[cfg(test)] modules in Rust
Documentation is required for all public APIs in Rust

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
apps/framework-cli/**/*.rs

📄 CodeRabbit inference engine (apps/framework-cli/AGENTS.md)

apps/framework-cli/**/*.rs: Always run cargo clippy --all-targets -- -D warnings before commits; fix all warnings - no Clippy warnings may remain (treat warnings as errors)
Use rustfmt --edition 2021 for consistent formatting
Write meaningful names for functions, variables, and types
Document all public APIs and breaking changes
Use thiserror crate instead of anyhow::Result for error handling
Define errors near their unit of fallibility (no global Error types)
Use #[derive(thiserror::Error)] with #[error()] messages for error definitions
Define newtypes as tuple structs: struct UserId(u64);
Add validation constructors for newtypes: UserId::new(id: u64) -> Result<Self, Error>
Derive standard traits for newtypes: #[derive(Debug, Clone, PartialEq)]
Implement From/TryFrom for newtype conversions
Use derive_more or nutype to reduce newtype boilerplate
Use const for static values (prefer over static)
Use UPPER_SNAKE_CASE naming for constants
Scope constant visibility with preference: pub(crate) > pub(super) > pub
Group related constants together
Write unit tests for all public functions
Test error conditions and edge cases

Files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🧠 Learnings (4)
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use snake_case for functions and variables, PascalCase for types and traits, SCREAMING_SNAKE_CASE for constants in Rust

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Document all public APIs and breaking changes

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:10.897Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: apps/framework-cli/AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:10.897Z
Learning: Applies to apps/framework-cli/**/*.rs : Define newtypes as tuple structs: `struct UserId(u64);`

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
📚 Learning: 2025-12-16T23:09:00.538Z
Learnt from: CR
Repo: 514-labs/moosestack PR: 0
File: AGENTS.md:0-0
Timestamp: 2025-12-16T23:09:00.538Z
Learning: Applies to **/*.rs : Use tuple structs with validation constructors as newtypes in Rust (e.g., `struct UserId(String)`)

Applied to files:

  • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
🔇 Additional comments (1)
apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs (1)

2409-2446: Nice work protecting column identifiers from conversion.

I really like what you've done here with these tests. The past reviews flagged a critical concern: what if someone has a column actually named DATE or UUID? Your regex could incorrectly convert it to Date or Uuid, breaking the query since ClickHouse identifiers are case-sensitive.

These tests (especially lines 2437-2441 with the comma-separated SELECT list) directly address that concern and demonstrate the pattern correctly distinguishes between type contexts (CAST(x AS DATE)) and identifier contexts (SELECT DATE FROM t).

That's exactly the kind of test that gives confidence the feature won't break legitimate queries. Good defensive testing!

Based on past review comments requesting these specific tests.

let pattern = format!(
r"(AS\s+|(?:{})\(|\),\s*|\(\w+,\s*|(?:{}),\s*)({})([(,)])",
wrappers_pattern, pascal_types_pattern, types_pattern
);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Regex over-converts type-named identifiers in function arguments

The regex pattern \(\w+,\s* intended to match type arguments in constructs like AggregateFunction(sum, INT64) will also incorrectly match regular function calls. For example, argMax(value, DATE) where DATE is a column name would be converted to argMax(value, Date). The pattern cannot distinguish between type wrapper contexts (where conversion is needed) and regular function calls (where identifiers matching type names should remain unchanged). While ClickHouse identifier case-insensitivity may prevent runtime errors, the output SQL is semantically incorrect.

Fix in Cursor Fix in Web

let pattern = format!(
r"(AS\s+|(?:{})\(|\),\s*|\(\w+,\s*|(?:{}),\s*)({})([(,)])",
wrappers_pattern, pascal_types_pattern, types_pattern
);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Regex over-converts column names after function calls in SELECT

The \),\s* pattern meant for type expressions like Map(Nullable(String), INT64) will also match columns in SELECT lists after function calls. For example, SELECT func(a), DATE, b FROM t where DATE is a column would match because ), precedes DATE and , follows it. The suffix check only prevents matches when the column is last (followed by space), but not when additional columns follow (separated by comma). This causes incorrect conversion of column names to PascalCase type names when they appear after function calls and before other columns.

Fix in Cursor Fix in Web

@bito-code-review
Copy link

bito-code-review bot commented Dec 18, 2025

Code Review Agent Run #9257a1

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: bede602..0b3405f
    • apps/framework-cli/src/infrastructure/olap/clickhouse/sql_parser.rs
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Default Agent You can customize the agent settings here or contact your Bito workspace admin at dave@fiveonefour.com.

Documentation & Help

AI Code Review powered by Bito Logo

@callicles
Copy link
Collaborator Author

apache/datafusion-sqlparser-rs#2154

Made a real fix with the upstream library

@callicles callicles closed this Jan 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants