|
| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +This is a ClickHouse SQL parser written in Go. It parses ClickHouse SQL into an AST (Abstract Syntax Tree) and can format SQL. The project is inspired by memefish and works both as a Go library and as a CLI tool. |
| 8 | + |
| 9 | +## Build and Development Commands |
| 10 | + |
| 11 | +### Build the CLI tool |
| 12 | +```bash |
| 13 | +make |
| 14 | +# or |
| 15 | +go build -o clickhouse-sql-parser main.go |
| 16 | +``` |
| 17 | + |
| 18 | +### Run tests |
| 19 | +```bash |
| 20 | +make test |
| 21 | +# Runs tests with coverage, race detection, and compatible flag |
| 22 | +``` |
| 23 | + |
| 24 | +### Run compatible tests (for ClickHouse compatibility) |
| 25 | +```bash |
| 26 | +go test ./parser -compatible |
| 27 | +# Tests against real ClickHouse SQL files from testdata/query/compatible/ |
| 28 | +``` |
| 29 | + |
| 30 | +### Update test golden files |
| 31 | +```bash |
| 32 | +make update_test |
| 33 | +# Updates expected output files in testdata/*/output/ directories |
| 34 | +``` |
| 35 | + |
| 36 | +### Run linting |
| 37 | +```bash |
| 38 | +make lint |
| 39 | +# Uses golangci-lint with 20 minute timeout |
| 40 | +``` |
| 41 | + |
| 42 | +### Run benchmarks |
| 43 | +```bash |
| 44 | +go test -bench=. -benchmem ./parser |
| 45 | +``` |
| 46 | + |
| 47 | +## Architecture Overview |
| 48 | + |
| 49 | +### Core Components |
| 50 | + |
| 51 | +**Lexer (`parser/lexer.go`)** |
| 52 | +- Tokenizes ClickHouse SQL input into tokens |
| 53 | +- Handles keywords, identifiers, operators, literals, and comments |
| 54 | +- Supports various token types including strings, numbers, and punctuation |
| 55 | + |
| 56 | +**Parser (`parser/parser_*.go`)** |
| 57 | +- Modular parser split across multiple files by functionality: |
| 58 | + - `parser_common.go` - Core parser logic and utilities |
| 59 | + - `parser_query.go` - SELECT statements and query parsing |
| 60 | + - `parser_table.go` - CREATE TABLE and table-related DDL |
| 61 | + - `parser_alter.go` - ALTER statements |
| 62 | + - `parser_drop.go` - DROP statements |
| 63 | + - `parser_view.go` - View-related statements |
| 64 | + - `parser_column.go` - Column definitions and operations |
| 65 | + |
| 66 | +**AST (`parser/ast.go`)** |
| 67 | +- Defines all AST node types implementing the `Expr` interface |
| 68 | +- Each node provides `Pos()`, `End()`, `String()`, and `Accept()` methods |
| 69 | +- Supports visitor pattern for AST traversal |
| 70 | + |
| 71 | +**AST Traversal** |
| 72 | +- **Walk Pattern** (`parser/walk.go`) - Recommended approach for AST traversal |
| 73 | + - `Walk(node, fn)` - Depth-first traversal |
| 74 | + - `Find(root, predicate)` - Find first matching node |
| 75 | + - `FindAll(root, predicate)` - Find all matching nodes |
| 76 | + - `Transform(root, transformer)` - Apply transformations |
| 77 | +- **Visitor Pattern** (`parser/ast_visitor.go`) - More complex but powerful traversal |
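
The Walk-style helpers listed above can be illustrated with a small, self-contained sketch. The `Node` interface and the `Ident` and `Call` types below are hypothetical stand-ins invented for this example. They are not the repository's `Expr` types, and the real signatures in `parser/walk.go` may differ.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for the parser's Expr interface.
type Node interface {
	Children() []Node
}

// Ident and Call are toy node types used only for this illustration.
type Ident struct{ Name string }

func (*Ident) Children() []Node { return nil }

type Call struct {
	Func string
	Args []Node
}

func (c *Call) Children() []Node { return c.Args }

// Walk visits node and its descendants depth-first, parents before children.
// Returning false from fn skips the children of that node.
func Walk(node Node, fn func(Node) bool) {
	if node == nil || !fn(node) {
		return
	}
	for _, child := range node.Children() {
		Walk(child, fn)
	}
}

// FindAll collects every node for which predicate returns true.
func FindAll(root Node, predicate func(Node) bool) []Node {
	var found []Node
	Walk(root, func(n Node) bool {
		if predicate(n) {
			found = append(found, n)
		}
		return true
	})
	return found
}

// Find returns the first match in depth-first order, or nil if none matches.
func Find(root Node, predicate func(Node) bool) Node {
	var match Node
	Walk(root, func(n Node) bool {
		if match != nil {
			return false // already found, stop descending
		}
		if predicate(n) {
			match = n
			return false
		}
		return true
	})
	return match
}

func main() {
	// Roughly the shape of: sum(a, plus(b, c))
	tree := &Call{Func: "sum", Args: []Node{
		&Ident{Name: "a"},
		&Call{Func: "plus", Args: []Node{&Ident{Name: "b"}, &Ident{Name: "c"}}},
	}}
	isIdent := func(n Node) bool { _, ok := n.(*Ident); return ok }
	for _, n := range FindAll(tree, isIdent) {
		fmt.Println(n.(*Ident).Name)
	}
}
```

Callback-based traversal like this keeps the caller's code to a single function, which is one plausible reason the document recommends Walk over the visitor interface for simple queries.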
| 78 | + |
| 79 | +**Main Entry Point (`main.go`)** |
| 80 | +- CLI tool supporting parsing to AST JSON or formatting SQL |
| 81 | +- Accepts input from command line arguments or files |
| 82 | + |
| 83 | +### Key Interfaces |
| 84 | + |
| 85 | +- `Expr` - Base interface for all AST nodes |
| 86 | +- `DDL` - Interface for Data Definition Language statements |
| 87 | +- `ASTVisitor` - Visitor pattern interface for AST traversal |
| 88 | +- `WalkFunc` - Function type for Walk pattern traversal |
| 89 | + |
| 90 | +## Testing Strategy |
| 91 | + |
| 92 | +The project combines three kinds of tests: |
| 93 | + |
| 94 | +**Golden File Testing** |
| 95 | +- Test cases in `parser/testdata/` organized by category: |
| 96 | + - `basic/` - Simple test cases |
| 97 | + - `ddl/` - Data Definition Language tests |
| 98 | + - `dml/` - Data Manipulation Language tests |
| 99 | + - `query/` - SELECT and query tests |
| 100 | +- Expected outputs stored in `output/` subdirectories as `.golden.json` files |
| 101 | +- Formatted SQL outputs in `format/` subdirectories |
| 102 | + |
| 103 | +**Compatible Testing** |
| 104 | +- Real ClickHouse SQL files in `testdata/query/compatible/1_stateful/` |
| 105 | +- Run with `-compatible` flag to test against actual ClickHouse queries |
| 106 | + |
| 107 | +**Benchmark Testing** |
| 108 | +- Performance tests in `parser/benchmark_test.go` |
| 109 | +- Tests parsing speed and memory allocation for various query types |
| 110 | + |
| 111 | +## Development Guidelines |
| 112 | + |
| 113 | +**Adding New SQL Features** |
| 114 | +1. Add test cases to the appropriate `testdata/` subdirectory |
| 115 | +2. Implement lexer tokens if needed in `lexer.go` |
| 116 | +3. Add AST node types to `ast.go` with all required methods |
| 117 | +4. Implement parsing logic in the appropriate `parser_*.go` file |
| 118 | +5. Add visitor methods to `ast_visitor.go` if using visitor pattern |
| 119 | +6. Update Walk functions in `walk.go` for new node types |
| 120 | +7. Run `make update_test` to generate golden files |
| 121 | + |
| 122 | +**Parser Module Organization** |
| 123 | +- Keep parser functions organized by SQL statement type |
| 124 | +- Use consistent naming: `parseXXX()` for parsing functions |
| 125 | +- Implement proper error handling with descriptive messages |
| 126 | +- Follow existing patterns for operator precedence and expression parsing |
| 127 | + |
| 128 | +**AST Node Implementation** |
| 129 | +- All nodes must implement `Pos()`, `End()`, `String()`, and `Accept()` methods |
| 130 | +- The `String()` method should regenerate valid ClickHouse SQL |
| 131 | +- The `Accept()` method must call `visitor.Enter()`/`visitor.Leave()` and visit all child nodes |
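
The `Accept()` contract above can be sketched with toy types. The `Visitor` and `Node` interfaces and the `Ident` and `BinaryExpr` nodes below are simplified assumptions made for this example. The repository's `ASTVisitor` interface has its own method set, which is not reproduced here.

```go
package main

import "fmt"

// Visitor is a simplified, hypothetical stand-in for the ASTVisitor interface.
type Visitor interface {
	Enter(n Node)
	Leave(n Node)
}

// Node is a hypothetical stand-in for the Expr interface.
type Node interface {
	String() string
	Accept(v Visitor) error
}

// Ident is a leaf node: it has no children to visit.
type Ident struct{ Name string }

func (i *Ident) String() string { return i.Name }

func (i *Ident) Accept(v Visitor) error {
	v.Enter(i)
	defer v.Leave(i)
	return nil
}

// BinaryExpr has two children, both of which Accept must visit.
type BinaryExpr struct {
	Left  Node
	Op    string
	Right Node
}

// String regenerates the source text of the expression.
func (b *BinaryExpr) String() string {
	return b.Left.String() + " " + b.Op + " " + b.Right.String()
}

func (b *BinaryExpr) Accept(v Visitor) error {
	v.Enter(b)
	defer v.Leave(b) // Leave runs even if a child returns an error
	if err := b.Left.Accept(v); err != nil {
		return err
	}
	return b.Right.Accept(v)
}

// tracer records the order in which nodes are entered and left.
type tracer struct{ events []string }

func (t *tracer) Enter(n Node) { t.events = append(t.events, "enter "+n.String()) }
func (t *tracer) Leave(n Node) { t.events = append(t.events, "leave "+n.String()) }

func main() {
	expr := &BinaryExpr{Left: &Ident{Name: "a"}, Op: "+", Right: &Ident{Name: "b"}}
	t := &tracer{}
	if err := expr.Accept(t); err != nil {
		fmt.Println("visit failed:", err)
		return
	}
	for _, e := range t.events {
		fmt.Println(e)
	}
}
```

Pairing `Enter` with a deferred `Leave` is one way to keep the two calls balanced on every return path, including early returns on error.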
| 132 | + |
| 133 | +**Walking the AST** |
| 134 | + |
| 135 | +- Every new expression type must also be added to the `Walk` function in `walk.go`. |