103 changes: 103 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,103 @@
name: Tests

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.set-matrix.outputs.matrix }}
      any_changed: ${{ steps.set-matrix.outputs.any_changed }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Get changed files
        id: changed-files
        run: |
          if [ "${{ github.event_name }}" = "pull_request" ]; then
            echo "changed_files<<EOF" >> $GITHUB_OUTPUT
            git diff --name-only origin/${{ github.base_ref }}..HEAD >> $GITHUB_OUTPUT
            echo "EOF" >> $GITHUB_OUTPUT
          else
            echo "changed_files<<EOF" >> $GITHUB_OUTPUT
            git diff --name-only HEAD~1..HEAD >> $GITHUB_OUTPUT
            echo "EOF" >> $GITHUB_OUTPUT
          fi

      - name: Set matrix for changed parsers
        id: set-matrix
        run: |
          # List of all available parsers
          ALL_PARSERS="redshift"
          # Add more parsers here as they are added to the repository
          # ALL_PARSERS="redshift mysql postgresql"

          CHANGED_FILES="${{ steps.changed-files.outputs.changed_files }}"
          CHANGED_PARSERS=""

          for parser in $ALL_PARSERS; do
            if echo "$CHANGED_FILES" | grep -q "^$parser/"; then
              if [ -z "$CHANGED_PARSERS" ]; then
                CHANGED_PARSERS="\"$parser\""
              else
                CHANGED_PARSERS="$CHANGED_PARSERS,\"$parser\""
              fi
            fi
          done

          if [ -n "$CHANGED_PARSERS" ]; then
            echo "matrix={\"parser\":[$CHANGED_PARSERS]}" >> $GITHUB_OUTPUT
            echo "any_changed=true" >> $GITHUB_OUTPUT
            echo "Changed parsers: $CHANGED_PARSERS"
          else
            echo "matrix={\"parser\":[]}" >> $GITHUB_OUTPUT
            echo "any_changed=false" >> $GITHUB_OUTPUT
            echo "No parser changes detected"
          fi

  go-mod-tidy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
          cache-dependency-path: go.sum

      - name: Verify go mod tidy
        run: |
          go mod tidy
          git diff --exit-code -- go.mod go.sum

  go-tests:
    needs: detect-changes
    if: needs.detect-changes.outputs.any_changed == 'true'
    runs-on: ubuntu-latest
    strategy:
      matrix: ${{ fromJSON(needs.detect-changes.outputs.matrix) }}
    steps:
      - uses: actions/checkout@v4

      - uses: actions/setup-go@v5
        with:
          go-version-file: go.mod
          cache-dependency-path: go.sum

      - name: Run all tests
        working-directory: ${{ matrix.parser }}
        run: go test -p=8 -timeout 30m -ldflags "-w -s" -v ./... | tee test.log; exit ${PIPESTATUS[0]}
      - name: Pretty print tests running time
        working-directory: ${{ matrix.parser }}
        # grep: filter out lines like "--- PASS: Test (15.04s)"
        # sed: remove unnecessary characters
        # awk: re-format lines to "PASS: Test (15.04s)"
        # sort: cut into columns by delimiter ' ' (single space) and sort by column 3 (test time in seconds) as numeric type in reverse order (largest comes first)
        # awk: accumulate sum by test time in seconds
        run: grep --color=never -e '--- PASS:' -e '--- FAIL:' test.log | sed 's/[:()]//g' | awk '{print $2,$3,$4}' | sort -t' ' -nk3 -r | awk '{sum += $3; print $1,$2,$3,sum"s"}'
8 changes: 6 additions & 2 deletions .gitignore
@@ -7,6 +7,7 @@
*.dll
*.so
*.dylib
**/.DS_Store

# Test binary, built with `go test -c`
*.test
@@ -28,5 +29,8 @@ go.work.sum
.env

# Editor/IDE
# .idea/
# .vscode/
.idea/

# Plugin
# IntelliJ ANTLR plugin
**/gen/
17 changes: 17 additions & 0 deletions go.mod
@@ -0,0 +1,17 @@
module github.com/bytebase/parser

go 1.24.5

require (
	github.com/antlr4-go/antlr/v4 v4.13.1
	github.com/stretchr/testify v1.10.0
)

require (
	github.com/davecgh/go-spew v1.1.1 // indirect
	github.com/pmezard/go-difflib v1.0.0 // indirect
	golang.org/x/exp v0.0.0-20240506185415-9bf2ced13842 // indirect
	gopkg.in/yaml.v3 v3.0.1 // indirect
)

replace github.com/antlr4-go/antlr/v4 => github.com/bytebase/antlr/v4 v4.0.0-20240827034948-8c385f108920
14 changes: 14 additions & 0 deletions go.sum
@@ -0,0 +1,14 @@
github.com/bytebase/antlr/v4 v4.0.0-20240827034948-8c385f108920 h1:IfmPt5o5R70NKtOrs+QHOoCgViYZelZysGxVBvV4ybA=
github.com/bytebase/antlr/v4 v4.0.0-20240827034948-8c385f108920/go.mod h1:ykhjIPiv0IWpu3OGXCHdz2eUSe8UNGGD6baqjs8jSuU=
github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
github.com/stretchr/testify v1.10.0 h1:Xv5erBjTwe/5IxqUQTdXv5kgmIvbHo3QQyRwhJsOfJA=
github.com/stretchr/testify v1.10.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY=
golang.org/x/exp v0.0.0-20240506185415-9bf2ced13842 h1:vr/HnozRka3pE4EsMEg1lgkXJkTFJCVUX+S/ZT6wYzM=
golang.org/x/exp v0.0.0-20240506185415-9bf2ced13842/go.mod h1:XtvwrStGgqGPLc4cjQfWqZHG1YFdYs6swckp8vpsjnc=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
132 changes: 132 additions & 0 deletions redshift/CLAUDE.md
@@ -0,0 +1,132 @@
# Redshift Parser Development Guide

## Project Overview

This repository is a Redshift parser built with ANTLR 4, forked from github.com/bytebase/postgresql-parser. Because Redshift's SQL dialect is not fully compatible with PostgreSQL's, it was split into a separate repository to support Amazon Redshift-specific syntax and features.

## Architecture

### Core Components

1. **ANTLR Grammar Files**:
   - `RedshiftLexer.g4` - Tokenization rules for Redshift SQL
   - `RedshiftParser.g4` - Parser grammar with 200+ statement types
   - Generated Go files: `redshift_parser.go`, `redshift_lexer.go`, etc.

2. **Base Classes**:
   - `redshift_parser_base.go` - Engine-aware parser with PostgreSQL/Redshift support
   - `redshift_lexer_base.go` - Base lexer implementation
   - `string_stack.go` - Utility for string stack operations

3. **Supporting Files**:
   - `keywords.go` - 600+ PostgreSQL keywords with reserved status
   - `builtin_function.go` - Built-in function definitions
   - `build.sh` - ANTLR code generation script

### Engine Support

The parser supports multiple database engines:
- `EnginePostgreSQL` - Standard PostgreSQL syntax
- `EngineRedshift` - Amazon Redshift-specific syntax extensions
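
A minimal sketch of how engine detection might gate dialect-specific syntax. The names `EnginePostgreSQL`, `EngineRedshift`, and `GetEngine()` come from this guide; the `Engine` type, the `parserBase` struct, and the `allowsDistKey` predicate are assumptions made for illustration and may not match `redshift_parser_base.go`.

```go
package main

import "fmt"

// Engine identifies the SQL dialect the parser should accept.
// The constant names come from the guide; the underlying type is an assumption.
type Engine int

const (
	EnginePostgreSQL Engine = iota
	EngineRedshift
)

// parserBase is a hypothetical stand-in for the real base parser type.
type parserBase struct {
	engine Engine
}

// GetEngine reports which dialect this parser instance was configured for.
func (p *parserBase) GetEngine() Engine { return p.engine }

// allowsDistKey is a hypothetical predicate: DISTKEY is a Redshift extension,
// so it is accepted only when the engine is Redshift.
func (p *parserBase) allowsDistKey() bool {
	return p.GetEngine() == EngineRedshift
}

func main() {
	for _, e := range []Engine{EnginePostgreSQL, EngineRedshift} {
		p := &parserBase{engine: e}
		fmt.Printf("engine=%d DISTKEY allowed=%t\n", e, p.allowsDistKey())
	}
}
```

A predicate of this shape could be called from a grammar action or semantic predicate, which keeps one grammar serving both dialects.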

## Development Guidelines

### Code Conventions

1. **Follow existing patterns**: Always examine existing code before making changes
2. **Token/Rule/Name Convention**: Maintain consistency with current ANTLR grammar naming
3. **Engine-specific features**: Use engine detection for Redshift-specific syntax
4. **Error handling**: Implement proper error listeners and recovery mechanisms

### Testing Requirements

**CRITICAL**: Every change must include a related test case.

1. **Test Structure**:
   - Add SQL test files to the `examples/` directory
   - Use Go-based tests in `parser_test.go` and `engine_specific_test.go`
   - Tests automatically parse all SQL files in `examples/`

2. **Test Content Sources**:
   - Reference https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_commands.html
   - Crawl syntax examples from AWS Redshift documentation
   - Use real-world SQL examples when possible

3. **Test Categories**:
   - DDL: CREATE, ALTER, DROP statements
   - DML: SELECT, INSERT, UPDATE, DELETE
   - Redshift-specific: IDENTITY columns, DISTKEY, SORTKEY, etc.
   - Advanced: Window functions, CTEs, JSON operations

### Adding New Features

1. **Grammar Changes**:
   ```bash
   # Edit RedshiftLexer.g4 or RedshiftParser.g4
   # Run build script to regenerate Go code
   make build
   ```

2. **Engine-Specific Logic**:
   - Use `GetEngine()` method to detect Redshift vs PostgreSQL
   - Implement conditional parsing for dialect-specific features
   - See `engine_specific_test.go` for examples

3. **Testing Process**:
   - Create SQL test files in `examples/`
   - Run tests: `go test -v`
   - Verify both parsing success and error handling

### Common Tasks

#### Adding Redshift-Specific Syntax

1. Identify the syntax difference from PostgreSQL
2. Update the appropriate grammar file (lexer or parser)
3. Add engine-specific logic if needed
4. Create test cases with AWS documentation examples
5. Verify tests pass for both engines

#### Adding New Keywords

1. Add to `keywords.go` with appropriate reserved status
2. Update lexer grammar if needed
3. Test keyword recognition in various contexts
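
As a rough illustration of a keyword table with reserved status, the sketch below uses a case-insensitive map lookup. The `keywordKind` type, the `keywords` map, and `isReserved` are hypothetical names, and the three entries are an illustrative excerpt rather than the contents of `keywords.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// keywordKind is a hypothetical two-level classification; the real file
// may distinguish more categories.
type keywordKind int

const (
	nonReserved keywordKind = iota
	reserved
)

// keywords is an illustrative excerpt keyed by upper-case spelling.
var keywords = map[string]keywordKind{
	"SELECT": reserved,
	"TABLE":  reserved,
	"ABORT":  nonReserved,
}

// isReserved reports whether word is a reserved keyword, ignoring case.
// Words missing from the table are treated as ordinary identifiers.
func isReserved(word string) bool {
	kind, ok := keywords[strings.ToUpper(word)]
	return ok && kind == reserved
}

func main() {
	for _, w := range []string{"select", "abort", "my_column"} {
		fmt.Printf("%-10s reserved=%t\n", w, isReserved(w))
	}
}
```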

#### Adding Built-in Functions

1. Add to `builtin_function.go` in appropriate category
2. Update parser rules if function has special syntax
3. Test function parsing and recognition
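
One possible shape for a categorized built-in function registry is sketched below. The category labels, the `builtinFunctions` map, and `categoryOf` are assumptions for illustration only; `builtin_function.go` may organize its definitions differently.

```go
package main

import (
	"fmt"
	"strings"
)

// builtinFunctions groups function names under hypothetical category labels.
var builtinFunctions = map[string][]string{
	"aggregate": {"COUNT", "LISTAGG"},
	"datetime":  {"GETDATE", "DATEADD"},
}

// builtinIndex maps each upper-case function name to its category so that
// lookups do not need to scan every category.
var builtinIndex = func() map[string]string {
	index := make(map[string]string)
	for category, names := range builtinFunctions {
		for _, name := range names {
			index[name] = category
		}
	}
	return index
}()

// categoryOf returns the category of a built-in function, ignoring case.
// The second result is false when the name is not a known built-in.
func categoryOf(name string) (string, bool) {
	category, ok := builtinIndex[strings.ToUpper(name)]
	return category, ok
}

func main() {
	for _, name := range []string{"listagg", "GetDate", "my_udf"} {
		category, ok := categoryOf(name)
		fmt.Printf("%-8s builtin=%t category=%q\n", name, ok, category)
	}
}
```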

## Build and Test Commands

**IMPORTANT**: Always run `make build` (or `./build.sh`) before running tests so that the generated Go code matches the current ANTLR grammars.

```bash
# Generate parser code from ANTLR grammars (REQUIRED before testing)
make build

# Run all tests
go test -v

# Run specific test
go test -run TestParser -v

# Run benchmarks
go test -bench=. -v
```

## References

- [AWS Redshift SQL Commands](https://docs.aws.amazon.com/redshift/latest/dg/c_SQL_commands.html)
- [ANTLR 4 Documentation](https://github.com/antlr/antlr4/blob/master/doc/index.md)
- [PostgreSQL Grammar Reference](https://github.com/tunnelvisionlabs/antlr4-postgresql)

## Contributing

1. Always add test cases for new features
2. Follow existing code patterns and conventions
3. Test against both PostgreSQL and Redshift engines
4. Use AWS documentation for accurate syntax examples
5. Ensure all tests pass before submitting changes
6 changes: 6 additions & 0 deletions redshift/Makefile
@@ -0,0 +1,6 @@
all: build test

build:
	antlr -Dlanguage=Go -package redshift -visitor -o . RedshiftLexer.g4 RedshiftParser.g4

test:
	go test -v -run TestRedshiftParser