Commit c30d472

feat: remove redshift action in PostgreSQL parser (#14)
1 parent 0e7bac9 commit c30d472

9 files changed: +15603 -16002 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -43,3 +43,5 @@ go.work.sum

 # node_modules
 **/node_modules/
+
+**/*.class

postgresql/PostgreSQLParser.g4

Lines changed: 1 addition & 12 deletions
@@ -672,18 +672,7 @@ typedtableelement
     ;

 columnDef
-    : { $parser.Engine!=EngineRedshift }? colid typename create_generic_options? colquallist
-    | { $parser.Engine==EngineRedshift }? colid typename create_generic_options? rs_colattributes? colquallist
-    ;
-
-rs_colattributes
-    : DEFAULT b_expr
-    | IDENTITY_P OPEN_PAREN seed=iconst COMMA step=iconst CLOSE_PAREN
-    | GENERATED BY DEFAULT AS IDENTITY_P OPEN_PAREN seed=iconst COMMA step=iconst CLOSE_PAREN
-    | ENCODE StringConstant
-    | DISTKEY
-    | SORTKEY
-    | COLLATE (CASE_SENSITIVE | CASE_INSENSITIVE)
+    : colid typename create_generic_options? colquallist
     ;

 columnOptions

postgresql/postgresql_parser.go

Lines changed: 15370 additions & 15962 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

postgresql/postgresql_parser_base.go

Lines changed: 0 additions & 9 deletions
@@ -6,24 +6,15 @@ import (
 	"github.com/antlr4-go/antlr/v4"
 )

-type Engine int
-
-const (
-	EnginePostgreSQL Engine = iota
-	EngineRedshift
-)
-
 type PostgreSQLParserBase struct {
 	*antlr.BaseParser

-	Engine      Engine
 	parseErrors []*PostgreSQLParseError
 }

 func NewPostgreSQLParserBase(input antlr.TokenStream) *PostgreSQLParserBase {
 	return &PostgreSQLParserBase{
 		BaseParser: antlr.NewBaseParser(input),
-		Engine:     EnginePostgreSQL,
 	}
 }

postgresql/postgresqlparser_base_listener.go

Lines changed: 0 additions & 6 deletions

postgresql/postgresqlparser_base_visitor.go

Lines changed: 0 additions & 4 deletions

postgresql/postgresqlparser_listener.go

Lines changed: 0 additions & 6 deletions

postgresql/postgresqlparser_visitor.go

Lines changed: 0 additions & 3 deletions

tools/fuzzing/DESIGN.md

Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
# Grammar-Aware Fuzzing Tool Design

## Overview

A fuzzing tool that analyzes ANTLR v4 grammar files and generates syntactically valid SQL from them, so that parsers can be stress-tested for both performance and correctness.

## Goals

- **Valid Input Generation**: Generate syntactically correct SQL queries based on grammar rules
- **Performance Testing**: Create complex queries to test parser performance limits
- **Coverage Maximization**: Exercise all grammar rules and edge cases
- **Automated Testing**: Integrate with CI for continuous parser validation

## Architecture

```
tools/fuzzing/
├── generator/              # Core generation logic
│   ├── grammar_analyzer.go # Parse ANTLR grammar files
│   ├── rule_expander.go    # Expand grammar rules to concrete syntax
│   └── query_builder.go    # Build SQL queries from rule expansions
├── strategies/             # Different generation strategies
│   ├── depth_first.go      # Generate deeply nested structures
│   ├── breadth_first.go    # Generate wide, complex queries
│   └── weighted.go         # Probability-based rule selection
├── corpus/                 # Generated test cases and seeds
│   ├── seeds/              # Hand-crafted seed inputs
│   └── generated/          # Auto-generated test cases
└── cmd/                    # CLI tools
    └── fuzzer/             # Main fuzzer executable
```

## Core Components

### 1. Grammar Analyzer

Leverages the existing `tools/grammar/` ANTLR v4 parser to:
- Parse target grammar files (e.g., `postgresql.g4`, `cql.g4`)
- Extract production rules and their alternatives
- Build dependency graph between rules
- Identify terminal vs non-terminal symbols

```go
type GrammarAnalyzer struct {
	parser *grammar.ANTLRv4Parser
	rules  map[string]*Rule
}

type Rule struct {
	Name         string
	Alternatives []Alternative
	Type         RuleType // LEXER, PARSER, FRAGMENT
}
```
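
The dependency graph and the terminal versus non-terminal split can be illustrated with a minimal sketch. It assumes that an `Alternative` is a flat list of symbol names and that `RuleType` is an integer enum; neither is specified by the design. `BuildDependencyGraph`, `IsTerminal`, and `sampleRules` are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// RuleType mirrors the LEXER / PARSER / FRAGMENT distinction (assumed enum).
type RuleType int

const (
	ParserRule RuleType = iota
	LexerRule
	FragmentRule
)

// Alternative is assumed to be a flat sequence of symbol names.
type Alternative struct {
	Symbols []string
}

type Rule struct {
	Name         string
	Alternatives []Alternative
	Type         RuleType
}

// BuildDependencyGraph maps each rule to the sorted set of rules it references.
func BuildDependencyGraph(rules map[string]*Rule) map[string][]string {
	graph := make(map[string][]string, len(rules))
	for name, rule := range rules {
		seen := make(map[string]bool)
		for _, alt := range rule.Alternatives {
			for _, sym := range alt.Symbols {
				if _, defined := rules[sym]; defined && !seen[sym] {
					seen[sym] = true
					graph[name] = append(graph[name], sym)
				}
			}
		}
		sort.Strings(graph[name])
	}
	return graph
}

// IsTerminal reports whether sym is a token: a lexer or fragment rule, or a
// literal that has no rule definition at all.
func IsTerminal(rules map[string]*Rule, sym string) bool {
	rule, defined := rules[sym]
	return !defined || rule.Type != ParserRule
}

// sampleRules is a hand-written stand-in for real analyzer output.
func sampleRules() map[string]*Rule {
	ident := []Alternative{{Symbols: []string{"Identifier"}}}
	return map[string]*Rule{
		"columnDef": {Name: "columnDef", Type: ParserRule, Alternatives: []Alternative{
			{Symbols: []string{"colid", "typename"}},
		}},
		"colid":      {Name: "colid", Type: ParserRule, Alternatives: ident},
		"typename":   {Name: "typename", Type: ParserRule, Alternatives: ident},
		"Identifier": {Name: "Identifier", Type: LexerRule},
	}
}

func main() {
	rules := sampleRules()
	graph := BuildDependencyGraph(rules)
	fmt.Println(graph["columnDef"])              // [colid typename]
	fmt.Println(IsTerminal(rules, "Identifier")) // true
	fmt.Println(IsTerminal(rules, "columnDef"))  // false
}
```

Sorting each adjacency list keeps the graph stable across runs, which matters because Go map iteration order is randomized.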

### 2. Rule Expander

Recursively expands grammar rules into concrete syntax trees:
- Handles rule recursion with configurable depth limits
- Supports probability-weighted alternative selection
- Manages lexer rules and literal generation
- Tracks generation context for smart decisions

```go
type RuleExpander struct {
	grammar  *ParsedGrammar
	maxDepth int
	weights  map[string]float64
	random   *rand.Rand
}
```

### 3. Query Builder

Converts syntax trees to executable SQL strings:
- Handles whitespace and formatting
- Manages identifier generation (table names, columns)
- Ensures semantic consistency where possible
- Outputs parseable query strings
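
The whitespace and identifier handling listed above might look like the following sketch. `Render`, `Namer`, and the two spacing tables are hypothetical helpers, and the spacing rules are a deliberately small assumption rather than a full SQL formatter.

```go
package main

import (
	"fmt"
	"strings"
)

// Tokens that should not be preceded or followed by a space.
var (
	noSpaceBefore = map[string]bool{",": true, ")": true, ";": true, ".": true}
	noSpaceAfter  = map[string]bool{"(": true, ".": true}
)

// Render joins tokens into a query string, inserting single spaces except
// around the punctuation listed above.
func Render(tokens []string) string {
	var sb strings.Builder
	for i, tok := range tokens {
		if i > 0 && !noSpaceBefore[tok] && !noSpaceAfter[tokens[i-1]] {
			sb.WriteByte(' ')
		}
		sb.WriteString(tok)
	}
	return sb.String()
}

// Namer hands out unique identifiers per prefix, e.g. tbl_1, tbl_2.
type Namer struct {
	counts map[string]int
}

func NewNamer() *Namer {
	return &Namer{counts: make(map[string]int)}
}

func (n *Namer) Next(prefix string) string {
	n.counts[prefix]++
	return fmt.Sprintf("%s_%d", prefix, n.counts[prefix])
}

func main() {
	names := NewNamer()
	table := names.Next("tbl") // reused below so the column reference stays consistent
	tokens := []string{"SELECT", table, ".", names.Next("col"), ",",
		"(", "1", "+", "2", ")", "FROM", table, ";"}
	fmt.Println(Render(tokens)) // SELECT tbl_1.col_1, (1 + 2) FROM tbl_1;
}
```

Reusing the same generated table name in the select list and the `FROM` clause is a small instance of the "semantic consistency where possible" goal.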

## Generation Strategies

### Depth-First Strategy
- Generates deeply nested subqueries and expressions
- Tests parser stack limits and recursion handling
- Focuses on structural complexity

### Breadth-First Strategy
- Creates wide queries with many clauses, joins, and columns
- Tests parser memory usage and performance
- Focuses on query size and breadth

### Weighted Strategy
- Uses probability weights for rule selection
- Biases toward commonly used constructs
- Configurable via weight files per dialect

## Integration Points

### With Existing Grammar Parser
```go
// Reuse tools/grammar/ for parsing target grammars
analyzer := NewGrammarAnalyzer()
targetGrammar, err := analyzer.ParseGrammarFile("postgresql/PostgreSQLLexer.g4")
```

### With Parser Testing
```go
// Generate test cases for specific parser
fuzzer := NewFuzzer(postgresqlGrammar)
queries := fuzzer.GenerateQueries(1000)

for _, query := range queries {
	// Test against postgresql parser
	result := postgresqlParser.Parse(query)
	// Collect metrics, detect crashes
}
```
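
The "collect metrics, detect crashes" step in the loop above could be sketched as follows. The parser is represented by a plain `func(string) error`, which is an assumption; `RunOne`, `Summarize`, and `fakeParse` are hypothetical names and do not correspond to an API in this repository. A deferred `recover` turns a parser panic into a recorded result so that one crashing input does not stop the run.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result records the outcome of parsing one query.
type Result struct {
	Query    string
	Duration time.Duration
	Err      error
	Panicked bool
}

// RunOne parses a single query, timing it and converting panics to results.
func RunOne(parse func(string) error, query string) (res Result) {
	res.Query = query
	start := time.Now()
	defer func() {
		res.Duration = time.Since(start)
		if r := recover(); r != nil {
			res.Panicked = true
			res.Err = fmt.Errorf("parser panic: %v", r)
		}
	}()
	res.Err = parse(query)
	return res
}

// Summary aggregates results across a fuzzing run.
type Summary struct {
	Total, Succeeded, Failed, Panics int
	TotalTime                        time.Duration
}

func Summarize(results []Result) Summary {
	var s Summary
	for _, r := range results {
		s.Total++
		s.TotalTime += r.Duration
		switch {
		case r.Panicked:
			s.Panics++
		case r.Err != nil:
			s.Failed++
		default:
			s.Succeeded++
		}
	}
	return s
}

// SuccessRate is the fraction of queries parsed without error or panic.
func (s Summary) SuccessRate() float64 {
	if s.Total == 0 {
		return 0
	}
	return float64(s.Succeeded) / float64(s.Total)
}

// fakeParse is a stand-in for a real parser entry point.
func fakeParse(query string) error {
	switch query {
	case "":
		return errors.New("empty input")
	case "SELECT (((":
		panic("simulated crash")
	}
	return nil
}

func main() {
	var results []Result
	for _, q := range []string{"SELECT 1", "", "SELECT ((("} {
		results = append(results, RunOne(fakeParse, q))
	}
	s := Summarize(results)
	fmt.Printf("total=%d ok=%d failed=%d panics=%d\n", s.Total, s.Succeeded, s.Failed, s.Panics)
}
```

Since a grammar-derived query is expected to parse, any entry in `Failed` or `Panics` is worth saving to the corpus for inspection.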

## Configuration

### Fuzzer Config
```yaml
target_grammar: "postgresql"
strategies:
  - name: "depth_first"
    weight: 0.3
    max_depth: 15
  - name: "breadth_first"
    weight: 0.4
    max_width: 50
  - name: "weighted"
    weight: 0.3
    weights_file: "postgresql_weights.yaml"

generation:
  count: 10000
  max_query_length: 100000
  seed: 42

output:
  format: "sql"
  directory: "corpus/generated"
```
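
A Go mirror of this config, with basic validation, could look like the sketch below. The struct and field names are hypothetical, the `yaml` struct tags follow the convention used by libraries such as `gopkg.in/yaml.v3` (the design does not name a YAML library), and the rule that strategy weights must sum to 1 is an assumption inferred from the example values. Decoding is left out so the sketch stays within the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

type StrategyConfig struct {
	Name        string  `yaml:"name"`
	Weight      float64 `yaml:"weight"`
	MaxDepth    int     `yaml:"max_depth,omitempty"`
	MaxWidth    int     `yaml:"max_width,omitempty"`
	WeightsFile string  `yaml:"weights_file,omitempty"`
}

type GenerationConfig struct {
	Count          int   `yaml:"count"`
	MaxQueryLength int   `yaml:"max_query_length"`
	Seed           int64 `yaml:"seed"`
}

type OutputConfig struct {
	Format    string `yaml:"format"`
	Directory string `yaml:"directory"`
}

type Config struct {
	TargetGrammar string           `yaml:"target_grammar"`
	Strategies    []StrategyConfig `yaml:"strategies"`
	Generation    GenerationConfig `yaml:"generation"`
	Output        OutputConfig     `yaml:"output"`
}

// Validate checks the invariants this sketch assumes; the design itself
// does not state them.
func (c Config) Validate() error {
	if c.TargetGrammar == "" {
		return errors.New("target_grammar is required")
	}
	if len(c.Strategies) == 0 {
		return errors.New("at least one strategy is required")
	}
	sum := 0.0
	for _, s := range c.Strategies {
		if s.Weight < 0 {
			return fmt.Errorf("strategy %q has a negative weight", s.Name)
		}
		sum += s.Weight
	}
	if math.Abs(sum-1) > 1e-9 {
		return fmt.Errorf("strategy weights sum to %g, want 1", sum)
	}
	if c.Generation.Count <= 0 {
		return errors.New("generation.count must be positive")
	}
	return nil
}

// exampleConfig reproduces the values from the YAML above.
func exampleConfig() Config {
	return Config{
		TargetGrammar: "postgresql",
		Strategies: []StrategyConfig{
			{Name: "depth_first", Weight: 0.3, MaxDepth: 15},
			{Name: "breadth_first", Weight: 0.4, MaxWidth: 50},
			{Name: "weighted", Weight: 0.3, WeightsFile: "postgresql_weights.yaml"},
		},
		Generation: GenerationConfig{Count: 10000, MaxQueryLength: 100000, Seed: 42},
		Output:     OutputConfig{Format: "sql", Directory: "corpus/generated"},
	}
}

func main() {
	fmt.Println(exampleConfig().Validate()) // <nil>
}
```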

### Grammar Weights
```yaml
# postgresql_weights.yaml
rules:
  selectStmt: 0.4
  insertStmt: 0.2
  updateStmt: 0.2
  deleteStmt: 0.1
  createStmt: 0.1

  # Bias toward complex expressions
  expr:
    binaryOp: 0.4
    functionCall: 0.3
    subquery: 0.2
    literal: 0.1
```

## CLI Interface

```bash
# Generate queries for PostgreSQL
./fuzzer generate --grammar postgresql --count 1000 --strategy weighted

# Run continuous fuzzing with performance metrics
./fuzzer fuzz --grammar cql --duration 1h --metrics

# Validate existing corpus against parser
./fuzzer validate --grammar postgresql --corpus corpus/postgresql/
```

## Performance Metrics

### Generation Metrics
- Queries generated per second
- Grammar rule coverage percentage
- Distribution of query complexity (depth, width)

### Parser Testing Metrics
- Parse success rate
- Average parse time per query
- Memory usage during parsing
- Parser crash/error detection
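
As an illustration of the rule coverage metric, the sketch below tracks which known rules the generator has visited. `Coverage` and its methods are hypothetical, and the rule names in `main` are borrowed from the PostgreSQL grammar purely as sample data.

```go
package main

import (
	"fmt"
	"sort"
)

// Coverage tracks which grammar rules have been exercised during generation.
type Coverage struct {
	known   map[string]bool
	visited map[string]bool
}

func NewCoverage(ruleNames []string) *Coverage {
	c := &Coverage{known: make(map[string]bool), visited: make(map[string]bool)}
	for _, name := range ruleNames {
		c.known[name] = true
	}
	return c
}

// Visit records a rule as exercised; names outside the grammar are ignored.
func (c *Coverage) Visit(rule string) {
	if c.known[rule] {
		c.visited[rule] = true
	}
}

// Percent returns the share of known rules visited so far, from 0 to 100.
func (c *Coverage) Percent() float64 {
	if len(c.known) == 0 {
		return 0
	}
	return 100 * float64(len(c.visited)) / float64(len(c.known))
}

// Missing lists the rules that have not been visited, sorted by name.
func (c *Coverage) Missing() []string {
	var missing []string
	for name := range c.known {
		if !c.visited[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	cov := NewCoverage([]string{"columnDef", "colid", "typename", "colquallist"})
	for _, rule := range []string{"columnDef", "colid", "notARule"} {
		cov.Visit(rule)
	}
	// Prints: 50.0% covered, missing [colquallist typename]
	fmt.Printf("%.1f%% covered, missing %v\n", cov.Percent(), cov.Missing())
}
```

The `Missing` list can feed back into generation, for example by raising the weights of rules that have not yet been reached.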

## Implementation Phases

### Phase 1: Foundation (Weeks 1-2)
- Basic grammar analyzer using existing ANTLR parser
- Simple rule expander with depth-first strategy
- Command-line interface for manual testing

### Phase 2: Core Features (Weeks 3-4)
- Multiple generation strategies
- Configuration system
- Basic corpus management
- Integration with existing parser tests

### Phase 3: Advanced Features (Weeks 5-6)
- Weighted generation with probability tuning
- Performance metrics collection
- CI integration for continuous fuzzing
- Corpus minimization and deduplication

### Phase 4: Optimization (Weeks 7-8)
- Generation performance optimization
- Advanced semantic awareness
- Custom mutation strategies
- Comprehensive documentation

## Future Enhancements

- **Semantic Awareness**: Generate queries with valid schema references
- **Mutation-Based Fuzzing**: Mutate existing queries to explore edge cases
- **Differential Testing**: Compare parser outputs across database dialects
- **Performance Regression Detection**: Track parser performance over time
- **Grammar Evolution**: Adapt fuzzing as grammars evolve

## Dependencies

- Existing `tools/grammar/` ANTLR v4 parser
- Go standard library (`math/rand`, `fmt`, `strings`)
- YAML configuration parsing
- CLI framework (e.g., `cobra`)

This design provides a foundation for grammar-aware fuzzing that builds on our existing ANTLR infrastructure.
