feat: implement ANTLR grammar-aware fuzzing library for parser testing #17

h3n4l · 2025-08-29T06:33:09Z

Grammar-Aware Fuzzing Library

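This PR introduces a fuzzing library that generates SQL inputs from ANTLR v4 grammar files for parser testing. Generated output follows the grammar, but it can contain <rule_MAX_DEPTH> placeholders where the depth limit is hit (see Current Limitations).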

🚀 Features

Core Implementation

Grammar IR Storage: Parses ANTLR v4 .g4 files into an intermediate representation (IR)
Random Generation: Configurable random generation with support for alternatives, quantifiers, and optional elements
Multi-Grammar Support: Can merge separate lexer and parser grammar files
Recursion Control: Depth-based termination to prevent infinite loops
Token Generation: Concrete token generation for lexer rules with character sets, ranges, and negated sets

Configuration Options

MaxDepth: Maximum recursion depth (default: 5)
OptionalProb: Probability of including optional elements (0.0-1.0)
MaxQuantifier/MinQuantifier: Upper and lower bounds on the number of repetitions generated for * and + quantifiers
Seed: Reproducible random generation
OutputFormat: Compact or verbose output with rule traversal

Testing Infrastructure

PostgreSQL grammar integration tests
Multiple test scenarios (simple, deep, minimal)
Benchmark tests for performance measurement
Verbose output for debugging rule traversal

🔧 Usage Example

cfg := &config.Config{
    GrammarFiles: []string{"postgresql/PostgreSQLLexer.g4", "postgresql/PostgreSQLParser.g4"},
    StartRule:    "selectstmt",
    Count:        10,
    MaxDepth:     5,
    OptionalProb: 0.7,
    Seed:         42,
}

gen := generator.New(cfg)
// Generates 10 SELECT statements (Count above).
if err := gen.Generate(); err != nil {
    log.Fatal(err) // needs the standard "log" package
}

🧪 Test Results

All tests pass successfully:
=== RUN   TestPostgreSQLSelectStmt
=== RUN   TestPostgreSQLExpressions
=== RUN   TestPostgreSQLVerboseOutput
PASS
ok      github.com/bytebase/parser/tools/fuzzing/tests    0.833s

⚠️ Current Limitations

1. Aggressive Depth Limiting

The current implementation terminates on depth alone: once the depth limit is reached it emits a placeholder such as <rule_MAX_DEPTH>, even when no rule is actually recursing:

Example Output:
Query 1: ( <select_clause_MAX_DEPTH> <for_locking_clause_MAX_DEPTH> )

Issues:
- Reaches max depth through sequential rule expansion, not recursion
- Doesn't distinguish between recursive and non-recursive rule references
- May generate less realistic output at depth boundaries
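The first issue can be reproduced with a toy grammar that has no recursion at all. The sketch below is a simplified stand-in for the generator, not its real code: Rules and Expand are hypothetical names, and each rule has a single sequence. A plain chain of rules longer than the depth limit still ends in a placeholder.

```go
package main

import (
	"fmt"
	"strings"
)

// Rules maps a rule name to one sequence of symbols (hypothetical, simplified IR).
// Symbols without an entry are treated as literals.
type Rules map[string][]string

// Expand counts every rule reference as one level of depth,
// whether or not the rule has been seen before.
func Expand(rules Rules, sym string, depth, maxDepth int) string {
	seq, isRule := rules[sym]
	if !isRule {
		return sym
	}
	if depth > maxDepth {
		return "<" + sym + "_MAX_DEPTH>"
	}
	parts := make([]string, 0, len(seq))
	for _, s := range seq {
		parts = append(parts, Expand(rules, s, depth+1, maxDepth))
	}
	return strings.Join(parts, " ")
}

func main() {
	// a -> b -> c -> d -> X: sequential, never recursive.
	chain := Rules{"a": {"b"}, "b": {"c"}, "c": {"d"}, "d": {"X"}}
	fmt.Println(Expand(chain, "a", 0, 2)) // prints "<d_MAX_DEPTH>"
	fmt.Println(Expand(chain, "a", 0, 3)) // prints "X"
}
```

Raising the limit only moves the boundary. Grammars with long chains of wrapper rules will keep hitting it without ever recursing.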

2. Basic Terminal Selection

- No attempt to find non-recursive alternatives before using placeholders
- Could implement smarter terminal forcing for better output quality
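One possible way to do terminal forcing is to precompute, for each rule, the smallest expansion height that reaches only terminals, and then pick the shortest alternative once the depth limit is near. The sketch below uses a standard fixed-point computation. Grammar, MinHeights, and ShortestAlt are hypothetical names and this is not the approach implemented in this PR.

```go
package main

import "fmt"

// Grammar maps a rule to its alternatives; symbols without an entry are terminals.
type Grammar map[string][][]string

const inf = 1 << 30 // marks a rule with no known terminating expansion

// altHeight is 1 plus the tallest rule referenced by alt, or inf if any is unresolved.
func altHeight(g Grammar, h map[string]int, alt []string) int {
	tallest := 0
	for _, sym := range alt {
		if _, isRule := g[sym]; !isRule {
			continue
		}
		if h[sym] >= inf {
			return inf
		}
		if h[sym] > tallest {
			tallest = h[sym]
		}
	}
	return tallest + 1
}

// MinHeights iterates to a fixed point: heights only ever decrease, so it terminates.
func MinHeights(g Grammar) map[string]int {
	h := make(map[string]int, len(g))
	for name := range g {
		h[name] = inf
	}
	for changed := true; changed; {
		changed = false
		for name, alts := range g {
			for _, alt := range alts {
				if c := altHeight(g, h, alt); c < h[name] {
					h[name] = c
					changed = true
				}
			}
		}
	}
	return h
}

// ShortestAlt returns the index of the alternative that terminates fastest,
// or -1 if none of the rule's alternatives can terminate.
func ShortestAlt(g Grammar, h map[string]int, rule string) int {
	best, bestHeight := -1, inf
	for i, alt := range g[rule] {
		if c := altHeight(g, h, alt); c < bestHeight {
			best, bestHeight = i, c
		}
	}
	return best
}

func main() {
	g := Grammar{
		"expr": {{"(", "expr", ")"}, {"expr", "+", "term"}, {"term"}},
		"term": {{"NUM"}},
	}
	h := MinHeights(g)
	fmt.Println(h["expr"], h["term"], ShortestAlt(g, h, "expr")) // prints "2 1 2"
}
```

A generator could fall back to ShortestAlt when the remaining depth budget drops to the rule's minimum height, leaving placeholders only for rules that report -1.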

3. Limited Character Set Support

- Basic support for [a-z], ~[...], 'a'..'z' patterns
- Could expand lexer pattern support

🔄 Future Enhancements

1. Smart Recursion Detection: Distinguish actual recursion from sequential expansion
2. Depth-Biased Selection: Prefer non-recursive alternatives at higher depths
3. Terminal Forcing: Try non-recursive alternatives before placeholders
4. More Grammar Support: Extend beyond PostgreSQL to other SQL dialects
5. Advanced Lexer Patterns: Expand character class and regex support
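Item 1 could be approached by counting how many times each rule is currently active on the expansion stack, instead of counting total depth. The sketch below is one possible design, not something this PR implements: Tracker, Rules, and Expand are hypothetical names, and the placeholder format is reused from the output shown earlier.

```go
package main

import (
	"fmt"
	"strings"
)

// Rules maps a rule name to one sequence of symbols (hypothetical, simplified IR).
// Symbols without an entry are treated as literals.
type Rules map[string][]string

// Tracker counts how many times each rule is active on the current expansion stack.
type Tracker struct {
	active map[string]int
	limit  int // maximum simultaneous activations of one rule
}

func NewTracker(limit int) *Tracker {
	return &Tracker{active: make(map[string]int), limit: limit}
}

// Enter records an activation and reports false if the rule already hit its limit.
func (t *Tracker) Enter(rule string) bool {
	if t.active[rule] >= t.limit {
		return false
	}
	t.active[rule]++
	return true
}

// Leave undoes a successful Enter.
func (t *Tracker) Leave(rule string) { t.active[rule]-- }

// Expand emits a placeholder only when a rule re-enters itself too often.
func Expand(rules Rules, sym string, t *Tracker) string {
	seq, isRule := rules[sym]
	if !isRule {
		return sym
	}
	if !t.Enter(sym) {
		return "<" + sym + "_MAX_DEPTH>"
	}
	defer t.Leave(sym)
	parts := make([]string, 0, len(seq))
	for _, s := range seq {
		parts = append(parts, Expand(rules, s, t))
	}
	return strings.Join(parts, " ")
}

func main() {
	chain := Rules{"a": {"b"}, "b": {"c"}, "c": {"d"}, "d": {"X"}}
	fmt.Println(Expand(chain, "a", NewTracker(1))) // prints "X"

	rec := Rules{"list": {"item", "list"}, "item": {"x"}}
	fmt.Println(Expand(rec, "list", NewTracker(2))) // prints "x x <list_MAX_DEPTH>"
}
```

A long non-recursive chain now expands fully, while a genuinely recursive rule is still cut off after a bounded number of re-entries.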

🎯 Integration

- Uses existing ANTLR v4 parser at tools/grammar/
- Compatible with all parser implementations in the repository
- Provides programmatic access without requiring CLI tools
- Ready for CI/CD integration and automated testing

This foundational implementation provides a solid base for grammar-aware fuzzing while clearly identifying areas for future improvement.


h3n4l added 8 commits August 28, 2025 10:29

feat: initialize fuzz

eb1a8df

fix: remove range

32bae30

feat: parse grammar IR

a3c23e6

feat: lexer parser v1

56efd14

feat: generator for lexer rules

192f566

chore: merge grammars

5692d21

chore: remove list grammar options

5659f1c

v1

cf77e43

h3n4l changed the title from "feat: fuzzing v1" to "feat: implement ANTLR grammar-aware fuzzing library for parser testing" Aug 29, 2025

chore: go mod tidy

029ad0c

h3n4l merged commit 80e1ea5 into main Aug 29, 2025
5 checks passed

h3n4l deleted the fuzzing branch August 29, 2025 06:39
