This document describes the new parser-based DSL compiler that replaces the regex-based approach.
The new DSL parser provides a robust, maintainable, and extensible way to compile watch scripts into executable rules. It addresses the major limitations of the regex-based compiler while maintaining backward compatibility.
- Tokenization: Input is properly tokenized into meaningful units
- Context-aware parsing: Handles nested structures, quotes, and operators correctly
- Better error detection: Catches syntax errors early with precise location information
- Structured representation: Rules are parsed into a tree structure
- Type safety: Strong typing throughout the parsing process
- Extensibility: Easy to add new language features
- Line and column numbers: Precise error location reporting
- Multiple error collection: Reports all errors found, not just the first one
- Descriptive messages: Clear, actionable error messages
- Single-pass parsing: More efficient than multiple regex passes
- Reduced memory allocation: Less string manipulation and temporary objects
- Cacheable results: Parsed AST can be cached and reused
Input DSL Script
↓
Lexer (Tokenization)
↓
Parser (AST Generation)
↓
AST to Rule Conversion
↓
JSON Output
- Lexer (`dsl-parser.go`): Converts input text into tokens
- Parser (`dsl-parser.go`): Builds AST from tokens using recursive descent
- AST Nodes (`dsl-parser.go`): Represent different parts of the rule
- Compiler (`dsl-parser.go`): Converts AST to executable rule JSON
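To make the pipeline concrete, here is a toy single-pass tokenizer in the spirit of the lexer stage. The `TokenType`, `Token`, and `lex` names are illustrative stand-ins, not the actual definitions in `dsl-parser.go`:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// TokenType and Token are illustrative; the real definitions live in dsl-parser.go.
type TokenType string

const (
	IDENT  TokenType = "IDENT"
	NUMBER TokenType = "NUMBER"
	OP     TokenType = "OP"
)

type Token struct {
	Type    TokenType
	Literal string
	Line    int
	Column  int
}

// lex is a toy single-pass tokenizer for a condition like `amount > 10000`.
// It records column positions, which is what enables precise error reporting.
func lex(input string) []Token {
	var tokens []Token
	col := 0
	for col < len(input) {
		ch := rune(input[col])
		switch {
		case unicode.IsSpace(ch):
			col++
		case unicode.IsLetter(ch):
			start := col
			for col < len(input) && (unicode.IsLetter(rune(input[col])) || input[col] == '.' || input[col] == '_') {
				col++
			}
			tokens = append(tokens, Token{IDENT, input[start:col], 1, start + 1})
		case unicode.IsDigit(ch):
			start := col
			for col < len(input) && (unicode.IsDigit(rune(input[col])) || input[col] == '.') {
				col++
			}
			tokens = append(tokens, Token{NUMBER, input[start:col], 1, start + 1})
		default:
			start := col
			for col < len(input) && strings.ContainsRune("=!<>", rune(input[col])) {
				col++
			}
			if col == start {
				col++ // skip unknown characters in this sketch
				continue
			}
			tokens = append(tokens, Token{OP, input[start:col], 1, start + 1})
		}
	}
	return tokens
}

func main() {
	for _, t := range lex(`amount > 10000`) {
		fmt.Printf("%s(%q) ", t.Type, t.Literal)
	}
	fmt.Println()
}
```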
import (
	"fmt"
	"log"

	"github.com/blnkledger/plane/pkg/query-agent/watch"
)
script := `rule HighValueTransaction {
description "Detect high value transactions"
when amount > 10000 and metadata.kyc_tier == 1
then review
score 0.85
reason "High value transaction from low KYC tier"
}`
ruleName, description, ruleJSON, err := watch.CompileWatchScriptWithParser(script)
if err != nil {
log.Fatalf("Compilation failed: %v", err)
}
fmt.Printf("Compiled rule: %s\n", ruleName)
fmt.Printf("JSON: %s\n", ruleJSON)

invalidScript := `rule InvalidRule {
when amount >
then invalid_action
}`
_, _, _, err := watch.CompileWatchScriptWithParser(invalidScript)
if err != nil {
fmt.Printf("Parse errors:\n%v", err)
// Output includes line numbers and specific error descriptions
}

rule RuleName {
description "Optional description"
when <conditions>
then <action>
score <number>
reason "Optional reason"
}
amount > 1000
metadata.kyc_tier == 1
status != "pending"
metadata.user.profile.tier == "premium"
transaction.details.amount >= 5000
source == $current.source
amount > $threshold.high
timestamp <= $current.created_at + PT24H
status in ("pending", "processing", "failed")
metadata.mcc in (7995, 5912, 6051)
when amount > 1000 and metadata.type == "transfer" and status != "completed"
then allow
then block
then review
then approve
then deny
then alert
then review
score 0.85
reason "High risk transaction detected"
| Operator | Description | Example |
|---|---|---|
| `==` | Equal | `status == "completed"` |
| `!=` | Not equal | `type != "internal"` |
| `>` | Greater than | `amount > 1000` |
| `>=` | Greater than or equal | `amount >= 1000` |
| `<` | Less than | `amount < 1000` |
| `<=` | Less than or equal | `amount <= 1000` |
| `in` | In array | `status in ("pending", "failed")` |
| `regex` | Regex match | `description regex "(?i)gift.*card"` |
| `not_regex` | Regex not match | `description not_regex "test.*"` |
- Numbers: `1000`, `123.45`, `0.5`
- Strings: `"hello world"`, `"escaped \"quotes\""`
- Booleans: `true`, `false`
- Arrays: `("value1", "value2", 123)`
- Current transaction: `$current.field_name`
- Thresholds: `$threshold.high`
- Custom variables: `$watchlist_accounts`
unterminated string at line 2, column 15
invalid character '&' at line 3, column 8
expected rule name at line 1, column 6
missing 'when' clause in rule 'TestRule'
invalid verdict: 'invalid_action' at line 4, column 10
unsupported operator: '~=' at line 2, column 12
invalid field path: '.invalid' at line 3, column 5
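The error format above can be modeled with a small location-carrying error type that collects every problem before reporting. This is a sketch under assumed names (`ParseError`, `joinErrors`), not the parser's actual internals:

```go
package main

import (
	"fmt"
	"strings"
)

// ParseError carries the precise location that appears in the messages above.
type ParseError struct {
	Line, Column int
	Msg          string
}

func (e ParseError) Error() string {
	return fmt.Sprintf("%s at line %d, column %d", e.Msg, e.Line, e.Column)
}

// joinErrors combines all collected parse errors into one error value,
// so callers see every problem instead of just the first.
func joinErrors(errs []ParseError) error {
	if len(errs) == 0 {
		return nil
	}
	msgs := make([]string, len(errs))
	for i, e := range errs {
		msgs[i] = e.Error()
	}
	return fmt.Errorf("%s", strings.Join(msgs, "\n"))
}

func main() {
	err := joinErrors([]ParseError{
		{2, 15, "unterminated string"},
		{3, 8, "invalid character '&'"},
	})
	fmt.Println(err)
}
```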
- Lexing: O(n) where n is input length
- Parsing: O(n) for most cases, O(n²) worst case for deeply nested expressions
- AST to JSON: O(m) where m is number of AST nodes
- Tokens: ~2x input size during lexing
- AST: ~3-4x input size for parsed representation
- Output: Minimal additional allocation
go test -bench=. ./pkg/query-agent/watch

Example results:
BenchmarkLexer-8 100000 12000 ns/op
BenchmarkParser-8 50000 25000 ns/op
BenchmarkCompileWatchScriptWithParser-8 30000 35000 ns/op
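Numbers will vary by machine; the shape of these measurements can be reproduced with `testing.Benchmark`. The `compileStub` below is a placeholder for the real compiler, so only the harness pattern carries over:

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// compileStub stands in for CompileWatchScriptWithParser; a real
// benchmark would call the actual compiler instead.
func compileStub(script string) int {
	return len(strings.Fields(script))
}

func main() {
	// testing.Benchmark runs the closure with an auto-tuned b.N,
	// the same mechanism `go test -bench=.` uses.
	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			compileStub(`when amount > 10000 then review`)
		}
	})
	fmt.Println("ns/op:", res.NsPerOp())
}
```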
The new parser is designed to be a drop-in replacement:
// Old way
ruleName, desc, json, err := CompileWatchScript(script)
// New way
ruleName, desc, json, err := CompileWatchScriptWithParser(script)

Most existing scripts should work without modification. Key differences:
- Stricter validation: Some previously accepted invalid syntax will now be rejected
- Better error messages: More specific error reporting
- Consistent behavior: Edge cases are handled more predictably
- Invalid regex patterns: Now properly validated at compile time
- Malformed strings: Unterminated strings are now errors
- Invalid operators: Typos in operators are caught immediately
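Because validation is stricter, one possible migration strategy is to prefer the new parser and fall back to the legacy compiler while existing scripts are cleaned up. The stub compilers here are illustrative; in real code the calls would be `CompileWatchScriptWithParser` and `CompileWatchScript`:

```go
package main

import (
	"errors"
	"fmt"
)

// Stubs standing in for the real compilers in pkg/query-agent/watch.
func compileWithParser(script string) (string, error) {
	if script == "" {
		return "", errors.New("empty script")
	}
	return "parsed:" + script, nil
}

func compileWithRegex(script string) (string, error) {
	return "regex:" + script, nil
}

// compileWithFallback prefers the new parser and falls back to the
// legacy regex compiler during a migration window.
func compileWithFallback(script string) (string, error) {
	if out, err := compileWithParser(script); err == nil {
		return out, nil
	}
	return compileWithRegex(script)
}

func main() {
	out, _ := compileWithFallback("")
	fmt.Println(out) // the empty script falls back to the regex path
}
```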
# All tests
go test ./pkg/query-agent/watch -v
# Specific test categories
go test ./pkg/query-agent/watch -v -run TestLexer
go test ./pkg/query-agent/watch -v -run TestParser
go test ./pkg/query-agent/watch -v -run TestCompileWatchScriptWithParser
# Benchmarks
go test ./pkg/query-agent/watch -bench=.

- Lexer tests: Token recognition, operators, strings, numbers
- Parser tests: Rule structure, conditions, actions, error cases
- Integration tests: Full compilation, JSON output validation
- Error tests: All error conditions with line numbers
- Benchmark tests: Performance comparison with old compiler
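The parser and error tests follow Go's table-driven style. A minimal self-contained sketch, with `checkVerdict` standing in for the parser's verdict validation:

```go
package main

import "fmt"

// validVerdicts mirrors the verdict set described above; illustrative only.
var validVerdicts = map[string]bool{
	"allow": true, "block": true, "review": true,
	"approve": true, "deny": true, "alert": true,
}

// checkVerdict is a stand-in for the parser's verdict validation.
func checkVerdict(v string) error {
	if !validVerdicts[v] {
		return fmt.Errorf("invalid verdict: %q", v)
	}
	return nil
}

func main() {
	// Table-driven cases in the style the test suite uses.
	cases := []struct {
		verdict string
		wantErr bool
	}{
		{"review", false},
		{"invalid_action", true},
	}
	for _, c := range cases {
		err := checkVerdict(c.verdict)
		fmt.Printf("%s -> err=%v (want error: %v)\n", c.verdict, err != nil, c.wantErr)
	}
}
```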
- Function calls: `sum(amount, "PT24H") > 10000`
- Complex aggregates: Time-window based calculations
- Variables and constants: Reusable definitions
- Imports: Include other rule files
- Comments: Inline documentation support
- Macros: Reusable rule fragments
The parser architecture makes it easy to add new features:
- New tokens: Add to the `TokenType` enum and lexer
- New expressions: Add AST node types and parser methods
- New operators: Add to operator mapping
- New syntax: Extend parser grammar
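For example, supporting a new operator is largely a matter of extending a lookup table, plus a lexer rule to tokenize it. The `operatorJSON` map below is a hypothetical stand-in for the parser's operator mapping, not the actual table in `dsl-parser.go`:

```go
package main

import "fmt"

// operatorJSON maps DSL operators to their JSON rule representation;
// the names here are illustrative.
var operatorJSON = map[string]string{
	"==": "eq", "!=": "ne", ">": "gt", ">=": "gte",
	"<": "lt", "<=": "lte", "in": "in",
	"regex": "regex", "not_regex": "not_regex",
}

func main() {
	// Adding a hypothetical new operator is a one-line change here,
	// plus a corresponding lexer rule.
	operatorJSON["not_in"] = "not_in"
	fmt.Println(operatorJSON["not_in"])
}
```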
- "expected rule name": Missing or invalid rule identifier
- "unterminated string": Missing closing quote
- "invalid verdict": Typo in action name (allow/block/review)
- "unexpected token": Syntax error, check operator spelling
- Check line numbers: Error messages include precise locations
- Validate quotes: Ensure all strings are properly quoted
- Check operators: Use `==` not `=`, `!=` not `<>`
- Verify structure: Rules must have both `when` and `then` clauses
- Run tests: `go test -v` to see examples
- Check examples: See `example_usage.go` for working scripts
- Enable debug logging: Set log level to debug for detailed parsing info
The lexer recognizes these token types:
- Identifiers and keywords
- String and number literals
- Operators and delimiters
- Special characters and EOF
- `RuleStatement`: Complete rule definition
- `InfixExpression`: Binary operations (field op value)
- `FieldPath`: Dot-separated field access
- `Variable`: `$variable` references
- `ArrayLiteral`: Array values
- `ActionExpression`: Then clause with verdict/score/reason
Uses recursive descent parsing with:
- Operator precedence handling
- Error recovery mechanisms
- Look-ahead for disambiguation
- Context-sensitive parsing for different rule sections
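Operator precedence is what makes `amount > 1 and status == "x"` parse as `(amount > 1) and (status == "x")`: comparisons bind tighter than `and`/`or`. A minimal sketch of such a precedence table (the names and levels are illustrative):

```go
package main

import "fmt"

// Illustrative precedence levels: comparison operators bind tighter
// than the logical connectives.
const (
	LOWEST  = iota
	ANDOR   // and, or
	COMPARE // == != < <= > >= in regex not_regex
)

// precedence returns the binding power the parser would consult
// before deciding whether to keep extending the current expression.
func precedence(op string) int {
	switch op {
	case "and", "or":
		return ANDOR
	case "==", "!=", "<", "<=", ">", ">=", "in", "regex", "not_regex":
		return COMPARE
	}
	return LOWEST
}

func main() {
	fmt.Println(precedence("and") < precedence("=="))
}
```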
This new parser provides a solid foundation for the DSL that can grow with future requirements while maintaining excellent performance and error handling.