feat(experimental): implement crslang TypeConverter for SecLang parser#1500
feat(experimental): implement crslang TypeConverter for SecLang parser#1500
Conversation
…ecLang parser Implement the TypeConverter that bridges crslang's structured types to Coraza's compiled rule types. This enables parsing SecLang via ANTLR4 grammar + crslang listener, then converting to Coraza's internal representation using existing operator/action/transformation registries. Key implementation details: - Convert crslang metadata (id, phase, msg, severity, tags) to Coraza action Init calls since crslang stores these separately from actions - Handle crslang's setvar GetAllParams() format (strip "setvar:" prefix) - Support both value and pointer type assertions for directive types stored by the crslang listener - Process SecMarker from DirectiveList.Marker field (not Directives) - Pin seclang_parser to v0.3.0 for compatibility with crslang v0.1.0 (v0.3.2 renamed Rules_directiveContext breaking crslang)
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1500 +/- ##
==========================================
+ Coverage 86.32% 86.37% +0.05%
==========================================
Files 176 176
Lines 8775 8810 +35
==========================================
+ Hits 7575 7610 +35
Misses 949 949
Partials 251 251
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This pull request implements an experimental ANTLR4-based SecLang parser (parser_v2) that uses the crslang type system to parse SecLang directives into structured types, then converts them to Coraza's internal representation. The implementation bridges the gap between the standardized ANTLR4 grammar (seclang_parser v0.3.0) and Coraza's existing rule engine, providing an alternative to the current line-by-line parser.
Changes:
- Adds experimental ANTLR4-based parser with TypeConverter for crslang → Coraza conversion
- Implements support for core directives (SecRule, SecAction, SecMarker, config directives, default actions, rule modifications)
- Adds 23 tests covering basic parsing scenarios
- Pins dependencies: seclang_parser v0.3.0, crslang v0.1.0, antlr v4.13.1
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| go.mod | Adds ANTLR4 parser dependencies (antlr, crslang, seclang_parser) and transitive dependencies |
| go.sum | Checksums for new dependencies |
| experimental/seclang/parser_v2.go | Core implementation: ANTLR4 parser integration, crslang type converter, action/variable/operator handling |
| experimental/seclang/parser_v2_test.go | Test suite covering config directives, rules, actions, markers, and error cases |
Comments suppressed due to low confidence (6)
experimental/seclang/parser_v2.go:1068
- The parseActions function doesn't check for unclosed quotes after the parsing loop completes. The original implementation in internal/seclang/rule_parser.go (lines 568-573) logs a warning when quotes are left unclosed. This inconsistency could lead to different parsing behavior between the two parsers. Consider adding a similar check and warning to maintain compatibility.
func parseActions(actions string) ([]ruleAction, error) {
var res []ruleAction
disruptiveActionIndex := -1
beforeKey := -1
afterKey := -1
inQuotes := false
for i := 1; i < len(actions); i++ {
c := actions[i]
if actions[i-1] == '\\' {
continue
}
if c == '\'' {
inQuotes = !inQuotes
continue
}
if inQuotes {
continue
}
switch c {
case ':':
if afterKey != -1 {
continue
}
afterKey = i
case ',':
var val string
if afterKey == -1 {
afterKey = i
} else {
val = actions[afterKey+1 : i]
}
var err error
res, disruptiveActionIndex, err = appendRuleAction(res, actions[beforeKey+1:afterKey], val, disruptiveActionIndex)
if err != nil {
return nil, err
}
beforeKey = i
afterKey = -1
}
}
var val string
if afterKey == -1 {
afterKey = len(actions)
} else {
val = actions[afterKey+1:]
}
var err error
res, _, err = appendRuleAction(res, actions[beforeKey+1:afterKey], val, disruptiveActionIndex)
if err != nil {
return nil, err
}
return res, nil
}
experimental/seclang/parser_v2.go:1009
- The getDefaultActions function creates a new map and re-parses all default action strings on every call, which is inefficient. This function is called once for each rule during conversion. Consider parsing and caching the default actions in ParserState during initialization (similar to how internal/seclang/rule_parser.go stores them in RuleParser.defaultActions), then retrieve the cached parsed actions here.
// getDefaultActions returns the parsed default actions for the given phase
func (c *typeConverter) getDefaultActions(phase types.RulePhase) ([]ruleAction, error) {
defaultActions := make(map[types.RulePhase][]ruleAction)
if c.state.HasRuleDefaultActions {
for _, da := range c.state.RuleDefaultActions {
act, err := parseActions(da)
if err != nil {
return nil, err
}
daPhase := types.RulePhase(0)
for _, a := range act {
if a.Key == "phase" {
daPhase, err = types.ParseRulePhase(a.Value)
if err != nil {
return nil, err
}
}
}
if daPhase != 0 {
defaultActions[daPhase] = act
}
}
}
// If no default actions for phase 2, use hardcoded defaults
if defaultActions[types.PhaseRequestBody] == nil {
act, err := parseActions("phase:2,log,auditlog,pass")
if err != nil {
return nil, err
}
defaultActions[types.PhaseRequestBody] = act
}
return defaultActions[phase], nil
}
experimental/seclang/parser_v2_test.go:371
- The test suite doesn't cover several important features mentioned in the PR description, including chained rules (rule.HasChain, ChainedRule), SecDefaultAction directive (convertDefaultAction), and rule modification directives (SecRuleUpdateTargetById, SecRuleUpdateActionById). Consider adding tests for these scenarios to ensure they work correctly, especially since chained rule handling has complex logic (lines 427-452 in parser_v2.go).
// Copyright 2024 Juan Pablo Tosso and the OWASP Coraza contributors
// SPDX-License-Identifier: Apache-2.0
package seclang
import (
"testing"
"github.com/corazawaf/coraza/v3/internal/corazawaf"
"github.com/corazawaf/coraza/v3/types"
)
func TestParser_FromString_RuleEngine(t *testing.T) {
tests := []struct {
name string
input string
expected types.RuleEngineStatus
}{
{
name: "SecRuleEngine On",
input: "SecRuleEngine On",
expected: types.RuleEngineOn,
},
{
name: "SecRuleEngine Off",
input: "SecRuleEngine Off",
expected: types.RuleEngineOff,
},
{
name: "SecRuleEngine DetectionOnly",
input: "SecRuleEngine DetectionOnly",
expected: types.RuleEngineDetectionOnly,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
waf := corazawaf.NewWAF()
parser := NewParser(waf)
err := parser.FromString(tt.input)
if err != nil {
t.Fatalf("FromString() error = %v", err)
}
if waf.RuleEngine != tt.expected {
t.Errorf("RuleEngine = %v, expected %v", waf.RuleEngine, tt.expected)
}
})
}
}
func TestParser_FromString_RequestBodyAccess(t *testing.T) {
tests := []struct {
name string
input string
expected bool
}{
{
name: "SecRequestBodyAccess On",
input: "SecRequestBodyAccess On",
expected: true,
},
{
name: "SecRequestBodyAccess Off",
input: "SecRequestBodyAccess Off",
expected: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
waf := corazawaf.NewWAF()
parser := NewParser(waf)
err := parser.FromString(tt.input)
if err != nil {
t.Fatalf("FromString() error = %v", err)
}
if waf.RequestBodyAccess != tt.expected {
t.Errorf("RequestBodyAccess = %v, expected %v", waf.RequestBodyAccess, tt.expected)
}
})
}
}
func TestParser_FromString_ResponseBodyAccess(t *testing.T) {
tests := []struct {
name string
input string
expected bool
}{
{
name: "SecResponseBodyAccess On",
input: "SecResponseBodyAccess On",
expected: true,
},
{
name: "SecResponseBodyAccess Off",
input: "SecResponseBodyAccess Off",
expected: false,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
waf := corazawaf.NewWAF()
parser := NewParser(waf)
err := parser.FromString(tt.input)
if err != nil {
t.Fatalf("FromString() error = %v", err)
}
if waf.ResponseBodyAccess != tt.expected {
t.Errorf("ResponseBodyAccess = %v, expected %v", waf.ResponseBodyAccess, tt.expected)
}
})
}
}
func TestParser_FromString_BodyLimits(t *testing.T) {
tests := []struct {
name string
input string
checkReq bool
checkRes bool
expected int64
}{
{
name: "SecRequestBodyLimit",
input: "SecRequestBodyLimit 131072",
checkReq: true,
expected: 131072,
},
{
name: "SecResponseBodyLimit",
input: "SecResponseBodyLimit 524288",
checkRes: true,
expected: 524288,
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
waf := corazawaf.NewWAF()
parser := NewParser(waf)
err := parser.FromString(tt.input)
if err != nil {
t.Fatalf("FromString() error = %v", err)
}
if tt.checkReq && waf.RequestBodyLimit != tt.expected {
t.Errorf("RequestBodyLimit = %v, expected %v", waf.RequestBodyLimit, tt.expected)
}
if tt.checkRes && waf.ResponseBodyLimit != tt.expected {
t.Errorf("ResponseBodyLimit = %v, expected %v", waf.ResponseBodyLimit, tt.expected)
}
})
}
}
func TestParser_FromString_MultipleDirectives(t *testing.T) {
input := `
SecRuleEngine On
SecRequestBodyAccess On
SecRequestBodyLimit 131072
SecResponseBodyAccess Off
`
waf := corazawaf.NewWAF()
parser := NewParser(waf)
err := parser.FromString(input)
if err != nil {
t.Fatalf("FromString() error = %v", err)
}
if waf.RuleEngine != types.RuleEngineOn {
t.Errorf("RuleEngine = %v, expected On", waf.RuleEngine)
}
if !waf.RequestBodyAccess {
t.Error("RequestBodyAccess should be true")
}
if waf.RequestBodyLimit != 131072 {
t.Errorf("RequestBodyLimit = %v, expected 131072", waf.RequestBodyLimit)
}
if waf.ResponseBodyAccess {
t.Error("ResponseBodyAccess should be false")
}
}
func TestParser_FromString_Comments(t *testing.T) {
input := `
# This is a comment
SecRuleEngine On
# Another comment
SecRequestBodyAccess On
`
waf := corazawaf.NewWAF()
parser := NewParser(waf)
err := parser.FromString(input)
if err != nil {
t.Fatalf("FromString() error = %v", err)
}
if waf.RuleEngine != types.RuleEngineOn {
t.Errorf("RuleEngine = %v, expected On", waf.RuleEngine)
}
}
func TestParser_FromString_SecRule(t *testing.T) {
input := `SecRule REQUEST_URI "@rx /admin" "id:1,phase:1,deny,status:403,msg:'Admin access denied'"`
waf := corazawaf.NewWAF()
parser := NewParser(waf)
err := parser.FromString(input)
if err != nil {
t.Fatalf("FromString() error = %v", err)
}
rules := waf.Rules.GetRules()
if len(rules) == 0 {
t.Fatal("Expected at least one rule")
}
rule := rules[0]
if rule.ID_ != 1 {
t.Errorf("Rule ID = %v, expected 1", rule.ID_)
}
if rule.Phase_ != 1 {
t.Errorf("Rule Phase = %v, expected 1", rule.Phase_)
}
}
func TestParser_FromString_SecAction(t *testing.T) {
input := `SecAction "id:900000,phase:1,pass,nolog,setvar:tx.blocking_paranoia_level=1"`
waf := corazawaf.NewWAF()
parser := NewParser(waf)
err := parser.FromString(input)
if err != nil {
t.Fatalf("FromString() error = %v", err)
}
rules := waf.Rules.GetRules()
if len(rules) == 0 {
t.Fatal("Expected at least one rule")
}
rule := rules[0]
if rule.ID_ != 900000 {
t.Errorf("Rule ID = %v, expected 900000", rule.ID_)
}
if rule.Phase_ != 1 {
t.Errorf("Rule Phase = %v, expected 1", rule.Phase_)
}
}
func TestParser_FromString_SecMarker(t *testing.T) {
// SecMarker in ANTLR grammar is stored as a DirectiveList Marker,
// not as a rule. It acts as a section boundary.
input := `SecMarker "END_HOST_CHECK"`
waf := corazawaf.NewWAF()
parser := NewParser(waf)
err := parser.FromString(input)
if err != nil {
t.Fatalf("FromString() error = %v", err)
}
rules := waf.Rules.GetRules()
if len(rules) == 0 {
t.Fatal("Expected at least one rule")
}
rule := rules[0]
if rule.SecMark_ != "END_HOST_CHECK" {
t.Errorf("SecMark = %q, expected END_HOST_CHECK", rule.SecMark_)
}
}
func TestParser_FromString_SecComponentSignature(t *testing.T) {
input := `SecComponentSignature "OWASP_CRS/4.0.0"`
waf := corazawaf.NewWAF()
parser := NewParser(waf)
err := parser.FromString(input)
if err != nil {
t.Fatalf("FromString() error = %v", err)
}
if len(waf.ComponentNames) == 0 {
t.Fatal("Expected at least one component name")
}
// The value may include quotes depending on how the parser handles it
found := false
for _, name := range waf.ComponentNames {
if name == "OWASP_CRS/4.0.0" || name == `"OWASP_CRS/4.0.0"` {
found = true
break
}
}
if !found {
t.Errorf("ComponentNames = %v, expected to contain OWASP_CRS/4.0.0", waf.ComponentNames)
}
}
func TestParser_SyntaxErrors(t *testing.T) {
tests := []struct {
name string
input string
}{
{
name: "Invalid directive",
input: "InvalidDirective blah",
},
{
name: "Malformed input",
input: "SecRuleEngine",
},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
waf := corazawaf.NewWAF()
parser := NewParser(waf)
err := parser.FromString(tt.input)
if err == nil {
t.Error("FromString() expected error, got nil")
}
})
}
}
func TestParser_FromString_MultipleRules(t *testing.T) {
input := `
SecRuleEngine On
SecRule REQUEST_METHOD "@rx ^POST$" "id:1,phase:1,pass,nolog"
SecRule REQUEST_URI "@contains /admin" "id:2,phase:1,deny,status:403"
`
waf := corazawaf.NewWAF()
parser := NewParser(waf)
err := parser.FromString(input)
if err != nil {
t.Fatalf("FromString() error = %v", err)
}
rules := waf.Rules.GetRules()
if len(rules) != 2 {
t.Fatalf("Expected 2 rules, got %d", len(rules))
}
if rules[0].ID_ != 1 {
t.Errorf("First rule ID = %v, expected 1", rules[0].ID_)
}
if rules[1].ID_ != 2 {
t.Errorf("Second rule ID = %v, expected 2", rules[1].ID_)
}
}
experimental/seclang/parser_v2.go:264
- The
errorsfield in the typeConverter struct is declared but never used throughout the codebase. This is dead code that should be removed to keep the code clean and avoid confusion.
// typeConverter converts crslang types to Coraza's internal representation
type typeConverter struct {
waf *corazawaf.WAF
state *ParserState
errors []error
}
experimental/seclang/parser_v2.go:1202
- The addVariablesFromString function uses a simplified string splitting approach (split on "|") compared to the comprehensive character-by-character parsing in internal/seclang/rule_parser.go ParseVariables (lines 42-161). The original parser handles complex cases like regex variables (e.g., ARGS:/regex/), quoted keys (e.g., ARGS:'key'), and XPATH expressions. This simplified approach may fail to parse these advanced variable syntaxes correctly in SecRuleUpdateTargetById directives. Consider implementing the full parsing logic or adding tests to verify these edge cases work correctly.
func (c *typeConverter) addVariablesFromString(rule *corazawaf.Rule, vars string) error {
// Parse the variable string character by character (same as RuleParser.ParseVariables)
// For simplicity, we split on | and process each
for _, part := range strings.Split(vars, "|") {
part = strings.TrimSpace(part)
if part == "" {
continue
}
isNegation := false
isCount := false
if strings.HasPrefix(part, "!") {
isNegation = true
part = part[1:]
}
if strings.HasPrefix(part, "&") {
isCount = true
part = part[1:]
}
varName, key, _ := strings.Cut(part, ":")
rv, err := variables.Parse(varName)
if err != nil {
return fmt.Errorf("unknown variable %q: %w", varName, err)
}
if isNegation {
if err := rule.AddVariableNegation(rv, key); err != nil {
return err
}
} else {
if err := rule.AddVariable(rv, key, isCount); err != nil {
return err
}
}
}
return nil
}
experimental/seclang/parser_v2.go:55
- The ParserState.CurrentLine field is used to set rule line numbers (lines 398, 424, 479) but is never updated during parsing. This means all rules will have Line_ = 0, which makes debugging and error reporting difficult. The ANTLR parser provides line information through the parse tree context. Consider extracting and tracking line numbers from the ANTLR parse tree or from the crslang directive objects if they contain this information.
CurrentLine int
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@copilot fix the linting issues from https://github.com/corazawaf/coraza/actions/runs/22493287637/job/65160987946?pr=1500 |
#1501) * Initial plan * fix(experimental): resolve linting issues in parser_v2.go Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com> * fix(experimental/seclang): resolve golangci-lint failures in parser_v2 Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>
…ag (#1536) * Initial plan * feat(experimental/seclang): gate crslang parser behind build tag Co-authored-by: jptosso <1236942+jptosso@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: jptosso <1236942+jptosso@users.noreply.github.com>
|
@copilot Read the coverage report in #1500 (comment) and extend the coverage. |
…5%, fix convertDefaultAction bug Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com> Agent-Logs-Url: https://github.com/corazawaf/coraza/sessions/1951caf8-77d0-493e-bbdd-fa0cfe375af6
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Summary
TypeConverterinexperimental/seclang/parser_v2.gothat bridges crslang's structured types to Coraza's compiled rule typesseclang_parserto v0.3.0 for compatibility withcrslangv0.1.0Details
The TypeConverter handles:
Key implementation decisions:
GetAllParams()includes asetvar:prefix that must be stripped for CorazaDirectiveList.Marker, not inDirectivesTest plan
go build ./experimental/seclang/...compiles without errorsgo test ./experimental/seclang/...— 23 tests passgo vet ./experimental/seclang/...— no issuesgo test ./internal/seclang/...— 349 existing tests still pass (seclang_parser downgrade is safe)