feat: add JSON Stream (NDJSON) body processor#1481

Open
fzipi wants to merge 13 commits into main from feat/add-streaming-json-processor

Member

@fzipi fzipi commented Jan 20, 2026

what

Implements a new body processor for handling streaming JSON formats with per-record rule evaluation:

  • NDJSON (Newline Delimited JSON)
  • JSON Lines
  • JSON Sequence (RFC 7464)

Streaming body processor

  • Line-by-line processing for memory efficiency
  • Each JSON object indexed by record number (json.0.field, json.1.field)
  • Built-in DoS protection with 1024 recursion limit
  • Support for nested objects and arrays
  • Auto-detection of format (NDJSON vs RFC 7464) by peeking at the first 4KB
  • Registered under JSONSTREAM, NDJSON, and JSONLINES aliases

Per-record rule evaluation

Instead of parsing all records into ArgsPost and evaluating Phase 2 rules once, rules are now evaluated after each complete JSON record:

  • Malicious records are caught immediately without processing the rest of the stream
  • ArgsPost is cleared between records so each record is evaluated in isolation
  • TX variables (e.g. anomaly scores) persist across records, enabling cross-record correlation — a stream where 3 records each score below the threshold individually can
    still trigger a block when the accumulated score exceeds it
  • Eval() is safe to call multiple times per phase — AllowTypePhase, Skip, and the transformation cache all reset correctly between calls
  • Non-streaming body processors are completely unaffected
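
The interplay between isolated per-record arguments and persistent TX state can be modelled in a few lines. Everything below is a toy: the `tx` type, the scoring "rule" and `processStream` are hypothetical and only mirror the behaviour described in the list above.

```go
package main

import (
	"fmt"
	"strings"
)

// tx is a hypothetical stand-in for a transaction. args holds the current
// record only; score persists across records like a TX anomaly score.
type tx struct {
	args  map[string]string
	score int
}

// evalRecord is a toy rule: every field containing "<script" adds 2 points.
// It reports whether the accumulated score reached the threshold.
func (t *tx) evalRecord(threshold int) bool {
	for _, v := range t.args {
		if strings.Contains(v, "<script") {
			t.score += 2
		}
	}
	return t.score >= threshold
}

// processStream evaluates records one at a time and returns the index of the
// record that triggered the block, or -1 if the whole stream passed.
func processStream(records []map[string]string, threshold int) int {
	t := &tx{}
	for i, rec := range records {
		t.args = rec // replaced, not merged: each record is evaluated in isolation
		if t.evalRecord(threshold) {
			return i // later records are never processed
		}
	}
	return -1
}

func main() {
	records := []map[string]string{
		{"json.0.comment": "<script>a"},
		{"json.1.comment": "<script>b"},
		{"json.2.comment": "<script>c"},
		{"json.3.comment": "never evaluated"},
	}
	// Each record scores 2, below the threshold of 5 on its own.
	fmt.Println(processStream(records, 5))
}
```

With a threshold of 5 the third record (index 2) pushes the accumulated score to 6, so the stream is blocked even though no single record would have been.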

New StreamingBodyProcessor interface

Extends BodyProcessor with ProcessRequestRecords/ProcessResponseRecords methods that yield parsed records one at a time via callback. The transaction detects this
interface via type assertion and switches to the per-record evaluation path automatically.
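
A rough sketch of the detection step follows. The interface shapes are deliberately simplified and hypothetical (the real Coraza interfaces take readers, variables and options); only the type-assertion pattern is the point.

```go
package main

import (
	"fmt"
	"strings"
)

// BodyProcessor is a simplified, hypothetical version of the base interface.
type BodyProcessor interface {
	ProcessRequest(body string) error
}

// StreamingBodyProcessor extends it with a callback-based method that yields
// one record at a time. The signature is an assumption for illustration.
type StreamingBodyProcessor interface {
	BodyProcessor
	ProcessRequestRecords(body string, fn func(recordNum int, raw string) error) error
}

type bufferedProcessor struct{}

func (bufferedProcessor) ProcessRequest(string) error { return nil }

type ndjsonProcessor struct{}

func (ndjsonProcessor) ProcessRequest(string) error { return nil }

func (ndjsonProcessor) ProcessRequestRecords(body string, fn func(int, string) error) error {
	for i, line := range strings.Split(strings.TrimSpace(body), "\n") {
		if err := fn(i, line); err != nil {
			return err // the callback can stop the stream early
		}
	}
	return nil
}

// dispatch picks the per-record path only when the processor supports it.
func dispatch(bp BodyProcessor, body string) (string, error) {
	if sp, ok := bp.(StreamingBodyProcessor); ok {
		n := 0
		err := sp.ProcessRequestRecords(body, func(int, string) error {
			n++ // rule evaluation for this record would happen here
			return nil
		})
		return fmt.Sprintf("streaming:%d", n), err
	}
	return "buffered", bp.ProcessRequest(body)
}

func main() {
	fmt.Println(dispatch(ndjsonProcessor{}, "{}\n{}\n"))
	fmt.Println(dispatch(bufferedProcessor{}, "{}"))
}
```

Because the check is a type assertion on the existing value, processors that implement only the base interface keep their current code path without any change.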

Streaming relay support

Added ProcessRequestBodyFromStream(input, output) / ProcessResponseBodyFromStream(input, output) methods on the transaction for integrators building custom streaming
middleware. These read records from input, evaluate rules per record, and write clean records to output. Exposed via an experimental StreamingTransaction interface.

Usage example

SecRule REQUEST_HEADERS:Content-Type "^application/x-ndjson" \
    "id:'200007',phase:1,pass,nolog,ctl:requestBodyProcessor=JSONSTREAM"

Testing

  • 24 unit tests for the body processor (single/multiple lines, nested objects, arrays, error cases, recursion limits, TX variable storage, large tokens, format
    auto-detection)
  • 7 unit tests for callback-based record processing (interruption stops processing, field prefixes, RFC 7464, backward compatibility)
  • 4 integration tests with real WAF rules (interruption at bad record, clean passthrough, TX variable accumulation across records, below-threshold no-block)
  • Benchmark: ~5,000 ops/sec for 100-object streams

Benchmark Results (Apple M2)

ProcessRequest (buffered) vs Callback (streaming)

| Scenario | Records | ProcessRequest | Callback | Throughput Speedup | Alloc Reduction |
| --- | --- | --- | --- | --- | --- |
| small | 1 | 2.00 MB/s, 146 allocs | 2.66 MB/s, 17 allocs | 1.3x | 88% |
| small | 10 | 11.30 MB/s, 283 allocs | 16.70 MB/s, 134 allocs | 1.5x | 53% |
| small | 100 | 22.38 MB/s, 1,644 allocs | 33.01 MB/s, 1,304 allocs | 1.5x | 21% |
| small | 1,000 | 22.84 MB/s, 16,671 allocs | 36.20 MB/s, 14,493 allocs | 1.6x | 13% |
| medium | 1 | 8.50 MB/s, 176 allocs | 11.92 MB/s, 39 allocs | 1.4x | 78% |
| medium | 10 | 26.20 MB/s, 579 allocs | 41.03 MB/s, 354 allocs | 1.6x | 39% |
| medium | 100 | 31.71 MB/s, 4,563 allocs | 52.34 MB/s, 3,505 allocs | 1.7x | 23% |
| medium | 1,000 | 31.54 MB/s, 51,097 allocs | 53.93 MB/s, 41,712 allocs | 1.7x | 18% |
| nested | 1 | 8.53 MB/s, 178 allocs | 12.04 MB/s, 41 allocs | 1.4x | 77% |
| nested | 10 | 25.22 MB/s, 599 allocs | 38.13 MB/s, 374 allocs | 1.5x | 38% |
| nested | 100 | 30.71 MB/s, 4,763 allocs | 48.66 MB/s, 3,705 allocs | 1.6x | 22% |
| nested | 1,000 | 30.49 MB/s, 53,101 allocs | 48.92 MB/s, 43,713 allocs | 1.6x | 18% |

RFC 7464 (JSON Sequence) via Callback

| Scenario | Records | Throughput | Allocs/op |
| --- | --- | --- | --- |
| small | 10 | 16.61 MB/s | 134 |
| small | 100 | 34.37 MB/s | 1,304 |
| medium | 100 | 47.02 MB/s | 3,505 |
| nested | 100 | 48.06 MB/s | 3,705 |

Key Takeaways

  • The callback-based streaming path is consistently faster: 1.3–1.7x in throughput overall, and 1.5–1.7x at 10 or more records.
  • Allocation counts are 13–88% lower (most dramatic at low record counts where per-collection overhead dominates).
  • RFC 7464 format performance is comparable to NDJSON at the same record counts, confirming negligible format auto-detection overhead.
  • The callback path avoids populating TransactionVariables collections (ArgsPost, TX vars, raw body via TeeReader), which accounts for the reduced allocations.

Record Templates

  • small: {"id":1,"name":"Alice"} (24 bytes)
  • medium: {"user_id":1234567890,"name":"User Name","email":"user@example.com","role":"admin","active":true,"tags":["tag1","tag2","tag3"]} (128 bytes)
  • nested: {"user":{"name":"Alice","address":{"city":"NYC","zip":"10001"}},"scores":[95,87,92],"meta":{"created":"2026-01-01","active":true}} (131 bytes)

Implements a new body processor for handling streaming JSON formats:
- NDJSON (Newline Delimited JSON)
- JSON Lines
- JSON Sequence (RFC 7464)

Features:
- Line-by-line processing for memory efficiency
- Each JSON object indexed by line number (json.0.field, json.1.field)
- Built-in DoS protection with 1024 recursion limit
- TX variables for raw body and line count
- Support for nested objects and arrays
- Comprehensive error handling

Configuration:
- Added rules to coraza.conf-recommended for NDJSON content types
- Optional line count limiting rule
- Registered under JSONSTREAM, NDJSON, and JSONLINES aliases

Testing:
- 13 comprehensive test cases covering:
  - Single/multiple lines
  - Nested objects and arrays
  - Error cases (invalid JSON, empty stream)
  - Recursion limit enforcement
  - TX variable storage
- Benchmark: ~5,000 ops/sec for 100-object streams

Usage example:
  SecRule REQUEST_HEADERS:Content-Type "^application/x-ndjson" \
    "id:'200007',phase:1,pass,nolog,ctl:requestBodyProcessor=JSONSTREAM"

Closes: Related to streaming JSON support discussion
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
@fzipi fzipi requested a review from Copilot January 20, 2026 17:14
@fzipi fzipi requested a review from a team as a code owner January 20, 2026 17:14

codecov bot commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 52.69461% with 158 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.07%. Comparing base (537abf9) to head (d4502d1).
⚠️ Report is 4 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| internal/corazawaf/transaction.go | 10.17% | 148 Missing and 2 partials ⚠️ |
| experimental/bodyprocessors/jsonstream.go | 95.15% | 4 Missing and 4 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1481      +/-   ##
==========================================
- Coverage   85.30%   84.07%   -1.24%     
==========================================
  Files         174      175       +1     
  Lines        8461     8811     +350     
==========================================
+ Hits         7218     7408     +190     
- Misses        994     1146     +152     
- Partials      249      257       +8     

| Flag | Coverage Δ |
| --- | --- |
| coraza.rule.case_sensitive_args_keys | 84.04% <52.69%> (-1.24%) ⬇️ |
| coraza.rule.mandatory_rule_id_check | 84.06% <52.69%> (-1.24%) ⬇️ |
| coraza.rule.multiphase_evaluation | 83.80% <52.69%> (-1.23%) ⬇️ |
| coraza.rule.no_regex_multiline | 84.06% <52.69%> (-1.24%) ⬇️ |
| default | 84.07% <52.69%> (-1.24%) ⬇️ |
| examples+ | 21.03% <24.55%> (+4.54%) ⬆️ |
| examples+coraza.rule.case_sensitive_args_keys | 83.97% <52.69%> (-1.23%) ⬇️ |
| examples+coraza.rule.mandatory_rule_id_check | 84.06% <52.69%> (-1.24%) ⬇️ |
| examples+coraza.rule.multiphase_evaluation | 83.60% <52.69%> (-1.22%) ⬇️ |
| examples+coraza.rule.no_regex_multiline | 83.91% <52.69%> (-1.23%) ⬇️ |
| examples+memoize_builders | 84.01% <52.69%> (-1.24%) ⬇️ |
| examples+no_fs_access | 81.75% <52.69%> (-1.14%) ⬇️ |
| ftw | 84.07% <52.69%> (-1.24%) ⬇️ |
| memoize_builders | 84.18% <52.69%> (-1.25%) ⬇️ |
| no_fs_access | 83.61% <52.69%> (-1.22%) ⬇️ |
| tinygo | 84.05% <52.69%> (-1.24%) ⬇️ |

Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for JSON Stream (NDJSON) body processing to Coraza WAF, enabling line-by-line processing of streaming JSON formats. The implementation includes a new body processor that handles NDJSON, JSON Lines, and claims support for JSON Sequence (RFC 7464).

Changes:

  • New jsonStreamBodyProcessor that processes JSON objects line-by-line with memory-efficient streaming
  • Built-in DoS protection via configurable recursion limits (default 1024)
  • TX variable storage for raw body and line count to enable custom validation rules
  • Configuration rules in coraza.conf-recommended for NDJSON content types

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| internal/bodyprocessors/jsonstream.go | Core implementation of NDJSON body processor with line-by-line parsing, recursion limits, and TX variable storage |
| internal/bodyprocessors/jsonstream_test.go | Comprehensive test suite with 13 test cases covering single/multiple lines, nested objects, arrays, error cases, and benchmarks |
| coraza.conf-recommended | Configuration rules for enabling NDJSON processing based on Content-Type headers, with optional line count limiting |

Memory Documentation:
- Add explicit documentation about 2x memory usage from TeeReader
- Clarify that this is necessary for TX variables (like regular JSON processor)
- Note memory implications: 2x body size (buffer + parsed variables)

Line Numbering:
- Use 1-based line numbers in error messages instead of 0-based
- More user-friendly: "line 1" instead of "line 0"
- Applied to both invalid JSON and parsing errors

Scanner Buffer Limit:
- Increase max scan token size from default 64KB to 1MB
- Prevents failure on large JSON objects per line
- Set initial buffer to 64KB, max to 1MB for memory efficiency

Configuration Consistency:
- Fix rule 200008 to use JSONSTREAM (was NDJSON)
- Now consistent with rule 200007
- Both rules use the same processor name

Test Code Quality:
- Replace string concatenation with fmt.Sprintf for line numbers
- Fix issue where rune('0'+tt.line) only works for single digits
- Add fmt import to test file
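
The single-digit problem is easy to see in isolation. `labelRune` and `labelSprintf` are hypothetical helpers written for this illustration, not code from the test file.

```go
package main

import "fmt"

// labelRune builds the label by offsetting from the rune '0'. This is only
// correct for 0 through 9; larger values land on unrelated characters.
func labelRune(n int) string {
	return "line " + string(rune('0'+n))
}

// labelSprintf formats the number properly for any value.
func labelSprintf(n int) string {
	return fmt.Sprintf("line %d", n)
}

func main() {
	fmt.Println(labelRune(7), "|", labelSprintf(7))
	fmt.Println(labelRune(12), "|", labelSprintf(12)) // '0'+12 is '<', not "12"
}
```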

Documentation Accuracy:
- Remove RFC 7464 JSON Sequence from "supported formats"
- Add note that RS separator (0x1E) is not yet implemented
- Avoid misleading users about unsupported features

All tests passing: 13/13
@fzipi fzipi marked this pull request as draft January 20, 2026 17:52
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
@jcchavezs
Member

Could we add this to experimental?

Member Author

fzipi commented Jan 22, 2026

I thought I mentioned this. Yes, that was my idea.

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

fzipi added 2 commits January 23, 2026 00:14
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

@fzipi fzipi marked this pull request as ready for review February 15, 2026 12:50
@fzipi fzipi requested a review from Copilot February 15, 2026 12:51
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

- Extract inline interface to named indexedCollection type (jcchavezs)
- Preserve original stream format in relay by including format-specific
  delimiters in rawRecord (NDJSON uses \n, RFC 7464 uses RS prefix + \n)
- Update readItemsWithLimit TODO comments to reference #1110
Member Author

fzipi commented Mar 21, 2026

@copilot Fix the conflicts first. Then add e2e tests to tests for this new body processor.

Contributor

Copilot AI commented Mar 21, 2026

@fzipi I've opened a new pull request, #1563, to work on those changes. Once the pull request is ready, I'll request review from you.

* fix(deps): update module golang.org/x/net to v0.45.0 [security] (#1487)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix(deps): update go modules in go.mod (#1433)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* docs(actions): update format and add package (#1475)

* docs(actions): update format and add package

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* fix: update documentation for package

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* fix: go fmt

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

---------

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* fix: add A-Z to auditlog (#1479)

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* fix: SecRuleUpdateActionById should replace disruptive actions (#1471)

* fix: SecRuleUpdateActionById should replace disruptive actions

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* fix: multiphase test with bad expectations

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* tests: improve coverage on engine

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* refactor: address SecRuleUpdateActionById review comments (#1484)

* Initial plan

* Address code review comments: improve documentation, fix double parsing, and fix range logic

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* Refactor: Extract hasDisruptiveActions helper to avoid code duplication

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* docs: Improve applyParsedActions documentation

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* docs: Clarify body parsing logic in SetRawRequest

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* refactor: address review comments on SecRuleUpdateActionById

- Rename ClearActionsOfType to ClearDisruptiveActions
- Add comments explaining quote trimming in action parsing
- Remove empty line after function brace in updateActionBySingleID
- Split engine_test.go: move output/helper tests to engine_output_test.go

* Apply suggestions from code review

Co-authored-by: Matteo Pace <pace.matteo96@gmail.com>

* fix: use index-based iteration for SecRuleUpdateActionById range updates

The range loop variable copied each Rule, so modifications to disruptive
actions were lost. Use index-based iteration to modify rules in place.
Also adds a test case exercising the range update path.
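
The underlying Go behaviour can be shown with a toy `rule` type (hypothetical, much smaller than Coraza's `Rule`): ranging by value mutates a copy, while indexing mutates the slice element.

```go
package main

import "fmt"

type rule struct {
	id         int
	disruptive string
}

// updateByValue modifies the loop variable, which is a copy of each element,
// so the slice is left unchanged.
func updateByValue(rules []rule, action string) {
	for _, r := range rules {
		r.disruptive = action
	}
}

// updateByIndex modifies the elements in place.
func updateByIndex(rules []rule, action string) {
	for i := range rules {
		rules[i].disruptive = action
	}
}

func main() {
	rules := []rule{{id: 1, disruptive: "deny"}, {id: 2, disruptive: "deny"}}

	updateByValue(rules, "pass")
	fmt.Println(rules[0].disruptive) // still the original action

	updateByIndex(rules, "pass")
	fmt.Println(rules[0].disruptive)
}
```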

---------

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Matteo Pace <pace.matteo96@gmail.com>

* refactor: remove root package dependency on experimental (#1494)

* refactor: remove root package dependency on experimental

Replace experimental.Options with corazawaf.Options in waf.go, breaking
the import cycle that prevented the experimental package from importing
the root coraza package. This unblocks PR #1478 and lets experimental
helpers use coraza.WAFConfig with proper type safety instead of any.

* Update waf.go

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* chore: min go version to 1.25 (#1497)

* No content wants no body

* Update .github/workflows/regression.yml

Co-authored-by: Felipe Zipitría <3012076+fzipi@users.noreply.github.com>

* one more place

---------

Co-authored-by: Felipe Zipitría <3012076+fzipi@users.noreply.github.com>

* feat: add optional rule observer callback to WAF config (#1478)

* feat: add optional rule observer callback to WAF config

Introduce an optional rule observer callback that is invoked for each rule successfully added to the WAF during initialization.

The observer receives rule metadata via the existing RuleMetadata interface.

* Move to the experimental package

* Do not use reflection to keep the compatibility with older Go versions

* Use coraza.WAFConfig, move the test to where it belongs.

---------

Co-authored-by: Felipe Zipitría <3012076+fzipi@users.noreply.github.com>
Co-authored-by: José Carlos Chávez <jcchavezs@gmail.com>

* feat: add WAFWithRules interface with RulesCount() (#1492)

Add WAFWithRules interface with RulesCount()

* fix(deps): update module golang.org/x/net to v0.51.0 [security] (#1502)

* fix(deps): update module golang.org/x/net to v0.51.0 [security]

* chore: update go.work to 1.25.0

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* chore: update golang to 1.25.0

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

---------

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* chore(deps): update module golang.org/x/net to v0.51.0 [security] (#1506)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* fix: lowercase regex patterns for case-insensitive variable collections (#1505)

* fix: lowercase regex patterns for case-insensitive variable collections

When a rule uses regex-based variable selection (e.g. TX:/PATTERN/),
the regex pattern was compiled from the raw uppercase string before
any case normalization. Since TX collection keys are stored lowercase,
the uppercase regex would never match, causing rules like CRS 922110
(which uses TX:/MULTIPART_HEADERS_CONTENT_TYPES_*/) to silently fail.

Now AddVariable and AddVariableNegation lowercase the regex pattern
before compilation for case-insensitive variables, matching the
existing behavior for string keys in newRuleVariableParams.
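
A small reproduction of the mismatch, using a hypothetical `compileSelector` helper rather than the real `AddVariable` code path. Note that blindly lowercasing a pattern would also change escapes such as `\D` or `\W`; whether the actual fix guards against that is not stated here.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// compileSelector compiles a key-selection regex. For collections whose keys
// are stored lowercase, the pattern is lowercased before compilation.
func compileSelector(pattern string, caseInsensitive bool) (*regexp.Regexp, error) {
	if caseInsensitive {
		pattern = strings.ToLower(pattern)
	}
	return regexp.Compile(pattern)
}

func main() {
	key := "multipart_headers_content_types_0" // TX keys are stored lowercase

	raw, _ := compileSelector("^MULTIPART_HEADERS_", false)
	fixed, _ := compileSelector("^MULTIPART_HEADERS_", true)

	fmt.Println(raw.MatchString(key), fixed.MatchString(key))
}
```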

* chore: update coreruleset to v4.24.0

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

---------

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* chore: update libinjection-go and deps (#1496)

* chore: update libinjection-go and deps

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* chore: update coreruleset v4.24.0

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

---------

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* fix: ctl:ruleRemoveTargetById to support whole-collection exclusion (#1495)

* Initial plan

* Fix ruleRemoveTargetById to support removing entire collection (empty key)

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* feat: add SecRequestBodyJsonDepthLimit directive (#1110)

* feat: add SecRequestBodyJsonDepthLimit directive

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* Apply suggestions from code review

* fix: mage format

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* Update internal/bodyprocessors/json_test.go

* Update internal/bodyprocessors/json_test.go

* fix: bad char

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* fix: gofmt

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* docs: add clarifying comments for JSON recursion limit behavior

- Explain why ResponseBodyRecursionLimit = -1 (unlimited for responses)
- Document dual purpose of body reading (TX vars + ARGS_POST)
- Clarify DoS protection mechanism in readItems()
- Note how negative values bypass recursion check

* fix: address PR review comments for JSON depth limit

- Always enforce a positive recursion limit: change ResponseBodyRecursionLimit
  from -1 (unlimited) to 1024, matching the request body default
- Rename test case "broken1" to "unbalanced_brackets" for clarity
- Extract error check from the key iteration loop in TestReadJSON

* test: add benchmarks for gjson.Valid pre-validation overhead

Measures the cost of gjson.Valid() in the full readJSON pipeline.
gjson.Parse is lazy (~9ns), so the real overhead is Valid vs the
readItems traversal. Results show ~10-16% overhead for validation,
which is acceptable for WAF safety. No single-pass alternative
exists in the gjson API.

* Apply suggestions from code review

* Apply suggestion from @fzipi

---------

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Co-authored-by: José Carlos Chávez <jcchavezs@gmail.com>

* fix: update constants for recursion limit (#1512)

* fix: conflate the constants for recursion limit

* fix: value setting

* chore: remove panic from seclang compiler (#1514)

* Initial plan

* fix: replace panic with error return in parser.go evaluateLine

Co-authored-by: jptosso <1236942+jptosso@users.noreply.github.com>

* fix: revert go.sum changes - do not modify go.sum files in this PR

Co-authored-by: jptosso <1236942+jptosso@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jptosso <1236942+jptosso@users.noreply.github.com>

* ci: reduce regression matrix from 128 to 15 jobs (#1522)

Replace dynamic 64-permutation tag matrix with a curated static list
of 13 build-flag combinations. Run all combos on Go 1.25.x and only
baseline + kitchen-sink on Go 1.26.x.

Add concurrency groups to regression, lint, tinygo, and codeql
workflows so stale PR runs are auto-cancelled on new pushes.

* feat: ignore unexpected EOF in MIME multipart request body processor (#1453)

* Ignore unexpected EOF in MIME multipart request body processor

We need this behavior since we need to process an incomplete MIME multipart
request body when SecRequestBodyLimitAction is set to ProcessPartial.

* fix: add copilot code review comments

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

---------

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Co-authored-by: José Carlos Chávez <jcchavezs@gmail.com>
Co-authored-by: Felipe Zipitría <3012076+fzipi@users.noreply.github.com>
Co-authored-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* fix: set changed flag in removeComments and escapeSeqDecode (#1532)

Fix two bugs where transformation functions modified the input string
but did not report changed=true:

- removeComments: entering a C-style (/* */) or HTML (<!-- -->)
  comment block did not set changed=true, causing the multi-match
  optimization to skip the transformed result.

- escapeSeqDecode: unrecognized escape sequences (e.g. \z) dropped
  the backslash but did not set changed=true.

Add test coverage for both fixes including a new remove_comments_test.go
and an additional unrecognized-escape test case for escape_seq_decode.

* perf: use map for ruleRemoveByID for O(1) lookup (#1524)

* perf: use map for ruleRemoveByID for O(1) lookup

Replace []int slice with map[int]struct{} for the per-transaction
rule exclusion list. The rule evaluation loop checks this list for
every rule in every phase, making O(1) map lookup significantly
faster than O(n) linear scan when rules are excluded via ctl actions.
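
The data-structure change amounts to the following pattern. The `exclusions` type and its methods are illustrative names, not the transaction's real fields.

```go
package main

import "fmt"

// exclusions tracks rule IDs removed for the current transaction.
// map[int]struct{} acts as a set: the empty struct value takes no space.
type exclusions struct {
	ids map[int]struct{}
}

func (e *exclusions) remove(id int) {
	if e.ids == nil {
		e.ids = make(map[int]struct{})
	}
	e.ids[id] = struct{}{}
}

// isRemoved is a single map lookup regardless of how many IDs are excluded.
func (e *exclusions) isRemoved(id int) bool {
	_, ok := e.ids[id]
	return ok
}

func main() {
	var e exclusions
	e.remove(920100)
	e.remove(942100)
	fmt.Println(e.isRemoved(942100), e.isRemoved(911100))
}
```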

* test: add TestRemoveRuleByID for map-based rule exclusion

* bench: add BenchmarkRuleEvalWithRemovedRules

* refactor: use real unconditionalMatch operator from registry in tests

* Fix HTTP middleware to process all Transfer-Encoding values (#1518)

* Fix HTTP middleware to process all Transfer-Encoding values

Co-authored-by: jptosso <1236942+jptosso@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jptosso <1236942+jptosso@users.noreply.github.com>
Co-authored-by: Matteo Pace <pace.matteo96@gmail.com>

* fix(deps): update module golang.org/x/sync to v0.20.0 in go.mod (#1543)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* feat: optimize ruleRemoveById range handling store ranges instead of expanding to int slices (#1538)

* Initial plan

* Optimize ruleRemoveById range handling to avoid generating massive int slices

- Replace rangeToInts (which allocated []int of all matching rule IDs) with
  parseRange and parseIDOrRange helpers that return start/end integers
- For ctlRuleRemoveByID with ranges: store the range in
  Transaction.ruleRemoveByIDRanges ([][2]int) and check it in the rule
  evaluation loop, avoiding both the intermediate []int and potentially
  large map expansions
- For ctlRuleRemoveTargetByID: iterate rules once directly, eliminating
  the intermediate []int allocation
- Add RemoveRuleByIDRange method to Transaction
- Reset ruleRemoveByIDRanges on transaction pool reuse
- Replace TestCtlParseRange with TestCtlParseIDOrRange to test the new helpers
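
Storing ranges as start/end pairs could look like this. The `[][2]int` layout is taken from the commit message; the `rangeExclusions` type and method names are hypothetical simplifications.

```go
package main

import "fmt"

// rangeExclusions keeps removed rule ID ranges as inclusive [start, end]
// pairs instead of expanding them into every matching ID.
type rangeExclusions struct {
	ranges [][2]int
}

func (e *rangeExclusions) removeRange(start, end int) error {
	if start > end {
		return fmt.Errorf("invalid range %d-%d", start, end)
	}
	e.ranges = append(e.ranges, [2]int{start, end})
	return nil
}

// isRemoved scans the stored ranges; cost grows with the number of ranges,
// not with the number of IDs they cover.
func (e *rangeExclusions) isRemoved(id int) bool {
	for _, r := range e.ranges {
		if id >= r[0] && id <= r[1] {
			return true
		}
	}
	return false
}

func main() {
	var e rangeExclusions
	if err := e.removeRange(920000, 920999); err != nil {
		panic(err)
	}
	fmt.Println(e.isRemoved(920100), e.isRemoved(930100), len(e.ranges))
}
```

A range covering a thousand IDs occupies a single two-int entry here, where the expanded form would have needed a thousand.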

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* Improve test coverage for range-based rule removal

- Add TestRemoveRuleByIDRange in transaction_test.go:
  - range is stored in ruleRemoveByIDRanges
  - rules in range are skipped during Eval
  - multiple ranges work correctly
  - ruleRemoveByIDRanges is reset on transaction pool reuse
- Add TestCtlParseRange in ctl_test.go to cover parseRange directly
  (including the no-separator and start>end error paths)
- Add GetRuleRemoveByIDRanges() accessor on Transaction for cross-package
  test assertions
- Enhance "ruleRemoveById range" TestCtl case to verify the range is stored
- Add "ruleRemoveTargetById range" TestCtl case to verify range path works

Coverage changes:
  parseRange:         83.3% → 100%
  parseIDOrRange:     100%  (unchanged)
  RemoveRuleByIDRange: 0%   → 100%

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* fix(testing): Correct use of ProcessURI in Benchmarks (#1546)

* perf: prefix-based transformation cache with inline values (#1544)

Redesign the transformation cache to share intermediate results across
rules with common transformation prefixes (e.g. rules using
t:lowercase,t:urlDecodeUni reuse the t:lowercase result cached by an
earlier rule using just t:lowercase).

Key changes:
- Add transformationPrefixIDs to Rule for backward prefix search
- Cache every intermediate transformation step, not just the final result
- Store cache values inline (not pointers) to avoid heap allocations
- Fix ClearTransformations (t:none) to reset transformationsID

Benchmarked against full CRS v4 ruleset (8 runs, benchstat):
  Allocations: -2% (small) to -19% (30 params)
  Memory:      -2% (small) to -12% (30 params)
  Timing:      -5% (small/large), neutral (medium)
  No regressions on any metric.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* perf: bulk-allocate MatchData in collection Find methods (#1530)

* perf: bulk-allocate MatchData in collection Find methods

Pre-allocate a contiguous []corazarules.MatchData buffer and take
pointers into it instead of individually heap-allocating each
MatchData. This reduces per-result allocations from N to 2 (one buf
slice + one result slice), improving GC pressure for large result
sets.
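
The allocation pattern can be sketched with a local `matchData` type standing in for `corazarules.MatchData`; the function names are hypothetical.

```go
package main

import "fmt"

type matchData struct {
	key, value string
}

// findIndividually allocates each result separately: one allocation per
// match plus the result slice.
func findIndividually(keys, values []string) []*matchData {
	res := make([]*matchData, 0, len(keys))
	for i := range keys {
		res = append(res, &matchData{key: keys[i], value: values[i]})
	}
	return res
}

// findBulk allocates one contiguous buffer and hands out pointers into it:
// two allocations in total, whatever the number of matches.
func findBulk(keys, values []string) []*matchData {
	buf := make([]matchData, len(keys))
	res := make([]*matchData, len(keys))
	for i := range keys {
		buf[i] = matchData{key: keys[i], value: values[i]}
		res[i] = &buf[i]
	}
	return res
}

func main() {
	keys := []string{"a", "b", "c"}
	values := []string{"1", "2", "3"}
	for _, m := range findBulk(keys, values) {
		fmt.Println(m.key, m.value)
	}
	fmt.Println(len(findIndividually(keys, values)))
}
```

One trade-off worth keeping in mind: a single retained pointer keeps the whole buffer alive, so the bulk form suits results that are consumed and dropped together.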

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: avoid double regex evaluation in FindRegex

Collect matching data slices during the counting pass so the second
pass only iterates over already-matched entries, eliminating redundant
MatchString calls.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* bench: add FindAll/FindRegex/FindString benchmarks

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Felipe Zipitría <3012076+fzipi@users.noreply.github.com>

* perf: use FindStringSubmatchIndex to avoid capture allocations (#1547)

* perf: use FindStringSubmatchIndex to avoid capture allocations

Replace FindStringSubmatch (allocates a []string slice per match) with
FindStringSubmatchIndex (returns index pairs). Substrings passed to
CaptureField become slices of the original input — zero allocation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add BenchmarkRxCapture for submatch allocation comparison

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(DetectionOnly): fixed RelevantOnly audit logs, improved matchedRules (#1549)

* add detectedInterruption var for DetectionOnly mode

* IsDetectionOnly, refactor, populate matchedRules

* nit

* Apply suggestions from code review

Co-authored-by: Felipe Zipitría <3012076+fzipi@users.noreply.github.com>

---------

Co-authored-by: Romain SERVIERES <romain@madeformed.com>
Co-authored-by: Felipe Zipitría <3012076+fzipi@users.noreply.github.com>

* fix(deps): update module golang.org/x/net to v0.52.0 in go.mod (#1553)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* ci: increase fuzztime (#1554)

* more fuzztime

* go mod

* chore(ci): harden GHA workflows with least-privilege permissions (#1559)

- Add top-level `permissions: {}` (deny-all) to every workflow
- Add scoped per-job permissions granting only what each job needs
- Fix expression injection in regression.yml by using env instead of
  inline shell interpolation for BUILD_TAGS
- Restrict regression.yml pull_request trigger to main branch only
- Add explicit permissions to fuzz.yml (issues: write for failure reports)
- Add security-events: write to CodeQL workflow

* feat: enable regex memoize by default (#1540)

* feat: enable regex memoize by default

Memoization of regex and aho-corasick builders was previously opt-in via
the `memoize_builders` build tag. Most users didn't know to enable it,
missing a critical performance optimization.

This commit:
- Enables memoization by default (opt-out via `coraza.no_memoize` tag)
- Refactors internal/memoize from package-level Do() to Memoizer struct
- Adds Memoizer interface to plugintypes.OperatorOptions
- Wires WAF's Memoizer through to all operator and rule consumers
- Replaces `memoize_builders` build tag with `coraza.no_memoize` opt-out

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: document cache tradeoffs and add noop memoize test

- Update README and memoize README to document global cache behavior
  and point to WAF.Close() for live-reload scenarios.
- Add test file for coraza.no_memoize build variant to verify no-op
  behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add WAF.Close() with per-owner memoize cache tracking and scale benchmarks (#1541)

* feat: add WAF.Close() with per-owner memoize cache tracking

Add WAFCloser interface and per-owner tracking to the memoize cache so
that long-lived processes can release compiled regex entries when a WAF
instance is destroyed. Each WAF gets a uint64 ID; Release() removes the
owner and tombstones entries with no remaining owners.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
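
The per-owner tracking described in the commit above can be illustrated with a minimal sketch. The `Memoizer` type, its method signatures, and the use of a plain mutex-guarded map are assumptions made for illustration, not the actual `internal/memoize` code. The sketch also deletes orphaned entries outright instead of tombstoning them.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

// entry is a cached value plus the set of owners (WAF IDs) that use it.
type entry struct {
	value  any
	owners map[uint64]struct{}
}

// Memoizer is a hypothetical per-owner cache; all names are illustrative.
type Memoizer struct {
	mu    sync.Mutex
	cache map[string]*entry
}

func NewMemoizer() *Memoizer {
	return &Memoizer{cache: make(map[string]*entry)}
}

// Do returns the cached value for key, building it with fn on a miss,
// and records owner as a user of the entry.
func (m *Memoizer) Do(owner uint64, key string, fn func() (any, error)) (any, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if e, ok := m.cache[key]; ok {
		e.owners[owner] = struct{}{}
		return e.value, nil
	}
	v, err := fn()
	if err != nil {
		return nil, err
	}
	m.cache[key] = &entry{value: v, owners: map[uint64]struct{}{owner: {}}}
	return v, nil
}

// Release removes owner from every entry and drops entries left without owners.
func (m *Memoizer) Release(owner uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	for k, e := range m.cache {
		delete(e.owners, owner)
		if len(e.owners) == 0 {
			delete(m.cache, k)
		}
	}
}

// Len reports how many entries are currently cached.
func (m *Memoizer) Len() int {
	m.mu.Lock()
	defer m.mu.Unlock()
	return len(m.cache)
}

func main() {
	m := NewMemoizer()
	compile := func() (any, error) { return regexp.Compile(`^a+$`) }
	_, _ = m.Do(1, "^a+$", compile) // WAF 1 compiles and caches the pattern
	_, _ = m.Do(2, "^a+$", compile) // WAF 2 reuses the cached entry
	m.Release(1)
	fmt.Println(m.Len()) // entry is still owned by WAF 2
	m.Release(2)
	fmt.Println(m.Len()) // no owners remain, so the entry is gone
}
```

In this sketch an entry shared by two WAFs survives the first `Release` and disappears after the second, which is the behavior a `WAF.Close()` would rely on in a live-reload scenario.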

* test: add memoize scale benchmarks and CRS integration tests

Add benchmarks demonstrating memoize value at scale (1-100 WAFs × 300
patterns) and CRS integration tests verifying Close() releases memory.
Results show ~27x speedup for 100 WAFs and 27MiB released on Close().

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add WAF.Close() calls to e2e and CRS tests

Demonstrate proper WAFCloser usage in integration tests: e2e test,
CRS FTW test, CRS benchmarks, and crsWAF helper.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* test: extend coraza.no_memoize coverage in noop_test.go (#1555)

* Initial plan

* test: extend noop_test.go coverage for coraza.no_memoize build tag

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* fix: check error return of m.Do in benchmark to resolve errcheck lint failure (#1556)

* Initial plan

* fix: check error return of m.Do in benchmark test to fix errcheck lint

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* fix: skip memoize scale tests in short mode

The scale tests (TestMemoizeScaleMultipleOwners, TestCacheGrowthWithoutClose,
TestCacheBoundedWithClose) compile hundreds of regexes across many owners/cycles.
Under TinyGo's slower regex engine these take hours when run in CI with -short.

Gate all three scale tests behind testing.Short() in both sync_test.go and
nosync_test.go so TinyGo CI (which passes -short) completes in reasonable time.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* fix(memoize): avoid deadlock in TinyGo's sync.Map during Release and Reset

TinyGo's sync.Map.Range() holds its internal lock for the entire
iteration. Calling cache.Delete() inside the Range callback tries to
re-acquire the same non-reentrant lock, causing a deadlock.

Defer all cache.Delete() calls until after Range returns by collecting
keys first. This also works around t.Skip() not halting test execution
in TinyGo, where runtime.Goexit() is unimplemented.

On standard Go this is a net performance win for Release (up to 60%
faster at 100 owners) with negligible temporary memory (~9KB slice).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Felipe Zipitría <3012076+fzipi@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* feat: implement SecUploadKeepFiles directive (#1557)

* feat: implement SecUploadKeepFiles with RelevantOnly support

Add UploadKeepFilesStatus type supporting On, Off, and RelevantOnly
values for the SecUploadKeepFiles directive. When set to On, uploaded
files are preserved after transaction close. When set to RelevantOnly,
files are kept only if rules matched during the transaction.
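
As a rough sketch of the three-valued behavior described above: the constant names, the case-insensitive parsing, and the `keepUploadedFiles` helper are assumptions for illustration and may not match the actual Coraza types.

```go
package main

import (
	"fmt"
	"strings"
)

// UploadKeepFilesStatus models the three directive values named in the commit.
// The underlying type and constant names are assumptions for this sketch.
type UploadKeepFilesStatus int

const (
	UploadKeepFilesOff UploadKeepFilesStatus = iota
	UploadKeepFilesOn
	UploadKeepFilesRelevantOnly
)

// parseUploadKeepFiles is a hypothetical parser for the directive value.
// Case-insensitive matching is an assumption.
func parseUploadKeepFiles(value string) (UploadKeepFilesStatus, error) {
	switch strings.ToLower(value) {
	case "on":
		return UploadKeepFilesOn, nil
	case "off":
		return UploadKeepFilesOff, nil
	case "relevantonly":
		return UploadKeepFilesRelevantOnly, nil
	default:
		return UploadKeepFilesOff, fmt.Errorf("invalid SecUploadKeepFiles value %q", value)
	}
}

// keepUploadedFiles decides at transaction close whether uploads are preserved.
func keepUploadedFiles(status UploadKeepFilesStatus, rulesMatched bool) bool {
	switch status {
	case UploadKeepFilesOn:
		return true
	case UploadKeepFilesRelevantOnly:
		return rulesMatched
	default:
		return false
	}
}

func main() {
	status, err := parseUploadKeepFiles("RelevantOnly")
	if err != nil {
		panic(err)
	}
	fmt.Println(keepUploadedFiles(status, false)) // no rule matched: files removed
	fmt.Println(keepUploadedFiles(status, true))  // a rule matched: files kept
}
```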

Closes #1550

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Apply suggestion from @M4tteoP

Co-authored-by: Matteo Pace <pace.matteo96@gmail.com>

* docs: update SecUploadKeepFiles in coraza.conf-recommended

Remove the "not supported" note and document the RelevantOnly option.

* fix: filter nolog rules in RelevantOnly upload keep files check

RelevantOnly now only considers rules with Log enabled, matching the
same filtering used for audit log part K. This prevents CRS
initialization rules (nolog) from making RelevantOnly behave like On.
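
The filtering described above can be sketched as follows. The `matchedRule` struct, its fields, and the rule IDs are hypothetical and only illustrate the idea that a `nolog` match does not count as relevant.

```go
package main

import "fmt"

// matchedRule is a hypothetical, trimmed-down view of a rule that matched.
type matchedRule struct {
	ID  int
	Log bool // false for rules carrying the nolog action
}

// hasRelevantMatch reports whether at least one matched rule has logging enabled.
func hasRelevantMatch(matches []matchedRule) bool {
	for _, m := range matches {
		if m.Log {
			return true
		}
	}
	return false
}

func main() {
	// Illustrative IDs only: one nolog setup rule, one logging detection rule.
	initOnly := []matchedRule{{ID: 1001, Log: false}}
	withDetection := []matchedRule{{ID: 1001, Log: false}, {ID: 2001, Log: true}}
	fmt.Println(hasRelevantMatch(initOnly))      // false: only nolog rules matched
	fmt.Println(hasRelevantMatch(withDetection)) // true: a logging rule matched
}
```

Without this filter, any transaction that triggers a `nolog` setup rule would count as relevant, and RelevantOnly would keep every upload.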

* fix: require SecUploadDir when SecUploadKeepFiles is enabled

Add validation in WAF.Validate() to ensure SecUploadDir is configured
when SecUploadKeepFiles is set to On or RelevantOnly, matching the
ModSecurity requirement.

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* fix: directive docs

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Matteo Pace <pace.matteo96@gmail.com>

* fix: correct two compile errors in SecUploadKeepFiles implementation (#1560)

* Initial plan

* fix: correct lint errors - HasAccessToFS is a bool not a function, fix wrong constant name

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* fix: gofmt

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>

* fix: skip SecUploadKeepFiles tests when no_fs_access build tag is set

The upload keep files tests expected success for On/RelevantOnly modes,
but the implementation correctly rejects these when filesystem access is
disabled. Guard these test cases behind environment.HasAccessToFS.

---------

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Matteo Pace <pace.matteo96@gmail.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>

* feat: add regex support to ctl:ruleRemoveTargetById, ruleRemoveTargetByTag, and ruleRemoveTargetByMsg collection keys (#1561)

* Initial plan

* Add regex support to ctl:ruleRemoveTargetById for URI-scoped exclusions

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* Use memoization for regex compilation in parseCtl

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* Add benchmarks for short and medium regex exceptions in GetField

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* refactor: add HasRegex shared utility and use it in rule.go and ctl.go

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>
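
For readers unfamiliar with regex collection keys, here is a minimal sketch of how a `/pattern/` key might be detected and applied. It assumes the slash-delimited form; the `regexKey` and `excluded` helpers are hypothetical and are not the `HasRegex` utility added in this PR, whose signature and escaping rules may differ. Matching here is case-sensitive for simplicity.

```go
package main

import (
	"fmt"
	"regexp"
)

// regexKey reports whether key is written as /pattern/ and returns the inner
// pattern. A closing slash preceded by an odd number of backslashes is
// treated as escaped, so such a key is not considered a regex.
func regexKey(key string) (string, bool) {
	if len(key) < 3 || key[0] != '/' || key[len(key)-1] != '/' {
		return "", false
	}
	backslashes := 0
	for i := len(key) - 2; i > 0 && key[i] == '\\'; i-- {
		backslashes++
	}
	if backslashes%2 == 1 {
		return "", false
	}
	return key[1 : len(key)-1], true
}

// excluded reports whether the argument name is covered by the exclusion key,
// using a regex match for /pattern/ keys and exact comparison otherwise.
func excluded(exclusionKey, name string) (bool, error) {
	pattern, ok := regexKey(exclusionKey)
	if !ok {
		return exclusionKey == name, nil
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(name), nil
}

func main() {
	key := `/^json\.\d+\.id$/`
	for _, name := range []string{"json.0.id", "json.12.id", "json.0.name"} {
		hit, err := excluded(key, name)
		if err != nil {
			panic(err)
		}
		fmt.Println(name, hit)
	}
}
```

A real implementation would likely compile each pattern once and cache it, as the memoization commit in this list suggests, instead of compiling on every lookup.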

* test: add POST JSON body test for ruleRemoveTargetById regex key exclusion

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* docs: update RemoveRuleTargetByID comment to document keyRx parameter

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* docs: update ctl action doc comment to describe regex key syntax with example

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* test: add ruleRemoveTargetByTag and ruleRemoveTargetByMsg regex key integration tests

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* style: apply gofmt to internal/actions/ctl.go

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* test: add memoizer coverage to TestParseCtl for ctl regex path

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>

* Initial plan

* test: add e2e tests for JSONSTREAM body processor

Co-authored-by: fzipi <3012076+fzipi@users.noreply.github.com>
Agent-Logs-Url: https://github.com/corazawaf/coraza/sessions/bebca76e-344f-4966-8675-8bf4e5fda0cb

---------

Signed-off-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Felipe Zipitría <3012076+fzipi@users.noreply.github.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: Matteo Pace <pace.matteo96@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Alexander S. <126732+heaven@users.noreply.github.com>
Co-authored-by: José Carlos Chávez <jcchavezs@gmail.com>
Co-authored-by: Pierre POMES <pierre.pomes@gmail.com>
Co-authored-by: Felipe Zipitria <felipe.zipitria@owasp.org>
Co-authored-by: jptosso <1236942+jptosso@users.noreply.github.com>
Co-authored-by: Juan Pablo Tosso <jptosso@gmail.com>
Co-authored-by: Hiroaki Nakamura <hnakamur@gmail.com>
Co-authored-by: Marc W. <113890636+MarcWort@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Romain SERVIERES <romain@madeformed.com>