docs: add AGENTS.md for LLM codebase navigation #1535

Open
jptosso wants to merge 5 commits into main from add-agents-md

Conversation

@jptosso
Member

@jptosso jptosso commented Mar 6, 2026

Summary

  • Add AGENTS.md at the repo root with comprehensive documentation to help LLMs (GitHub Copilot, Claude, Cursor, etc.) navigate and understand the Coraza codebase
  • Covers architecture (request pipeline, rule evaluation), plugin system, collection system, SecLang parser, build system, and how-to guides for common tasks
  • Update .github/copilot-instructions.md to reference AGENTS.md

Test plan

  • Verify all referenced file paths exist in the repo (verified locally, all 34 paths confirmed)
  • Review AGENTS.md content for accuracy against current codebase

Summary by CodeRabbit

  • Documentation
    • Added comprehensive architecture guide documenting codebase structure, request processing pipeline, rule evaluation flow, and plugin system.
    • Added developer reference documentation with step-by-step instructions for common development tasks and testing patterns.

Add comprehensive documentation to help LLMs (GitHub Copilot, Claude,
Cursor, etc.) understand the Coraza architecture, navigate the codebase,
and perform common tasks like adding operators, transformations, and
actions. Update copilot-instructions.md to reference AGENTS.md.
@jptosso jptosso requested a review from a team as a code owner March 6, 2026 13:28
@jptosso jptosso enabled auto-merge (squash) March 6, 2026 14:28
fzipi
fzipi previously approved these changes Mar 6, 2026
jptosso and others added 2 commits March 6, 2026 15:37
Co-authored-by: Felipe Zipitría <3012076+fzipi@users.noreply.github.com>
@fzipi
Member

fzipi commented Mar 6, 2026

This can still be improved.

@fzipi fzipi requested a review from Copilot March 28, 2026 13:59
@coderabbitai
Contributor

coderabbitai bot commented Mar 28, 2026

📝 Walkthrough

This pull request adds documentation to guide LLM-assisted development. It introduces a new AGENTS.md file documenting the Coraza Go WAF codebase structure, architecture, plugin system, and common development tasks, and updates .github/copilot-instructions.md to reference this guide.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Configuration & Guidance**: `.github/copilot-instructions.md` | Added blockquote reference to AGENTS.md documentation file for repository-level LLM navigation context. |
| **New Documentation**: `AGENTS.md` | Created comprehensive LLM navigation guide covering Coraza WAF codebase structure, 5-phase request processing pipeline, rule evaluation flow, plugin system architecture, transaction collections, SecLang parser, build tooling, and step-by-step guides for common development tasks (operators, transformations, actions, directives). |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 With paws on keyboard, I hop with glee,
New pathways drawn for all AI to see!
Maps and guides through Coraza's halls,
Where agents navigate through feature walls! 🎯

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'docs: add AGENTS.md for LLM codebase navigation' clearly and concisely describes the primary change: adding a new documentation file (AGENTS.md) designed to guide LLMs in understanding the codebase. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
AGENTS.md (1)

25-27: Avoid hard-coded file counts in navigation docs.

Approximate counts like ~38 / ~33 can drift quickly and make this guide stale.

💡 Suggested wording
-| `internal/operators/` | All operator implementations (~38 non-test files) |
-| `internal/transformations/` | All transformation implementations (~33 non-test files) |
-| `internal/actions/` | All action implementations (~33 non-test files) |
+| `internal/operators/` | Operator implementations |
+| `internal/transformations/` | Transformation implementations |
+| `internal/actions/` | Action implementations |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` around lines 25 - 27, Remove the hard-coded approximate file
counts from the navigation table entries for `internal/operators/`,
`internal/transformations/`, and `internal/actions/` in AGENTS.md; update the
three lines to omit `~38` and `~33` (or replace them with a non-quantitative
phrase like “(see repository)” or “(various implementations)”) so the doc won’t
become stale and will remain accurate without manual updates.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@AGENTS.md`:
- Around line 42-47: Several fenced code blocks in AGENTS.md (for example the
block containing "ProcessConnection(clientIP, clientPort, serverIP, serverPort)
-> ProcessURI(uri, method, httpVersion) -> AddRequestHeader(key, value)  //
repeat per header -> ProcessRequestHeaders() -> *Interruption" and the other
unlabeled blocks around the same area) lack a language hint; add a language
label (e.g., ```text) to each unlabeled fenced block to satisfy MD040 and
markdown linters, updating the opening fences for the blocks that contain
constructs like ProcessConnection(...), ProcessURI(...), AddRequestHeader(...),
and similar sequences so they become ```text ... ``` (or another appropriate
language) while leaving the block contents unchanged.
- Around line 206-216: The Markdown table that begins with the header "| Tag |
Effect |" and lists tags like `coraza.disabled_operators.<name>`,
`coraza.rule.multiphase_evaluation`, `tinygo`, etc., needs blank lines inserted
immediately before and after the table to satisfy MD058; update the AGENTS.md
content to add a single empty line above the table header and a single empty
line after the final `| no_fs_access | Disable filesystem access |` row.

---

Nitpick comments:
In `@AGENTS.md`:
- Around line 25-27: Remove the hard-coded approximate file counts from the
navigation table entries for `internal/operators/`, `internal/transformations/`,
and `internal/actions/` in AGENTS.md; update the three lines to omit `~38` and
`~33` (or replace them with a non-quantitative phrase like “(see repository)” or
“(various implementations)”) so the doc won’t become stale and will remain
accurate without manual updates.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 75d3d4aa-9fbe-4dfe-bdbe-74e9ec039ebb

📥 Commits

Reviewing files that changed from the base of the PR and between 6c45573 and 629f437.

📒 Files selected for processing (2)
  • .github/copilot-instructions.md
  • AGENTS.md

Comment on lines +42 to +47
```
ProcessConnection(clientIP, clientPort, serverIP, serverPort)
-> ProcessURI(uri, method, httpVersion)
-> AddRequestHeader(key, value) // repeat per header
-> ProcessRequestHeaders() -> *Interruption
```
Contributor

⚠️ Potential issue | 🟡 Minor

Add languages to fenced code blocks to satisfy markdown linting.

Several fenced blocks are missing a language hint (e.g., around Line 42 and Line 181). This triggers MD040 and reduces tooling compatibility.

💡 Proposed doc-only fix
-```
+```text
 ProcessConnection(clientIP, clientPort, serverIP, serverPort)
   -> ProcessURI(uri, method, httpVersion)
   -> AddRequestHeader(key, value)  // repeat per header
   -> ProcessRequestHeaders() -> *Interruption

Apply the same `text` (or other appropriate language) label to the other unlabeled fenced blocks.


Also applies to: 50-53, 57-60, 63-66, 69-71, 181-183

🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 42-42: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


Comment on lines +206 to +216
| Tag | Effect |
|---|---|
| `coraza.disabled_operators.<name>` | Exclude a specific operator from compilation |
| `coraza.rule.multiphase_evaluation` | Evaluate rule variables in phases they become ready |
| `coraza.rule.case_sensitive_args_keys` | Case-sensitive ARGS key matching (RFC 3986) |
| `coraza.rule.no_regex_multiline` | Disable default multiline mode in `@rx` operator |
| `coraza.rule.mandatory_rule_id_check` | Require `id` action for all SecRule/SecAction |
| `tinygo` | TinyGo-compatible build (affects sync primitives, FS access) |
| `memoize_builders` | Enable memoization of operator/transformation builders |
| `no_fs_access` | Disable filesystem access |

Contributor

⚠️ Potential issue | 🟡 Minor

Surround the build-tags table with blank lines.

The table starting at Line 206 should be separated by blank lines to satisfy MD058.

💡 Proposed doc-only fix
 ### Build tags
+
 | Tag | Effect |
 |---|---|
 | `coraza.disabled_operators.<name>` | Exclude a specific operator from compilation |
 | `coraza.rule.multiphase_evaluation` | Evaluate rule variables in phases they become ready |
 | `coraza.rule.case_sensitive_args_keys` | Case-sensitive ARGS key matching (RFC 3986) |
 | `coraza.rule.no_regex_multiline` | Disable default multiline mode in `@rx` operator |
 | `coraza.rule.mandatory_rule_id_check` | Require `id` action for all SecRule/SecAction |
 | `tinygo` | TinyGo-compatible build (affects sync primitives, FS access) |
 | `memoize_builders` | Enable memoization of operator/transformation builders |
 | `no_fs_access` | Disable filesystem access |
+
 ### Generated code
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
| Tag | Effect |
|---|---|
| `coraza.disabled_operators.<name>` | Exclude a specific operator from compilation |
| `coraza.rule.multiphase_evaluation` | Evaluate rule variables in phases they become ready |
| `coraza.rule.case_sensitive_args_keys` | Case-sensitive ARGS key matching (RFC 3986) |
| `coraza.rule.no_regex_multiline` | Disable default multiline mode in `@rx` operator |
| `coraza.rule.mandatory_rule_id_check` | Require `id` action for all SecRule/SecAction |
| `tinygo` | TinyGo-compatible build (affects sync primitives, FS access) |
| `memoize_builders` | Enable memoization of operator/transformation builders |
| `no_fs_access` | Disable filesystem access |
### Build tags

| Tag | Effect |
|---|---|
| `coraza.disabled_operators.<name>` | Exclude a specific operator from compilation |
| `coraza.rule.multiphase_evaluation` | Evaluate rule variables in phases they become ready |
| `coraza.rule.case_sensitive_args_keys` | Case-sensitive ARGS key matching (RFC 3986) |
| `coraza.rule.no_regex_multiline` | Disable default multiline mode in `@rx` operator |
| `coraza.rule.mandatory_rule_id_check` | Require `id` action for all SecRule/SecAction |
| `tinygo` | TinyGo-compatible build (affects sync primitives, FS access) |
| `memoize_builders` | Enable memoization of operator/transformation builders |
| `no_fs_access` | Disable filesystem access |

### Generated code
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)

[warning] 206-206: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@AGENTS.md` around lines 206 - 216, The Markdown table that begins with the
header "| Tag | Effect |" and lists tags like
`coraza.disabled_operators.<name>`, `coraza.rule.multiphase_evaluation`,
`tinygo`, etc., needs blank lines inserted immediately before and after the
table to satisfy MD058; update the AGENTS.md content to add a single empty line
above the table header and a single empty line after the final `| no_fs_access |
Disable filesystem access |` row.

Contributor

Copilot AI left a comment

Pull request overview

Adds a repo-root AGENTS.md to document Coraza’s architecture and key subsystems for LLM-assisted navigation, and links to it from the existing Copilot instructions.

Changes:

  • Added AGENTS.md with architecture notes, key file pointers, build/test guidance, and “common tasks” how-tos.
  • Updated .github/copilot-instructions.md to reference AGENTS.md.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
AGENTS.md New LLM navigation/architecture guide covering repo structure, request pipeline, rule evaluation, plugins, collections, parser, build/tags, and common tasks.
.github/copilot-instructions.md Adds a link pointing readers to AGENTS.md for detailed architecture/navigation docs.


### Phase 4 - Response Body
```
WriteResponseBody([]byte) / ReadResponseBodyFrom(io.Reader)
```

Copilot AI Mar 28, 2026

Similarly, the response body phase example omits the actual return values: WriteResponseBody / ReadResponseBodyFrom return (*Interruption, int, error) in the types.Transaction interface. Adjust the snippet so it matches the API and makes it clear callers can receive an interruption/byte count directly from these methods.

Suggested change
WriteResponseBody([]byte) / ReadResponseBodyFrom(io.Reader)
WriteResponseBody([]byte) -> (*Interruption, int, error)
ReadResponseBodyFrom(io.Reader) -> (*Interruption, int, error)

Copilot uses AI. Check for mistakes.

### Mage tasks
```bash
go run mage.go test # Run all tests (including memoize_builders, multiphase, CRS)
```

Copilot AI Mar 28, 2026

go run mage.go test currently runs go test ./..., then reruns tests with -tags=coraza.no_memoize, plus additional suites (examples/http-server with race, and testing/coreruleset with multiple tags). The comment mentioning "memoize_builders" doesn’t match any build tag or what Mage runs today; consider updating this line to reference the actual tag (coraza.no_memoize) and the additional tagged suites (multiphase, no_regex_multiline, case_sensitive_args_keys).

Suggested change
go run mage.go test # Run all tests (including memoize_builders, multiphase, CRS)
go run mage.go test # Run all tests (go test ./..., then with -tags=coraza.no_memoize, plus examples/http-server (race) and testing/coreruleset with multiphase/no_regex_multiline/case_sensitive_args_keys)

| `coraza.rule.no_regex_multiline` | Disable default multiline mode in `@rx` operator |
| `coraza.rule.mandatory_rule_id_check` | Require `id` action for all SecRule/SecAction |
| `tinygo` | TinyGo-compatible build (affects sync primitives, FS access) |
| `memoize_builders` | Enable memoization of operator/transformation builders |

Copilot AI Mar 28, 2026

The build tags table appears inaccurate/incomplete compared to the repository’s documented/used tags: there is no memoize_builders build tag in the codebase, and coraza.no_memoize (used in Mage/CI and documented in README) is missing. Suggest removing memoize_builders, adding coraza.no_memoize, and aligning the list with README.md’s “Build tags” section to avoid confusing integrators/LLMs.

Suggested change
| `memoize_builders` | Enable memoization of operator/transformation builders |
| `coraza.no_memoize` | Disable memoization of operator/transformation builders |
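
For readers unfamiliar with what "memoization of operator/transformation builders" buys, here is a minimal sketch of the idea. It is not Coraza's implementation: `memoize`, `compileRx`, and the cache layout are hypothetical names chosen for illustration. The tag named in the review, `coraza.no_memoize`, disables this kind of caching; how the tag is wired to the cache is not shown here.

```go
package main

import (
	"fmt"
	"regexp"
	"sync"
)

var (
	mu     sync.Mutex
	cache  = map[string]any{} // built values, keyed by builder input
	builds int                // counts real builds, for demonstration only
)

// memoize returns the cached value for key and calls build only on a miss.
// Hypothetical helper; errors are not cached.
func memoize(key string, build func() (any, error)) (any, error) {
	mu.Lock()
	defer mu.Unlock()
	if v, ok := cache[key]; ok {
		return v, nil
	}
	v, err := build()
	if err != nil {
		return nil, err
	}
	cache[key] = v
	return v, nil
}

// compileRx compiles a pattern once and reuses the result afterwards,
// the way an expensive operator builder might.
func compileRx(pattern string) (*regexp.Regexp, error) {
	v, err := memoize("rx:"+pattern, func() (any, error) {
		builds++ // safe: build runs while memoize holds the lock
		re, err := regexp.Compile(pattern)
		if err != nil {
			return nil, err
		}
		return re, nil
	})
	if err != nil {
		return nil, err
	}
	return v.(*regexp.Regexp), nil
}

func main() {
	a, _ := compileRx(`^/admin`)
	b, _ := compileRx(`^/admin`)
	fmt.Println(a == b, builds) // prints true 1
}
```

The trade-off is memory for build time: many WAF instances loading the same rules share compiled values, which is why a tag to turn the cache off is useful for testing both paths.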

Comment on lines +232 to +247
1. Create `internal/operators/my_operator.go`:
```go
type myOperator struct {
data string
}
func (o *myOperator) Evaluate(tx plugintypes.TransactionState, value string) bool {
// implementation
}
```
2. Register in `internal/operators/` init or via `experimental/plugins/operators.go`:
```go
plugins.RegisterOperator("myOperator", func(options plugintypes.OperatorOptions) (plugintypes.Operator, error) {
return &myOperator{data: options.Arguments}, nil
})
```
3. Add tests in `internal/operators/my_operator_test.go`

Copilot AI Mar 28, 2026

The “Adding a new operator” steps mix two different workflows and the example as written would create an import cycle if followed inside internal/operators/ (because experimental/plugins imports internal/operators). If the intent is adding a built-in operator, the registration should be done via internal/operators.Register(...) from within the operators package; if the intent is an external plugin, show plugins.RegisterOperator(...) from an integrator’s package (not from internal/operators).

Suggested change
1. Create `internal/operators/my_operator.go`:
```go
type myOperator struct {
data string
}
func (o *myOperator) Evaluate(tx plugintypes.TransactionState, value string) bool {
// implementation
}
```
2. Register in `internal/operators/` init or via `experimental/plugins/operators.go`:
```go
plugins.RegisterOperator("myOperator", func(options plugintypes.OperatorOptions) (plugintypes.Operator, error) {
return &myOperator{data: options.Arguments}, nil
})
```
3. Add tests in `internal/operators/my_operator_test.go`
There are two ways to add an operator:
- As a **built-in operator** in the core `internal/operators` package.
- As an **external plugin operator** registered via `experimental/plugins` from your own package.

#### Built-in operator (core)
1. Create `internal/operators/my_operator.go`:
```go
package operators

import "github.com/corazawaf/coraza/v3/experimental/plugins/plugintypes"

type myOperator struct {
	data string
}

func (o *myOperator) Evaluate(tx plugintypes.TransactionState, value string) bool {
	// implementation
	return false
}

func init() {
	Register("myOperator", func(options plugintypes.OperatorOptions) (plugintypes.Operator, error) {
		return &myOperator{data: options.Arguments}, nil
	})
}
```
2. Add tests in `internal/operators/my_operator_test.go`.

#### External operator plugin (integrator)

  1. In your own package (not in internal/operators), create e.g. my_operator_plugin.go:
    package mywaf
    
    import (
        "github.com/corazawaf/coraza/v3/experimental/plugins"
        "github.com/corazawaf/coraza/v3/experimental/plugins/plugintypes"
    )
    
    type myOperator struct {
        data string
    }
    
    func (o *myOperator) Evaluate(tx plugintypes.TransactionState, value string) bool {
        // implementation
        return false
    }
    
    func init() {
        plugins.RegisterOperator("myOperator", func(options plugintypes.OperatorOptions) (plugintypes.Operator, error) {
            return &myOperator{data: options.Arguments}, nil
        })
    }
  2. Add tests in your package to cover the new operator.
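
The dependency direction behind the import-cycle warning can be shown in a single file. This is a sketch under stated assumptions: `Operator`, `OperatorOptions`, and `OperatorFactory` are simplified local stand-ins for the `plugintypes` names in the review (the real `Evaluate` also takes a `TransactionState`), and the two packages are collapsed into one so the example runs on its own.

```go
package main

import (
	"fmt"
	"strings"
)

// Simplified stand-ins for the plugintypes interfaces (assumption).
type Operator interface {
	Evaluate(value string) bool
}

type OperatorOptions struct{ Arguments string }

type OperatorFactory func(OperatorOptions) (Operator, error)

// registry plays the role of the internal operators registry.
var registry = map[string]OperatorFactory{}

// Register is what a built-in operator would call from inside the
// operators package itself.
func Register(name string, f OperatorFactory) { registry[name] = f }

// Get looks up a factory by name and builds the operator.
func Get(name string, opts OperatorOptions) (Operator, error) {
	f, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("operator %q is not registered", name)
	}
	return f(opts)
}

// RegisterOperator plays the role of the public plugins wrapper. It depends
// on the internal registry, never the other way round, which is why code in
// the internal package must not import the wrapper.
func RegisterOperator(name string, f OperatorFactory) { Register(name, f) }

// containsOperator is a hypothetical operator used only for this demo.
type containsOperator struct{ data string }

func (o *containsOperator) Evaluate(value string) bool {
	return strings.Contains(value, o.data)
}

func main() {
	RegisterOperator("myOperator", func(o OperatorOptions) (Operator, error) {
		return &containsOperator{data: o.Arguments}, nil
	})
	op, err := Get("myOperator", OperatorOptions{Arguments: "admin"})
	if err != nil {
		panic(err)
	}
	fmt.Println(op.Evaluate("/admin/login")) // prints true
}
```

In the real layout the wrapper and the registry live in different packages, so calling the wrapper from the registry's package would make each import the other.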

Comment on lines +257 to +260
2. Register via `experimental/plugins/transformations.go`:
```go
plugins.RegisterTransformation("myTransform", myTransform)
```

Copilot AI Mar 28, 2026

The transformation section has the same workflow mix-up as operators: registering via experimental/plugins/transformations.go is something an integrator would do from outside the internal/transformations package. If you’re documenting how to add a built-in transformation under internal/transformations/, the example should use that package’s internal registration mechanism (to avoid an import cycle and match existing built-ins).

Comment on lines +272 to +275
2. Register via `experimental/plugins/actions.go`:
```go
plugins.RegisterAction("myAction", func() plugintypes.Action { return &myAction{} })
```

Copilot AI Mar 28, 2026

Same as operators/transformations: the “Adding a new action” section shows plugins.RegisterAction(...) but places it in a workflow that starts by creating internal/actions/my_action.go. Using experimental/plugins from internal/actions would introduce an import cycle; for built-in actions the registration should be done via the internal actions registry, while plugins.RegisterAction should be shown as code that lives in an integrator/plugin package.

Comment on lines +51 to +53
WriteRequestBody([]byte) / ReadRequestBodyFrom(io.Reader)
-> ProcessRequestBody() -> (*Interruption, error)
```

Copilot AI Mar 28, 2026

The request body phase example lists WriteRequestBody([]byte) / ReadRequestBodyFrom(io.Reader) as if they don’t return values, but in the public types.Transaction interface both return (*Interruption, int, error) (interruption + bytes written + error). Update this snippet to reflect the real signatures and/or note that these helpers can surface an interruption directly (and may trigger body processing when limits are reached).
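
A caller-side sketch may make the three return values concrete. Everything here is a local stand-in: `Interruption` carries only illustrative fields, `bodyWriter` copies just the two method shapes cited in the review, and `limitedTx` is a hypothetical fake whose limit behaviour is an assumption, not Coraza's.

```go
package main

import "fmt"

// Interruption is a local stand-in with illustrative fields (assumption).
type Interruption struct {
	Action string
	Status int
}

// bodyWriter mirrors only the return shapes cited in the review.
type bodyWriter interface {
	WriteRequestBody(b []byte) (*Interruption, int, error)
	ProcessRequestBody() (*Interruption, error)
}

// limitedTx is a hypothetical fake: it accepts bytes up to limit and then
// reports an interruption directly from the write call.
type limitedTx struct {
	limit, written int
}

func (t *limitedTx) WriteRequestBody(b []byte) (*Interruption, int, error) {
	room := t.limit - t.written
	if len(b) > room {
		t.written += room
		return &Interruption{Action: "deny", Status: 413}, room, nil
	}
	t.written += len(b)
	return nil, len(b), nil
}

func (t *limitedTx) ProcessRequestBody() (*Interruption, error) { return nil, nil }

// feedBody writes chunks until an error or interruption occurs, then runs
// body processing. It returns the interruption (if any), the number of
// bytes accepted, and any error.
func feedBody(tx bodyWriter, chunks [][]byte) (*Interruption, int, error) {
	total := 0
	for _, c := range chunks {
		it, n, err := tx.WriteRequestBody(c)
		total += n
		if err != nil {
			return nil, total, err
		}
		if it != nil {
			return it, total, nil // interrupted before ProcessRequestBody
		}
	}
	it, err := tx.ProcessRequestBody()
	return it, total, err
}

func main() {
	tx := &limitedTx{limit: 8}
	it, n, err := feedBody(tx, [][]byte{[]byte("hello"), []byte("world")})
	fmt.Println(it != nil, n, err) // prints true 8 <nil>
}
```

The point of the sketch is the control flow: a caller that ignores the first return value of the write helpers would miss an interruption raised before `ProcessRequestBody` is ever called.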

3 participants