docs: add AGENTS.md for LLM codebase navigation#1535
Conversation
Add comprehensive documentation to help LLMs (GitHub Copilot, Claude, Cursor, etc.) understand the Coraza architecture, navigate the codebase, and perform common tasks like adding operators, transformations, and actions. Update copilot-instructions.md to reference AGENTS.md.
Co-authored-by: Felipe Zipitría <3012076+fzipi@users.noreply.github.com>
|
Can be enhanced yet. |
📝 WalkthroughWalkthroughThis pull request adds documentation to guide LLM-assisted development. It introduces a new Changes
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~3 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
AGENTS.md (1)
25-27: Avoid hard-coded file counts in navigation docs.Approximate counts like
~38/~33can drift quickly and make this guide stale.💡 Suggested wording
-| `internal/operators/` | All operator implementations (~38 non-test files) | -| `internal/transformations/` | All transformation implementations (~33 non-test files) | -| `internal/actions/` | All action implementations (~33 non-test files) | +| `internal/operators/` | Operator implementations | +| `internal/transformations/` | Transformation implementations | +| `internal/actions/` | Action implementations |🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@AGENTS.md` around lines 25 - 27, Remove the hard-coded approximate file counts from the navigation table entries for `internal/operators/`, `internal/transformations/`, and `internal/actions/` in AGENTS.md; update the three lines to omit `~38` and `~33` (or replace them with a non-quantitative phrase like “(see repository)” or “(various implementations)”) so the doc won’t become stale and will remain accurate without manual updates.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@AGENTS.md`:
- Around line 42-47: Several fenced code blocks in AGENTS.md (for example the
block containing "ProcessConnection(clientIP, clientPort, serverIP, serverPort)
-> ProcessURI(uri, method, httpVersion) -> AddRequestHeader(key, value) //
repeat per header -> ProcessRequestHeaders() -> *Interruption" and the other
unlabeled blocks around the same area) lack a language hint; add a language
label (e.g., ```text) to each unlabeled fenced block to satisfy MD040 and
markdown linters, updating the opening fences for the blocks that contain
constructs like ProcessConnection(...), ProcessURI(...), AddRequestHeader(...),
and similar sequences so they become ```text ... ``` (or another appropriate
language) while leaving the block contents unchanged.
- Around line 206-216: The Markdown table that begins with the header "| Tag |
Effect |" and lists tags like `coraza.disabled_operators.<name>`,
`coraza.rule.multiphase_evaluation`, `tinygo`, etc., needs blank lines inserted
immediately before and after the table to satisfy MD058; update the AGENTS.md
content to add a single empty line above the table header and a single empty
line after the final `| no_fs_access | Disable filesystem access |` row.
---
Nitpick comments:
In `@AGENTS.md`:
- Around line 25-27: Remove the hard-coded approximate file counts from the
navigation table entries for `internal/operators/`, `internal/transformations/`,
and `internal/actions/` in AGENTS.md; update the three lines to omit `~38` and
`~33` (or replace them with a non-quantitative phrase like “(see repository)” or
“(various implementations)”) so the doc won’t become stale and will remain
accurate without manual updates.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 75d3d4aa-9fbe-4dfe-bdbe-74e9ec039ebb
📒 Files selected for processing (2)
.github/copilot-instructions.mdAGENTS.md
| ``` | ||
| ProcessConnection(clientIP, clientPort, serverIP, serverPort) | ||
| -> ProcessURI(uri, method, httpVersion) | ||
| -> AddRequestHeader(key, value) // repeat per header | ||
| -> ProcessRequestHeaders() -> *Interruption | ||
| ``` |
There was a problem hiding this comment.
Add languages to fenced code blocks to satisfy markdown linting.
Several fenced blocks are missing a language hint (e.g., around Line 42 and Line 181). This triggers MD040 and reduces tooling compatibility.
💡 Proposed doc-only fix
-```
+```text
ProcessConnection(clientIP, clientPort, serverIP, serverPort)
-> ProcessURI(uri, method, httpVersion)
-> AddRequestHeader(key, value) // repeat per header
-> ProcessRequestHeaders() -> *Interruption
Apply the same `text` (or other appropriate language) label to the other unlabeled fenced blocks.
</details>
Also applies to: 50-53, 57-60, 63-66, 69-71, 181-183
<details>
<summary>🧰 Tools</summary>
<details>
<summary>🪛 markdownlint-cli2 (0.22.0)</summary>
[warning] 42-42: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
</details>
</details>
<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against the current code and only fix it if needed.
In @AGENTS.md around lines 42 - 47, Several fenced code blocks in AGENTS.md (for
example the block containing "ProcessConnection(clientIP, clientPort, serverIP,
serverPort) -> ProcessURI(uri, method, httpVersion) -> AddRequestHeader(key,
value) // repeat per header -> ProcessRequestHeaders() -> *Interruption" and
the other unlabeled blocks around the same area) lack a language hint; add a
language label (e.g., text) to each unlabeled fenced block to satisfy MD040 and markdown linters, updating the opening fences for the blocks that contain constructs like ProcessConnection(...), ProcessURI(...), AddRequestHeader(...), and similar sequences so they become text ... ``` (or another appropriate
language) while leaving the block contents unchanged.
</details>
<!-- fingerprinting:phantom:triton:hawk:edcc74e9-9abc-488e-8a5b-dc306d8c4d0b -->
<!-- This is an auto-generated comment by CodeRabbit -->
| | Tag | Effect | | ||
| |---|---| | ||
| | `coraza.disabled_operators.<name>` | Exclude a specific operator from compilation | | ||
| | `coraza.rule.multiphase_evaluation` | Evaluate rule variables in phases they become ready | | ||
| | `coraza.rule.case_sensitive_args_keys` | Case-sensitive ARGS key matching (RFC 3986) | | ||
| | `coraza.rule.no_regex_multiline` | Disable default multiline mode in `@rx` operator | | ||
| | `coraza.rule.mandatory_rule_id_check` | Require `id` action for all SecRule/SecAction | | ||
| | `tinygo` | TinyGo-compatible build (affects sync primitives, FS access) | | ||
| | `memoize_builders` | Enable memoization of operator/transformation builders | | ||
| | `no_fs_access` | Disable filesystem access | | ||
|
|
There was a problem hiding this comment.
Surround the build-tags table with blank lines.
The table starting at Line 206 should be separated by blank lines to satisfy MD058.
💡 Proposed doc-only fix
### Build tags
+
| Tag | Effect |
|---|---|
| `coraza.disabled_operators.<name>` | Exclude a specific operator from compilation |
| `coraza.rule.multiphase_evaluation` | Evaluate rule variables in phases they become ready |
| `coraza.rule.case_sensitive_args_keys` | Case-sensitive ARGS key matching (RFC 3986) |
| `coraza.rule.no_regex_multiline` | Disable default multiline mode in `@rx` operator |
| `coraza.rule.mandatory_rule_id_check` | Require `id` action for all SecRule/SecAction |
| `tinygo` | TinyGo-compatible build (affects sync primitives, FS access) |
| `memoize_builders` | Enable memoization of operator/transformation builders |
| `no_fs_access` | Disable filesystem access |
+
### Generated code📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| | Tag | Effect | | |
| |---|---| | |
| | `coraza.disabled_operators.<name>` | Exclude a specific operator from compilation | | |
| | `coraza.rule.multiphase_evaluation` | Evaluate rule variables in phases they become ready | | |
| | `coraza.rule.case_sensitive_args_keys` | Case-sensitive ARGS key matching (RFC 3986) | | |
| | `coraza.rule.no_regex_multiline` | Disable default multiline mode in `@rx` operator | | |
| | `coraza.rule.mandatory_rule_id_check` | Require `id` action for all SecRule/SecAction | | |
| | `tinygo` | TinyGo-compatible build (affects sync primitives, FS access) | | |
| | `memoize_builders` | Enable memoization of operator/transformation builders | | |
| | `no_fs_access` | Disable filesystem access | | |
| ### Build tags | |
| | Tag | Effect | | |
| |---|---| | |
| | `coraza.disabled_operators.<name>` | Exclude a specific operator from compilation | | |
| | `coraza.rule.multiphase_evaluation` | Evaluate rule variables in phases they become ready | | |
| | `coraza.rule.case_sensitive_args_keys` | Case-sensitive ARGS key matching (RFC 3986) | | |
| | `coraza.rule.no_regex_multiline` | Disable default multiline mode in `@rx` operator | | |
| | `coraza.rule.mandatory_rule_id_check` | Require `id` action for all SecRule/SecAction | | |
| | `tinygo` | TinyGo-compatible build (affects sync primitives, FS access) | | |
| | `memoize_builders` | Enable memoization of operator/transformation builders | | |
| | `no_fs_access` | Disable filesystem access | | |
| ### Generated code |
🧰 Tools
🪛 markdownlint-cli2 (0.22.0)
[warning] 206-206: Tables should be surrounded by blank lines
(MD058, blanks-around-tables)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@AGENTS.md` around lines 206 - 216, The Markdown table that begins with the
header "| Tag | Effect |" and lists tags like
`coraza.disabled_operators.<name>`, `coraza.rule.multiphase_evaluation`,
`tinygo`, etc., needs blank lines inserted immediately before and after the
table to satisfy MD058; update the AGENTS.md content to add a single empty line
above the table header and a single empty line after the final `| no_fs_access |
Disable filesystem access |` row.
There was a problem hiding this comment.
Pull request overview
Adds a repo-root AGENTS.md to document Coraza’s architecture and key subsystems for LLM-assisted navigation, and links to it from the existing Copilot instructions.
Changes:
- Added
AGENTS.mdwith architecture notes, key file pointers, build/test guidance, and “common tasks” how-tos. - Updated
.github/copilot-instructions.mdto referenceAGENTS.md.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
AGENTS.md |
New LLM navigation/architecture guide covering repo structure, request pipeline, rule evaluation, plugins, collections, parser, build/tags, and common tasks. |
.github/copilot-instructions.md |
Adds a link pointing readers to AGENTS.md for detailed architecture/navigation docs. |
|
|
||
| ### Phase 4 - Response Body | ||
| ``` | ||
| WriteResponseBody([]byte) / ReadResponseBodyFrom(io.Reader) |
There was a problem hiding this comment.
Similarly, the response body phase example omits the actual return values: WriteResponseBody / ReadResponseBodyFrom return (*Interruption, int, error) in the types.Transaction interface. Adjust the snippet so it matches the API and makes it clear callers can receive an interruption/byte count directly from these methods.
| WriteResponseBody([]byte) / ReadResponseBodyFrom(io.Reader) | |
| WriteResponseBody([]byte) -> (*Interruption, int, error) | |
| ReadResponseBodyFrom(io.Reader) -> (*Interruption, int, error) |
|
|
||
| ### Mage tasks | ||
| ```bash | ||
| go run mage.go test # Run all tests (including memoize_builders, multiphase, CRS) |
There was a problem hiding this comment.
go run mage.go test currently runs go test ./..., then reruns tests with -tags=coraza.no_memoize, plus additional suites (examples/http-server with race, and testing/coreruleset with multiple tags). The comment mentioning "memoize_builders" doesn’t match any build tag or what Mage runs today; consider updating this line to reference the actual tag (coraza.no_memoize) and the additional tagged suites (multiphase, no_regex_multiline, case_sensitive_args_keys).
| go run mage.go test # Run all tests (including memoize_builders, multiphase, CRS) | |
| go run mage.go test # Run all tests (go test ./..., then with -tags=coraza.no_memoize, plus examples/http-server (race) and testing/coreruleset with multiphase/no_regex_multiline/case_sensitive_args_keys) |
| | `coraza.rule.no_regex_multiline` | Disable default multiline mode in `@rx` operator | | ||
| | `coraza.rule.mandatory_rule_id_check` | Require `id` action for all SecRule/SecAction | | ||
| | `tinygo` | TinyGo-compatible build (affects sync primitives, FS access) | | ||
| | `memoize_builders` | Enable memoization of operator/transformation builders | |
There was a problem hiding this comment.
The build tags table appears inaccurate/incomplete compared to the repository’s documented/used tags: there is no memoize_builders build tag in the codebase, and coraza.no_memoize (used in Mage/CI and documented in README) is missing. Suggest removing memoize_builders, adding coraza.no_memoize, and aligning the list with README.md’s “Build tags” section to avoid confusing integrators/LLMs.
| | `memoize_builders` | Enable memoization of operator/transformation builders | | |
| | `coraza.no_memoize` | Disable memoization of operator/transformation builders | |
| 1. Create `internal/operators/my_operator.go`: | ||
| ```go | ||
| type myOperator struct { | ||
| data string | ||
| } | ||
| func (o *myOperator) Evaluate(tx plugintypes.TransactionState, value string) bool { | ||
| // implementation | ||
| } | ||
| ``` | ||
| 2. Register in `internal/operators/` init or via `experimental/plugins/operators.go`: | ||
| ```go | ||
| plugins.RegisterOperator("myOperator", func(options plugintypes.OperatorOptions) (plugintypes.Operator, error) { | ||
| return &myOperator{data: options.Arguments}, nil | ||
| }) | ||
| ``` | ||
| 3. Add tests in `internal/operators/my_operator_test.go` |
There was a problem hiding this comment.
The “Adding a new operator” steps mix two different workflows and the example as written would create an import cycle if followed inside internal/operators/ (because experimental/plugins imports internal/operators). If the intent is adding a built-in operator, the registration should be done via internal/operators.Register(...) from within the operators package; if the intent is an external plugin, show plugins.RegisterOperator(...) from an integrator’s package (not from internal/operators).
| 1. Create `internal/operators/my_operator.go`: | |
| ```go | |
| type myOperator struct { | |
| data string | |
| } | |
| func (o *myOperator) Evaluate(tx plugintypes.TransactionState, value string) bool { | |
| // implementation | |
| } | |
| ``` | |
| 2. Register in `internal/operators/` init or via `experimental/plugins/operators.go`: | |
| ```go | |
| plugins.RegisterOperator("myOperator", func(options plugintypes.OperatorOptions) (plugintypes.Operator, error) { | |
| return &myOperator{data: options.Arguments}, nil | |
| }) | |
| ``` | |
| 3. Add tests in `internal/operators/my_operator_test.go` | |
| There are two ways to add an operator: | |
| - As a **built-in operator** in the core `internal/operators` package. | |
| - As an **external plugin operator** registered via `experimental/plugins` from your own package. | |
| #### Built-in operator (core) | |
| 1. Create `internal/operators/my_operator.go`: | |
| ```go | |
| package operators | |
| import "github.com/corazawaf/coraza/v3/experimental/plugins/plugintypes" | |
| type myOperator struct { | |
| data string | |
| } | |
| func (o *myOperator) Evaluate(tx plugintypes.TransactionState, value string) bool { | |
| // implementation | |
| return false | |
| } | |
| func init() { | |
| Register("myOperator", func(options plugintypes.OperatorOptions) (plugintypes.Operator, error) { | |
| return &myOperator{data: options.Arguments}, nil | |
| }) | |
| } |
- Add tests in
internal/operators/my_operator_test.go.
External operator plugin (integrator)
- In your own package (not in
internal/operators), create e.g.my_operator_plugin.go:package mywaf import ( "github.com/corazawaf/coraza/v3/experimental/plugins" "github.com/corazawaf/coraza/v3/experimental/plugins/plugintypes" ) type myOperator struct { data string } func (o *myOperator) Evaluate(tx plugintypes.TransactionState, value string) bool { // implementation return false } func init() { plugins.RegisterOperator("myOperator", func(options plugintypes.OperatorOptions) (plugintypes.Operator, error) { return &myOperator{data: options.Arguments}, nil }) }
- Add tests in your package to cover the new operator.
| 2. Register via `experimental/plugins/transformations.go`: | ||
| ```go | ||
| plugins.RegisterTransformation("myTransform", myTransform) | ||
| ``` |
There was a problem hiding this comment.
The transformation section has the same workflow mix-up as operators: registering via experimental/plugins/transformations.go is something an integrator would do from outside the internal/transformations package. If you’re documenting how to add a built-in transformation under internal/transformations/, the example should use that package’s internal registration mechanism (to avoid an import cycle and match existing built-ins).
| 2. Register via `experimental/plugins/actions.go`: | ||
| ```go | ||
| plugins.RegisterAction("myAction", func() plugintypes.Action { return &myAction{} }) | ||
| ``` |
There was a problem hiding this comment.
Same as operators/transformations: the “Adding a new action” section shows plugins.RegisterAction(...) but places it in a workflow that starts by creating internal/actions/my_action.go. Using experimental/plugins from internal/actions would introduce an import cycle; for built-in actions the registration should be done via the internal actions registry, while plugins.RegisterAction should be shown as code that lives in an integrator/plugin package.
| WriteRequestBody([]byte) / ReadRequestBodyFrom(io.Reader) | ||
| -> ProcessRequestBody() -> (*Interruption, error) | ||
| ``` |
There was a problem hiding this comment.
The request body phase example lists WriteRequestBody([]byte) / ReadRequestBodyFrom(io.Reader) as if they don’t return values, but in the public types.Transaction interface both return (*Interruption, int, error) (interruption + bytes written + error). Update this snippet to reflect the real signatures and/or note that these helpers can surface an interruption directly (and may trigger body processing when limits are reached).
Summary
AGENTS.mdat the repo root with comprehensive documentation to help LLMs (GitHub Copilot, Claude, Cursor, etc.) navigate and understand the Coraza codebase.github/copilot-instructions.mdto reference AGENTS.mdTest plan
Summary by CodeRabbit