Commit a338d8d

Create AGENTS.md
1 parent e028bf8 commit a338d8d

1 file changed, +284 -0 lines changed

AGENTS.md

Lines changed: 284 additions & 0 deletions
@@ -0,0 +1,284 @@
# AGENTS.md

This file provides instructions for AI coding agents working on the **OWASP Noir Passive Rules** repository.

## Repository Overview

This repository contains YAML-based passive scan rules consumed by [OWASP Noir](https://github.com/owasp-noir/noir). Each rule defines patterns (keywords and regular expressions) to detect secrets, API keys, tokens, and other sensitive data in source code during static analysis.

### Directory Layout

```
.
├── secrets/              # YAML rule definition files (main content)
├── spec/                 # Crystal test suite
│   ├── spec_helper.cr    # Rule parser & matcher engine
│   └── secrets_spec.cr   # Tests for every rule
├── .github/
│   ├── workflows/
│   │   ├── ci.yml        # Push-to-main: Crystal spec, contributors, revision
│   │   └── yamllint.yml  # PR: YAML lint check
│   ├── yamllint.yml      # yamllint configuration
│   └── labeler.yml       # PR auto-labeling rules
├── AGENTS.md             # ← You are here
├── README.md
├── LICENSE
├── revision              # Auto-updated timestamp
└── CONTRIBUTORS.svg      # Auto-generated contributor list
```

## Rule File Schema

Every rule file lives under `secrets/` and follows this structure:

```yaml
---
id: unique-rule-id
info:
  name: Human readable name
  author: [author names]
  severity: critical|high|medium|low
  description: Description of what this rule detects
  reference: ['https://reference-url']
matchers-condition: or|and
matchers:
  - type: word
    patterns: [KEYWORD_ONE, KEYWORD_TWO]
    condition: or|and
  - type: regex
    patterns:
      - 'regex_pattern_here'
    condition: or|and
category: secret
techs: ['*']
```

### Required Fields

| Field | Description |
|---|---|
| `id` | Unique identifier, lowercase with hyphens (e.g. `github-token`) |
| `info.name` | Human-readable rule name |
| `info.author` | List of authors |
| `info.severity` | One of: `critical`, `high`, `medium`, `low` |
| `info.description` | What the rule detects |
| `info.reference` | List of reference URLs (use `['']` if none) |
| `matchers-condition` | How multiple matcher blocks combine: `or` or `and` |
| `matchers` | At least one matcher block with `type`, `patterns`, `condition` |
| `category` | Always `secret` |
| `techs` | Always `['*']` for language-agnostic rules |
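
The required fields listed in the table can be checked mechanically. The following is a minimal sketch, not the repository's actual validation code: `missing_fields`, `REQUIRED_TOP`, and `REQUIRED_INFO` are hypothetical names, and it assumes the rule is a YAML mapping parsed with Crystal's standard `YAML.parse`.

```crystal
require "yaml"

# Hypothetical field lists, taken from the Required Fields table.
REQUIRED_TOP  = %w(id info matchers-condition matchers category techs)
REQUIRED_INFO = %w(name author severity description reference)

# Returns the names of required fields that are absent from a rule document.
def missing_fields(yaml_text : String) : Array(String)
  doc = YAML.parse(yaml_text)
  missing = REQUIRED_TOP.reject { |key| doc[key]? }
  if info = doc["info"]?
    missing += REQUIRED_INFO.reject { |key| info[key]? }.map { |key| "info.#{key}" }
  end
  missing
end
```

An empty result means every required key is present; it says nothing about whether the values themselves are valid.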

### Matcher Types

- **`word`**: Exact substring match. Use for environment variable names, fixed prefixes, and URL stems.
- **`regex`**: Regular expression match. Use for structured token formats with known character classes and lengths.
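
To make the difference between the two matcher types concrete, here is a small sketch of how one matcher block could be evaluated. It is an illustration under assumptions: `matcher_hit?` is a hypothetical helper, and the real logic in `spec/spec_helper.cr` may differ in detail.

```crystal
# Hypothetical sketch: evaluate one matcher block against a piece of text.
# "word" checks for a plain substring; "regex" compiles the pattern first.
def matcher_hit?(type : String, patterns : Array(String), condition : String, text : String) : Bool
  hits = patterns.map do |pattern|
    case type
    when "word"  then text.includes?(pattern)
    when "regex" then Regex.new(pattern).matches?(text)
    else              false
    end
  end
  # "and" requires every pattern to hit; anything else is treated as "or".
  condition == "and" ? hits.all? : hits.any?
end
```

With `condition: or`, a single matching pattern is enough; with `condition: and`, all patterns in the block must appear in the text.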

## Adding a New Rule: Step by Step

### 1. Create the YAML Rule File

Copy an existing rule as a starting point:

```bash
cp secrets/github-token.yaml secrets/your-new-rule.yaml
```

Edit the file. Replace `id`, `info`, and `matchers` with your new rule's content.

**CRITICAL**: The file MUST end with exactly one newline character.
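
The trailing-newline requirement is easy to verify in code. A minimal sketch, assuming Unix line endings; `single_trailing_newline?` is a hypothetical helper, not part of the repository:

```crystal
# Returns true when the content ends with exactly one "\n"
# (no missing newline, no extra blank line at the end).
def single_trailing_newline?(content : String) : Bool
  content.ends_with?("\n") && !content.ends_with?("\n\n")
end

# Possible usage against a rule file:
# single_trailing_newline?(File.read("secrets/your-new-rule.yaml"))
```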

### 2. Validate YAML Syntax

```bash
yamllint -c .github/yamllint.yml secrets/your-new-rule.yaml
```

Expected: no output, exit code 0. This completes in under 1 second.

### 3. Add Tests to `spec/secrets_spec.cr`

Every rule **must** have corresponding tests. Add a new `describe` block for your rule inside the `"Passive Secret Rules"` describe block. Each rule needs at minimum:

- **Positive tests**: one per word pattern and one per regex pattern, confirming `rule.match?` returns `true`.
- **Negative tests**: at least one benign string that must NOT match, confirming `rule.match?` returns `false`.
- **Boundary tests**: strings that are close to matching but should not (e.g. too short, wrong prefix).

Example structure:

```crystal
describe "your-new-rule" do
  rule = Rule.from_file(File.join(SECRETS_DIR, "your-new-rule.yaml"))

  it "matches YOUR_ENV_VAR keyword" do
    rule.match?("YOUR" + "_ENV_VAR=something").should be_true
  end

  it "matches token regex pattern" do
    rule.match?("pre" + "fix_" + "A" * 40).should be_true
  end

  it "does not match unrelated text" do
    rule.match?("This is normal text with no secrets").should be_false
  end

  it "does not match too-short token" do
    rule.match?("pre" + "fix_short").should be_false
  end
end
```

### 4. Run Tests

```bash
crystal spec/secrets_spec.cr
```

All tests must pass with 0 failures.

### 5. Final Validation

```bash
yamllint -c .github/yamllint.yml secrets/*.yaml
crystal spec/secrets_spec.cr
```

Both commands must succeed before committing.

## ⚠️ GitHub Push Protection (CRITICAL)

GitHub Push Protection scans pushed code for patterns that look like real secrets. Since this repository tests secret detection patterns, **test strings can trigger push protection and block your push entirely**.

### The Problem

If you write a test string like this:

```crystal
# ❌ BAD: GitHub will block the push
rule.match?("ghp_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghij").should be_true
```

GitHub's static scanner sees `ghp_ABCDEF...` as a real GitHub Personal Access Token and rejects the push with error `GH013: Repository rule violations found`.

### The Solution: Runtime String Assembly

**Never write a complete secret-like literal in source code.** Instead, assemble it at runtime using string concatenation so no single string literal matches a known secret pattern.

```crystal
# ✅ GOOD: Assembled at runtime, invisible to static scanners
rule.match?("gh" + "p_" + "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "abcdefghij").should be_true
```

### Using the `FakeSecrets` Module

`spec/secrets_spec.cr` contains a `FakeSecrets` module that centralises all fake secret strings. When adding new rules:

1. **Add a new method** to `FakeSecrets` that builds the fake token with concatenation:

   ```crystal
   module FakeSecrets
     # ... existing methods ...

     def self.your_new_token
       "your_pre" + "fix_" + "A" * 40
     end
   end
   ```

2. **Use it in tests**:

   ```crystal
   it "matches token regex" do
     rule.match?(FakeSecrets.your_new_token).should be_true
   end
   ```

### What to Split

Split the string at the **signature prefix** that secret scanners look for. Common split points:

| Secret Type | Literal to Avoid | How to Split |
|---|---|---|
| GitHub PAT | `ghp_XXXX` | `"gh" + "p_" + "XXXX"` |
| GitLab PAT | `glpat-XXXX` | `"glp" + "at-" + "XXXX"` |
| AWS Key ID | `AKIAXXXX` | `"AKI" + "AXXXX"` |
| OpenAI Key | `sk-XXXX` | `"sk" + "-" + "XXXX"` |
| Stripe Key | `sk_live_XXXX` | `"sk" + "_live_" + "XXXX"` |
| Webhook URL | `https://hooks.slack.com/services/TXXX` | `"https://hooks" + ".slack.com/services/" + "TXXX"` |
| PEM Header | `-----BEGIN RSA PRIVATE KEY-----` | `"-----BEGIN " + "RSA PRIVATE KEY-----"` |
| Environment variable | `AWS_SECRET_ACCESS_KEY` | `"AWS_SECRET" + "_ACCESS_KEY"` |

**Rule of thumb**: if `grep -E 'known_prefix_pattern'` would match your test string, it needs to be split.

### Verification Before Push

Run this command to check for accidental secret-like literals in test files:

```bash
grep -rnE '(ghp_[A-Za-z0-9]{20,}|gho_[A-Za-z0-9]{20,}|ghu_[A-Za-z0-9]{20,}|ghs_[A-Za-z0-9]{20,}|ghr_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,}|glpat-[A-Za-z0-9_-]{15,}|glptt-[A-Za-z0-9_-]{15,}|sk_live_[A-Za-z0-9]{20,}|rk_live_[A-Za-z0-9]{20,}|AKIA[0-9A-Z]{16}|sk-[a-zA-Z0-9]{40,}|AIzaSy[A-Za-z0-9_-]{30,}|xai-[A-Za-z0-9]{80,})' spec/
```

Expected: **no output** (exit code 1, which is how `grep` reports that nothing matched). If anything matches, split the offending string.
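
The same check can be expressed in Crystal, for example to run it from a spec. This is only a sketch under assumptions: `SECRET_LIKE` covers a small, hypothetical subset of the prefixes in the grep command, and `secret_like_lines` is not an existing helper in the repository.

```crystal
# Hypothetical subset of the grep patterns (GitHub PAT, GitLab PAT, AWS key ID).
SECRET_LIKE = /ghp_[A-Za-z0-9]{20,}|glpat-[A-Za-z0-9_-]{15,}|AKIA[0-9A-Z]{16}/

# Returns the 1-based numbers of source lines that contain a secret-like literal.
def secret_like_lines(source : String) : Array(Int32)
  offending = [] of Int32
  number = 0
  source.each_line do |line|
    number += 1
    offending << number if line.matches?(SECRET_LIKE)
  end
  offending
end
```

A line that builds the token with `+` concatenation is not reported, because the scan looks at the source text rather than the assembled runtime value.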

## Test Architecture

### `spec/spec_helper.cr`

Defines two structs:

- **`Matcher`**: Represents one matcher block. Supports `word` (substring match) and `regex` (Regex match) types with `and`/`or` condition logic.
- **`Rule`**: Represents a full YAML rule. Parses via `Rule.from_file(path)`, evaluates via `Rule#match?(text)`.
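
For orientation, the sketch below shows one way such a pair of structs could fit together. It is not the contents of `spec/spec_helper.cr`: `SketchMatcher`, `SketchRule`, and `from_yaml` are hypothetical names, and the real structs may store more fields or parse differently.

```crystal
require "yaml"

# Hypothetical stand-in for a matcher block (type, patterns, condition).
struct SketchMatcher
  def initialize(@kind : String, @patterns : Array(String), @condition : String)
  end

  def match?(text : String) : Bool
    hits = @patterns.map do |pattern|
      @kind == "regex" ? Regex.new(pattern).matches?(text) : text.includes?(pattern)
    end
    @condition == "and" ? hits.all? : hits.any?
  end
end

# Hypothetical stand-in for a full rule: matcher blocks plus matchers-condition.
struct SketchRule
  def initialize(@matchers : Array(SketchMatcher), @condition : String)
  end

  def self.from_yaml(text : String) : SketchRule
    doc = YAML.parse(text)
    matchers = doc["matchers"].as_a.map do |m|
      SketchMatcher.new(m["type"].as_s, m["patterns"].as_a.map(&.as_s), m["condition"].as_s)
    end
    new(matchers, doc["matchers-condition"].as_s)
  end

  # Combines the per-block results according to matchers-condition.
  def match?(text : String) : Bool
    hits = @matchers.map(&.match?(text))
    @condition == "and" ? hits.all? : hits.any?
  end
end
```

The two levels of `and`/`or` are independent: `condition` applies to the patterns inside one block, while `matchers-condition` applies across blocks.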

### `spec/secrets_spec.cr`

Organised into sections:

| Section | Purpose |
|---|---|
| **YAML structure validation** | Validates every `.yaml` file has all required fields |
| **Regex validity** | Ensures all regex patterns compile without error |
| **Per-rule matching tests** | Positive, negative, and boundary tests for each rule |
| **Cross-rule false positive checks** | Confirms benign strings match zero rules |
| **Matchers-condition semantics** | Validates structural conventions (all use `or`, category `secret`) |
| **Severity validation** | Ensures severity is one of the four allowed values |

The YAML structure and regex validity tests are **automatically applied to all rule files** via `Dir.glob`, so new rules get basic validation for free. Per-rule matching tests, however, must be written by hand.
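
As an illustration of glob-driven validation, the sketch below collects every regex pattern in a rules directory that fails to compile. It is an assumption about how such a check could look, not the actual spec code; `invalid_regexes` is a hypothetical helper.

```crystal
require "yaml"

# Hypothetical sketch: report regex patterns that Regex.new rejects,
# for every *.yaml file found in the given directory.
def invalid_regexes(rules_dir : String) : Array(String)
  broken = [] of String
  Dir.glob(File.join(rules_dir, "*.yaml")).each do |path|
    doc = YAML.parse(File.read(path))
    doc["matchers"].as_a.each do |matcher|
      next unless matcher["type"].as_s == "regex"
      matcher["patterns"].as_a.each do |pattern|
        begin
          Regex.new(pattern.as_s)
        rescue ArgumentError
          # Regex.new raises ArgumentError when the pattern does not compile.
          broken << "#{File.basename(path)}: #{pattern.as_s}"
        end
      end
    end
  end
  broken
end
```

Because the file list comes from the glob, a new rule dropped into the directory is picked up without any change to the check itself.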

## CI Workflows

### On Push to `main` (`.github/workflows/ci.yml`)

| Job | What It Does |
|---|---|
| `test` | Installs Crystal, runs `crystal spec/secrets_spec.cr` |
| `contributors` | Updates `CONTRIBUTORS.svg` |
| `revision` | Updates `revision` timestamp and pushes |

### On Pull Request (`.github/workflows/yamllint.yml`)

Runs `yamllint` against all `secrets/*.yaml` files.

## Checklist for New Patterns

Use this checklist every time you add or modify a rule:

- [ ] YAML file created/modified in `secrets/` with all required fields
- [ ] `yamllint -c .github/yamllint.yml secrets/your-rule.yaml` passes
- [ ] File ends with exactly one newline character
- [ ] `FakeSecrets` method added in `spec/secrets_spec.cr` using string concatenation
- [ ] `describe` block added with positive, negative, and boundary tests
- [ ] All test strings use runtime assembly, with no complete secret literals
- [ ] `grep -rnE '<secret patterns>' spec/` returns no matches
- [ ] `crystal spec/secrets_spec.cr` passes with 0 failures
- [ ] `yamllint -c .github/yamllint.yml secrets/*.yaml` passes for all files

## Common Pitfalls

| Pitfall | Fix |
|---|---|
| YAML file missing trailing newline | Ensure exactly one `\n` at end of file |
| YAML indentation with tabs | Use 2 spaces, never tabs |
| Regex special chars unescaped in YAML | Wrap regex in single quotes: `'pattern'` |
| Push rejected by GitHub Push Protection | Split secret-like test strings with `+` concatenation |
| Test passes locally but fails in CI | Ensure `spec/spec_helper.cr` and rule file are both committed |
| New rule not covered by structure tests | Structure/regex tests auto-discover via glob; just ensure the file is in `secrets/` |
| New rule missing matching tests | Always add a dedicated `describe` block with positive and negative cases |
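
The single-quote advice for regex patterns can be seen in a short sketch. In a single-quoted YAML scalar, backslashes are kept literally, so the pattern reaches the regex engine unchanged. The helper name `regex_from_yaml` and the `pattern` key are hypothetical and used only for this illustration.

```crystal
require "yaml"

# Hypothetical helper: read a regex source from a `pattern` key and compile it.
def regex_from_yaml(text : String) : Regex
  source = YAML.parse(text)["pattern"].as_s
  Regex.new(source)
end

# %q(...) is a raw Crystal string, so the YAML text below contains a real backslash.
regex = regex_from_yaml(%q(pattern: 'demo_\d{4}-[a-f0-9]+'))
puts regex.source
puts regex.matches?("demo_2024-beef")
```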
