# Integration Tests Coverage Guide

A reference guide to what the gh-aw-firewall integration tests cover and how they relate to real-world usage in GitHub Agentic Workflows.

**Last updated:** February 2026

---

## Quick Navigation

| Area | Tests | Doc |
|------|-------|-----|
| Domain filtering, DNS, network security | 6 files, ~50 tests | [domain-network.md](test-analysis/domain-network.md) |
| Chroot sandbox, languages, package managers | 5 files, ~70 tests | [chroot.md](test-analysis/chroot.md) |
| Protocol support, credentials, tokens | 8 files, ~100 tests | [protocol-security.md](test-analysis/protocol-security.md) |
| Containers, volumes, git, env vars | 7 files, ~45 tests | [container-ops.md](test-analysis/container-ops.md) |
| CI workflows, smoke tests, build-test | 27 workflows | [ci-smoke.md](test-analysis/ci-smoke.md) |
| Test fixtures and infrastructure | 6 helper files | [test-infra.md](test-analysis/test-infra.md) |

---

## Overview

The test suite is organized in four tiers:

```
┌─────────────────────────────────────────────────────┐
│ Smoke Tests (4 workflows)                           │
│ Smoke workflows (Claude, Copilot, Codex, Chroot)    │
│ running inside AWF sandbox                          │
├─────────────────────────────────────────────────────┤
│ Build-Test Workflows (8 workflows)                  │
│ Real projects (Go, Rust, Java, Node, etc.)          │
│ built and tested through the firewall proxy         │
├─────────────────────────────────────────────────────┤
│ Integration Tests (26 files, ~265 tests)            │
│ End-to-end AWF container execution with             │
│ domain filtering, chroot, security assertions       │
├─────────────────────────────────────────────────────┤
│ Unit Tests (19 files)                               │
│ Individual module testing (parser, config, logger)  │
└─────────────────────────────────────────────────────┘
```

### Test Counts by Category

| Category | Files / Workflows | Approx. Tests | CI Workflow |
|----------|-------------------|---------------|-------------|
| Domain/Network | 6 | ~50 | None |
| Chroot | 5 | ~70 | `test-chroot.yml` (4 jobs) |
| Protocol/Security | 8 | ~100 | None |
| Container/Ops | 7 | ~45 | None |
| Unit Tests | 19 | ~200 | `test-coverage.yml` |
| Smoke Tests | 4 | N/A | Per-workflow (scheduled + PR) |
| Build-Test | 8 | N/A | Per-workflow (PR + dispatch) |

---

## What's Covered

### 1. Chroot Filesystem Isolation (Strong)

The chroot tests are the most mature suite; they run in CI and cover these critical scenarios:

- **Language runtimes**: Python, Node.js, Go, Java, .NET, Ruby, and Rust are all verified to be accessible through the chroot
- **Package managers**: pip, npm, cargo, Maven, dotnet, gem, and Go modules are all tested for registry connectivity
- **Security properties**: NET_ADMIN/SYS_CHROOT capabilities dropped, Docker socket hidden, non-root execution
- **/proc filesystem**: Dynamic mount verified for JVM and .NET CLR compatibility
- **Shell features**: Pipes, redirects, command substitution, and compound commands all work in the chroot

**CI coverage**: 4 parallel jobs in `test-chroot.yml` run these tests on every PR.

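One way to check the capability-drop property from inside the sandbox is to read the `CapEff` (effective capabilities) bitmask from `/proc/self/status`. The following is a minimal sketch, assuming the test has already captured that file's text from the container. The helper names are hypothetical, while the bit numbers are the standard ones from `<linux/capability.h>`.

```typescript
// Standard Linux capability bit numbers (see <linux/capability.h>).
const CAP_NET_ADMIN = 12;
const CAP_SYS_CHROOT = 18;

// Hypothetical helper: extract the CapEff hex mask from /proc/<pid>/status text.
function parseCapEff(statusText: string): string {
  const match = /^CapEff:\s*([0-9a-fA-F]+)/m.exec(statusText);
  if (match === null) {
    throw new Error("CapEff line not found in status text");
  }
  return match[1];
}

// Hypothetical helper: test one capability bit without BigInt by locating
// the hex digit (4 bits) that holds it, counting from the right.
function hasCapability(capEffHex: string, cap: number): boolean {
  const digitIndex = capEffHex.length - 1 - Math.floor(cap / 4);
  if (digitIndex < 0) {
    return false; // bit lies beyond the width of the mask
  }
  const nibble = parseInt(capEffHex.charAt(digitIndex), 16);
  return ((nibble >> (cap % 4)) & 1) === 1;
}

// Example status excerpt; a real test would capture `cat /proc/self/status`.
const sample = "Name:\tbash\nCapPrm:\t0000000000000000\nCapEff:\t0000000000000000\n";
const mask = parseCapEff(sample);
console.log("NET_ADMIN held:", hasCapability(mask, CAP_NET_ADMIN)); // false
console.log("SYS_CHROOT held:", hasCapability(mask, CAP_SYS_CHROOT)); // false
```

For comparison, a root process with Docker's typical default capability set reports `CapEff: 00000000a80425fb`. For that mask, `hasCapability` returns true for SYS_CHROOT and false for NET_ADMIN, which is why both bits need to be checked explicitly.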
### 2. Credential Isolation (Strong)

The defense is multi-layered, and each layer is tested:

- **Credential file hiding**: Docker config, GitHub CLI tokens, and `.npmrc` auth tokens are all verified hidden via `/dev/null` overlays
- **Exfiltration resistance**: base64 encoding, `xxd` pipelines, and `grep` patterns are all tested and return empty output
- **Chroot bypass prevention**: A dedicated regression test covers the vulnerability where credentials were hidden at `/host$HOME` but still accessible at `$HOME`
- **API proxy sidecar**: The agent receives placeholder tokens while the proxy holds the real keys; health checks cover OpenAI, Anthropic, and Copilot
- **One-shot token library**: An `LD_PRELOAD` library intercepts `getenv()`, caches the value, and clears it from the environment; tested in both container and chroot modes
- **Token unsetting from `/proc/1/environ`**: `GITHUB_TOKEN`, `OPENAI_API_KEY`, and `ANTHROPIC_API_KEY` are all verified cleared

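A `/dev/null` overlay bind-mounts `/dev/null` read-only over a credential file, so any read of that path returns zero bytes. This is why the base64 and `xxd` probes come back empty. The sketch below builds such `docker run` volume flags for both `$HOME` and `/host$HOME`, because masking only one of the two paths is the kind of gap the chroot-bypass regression test targets. The file list, the `/host` prefix handling, and the function name are illustrative assumptions, not AWF's actual configuration.

```typescript
// Illustrative credential files relative to $HOME (assumption: the real AWF
// list may differ). These are the tools' standard config locations.
const CREDENTIAL_FILES: readonly string[] = [
  ".docker/config.json", // Docker registry auth
  ".config/gh/hosts.yml", // GitHub CLI tokens
  ".npmrc", // npm auth tokens
];

// Hypothetical helper: build `-v /dev/null:<path>:ro` flags that mask each
// credential file at both the container path and the /host (chroot) path.
function credentialMaskArgs(
  home: string,
  files: readonly string[] = CREDENTIAL_FILES,
): string[] {
  const args: string[] = [];
  for (const prefix of ["", "/host"]) {
    for (const rel of files) {
      args.push("-v", `/dev/null:${prefix}${home}/${rel}:ro`);
    }
  }
  return args;
}

console.log(credentialMaskArgs("/home/runner").join(" "));
// -v /dev/null:/home/runner/.docker/config.json:ro -v ... (6 mounts in total)
```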
### 3. Multi-Engine Smoke Tests (Strong)

Real AI agents run through the full AWF pipeline:

- **Claude**: GitHub MCP, Playwright browser automation, file I/O, bash tools
- **Copilot**: The same tools as Claude, plus web-fetch and agentic-workflows tools
- **Codex**: GH CLI safe inputs, Tavily web search, discussion interactions

### 4. Multi-Language Build-Test (Strong)

Eight language ecosystems are tested against real open-source projects:

- Bun, C++, Deno, .NET, Go, Java, Node.js, Rust
- Each workflow clones a test repository, installs its dependencies, builds it, and runs its tests through AWF

### 5. Exit Code Propagation (Good)

15 tests cover exit codes 0-255, command exit codes, and pipeline behavior. This matters for CI/CD integration, where any non-zero exit code is treated as a failure.

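Propagation means the wrapper's own exit status must match the sandboxed command's. Here is a minimal sketch of the mapping such a wrapper could apply, assuming it follows the usual shell convention: exit statuses are 8 bits wide, and a child killed by signal N reports 128 + N. The function name is hypothetical, and this is not AWF's actual code. The `(status, signal)` inputs mirror the `status` and `signal` fields of Node's `spawnSync` result.

```typescript
// Signal numbers for these signals are the same on Linux and macOS.
const SIGNAL_NUMBERS: Record<string, number> = {
  SIGHUP: 1,
  SIGINT: 2,
  SIGKILL: 9,
  SIGTERM: 15,
};

// Hypothetical helper: map a child's (status, signal) pair to the exit code
// the wrapper should return.
function propagatedExitCode(status: number | null, signal: string | null): number {
  if (status !== null) {
    return status & 0xff; // exit statuses are 8 bits wide: 300 becomes 44
  }
  if (signal !== null && Object.prototype.hasOwnProperty.call(SIGNAL_NUMBERS, signal)) {
    return 128 + SIGNAL_NUMBERS[signal]; // shell convention for signal deaths
  }
  return 1; // unknown termination: report a generic failure
}

console.log(propagatedExitCode(0, null)); // 0
console.log(propagatedExitCode(42, null)); // 42
console.log(propagatedExitCode(null, "SIGTERM")); // 143
```

Pipelines add one more wrinkle: a pipeline's status is that of its last command unless `pipefail` is set. For example, `false | true` exits 0 by default.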
---

## Coverage Heat Map

A visual overview of what is tested and what is not:

| Feature | Unit | Integration | CI | Smoke | Build-Test |
|---------|------|-------------|----|-------|------------|
| Domain allow-list | ✅ | ✅ | ❌ | ✅ | ✅ |
| Domain deny-list (`--block-domains`) | ❌ | ❌ | ❌ | ❌ | ❌ |
| Wildcard patterns | ✅ | ✅ | ❌ | ❌ | ❌ |
| Empty domains (air-gapped) | ❌ | ✅ | ❌ | ❌ | ❌ |
| DNS server restriction | ✅ | ⚠️ \* | ❌ | ❌ | ❌ |
| Network security (SSRF, bypass) | ❌ | ✅ | ❌ | ❌ | ❌ |
| Chroot languages | ❌ | ✅ | ✅ | ✅ | ✅ |
| Chroot package managers | ❌ | ✅ | ✅ | ❌ | ✅ |
| Chroot /proc filesystem | ❌ | ✅ | ✅ | ❌ | ❌ |
| Chroot edge cases | ❌ | ✅ | ✅ | ❌ | ❌ |
| Credential hiding | ❌ | ✅ | ❌ | ❌ | ❌ |
| Token unsetting | ❌ | ✅ | ❌ | ❌ | ❌ |
| One-shot tokens (LD_PRELOAD) | ❌ | ✅ | ❌ | ❌ | ❌ |
| API proxy sidecar | ❌ | ✅ | ❌ | ❌ | ❌ |
| Protocol support (HTTP/HTTPS) | ❌ | ✅ | ❌ | ❌ | ❌ |
| IPv6 | ❌ | ✅ | ❌ | ❌ | ❌ |
| Exit code propagation | ❌ | ✅ | ❌ | ❌ | ❌ |
| Error handling | ❌ | ✅ | ❌ | ❌ | ❌ |
| Volume mounts | ❌ | ✅ | ❌ | ❌ | ❌ |
| Container workdir | ❌ | ✅ | ❌ | ❌ | ❌ |
| Git operations | ❌ | ✅ | ❌ | ❌ | ❌ |
| Environment variables | ❌ | ✅ | ❌ | ❌ | ❌ |
| `--env-all` | ❌ | ❌ | ❌ | ❌ | ❌ |
| SSL Bump | ✅ | ❌ | ❌ | ❌ | ❌ |
| Log commands | ✅ | ⚠️ \* | ❌ | ❌ | ❌ |
| Docker unavailability | ❌ | ✅ | ❌ | ❌ | ❌ |
| Docker warning stub | ❌ | ❌ \*\* | ❌ | ❌ | ❌ |
| Setup action (`action.yml`) | ❌ | ❌ | ✅ | ❌ | ❌ |
| Container security scan | ❌ | ❌ | ✅ | ❌ | ❌ |
| Dependency audit | ❌ | ❌ | ✅ | ❌ | ❌ |

\* ⚠️ = Tests exist but have significant gaps (see the detailed analysis documents).

\*\* = Tests exist but are skipped.

---

## Test Infrastructure Summary

### How Tests Run

- **Serial execution** (`maxWorkers: 1`): Docker network and container conflicts prevent parallel runs
- **120-second timeout** per test: each container lifecycle takes 15-25 seconds
- **Batch runner**: groups commands that share the same config into a single container, cutting ~73 container startups to ~27 for the chroot tests
- **Custom Jest matchers**: `toSucceed()`, `toFail()`, `toExitWithCode()`, `toTimeout()`, `toAllowDomain()`, `toBlockDomain()`
- **4-stage cleanup**: pre-test TypeScript cleanup → AWF normal exit → AWF signal handlers → CI always-cleanup

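Jest custom matchers are plain functions that return `{ pass, message }` and are registered with `expect.extend`. Below is a minimal sketch of two of the matchers listed above. It assumes a hypothetical `AwfRunResult` shape for what the test helper returns, and the repository's actual matchers and messages may differ.

```typescript
// Hypothetical result shape returned by an AWF test helper.
interface AwfRunResult {
  exitCode: number;
  stdout: string;
  stderr: string;
  timedOut: boolean;
}

// The object shape Jest expects a custom matcher to return.
interface MatcherResult {
  pass: boolean;
  message: () => string;
}

function toExitWithCode(received: AwfRunResult, expected: number): MatcherResult {
  // A timed-out run never counts as exiting with the expected code.
  const pass = !received.timedOut && received.exitCode === expected;
  return {
    pass,
    // Built lazily; Jest only calls this when it reports a failure.
    message: () =>
      pass
        ? `expected exit code other than ${expected}`
        : `expected exit code ${expected}, got ${received.exitCode}` +
          (received.timedOut ? " (timed out)" : "") +
          `\nstderr:\n${received.stderr}`,
  };
}

function toSucceed(received: AwfRunResult): MatcherResult {
  return toExitWithCode(received, 0);
}

// In a Jest setup file these would be registered with:
//   expect.extend({ toExitWithCode, toSucceed });
// and then used as: expect(result).toSucceed();
```

Because the message depends on `pass`, the same matcher also serves `.not.toSucceed()`. When `pass` is true under `.not`, Jest reports the negated message.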
### Infrastructure Limitations

1. Docker and sudo are required, so there is no lightweight local testing
2. The batch runner loses per-command stderr (it is merged into stdout via `2>&1`)
3. Log-based matchers require `keepContainers: true`
4. Aggressive `docker prune` in cleanup can affect non-AWF containers
5. No retry logic for flaky network tests

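Limitation 2 follows from how a batch runner of this kind works. Several commands share one container, so their output has to be interleaved into a single stream and split apart afterward. The sketch below is minimal and assumes each command carries a key identifying its AWF configuration. `BatchCommand`, the marker format, and all function names are hypothetical, not the repository's batch runner.

```typescript
// Hypothetical: one test command plus a key identifying its AWF config
// (for example, a stable serialization of allowed domains and flags).
interface BatchCommand {
  id: string;
  command: string;
  configKey: string;
}

interface BatchResult {
  output: string; // stdout and stderr, merged
  exitCode: number;
}

// Group commands so that each distinct config needs only one container.
function groupByConfig(commands: readonly BatchCommand[]): Record<string, BatchCommand[]> {
  const groups: Record<string, BatchCommand[]> = {};
  for (const cmd of commands) {
    if (!Object.prototype.hasOwnProperty.call(groups, cmd.configKey)) {
      groups[cmd.configKey] = [];
    }
    groups[cmd.configKey].push(cmd);
  }
  return groups;
}

// Build one shell script per group. Each command runs in a subshell (so an
// `exit` in one command cannot end the batch) with stderr folded into stdout
// via 2>&1, bracketed by markers recording its id and exit status.
function buildBatchScript(group: readonly BatchCommand[]): string {
  return group
    .map(
      (cmd) =>
        `echo "===START ${cmd.id}==="\n` +
        `( ${cmd.command} ) 2>&1\n` +
        `echo "===END ${cmd.id} exit=$?==="`,
    )
    .join("\n");
}

// Split the container's combined output back into per-command results.
function parseBatchOutput(stdout: string): Record<string, BatchResult> {
  const results: Record<string, BatchResult> = {};
  const marker = /===START (\S+)===\n([\s\S]*?)===END \1 exit=(\d+)===/g;
  let m: RegExpExecArray | null;
  while ((m = marker.exec(stdout)) !== null) {
    results[m[1]] = { output: m[2], exitCode: Number(m[3]) };
  }
  return results;
}
```

Both streams arrive interleaved in one output block. A test can therefore assert on a command's combined output and its exit code, but it cannot tell which lines came from stderr. That is the trade-off for cutting container startups from ~73 to ~27.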
See [test-infra.md](test-analysis/test-infra.md) for the full infrastructure analysis.

---

## Detailed Analysis Documents

Each document provides a per-test-case analysis with plain-language descriptions, real-world mappings, and gap identification:

- **[Domain & Network Tests](test-analysis/domain-network.md)**: Domain filtering, DNS, network security, localhost
- **[Chroot Tests](test-analysis/chroot.md)**: Sandbox isolation, languages, package managers, /proc, edge cases
- **[Protocol & Security Tests](test-analysis/protocol-security.md)**: HTTP/HTTPS, IPv6, API proxy, credentials, tokens, exit codes
- **[Container & Operations Tests](test-analysis/container-ops.md)**: Workdir, volumes, git, env vars, logging, Docker availability
- **[CI & Smoke Tests](test-analysis/ci-smoke.md)**: All 27 CI, smoke, and build-test workflows analyzed
- **[Test Infrastructure](test-analysis/test-infra.md)**: Runner architecture, batch pattern, cleanup strategy, limitations