Commit d60b392

Mossaka and claude committed
chore: merge origin/main into test/ci-integration-suite
Resolve merge conflicts in tests/fixtures/awf-runner.ts by keeping both rate limit options (from this branch) and envAll/cliEnv options (from main). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2 parents 1a28945 + f7c60f5 commit d60b392

23 files changed: +3216 -262 lines changed

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ This is a firewall for GitHub Copilot CLI (package name: `@github/awf`) that pro
 - **[LOGGING.md](LOGGING.md)** - Comprehensive logging documentation
 - **[docs/logging_quickref.md](docs/logging_quickref.md)** - Quick reference for log queries and monitoring
 - **[docs/releasing.md](docs/releasing.md)** - Release process and versioning instructions
+- **[docs/INTEGRATION-TESTS.md](docs/INTEGRATION-TESTS.md)** - Integration test coverage guide with gap analysis
 
 ## Development Workflow
 

containers/api-proxy/Dockerfile

Lines changed: 2 additions & 1 deletion
@@ -27,7 +27,8 @@ USER apiproxy
 # 10000 - OpenAI API proxy (also serves as health check endpoint)
 # 10001 - Anthropic API proxy
 # 10002 - GitHub Copilot API proxy
-EXPOSE 10000 10001 10002
+# 10004 - OpenCode API proxy (routes to Anthropic)
+EXPOSE 10000 10001 10002 10004
 
 # Redirect stdout/stderr to log file for persistence
 # Use shell form to enable redirection and tee for both file and console

containers/api-proxy/server.js

Lines changed: 29 additions & 0 deletions
@@ -426,6 +426,35 @@ if (COPILOT_GITHUB_TOKEN) {
     logRequest('info', 'server_start', { message: 'GitHub Copilot proxy listening on port 10002' });
   });
 }
+
+// OpenCode API proxy (port 10004) — routes to Anthropic (default BYOK provider)
+// OpenCode gets a separate port from Claude (10001) for per-engine rate limiting,
+// metrics isolation, and future provider routing (OpenCode is BYOK and may route
+// to different providers in the future based on model prefix).
+if (ANTHROPIC_API_KEY) {
+  const opencodeServer = http.createServer((req, res) => {
+    if (req.url === '/health' && req.method === 'GET') {
+      res.writeHead(200, { 'Content-Type': 'application/json' });
+      res.end(JSON.stringify({ status: 'healthy', service: 'opencode-proxy' }));
+      return;
+    }
+
+    const logMethod = sanitizeForLog(req.method);
+    const logUrl = sanitizeForLog(req.url);
+    console.log(`[OpenCode Proxy] ${logMethod} ${logUrl}`);
+    console.log('[OpenCode Proxy] Injecting x-api-key header with ANTHROPIC_API_KEY');
+    const anthropicHeaders = { 'x-api-key': ANTHROPIC_API_KEY };
+    if (!req.headers['anthropic-version']) {
+      anthropicHeaders['anthropic-version'] = '2023-06-01';
+    }
+    proxyRequest(req, res, 'api.anthropic.com', anthropicHeaders);
+  });
+
+  opencodeServer.listen(10004, '0.0.0.0', () => {
+    console.log('[API Proxy] OpenCode proxy listening on port 10004 (-> Anthropic)');
+  });
+}
+
 // Graceful shutdown
 process.on('SIGTERM', () => {
   logRequest('info', 'shutdown', { message: 'Received SIGTERM, shutting down gracefully' });
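
Reviewer note on the `server.js` hunk: the header logic inside the OpenCode handler could be pulled into a pure function so it can be unit-tested without opening a socket. The sketch below is only an illustration and is not code from this commit. `buildAnthropicHeaders` is a hypothetical name, and how `proxyRequest` combines these headers with the incoming request is not visible in this hunk.

```javascript
// Hypothetical helper (not part of the commit): mirrors the header logic
// shown in the OpenCode proxy handler above.
function buildAnthropicHeaders(incomingHeaders, apiKey) {
  // Always inject the real key held by the proxy.
  const headers = { 'x-api-key': apiKey };
  // Supply a default API version only when the client did not send one.
  if (!incomingHeaders['anthropic-version']) {
    headers['anthropic-version'] = '2023-06-01';
  }
  return headers;
}

// Usage: a request without a version gets the default; one with a version
// keeps its own value because the helper adds nothing for that header.
console.log(buildAnthropicHeaders({}, 'sk-placeholder'));
console.log(buildAnthropicHeaders({ 'anthropic-version': '2024-01-01' }, 'sk-placeholder'));
```

Keeping this logic separate from the `http.createServer` callback would also make it easier to reuse if OpenCode later routes to other providers, as the comment in the hunk anticipates.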

docs-site/src/content/docs/reference/cli-reference.md

Lines changed: 0 additions & 26 deletions
@@ -42,7 +42,6 @@ awf [options] -- <command>
 | `--enable-host-access` | flag | `false` | Enable access to host services via host.docker.internal |
 | `--allow-host-ports <ports>` | string | `80,443` | Ports to allow when using --enable-host-access |
 | `--agent-image <value>` | string | `default` | Agent container image (default, act, or custom) |
-| `--allow-full-filesystem-access` | flag | `false` | ⚠️ Mount entire host filesystem with read-write access |
 | `-V, --version` | flag || Display version |
 | `-h, --help` | flag || Display help |
 
@@ -364,31 +363,6 @@ Custom images are validated against approved patterns to prevent supply chain at
 
 **See also:** [Agent Images Reference](/gh-aw-firewall/reference/agent-images/)
 
-### `--allow-full-filesystem-access`
-
-:::danger[⚠️ SECURITY WARNING]
-This flag **DISABLES selective mounting security** and mounts the entire host filesystem with **read-write access**. This exposes **ALL** credential files including:
-
-- Docker Hub tokens (`~/.docker/config.json`)
-- GitHub CLI tokens (`~/.config/gh/hosts.yml`)
-- NPM, Cargo, Composer credentials
-- SSH keys, GPG keys, and other sensitive files
-
-**Only use this flag if:**
-1. You are running trusted code that you have fully reviewed
-2. You understand the security implications
-3. You cannot use `--mount` to selectively mount needed directories
-:::
-
-```bash
-# ⚠️ Use with extreme caution
-sudo awf --allow-full-filesystem-access \
-  --allow-domains github.com \
-  -- trusted-command
-```
-
-**Alternatives:**
-- Use `--mount` to selectively mount only needed directories (recommended)
 
 ## Exit Codes
 
docs/INTEGRATION-TESTS.md

Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
+# Integration Tests Coverage Guide
+
+A reference guide to what the gh-aw-firewall integration tests cover and how they relate to real-world usage in GitHub Agentic Workflows.
+
+**Last updated:** February 2026
+
+---
+
+## Quick Navigation
+
+| Area | Tests | Doc |
+|------|-------|-----|
+| Domain filtering, DNS, network security | 6 files, ~50 tests | [domain-network.md](test-analysis/domain-network.md) |
+| Chroot sandbox, languages, package managers | 5 files, ~70 tests | [chroot.md](test-analysis/chroot.md) |
+| Protocol support, credentials, tokens | 8 files, ~100 tests | [protocol-security.md](test-analysis/protocol-security.md) |
+| Containers, volumes, git, env vars | 7 files, ~45 tests | [container-ops.md](test-analysis/container-ops.md) |
+| CI workflows, smoke tests, build-test | 27 workflows | [ci-smoke.md](test-analysis/ci-smoke.md) |
+| Test fixtures and infrastructure | 6 helper files | [test-infra.md](test-analysis/test-infra.md) |
+
+---
+
+## Overview
+
+The test suite is organized in three tiers:
+
+```
+┌─────────────────────────────────────────────────────┐
+│  Smoke Tests (4 workflows)                          │
+│  Smoke workflows (Claude, Copilot, Codex, Chroot)   │
+│  running inside AWF sandbox                         │
+├─────────────────────────────────────────────────────┤
+│  Build-Test Workflows (8 workflows)                 │
+│  Real projects (Go, Rust, Java, Node, etc.)         │
+│  built and tested through the firewall proxy        │
+├─────────────────────────────────────────────────────┤
+│  Integration Tests (26 files, ~265 tests)           │
+│  End-to-end AWF container execution with            │
+│  domain filtering, chroot, security assertions      │
+├─────────────────────────────────────────────────────┤
+│  Unit Tests (19 files)                              │
+│  Individual module testing (parser, config, logger) │
+└─────────────────────────────────────────────────────┘
+```
+
+### Test Counts by Category
+
+| Category | Files | Approx Tests | CI Workflow |
+|----------|-------|-------------|-------------|
+| Domain/Network | 6 | 50 | None |
+| Chroot | 5 | 70 | `test-chroot.yml` (4 jobs) |
+| Protocol/Security | 8 | 100 | None |
+| Container/Ops | 7 | 45 | None |
+| Unit Tests | 19 | ~200 | `test-coverage.yml` |
+| Smoke Tests | 4 | N/A | Per-workflow (scheduled + PR) |
+| Build-Test | 8 | N/A | Per-workflow (PR + dispatch) |
+
+---
+
+## What's Covered
+
+### 1. Chroot Filesystem Isolation (Strong)
+
+The chroot tests are the most mature, run in CI, and cover critical scenarios:
+
+- **Language runtimes**: Python, Node.js, Go, Java, .NET, Ruby, Rust all verified accessible through chroot
+- **Package managers**: pip, npm, cargo, maven, dotnet, gem, go modules — all tested for registry connectivity
+- **Security properties**: NET_ADMIN/SYS_CHROOT capability drop, Docker socket hidden, non-root execution
+- **/proc filesystem**: Dynamic mount verified for JVM and .NET CLR compatibility
+- **Shell features**: Pipes, redirects, command substitution, compound commands all work in chroot
+
+**CI coverage**: 4 parallel jobs in `test-chroot.yml` exercise these tests on every PR.
+
+### 2. Credential Isolation (Strong)
+
+Multi-layered defense tested at each level:
+
+- **Credential file hiding**: Docker config, GitHub CLI tokens, npmrc auth tokens all verified hidden via `/dev/null` overlays
+- **Exfiltration resistance**: base64 encoding, xxd pipelines, grep patterns all tested — return empty
+- **Chroot bypass prevention**: Specific regression test for the vulnerability where credentials were accessible at `$HOME` but not `/host$HOME`
+- **API proxy sidecar**: Agent gets placeholder tokens; real keys held by proxy. Healthchecks for OpenAI, Anthropic, Copilot
+- **One-shot token library**: LD_PRELOAD intercepts `getenv()`, caches value, clears from environment. Tested in both container and chroot modes
+- **Token unsetting from /proc/1/environ**: GITHUB_TOKEN, OPENAI_API_KEY, ANTHROPIC_API_KEY all verified cleared
+
+### 3. Multi-Engine Smoke Tests (Strong)
+
+Real AI agents running through the full AWF pipeline:
+
+- **Claude**: GitHub MCP, Playwright browser automation, file I/O, bash tools
+- **Copilot**: Same + web-fetch, agentic-workflows tools
+- **Codex**: GH CLI safe inputs, Tavily web search, discussion interactions
+
+### 4. Multi-Language Build-Test (Strong)
+
+8 language ecosystems tested with real open-source projects:
+
+- Bun, C++, Deno, .NET, Go, Java, Node.js, Rust
+- Each clones a test repo, installs dependencies, builds, and runs tests through AWF
+
+### 5. Exit Code Propagation (Good)
+
+15 tests covering exit codes 0-255, command exit codes, pipeline behavior. Critical for CI/CD integration where non-zero = failure.
+
+---
+
+## Coverage Heat Map
+
+A visual overview of what's tested vs. not:
+
+```
+Feature                            Unit  Integration  CI   Smoke  Build-Test
+─────────────────────────────────────────────────────────────────────────
+Domain allow-list                  ✅    ✅           ❌   ✅     ✅
+Domain deny-list (--block-domains) ❌    ❌           ❌   ❌     ❌
+Wildcard patterns                  ✅    ✅           ❌   ❌     ❌
+Empty domains (air-gapped)         ❌    ✅           ❌   ❌     ❌
+DNS server restriction             ✅    ⚠️ *         ❌   ❌     ❌
+Network security (SSRF, bypass)    ❌    ✅           ❌   ❌     ❌
+Chroot languages                   ❌    ✅           ✅   ✅     ✅
+Chroot package managers            ❌    ✅           ✅   ❌     ✅
+Chroot /proc filesystem            ❌    ✅           ✅   ❌     ❌
+Chroot edge cases                  ❌    ✅           ✅   ❌     ❌
+Credential hiding                  ❌    ✅           ❌   ❌     ❌
+Token unsetting                    ❌    ✅           ❌   ❌     ❌
+One-shot tokens (LD_PRELOAD)       ❌    ✅           ❌   ❌     ❌
+API proxy sidecar                  ❌    ✅           ❌   ❌     ❌
+Protocol support (HTTP/HTTPS)      ❌    ✅           ❌   ❌     ❌
+IPv6                               ❌    ✅           ❌   ❌     ❌
+Exit code propagation              ❌    ✅           ❌   ❌     ❌
+Error handling                     ❌    ✅           ❌   ❌     ❌
+Volume mounts                      ❌    ✅           ❌   ❌     ❌
+Container workdir                  ❌    ✅           ❌   ❌     ❌
+Git operations                     ❌    ✅           ❌   ❌     ❌
+Environment variables              ❌    ✅           ❌   ❌     ❌
+--env-all                          ❌    ❌           ❌   ❌     ❌
+SSL Bump                           ✅    ❌           ❌   ❌     ❌
+Log commands                       ✅    ⚠️ *         ❌   ❌     ❌
+Docker unavailability              ❌    ✅           ❌   ❌     ❌
+Docker warning stub                ❌    ❌ **        ❌   ❌     ❌
+Setup action (action.yml)          ❌    ❌           ✅   ❌     ❌
+Container security scan            ❌    ❌           ✅   ❌     ❌
+Dependency audit                   ❌    ❌           ✅   ❌     ❌
+
+* ⚠️ = Tests exist but have significant gaps (see detailed docs)
+** = Tests exist but are skipped
+```
+
+---
+
+## Test Infrastructure Summary
+
+### How Tests Run
+
+- **Serial execution** (`maxWorkers: 1`) — Docker network/container conflicts prevent parallelism
+- **120-second timeout** per test — container lifecycle takes 15-25 seconds
+- **Batch runner** groups commands sharing the same config into single containers — reduces ~73 startups to ~27 for chroot tests
+- **Custom Jest matchers**: `toSucceed()`, `toFail()`, `toExitWithCode()`, `toTimeout()`, `toAllowDomain()`, `toBlockDomain()`
+- **4-stage cleanup**: pre-test TypeScript cleanup → AWF normal exit → AWF signal handlers → CI always-cleanup
+
+### Infrastructure Limitations
+
+1. Docker + sudo required — no lightweight local testing
+2. Batch runner loses individual stderr (merged via `2>&1`)
+3. Log-based matchers require `keepContainers: true`
+4. Aggressive `docker prune` in cleanup can affect non-AWF containers
+5. No retry logic for flaky network tests
+
+See [test-infra.md](test-analysis/test-infra.md) for full infrastructure analysis.
+
+---
+
+## Detailed Analysis Documents
+
+Each document provides per-test-case analysis with plain-language descriptions, real-world mappings, and gap identification:
+
+- **[Domain & Network Tests](test-analysis/domain-network.md)** — Domain filtering, DNS, network security, localhost
+- **[Chroot Tests](test-analysis/chroot.md)** — Sandbox isolation, languages, package managers, /proc, edge cases
+- **[Protocol & Security Tests](test-analysis/protocol-security.md)** — HTTP/HTTPS, IPv6, API proxy, credentials, tokens, exit codes
+- **[Container & Operations Tests](test-analysis/container-ops.md)** — Workdir, volumes, git, env vars, logging, Docker availability
+- **[CI & Smoke Tests](test-analysis/ci-smoke.md)** — All 27 CI/smoke/build-test workflows analyzed
+- **[Test Infrastructure](test-analysis/test-infra.md)** — Runner architecture, batch pattern, cleanup strategy, limitations
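
Reviewer note on the batch runner described under "How Tests Run": the saving comes from grouping commands that share a configuration so that each group needs only one container startup. The following is a minimal sketch of that grouping idea, assuming each test case carries a flat `config` object and a single `command` string. `groupByConfig` and the field names are hypothetical and are not taken from the repository's fixtures.

```javascript
// Hypothetical sketch of config-based batching; not the repository's runner.
function groupByConfig(testCases) {
  const batches = new Map();
  for (const tc of testCases) {
    // Serialize with sorted top-level keys so that equal configs written in a
    // different key order still map to the same batch. Assumes a flat config:
    // the key list also filters the keys of any nested objects.
    const key = JSON.stringify(tc.config, Object.keys(tc.config).sort());
    if (!batches.has(key)) {
      batches.set(key, { config: tc.config, commands: [] });
    }
    batches.get(key).commands.push(tc.command);
  }
  // Each returned batch corresponds to one container startup.
  return [...batches.values()];
}

// Usage: three commands, two distinct configs, so two batches.
const exampleBatches = groupByConfig([
  { command: 'python3 --version', config: { allowDomains: ['pypi.org'] } },
  { command: 'pip --version', config: { allowDomains: ['pypi.org'] } },
  { command: 'node --version', config: { allowDomains: ['registry.npmjs.org'] } },
]);
console.log(exampleBatches.length);
```

A runner built this way would still have to split the combined output back into per-command results, which is where the merged-stderr limitation listed under "Infrastructure Limitations" comes from.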

docs/selective-mounting.md

Lines changed: 2 additions & 16 deletions
@@ -179,17 +179,6 @@ sudo awf \
   my-command
 ```
 
-### Full Filesystem Access (Not Recommended)
-
-```bash
-# ⚠️ Only use if absolutely necessary
-sudo awf --allow-full-filesystem-access --allow-domains github.com -- my-command
-
-# You'll see security warnings:
-# ⚠️ SECURITY WARNING: Full filesystem access enabled
-# The entire host filesystem is mounted with read-write access
-# This exposes sensitive credential files to potential prompt injection attacks
-```
 
 ## Comparison: Before vs After
 
@@ -289,7 +278,7 @@ docker inspect awf-agent --format '{{json .Mounts}}' | jq
 # - /tmp mounted
 # - $HOME mounted
 # - /dev/null mounted over credential files
-# - NO /:/host mount (unless --allow-full-filesystem-access used)
+# - NO /:/host mount
 ```
 
 ## Migration Guide
@@ -319,14 +308,11 @@ awf --allow-domains github.com -- cat /etc/custom/config.json
 
 # ✓ New: Use explicit mount
 awf --mount /etc/custom:/etc/custom:ro --allow-domains github.com -- cat /etc/custom/config.json
-
-# Or as last resort (not recommended):
-awf --allow-full-filesystem-access --allow-domains github.com -- cat /etc/custom/config.json
 ```
 
 ## Security Best Practices
 
-1. **Default to selective mounting** - Never use `--allow-full-filesystem-access` unless absolutely necessary
+1. **Default to selective mounting** - The default behavior provides the best security
 
 2. **Use read-only mounts** - When using `--mount`, prefer `:ro` for directories that don't need writes:
 ```bash
