
Commit 08b0469

ElleNajt and claude committed
Add VPC Direct Egress support for egress firewall
Route Cloud Run container traffic through a VPC where Cloud NGFW firewall policies control outbound access by domain name (FQDN rules).

We previously tried iptables inside the container but found that curl -6 bypasses iptables on Cloud Run, ip6tables kills the container, and /proc/sys is read-only. The VPC approach applies firewall rules at the GCP infrastructure level, outside the container.

Changes:
- Add vpc_network/vpc_subnet/vpc_egress to CloudRunClientConfig
- Configure run_v2.VpcAccess on job creation
- Add vpc_network/vpc_subnet/vpc_egress to ClaudeCodeClientConfig
- Document egress firewall setup in README (with example FQDN rules)
- Add integration test for VPC egress (allowed/blocked domains)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 7c03299 commit 08b0469

File tree

4 files changed

+284
-3
lines changed

safetytooling/infra/cloud_run/README.md

Lines changed: 110 additions & 1 deletion
@@ -177,12 +177,115 @@ client = ClaudeCodeClient(
177177
- **Without this, Claude could take over your entire GCP project** - don't skip this step!
178178

179179
**What this doesn't limit:**
180-
- Outbound network access (Claude could exfiltrate data to external URLs)
180+
- Outbound network access (see Egress Firewall below)
181181
- Anthropic API usage (Claude could use your API key for other purposes)
182182

183183
For the "yolo Claude" use case, the main risks are data exfiltration and API key abuse.
184184
Containers are ephemeral (destroyed after job), so there's no persistence risk.
185185

186+
## Egress Firewall (Recommended)
187+
188+
By default, containers can make outbound requests to any host. To restrict egress (e.g., only allow `api.anthropic.com` and Google APIs), use VPC Direct Egress with Cloud NGFW firewall rules.
189+
190+
**How it works:** When `vpc_network` is set, all container traffic routes through a VPC where a Cloud NGFW firewall policy controls access by domain name (FQDN rules). This covers both IPv4 and IPv6.
191+
192+
**Usage:**
193+
194+
```python
195+
client = ClaudeCodeClient(
196+
project_id="my-project",
197+
gcs_bucket="my-bucket",
198+
api_key_secret="anthropic-api-key-USERNAME",
199+
service_account="claude-runner@my-project.iam.gserviceaccount.com",
200+
vpc_network="my-egress-vpc", # VPC with NGFW firewall policy
201+
vpc_subnet="my-egress-subnet", # Subnet in the VPC
202+
vpc_egress="all-traffic", # Route all traffic through VPC
203+
)
204+
```
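The three VPC options depend on each other: `vpc_subnet` is required when `vpc_network` is set, and `vpc_egress` accepts only `"all-traffic"` or `"private-ranges-only"`. A minimal sketch of checking this before a job is created follows. `validate_vpc_options` is a hypothetical helper written for illustration; it is not part of this commit or of the library.

```python
# Hypothetical helper for illustration only; not part of safetytooling.
VALID_VPC_EGRESS = {"all-traffic", "private-ranges-only"}


def validate_vpc_options(vpc_network, vpc_subnet, vpc_egress="all-traffic"):
    """Return a normalized dict of VPC options, or None when no VPC is used.

    Raises ValueError when the documented constraints are violated.
    """
    if vpc_network is None:
        # No VPC configured: egress settings are irrelevant.
        return None
    if not vpc_subnet:
        raise ValueError("vpc_subnet is required when vpc_network is set")
    if vpc_egress not in VALID_VPC_EGRESS:
        raise ValueError(
            f"vpc_egress must be one of {sorted(VALID_VPC_EGRESS)}, got {vpc_egress!r}"
        )
    return {"network": vpc_network, "subnetwork": vpc_subnet, "egress": vpc_egress}
```

Failing early on a bad combination may be preferable to discovering it when Cloud Run rejects the job, or when an unrecognized `vpc_egress` value silently falls back to the platform default.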
205+
206+
**One-time GCP setup:**
207+
208+
1. **VPC + Subnet** (with Private Google Access for Google APIs):
209+
```bash
210+
gcloud compute networks create egress-firewall-vpc --subnet-mode=custom
211+
gcloud compute networks subnets create egress-firewall-subnet \
212+
--network=egress-firewall-vpc --region=us-central1 \
213+
--range=10.100.0.0/24 --enable-private-ip-google-access
214+
```
215+
216+
2. **Cloud Router + NAT** (required for internet access from VPC):
217+
```bash
218+
gcloud compute routers create egress-firewall-router \
219+
--network=egress-firewall-vpc --region=us-central1
220+
gcloud compute routers nats create egress-firewall-nat \
221+
--router=egress-firewall-router --region=us-central1 \
222+
--auto-allocate-nat-external-ips \
223+
--endpoint-types=ENDPOINT_TYPE_VM,ENDPOINT_TYPE_MANAGED_PROXY_LB \
224+
--nat-all-subnet-ip-ranges
225+
```
226+
Note: `ENDPOINT_TYPE_MANAGED_PROXY_LB` is required — Cloud Run Direct VPC Egress uses managed proxy load balancers internally.
227+
228+
3. **Cloud NGFW firewall policy** with FQDN rules:
229+
```bash
230+
# Create policy and associate with VPC
231+
gcloud compute network-firewall-policies create egress-firewall-policy --global
232+
gcloud compute network-firewall-policies associations create \
233+
--firewall-policy=egress-firewall-policy --network=egress-firewall-vpc --global-firewall-policy
234+
235+
# Allow DNS
236+
gcloud compute network-firewall-policies rules create 100 \
237+
--firewall-policy=egress-firewall-policy --direction=EGRESS --action=allow \
238+
--dest-ip-ranges=0.0.0.0/0 --layer4-configs=udp:53,tcp:53 --global-firewall-policy
239+
240+
# Allow metadata server
241+
gcloud compute network-firewall-policies rules create 200 \
242+
--firewall-policy=egress-firewall-policy --direction=EGRESS --action=allow \
243+
--dest-ip-ranges=169.254.169.254/32 --layer4-configs=all --global-firewall-policy
244+
245+
# Allow Google APIs (list each subdomain — wildcards not supported)
246+
gcloud compute network-firewall-policies rules create 250 \
247+
--firewall-policy=egress-firewall-policy --direction=EGRESS --action=allow \
248+
--dest-fqdns=storage.googleapis.com,oauth2.googleapis.com,www.googleapis.com,\
249+
secretmanager.googleapis.com,accounts.googleapis.com,cloudresourcemanager.googleapis.com,\
250+
run.googleapis.com,logging.googleapis.com,gcr.io,iamcredentials.googleapis.com \
251+
--layer4-configs=tcp:443 --global-firewall-policy
252+
253+
# Allow Private Google Access VIPs
254+
gcloud compute network-firewall-policies rules create 300 \
255+
--firewall-policy=egress-firewall-policy --direction=EGRESS --action=allow \
256+
--dest-ip-ranges=199.36.153.0/24 --layer4-configs=tcp:443 --global-firewall-policy
257+
258+
# Allow your API providers
259+
gcloud compute network-firewall-policies rules create 400 \
260+
--firewall-policy=egress-firewall-policy --direction=EGRESS --action=allow \
261+
--dest-fqdns=api.anthropic.com,openrouter.ai \
262+
--layer4-configs=tcp:443 --global-firewall-policy
263+
264+
# Allow package managers + GitHub (needed if agents install dependencies)
265+
gcloud compute network-firewall-policies rules create 450 \
266+
--firewall-policy=egress-firewall-policy --direction=EGRESS --action=allow \
267+
--dest-fqdns=registry.npmjs.org,pypi.org,files.pythonhosted.org,\
268+
crates.io,static.crates.io,proxy.golang.org,sum.golang.org,index.golang.org,\
269+
rubygems.org,github.com,raw.githubusercontent.com,objects.githubusercontent.com \
270+
--layer4-configs=tcp:443,tcp:80 --global-firewall-policy
271+
272+
# Deny everything else (IPv4 + IPv6)
273+
gcloud compute network-firewall-policies rules create 10000 \
274+
--firewall-policy=egress-firewall-policy --direction=EGRESS --action=deny \
275+
--dest-ip-ranges=0.0.0.0/0 --layer4-configs=all --global-firewall-policy
276+
gcloud compute network-firewall-policies rules create 10001 \
277+
--firewall-policy=egress-firewall-policy --direction=EGRESS --action=deny \
278+
--dest-ip-ranges=::/0 --layer4-configs=all --global-firewall-policy
279+
```
280+
281+
**Costs:** ~$32/month for Cloud NAT gateway + ~$0.018/GB for NGFW FQDN rule evaluation. The NAT gateway runs 24/7 regardless of job activity.
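As a rough worked example of the figures above, the sketch below adds the fixed NAT gateway charge to the per-GB FQDN evaluation charge. The two constants are the approximate numbers quoted in this section, not authoritative pricing, and `estimate_monthly_cost` is a hypothetical name used only for illustration.

```python
# Approximate figures taken from the Costs note above; check current GCP pricing.
NAT_GATEWAY_USD_PER_MONTH = 32.0
NGFW_FQDN_USD_PER_GB = 0.018


def estimate_monthly_cost(egress_gb):
    """Estimate the monthly egress firewall cost in USD for a given traffic volume."""
    if egress_gb < 0:
        raise ValueError("egress_gb must be non-negative")
    # The NAT gateway is billed even when no jobs run, so it is a fixed term.
    total = NAT_GATEWAY_USD_PER_MONTH + NGFW_FQDN_USD_PER_GB * egress_gb
    return round(total, 2)
```

For low-traffic workloads the fixed NAT charge dominates, so the estimate changes little until traffic reaches hundreds of gigabytes per month.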
282+
283+
**Key facts:**
284+
- FQDN rules don't support wildcards — must list each Google API subdomain individually
285+
- ~20s cold start penalty on first outbound connection (NAT port allocation)
286+
- IPv6 is fully blocked at the VPC level (deny `::/0`)
287+
- Cloud NGFW Standard tier pricing applies for FQDN rule traffic
288+
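Because FQDN rules do not accept wildcards, it can help to build the `--dest-fqdns` value from an explicit list and reject wildcard entries before calling `gcloud`. The following is a minimal sketch; `build_dest_fqdns` is a hypothetical helper, not part of this commit.

```python
def build_dest_fqdns(domains):
    """Join domain names into a comma-separated value for --dest-fqdns.

    Hypothetical helper: trims whitespace, lowercases, drops duplicates
    (keeping first-seen order), and rejects wildcard entries.
    """
    cleaned = []
    for domain in domains:
        name = domain.strip().lower()
        if not name:
            continue  # skip empty entries
        if "*" in name:
            raise ValueError(f"wildcards are not supported in FQDN rules: {domain!r}")
        if name not in cleaned:
            cleaned.append(name)
    if not cleaned:
        raise ValueError("at least one domain is required")
    return ",".join(cleaned)
```

The returned string can be passed as the value of `--dest-fqdns` in the rule commands above, for example `build_dest_fqdns(["api.anthropic.com", "openrouter.ai"])`.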
186289
## How It Works
187290

188291
```
@@ -255,6 +358,9 @@ ClaudeCodeClientConfig(
255358
memory: str = "2Gi", # Up to 32Gi
256359
skip_permissions: bool = True, # --dangerously-skip-permissions
257360
image: str = DEFAULT_CLAUDE_CODE_IMAGE, # Pre-built image with Claude Code
361+
vpc_network: str = None, # VPC for egress firewall (see Egress Firewall section)
362+
vpc_subnet: str = None, # Subnet in the VPC (required when vpc_network is set)
363+
vpc_egress: str = "all-traffic", # "all-traffic" or "private-ranges-only"
258364
)
259365
```
260366

@@ -333,6 +439,9 @@ CloudRunClientConfig(
333439
env: dict = {}, # Environment variables
334440
secrets: dict = {}, # Secret Manager secrets as env vars
335441
service_account: str = None, # Restricted service account (see Security Hardening)
442+
vpc_network: str = None, # VPC for egress firewall
443+
vpc_subnet: str = None, # Subnet in the VPC
444+
vpc_egress: str = None, # "all-traffic" or "private-ranges-only"
336445
)
337446
```
338447

safetytooling/infra/cloud_run/claude_code_client.py

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -87,6 +87,12 @@ class ClaudeCodeClientConfig:
8787
SECURITY: Use a restricted service account to limit container access.
8888
See README for setup instructions.
8989
Format: "name@project.iam.gserviceaccount.com"
90+
vpc_network: VPC network name for Direct VPC Egress. When set with vpc_egress="all-traffic",
91+
all outbound traffic routes through the VPC where Cloud NGFW firewall policies
92+
control access. This covers both IPv4 and IPv6. Requires a Cloud NAT gateway
93+
with ENDPOINT_TYPE_MANAGED_PROXY_LB on the VPC for internet access.
94+
vpc_subnet: VPC subnet name (required when vpc_network is set).
95+
vpc_egress: VPC egress setting - "all-traffic" or "private-ranges-only" (default: "all-traffic").
9096
"""
9197

9298
project_id: str
@@ -102,6 +108,9 @@ class ClaudeCodeClientConfig:
102108
image: str = DEFAULT_CLAUDE_CODE_IMAGE
103109
api_key_secret: str | None = None
104110
service_account: str | None = None
111+
vpc_network: str | None = None
112+
vpc_subnet: str | None = None
113+
vpc_egress: str = "all-traffic"
105114

106115

107116
# Instructions prepended to task when output_instructions=True
@@ -405,6 +414,9 @@ def __init__(
405414
env={},
406415
secrets=secrets,
407416
service_account=self.config.service_account,
417+
vpc_network=self.config.vpc_network,
418+
vpc_subnet=self.config.vpc_subnet,
419+
vpc_egress=self.config.vpc_egress if self.config.vpc_network else None,
408420
)
409421
self._cloud_run = CloudRunClient(cloud_run_config)
410422

safetytooling/infra/cloud_run/cloud_run_client.py

Lines changed: 20 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -102,6 +102,9 @@ class CloudRunClientConfig:
102102
env: dict[str, str] = field(default_factory=dict)
103103
secrets: dict[str, str] = field(default_factory=dict)
104104
service_account: str | None = None
105+
vpc_network: str | None = None
106+
vpc_subnet: str | None = None
107+
vpc_egress: str | None = None # "all-traffic" or "private-ranges-only"
105108

106109

107110
@dataclass(frozen=True)
@@ -496,15 +499,27 @@ def _get_or_create_job(self, timeout: int) -> str:
496499
if self.config.service_account:
497500
job.template.template.service_account = self.config.service_account
498501

502+
if self.config.vpc_network:
503+
vpc_access = run_v2.VpcAccess(
504+
network_interfaces=[
505+
run_v2.VpcAccess.NetworkInterface(
506+
network=self.config.vpc_network,
507+
subnetwork=self.config.vpc_subnet,
508+
)
509+
],
510+
)
511+
if self.config.vpc_egress == "all-traffic":
512+
vpc_access.egress = run_v2.VpcAccess.VpcEgress.ALL_TRAFFIC
513+
job.template.template.vpc_access = vpc_access
514+
499515
parent = f"projects/{self.config.project_id}/locations/{self.config.region}"
500-
request = CreateJobRequest(parent=parent, job=job, job_id=job_id)
501516

517+
request = CreateJobRequest(parent=parent, job=job, job_id=job_id)
502518
try:
503519
operation = self._jobs_client.create_job(request=request)
504520
created_job = operation.result()
505521
job_name = created_job.name
506522
except Exception as e:
507-
# Job might already exist (from previous process/session)
508523
if "already exists" in str(e).lower():
509524
job_name = f"{parent}/jobs/{job_id}"
510525
else:
@@ -657,6 +672,9 @@ def _compute_config_hash(self) -> str:
657672
self.config.memory,
658673
self.config.service_account or "",
659674
self.config.gcs_bucket,
675+
self.config.vpc_network or "",
676+
self.config.vpc_subnet or "",
677+
self.config.vpc_egress or "",
660678
]
661679
# Add sorted env vars
662680
for k, v in sorted(self.config.env.items()):

tests/test_vpc_egress.py

Lines changed: 142 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,142 @@
1+
"""Integration tests for VPC Direct Egress firewall on Cloud Run.
2+
3+
Tests that when vpc_network is configured, the Cloud NGFW firewall policy
4+
correctly allows/blocks outbound traffic by domain.
5+
6+
Requires:
7+
- GCP credentials (gcloud auth application-default login)
8+
- VPC with Cloud NGFW firewall policy (see README.md Egress Firewall section)
9+
- Cloud NAT with ENDPOINT_TYPE_MANAGED_PROXY_LB
10+
11+
Additional environment variables:
12+
- VPC_NETWORK: VPC network name (e.g., "egress-firewall-vpc")
13+
- VPC_SUBNET: VPC subnet name (e.g., "egress-firewall-subnet")
14+
15+
Run with: pytest tests/test_vpc_egress.py -v --run-integration
16+
"""
17+
18+
import os
19+
import re
20+
21+
import pytest
22+
23+
from safetytooling.infra.cloud_run import (
24+
ClaudeCodeClient,
25+
ClaudeCodeClientConfig,
26+
ClaudeCodeTask,
27+
)
28+
29+
30+
@pytest.fixture
31+
def integration_enabled(request):
32+
if not request.config.getoption("--run-integration", default=False):
33+
pytest.skip("Integration tests skipped. Use --run-integration to run.")
34+
35+
36+
@pytest.fixture
37+
def vpc_config():
38+
"""Get VPC + GCP config from environment or skip."""
39+
project_id = os.environ.get("GCP_PROJECT_ID")
40+
gcs_bucket = os.environ.get("GCS_BUCKET")
41+
api_key_secret = os.environ.get("API_KEY_SECRET")
42+
service_account = os.environ.get("SERVICE_ACCOUNT")
43+
vpc_network = os.environ.get("VPC_NETWORK")
44+
vpc_subnet = os.environ.get("VPC_SUBNET")
45+
46+
missing = []
47+
for name, val in [
48+
("GCP_PROJECT_ID", project_id),
49+
("GCS_BUCKET", gcs_bucket),
50+
("API_KEY_SECRET", api_key_secret),
51+
("SERVICE_ACCOUNT", service_account),
52+
("VPC_NETWORK", vpc_network),
53+
("VPC_SUBNET", vpc_subnet),
54+
]:
55+
if not val:
56+
missing.append(name)
57+
if missing:
58+
pytest.skip(f"Missing env vars: {', '.join(missing)}")
59+
60+
return ClaudeCodeClientConfig(
61+
project_id=project_id,
62+
gcs_bucket=gcs_bucket,
63+
api_key_secret=api_key_secret,
64+
service_account=service_account,
65+
vpc_network=vpc_network,
66+
vpc_subnet=vpc_subnet,
67+
vpc_egress="all-traffic",
68+
timeout=300,
69+
)
70+
71+
72+
# Shell script that tests connectivity and prints structured results.
73+
# Each test prints "TEST <name>: PASS" or "TEST <name>: FAIL".
74+
TEST_SCRIPT = r"""
75+
echo "=== VPC Egress Firewall Tests ==="
76+
77+
# Test allowed domain (Anthropic API)
78+
HTTP_CODE=$(curl -4 -s -o /dev/null -w '%{http_code}' --connect-timeout 60 https://api.anthropic.com/v1/messages)
79+
if [ "$HTTP_CODE" != "000" ]; then echo "TEST allowed_anthropic: PASS"; else echo "TEST allowed_anthropic: FAIL"; fi
80+
81+
# Test allowed domain (PyPI)
82+
HTTP_CODE=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 30 https://pypi.org/simple/)
83+
if [ "$HTTP_CODE" != "000" ]; then echo "TEST allowed_pypi: PASS"; else echo "TEST allowed_pypi: FAIL"; fi
84+
85+
# Test allowed domain (npm)
86+
HTTP_CODE=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 30 https://registry.npmjs.org/)
87+
if [ "$HTTP_CODE" != "000" ]; then echo "TEST allowed_npm: PASS"; else echo "TEST allowed_npm: FAIL"; fi
88+
89+
# Test blocked domain (example.com)
90+
HTTP_CODE=$(curl -4 -s -o /dev/null -w '%{http_code}' --connect-timeout 15 https://example.com 2>/dev/null)
91+
if [ "$HTTP_CODE" = "000" ]; then echo "TEST blocked_example: PASS"; else echo "TEST blocked_example: FAIL"; fi
92+
93+
# Test IPv6 blocked
94+
HTTP_CODE=$(curl -6 -s -o /dev/null -w '%{http_code}' --connect-timeout 15 https://api.anthropic.com/v1/messages 2>/dev/null)
95+
if [ "$HTTP_CODE" = "000" ]; then echo "TEST blocked_ipv6: PASS"; else echo "TEST blocked_ipv6: FAIL"; fi
96+
97+
echo "=== Done ==="
98+
"""
99+
100+
101+
def _parse_test_results(output: str) -> dict[str, bool]:
102+
"""Parse 'TEST name: PASS/FAIL' lines from script output."""
103+
results = {}
104+
for match in re.finditer(r"TEST (\w+): (PASS|FAIL)", output):
105+
results[match.group(1)] = match.group(2) == "PASS"
106+
return results
107+
108+
109+
class TestVPCEgress:
110+
"""Test that VPC egress firewall blocks/allows the right domains."""
111+
112+
def test_egress_firewall(self, integration_enabled, vpc_config):
113+
"""Run all egress tests in a single Cloud Run job."""
114+
client = ClaudeCodeClient(vpc_config)
115+
116+
tasks = [
117+
ClaudeCodeTask(
118+
id="egress-test",
119+
task="echo 'Tests ran in pre_claude_command'",
120+
pre_claude_command=TEST_SCRIPT,
121+
output_instructions=False,
122+
n=1,
123+
),
124+
]
125+
126+
results = client.run(tasks)
127+
task = tasks[0]
128+
result = results[task][0]
129+
130+
assert result.returncode == 0, f"Job failed: {result.error}"
131+
132+
parsed = _parse_test_results(result.response)
133+
assert len(parsed) >= 4, f"Expected >= 4 test results, got {len(parsed)}: {parsed}"
134+
135+
# Allowed domains should connect
136+
assert parsed.get("allowed_anthropic"), "api.anthropic.com should be reachable"
137+
assert parsed.get("allowed_pypi"), "pypi.org should be reachable"
138+
assert parsed.get("allowed_npm"), "registry.npmjs.org should be reachable"
139+
140+
# Blocked domains should not connect
141+
assert parsed.get("blocked_example"), "example.com should be blocked"
142+
assert parsed.get("blocked_ipv6"), "IPv6 should be blocked"
