Commit aa99e63

ElleNajt and claude authored
Add VPC egress firewall support for Cloud Run (#158)
* Upload large commands to GCS when they exceed env var limits

  Cloud Run passes commands via environment variables, which have a ~32KB limit. When a command exceeds 30KB, upload it to GCS and replace it with a bootstrap script that downloads and executes it.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add VPC Direct Egress support for egress firewall

  Route Cloud Run container traffic through a VPC where Cloud NGFW firewall policies control outbound access by domain name (FQDN rules). We previously tried iptables inside the container but found that curl -6 bypasses iptables on Cloud Run, ip6tables kills the container, and /proc/sys is read-only. The VPC approach applies firewall rules at the GCP infrastructure level, outside the container.

  Changes:
  - Add vpc_network/vpc_subnet/vpc_egress to CloudRunClientConfig
  - Configure run_v2.VpcAccess on job creation
  - Add vpc_network/vpc_subnet/vpc_egress to ClaudeCodeClientConfig
  - Document egress firewall setup in README (with example FQDN rules)
  - Add integration test for VPC egress (allowed/blocked domains)

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 024ebf1 commit aa99e63
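
The first bullet of the commit message describes replacing an oversized command with a small bootstrap script. A minimal sketch of that decision follows, assuming the 30000-byte threshold and the `cloudrun-commands` prefix shown in the `cloud_run_client.py` diff. `plan_command` is a hypothetical helper that only computes what would be run and where the script would be stored; it does not talk to GCS.

```python
from __future__ import annotations

import hashlib

COMMAND_SIZE_LIMIT = 30_000  # bytes; same value as _COMMAND_SIZE_LIMIT in the diff
GCS_COMMANDS_PREFIX = "cloudrun-commands"  # same value as _GCS_COMMANDS_PREFIX


def plan_command(command: str, bucket: str) -> tuple[str, str | None]:
    """Return (command_to_run, gcs_object_path_or_None).

    Hypothetical helper: small commands pass through unchanged; large ones
    are replaced by a bootstrap that downloads and runs the stored script.
    """
    encoded = command.encode()
    if len(encoded) <= COMMAND_SIZE_LIMIT:
        return command, None

    # Content-addressed object name: identical commands map to the same object.
    digest = hashlib.sha256(encoded).hexdigest()[:16]
    gcs_path = f"{GCS_COMMANDS_PREFIX}/{digest}.sh"
    bootstrap = (
        f'gcloud storage cp "gs://{bucket}/{gcs_path}" /tmp/large_command.sh '
        "&& bash /tmp/large_command.sh"
    )
    return bootstrap, gcs_path


small_cmd, small_obj = plan_command("echo hello", "my-bucket")
large_cmd, large_obj = plan_command("echo " + "x" * 40_000, "my-bucket")
print(small_obj, large_obj)
```

Because the object name is derived from a hash of the command, re-running the same large command can reuse the existing object, which is what the `blob.exists()` check in the diff relies on.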

4 files changed: +323 -3 lines changed

safetytooling/infra/cloud_run/README.md

Lines changed: 126 additions & 1 deletion
@@ -177,12 +177,131 @@ client = ClaudeCodeClient(
 - **Without this, Claude could take over your entire GCP project** - don't skip this step!
 
 **What this doesn't limit:**
-- Outbound network access (Claude could exfiltrate data to external URLs)
+- Outbound network access (see Egress Firewall below)
 - Anthropic API usage (Claude could use your API key for other purposes)
 
 For the "yolo Claude" use case, the main risks are data exfiltration and API key abuse.
 Containers are ephemeral (destroyed after job), so there's no persistence risk.
 
+## Egress Firewall (Recommended)
+
+By default, containers can make outbound requests to any host. To restrict egress (e.g., only allow `api.anthropic.com` and Google APIs), use VPC Direct Egress with Cloud NGFW firewall rules.
+
+**How it works:** When `vpc_network` is set, all container traffic routes through a VPC where a Cloud NGFW firewall policy controls access by domain name (FQDN rules). This covers both IPv4 and IPv6.
+
+**Usage:**
+
+```python
+client = ClaudeCodeClient(
+    project_id="my-project",
+    gcs_bucket="my-bucket",
+    api_key_secret="anthropic-api-key-USERNAME",
+    service_account="claude-runner@my-project.iam.gserviceaccount.com",
+    vpc_network="my-egress-vpc", # VPC with NGFW firewall policy
+    vpc_subnet="my-egress-subnet", # Subnet in the VPC
+    vpc_egress="all-traffic", # Route all traffic through VPC
+)
+```
+
+**One-time GCP setup:**
+
+1. **VPC + Subnet** (with Private Google Access for Google APIs):
+```bash
+gcloud compute networks create egress-firewall-vpc --subnet-mode=custom
+gcloud compute networks subnets create egress-firewall-subnet \
+--network=egress-firewall-vpc --region=us-central1 \
+--range=10.100.0.0/24 --enable-private-ip-google-access
+```
+
+2. **Cloud Router + NAT** (required for internet access from VPC):
+```bash
+gcloud compute routers create egress-firewall-router \
+--network=egress-firewall-vpc --region=us-central1
+gcloud compute routers nats create egress-firewall-nat \
+--router=egress-firewall-router --region=us-central1 \
+--auto-allocate-nat-external-ips \
+--endpoint-types=ENDPOINT_TYPE_VM,ENDPOINT_TYPE_MANAGED_PROXY_LB \
+--nat-all-subnet-ip-ranges
+```
+Note: `ENDPOINT_TYPE_MANAGED_PROXY_LB` is required — Cloud Run Direct VPC Egress uses managed proxy load balancers internally.
+
+3. **Cloud NGFW firewall policy** with FQDN rules:
+```bash
+# Create policy and associate with VPC
+gcloud compute network-firewall-policies create egress-firewall-policy --global
+gcloud compute network-firewall-policies associations create \
+--firewall-policy=egress-firewall-policy --network=egress-firewall-vpc --global-firewall-policy
+
+# Allow DNS
+gcloud compute network-firewall-policies rules create 100 \
+--firewall-policy=egress-firewall-policy --direction=EGRESS --action=allow \
+--dest-ip-ranges=0.0.0.0/0 --layer4-configs=udp:53,tcp:53 --global-firewall-policy
+
+# Allow metadata server
+gcloud compute network-firewall-policies rules create 200 \
+--firewall-policy=egress-firewall-policy --direction=EGRESS --action=allow \
+--dest-ip-ranges=169.254.169.254/32 --layer4-configs=all --global-firewall-policy
+
+# Allow Google APIs (list each subdomain — wildcards not supported)
+gcloud compute network-firewall-policies rules create 250 \
+--firewall-policy=egress-firewall-policy --direction=EGRESS --action=allow \
+--dest-fqdns=storage.googleapis.com,oauth2.googleapis.com,www.googleapis.com,\
+secretmanager.googleapis.com,accounts.googleapis.com,cloudresourcemanager.googleapis.com,\
+run.googleapis.com,logging.googleapis.com,gcr.io,iamcredentials.googleapis.com \
+--layer4-configs=tcp:443 --global-firewall-policy
+
+# Allow Private Google Access VIPs
+gcloud compute network-firewall-policies rules create 300 \
+--firewall-policy=egress-firewall-policy --direction=EGRESS --action=allow \
+--dest-ip-ranges=199.36.153.0/24 --layer4-configs=tcp:443 --global-firewall-policy
+
+# Allow your API providers
+gcloud compute network-firewall-policies rules create 400 \
+--firewall-policy=egress-firewall-policy --direction=EGRESS --action=allow \
+--dest-fqdns=api.anthropic.com,openrouter.ai \
+--layer4-configs=tcp:443 --global-firewall-policy
+
+# Allow package managers + GitHub (needed if agents install dependencies)
+gcloud compute network-firewall-policies rules create 450 \
+--firewall-policy=egress-firewall-policy --direction=EGRESS --action=allow \
+--dest-fqdns=registry.npmjs.org,pypi.org,files.pythonhosted.org,\
+crates.io,static.crates.io,proxy.golang.org,sum.golang.org,index.golang.org,\
+rubygems.org,github.com,raw.githubusercontent.com,objects.githubusercontent.com \
+--layer4-configs=tcp:443,tcp:80 --global-firewall-policy
+
+# Deny everything else (IPv4 + IPv6)
+gcloud compute network-firewall-policies rules create 10000 \
+--firewall-policy=egress-firewall-policy --direction=EGRESS --action=deny \
+--dest-ip-ranges=0.0.0.0/0 --layer4-configs=all --global-firewall-policy
+gcloud compute network-firewall-policies rules create 10001 \
+--firewall-policy=egress-firewall-policy --direction=EGRESS --action=deny \
+--dest-ip-ranges=::/0 --layer4-configs=all --global-firewall-policy
+```
+
+**Costs:** Cloud NAT charges per VM-hour and per GB processed ([pricing](https://cloud.google.com/nat/pricing)). NGFW Standard charges $0.018/GB on internet-bound traffic evaluated by FQDN rules ([pricing](https://cloud.google.com/firewall/pricing)) — negligible for typical API call workloads but could add up if transferring large files.
+
+**Key facts:**
+- FQDN rules don't support wildcards — must list each Google API subdomain individually
+- IPv6 is fully blocked at the VPC level (deny `::/0`)
+
+**Verifying your setup:**
+
+An integration test is included at `tests/test_vpc_egress.py`. It launches a Cloud Run container with VPC egress enabled and curls several domains from inside:
+
+- Allowed domains (`api.anthropic.com`, `pypi.org`, `registry.npmjs.org`) should return an HTTP response
+- Blocked domains (`example.com`) should time out (HTTP code `000`)
+- IPv6 requests (`curl -6`) should be blocked
+
+```bash
+# Set env vars for your GCP project
+export GCP_PROJECT_ID=my-project GCS_BUCKET=my-bucket
+export API_KEY_SECRET=anthropic-api-key-USERNAME
+export SERVICE_ACCOUNT=claude-runner@my-project.iam.gserviceaccount.com
+export VPC_NETWORK=egress-firewall-vpc VPC_SUBNET=egress-firewall-subnet
+
+pytest tests/test_vpc_egress.py -v --run-integration
+```
+
 ## How It Works
 
 ```
@@ -255,6 +374,9 @@ ClaudeCodeClientConfig(
     memory: str = "2Gi", # Up to 32Gi
     skip_permissions: bool = True, # --dangerously-skip-permissions
     image: str = DEFAULT_CLAUDE_CODE_IMAGE, # Pre-built image with Claude Code
+    vpc_network: str = None, # VPC for egress firewall (see Egress Firewall section)
+    vpc_subnet: str = None, # Subnet in the VPC (required when vpc_network is set)
+    vpc_egress: str = "all-traffic", # "all-traffic" or "private-ranges-only"
 )
 ```
 
@@ -333,6 +455,9 @@ CloudRunClientConfig(
     env: dict = {}, # Environment variables
     secrets: dict = {}, # Secret Manager secrets as env vars
     service_account: str = None, # Restricted service account (see Security Hardening)
+    vpc_network: str = None, # VPC for egress firewall
+    vpc_subnet: str = None, # Subnet in the VPC
+    vpc_egress: str = None, # "all-traffic" or "private-ranges-only"
 )
 ```
 
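
The firewall policy in the README's Egress Firewall section depends on rule priority: lower numbers are evaluated first and the first matching rule decides the outcome. The toy evaluator below is a sketch of that ordering only, not of Cloud NGFW itself. The `Rule` class and `evaluate` function are hypothetical; protocol (TCP vs UDP) is ignored, FQDN rules are matched by hostname rather than by resolved address, and the fallback of `"allow"` stands in for the VPC's implied allow-egress behaviour, which is why the README adds explicit deny rules at 10000 and 10001. The IP addresses are documentation placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """Simplified egress rule (hypothetical model, not the NGFW schema)."""
    priority: int
    action: str                          # "allow" or "deny"
    fqdns: frozenset[str] = frozenset()  # destination hostnames
    ip_ranges: tuple[str, ...] = ()      # destination CIDR ranges
    ports: frozenset[int] | None = None  # None means every port


# A subset of the README rules, keeping their priorities.
RULES = [
    Rule(100, "allow", ip_ranges=("0.0.0.0/0",), ports=frozenset({53})),
    Rule(200, "allow", ip_ranges=("169.254.169.254/32",)),
    Rule(400, "allow", fqdns=frozenset({"api.anthropic.com", "openrouter.ai"}),
         ports=frozenset({443})),
    Rule(10000, "deny", ip_ranges=("0.0.0.0/0",)),
    Rule(10001, "deny", ip_ranges=("::/0",)),
]


def evaluate(rules: list[Rule], host: str, ip: str, port: int,
             default: str = "allow") -> str:
    """Return the action of the lowest-priority-number rule that matches."""
    addr = ip_address(ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        port_ok = rule.ports is None or port in rule.ports
        dest_ok = host in rule.fqdns or any(
            addr in ip_network(cidr) for cidr in rule.ip_ranges
        )
        if port_ok and dest_ok:
            return rule.action
    return default


print(evaluate(RULES, "api.anthropic.com", "203.0.113.10", 443))
print(evaluate(RULES, "example.com", "203.0.113.20", 443))
print(evaluate(RULES, "example.com", "2001:db8::1", 443))
```

In this model an IPv6 destination never matches `0.0.0.0/0`, so the separate `::/0` deny rule is what blocks it.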
safetytooling/infra/cloud_run/claude_code_client.py

Lines changed: 12 additions & 0 deletions
@@ -87,6 +87,12 @@ class ClaudeCodeClientConfig:
             SECURITY: Use a restricted service account to limit container access.
             See README for setup instructions.
             Format: "name@project.iam.gserviceaccount.com"
+        vpc_network: VPC network name for Direct VPC Egress. When set with vpc_egress="all-traffic",
+            all outbound traffic routes through the VPC where Cloud NGFW firewall policies
+            control access. This covers both IPv4 and IPv6. Requires a Cloud NAT gateway
+            with ENDPOINT_TYPE_MANAGED_PROXY_LB on the VPC for internet access.
+        vpc_subnet: VPC subnet name (required when vpc_network is set).
+        vpc_egress: VPC egress setting - "all-traffic" or "private-ranges-only" (default: "all-traffic").
     """
 
     project_id: str
@@ -102,6 +108,9 @@ class ClaudeCodeClientConfig:
     image: str = DEFAULT_CLAUDE_CODE_IMAGE
     api_key_secret: str | None = None
     service_account: str | None = None
+    vpc_network: str | None = None
+    vpc_subnet: str | None = None
+    vpc_egress: str = "all-traffic"
 
 
 # Instructions prepended to task when output_instructions=True
@@ -405,6 +414,9 @@ def __init__(
             env={},
             secrets=secrets,
             service_account=self.config.service_account,
+            vpc_network=self.config.vpc_network,
+            vpc_subnet=self.config.vpc_subnet,
+            vpc_egress=self.config.vpc_egress if self.config.vpc_network else None,
         )
         self._cloud_run = CloudRunClient(cloud_run_config)
 
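
The hunks above forward the three VPC fields from `ClaudeCodeClientConfig` to `CloudRunClientConfig`, passing `vpc_egress` only when `vpc_network` is set. The sketch below isolates that forwarding on stand-in classes. The validation steps (requiring `vpc_subnet`, rejecting unknown egress values, and clearing `vpc_subnet` when no network is set) are assumptions added for illustration; the diff documents the subnet requirement but does not enforce it. `WrapperVpcSettings` and `to_cloud_run_kwargs` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass

VALID_EGRESS = {"all-traffic", "private-ranges-only"}


@dataclass
class WrapperVpcSettings:
    """Stand-in for the VPC fields of ClaudeCodeClientConfig."""
    vpc_network: str | None = None
    vpc_subnet: str | None = None
    vpc_egress: str = "all-traffic"


def to_cloud_run_kwargs(cfg: WrapperVpcSettings) -> dict[str, str | None]:
    """Build the VPC keyword arguments for the lower-level config."""
    if cfg.vpc_network is None:
        # No VPC requested: the default egress string must not leak through.
        return {"vpc_network": None, "vpc_subnet": None, "vpc_egress": None}
    if not cfg.vpc_subnet:
        # Assumed check; the diff only documents this requirement.
        raise ValueError("vpc_subnet is required when vpc_network is set")
    if cfg.vpc_egress not in VALID_EGRESS:
        raise ValueError(f"vpc_egress must be one of {sorted(VALID_EGRESS)}")
    return {
        "vpc_network": cfg.vpc_network,
        "vpc_subnet": cfg.vpc_subnet,
        "vpc_egress": cfg.vpc_egress,
    }


print(to_cloud_run_kwargs(WrapperVpcSettings()))
print(to_cloud_run_kwargs(
    WrapperVpcSettings(vpc_network="my-egress-vpc", vpc_subnet="my-egress-subnet")
))
```

Dropping `vpc_egress` when no network is configured keeps the wrapper's `"all-traffic"` default from changing the job configuration (and therefore the config hash) for users who never enabled VPC egress.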
safetytooling/infra/cloud_run/cloud_run_client.py

Lines changed: 43 additions & 2 deletions
@@ -102,6 +102,9 @@ class CloudRunClientConfig:
     env: dict[str, str] = field(default_factory=dict)
     secrets: dict[str, str] = field(default_factory=dict)
     service_account: str | None = None
+    vpc_network: str | None = None
+    vpc_subnet: str | None = None
+    vpc_egress: str | None = None # "all-traffic" or "private-ranges-only"
 
 
 @dataclass(frozen=True)
@@ -496,15 +499,27 @@ def _get_or_create_job(self, timeout: int) -> str:
         if self.config.service_account:
             job.template.template.service_account = self.config.service_account
 
+        if self.config.vpc_network:
+            vpc_access = run_v2.VpcAccess(
+                network_interfaces=[
+                    run_v2.VpcAccess.NetworkInterface(
+                        network=self.config.vpc_network,
+                        subnetwork=self.config.vpc_subnet,
+                    )
+                ],
+            )
+            if self.config.vpc_egress == "all-traffic":
+                vpc_access.egress = run_v2.VpcAccess.VpcEgress.ALL_TRAFFIC
+            job.template.template.vpc_access = vpc_access
+
         parent = f"projects/{self.config.project_id}/locations/{self.config.region}"
-        request = CreateJobRequest(parent=parent, job=job, job_id=job_id)
 
+        request = CreateJobRequest(parent=parent, job=job, job_id=job_id)
         try:
             operation = self._jobs_client.create_job(request=request)
             created_job = operation.result()
             job_name = created_job.name
         except Exception as e:
-            # Job might already exist (from previous process/session)
             if "already exists" in str(e).lower():
                 job_name = f"{parent}/jobs/{job_id}"
             else:
@@ -513,6 +528,19 @@ def _get_or_create_job(self, timeout: int) -> str:
         self._job_cache[config_hash] = job_name
         return job_name
 
+    _GCS_COMMANDS_PREFIX: ClassVar[str] = "cloudrun-commands"
+    _COMMAND_SIZE_LIMIT: ClassVar[int] = 30000 # Leave headroom below 32768 env var limit
+
+    def _upload_command_to_gcs(self, command: str) -> str:
+        """Upload a large command to GCS and return its path."""
+        cmd_hash = hashlib.sha256(command.encode()).hexdigest()[:16]
+        gcs_path = f"{self._GCS_COMMANDS_PREFIX}/{cmd_hash}.sh"
+        bucket = self._storage_client.bucket(self.config.gcs_bucket)
+        blob = bucket.blob(gcs_path)
+        if not blob.exists():
+            blob.upload_from_string(command, content_type="text/plain")
+        return gcs_path
+
     def _run_job_execution(
         self,
         job_name: str,
@@ -524,7 +552,17 @@ def _run_job_execution(
         """Run an execution of an existing job with specific inputs/outputs/command.
 
         Uses RunJobRequest.Overrides to pass per-execution environment variables.
+        If the command exceeds the env var size limit, it's uploaded to GCS and
+        a small bootstrap script downloads and evals it.
         """
+        # If command is too large for an env var, stash it in GCS
+        if len(command.encode()) > self._COMMAND_SIZE_LIMIT:
+            gcs_path = self._upload_command_to_gcs(command)
+            command = (
+                f'gcloud storage cp "gs://{self.config.gcs_bucket}/{gcs_path}" /tmp/large_command.sh '
+                f"&& bash /tmp/large_command.sh"
+            )
+
         # Build env var overrides for this execution
         env_overrides = [
             run_v2.EnvVar(name="OUTPUT_GCS_PATH", value=output_gcs_path),
@@ -634,6 +672,9 @@ def _compute_config_hash(self) -> str:
             self.config.memory,
             self.config.service_account or "",
             self.config.gcs_bucket,
+            self.config.vpc_network or "",
+            self.config.vpc_subnet or "",
+            self.config.vpc_egress or "",
         ]
         # Add sorted env vars
         for k, v in sorted(self.config.env.items()):
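
The last hunk adds the VPC fields to the list that feeds `_compute_config_hash`, whose result keys `_job_cache`. A rough sketch of why that matters follows. The diff does not show how the parts are joined or digested, so the `"|"` separator, the SHA-256 digest, and the 12-character truncation are assumptions, and `config_hash` is a hypothetical stand-in for the real method.

```python
from __future__ import annotations

import hashlib


def config_hash(fields: list[str], env: dict[str, str]) -> str:
    """Hash the job-defining settings into a short cache key (illustrative)."""
    parts = list(fields)
    # Sorted so that dict insertion order does not change the key.
    for key, value in sorted(env.items()):
        parts.append(f"{key}={value}")
    return hashlib.sha256("|".join(parts).encode()).hexdigest()[:12]


def vpc_fields(network: str | None, subnet: str | None, egress: str | None) -> list[str]:
    # Mirrors the `or ""` normalisation in the diff: None and "" hash alike.
    return [network or "", subnet or "", egress or ""]


base = ["2Gi", "runner@my-project.iam.gserviceaccount.com", "my-bucket"]
no_vpc = config_hash(base + vpc_fields(None, None, None), {"A": "1"})
with_vpc = config_hash(
    base + vpc_fields("egress-firewall-vpc", "egress-firewall-subnet", "all-traffic"),
    {"A": "1"},
)
print(no_vpc != with_vpc)
```

If the VPC fields were left out of the hash, a client configured with a firewall VPC could pick up a cached job that was created without one and run with unrestricted egress.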
