Open
Summary
Add a unified, configurable security layer to the swerex-remote server and clients that provides encrypted transport (TLS), request authentication (API key or JWT), and optional mutual TLS (mTLS). This makes RemoteRuntime safe to expose across networks (Modal, Fargate, HPC, on-prem), aligns with enterprise requirements, and prevents unauthorized command execution.
User Story
As a platform owner or maintainer running SWE-ReX in cloud or multi-tenant environments, I want strong transport encryption and authenticated requests to the remote execution API so that only authorized clients can execute commands and all traffic is protected in transit.
Problem / Pain
- RemoteRuntime is a network-accessible control plane for executing commands. Without first-class auth/TLS, operators must rely on ad-hoc network controls, which is risky and error-prone.
- Current codebase elements indicate a FastAPI/uvicorn server with a CLI entry point but no documented built-in auth layer:
  - pyproject.toml declares the server entry point `[project.scripts] swerex-remote = "swerex.server:main"` (lines 91–93) and the FastAPI/uvicorn dependencies (lines 33–42).
- Existing issues cover HTTPS toggles for cloud deployments but do not define authentication modes:
  - #109 ("switch on https for modal/fargate") requests HTTPS for specific providers but does not address token/JWT auth or mTLS across all deployment modes.
- Security-critical deployments (e.g., Google Cloud options such as Cloud Run and k8s, #193) and the recent move to a fully async RemoteRuntime (#214) increase exposure, making a standard security layer urgent.
- Operational symptoms such as connection instability and hangs (#222, "Instable swerex-remote <--> swerex TCP connections hang the client forever") underline the need for robust, explicit connection management and a clearer security posture (authenticated retries, error handling).
Proposed Solution
Behavioral overview
- Server-side (`swerex.server:main`, FastAPI):
  - Transport security: Enable TLS with configurable certificate and key paths.
  - Authentication: Support two primary auth modes out of the box:
    - API key (static bearer token sent as `Authorization: Bearer <token>`) for simple setups.
    - JWT (HS256 with a shared secret, or RS256 verified against keys from a JWKS URL) with configurable issuer and audience, for enterprise SSO gateways and cloud ingress.
  - Optional mTLS: Verify client certificates in environments that require strong client identity.
  - IP allowlist (optional): Reject requests whose source address is outside a configured CIDR allowlist.
  - Enforcement: Apply the auth dependency to all command/session endpoints; return 401/403 on failures.
  - Secrets handling: Load secrets from environment variables or a config file; never log them.
- Client-side (RemoteRuntime HTTP/aiohttp):
  - Send the Authorization header automatically when configured.
  - Support a client certificate/key and CA bundle for mTLS.
  - Surface clear auth errors and retry only when appropriate (e.g., retry transient TLS handshake errors; never retry on 401/403).
- Config/CLI sketch (minimal additive keys; defaults preserve current behavior):

```yaml
security:
  enabled: false  # default false for backward compatibility
  tls:
    enabled: true
    cert_file: /path/server.crt
    key_file: /path/server.key
    ca_file: /path/ca.crt  # required for mTLS
    mtls: false
  auth:
    mode: "api-key"  # "none" | "api-key" | "jwt"
    api_key: "env:SWEREX_API_KEY"
    jwt:
      issuer: "https://issuer"
      audience: "swe-rex"
      jwks_url: "https://issuer/.well-known/jwks.json"
  network:
    ip_allowlist: ["10.0.0.0/8", "192.168.0.0/16"]
```

- Error handling and edge cases
  - 400 on malformed headers/certs; 401 on missing/invalid credentials; 403 on a valid identity without permission (reserved for future extensibility).
  - Clear startup-time validation if TLS/mTLS files are missing or unreadable.
  - Local-only or docker-internal deployments can keep `security.enabled: false` to avoid breaking changes.
- Compatibility strategy
  - The entire feature is opt-in. If it is not configured, current behavior remains unchanged.
  - Provide helper scripts/docs to generate self-signed certs for dev/testing.
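The API-key mode, the `env:` secret reference, the IP allowlist, and the 400/401 rules above could be enforced with checks along these lines. This is a minimal stdlib-only sketch, assuming that `env:NAME` in the config means "read environment variable NAME" and that a non-Bearer `Authorization` header counts as malformed (400). The helpers `resolve_secret`, `check_api_key`, and `ip_allowed` are hypothetical names, not existing SWE-ReX APIs.

```python
from __future__ import annotations

import hmac
import ipaddress
import os


def resolve_secret(value: str) -> str:
    """Hypothetical helper: resolve an "env:NAME" reference from the config.

    Reading the key from the environment keeps it out of the config file.
    Any value without the "env:" prefix is returned unchanged.
    """
    if value.startswith("env:"):
        name = value[len("env:"):]
        secret = os.environ.get(name)
        if not secret:
            # Fail loudly at startup instead of running with an empty key.
            raise RuntimeError(f"environment variable {name} is not set")
        return secret
    return value


def check_api_key(authorization: str | None, expected_key: str) -> int:
    """Hypothetical helper: map an Authorization header to an HTTP status.

    Returns 200 if the request may proceed, otherwise the error status:
      missing header                 -> 401
      not of the form "Bearer <tok>" -> 400 (treated as malformed)
      wrong token                    -> 401
    """
    if authorization is None:
        return 401
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return 400
    # compare_digest avoids leaking, via timing, how much of the token matched.
    if not hmac.compare_digest(token.encode(), expected_key.encode()):
        return 401
    return 200


def ip_allowed(client_ip: str, allowlist: list[str]) -> bool:
    """Hypothetical helper: True if client_ip is inside any allowlisted CIDR.

    An empty allowlist means the feature is off and every address is allowed.
    """
    if not allowlist:
        return True
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowlist)
```

In the FastAPI server, `check_api_key` would sit inside a dependency that reads the header and raises `fastapi.HTTPException(status_code=...)` for any non-200 result. `ip_allowed` needs the real client address; behind a load balancer that may come from a trusted proxy header rather than `request.client.host`.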
Feasibility & Integration Points
- Server entry point: `swerex.server:main` (pyproject.toml lines 91–93). Add FastAPI dependencies/middleware for auth and the IP allowlist, plus startup validation of the TLS settings.
- Transport: uvicorn's TLS options (`--ssl-certfile`, `--ssl-keyfile`, and for mTLS `--ssl-ca-certs` with `--ssl-cert-reqs`) or a programmatic `ssl.SSLContext`; mTLS means setting `verify_mode = ssl.CERT_REQUIRED` and loading the CA bundle.
- Auth: A FastAPI dependency that enforces API key/JWT on all endpoints; JWT verification via python-jose or authlib (added as a new dependency).
- Client: The aiohttp-based async client introduced by #214 can attach the Authorization header and an SSL context (client cert, CA path).
- Cloud integrations:
  - Modal/Fargate: Propagate secrets via environment variables and mount certs via volumes or secret managers.
  - Complements #109 (HTTPS toggle) by generalizing it into a unified security config across providers.
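The TLS/mTLS integration points and the startup validation rule could be sketched with the standard `ssl` module as follows. This is an illustrative sketch under stated assumptions: `validate_tls_files`, `build_server_context`, and `build_client_context` are hypothetical helpers, and the parameter names simply mirror the config keys proposed above.

```python
from __future__ import annotations

import os
import ssl


def validate_tls_files(cert_file: str | None, key_file: str | None,
                       ca_file: str | None = None, mtls: bool = False) -> list[str]:
    """Hypothetical pre-flight check; an empty list means the paths look usable."""
    required = [("cert_file", cert_file), ("key_file", key_file)]
    if mtls:
        # The CA bundle is only needed to verify client certificates.
        required.append(("ca_file", ca_file))
    problems = []
    for name, path in required:
        if not path:
            problems.append(f"{name} is not set")
        elif not (os.path.isfile(path) and os.access(path, os.R_OK)):
            problems.append(f"{name} {path!r} is missing or unreadable")
    return problems


def build_server_context(cert_file: str, key_file: str,
                         ca_file: str | None = None, mtls: bool = False) -> ssl.SSLContext:
    """Server context; with mtls=True, clients must present a cert signed by ca_file."""
    problems = validate_tls_files(cert_file, key_file, ca_file, mtls)
    if problems:
        # Refuse to start with a clear message instead of failing at handshake time.
        raise ValueError("invalid TLS configuration: " + "; ".join(problems))
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if mtls:
        ctx.load_verify_locations(cafile=ca_file)
        ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def build_client_context(ca_file: str | None = None, client_cert: str | None = None,
                         client_key: str | None = None) -> ssl.SSLContext:
    """Client context: verifies the server against ca_file (or the system roots
    when ca_file is None) and optionally presents a client certificate for mTLS."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    if client_cert:
        ctx.load_cert_chain(certfile=client_cert, keyfile=client_key)
    return ctx
```

uvicorn is usually configured through its `ssl_certfile`, `ssl_keyfile`, `ssl_ca_certs`, and `ssl_cert_reqs` settings rather than a prebuilt context, so on the server `validate_tls_files` is the piece that would run before `uvicorn.run(...)`. On the client, the context and header could be handed to aiohttp, e.g. `aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=ctx), headers={"Authorization": f"Bearer {key}"})`.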
Related Issues/PRs
- #109 (switch on https for modal/fargate): This proposal generalizes HTTPS into a full security layer and adds authentication (API key/JWT) and optional mTLS across all providers.
- #193 (Feature request: Google Cloud deployment options (cloud run, k8s etc)): A unified auth/TLS layer simplifies secure GCP ingress/load balancer setups.
- #214 (Make remote runtime fully async with aiohttp + testing): Integration point for attaching Authorization headers and SSL contexts on the client.
Risks & Mitigations
- Risk: Certificate/JWT misconfiguration causing lockouts → Mitigation: Pre-flight startup validation and clear error messages; documented quickstarts and sanity checks.
- Risk: Operational complexity for small/local setups → Mitigation: Default off; provide simple API key mode and dev cert generator.
- Risk: Performance impact from JWT verification → Mitigation: Cache JWKS keys; reuse connections; measure overhead in CI.
- Risk: Fragmentation of provider-specific toggles → Mitigation: Single, provider-agnostic security config applied uniformly across deployments.
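The "cache JWKS keys" mitigation could look roughly like the following stdlib-only sketch of a key cache indexed by the JWT `kid` header. `JWKSCache` and its `ttl` and `min_refresh_interval` parameters are hypothetical; the injected `fetch` callable stands in for an HTTP GET of `jwks_url`, and the signature check itself would still be done by the chosen JWT library.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Dict, Optional


class JWKSCache:
    """Hypothetical cache of JWKS signing keys, indexed by their "kid".

    `fetch` returns a parsed JWKS document such as {"keys": [{"kid": ...}, ...]};
    in the server it would wrap an HTTP GET of the configured jwks_url.
    """

    def __init__(self, fetch: Callable[[], Dict[str, Any]], ttl: float = 300.0,
                 min_refresh_interval: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl                            # normal refresh period (seconds)
        self._min_refresh = min_refresh_interval   # floor between early refreshes
        self._clock = clock
        self._keys: Dict[str, Dict[str, Any]] = {}
        self._fetched_at: Optional[float] = None

    def _refresh(self) -> None:
        doc = self._fetch()
        # Keep only keys that carry a "kid", so they can be looked up by it.
        self._keys = {k["kid"]: k for k in doc.get("keys", []) if "kid" in k}
        self._fetched_at = self._clock()

    def get(self, kid: str) -> Optional[Dict[str, Any]]:
        """Return the JWK for `kid`, or None if the issuer does not publish it."""
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._refresh()  # first use, or the cached set has expired
        elif kid not in self._keys and now - self._fetched_at >= self._min_refresh:
            self._refresh()  # unknown kid: the issuer may have rotated keys
        return self._keys.get(kid)
```

Refreshing on an unknown `kid` lets key rotation at the issuer take effect without waiting for the full TTL, while the minimum interval keeps a burst of tokens with bogus `kid` values from turning into a burst of JWKS requests.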