Open
Labels
P1: High (high priority, should fix soon), documentation (improvements or additions to documentation)
Description
Problem
The project lacks security documentation:
- No threat model documented
- Attack surface not defined
- No incident response procedures
- Key rotation process not documented
- No security considerations for operators
CLAUDE.md has good build/architecture docs but doesn't cover security operations.
Impact
High - Without security documentation, operators cannot:
- Deploy the service securely
- Respond to security incidents
- Understand the security guarantees
- Perform key rotation safely
Solution
Create SECURITY.md with the following sections:
1. Threat Model
## Threat Model
### Assets
- Signing keys (ECDSA, Ed25519) - stored in dstack KMS
- Authentication tokens - stored in environment
- Cached signatures - in-memory TTL cache
- TDX quotes and GPU attestations
### Threat Actors
- External attackers without credentials
- External attackers with stolen credentials
- Malicious insiders with valid tokens
- Compromised backend services
- Side-channel attackers
### Threats
1. **Key Compromise**: If signing keys leak, attacker can forge signatures
2. **Token Theft**: Stolen auth token enables full proxy access
3. **DoS Attacks**: Resource exhaustion via flooding
4. **Signature Forgery**: Invalid signatures accepted by clients
5. **Path Traversal**: Accessing unintended backend endpoints
6. **Timing Attacks**: Token leakage via timing side-channels
7. **Memory Disclosure**: Token/key leakage via memory dumps
8. **Supply Chain**: Compromised dependencies
2. Attack Surface
## Attack Surface
### External Attack Surface
- HTTP endpoints (authenticated)
  - `/v1/chat/completions` - JSON/streaming chat
  - `/v1/completions` - Text completion
  - `/v1/embeddings`, `/v1/rerank`, `/v1/score` - ML endpoints
  - `/v1/images/generations`, `/v1/images/edits` - Image endpoints
  - `/v1/audio/transcriptions` - Audio endpoint
  - `/v1/signature/{chat_id}` - Signature retrieval
  - `/v1/attestation/report` - TEE attestation
  - `/*` (catch-all) - Arbitrary path forwarding
- HTTP endpoints (unauthenticated)
  - `/`, `/version` - Health checks
  - `/v1/metrics`, `/v1/models` - Info endpoints
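The split between authenticated and unauthenticated endpoints can be spot-checked from outside the service. The following is a minimal sketch, not part of the project: the function names are hypothetical, it assumes `curl` is installed, and it assumes the proxy answers a request without credentials with 401 or 403 (the actual status code is not stated in this issue). Endpoints that only accept POST may return a different code to a plain GET, so treat a warning as a prompt to look closer rather than as proof of a problem.

```bash
#!/usr/bin/env bash
# Returns 0 when an HTTP status code looks like a rejection for missing
# credentials. 401/403 is an assumption about the proxy's behaviour.
is_auth_rejection() {
  case "$1" in
    401|403) return 0 ;;
    *) return 1 ;;
  esac
}

# Send one request WITHOUT credentials and report how the proxy answered.
# $1: base URL (for example http://localhost:8000), $2: path to probe.
probe_unauthenticated() {
  local base_url="$1" path="$2" status
  status="$(curl -s -o /dev/null -w '%{http_code}' "${base_url}${path}")"
  if is_auth_rejection "$status"; then
    echo "ok   ${path} -> ${status}"
  else
    echo "WARN ${path} -> ${status} (expected a rejection)"
  fi
}

# Usage (against a running instance):
#   probe_unauthenticated http://localhost:8000 /v1/attestation/report
```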
### Internal Attack Surface
- dstack KMS API - Key retrieval
- Backend vLLM/sglang service - Request forwarding
- Python subprocess - GPU attestation
- File system - Git revision file
### Network Surface
- Inbound: 0.0.0.0:8000 (configurable)
- Outbound: Backend URL, dstack KMS, Python interpreter
3. Incident Response
## Incident Response
### Token Compromise
1. Immediately rotate `TOKEN` environment variable
2. Restart proxy service
3. Audit logs for unauthorized access
4. Revoke and reissue tokens to legitimate clients
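For step 1, a replacement token can be generated from the operating system's random source. This is a minimal sketch using only standard utilities; the function name is hypothetical, and how the new value reaches the `TOKEN` environment variable depends on your deployment tooling, which this issue does not specify.

```bash
#!/usr/bin/env bash
# Generate a replacement token: 32 random bytes rendered as 64 hex characters.
generate_token() {
  head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
}

NEW_TOKEN="$(generate_token)"
# Print only the length so the secret does not end up in terminal scrollback.
echo "new token length: ${#NEW_TOKEN}"
```

A 64-character hex string comfortably exceeds the 32-character minimum in the deployment checklist.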
### Key Compromise
1. **DO NOT** restart - this will generate new keys and invalidate all signatures
2. Contact dstack team to rotate KMS keys
3. Coordinate with clients to update to new signing addresses
4. Archive old signatures with compromise timestamp
### DoS Attack
1. Check metrics for unusual request patterns
2. Enable rate limiting if not already active
3. Block attacking IPs at firewall/load balancer
4. Scale horizontally if needed
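To decide which addresses to block in step 3, it helps to rank clients by request count. The sketch below is an illustration only: the function name is hypothetical, and it assumes an access log (for example from the load balancer) whose first whitespace-separated field is the client address, as in the common and combined log formats. The proxy's own log format is not described in this issue.

```bash
#!/usr/bin/env bash
# Print the N busiest client addresses from an access log read on stdin.
# Output format: "<request count> <client address>", busiest first.
top_clients() {
  local n="${1:-10}"
  awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' \
    | sort -k1,1nr -k2,2 \
    | head -n "$n"
}

# Usage (the log path is a placeholder):
#   top_clients 5 < /path/to/access.log
```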
### Signature Verification Failures
1. Check that signing keys are initialized correctly
2. Verify dstack KMS is reachable
3. Check backend is not returning corrupted responses
4. Review recent code changes to signing logic
4. Key Rotation
## Key Rotation Procedures
### Planned Rotation
**Warning**: Key rotation invalidates all cached signatures!
1. Schedule maintenance window (cache TTL + buffer)
2. Let cache expire naturally (default: 20 minutes)
3. Work with dstack team to rotate keys in KMS:
```bash
dstack-cli rotate-key MODEL_NAME/ecdsa-signing-key
dstack-cli rotate-key MODEL_NAME/ed25519-signing-key
```
4. Restart proxy service to load new keys
5. Verify new signing addresses in logs
6. Update documentation/contracts with new addresses
7. Notify clients of address change
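The maintenance window in steps 1 and 2 is simple arithmetic: the cache TTL plus a safety buffer. A small sketch, where the 20-minute TTL is the default stated above and the 10-minute buffer is a hypothetical value chosen for illustration:

```bash
#!/usr/bin/env bash
# Minutes to wait for cached signatures to expire before rotating keys.
# $1: cache TTL in minutes, $2: safety buffer in minutes.
maintenance_window_minutes() {
  local ttl_min="$1" buffer_min="$2"
  echo $(( ttl_min + buffer_min ))
}

CACHE_TTL_MIN=20   # default cache TTL from this document
BUFFER_MIN=10      # assumed buffer; pick a value that suits your traffic
echo "wait $(maintenance_window_minutes "$CACHE_TTL_MIN" "$BUFFER_MIN") minutes before rotating"
```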
### Emergency Rotation (Key Compromise)
1. Immediately rotate in KMS
2. Restart proxy (accepts brief downtime)
3. Notify all clients ASAP
4. Post-incident review
### Testing Key Rotation
```bash
# In dev mode, test rotation process
DEV=1 cargo run # generates random keys
# Check signing addresses in logs, verify they're different each run
```
5. Security Checklist for Deployment
## Deployment Security Checklist
- [ ] `TOKEN` is strong (32+ random characters)
- [ ] `TOKEN` is unique per environment
- [ ] `DEV=1` is NOT set in production
- [ ] TLS terminates at load balancer
- [ ] Rate limiting is configured
- [ ] Backend URL points to internal network (not internet)
- [ ] Monitoring/alerting is configured
- [ ] Log aggregation is enabled
- [ ] Regular security updates scheduled
- [ ] Backup procedure documented
- [ ] Incident response plan reviewed
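Two of these items, `TOKEN` length and the absence of `DEV=1`, can be checked mechanically before the service starts. The following is a sketch under assumptions: the function name is hypothetical, it only inspects the current shell environment, and it checks length rather than randomness, so it complements the checklist rather than replacing it.

```bash
#!/usr/bin/env bash
# Preflight check for two checklist items: TOKEN length and DEV mode.
# Reads TOKEN and DEV from the environment, prints one line per finding,
# and returns the number of failed checks (0 means both passed).
preflight() {
  local token="${TOKEN:-}" failures=0
  if [ "${#token}" -lt 32 ]; then
    echo "FAIL: TOKEN is shorter than 32 characters"
    failures=$((failures + 1))
  fi
  if [ "${DEV:-}" = "1" ]; then
    echo "FAIL: DEV=1 is set"
    failures=$((failures + 1))
  fi
  if [ "$failures" -eq 0 ]; then
    echo "OK: TOKEN length and DEV checks passed"
  fi
  return "$failures"
}

# Usage in a start-up script:
#   preflight || exit 1
```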
File to Create
SECURITY.md at repository root
Additional Files
Consider also creating:
- docs/OPERATIONS.md - Operational runbook
- docs/KEY_ROTATION.md - Detailed rotation procedures