This document details the security architecture, threat model, and security controls for the IAM Immune System.
- Security Architecture
- Threat Model
- Security Controls
- Data Protection
- Access Control
- Monitoring & Detection
- Incident Response
- Compliance
The IAM Immune System employs multiple layers of security:
┌─────────────────────────────────────────────────────────┐
│ Application Layer │
│ - Input validation │
│ - Output encoding │
│ - Error handling │
└─────────────────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ Authentication Layer │
│ - Service account authentication │
│ - Secret Manager for credentials │
│ - Short-lived tokens │
└─────────────────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ Authorization Layer │
│ - Least privilege IAM roles │
│ - Resource-level permissions │
│ - Policy-based access control │
└─────────────────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ Network Layer │
│ - VPC Service Controls (optional) │
│ - Private Google Access │
│ - Firewall rules │
└─────────────────────────────────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ Data Layer │
│ - Encryption at rest │
│ - Encryption in transit │
│ - Data classification │
└─────────────────────────────────────────────────────────┘
- Never Trust, Always Verify: Every request is authenticated and authorized
- Least Privilege: Minimum permissions required for each component
- Assume Breach: Design assumes attackers may have internal access
- Verify Explicitly: All access decisions use multiple signals
Critical Assets:
- AWS IAM credentials (stored in Secret Manager)
- GCP service account keys
- Detection rules and policies
- ML models and baseline data
- Audit logs and security alerts
Data Classification:
- AWS credentials: TOP SECRET
- IAM events: CONFIDENTIAL
- Detection results: CONFIDENTIAL
- Audit logs: INTERNAL
- Documentation: PUBLIC
Capability: Advanced persistent threat (APT)
Motivation: Data theft, service disruption
Vectors:
- Compromise AWS credentials
- Exploit Cloud Function vulnerabilities
- DDoS attack on event processing
- Poison ML training data
Capability: Privileged access
Motivation: Sabotage, data theft
Vectors:
- Disable detection rules
- Modify remediation actions
- Exfiltrate AWS credentials
- Delete audit logs
Capability: Code injection
Motivation: Backdoor installation
Vectors:
- Compromised Python packages
- Malicious Terraform modules
- Backdoored base images
Attack Flow:
- Attacker gains access to GCP project
- Attempts to read AWS credentials from Secret Manager
- Uses credentials to access AWS environment
Mitigations:
- Secret Manager access requires specific IAM role
- AWS credentials have minimal permissions (read-only access plus the specific remediation actions)
- Audit logging on all Secret Manager access
- Alert on unusual Secret Manager access patterns
- Use AWS STS temporary credentials where possible
Detection:
- Cloud Audit Logs monitoring
- Anomalous API call patterns
- Geographic anomaly detection
Attack Flow:
- Attacker modifies remediation code
- Instead of blocking threats, system creates backdoors
- Attacker maintains persistent access
Mitigations:
- Code review for all changes
- Immutable function deployments
- Dry-run mode for testing
- Approval required for critical actions
- Git commit signing required
- Function version pinning
Detection:
- Function code integrity monitoring
- Unexpected remediation outcomes
- Alert on function source changes
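Function code integrity monitoring can be approximated by hashing the deployed source and comparing it with a digest recorded when the change was reviewed. The sketch below is illustrative only: `source_digest`, `verify_source`, and the idea of storing an approved digest next to each release are assumptions, not part of the documented system.

```python
import hashlib
import hmac


def source_digest(source: bytes) -> str:
    """Return the hex SHA-256 digest of a function's source archive."""
    return hashlib.sha256(source).hexdigest()


def verify_source(source: bytes, approved_digest: str) -> bool:
    """Compare the current digest with the approved one in constant time."""
    return hmac.compare_digest(source_digest(source), approved_digest.lower())


if __name__ == "__main__":
    approved = source_digest(b"def handler(event): ...")
    # A modified source no longer matches the digest recorded at review time.
    if not verify_source(b"def handler(event): backdoor()", approved):
        print("ALERT: function source changed")
```

A mismatch would feed the "Alert on function source changes" detection listed above.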
Attack Flow:
- Attacker identifies detection patterns
- Crafts IAM events that avoid detection
- Performs malicious actions undetected
Mitigations:
- Multiple detection layers (rule-based + ML)
- Regular detection rule updates
- Continuous ML model retraining
- Anomaly detection as catch-all
- Human review of high-risk events
Detection:
- ML drift detection
- Manual security audits
- Penetration testing
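The layered approach above (rule-based checks plus an ML anomaly score, with human review for high-risk events) might be combined as in the following sketch. The `Verdict` type, the `evaluate` function, the example rule, and both thresholds are hypothetical; the document does not specify how the layers are actually merged.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Rule = Callable[[Event], bool]


@dataclass
class Verdict:
    flagged: bool
    needs_human_review: bool
    reasons: List[str]


def evaluate(
    event: Event,
    rules: Dict[str, Rule],
    anomaly_score: float,
    anomaly_threshold: float = 0.8,   # assumed value
    review_threshold: float = 0.95,   # assumed value
) -> Verdict:
    """Flag an event if any rule fires or the anomaly score is high."""
    reasons = [name for name, rule in rules.items() if rule(event)]
    if anomaly_score >= anomaly_threshold:
        reasons.append(f"anomaly_score={anomaly_score:.2f}")
    flagged = bool(reasons)
    # Escalate to a human when several signals agree or the score is extreme.
    needs_review = flagged and (len(reasons) > 1 or anomaly_score >= review_threshold)
    return Verdict(flagged, needs_review, reasons)


# Hypothetical rule: attaching a policy to a user is worth a closer look.
RULES: Dict[str, Rule] = {
    "policy_attach": lambda e: e.get("eventName") == "AttachUserPolicy",
}
```

Because the anomaly score is evaluated independently of the rules, an event crafted to slip past every rule can still be flagged by the second layer.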
Attack Flow:
- Attacker floods system with events
- Cloud Functions overwhelmed
- Real threats go unprocessed
Mitigations:
- Rate limiting at EventBridge level
- Cloud Functions concurrency limits
- Dead letter queue for failed events
- Alert on high event volumes
- Auto-scaling with caps
Detection:
- Event volume monitoring
- Function error rate monitoring
- Queue depth alerts
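Rate limiting of the kind listed in the mitigations is commonly built on a token bucket. The sketch below shows the idea in-process; it is not EventBridge or Cloud Functions configuration, and the `TokenBucket` class, its parameters, and the injectable clock are assumptions made for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow up to `capacity` events in a burst, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; otherwise reject the event."""
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

Events rejected by `allow()` would be candidates for the dead letter queue rather than being dropped silently, so that real threats arriving during a flood can still be processed later.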
GCP Service Account Permissions:

    roles:
      - logging.logWriter             # Write logs only
      - pubsub.publisher              # Publish alerts only
      - secretmanager.secretAccessor  # Read secrets only
      - storage.objectViewer          # Read ML models only

AWS IAM Policy:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "iam:DetachUserPolicy",
            "iam:DetachRolePolicy",
            "iam:DeleteUserPolicy",
            "iam:DeleteRolePolicy",
            "s3:PutPublicAccessBlock",
            "s3:DeleteBucketPolicy",
            "s3:PutBucketAcl"
          ],
          "Resource": "*",
          "Condition": {
            "StringEquals": {
              "aws:RequestedRegion": "us-east-1"
            }
          }
        }
      ]
    }

- All secrets stored in GCP Secret Manager
- Automatic secret rotation every 30 days
- Secret access logged and monitored
- No secrets in code or environment variables (references only)
- Separate secrets for dev/staging/prod
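The "references only" rule can be enforced in code by refusing any environment value that does not look like a Secret Manager resource name. This is a minimal sketch under stated assumptions: `resolve_secret`, the `AWS_CREDS_REF` variable name, and the injected `fetch` callable are hypothetical and not taken from the system itself.

```python
import os
from typing import Callable, Mapping

SecretFetcher = Callable[[str], bytes]


def resolve_secret(env_var: str, fetch: SecretFetcher,
                   environ: Mapping[str, str] = os.environ) -> bytes:
    """Read a secret *reference* from the environment and fetch its value."""
    ref = environ.get(env_var, "")
    parts = ref.split("/")
    # Expected shape: projects/<project>/secrets/<name>/versions/<version>
    well_formed = (
        len(parts) == 6
        and all(parts)
        and (parts[0], parts[2], parts[4]) == ("projects", "secrets", "versions")
    )
    if not well_formed:
        # The value is deliberately left out of the message in case it is a raw secret.
        raise ValueError(f"{env_var} must hold a Secret Manager resource name")
    return fetch(ref)
```

In a deployment, `fetch` would likely wrap the `google-cloud-secret-manager` client, for example `lambda name: client.access_secret_version(name=name).payload.data`; keeping it injectable lets the check run in tests without network access.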
    import html
    from datetime import datetime
    from typing import Any, Dict


    def validate_event(event: Dict[str, Any]) -> bool:
        """Validate event structure and content, escaping string values in place."""
        required_fields = ['eventName', 'eventTime', 'userIdentity']

        # Check required fields
        if not all(field in event for field in required_fields):
            return False

        # Validate timestamp; map a trailing "Z" to "+00:00" because
        # datetime.fromisoformat() only accepts "Z" from Python 3.11 onward
        try:
            datetime.fromisoformat(str(event['eventTime']).replace('Z', '+00:00'))
        except ValueError:
            return False

        # Sanitize user input (top-level string values only)
        for key, value in event.items():
            if isinstance(value, str):
                event[key] = html.escape(value)

        return True

- At Rest:
  - GCS: AES-256 encryption (Google-managed keys)
  - Secret Manager: AES-256 encryption
  - Pub/Sub: Encrypted by default
- In Transit:
  - TLS 1.3 for all API calls
  - HTTPS only for webhooks
  - Certificate pinning for AWS API calls
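For outbound calls made from Python, the TLS 1.3 requirement can be expressed with the standard `ssl` module. The helper name `strict_tls_context` is an assumption; the sketch covers only the minimum protocol version and default certificate validation, not certificate pinning.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Build a client context that verifies certificates and refuses TLS < 1.3."""
    context = ssl.create_default_context()  # hostname checks and CERT_REQUIRED are on
    context.minimum_version = ssl.TLSVersion.TLSv1_3
    return context
```

The context can then be passed to standard-library clients, for example `urllib.request.urlopen(url, context=strict_tls_context())`. A connection to a server that only offers TLS 1.2 fails during the handshake instead of silently downgrading.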
All actions logged to Cloud Logging:
- Function invocations
- Detection results
- Remediation actions
- Secret access
- IAM changes
- Configuration changes
Log Retention:
- Critical logs: 365 days
- Security logs: 180 days
- Application logs: 90 days
- Debug logs: 30 days
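The retention periods above translate directly into a lookup table. The sketch below assumes each log entry carries a category label matching the four tiers; the `RETENTION_DAYS` mapping and `is_expired` helper are illustrative names rather than part of the system.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Retention periods from the list above, in days.
RETENTION_DAYS = {
    "critical": 365,
    "security": 180,
    "application": 90,
    "debug": 30,
}


def is_expired(category: str, written_at: datetime,
               now: Optional[datetime] = None) -> bool:
    """Return True once an entry is older than its category's retention period."""
    now = now or datetime.now(timezone.utc)
    return now - written_at > timedelta(days=RETENTION_DAYS[category])
```

An unknown category raises `KeyError` instead of falling back to a default, so a mislabeled log is never deleted early by accident.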
Critical Alerts:
- Secret Manager access from unusual location
- High error rate in detections
- Remediation action failures
- Unexpected AWS API calls
- Function code changes
Alert Channels:
- Slack: Real-time alerts
- Email: Daily digest
- PagerDuty: Critical incidents
- SIEM: All security events
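One way to wire alerts to these channels is a static routing table keyed by the P1 to P4 severity levels defined in this document. The mapping itself is an assumption: the document states what each channel is for, but not which severities go where, and the channel identifiers are hypothetical.

```python
from typing import Dict, List

# Assumed routing: every alert reaches the SIEM; only P1 pages someone.
ROUTES: Dict[str, List[str]] = {
    "P1": ["pagerduty", "slack", "siem"],
    "P2": ["slack", "siem"],
    "P3": ["email_digest", "siem"],
    "P4": ["email_digest", "siem"],
}


def channels_for(severity: str) -> List[str]:
    """Return the channels an alert of the given severity should be sent to."""
    try:
        return list(ROUTES[severity])
    except KeyError:
        raise ValueError(f"unknown severity: {severity!r}") from None
```

Rejecting unknown severities keeps a typo in an alert definition from quietly routing a critical alert nowhere.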
ML-based detection for:
- Unusual event patterns
- Abnormal function behavior
- Unexpected API call sequences
- Geographic anomalies
- Time-based anomalies
- Auto-revoke dangerous permissions
- Auto-block public S3 buckets
- Auto-disable compromised credentials
- Alert security team
- Create incident tickets
Severity Levels:
- P1 (Critical): Active attack, immediate response required
- P2 (High): Potential breach, urgent investigation
- P3 (Medium): Policy violation, scheduled review
- P4 (Low): Informational, routine monitoring
Response Procedures: See Incident Response section.
| Classification | Examples | Protection |
|---|---|---|
| TOP SECRET | AWS credentials | Secret Manager, access logged |
| CONFIDENTIAL | IAM events, detections | Encrypted, restricted access |
| INTERNAL | Audit logs | Encrypted, retention policy |
| PUBLIC | Documentation | Version controlled |
- AWS Credentials: Rotated every 30 days
- IAM Events: Retained for 90 days
- ML Training Data: Retained for 365 days
- Audit Logs: Retained for 180 days
- Backups: Retained for 30 days
- Secure deletion using gcloud commands
- Multi-pass overwrite for sensitive data
- Verification of deletion
- Audit log of disposal
Production Access:
- Only service accounts have access
- Human access requires break-glass procedure
- All access logged and monitored
- Time-limited access tokens
Development Access:
- Separate dev environment
- No access to production secrets
- Synthetic test data only
- Code review required for changes
For emergency production access:
- Submit break-glass request with justification
- Approval from security team (2 people)
- Time-limited access granted (1-4 hours)
- All actions logged
- Post-incident review required
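The break-glass rules above (justification, two approvers, a 1 to 4 hour window) are simple enough to check mechanically before access is granted. In this sketch the `BreakGlassRequest` type and `grant_expiry` function are hypothetical, and excluding the requester from the approver count is an added assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet


@dataclass(frozen=True)
class BreakGlassRequest:
    requester: str
    justification: str
    approvers: FrozenSet[str]
    hours: int


def grant_expiry(request: BreakGlassRequest, now: datetime) -> datetime:
    """Validate a request and return the time at which access must end."""
    if not request.justification.strip():
        raise ValueError("a justification is required")
    # Assumption: requesters cannot approve their own request.
    independent = request.approvers - {request.requester}
    if len(independent) < 2:
        raise PermissionError("approval from two security team members is required")
    if not 1 <= request.hours <= 4:
        raise ValueError("access window must be between 1 and 4 hours")
    return now + timedelta(hours=request.hours)
```

The returned expiry would be attached to the time-limited grant, and the request itself recorded for the post-incident review.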
- Event processing rate
- Detection accuracy (true/false positives)
- Remediation success rate
- Function error rate
- API latency
- Cost per event
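Detection accuracy and remediation success rate reduce to a few ratios over counters. The sketch below assumes the counts of true positives, false positives, and missed detections are available from labeled review data; the function name and the returned keys are illustrative.

```python
from typing import Dict


def detection_metrics(true_pos: int, false_pos: int, false_neg: int,
                      remediations_ok: int, remediations_total: int) -> Dict[str, float]:
    """Compute precision, recall, and remediation success rate from raw counts."""

    def ratio(numerator: int, denominator: int) -> float:
        # Report 0.0 rather than dividing by zero when nothing was observed.
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(true_pos, true_pos + false_pos),
        "recall": ratio(true_pos, true_pos + false_neg),
        "remediation_success_rate": ratio(remediations_ok, remediations_total),
    }
```

Precision captures how many alerts were real threats, while recall captures how many real threats produced an alert; tracking both keeps a drop in false positives from hiding an increase in missed detections.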
- Executive Dashboard
  - Total threats detected
  - Auto-remediation rate
  - Current risk score
  - Cost trends
- Operations Dashboard
  - Event volume
  - Function performance
  - Error rates
  - Queue depths
- Security Dashboard
  - Detection breakdown by type
  - ML anomaly trends
  - Remediation actions
  - Failed attempts
- Incident Commander: Security team lead
- Technical Lead: On-call engineer
- Communications: PR/Marketing
- Legal: Compliance officer
- Validate alert is legitimate
- Determine severity
- Assess scope of impact
- Begin timeline documentation
- Isolate affected systems
- Revoke compromised credentials
- Block malicious IPs
- Preserve evidence
- Remove malicious code
- Patch vulnerabilities
- Update detection rules
- Rotate all credentials
- Restore from backups if needed
- Re-enable services
- Verify system integrity
- Monitor for reinfection
- Root cause analysis
- Update runbooks
- Improve detections
- Security training
Internal:
- Slack: Immediate notification
- Email: Status updates every 2 hours
- Incident report: Within 24 hours
External:
- Customers: Within 4 hours (if affected)
- Regulators: As required by law
- Public: Via status page
- SOC 2 Type II: Security, availability, confidentiality
- PCI-DSS: If processing payment data
- GDPR: If processing EU personal data
- HIPAA: If processing health information
- CIS Benchmarks: AWS and GCP
| Control | Implementation |
|---|---|
| Access Control | IAM, Secret Manager |
| Audit Logging | Cloud Logging, 180-day retention |
| Encryption | At-rest and in-transit |
| Incident Response | Documented procedures |
| Vulnerability Management | Weekly scans |
| Change Management | Git, code review |
| Business Continuity | Disaster recovery plan |
All compliance-relevant events logged:
- User access
- Configuration changes
- Data access
- Remediation actions
- System changes
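The events listed above could be emitted as one JSON object per entry, using the field names from this document's log format. The `audit_entry` helper is a hypothetical sketch; how the real system writes to Cloud Logging is not specified here.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_entry(action: str, user: str, resource: str, result: str,
                ip_address: str, when: Optional[datetime] = None) -> str:
    """Serialize one audit event as a JSON string with a UTC timestamp."""
    # `when` should be timezone-aware; naive values are treated as local time.
    when = when or datetime.now(timezone.utc)
    return json.dumps({
        "timestamp": when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "action": action,
        "user": user,
        "resource": resource,
        "result": result,
        "ip_address": ip_address,
    })
```

Normalizing every timestamp to UTC with a trailing `Z` keeps entries from different regions sortable as plain strings.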
Log Format:

    {
      "timestamp": "2025-11-30T12:00:00Z",
      "action": "secret.access",
      "user": "service-account@project.iam.gserviceaccount.com",
      "resource": "projects/123/secrets/aws-credentials",
      "result": "success",
      "ip_address": "10.0.0.1"
    }

- Daily: Automated security scans
- Weekly: Dependency vulnerability scans
- Monthly: Penetration testing
- Quarterly: Security audit
- Annually: Third-party assessment
- Credential Compromise: Simulate AWS credential theft
- Privilege Escalation: Attempt to gain admin access
- Detection Bypass: Try to evade detection rules
- DDoS: Flood system with events
- Code Injection: Attempt to modify function code
- Security Team: security-team@company.com
- CISO: ciso@company.com
- Incident Response: incident-response@company.com
- Bug Bounty: security@company.com
If you discover a security vulnerability:
- DO NOT open a public GitHub issue
- Email security@company.com with:
  - Description of vulnerability
  - Steps to reproduce
  - Potential impact
  - Your contact information
- We will respond within 24 hours
- We will fix critical issues within 7 days
- We will credit you (if desired) once the fix is released
This document is reviewed and updated:
- Monthly: Routine review
- After incidents: Lessons learned
- After major changes: Architecture updates
Last Updated: 2025-11-30
Next Review: 2025-12-30