A comprehensive Kubernetes failure injection and testing framework for Bank of Anthos, enabling automated chaos engineering and resilience testing across isolated namespaces.
Get up and running in under 5 minutes:
# Clone the repository
git clone git@github.com:komodorio/failure-scenarios.git
cd failure-scenarios
# Launch the interactive menu
./start.sh
Or run individual commands:
# Deploy all scenarios to isolated namespaces (~3 min)
./batch/setup-all-scenarios.sh
# Check deployment status
./batch/check-all-scenarios.sh
# Inject all failures in parallel (~1-2 min)
./batch/inject-all-scenarios.sh
# Cleanup all namespaces
./batch/cleanup-all-scenarios.sh
- 11 Failure Scenarios - Database locks, OOM kills, policy violations, CronJob failures, and more
- Parallel Execution - Deploy 11 namespaces in parallel (~3 minutes)
- Isolated Testing - Each scenario runs in its own namespace
- Dynamic Discovery - Add new scenarios with zero code changes
- Multiple Interfaces - Interactive menus, individual commands, or batch operations
- Self-Contained Scenarios - Each scenario can inject and revert its own failures independently
- Restore & Cleanup - Built-in restoration scripts to return to normal state
Before running this project, ensure you have:
| Tool | Minimum Version | Purpose |
|---|---|---|
| Kubernetes cluster | 1.19+ | Target environment for failure injection |
| kubectl | 1.30+ | Kubernetes CLI tool |
| helm | 3.0+ | Package manager for Kubernetes |
| jq | 1.6+ | JSON processor for parsing |
| bash | 4.0+ | Shell interpreter |
- Minimum: 5 CPU cores, 5GB RAM, 50GB storage
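The prerequisites above can be checked before anything is deployed. The following is a minimal preflight sketch, not part of the repository: `missing_tools` and `version_ge` are hypothetical helper names, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils and recent macOS).

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers (not shipped with this project).

# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Succeed if dotted version $1 is greater than or equal to $2.
# ASSUMPTION: `sort -V` (version sort) is available.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

missing="$(missing_tools kubectl helm jq)"
if [ -n "$missing" ]; then
  echo "Missing tools: ${missing//$'\n'/ }"
else
  echo "All required tools found"
fi

# bash exposes its own version, so the 4.0+ requirement can be checked directly.
if version_ge "${BASH_VERSINFO[0]}.${BASH_VERSINFO[1]}" "4.0"; then
  echo "bash version OK"
else
  echo "bash 4.0+ required"
fi
```

The same `version_ge` helper could be applied to the version strings reported by `kubectl`, `helm`, and `jq`, although parsing their output is left out here because the formats differ between tools.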
# macOS
brew install kubectl helm jq
# Ubuntu/Debian (kubectl and helm are not in the default apt repositories; add their upstream repositories first)
sudo apt-get install kubectl helm jq
# Check versions
kubectl version --client
helm version --short
jq --version
For a full summary of all available failure scenarios, see the Scenarios Overview Table.
./start.sh
Provides a menu-driven interface with options:
- Setup All Scenarios
- Check All Scenarios
- Inject All Scenarios
- Inject by Namespace (interactive)
- Restore All Scenarios
- Restore by Namespace (interactive)
- Cleanup All Scenarios
Each scenario is self-contained with inject and revert capabilities:
# Inject failure (default action)
./scenarios/bad-deployment-scenario.sh
./scenarios/bad-deployment-scenario.sh inject
# Revert failure
./scenarios/bad-deployment-scenario.sh revert
# With specific namespace
NAMESPACE="bad-deployment-scenario" ./scenarios/bad-deployment-scenario.sh inject
NAMESPACE="bad-deployment-scenario" ./scenarios/bad-deployment-scenario.sh revert
# Check status
kubectl get pods -n bad-deployment-scenario -w
# Setup all scenarios in parallel
./batch/setup-all-scenarios.sh
# Inject all failures in parallel
./batch/inject-all-scenarios.sh
# Restore all scenarios
./batch/restore-all-scenarios.sh
# Check status of all scenarios
./batch/check-all-scenarios.sh
- Architecture & Design - In-depth architectural decisions and patterns
- Multi-Namespace Setup - Complete guide to multi-namespace workflows
- Scenario Details - Individual scenario documentation
- Security Policy - Security best practices and vulnerability reporting
This system follows a modular, extensible architecture:
├── batch/                         # Batch operations for all scenarios
├── scenarios/                     # Individual failure injection scripts
├── lib/                           # Shared utilities and helpers
├── config/                        # Configuration files
│   └── namespace-mapping.conf     # Maps scenarios to custom namespaces
└── bank-of-anthos/                # Google Cloud sample application
- Dynamic Scenario Discovery - Automatically discovers `*-scenario.sh` files
- Custom Namespace Mapping - Scenarios deploy to creative movie-themed namespaces
- Namespace Override - Same scripts work for user and dedicated namespaces
- Parallel Execution - Background processes with PID tracking
- Shared Libraries - DRY principle with centralized utilities
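Dynamic discovery and PID-tracked parallel execution can be combined in a few lines of bash. The sketch below illustrates the two patterns under stated assumptions and is not the project's actual code: `discover_scenarios`, `run_in_parallel`, and `fake_deploy` are hypothetical names, and the demo uses empty files in a temporary directory instead of real scenario scripts.

```bash
#!/usr/bin/env bash
# Sketch: discover *-scenario.sh files, run one background job per scenario,
# and track PIDs so each job's exit status can be collected.

# Print the scenario name (file name without .sh) for each script in a directory.
discover_scenarios() {
  local dir="$1" file
  for file in "$dir"/*-scenario.sh; do
    [ -e "$file" ] || continue   # the glob matched nothing
    basename "$file" .sh
  done
}

# Run "<command> <scenario>" in the background for each scenario name on stdin.
# Returns the number of jobs that failed.
run_in_parallel() {
  local -a pids=() names=()
  local name i failed=0
  while IFS= read -r name; do
    "$@" "$name" &
    pids+=("$!")
    names+=("$name")
  done
  for i in "${!pids[@]}"; do
    if wait "${pids[$i]}"; then
      echo "[OK] ${names[$i]}"
    else
      echo "[FAIL] ${names[$i]}"
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}

# Demo with placeholder files; fake_deploy stands in for a real deployment step.
demo_dir="$(mktemp -d)"
touch "$demo_dir/bad-deployment-scenario.sh" "$demo_dir/high-load-scenario.sh" "$demo_dir/notes.txt"
fake_deploy() { [ "$1" != "high-load-scenario" ]; }

if discover_scenarios "$demo_dir" | run_in_parallel fake_deploy; then
  echo "all jobs succeeded"
else
  echo "failed jobs: $?"
fi
rm -rf "$demo_dir"
```

Waiting on each PID individually, rather than calling a bare `wait`, is what lets the caller report which scenario failed instead of only knowing that something did.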
Each scenario deploys to a custom namespace inspired by fictional cities from movies:
- `bad-deployment` → `bank-of-springfield` (The Simpsons Movie)
- `database-lock` → `bank-of-punxsutawney` (Groundhog Day)
- `high-load` → `bank-of-seahaven` (The Truman Show)
- And more! See `config/namespace-mapping.conf` for the complete list.
All namespaces include a timestamp suffix for isolation: bank-of-springfield-{timestamp}
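As a rough illustration of how such a lookup might work, the sketch below resolves a scenario to its namespace and appends a suffix. It makes two assumptions that should be checked against the repository: that the mapping file holds `scenario=namespace` lines, and that the timestamp is Unix epoch seconds. `lookup_namespace` and `with_timestamp` are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: resolve a namespace from a mapping file and add a timestamp suffix.
# ASSUMPTION: "scenario=namespace" lines; the real config format may differ.

# Print the namespace mapped to scenario $1 in file $2.
# Falls back to the scenario name when no mapping exists.
lookup_namespace() {
  local scenario="$1" mapping_file="$2" key value
  while IFS='=' read -r key value; do
    case "$key" in ''|'#'*) continue ;; esac   # skip blank lines and comments
    if [ "$key" = "$scenario" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done < "$mapping_file"
  printf '%s\n' "$scenario"
}

# Append a suffix: bank-of-springfield -> bank-of-springfield-<timestamp>.
# ASSUMPTION: epoch seconds; pass $2 to supply a fixed value.
with_timestamp() {
  local base="$1" stamp="${2:-$(date +%s)}"
  printf '%s-%s\n' "$base" "$stamp"
}

# Demo against a throwaway mapping file.
mapping="$(mktemp)"
printf '%s\n' '# scenario=namespace' \
  'bad-deployment=bank-of-springfield' \
  'high-load=bank-of-seahaven' > "$mapping"
with_timestamp "$(lookup_namespace bad-deployment "$mapping")" 1700000000
# prints bank-of-springfield-1700000000
rm -f "$mapping"
```

Kubernetes namespace names must be valid DNS labels (lowercase alphanumerics and hyphens, at most 63 characters), so a numeric suffix is safe as long as the base name leaves room for it.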
- Create `scenarios/your-failure-scenario.sh`
- Follow the `*-scenario.sh` naming convention
- Implement both `inject_failure()` and `revert_failure()` functions
- Use the template from `scenarios/README.md`
- Test both actions: `./scenarios/your-failure-scenario.sh inject` and `./scenarios/your-failure-scenario.sh revert`
- Submit a pull request
No other code changes needed - scenarios are auto-discovered and automatically integrated!
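A new scenario script might be shaped roughly like the skeleton below. This is a hedged sketch and not the template from `scenarios/README.md`, which remains the authoritative starting point. The `DRY_RUN` switch and the `run` helper are hypothetical additions for safe experimentation, and scaling the `frontend` deployment is only a placeholder failure.

```bash
#!/usr/bin/env bash
# Hypothetical skeleton for scenarios/your-failure-scenario.sh.
# With DRY_RUN=1 (the default in this sketch) commands are printed, not executed.

NAMESPACE="${NAMESPACE:-your-failure-scenario}"   # overridable via the environment
DRY_RUN="${DRY_RUN:-1}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

# Placeholder failure: scale a deployment down to zero replicas.
inject_failure() {
  run kubectl scale deployment/frontend --replicas=0 -n "$NAMESPACE"
}

# Undo the placeholder failure.
revert_failure() {
  run kubectl scale deployment/frontend --replicas=1 -n "$NAMESPACE"
}

# Dispatch on the first argument; inject is the default action.
main() {
  case "${1:-inject}" in
    inject) inject_failure ;;
    revert) revert_failure ;;
    *) echo "Usage: $0 [inject|revert]" >&2; return 1 ;;
  esac
}

main "$@"
```

Reading `NAMESPACE` from the environment with a default is what lets the same script target either its dedicated namespace or one supplied by the caller.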
$ ./batch/setup-all-scenarios.sh
========================================
Multi-Namespace Scenario Setup
========================================
Deploying Bank of Anthos to 14 scenario namespaces in parallel...
✓ bad-deployment-scenario deployed (12.3s)
✓ config-misconfigured-scenario deployed (11.8s)
✓ database-lock-scenario deployed (13.1s)
✓ failed-backup-cronjob-scenario deployed (12.0s)
✓ helm-bad-upgrade-scenario deployed (12.7s)
✓ high-load-scenario deployed (11.5s)
✓ kyverno-policy-scenario deployed (14.2s)
✓ limit-range-contacts-scenario deployed (12.9s)
✓ missing-storage-class-scenario deployed (12.4s)
✓ network-policy-scenario deployed (12.1s)
✓ node-selector-scenario deployed (11.9s)
✓ oom-killed-scenario deployed (13.4s)
✓ resource-quota-scenario deployed (12.2s)
✓ wrong-sa-scenario deployed (11.7s)
[SUCCESS] All 14 scenarios deployed successfully!
Total time: 3m 18s
| Operation | Time | Notes |
|---|---|---|
| Setup all scenarios | ~3 min | 10+ namespaces in parallel (depends on the number of scenarios) |
| Inject all scenarios | ~1-2 min | Parallel failure injection |
| Check all scenarios | ~10 sec | Quick status overview |
| Cleanup all scenarios | ~2 min | Parallel namespace deletion |
This tool intentionally causes failures in Kubernetes clusters. Please review our Security Policy for:
- Vulnerability reporting
- Best practices
- Known security considerations
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
- Built on Bank of Anthos by Google Cloud
- Inspired by chaos engineering principles from Chaos Monkey
- Issues: GitHub Issues
- Security: security@komodor.io
- Documentation: See docs section above
Made with ❤️ by Komodor