
Commit 61562d3

Author: Nitesh
feat: Add KRKN Chaos Template Library - Issue krkn-chaos#1168
- Add 9 comprehensive chaos scenario templates
- Implement CLI template management system
- Add template listing, details, and execution
- Include parameter customization support
- Add comprehensive documentation
- Fix schema paths and error handling
- Add test suite with 6/6 tests passing

Templates included:
- pod-failure, node-failure, network-latency
- cpu-stress, disk-stress, pod-kill
- container-restart, vm-outage, resource-failure

Closes krkn-chaos#1168

Signed-off-by: Nitesh <nitesh@example.com>
1 parent 2a60a51 commit 61562d3

34 files changed (+2549, -0 lines)

README.md

Lines changed: 26 additions & 0 deletions
@@ -9,6 +9,32 @@
Chaos and resiliency testing tool for Kubernetes.
Kraken injects deliberate failures into Kubernetes clusters to check if it is resilient to turbulent conditions.

## 🚀 New: Chaos Template Library

KRKN now includes a comprehensive **Chaos Template Library** with pre-configured scenarios for quick execution:

- **Pod Failure**: Test application restart policies
- **Node Failure**: Validate cluster self-healing
- **Network Latency**: Test performance under poor network
- **CPU/Disk Stress**: Identify resource bottlenecks
- **VM Outage**: OpenShift Virtualization testing
- And more!

### Quick Start with Templates

```bash
# List available templates
python krkn/template_manager.py list

# Run a pod failure test
python krkn/template_manager.py run pod-failure

# Customize with parameters
python krkn/template_manager.py run network-latency --param latency="200ms"
```

📖 **[Full Template Documentation](docs/chaos-templates.md)**


### Workflow
![Kraken workflow](media/kraken-workflow.png)

docs/chaos-templates.md

Lines changed: 302 additions & 0 deletions
@@ -0,0 +1,302 @@
# KRKN Chaos Templates

This guide covers the KRKN Chaos Template Library, which provides pre-configured chaos scenarios for quick execution and testing.

## Overview

The KRKN Chaos Template Library offers ready-to-use chaos engineering scenarios that can be easily customized and executed. These templates follow a standardized structure and cover common failure patterns in Kubernetes environments.

## Available Templates

### Core Templates

| Template | Description | Risk Level | Category |
|----------|-------------|------------|----------|
| **pod-failure** | Simulates pod crash to test application resiliency | Medium | Availability |
| **node-failure** | Simulates node failure to test cluster resiliency | High | Availability |
| **network-latency** | Introduces network latency to test performance | Low | Performance |
| **cpu-stress** | Applies CPU stress to test performance under load | Medium | Performance |
| **disk-stress** | Applies disk I/O stress to test storage performance | Medium | Performance |
| **pod-kill** | Forcefully terminates pods to test recovery | Medium | Availability |
| **container-restart** | Restarts containers to test container-level recovery | Low | Availability |
| **vm-outage** | Simulates VM outage for OpenShift Virtualization | High | Availability |
| **resource-failure** | Simulates Kubernetes resource failures | Medium | Availability |

## Quick Start

### Installation

The template system is included with KRKN. No additional installation is required.

### Listing Available Templates

```bash
# Using the template manager directly
python krkn/template_manager.py list

# Using the template wrapper
python krkn-template list
```

### Running a Template

```bash
# Run a template with default parameters
python krkn/template_manager.py run pod-failure

# Or using the template wrapper
python krkn-template run pod-failure
```

### Viewing Template Details

```bash
# Show detailed information about a template
python krkn/template_manager.py show pod-failure

# Include README content
python krkn/template_manager.py show pod-failure --show-readme
```

## Template Customization

### Parameter Overrides

You can customize templates by overriding parameters:

```bash
python krkn/template_manager.py run pod-failure \
  --param name_pattern="^nginx-.*$" \
  --param namespace_pattern="^production$" \
  --param kill=2
```

### Common Parameters

Most templates support these common parameters:

- **name_pattern**: Regex pattern for resource names
- **namespace_pattern**: Regex pattern for namespaces
- **timeout**: Operation timeout in seconds
- **recovery_time**: Recovery monitoring duration
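
To illustrate how `--param key=value` overrides like the ones above could be turned into a dictionary of typed values, here is a minimal sketch. The helper name `parse_params` and its type-coercion rules are assumptions made for illustration; the actual logic in `krkn/template_manager.py` is not shown in this commit view and may differ.

```python
def parse_params(pairs):
    """Turn ["key=value", ...] into a dict, coercing simple types.

    Hypothetical helper, not taken from krkn/template_manager.py.
    """
    overrides = {}
    for pair in pairs:
        # Split on the first "=" only, so values may themselves contain "=".
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        if raw.lower() in ("true", "false"):
            value = raw.lower() == "true"
        else:
            try:
                value = int(raw)
            except ValueError:
                value = raw  # regex patterns and durations stay strings
        overrides[key] = value
    return overrides


print(parse_params(["name_pattern=^nginx-.*$", "kill=2"]))
```

Splitting on the first `=` only means a value such as `node-role.kubernetes.io/app=` survives intact.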

## Template Structure

Each template follows this structure:

```
templates/chaos-scenarios/
└── template-name/
    ├── scenario.yaml    # Main chaos configuration
    ├── metadata.yaml    # Template metadata and parameters
    └── README.md        # Detailed documentation
```
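
The layout above lends itself to a simple completeness check. The following sketch walks a templates directory and reports which of the three expected files each template is missing. The function name `find_templates` is hypothetical and is not part of the KRKN code shown here.

```python
from pathlib import Path

# File names taken from the directory layout described above.
REQUIRED_FILES = ("scenario.yaml", "metadata.yaml", "README.md")


def find_templates(root):
    """Return {template_name: [missing files]} for each directory under root."""
    report = {}
    for entry in sorted(Path(root).iterdir()):
        if entry.is_dir():
            report[entry.name] = [
                name for name in REQUIRED_FILES if not (entry / name).is_file()
            ]
    return report
```

Calling `find_templates("templates/chaos-scenarios")` from the repository root would return an empty list for every template that is complete.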

### scenario.yaml

Contains the actual chaos scenario configuration in KRKN format.

### metadata.yaml

Contains template metadata including:

```yaml
name: template-name
description: Brief description of the template
target: kubernetes-pod|kubernetes-node|kubernetes-network
risk_level: low|medium|high
category: availability|performance
version: "1.0"
author: KRKN Team
tags:
  - tag1
  - tag2
estimated_duration: "2-5 minutes"
dependencies: []
parameters:
  - name: parameter_name
    type: string|integer|boolean
    description: Parameter description
    default: default_value
```
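
A metadata file following this schema could be checked with a few lines of Python once it has been parsed into a mapping. The sketch below is illustrative only: the choice of which fields are required, and the name `validate_metadata`, are assumptions, since the commit view does not show how KRKN itself validates metadata.

```python
# Allowed values copied from the schema above.
ALLOWED = {
    "risk_level": {"low", "medium", "high"},
    "category": {"availability", "performance"},
}
# Assumption: these fields are treated as mandatory for this sketch.
REQUIRED = ("name", "description", "target", "risk_level", "category", "parameters")
PARAM_TYPES = {"string", "integer", "boolean"}


def validate_metadata(meta):
    """Return a list of problems found in a parsed metadata.yaml mapping."""
    problems = [f"missing field: {key}" for key in REQUIRED if key not in meta]
    for key, allowed in ALLOWED.items():
        if key in meta and meta[key] not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    for param in meta.get("parameters", []):
        if param.get("type") not in PARAM_TYPES:
            problems.append(f"parameter {param.get('name')!r} has unknown type")
    return problems
```

An empty list means the mapping passed every check in this sketch.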

### README.md

Comprehensive documentation including:

- Use cases
- Prerequisites
- Usage examples
- Expected behavior
- Customization options
- Troubleshooting guide

## Usage Examples

### Pod Failure Testing

```bash
# Test pod failure with default settings
python krkn-template run pod-failure

# Target specific application
python krkn-template run pod-failure \
  --param name_pattern="^frontend-.*$" \
  --param namespace_pattern="^production$"

# Kill multiple pods
python krkn-template run pod-failure \
  --param kill=3 \
  --param krkn_pod_recovery_time=180
```

### Network Latency Testing

```bash
# Add 100ms latency
python krkn-template run network-latency

# Custom latency settings
python krkn-template run network-latency \
  --param latency="200ms" \
  --param jitter="20ms" \
  --param duration=120
```

### CPU Stress Testing

```bash
# Apply 80% CPU load
python krkn-template run cpu-stress

# High intensity stress
python krkn-template run cpu-stress \
  --param cpu-load-percentage=95 \
  --param duration=300 \
  --param number-of-nodes=2
```

### Node Failure Testing

```bash
# Test single node failure
python krkn-template run node-failure

# Target specific nodes
python krkn-template run node-failure \
  --param label_selector="node-role.kubernetes.io/app=" \
  --param instance_count=1
```

## Best Practices

### Before Running Templates

1. **Test in Non-Production**: Always test templates in development/staging environments first.
2. **Check Prerequisites**: Ensure all prerequisites are met for the target template.
3. **Monitor Resources**: Verify sufficient cluster resources are available.
4. **Backup Data**: Ensure critical data is backed up before running high-risk templates.

### During Execution

1. **Monitor Health**: Watch cluster and application health metrics.
2. **Check Logs**: Monitor KRKN and application logs for issues.
3. **Abort if Necessary**: Stop execution if unexpected issues occur.
4. **Document Results**: Record outcomes and observations.

### After Execution

1. **Verify Recovery**: Ensure all resources have recovered properly.
2. **Review Logs**: Analyze logs for insights and improvements.
3. **Update Configurations**: Adjust application configurations based on results.
4. **Document Learnings**: Record findings for future reference.

## Risk Management

### Risk Levels

- **Low**: Minimal impact, unlikely to cause service disruption
- **Medium**: May cause temporary service disruption
- **High**: Can cause significant service disruption
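
Because the three levels are ordered, a wrapper script could refuse to run templates above a chosen ceiling. The sketch below shows one way to do that. The names `RISK_ORDER` and `is_allowed` are hypothetical, and the sample risk levels are copied from the Core Templates table in this guide.

```python
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}


def is_allowed(template_risk, max_risk):
    """True if a template's risk level does not exceed the configured ceiling."""
    return RISK_ORDER[template_risk.lower()] <= RISK_ORDER[max_risk.lower()]


# Risk levels as listed in the Core Templates table.
TEMPLATE_RISK = {
    "pod-failure": "medium",
    "node-failure": "high",
    "network-latency": "low",
    "container-restart": "low",
}

runnable = [name for name, risk in TEMPLATE_RISK.items() if is_allowed(risk, "medium")]
print(runnable)  # ['pod-failure', 'network-latency', 'container-restart']
```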

### Safety Measures

1. **Start Small**: Begin with low-risk templates and low intensity settings.
2. **Gradual Increase**: Slowly increase intensity and complexity.
3. **Time Restrictions**: Run chaos experiments during maintenance windows.
4. **Rollback Plans**: Have clear rollback procedures ready.

## Integration with CI/CD

### GitHub Actions Example

```yaml
- name: Run Chaos Test
  run: |
    python krkn-template run pod-failure \
      --param name_pattern="^app-.*$" \
      --param namespace_pattern="^testing$"
```

### Jenkins Pipeline Example

```groovy
stage('Chaos Test') {
    steps {
        sh 'python krkn-template run network-latency --param latency="50ms"'
    }
}
```
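
Pipelines that drive the CLI from Python rather than from a shell step could assemble the command as an argument list, which sidesteps shell quoting of regex characters such as `$`. This is a sketch under stated assumptions: `build_command` and `run_template` are hypothetical helpers, and whether the template manager signals failure through a non-zero exit code is not confirmed by this commit view.

```python
import subprocess
import sys


def build_command(template, params=None, script="krkn-template"):
    """Build the argv list for a template run with --param overrides."""
    argv = [sys.executable, script, "run", template]
    for key, value in (params or {}).items():
        argv += ["--param", f"{key}={value}"]
    return argv


def run_template(template, params=None):
    """Run the template and return the exit code so the CI job can act on it."""
    # Assumption: a non-zero exit code indicates a failed run.
    return subprocess.run(build_command(template, params)).returncode
```

A CI step could then call `sys.exit(run_template("pod-failure", {"namespace_pattern": "^testing$"}))` to propagate the result.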

## Troubleshooting

### Common Issues

1. **Template Not Found**: Check template name spelling and templates directory path.
2. **Permission Denied**: Verify RBAC permissions for KRKN service account.
3. **Resource Not Found**: Ensure target resources exist and are accessible.
4. **Timeout Errors**: Increase timeout values for slow clusters.
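
For the "Template Not Found" case, a misspelled name can often be caught by comparing it against the known template names. The sketch below uses the standard library's `difflib.get_close_matches`. The helper `suggest_template` is hypothetical and is not something the template manager is known to provide.

```python
import difflib

# Template names as listed in the Core Templates table.
KNOWN_TEMPLATES = [
    "pod-failure", "node-failure", "network-latency", "cpu-stress", "disk-stress",
    "pod-kill", "container-restart", "vm-outage", "resource-failure",
]


def suggest_template(name, known=KNOWN_TEMPLATES):
    """Return up to three known template names that closely match a typo."""
    return difflib.get_close_matches(name, known, n=3, cutoff=0.6)


suggestions = suggest_template("pod-failur")
print(suggestions[0])  # pod-failure
```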

### Debug Mode

Enable debug logging for detailed troubleshooting:

```bash
python krkn-template run pod-failure --debug
```

### Log Locations

- KRKN logs: Console output and report files
- Application logs: Kubernetes pod logs
- System logs: Node system logs (if accessible)

## Contributing Templates

### Creating New Templates

1. Create a directory under `templates/chaos-scenarios/`
2. Add `scenario.yaml`, `metadata.yaml`, and `README.md`
3. Follow the established structure and naming conventions
4. Test thoroughly before submitting

### Template Guidelines

- Use descriptive names and clear documentation
- Include comprehensive parameter descriptions
- Provide multiple usage examples
- Include troubleshooting sections
- Follow KRKN coding standards

## Support

For issues related to the template system:

1. Check the template README files
2. Review the KRKN documentation
3. Search existing GitHub issues
4. Open a new issue with detailed information

## Integration with Scenarios Hub

The template system is designed to integrate with the [KRKN Scenarios Hub](https://github.com/krkn-chaos/scenarios-hub). Templates can be contributed to the hub for community sharing and collaboration.

krkn-template

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
#!/usr/bin/env python3
"""
KRKN Template CLI Wrapper

This script provides a convenient command-line interface for managing
and running KRKN chaos scenario templates.
"""

import sys
import os

# Add the current directory to Python path to import krkn modules
sys.path.insert(0, os.path.dirname(os.path.abspath(__file__)))

from krkn.template_manager import main

if __name__ == "__main__":
    main()
