03-path-application-development/310-chaos-engineering/readme.adoc: 30 additions, 16 deletions
@@ -26,10 +26,10 @@ An empirical process, Chaos Engineering experiments exercise a distributed syste
 
 The link:http://principlesofchaos.org/["Principles of Chaos"] define the practical process that Chaos Engineering executes as:
 
-1. Start by defining ‘steady state’ as some measurable output of a system that indicates normal behavior.
-2. Hypothesize that this steady state will continue in both the control group and the experimental group.
-3. Introduce variables that reflect real world events like servers that crash, hard drives that malfunction, network connections that are severed, etc.
-4. Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.
+. Start by defining ‘steady state’ as some measurable output of a system that indicates normal behavior.
+. Hypothesize that this steady state will continue in both the control group and the experimental group.
+. Introduce variables that reflect real world events like servers that crash, hard drives that malfunction, network connections that are severed, etc.
+. Try to disprove the hypothesis by looking for a difference in steady state between the control group and the experimental group.
 
 In this chapter we will explore implementing this process using the free and open source link:http://chaostoolkit.org/[Chaos Toolkit].
 
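As a rough illustration of the four steps listed in this hunk, the sketch below compares a steady-state metric between a control group and an experimental group. It is not Chaos Toolkit code: the function names, the success-rate metric, and the 5% tolerance are all assumptions made for the example.

```python
from statistics import mean
from typing import Callable, Sequence


def success_rate(status_codes: Sequence[int]) -> float:
    """Step 1: steady state as a measurable output (share of 200 responses)."""
    return mean(1.0 if code == 200 else 0.0 for code in status_codes)


def hypothesis_holds(
    control: Sequence[int],
    experimental: Sequence[int],
    tolerance: float = 0.05,  # assumed tolerance, not taken from the readme
) -> bool:
    """Steps 2 and 4: the hypothesis survives if both groups stay close."""
    return abs(success_rate(control) - success_rate(experimental)) <= tolerance


def run_process(
    sample_control: Callable[[], Sequence[int]],
    sample_experimental: Callable[[], Sequence[int]],
    inject_failure: Callable[[], None],
) -> bool:
    """Measure the control group, inject a failure, then measure again."""
    control = sample_control()
    inject_failure()  # Step 3: e.g. kill a pod or sever a connection
    experimental = sample_experimental()
    return hypothesis_holds(control, experimental)
```

A result of `False` means the experiment found a difference in steady state, which is the weakness the process is looking for.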
@@ -58,7 +58,17 @@ Wait for approximately 3 mins for the load balancer to accept request.
 
 A link:http://chaostoolkit.org/[Chaos Toolkit] experiment is defined using a link:http://chaostoolkit.org/reference/api/experiment/[JSON file format].
 
-In addition the experiment also begins with some header information that describes the experiment being conducted:
+Each experiment consists of:
+
+. Header
+. Steady-state
+. Method & Probes
+
+Let's look at how each of these is defined next.
+
+=== Header
+
+The experiment begins with some header information that describes the experiment being conducted:
 
 [source, JSON]
 ----
@@ -80,7 +90,7 @@ In addition the experiment also begins with some header information that describ
 
 The `version` describes the version of the experiment definition being followed. `title` and `description` describe the experimental hypothesis being explored.
 
-It is typical to build up a catalogue of experiments when exploring the weaknesses in a system, and so `tags` are used to provide searchable labels to make that catalogue more easily navigable.
+It is typical to build up a catalog of experiments when exploring the weaknesses in a system, and so `tags` are used to provide searchable labels to make that catalog more easily navigable.
 
 Finally `configuration` is used to supply configuration parameters to the experiment, in this case populating the `web_app_url` configuration parameter with the contents of the `WEBAPP_URL` environment variable.
 
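The header JSON itself is elided from this diff, so the following is only a sketch of how an environment-backed configuration entry could be resolved. It assumes the entry for `web_app_url` has the shape `{"type": "env", "key": "WEBAPP_URL"}`. The function name `resolve_configuration` is hypothetical and this is not the toolkit's own loader.

```python
import os
from typing import Any, Dict, Mapping, Optional


def resolve_configuration(
    declared: Mapping[str, Any],
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Replace env-typed entries with the value of the named variable."""
    env = os.environ if environ is None else environ
    resolved: Dict[str, Any] = {}
    for name, value in declared.items():
        if isinstance(value, dict) and value.get("type") == "env":
            key = value["key"]
            if key not in env:
                # Fail early rather than run an experiment against no URL.
                raise KeyError(f"environment variable {key} is not set")
            resolved[name] = env[key]
        else:
            # Plain values are passed through unchanged.
            resolved[name] = value
    return resolved
```

Passing the environment as a parameter keeps the lookup easy to exercise without touching the real process environment.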
@@ -90,9 +100,11 @@ Steady-State defines how a system should observably respond, often within a tole
 
 For the sample application, steady-state could be defined as:
 
-"The root URL of the `webapp` microservice should always respond with a `200 OK` link:https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html[HTTP Status Code] within a timeout of 3 seconds."
+***********
+The root URL of the `webapp` microservice should always respond with a `200 OK` link:https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html[HTTP Status Code] within a timeout of 3 seconds.
+***********
 
-Using the link:http://chaostoolkit.org/[Chaos Toolkit's] JSON experiment definition format, this steady-state can be defined as:
+Using the http://chaostoolkit.org/reference/api/experiment/#steady-state-hypothesis[Chaos Toolkit's JSON experiment definition format], the steady-state hypothesis can be defined as:
 
 [source, JSON]
 ----
@@ -132,13 +144,11 @@ Steady-state begins with a `title`, which describes what the steady-state repres
 
 In this case the probes detect that all the pods are in the `running` phase, and that the URL, supplied by the `web_app_url` configuration parameter, returns the specified status code, `200`, within the specified timeout, `3` seconds.
 
-=== Defining the Experimental Method
+=== Method & Probes
 
-Step 3 of the chaos engineering process is:
+The third step of the Chaos Engineering process is to introduce variables that reflect real world events like servers that crash, hard drives that malfunction, network connections that are severed, etc.
 
-3. Introduce variables that reflect real world events like servers that crash, hard drives that malfunction, network connections that are severed, etc.
-
-These _variables_ are introduced through the link:http://chaostoolkit.org/[Chaos Toolkit's] experimental `method`:
+These _variables_ are introduced using the experiment's `method`:
 
 [source, JSON]
 ----
@@ -173,10 +183,14 @@ These _variables_ are introduced through the link:http://chaostoolkit.org/[Chaos
 ],
 ----
 
-This experiment's method first has an `action` that kills all pods that have the label of `app=greeter-pod`. Often link:http://chaostoolkit.org/[Chaos Toolkit] experimental methods only contain actions, as it is the actions that manipulate the real-world variables of the distributed system.
+This experiment's method first has an `action` that kills all pods that have the label of `app=greeter-pod`. Often Chaos Toolkit experimental methods only contain actions, as it is the actions that manipulate the real-world variables of the distributed system.
 
 In this experiment's case there is _also_ a `probe` in the method. Probes in an experiment's method give us a chance to collate more information as the real-world variables are being manipulated by the experiment. The `probe` here extends the output of the experiment with the logs from pods labelled with `app==webapp-pod`.
 
+Install the Kubernetes extension for Chaos Toolkit:
+
+pip install chaostoolkit-kubernetes
+
 === Rollbacks
 
 It is sometimes useful to supply an additional set of actions at the end of an experiment so that any actions in the method that were undertaken can be explicitly reversed. These are contained in a `rollback` section, but as Kubernetes will recover from this experiment's actions anyway there are no rollback actions required in this case:
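To make the relationship between the steady-state check, the method's actions and probes, and the rollbacks more concrete, here is a minimal sketch of an experiment runner. It is an illustration under stated assumptions, not how Chaos Toolkit is implemented: activities are modelled as dictionaries with a hypothetical `run` callable (real experiments describe a provider in JSON instead), and `run_method` and `run_chaos_experiment` are invented names.

```python
from typing import Any, Callable, Dict, List

# Assumed shape: {"type": "action" | "probe", "name": str, "run": callable}
Activity = Dict[str, Any]


def run_method(method: List[Activity]) -> Dict[str, Any]:
    """Run activities in order and keep the output of probes only."""
    probe_outputs: Dict[str, Any] = {}
    for activity in method:
        result = activity["run"]()
        if activity["type"] == "probe":
            # Probes only observe; their output extends the experiment's record.
            probe_outputs[activity["name"]] = result
    return probe_outputs


def run_chaos_experiment(
    steady_state: Callable[[], bool],
    method: List[Activity],
    rollbacks: List[Activity],
) -> Dict[str, Any]:
    """Check steady state, apply the method, check again, then roll back."""
    journal: Dict[str, Any] = {
        "steady_before": steady_state(),
        "probes": {},
        "steady_after": None,
    }
    if not journal["steady_before"]:
        # The system is already unhealthy, so nothing is injected.
        return journal
    try:
        journal["probes"] = run_method(method)
        journal["steady_after"] = steady_state()
    finally:
        # Rollbacks run even if an action or probe raised.
        for rollback in rollbacks:
            rollback["run"]()
    return journal
```

For the experiment in this readme, `rollbacks` would be an empty list, since Kubernetes recreates the killed pods on its own.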
@@ -193,9 +207,9 @@ This completes the experiment definition.
 
 With your cluster running you will first need to ensure you populate the `WEBAPP_URL` environment variable with the URL of your cluster's `webapp-service` endpoint.
 
-$ export WEBAPP_URL="http://$(kubectl get svc/webapp-service -o jsonpath={.status.loadBalancer.ingress[0].ip})/"
+$ export WEBAPP_URL="http://$(kubectl get svc/webapp-service -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')/"
 
-Now you can run the link:./experiments/experiment.json[experiment] using the link:http://chaostoolkit.org/[Chaos Toolkit's] `chaos run` command:
+Now you can run the link:./experiments/experiment.json[experiment] using the `chaos run` command:
 
 $ chaos run experiment.json
 [2018-03-10 14:42:38 INFO] Validating the experiment's syntax