Commit dcdd1ed

oops almost forgot /capabilities
1 parent 33db2e7 commit dcdd1ed

47 files changed

+4959
-2
lines changed
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
1+
---
2+
title: AWS Fault Injection Service
3+
description: Use Fault Injection Service to simulate faults in your infrastructure and test its fault tolerance.
4+
---
5+
6+
The [Fault Injection Service (FIS)](https://aws.amazon.com/fis/) is a fully managed service by AWS designed to help you improve the resilience of your applications by simulating real-world outages and operational issues.
7+
This service allows you to conduct controlled experiments on your AWS infrastructure, injecting faults and observing how your system responds under various conditions.
8+
9+
By using the Fault Injection Service, you can identify weaknesses, test recovery procedures, and ensure that your applications can withstand unexpected disruptions.
10+
This proactive approach to reliability engineering enables you to enhance system robustness, minimize downtime, and maintain a high level of service availability for your users.
11+
12+
{{< alert title="Note">}}
13+
Fault Injection Service emulation is available as part of the LocalStack Enterprise plan.
14+
If you'd like to try it out, please [contact us](https://www.localstack.cloud/demo) to request access.
15+
{{< /alert >}}
16+
17+
{{< callout "tip" >}}
18+
For more information, please refer to the [FIS service docs]({{< ref "user-guide/aws/fis" >}}).
19+
{{< /callout >}}
20+
21+
Some of the most important concepts associated with an FIS experiment are:
22+
23+
**1. Experiment Templates**: Experiment templates define the actions, targets, and any stop conditions for your experiment.
25+
They serve as blueprints for conducting fault injection experiments, allowing you to specify what resources are targeted, what faults are injected, and under what conditions the experiment should automatically stop.
26+
27+
**2. Actions**: Actions are the specific fault injection operations that the experiment performs on the target resources.
29+
Examples include adding latency to API requests, throttling them, or completely blocking access to instances.
30+
Actions define the type of fault, parameters for the fault injection, and the targets affected.
31+
32+
**3. Targets**: Targets are the AWS resources on which the experiment actions will be applied.
34+
For finer-grained control, an experiment can target a specific operation of a service.
35+
36+
**4. Stop Conditions**: Stop conditions are criteria that, when met, will automatically stop the experiment.
38+
39+
**5. IAM Roles and Permissions**: To run experiments, AWS FIS requires specific IAM roles and permissions.
41+
These are necessary for AWS FIS to perform actions on your behalf, like injecting faults into your resources.
42+
43+
**6. Experiment Execution**: When you start an experiment, AWS FIS executes the actions defined in the experiment template against the specified targets, adhering to any defined stop conditions.
45+
The execution process is logged, and detailed information about the experiment's progress and outcome is provided.
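
As a rough illustration, the concepts above can be mapped onto the fields of an experiment template. The Python sketch below writes one possible template as a plain dictionary and adds a small check that every action points at a declared target. The field names are assumed to follow the AWS FIS `CreateExperimentTemplate` request shape and should be verified against the AWS reference. The ARNs, the `app_role` target name, and the `undeclared_targets` helper are hypothetical placeholders.

```python
# Illustrative experiment template; ARNs and names are placeholders (assumptions).
experiment_template = {
    "description": "Inject internal errors into EC2 API calls",
    # Targets: the resources the actions apply to.
    "targets": {
        "app_role": {
            "resourceType": "aws:iam:role",
            "resourceArns": ["arn:aws:iam::000000000000:role/app-role"],
            "selectionMode": "ALL",
        }
    },
    # Actions: the faults to inject, each pointing at a named target.
    "actions": {
        "ec2_internal_error": {
            "actionId": "aws:fis:inject-api-internal-error",
            "parameters": {
                "service": "ec2",
                "operations": "DescribeInstances",
                "percentage": "100",
                "duration": "PT5M",
            },
            "targets": {"Roles": "app_role"},
        }
    },
    # Stop conditions: a source of "none" means no automatic stop is configured.
    "stopConditions": [{"source": "none"}],
    # IAM role that FIS assumes to perform the actions on your behalf.
    "roleArn": "arn:aws:iam::000000000000:role/fis-role",
}


def undeclared_targets(template):
    """Return target names used by actions but missing from 'targets'."""
    declared = set(template.get("targets", {}))
    used = {
        name
        for action in template.get("actions", {}).values()
        for name in action.get("targets", {}).values()
    }
    return sorted(used - declared)
```

Running `undeclared_targets(experiment_template)` before submitting a template is a cheap way to catch a misspelled target name locally.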
Lines changed: 196 additions & 0 deletions
@@ -0,0 +1,196 @@
1+
---
2+
title: Chaos API
3+
description: Simulate outages and network failures to test the resiliency of your infrastructure
4+
---
5+
6+
## Introduction
7+
8+
LocalStack Chaos API allows you to mimic outages across any AWS region or service.
9+
Intentionally triggering service outages and monitoring how the system responds when its infrastructure is compromised is a powerful way to test resilience.
10+
This strategy helps gauge the effectiveness of the system's deployment procedures and its resilience against infrastructure disruptions, which is a key element of chaos engineering.
11+
12+
You can use LocalStack Chaos API to cause API failures for any combination of the following:
13+
14+
- Service
15+
- Region
16+
- Operation
17+
18+
You can customise the HTTP error code and message that LocalStack responds with.
19+
If required, you can make the failures occur probabilistically.
20+
21+
Furthermore, the Chaos API can be configured to add network latency to all calls.
22+
23+
{{< alert title="Note">}}
24+
Chaos API is available as part of the LocalStack Enterprise plan.
25+
If you'd like to try it out, please [contact us](https://www.localstack.cloud/demo) to request access.
26+
{{< /alert >}}
27+
28+
## Prerequisites
29+
30+
The prerequisites for this guide are:
31+
32+
- LocalStack Pro with [LocalStack CLI](https://docs.localstack.cloud/getting-started/installation/#localstack-cli) & [LocalStack Auth Token](https://docs.localstack.cloud/getting-started/auth-token/)
33+
- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/)
34+
- [Python](https://www.python.org/downloads/)
35+
36+
## Configuration
37+
38+
The disruption types supported by Chaos API are broadly categorised into two groups.
39+
**Service Faults** lead to an application-level HTTP error in an AWS service, and **Network Effects** introduce network-level effects to all connections.
40+
41+
### Service Faults
42+
43+
Service faults can be configured using the endpoint at `/_localstack/chaos/faults`.
44+
The configuration schema consists of an array of one or more rules, where each rule specifies the conditions for the fault to occur.
45+
When active, rules are evaluated sequentially on every request to LocalStack until the first match.
46+
47+
The schema for the configuration is as follows.
48+
49+
```json
[
  {
    "region": "(str) Region name, e.g. 'ap-south-1'. If omitted, all regions are affected.",
    "service": "(str) Name of the service, e.g. 'kinesis'. If omitted, all services are affected.",
    "operation": "(str) Name of the operation, e.g. 'PutRecord'. If omitted, all operations are affected.",
    "probability": "(num) Probability of invoking this rule, e.g. 0.5. If omitted, 1 is used.",
    "error": {
      "statusCode": "(int) HTTP status code to use in response, e.g. 503. If omitted, 503 is used.",
      "code": "(str) Descriptive error code used in response. If omitted, 'ServiceUnavailable' is used."
    }
  },
  ...
]
```
64+
65+
The endpoint allows the following operations:
66+
- `GET`: Get current configuration
67+
- `POST`: Set a new configuration, replacing the current one
68+
- `PATCH`: Add a rule
69+
- `DELETE`: Delete a rule
70+
71+
An empty array `[]` disables the faults entirely, while an empty rule in the array `[{}]` causes all AWS operations to lead to faults.
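
The matching behaviour described in this section can be modelled in a few lines of Python. This is a minimal sketch of the documented semantics, not LocalStack's actual implementation. The function names are hypothetical, and it assumes that evaluation stops at the first rule whose fields match, with `probability` then deciding whether that rule fires.

```python
import random


def rule_matches(rule, service, region, operation):
    """A rule matches when every field it specifies equals the request's value."""
    request = {"service": service, "region": region, "operation": operation}
    return all(
        rule[key] == request[key]
        for key in ("service", "region", "operation")
        if key in rule
    )


def pick_fault(rules, service, region, operation, rng=random.random):
    """Return the error for the first matching rule if it fires, else None."""
    for rule in rules:
        if not rule_matches(rule, service, region, operation):
            continue
        # Assumption: only the first matching rule is considered; its
        # probability (default 1) decides whether the fault is injected.
        if rng() < rule.get("probability", 1):
            error = rule.get("error", {})
            return {
                "statusCode": error.get("statusCode", 503),
                "code": error.get("code", "ServiceUnavailable"),
            }
        return None
    return None
```

In this model an empty list never produces a fault because there is nothing to match, while `[{}]` matches every request because the empty rule specifies no constraints.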
72+
73+
### Network Effects
74+
75+
Network effects are configured using the endpoint `/_localstack/chaos/effects`.
76+
Currently, latency is the only network effect that the Chaos API supports.
77+
78+
```json
{
  "latency": "(int) Network latency in milliseconds. By default, 0 is used."
}
```
83+
84+
This endpoint allows the following operations:
85+
- `GET`: Get current configuration
86+
- `POST`: Add new configuration
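
For scripted use, the latency configuration can be prepared from Python with the standard library alone. The sketch below only builds the request and does not send it. The `build_latency_request` helper is hypothetical, and the base URL assumes LocalStack is reachable at `localhost.localstack.cloud:4566`.

```python
import json
import urllib.request

# Assumed LocalStack address; adjust to your setup.
EFFECTS_URL = "http://localhost.localstack.cloud:4566/_localstack/chaos/effects"


def build_latency_request(latency_ms, url=EFFECTS_URL):
    """Build (but do not send) a POST request that sets network latency."""
    if not isinstance(latency_ms, int) or latency_ms < 0:
        raise ValueError("latency must be a non-negative integer (milliseconds)")
    body = json.dumps({"latency": latency_ms}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With LocalStack running, `urllib.request.urlopen(build_latency_request(500))` would apply the setting, and posting a latency of `0` should restore the default.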
87+
88+
## Examples
89+
90+
To cause faults, make a POST request as follows:
91+
92+
{{< command >}}
$ curl --location --request POST 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \
--header 'Content-Type: application/json' \
--data '
[
    {
        "service": "s3",
        "region": "us-east-1"
    },
    {
        "service": "s3",
        "region": "ap-south-1"
    },
    {
        "service": "lambda"
    }
]'
{{< /command >}}
110+
111+
In this example, S3 is affected in `us-east-1` and `ap-south-1`, and Lambda is affected in all regions.
112+
All calls to these services in these regions will return a 503 Service Unavailable error.
113+
114+
To see this in action, try to create an S3 bucket in `us-east-1`:
115+
116+
{{< command >}}
$ awslocal s3 mb s3://test-bucket --region us-east-1
<disable-copy>
make_bucket failed: s3://test-bucket An error occurred (ServiceUnavailableException) when calling the CreateBucket operation (reached max retries: 4): Service 's3' not accessible due to an outage
</disable-copy>
{{< /command >}}
122+
123+
However, the same operation works as expected when run in `eu-central-1`.
124+
125+
{{< command >}}
$ awslocal s3 mb s3://test-bucket --region eu-central-1
<disable-copy>
make_bucket: test-bucket
</disable-copy>
{{< /command >}}
131+
132+
Faults can be disabled by setting an empty rule list in the configuration.
133+
The following request will clear the current configuration:
134+
135+
{{< command >}}
$ curl --location --request POST 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \
--header 'Content-Type: application/json' \
--data '[]'
{{< /command >}}
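
In automated tests it can be convenient to pair the two POST calls, one to enable faults and one to clear them, so that a failing test never leaves faults switched on. The following Python sketch wraps them in a context manager. The `post_rules` and `injected_faults` names are hypothetical, and the `send` parameter exists only so the helper can be exercised without a running LocalStack instance.

```python
import json
import urllib.request
from contextlib import contextmanager

# Assumed LocalStack address; adjust to your setup.
FAULTS_URL = "http://localhost.localstack.cloud:4566/_localstack/chaos/faults"


def post_rules(rules, url=FAULTS_URL):
    """POST a rule list to the faults endpoint (needs a running LocalStack)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(rules).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


@contextmanager
def injected_faults(rules, send=post_rules):
    """Apply fault rules for the duration of a block, then clear them."""
    send(rules)
    try:
        yield
    finally:
        # Posting an empty list disables all faults again.
        send([])
```

A test would then wrap its assertions in `with injected_faults([{"service": "s3"}]):` and rely on the `finally` clause to reset the configuration even if an assertion fails.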
140+
141+
To retrieve the current configuration, make the following GET call:
142+
143+
{{< command >}}
$ curl --location --request GET 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults'
{{</ command >}}
146+
147+
To add a new rule to the current configuration, make a PATCH call as follows:
148+
149+
{{< command >}}
$ curl --location --request PATCH 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \
--header 'Content-Type: application/json' \
--data '
[
    {
        "service": "kinesis",
        "operation": "PutRecord",
        "probability": 0.3,
        "error": {
            "statusCode": 400,
            "code": "ProvisionedThroughputExceededException"
        }
    }
]'
{{</ command >}}
165+
166+
This new rule causes the Kinesis `PutRecord` operation to fail with a probability of 0.3, that is, for roughly 30% of calls.
167+
Here, the returned error is also customised to be HTTP 400 ProvisionedThroughputExceededException.
168+
169+
To remove a rule from the configuration, make a DELETE call as follows:
170+
171+
{{< command >}}
$ curl --location --request DELETE 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \
--header 'Content-Type: application/json' \
--data '[{"service": "lambda"}]'
{{</ command >}}
176+
177+
The rule to be removed must be exactly the same as in the existing configuration.
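
The exact-match requirement can be pictured with a small Python model of the rule list. This is only a sketch of the described behaviour, with hypothetical helper names. It assumes that "exactly the same" means the rule has the same keys and values. Whether key order or omitted defaults matter to LocalStack is not stated here.

```python
def add_rules(config, new_rules):
    """Model of PATCH: append rules to the existing configuration."""
    return config + list(new_rules)


def remove_rules(config, rules_to_remove):
    """Model of DELETE: drop only rules equal to one of the given rules."""
    return [rule for rule in config if rule not in rules_to_remove]
```

Under this model, deleting `{"service": "s3"}` leaves a configured `{"service": "s3", "region": "us-east-1"}` rule in place, because the two rules are not identical.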
178+
179+
## Comparison with Fault Injection Service
180+
181+
AWS [Fault Injection Service (FIS)]({{< ref "fis" >}}) also allows controlled chaos engineering experiments on infrastructure.
182+
While similar in purpose, there are notable differences between FIS and LocalStack Chaos API.
183+
184+
This table highlights those differences, offering a detailed comparison of how each service approaches chaos engineering, their capabilities, and their integration options.
185+
186+
| **Aspect** | **AWS Fault Injection Service (FIS)** | **LocalStack Chaos API** |
|---|---|---|
| **Fault Types** | • EC2 Stop/Terminate Instances<br>• RDS Reboot Instances<br>• SSM Send Command<br>• Inject API errors (e.g., `aws:fis:inject-api-internal-error` for EC2 only) | • API failures (HTTP error codes and messages for any service)<br>• Network effects (latency)<br>• Can be probabilistic and customized. |
| **Procedural vs Declarative** | • Capable of running procedural experiments where it invokes API actions affecting AWS resources (e.g., `aws:ec2:stop-instances`). | • Focuses on declarative effects impacting the AWS API, such as returning errors or adding latency, without invoking AWS resource actions. |
| **Experiment Execution** | • Requires creating and running controlled experiments with predefined templates. Systems are restored after disruption duration. | • Faults are applied dynamically based on configuration rules. Can inject faults on-the-fly without predefining experiments. |
| **Customization** | • Limited to predefined actions (e.g., stopping EC2 instances, inducing specific errors like InternalError for EC2). | • Highly customizable, including probabilistic failures, custom error codes, HTTP status codes, and errors for any AWS operation. |
| **Service Coverage** | • Covers specific AWS services such as EC2, RDS, and SSM. | • Covers all AWS services and operations (e.g., S3, Lambda, Kinesis) with no service-specific restrictions. |
| **Network Effects** | • Not supported. | • Supports adding network latency to simulate slow network conditions. |
| **API Interaction** | • `create-experiment-template` to create templates<br>• `start-experiment` to begin experiments | • `POST` to `/_localstack/chaos/faults` to configure faults<br>• `POST` to `/_localstack/chaos/effects` to introduce network effects. |
| **Probabilistic Failure Injection** | • Not available. | • Supports probabilistic failure injection, introducing partial failures, which mimic intermittent outages. |
| **Broader Fault Injection** | • Limited to predefined actions (e.g., stop instances, reboot databases, inject errors for specific services). | • Broader fault injection for any AWS operation (e.g., PutObject for S3, Invoke for Lambda). |
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
1+
---
2+
title: Chaos Engineering Dashboard
3+
description: Chaos Engineering Dashboard allows users to run chaos experiments within their application stack to test the system's resilience.
4+
---
5+
6+
## Introduction
7+
8+
The Chaos Engineering Dashboard in LocalStack offers streamlined testing for cloud applications. It lets you simulate server errors, service outages, regional disruptions, and network latency, so you can check that your app is ready for real-world challenges.
9+
10+
The dashboard uses [LocalStack Chaos API]({{< ref "chaos-api" >}}) under the hood to offer a set of customizable templates that can be integrated into any automation workflow.
11+
12+
{{< figure src="chaos-engineering-dashboard.png" width="900" >}}
13+
14+
You can find this feature in the LocalStack Web Application by navigating to [**app.localstack.cloud/chaos-engineering**](https://app.localstack.cloud/chaos-engineering).
15+
16+
{{< callout "note" >}}
17+
Chaos Engineering Dashboard is offered as a **preview** feature and is under active development.
18+
{{< /callout >}}
19+
20+
## Features
21+
22+
The dashboard offers the following features:
23+
24+
* **DynamoDB Error**: Randomly inject `ProvisionedThroughputExceededException` errors into DynamoDB API responses.
25+
* **Kinesis Error**: Randomly inject `ProvisionedThroughputExceededException` errors into Kinesis API responses.
26+
* **500 Internal Error**: Randomly terminate incoming requests, returning an `Internal Server Error` with a response code of 500.
27+
* **Service Unavailable**: Cause a specified percentage of service API calls to receive a 503 `Service Unavailable` response.
28+
* **AWS Region Unavailable**: Simulate regional outages and failovers by disabling entire AWS regions.
29+
* **Latency**: Introduce specified latency to every API call, useful for simulating network latency or degraded network performance.
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
1+
---
2+
title: Chaos Engineering
3+
description: Chaos Engineering with LocalStack enables you to build resilient systems early on in the development phase.
4+
---
5+
6+
## Introduction
7+
8+
Chaos engineering via LocalStack is a method to enhance system resilience by deliberately introducing controlled disruptions.
9+
This technique takes different forms depending on the team:
10+
11+
- Software developers focus on application behavior and error response
12+
- Architects concentrate on the strength of system design
13+
- Operations teams investigate the dependability of infrastructure setup
14+
15+
Integrating chaos tests early in the development process helps identify and mitigate potential flaws, leading to systems that are more robust under stress and can withstand turbulent conditions.
16+
Chaos engineering in LocalStack encompasses the following features:
17+
18+
- **Application behavior and error management** through Fault Injection Service (FIS) experiments.
19+
- **Robust architecture** tested via failover scenarios using the Chaos API.
20+
- **Consistent infrastructure setup** under challenging conditions like outages, examined through automated provisioning processes.
21+
22+
The best way to understand concepts is through practice, so dive into our [chaos engineering tutorials]({{< ref "tutorials" >}}).
