| 1 | +--- |
| 2 | +title: Chaos API |
| 3 | +description: Simulate outages and network failures to test the resiliency of your infrastructure |
| 4 | +--- |
| 5 | + |
| 6 | +## Introduction |
| 7 | + |
| 8 | +LocalStack Chaos API allows you to mimic outages across any AWS region or service. |
| 9 | +Intentionally triggering service outages and observing how the system responds while its infrastructure is compromised is a powerful way to test its resilience. |
| 10 | +This strategy, a key element of chaos engineering, helps gauge the effectiveness of the system's deployment procedures and its ability to withstand infrastructure disruptions. |
| 11 | + |
| 12 | +You can use LocalStack Chaos API to cause API failures for any combination of the following: |
| 13 | + |
| 14 | +- Service |
| 15 | +- Region |
| 16 | +- Operation |
| 17 | + |
| 18 | +You can customise the HTTP error code and message that LocalStack responds with. |
| 19 | +If required, you can make the failures occur probabilistically. |
| 20 | + |
| 21 | +The Chaos API can also be configured to add network latency to all calls. |
| 22 | + |
| 23 | +{{< alert title="Note">}} |
| 24 | +Chaos API is available as part of the LocalStack Enterprise plan. |
| 25 | +If you'd like to try it out, please [contact us](https://www.localstack.cloud/demo) to request access. |
| 26 | +{{< /alert >}} |
| 27 | + |
| 28 | +## Prerequisites |
| 29 | + |
| 30 | +The prerequisites for this guide are: |
| 31 | + |
| 32 | +- LocalStack Pro with [LocalStack CLI](https://docs.localstack.cloud/getting-started/installation/#localstack-cli) & [LocalStack Auth Token](https://docs.localstack.cloud/getting-started/auth-token/) |
| 33 | +- [Docker](https://docs.docker.com/get-docker/) and [Docker Compose](https://docs.docker.com/compose/install/) |
| 34 | +- [Python](https://www.python.org/downloads/) |
| 35 | + |
| 36 | +## Configuration |
| 37 | + |
| 38 | +The disruption types supported by Chaos API are broadly categorised into two groups. |
| 39 | +**Service Faults** lead to an application-level HTTP error in an AWS service, and **Network Effects** introduce network-level effects to all connections. |
| 40 | + |
| 41 | +### Service Faults |
| 42 | + |
| 43 | +Service faults can be configured using the endpoint at `/_localstack/chaos/faults`. |
| 44 | +The configuration schema consists of an array of one or more rules, where each rule specifies the conditions for the fault to occur. |
| 45 | +When active, rules are evaluated sequentially on every request to LocalStack until the first match. |
| 46 | + |
| 47 | +The schema for the configuration is as follows. |
| 48 | + |
| 49 | +```json |
| 50 | +[ |
| 51 | + { |
| 52 | + "region": "(str) Region name, e.g. 'ap-south-1'. If omitted, all regions are affected.", |
| 53 | + "service": "(str) Name of the service, e.g. 'kinesis'. If omitted, all services are affected.", |
| 54 | + "operation": "(str) Name of the operation, e.g. 'PutRecord'. If omitted, all operations are affected.", |
| 55 | + "probability": "(num) Probability of invoking this rule, e.g. 0.5. If omitted, 1 is used.", |
| 56 | + "error": { |
| 57 | + "statusCode": "(int) HTTP status code to use in response, e.g. 503. If omitted, 503 is used.", |
| 58 | + "code": "(str) Descriptive error code used in response. If omitted, 'ServiceUnavailable' is used." |
| 59 | + } |
| 60 | + }, |
| 61 | + ... |
| 62 | +] |
| 63 | +``` |
| 64 | + |
| 65 | +The endpoint allows the following operations: |
| 66 | +- `GET`: Get current configuration |
| 67 | +- `POST`: Set a new configuration, replacing the current one |
| 68 | +- `PATCH`: Add a rule |
| 69 | +- `DELETE`: Delete a rule |
| 70 | + |
| 71 | +An empty array `[]` disables the faults entirely, while an empty rule in the array `[{}]` causes all AWS operations to lead to faults. |
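
The matching behaviour described above can be pictured with a small, self-contained Python model. This is only an illustrative sketch, not LocalStack's implementation: the helper names `rule_matches` and `first_matching_rule` are hypothetical, the comparison is assumed to be exact string equality, and the `probability` and `error` fields are deliberately ignored.

```python
from typing import Optional

# Keys that act as match conditions in a fault rule (taken from the schema above).
CONDITION_KEYS = ("region", "service", "operation")


def rule_matches(rule: dict, request: dict) -> bool:
    """A rule matches when every condition it specifies equals the request's value.

    Omitted conditions match anything, so an empty rule {} matches every request.
    """
    return all(rule[key] == request.get(key) for key in CONDITION_KEYS if key in rule)


def first_matching_rule(rules: list, request: dict) -> Optional[dict]:
    """Walk the rules in order and return the first match, or None if none applies."""
    for rule in rules:
        if rule_matches(rule, request):
            return rule
    return None


rules = [
    {"service": "s3", "region": "us-east-1"},
    {"service": "lambda"},
]

s3_east = {"service": "s3", "region": "us-east-1", "operation": "CreateBucket"}
s3_eu = {"service": "s3", "region": "eu-central-1", "operation": "CreateBucket"}

print(first_matching_rule(rules, s3_east))  # the S3 rule for us-east-1
print(first_matching_rule(rules, s3_eu))    # None: no rule covers this region
print(first_matching_rule([], s3_east))     # None: an empty list disables faults
print(first_matching_rule([{}], s3_eu))     # {}: an empty rule matches everything
```

In this model, a request that matches no rule is served normally, which is why `[]` turns faults off and `[{}]` affects every call.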
| 72 | + |
| 73 | +### Network Effects |
| 74 | + |
| 75 | +Network effects are configured using the endpoint `/_localstack/chaos/effects`. |
| 76 | +Currently, latency is the only network effect that the Chaos API supports. |
| 77 | + |
| 78 | +```json |
| 79 | +{ |
| 80 | + "latency": "(int) Network latency in milliseconds. By default, 0 is used." |
| 81 | +} |
| 82 | +``` |
| 83 | + |
| 84 | +This endpoint allows the following operations: |
| 85 | +- `GET`: Get current configuration |
| 86 | +- `POST`: Set a new configuration |
| 87 | + |
| 88 | +## Examples |
| 89 | + |
| 90 | +To cause faults, make a POST request as follows: |
| 91 | + |
| 92 | +{{< command >}} |
| 93 | +$ curl --location --request POST 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \ |
| 94 | +--header 'Content-Type: application/json' \ |
| 95 | +--data ' |
| 96 | +[ |
| 97 | + { |
| 98 | + "service": "s3", |
| 99 | + "region": "us-east-1" |
| 100 | + }, |
| 101 | + { |
| 102 | + "service": "s3", |
| 103 | + "region": "ap-south-1" |
| 104 | + }, |
| 105 | + { |
| 106 | + "service": "lambda" |
| 107 | + } |
| 108 | +]' |
| 109 | +{{< /command >}} |
| 110 | + |
| 111 | +In this example, S3 is affected in `us-east-1` and `ap-south-1`, and Lambda is affected in all regions. |
| 112 | +All calls to these services in these regions will return a 503 Service Unavailable error. |
| 113 | + |
| 114 | +To see this in action, try to create an S3 bucket in `us-east-1`: |
| 115 | + |
| 116 | +{{< command >}} |
| 117 | +$ awslocal s3 mb s3://test-bucket --region us-east-1 |
| 118 | +<disable-copy> |
| 119 | +make_bucket failed: s3://test-bucket An error occurred (ServiceUnavailableException) when calling the CreateBucket operation (reached max retries: 4): Service 's3' not accessible due to an outage |
| 120 | +</disable-copy> |
| 121 | +{{< /command >}} |
| 122 | + |
| 123 | +However, the same operation works as expected when run in `eu-central-1`: |
| 124 | + |
| 125 | +{{< command >}} |
| 126 | +$ awslocal s3 mb s3://test-bucket --region eu-central-1 |
| 127 | +<disable-copy> |
| 128 | +make_bucket: test-bucket |
| 129 | +</disable-copy> |
| 130 | +{{< /command >}} |
| 131 | + |
| 132 | +Faults can be disabled by setting an empty rule list in the configuration. |
| 133 | +The following request will clear the current configuration: |
| 134 | + |
| 135 | +{{< command >}} |
| 136 | +$ curl --location --request POST 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \ |
| 137 | +--header 'Content-Type: application/json' \ |
| 138 | +--data '[]' |
| 139 | +{{< /command >}} |
| 140 | + |
| 141 | +To retrieve the current configuration, make the following GET call: |
| 142 | + |
| 143 | +{{< command >}} |
| 144 | +$ curl --location --request GET 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' |
| 145 | +{{< /command >}} |
| 146 | + |
| 147 | +To add a new rule to the current configuration, make a PATCH call as follows: |
| 148 | + |
| 149 | +{{< command >}} |
| 150 | +$ curl --location --request PATCH 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \ |
| 151 | +--header 'Content-Type: application/json' \ |
| 152 | +--data ' |
| 153 | +[ |
| 154 | + { |
| 155 | + "service": "kinesis", |
| 156 | + "operation": "PutRecord", |
| 157 | + "probability": 0.3, |
| 158 | + "error": { |
| 159 | + "statusCode": 400, |
| 160 | + "code": "ProvisionedThroughputExceededException" |
| 161 | + } |
| 162 | + } |
| 163 | +]' |
| 164 | +{{< /command >}} |
| 165 | + |
| 166 | +This new rule causes the Kinesis `PutRecord` operation to fail with a probability of 0.3. |
| 167 | +The returned error is also customised: LocalStack responds with HTTP 400 and the error code `ProvisionedThroughputExceededException`. |
| 168 | + |
| 169 | +To remove a rule from the configuration, make a DELETE call as follows: |
| 170 | + |
| 171 | +{{< command >}} |
| 172 | +$ curl --location --request DELETE 'http://localhost.localstack.cloud:4566/_localstack/chaos/faults' \ |
| 173 | +--header 'Content-Type: application/json' \ |
| 174 | +--data '[{"service": "lambda"}]' |
| 175 | +{{< /command >}} |
| 176 | + |
| 177 | +The rule to be removed must be exactly the same as in the existing configuration. |
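
Because the rule in a DELETE call has to match the stored rule exactly, one way to avoid mismatches is to take the rule verbatim from the current configuration rather than retyping it. The following Python sketch illustrates the idea. The helper names `select_rules` and `build_delete_request` are hypothetical, and the `current_config` list stands in for the body of a `GET` response, which is assumed here to be the same array of rules that was configured earlier.

```python
import json
import urllib.request

# Endpoint as used in the curl examples in this guide.
FAULTS_URL = "http://localhost.localstack.cloud:4566/_localstack/chaos/faults"


def select_rules(config: list, **conditions) -> list:
    """Return the rules, unchanged, whose fields equal all of the given conditions."""
    return [
        rule
        for rule in config
        if all(rule.get(key) == value for key, value in conditions.items())
    ]


def build_delete_request(rules: list) -> urllib.request.Request:
    """Build (but do not send) a DELETE request carrying the rules to remove."""
    return urllib.request.Request(
        url=FAULTS_URL,
        data=json.dumps(rules).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="DELETE",
    )


# Stand-in for the parsed body of a GET response (assumed shape).
current_config = [
    {"service": "s3", "region": "us-east-1"},
    {"service": "lambda"},
    {
        "service": "kinesis",
        "operation": "PutRecord",
        "probability": 0.3,
        "error": {"statusCode": 400, "code": "ProvisionedThroughputExceededException"},
    },
]

to_delete = select_rules(current_config, service="kinesis")
request = build_delete_request(to_delete)
print(request.get_method(), request.data.decode("utf-8"))
```

Selecting by `service="kinesis"` returns the complete rule, including its `probability` and `error` fields, so the DELETE body mirrors what is stored.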
| 178 | + |
| 179 | +## Comparison with Fault Injection Service |
| 180 | + |
| 181 | +AWS [Fault Injection Service (FIS)]({{< ref "fis" >}}) also allows controlled chaos engineering experiments on infrastructure. |
| 182 | +While similar in purpose, there are notable differences between FIS and LocalStack Chaos API. |
| 183 | + |
| 184 | +The following table compares how each service approaches chaos engineering, along with its capabilities and integration options. |
| 185 | + |
| 186 | +| **Aspect** | **AWS Fault Injection Service (FIS)** | **LocalStack Chaos API** | |
| 187 | +|-----------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------| |
| 188 | +| **Fault Types** | • EC2 Stop/Terminate Instances<br>• RDS Reboot Instances<br>• SSM Send Command<br>• Inject API errors (e.g., `aws:fis:inject-api-internal-error` for EC2 only) | • API failures (HTTP error codes and messages for any service)<br>• Network effects (latency)<br>• Can be probabilistic and customized. | |
| 189 | +| **Procedural vs Declarative** | • Capable of running procedural experiments where it invokes API actions affecting AWS resources (e.g., `aws:ec2:stop-instances`). | • Focuses on declarative effects impacting the AWS API, such as returning errors or adding latency, without invoking AWS resource actions. | |
| 190 | +| **Experiment Execution** | • Requires creating and running controlled experiments with predefined templates. Systems are restored after disruption duration. | • Faults are applied dynamically based on configuration rules. Can inject faults on-the-fly without predefining experiments. | |
| 191 | +| **Customization** | • Limited to predefined actions (e.g., stopping EC2 instances, inducing specific errors like InternalError for EC2). | • Highly customizable, including probabilistic failures, custom error codes, HTTP status codes, and errors for any AWS operation. | |
| 192 | +| **Service Coverage** | • Covers specific AWS services such as EC2, RDS, and SSM. | • Covers all AWS services and operations (e.g., S3, Lambda, Kinesis) with no service-specific restrictions. | |
| 193 | +| **Network Effects** | • Not supported. | • Supports adding network latency to simulate slow network conditions. | |
| 194 | +| **API Interaction** | • `create-experiment-template` to create templates<br>• `start-experiment` to begin experiments | • `POST` to `/_localstack/chaos/faults` to configure faults<br>• `POST` to `/_localstack/chaos/effects` to introduce network effects | |
| 195 | +| **Probabilistic Failure Injection** | • Not available. | • Supports probabilistic failure injection, introducing partial failures, which mimic intermittent outages. | |
| 196 | +| **Broader Fault Injection** | • Limited to predefined actions (e.g., stop instances, reboot databases, inject errors for specific services). | • Broader fault injection for any AWS operation (e.g., PutObject for S3, Invoke for Lambda). | |