|
1 | 1 | # pulumi-aws-ec2-capacity-fallback |
2 | 2 |
|
3 | | -A Pulumi component that launches EC2 instances with automatic fallback across instance types and availability zones when AWS returns capacity errors. |
| 3 | +A Pulumi dynamic resource provider that launches EC2 instances with automatic capacity fallback. You provide an ordered list of instance types and subnets, and the component walks through each type/AZ combination until one launches successfully. If AWS returns `InsufficientInstanceCapacity` or similar errors, it moves on to the next option without failing the deployment. |
4 | 4 |
|
5 | | -## The problem |
| 5 | +This is particularly useful for GPU instance types (g6, g5, p5, etc.) where capacity is limited and unevenly distributed across availability zones. |
6 | 6 |
|
7 | | -When launching GPU instances (g6, g5, p5, etc.), AWS frequently returns `InsufficientInstanceCapacity` because GPU capacity is limited and unevenly distributed across AZs. This causes `pulumi up` to fail, requiring manual intervention to try a different instance type or AZ. |
| 7 | +Once an instance is running, the component locks the launched type and subnet. Changing your type preferences in config won't replace an existing instance -- use `pulumi up --replace <urn>` for that. |
8 | 8 |
|
9 | | -## The solution |
| 9 | +The component pre-filters combinations using `describe_instance_type_offerings` before attempting any launches. It also supports AZ suffix filtering, optional least-used subnet selection, and in-place updates for tags and security groups. |
10 | 10 |
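The fallback walk described above can be sketched roughly as follows. This is a minimal illustration, not the component's actual code: `launch_with_fallback`, `LaunchError`, and the `offered` set are hypothetical names, the retryable codes are the three listed in the earlier README revision, and the type-major iteration order is an assumption.

```python
from itertools import product

# Error codes treated as retryable (assumed from the earlier README text).
RETRYABLE_CODES = {
    "InsufficientInstanceCapacity",
    "Unsupported",
    "InstanceLimitExceeded",
}


class LaunchError(Exception):
    """Hypothetical stand-in for an AWS client error carrying an error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def launch_with_fallback(instance_types, subnets, offered, launch):
    """Try each offered (type, subnet) pair in priority order.

    `offered` is the set of pairs that survived the pre-flight offerings
    check; `launch` is any callable that starts an instance and raises
    LaunchError on failure. Returns (type, subnet, launch result).
    """
    last_error = None
    for itype, subnet in product(instance_types, subnets):
        if (itype, subnet) not in offered:
            continue  # skipped up front, no launch attempt is made
        try:
            return itype, subnet, launch(itype, subnet)
        except LaunchError as err:
            if err.code not in RETRYABLE_CODES:
                raise  # non-capacity errors fail the deployment immediately
            last_error = err  # remember it and move on to the next pair
    raise RuntimeError("no capacity in any type/subnet combination") from last_error
```

In a real provider the `launch` callable would presumably wrap the EC2 `run_instances` call, and `offered` would be built from `describe_instance_type_offerings` results.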
|
11 | | -This component wraps EC2 instance creation with retry logic. You provide an ordered list of instance types, and the component: |
12 | | - |
13 | | -1. Checks `describe_instance_type_offerings` to skip types not offered in the target AZs |
14 | | -2. Attempts to launch each remaining type/AZ combination via `run_instances` |
15 | | -3. On `InsufficientInstanceCapacity`, `Unsupported`, or `InstanceLimitExceeded`, automatically tries the next combination |
16 | | -4. Once launched, the instance type is locked -- subsequent `pulumi up` runs will not replace the instance even if a different type is now preferred |
| 11 | +## Installation |
17 | 12 |
|
18 | | -## Features |
| 13 | +Install directly from GitHub: |
19 | 14 |
|
20 | | -- Automatic fallback across multiple instance types in priority order |
21 | | -- Automatic fallback across multiple availability zones |
22 | | -- Pre-flight offerings check to skip types not available in target AZs |
23 | | -- AZ filtering (e.g. restrict to AZ A and B only) |
24 | | -- Least-used subnet selection for balanced distribution |
25 | | -- In-place tag and security group updates without instance replacement |
26 | | -- Instance type locked after creation (use `pulumi up --replace <urn>` to force change) |
| 15 | +```bash |
| 16 | +uv add git+ssh://git@github.com/GremlinLTD/pulumi-python-aws-ec2-capacity-fallback.git |
| 17 | +``` |
27 | 18 |
|
28 | | -## Installation |
| 19 | +Or with pip: |
29 | 20 |
|
30 | 21 | ```bash |
31 | | -uv add pulumi-aws-ec2-capacity-fallback |
| 22 | +pip install git+ssh://git@github.com/GremlinLTD/pulumi-python-aws-ec2-capacity-fallback.git |
32 | 23 | ``` |
33 | 24 |
|
34 | | -Or with pip: |
| 25 | +To pin a specific version tag: |
35 | 26 |
|
36 | 27 | ```bash |
37 | | -pip install pulumi-aws-ec2-capacity-fallback |
| 28 | +uv add git+ssh://git@github.com/GremlinLTD/pulumi-python-aws-ec2-capacity-fallback.git@v0.1.0 |
38 | 29 | ``` |
39 | 30 |
|
40 | 31 | ## Usage |
@@ -121,14 +112,16 @@ instance = ResilientInstanceRaw( |
121 | 112 |
|
122 | 113 | ## How it handles re-runs |
123 | 114 |
|
124 | | -The component is designed to be safe on subsequent `pulumi up` runs: |
| 115 | +Subsequent `pulumi up` runs are safe: |
125 | 116 |
|
126 | 117 | - **Instance type changes in config**: ignored. The existing instance keeps its launched type. |
127 | 118 | - **Subnet/AZ changes in config**: ignored. The existing instance stays in its launched subnet. |
128 | 119 | - **Tag changes**: applied in-place (no instance replacement). |
129 | 120 | - **Security group changes**: applied in-place. |
130 | 121 | - **AMI changes**: triggers instance replacement (delete + create with fallback). |
131 | 122 | - **Volume size changes**: triggers instance replacement. |
| 123 | +- **Key name changes**: triggers instance replacement. |
| 124 | +- **User data changes**: triggers instance replacement. |
132 | 125 | - **Intentional type change**: use `pulumi up --replace <urn>` to force recreation. |
133 | 126 |
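The re-run rules listed above amount to classifying each changed input. The sketch below shows one way to express that; the property names (`instance_types`, `subnet_ids`, `security_group_ids`, and so on) and the `plan_update` helper are assumptions for illustration and may not match the component's real inputs.

```python
# Hypothetical input names, grouped by how a change is handled on re-run.
IGNORED = {"instance_types", "subnet_ids"}          # locked after launch
IN_PLACE = {"tags", "security_group_ids"}           # updated without replacement
REPLACE = {"ami", "volume_size", "key_name", "user_data"}  # force replacement


def plan_update(old, new):
    """Classify the inputs that differ between the stored and desired state."""
    changed = {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}
    return {
        "replace": sorted(changed & REPLACE),
        "update": sorted(changed & IN_PLACE),
        "ignored": sorted(changed & IGNORED),
    }


plan = plan_update(
    {"ami": "ami-1", "tags": {"env": "dev"}, "instance_types": ["g6.xlarge"]},
    {"ami": "ami-1", "tags": {"env": "prod"}, "instance_types": ["g5.xlarge"]},
)
print(plan)
```

In a Pulumi dynamic provider, a classification like this would typically feed the `diff` method, with the `replace` keys reported as replacements and the `ignored` keys left out of the change set.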
|
134 | 127 | ## Retryable errors |
|