# Using spot instances

Spot instances usually cost around 30-70% less than on-demand instances, so using them for your EKS workloads can save a lot of money. They do require some special consideration, though, as they could be terminated with only 2 minutes of warning.

You need to install a daemonset to catch the 2-minute warning before termination. This ensures the node is gracefully drained before it is terminated. You can install the [k8s-spot-termination-handler](https://github.com/kube-aws/kube-spot-termination-notice-handler) for this. There's a [Helm chart](https://github.com/helm/charts/tree/master/stable/k8s-spot-termination-handler):

```
helm install stable/k8s-spot-termination-handler --namespace kube-system
```
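
The handler only needs to run on nodes that can actually receive a termination notice. A minimal values sketch, assuming the chart exposes the usual `nodeSelector` value (the label matches the worker group examples later on this page), could restrict the daemonset to spot nodes:

```yaml
# values.yaml -- a sketch; assumes the chart supports a standard nodeSelector value.
# Run the termination handler only on spot nodes, since only spot
# instances receive the 2-minute termination notice.
nodeSelector:
  kubernetes.io/lifecycle: spot
```

Pass it to the install above with `--values values.yaml`.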

In the following examples at least 1 worker group that uses on-demand instances is included. This worker group has an added node label that can be used in scheduling. This could be used to schedule any workload not suitable for spot instances, but it is especially important for the [cluster-autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler) as it might end up unscheduled when spot instances are terminated. You can add this to the values of the [cluster-autoscaler helm chart](https://github.com/helm/charts/tree/master/stable/cluster-autoscaler) to keep it on an on-demand node:

```yaml
nodeSelector:
  kubernetes.io/lifecycle: normal
```
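
The same label can pin any other workload that shouldn't be interrupted to the on-demand group. A hypothetical example (the `critical-app` name and image are illustrative, not part of this module):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-app              # hypothetical workload name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: critical-app
  template:
    metadata:
      labels:
        app: critical-app
    spec:
      # Keep these pods off spot nodes so a spot termination can't evict them.
      nodeSelector:
        kubernetes.io/lifecycle: normal
      containers:
        - name: app
          image: critical-app:1.0 # hypothetical image
```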

Notes:

- The `spot_price` is set to the on-demand price so that the spot instances will run as long as they are the cheaper option.
- It's best to have a broad range of instance types to ensure there are always some instances available to run when prices fluctuate.
- There is an AWS blog article about this [here](https://aws.amazon.com/blogs/compute/run-your-kubernetes-workloads-on-amazon-ec2-spot-instances-with-amazon-eks/).
- Consider using [k8s-spot-rescheduler](https://github.com/pusher/k8s-spot-rescheduler) to move pods from on-demand to spot instances; a node-affinity sketch for workloads that should prefer spot follows below.
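
For stateless workloads that should gravitate toward spot capacity, a preferred (rather than required) node affinity lets the scheduler fall back to on-demand nodes when no spot capacity is available. A sketch, with a hypothetical pod name and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker              # hypothetical stateless workload
spec:
  affinity:
    nodeAffinity:
      # Prefer spot nodes, but fall back to on-demand when none are available.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: kubernetes.io/lifecycle
                operator: In
                values:
                  - spot
  containers:
    - name: worker
      image: batch-worker:1.0     # hypothetical image
```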

## Using Launch Configuration

Example worker group configuration that uses an ASG with a launch configuration for each worker group:

```hcl
  worker_group_count = 3

  worker_groups = [
    {
      name                = "on-demand-1"
      instance_type       = "m4.xlarge"
      asg_max_size        = 1
      autoscaling_enabled = true
      kubelet_extra_args  = "--node-labels=kubernetes.io/lifecycle=normal"
      suspended_processes = "AZRebalance"
    },
    {
      name                = "spot-1"
      spot_price          = "0.199"
      instance_type       = "c4.xlarge"
      asg_max_size        = 20
      autoscaling_enabled = true
      kubelet_extra_args  = "--node-labels=kubernetes.io/lifecycle=spot"
      suspended_processes = "AZRebalance"
    },
    {
      name                = "spot-2"
      spot_price          = "0.20"
      instance_type       = "m4.xlarge"
      asg_max_size        = 20
      autoscaling_enabled = true
      kubelet_extra_args  = "--node-labels=kubernetes.io/lifecycle=spot"
      suspended_processes = "AZRebalance"
    }
  ]
```
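
Note that the examples suspend the ASG's `AZRebalance` process. Without this, the autoscaling group may terminate healthy workers on its own to even out capacity across availability zones, bypassing the graceful drain described above.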

## Using Launch Templates

Launch Template support is a recent addition to both AWS and this module. It might not be as tried and tested, but it is more suitable for spot instances as it allows multiple instance types in the same worker group:

```hcl
  worker_group_count = 1

  worker_groups = [
    {
      name                = "on-demand-1"
      instance_type       = "m4.xlarge"
      asg_max_size        = 10
      autoscaling_enabled = true
      kubelet_extra_args  = "--node-labels=kubernetes.io/lifecycle=normal"
      suspended_processes = "AZRebalance"
    }
  ]

  worker_group_launch_template_mixed_count = 1

  worker_groups_launch_template_mixed = [
    {
      name                     = "spot-1"
      override_instance_type_1 = "m5.large"
      override_instance_type_2 = "c5.large"
      override_instance_type_3 = "t3.large"
      override_instance_type_4 = "r5.large"
      spot_instance_pools      = 3
      asg_max_size             = 5
      asg_desired_capacity     = 5
      autoscaling_enabled      = true
      kubelet_extra_args       = "--node-labels=kubernetes.io/lifecycle=spot"
    }
  ]
```
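
Once nodes have joined the cluster you can check that the lifecycle labels were applied with `kubectl get nodes --label-columns=kubernetes.io/lifecycle`.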

## Important issues