Commit 1b0fb0a

Carus11 and saschjmil authored
feat: experimental spot node pools (#451)
* feat: Add configuration options for spot node pools
  Signed-off-by: Carus Kyle <[email protected]>
* docs: update community vars to include spot node pool feature
  Signed-off-by: Carus Kyle <[email protected]>
* fix: update the spot node pool var names
  Signed-off-by: Carus Kyle <[email protected]>
* fix: fix to pass community var to real vars
  Signed-off-by: Carus Kyle <[email protected]>
* docs: Update docs/community/community_config_vars.md
  Co-authored-by: Chris Miller <[email protected]>
* docs: add details on spot node pools and risks
  Signed-off-by: Carus Kyle <[email protected]>
* DCO Remediation Commit for Carus Kyle <[email protected]>
  I, Carus Kyle <[email protected]>, hereby add my Signed-off-by to this commit: 62600d3
  Signed-off-by: Carus Kyle <[email protected]>
* docs: add link to azure documentation on spot node pools
  Signed-off-by: Carus Kyle <[email protected]>
* fix: add validation for the new spot variables
  Signed-off-by: Carus Kyle <[email protected]>
* chore: correct captialization for spot node variable validations
  Signed-off-by: saschjmil <[email protected]>

---------

Signed-off-by: Carus Kyle <[email protected]>
Signed-off-by: saschjmil <[email protected]>
Co-authored-by: Chris Miller <[email protected]>
Co-authored-by: chjmil <[email protected]>
1 parent fb32592 commit 1b0fb0a

5 files changed: +70 -6 lines changed


docs/community/community_config_vars.md

Lines changed: 10 additions & 4 deletions
@@ -12,12 +12,18 @@ Community-contributed configuration variables are listed in the tables below. Th
 <a name="spot_nodes"></a>
 ## Spot Nodes

-Here is some information about spot nodes.
+Spot Nodes allow you to run Azure Kubernetes Service (AKS) workloads on low-cost, surplus compute capacity offered by Azure. These Spot Virtual Machines (VMs) can significantly reduce infrastructure costs, especially for workloads that are fault-tolerant or batch-oriented or temporary lab environments. However, Spot VMs can be preempted (evicted) by Azure at any time if the capacity is needed elsewhere, which makes them less suitable for critical or stateful workloads.

-Here is a warning about why they might cause issues.
+For further information, see https://learn.microsoft.com/en-us/azure/aks/spot-node-pool

-Here is a table with the variables you would use to configure them
+> [!CAUTION]
+> Spot nodes can be evicted with little notice. They are best used for non-production, non-critical workloads or for scenarios where cost savings outweigh the risk of eviction. This is a configuration not supported by SAS Technical Support. Monitor eviction rates and ensure your workloads can tolerate sudden node loss.
+
+To enable a Spot node pool in your AKS cluster using this module, configure the community-maintained variables listed below. These options customize the behavior of the Spot node pool and its underlying virtual machine scale set.

 | Name | Description | Type | Default | Release Added | Notes |
 | :--- | ---: | ---: | ---: | ---: | ---: |
-| contrib_enable_spot_nodes | Enable spot nodes | bool | false | 10.3.0 | |
+| community_priority | (Optional) The Priority for Virtual Machines within the Virtual Machine Scale Set that powers this Node Pool. Possible values are Regular and Spot. Defaults to Regular. Changing this forces a new resource to be created. | string | `Regular` | 10.3.0 | Changing this to Spot enables the Spot node pool functionality |
+| community_eviction_policy | (Optional) The Eviction Policy which should be used for Virtual Machines within the Virtual Machine Scale Set powering this Node Pool. Possible values are Deallocate and Delete. Changing this forces a new resource to be created. | string | `Delete` | 10.3.0 | |
+| community_spot_max_price | (Optional) The maximum price you're willing to pay in USD per Virtual Machine. Valid values are -1 (the current on-demand price for a Virtual Machine) or a positive value with up to five decimal places. Changing this forces a new resource to be created. | string | `-1` | 10.3.0 | |
+
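As a rough illustration of the variables documented above, a Spot-backed pool might be declared in a `.tfvars` file along the following lines. This is only a sketch: the pool name `spot_lab`, the machine type, and the sizing keys (`machine_type`, `os_disk_size`, `min_nodes`, `max_nodes`) are assumptions that do not appear in this commit; only `max_pods`, `node_taints`, `node_labels`, and the three `community_*` keys are visible in the diff.

```hcl
# Hypothetical node pool entry; names and sizes are illustrative assumptions.
node_pools = {
  spot_lab = {
    "machine_type" = "Standard_D4s_v5" # assumed key and value
    "os_disk_size" = 200               # assumed key and value
    "min_nodes"    = 0                 # assumed key and value
    "max_nodes"    = 3                 # assumed key and value
    "max_pods"     = "110"
    "node_taints"  = []
    "node_labels" = {
      "workload.example.com/class" = "lab" # hypothetical label
    }

    # Community-maintained Spot settings added by this commit.
    "community_priority"        = "Spot"
    "community_eviction_policy" = "Delete"
    "community_spot_max_price"  = "-1" # -1 means "up to the current on-demand price"
  }
}
```

Leaving the three `community_*` keys out of a pool keeps it on regular-priority VMs, since `community_priority` defaults to `Regular`.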

main.tf

Lines changed: 4 additions & 0 deletions
@@ -216,6 +216,10 @@ module "node_pools" {
   host_encryption_enabled = var.aks_cluster_enable_host_encryption
   tags = var.tags
   linux_os_config = each.value.linux_os_config
+  community_priority = each.value.community_priority
+  community_eviction_policy = each.value.community_eviction_policy
+  community_spot_max_price = each.value.community_spot_max_price
+
 }

 # https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/resources/postgresql_flexible_server

modules/aks_node_pool/main.tf

Lines changed: 6 additions & 1 deletion
@@ -25,6 +25,9 @@ resource "azurerm_kubernetes_cluster_node_pool" "autoscale_node_pool" {
   node_taints = var.node_taints
   orchestrator_version = var.orchestrator_version
   tags = var.tags
+  priority = var.community_priority
+  eviction_policy = var.community_eviction_policy
+  spot_max_price = var.community_spot_max_price

   lifecycle {
     ignore_changes = [node_count]
@@ -58,7 +61,9 @@ resource "azurerm_kubernetes_cluster_node_pool" "static_node_pool" {
   node_taints = var.node_taints
   orchestrator_version = var.orchestrator_version
   tags = var.tags
-
+  priority = var.community_priority
+  eviction_policy = var.community_eviction_policy
+  spot_max_price = var.community_spot_max_price
   linux_os_config {
     sysctl_config {
       vm_max_map_count = try(var.linux_os_config.sysctl_config.vm_max_map_count,null)

modules/aks_node_pool/variables.tf

Lines changed: 46 additions & 1 deletion
@@ -116,6 +116,51 @@ variable "proximity_placement_group_id" {
   default = ""
 }

+variable "community_priority" {
+  description = "(Optional) The Priority for Virtual Machines within the Virtual Machine Scale Set that powers this Node Pool. Possible values are Regular and Spot. Defaults to Regular. Changing this forces a new resource to be created."
+  type = string
+  default = "Regular"
+  validation {
+    condition = contains(["Regular", "Spot"], var.community_priority)
+    error_message = "ERROR: community_priority must be either 'Regular' or 'Spot'."
+  }
+}
+
+variable "community_eviction_policy" {
+  description = "(Optional) The Eviction Policy which should be used for Virtual Machines within the Virtual Machine Scale Set powering this Node Pool. Possible values are Deallocate and Delete. Changing this forces a new resource to be created."
+  type = string
+  default = null
+  validation {
+    condition = var.community_eviction_policy == null || (
+      contains(["Delete", "Deallocate"], coalesce(var.community_eviction_policy, "Delete"))
+    )
+    error_message = "ERROR: When specified, community_eviction_policy must be either 'Delete' or 'Deallocate'."
+  }
+  validation {
+    condition = var.community_eviction_policy == null || coalesce(var.community_priority, "Regular") == "Spot"
+    error_message = "ERROR: community_eviction_policy can only be specified when community_priority is set to 'Spot'."
+  }
+}
+
+
+variable "community_spot_max_price" {
+  description = "(Optional) The maximum price you're willing to pay in USD per Virtual Machine. Valid values are -1 (the current on-demand price for a Virtual Machine) or a positive value with up to five decimal places. Changing this forces a new resource to be created."
+  type = string
+  default = null
+  validation {
+    condition = var.community_spot_max_price == null || (
+      can(tonumber(coalesce(var.community_spot_max_price, "-1"))) &&
+      (tonumber(coalesce(var.community_spot_max_price, "-1")) == -1 ||
+      tonumber(coalesce(var.community_spot_max_price, "-1")) > 0)
+    )
+    error_message = "ERROR: When specified, community_spot_max_price must be either '-1' or a positive number with up to five decimal places."
+  }
+  validation {
+    condition = var.community_spot_max_price == null || coalesce(var.community_priority, "Regular") == "Spot"
+    error_message = "ERROR: community_spot_max_price can only be specified when community_priority is set to 'Spot'."
+  }
+}
+
 variable "linux_os_config"{
   description = "Specifications of linux os config. Changing this forces a new resource to be created."
   type = object({
@@ -124,4 +169,4 @@ variable "linux_os_config"{
     }))
   })
   default = {}
-}
+}
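To make the validation rules in this file easier to reason about, the sketch below lists a few input combinations and how the conditions, read as written, would treat them. The `spot_validation_cases` local is purely hypothetical and exists only to hold the examples; it is not part of the module. Note also that referencing `var.community_priority` from another variable's `validation` block relies on cross-variable validation, which Terraform supports from version 1.9 onward.

```hcl
# Hypothetical local used only to tabulate example inputs; not part of the module.
locals {
  spot_validation_cases = {
    # Accepted: all defaults, no Spot settings supplied.
    regular_defaults = { priority = "Regular", eviction_policy = null, spot_max_price = null }

    # Accepted: Spot pool capped at the on-demand price.
    spot_on_demand_cap = { priority = "Spot", eviction_policy = "Deallocate", spot_max_price = "-1" }

    # Accepted: Spot pool with an explicit positive price cap.
    spot_fixed_cap = { priority = "Spot", eviction_policy = "Delete", spot_max_price = "0.05" }

    # Rejected: an eviction policy is set while priority is still "Regular".
    policy_without_spot = { priority = "Regular", eviction_policy = "Delete", spot_max_price = null }

    # Rejected: "Stop" is neither "Delete" nor "Deallocate".
    unknown_policy = { priority = "Spot", eviction_policy = "Stop", spot_max_price = null }

    # Rejected: the price is neither -1 nor greater than zero.
    zero_price     = { priority = "Spot", eviction_policy = null, spot_max_price = "0" }
    negative_price = { priority = "Spot", eviction_policy = null, spot_max_price = "-2" }

    # Accepted by the condition as written: the "up to five decimal places"
    # limit appears in the error message but is not checked by the expression.
    many_decimals = { priority = "Spot", eviction_policy = null, spot_max_price = "0.1234567" }
  }
}
```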

variables.tf

Lines changed: 4 additions & 0 deletions
@@ -568,11 +568,15 @@ variable "node_pools" {
     max_pods = string
     node_taints = list(string)
     node_labels = map(string)
+    community_priority = optional(string, "Regular")
+    community_eviction_policy = optional(string)
+    community_spot_max_price = optional(string)
     linux_os_config = optional(object({
       sysctl_config = optional(object({
         vm_max_map_count = optional(number)
       }))
     }))
+
   }))

   default = {
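The `optional(string, "Regular")` form used in this hunk means existing `node_pools` definitions need no changes: a pool that omits `community_priority` resolves to `Regular`, and the two attributes declared with plain `optional(string)` resolve to `null`. A minimal, self-contained sketch of that behavior follows; it assumes Terraform 1.3 or later (where optional attributes with defaults are available), and the `example_pools` variable and `resolved_priorities` output are hypothetical names, not part of this repository.

```hcl
# Hypothetical, trimmed-down stand-in for the node_pools variable.
variable "example_pools" {
  type = map(object({
    max_pods                  = string
    community_priority        = optional(string, "Regular")
    community_eviction_policy = optional(string)
    community_spot_max_price  = optional(string)
  }))

  default = {
    # Omits every community_* attribute, as a pre-existing pool would.
    plain = { max_pods = "110" }

    # Opts in to Spot and caps the price at the on-demand rate.
    spot = {
      max_pods                 = "110"
      community_priority       = "Spot"
      community_spot_max_price = "-1"
    }
  }
}

# Shows the priority each pool ends up with after optional defaults are applied.
output "resolved_priorities" {
  value = { for name, pool in var.example_pools : name => pool.community_priority }
}
```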
