Description
TL;DR
We enforce a default-deny egress firewall rule. Without an explicit dependency declaration, a node pool may be created before the module's egress firewall rule (google_compute_firewall.intra_egress). If an initial node count greater than zero is specified, that node pool never becomes ready, because the new nodes cannot reach the control plane (master).
We hit this issue after upgrading the google provider to v6.15.0. Our guess is that it is caused by hashicorp/terraform-provider-google#20738, which limits concurrency and therefore makes the missing dependency show up.
Expected behavior
google_compute_firewall.intra_egress should be created before any google_container_node_pool.pools is created.
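One way to express this ordering inside the module is an explicit depends_on on the node pool resource. The fragment below is a hypothetical sketch, not the module's actual source: local.node_pools, google_container_cluster.primary and var.location are assumed names, and only the arguments relevant to the ordering are shown.

```hcl
# Hypothetical sketch of the module's node pool resource (assumed names).
resource "google_container_node_pool" "pools" {
  for_each = local.node_pools # assumed map of node pool settings

  name               = each.key
  cluster            = google_container_cluster.primary.name # assumed cluster resource
  location           = var.location                          # assumed variable
  initial_node_count = lookup(each.value, "initial_node_count", 0)

  # Explicit ordering: Terraform will not start creating any node pool until
  # the intra-cluster egress firewall rule exists, so new nodes can reach the
  # control plane even with a default-deny egress rule in place.
  depends_on = [google_compute_firewall.intra_egress]
}
```

Because depends_on refers to the whole resource rather than a single instance, this also works if intra_egress is created with count and ends up with zero instances.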
Observed behavior
google_compute_firewall.intra_egress has not yet been created while a google_container_node_pool.pools resource is being created, so the node pool creation gets stuck.
The node registration failed with the following logs:
------------------------------
Service    DNS    Reachable
------------------------------
LOGGING    true   true
GCR        true   true
GCS        true   true
Master     N/A    false
------------------------------
I can work around this issue by first creating a firewall rule that allows egress from the GKE nodes to the master CIDR, and then running terraform apply.
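A minimal sketch of that workaround, assuming a root-level variable var.master_ipv4_cidr_block that holds the cluster's master CIDR (the name allow_master_egress is also hypothetical). The rule uses priority 1000, which is evaluated before the deny-all rule at 65534 because lower numbers take precedence; protocol "all" mirrors the deny rule and could likely be narrowed.

```hcl
# Hypothetical workaround: allow egress to the GKE control plane CIDR.
resource "google_compute_firewall" "allow_master_egress" {
  name      = "allow-gke-master-egress" # hypothetical name
  network   = module.vpc.network_self_link
  project   = var.gcp_project_id
  direction = "EGRESS"
  priority  = 1000 # evaluated before the deny-all rule at 65534

  allow {
    protocol = "all"
  }

  # Assumed variable: the master CIDR passed to the GKE module.
  destination_ranges = [var.master_ipv4_cidr_block]
}

module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  version = "35.0.1"
  # Other arguments unchanged from the Terraform Configuration section.

  # Module-level ordering: nothing in the module is created before the allow rule.
  depends_on = [google_compute_firewall.allow_master_egress]
}
```

A module-level depends_on is coarse (it delays every resource in the module, not just the node pools), but it needs no change to the module itself.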
Terraform Configuration
// By default deny all egress traffic -- applied to all instances
resource "google_compute_firewall" "deny_all_egress" {
  name      = "rule"
  network   = module.vpc.network_self_link
  project   = var.gcp_project_id
  direction = "EGRESS"
  priority  = 65534
  log_config {
    metadata = "INCLUDE_ALL_METADATA"
  }
  deny {
    protocol = "all"
  }
  destination_ranges = ["0.0.0.0/0"]
}

// Call this module with at least one node pool with initial count > 0
module "gke" {
  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster"
  version = "35.0.1"
  ...
  node_pools = [
    {
      ...
      min_count          = 0
      max_count          = 100
      initial_node_count = 1
      ...
    }
  ]
}
Terraform Version
1.5.7
Additional information
No response