Replies: 1 comment
Hi @Spivur, I've investigated this issue and the intermittent nature suggests it might be related to external factors rather than a bug in the module code. Here are some debugging steps to help identify the root cause:

Debugging Steps

1. Check Terraform Parallelism
The issue might be related to too many concurrent API calls. Try reducing parallelism:

    terraform apply -parallelism=1

2. Enable Debug Logging
Set these environment variables to get more detailed logs:

    export TF_LOG=DEBUG
    export HCLOUD_DEBUG=true
    terraform apply 2>&1 | tee terraform-debug.log

3. Check Hetzner Cloud Console

4. Terraform State Investigation
After a failed deployment, check which resources were created:

    terraform state list | grep -E "(server|network)"
    terraform state show module.kube_hetzner.module.agents

5. Manual Network Attachment Test
If a node is missing its private IP, try manually attaching it:

    # Get the server ID of the problematic node
    hcloud server list
    # Get the network ID
    hcloud network list
    # Try to attach manually
    hcloud server attach-to-network <SERVER_ID> --network <NETWORK_ID> --ip <IP_ADDRESS>

6. Check Resource Limits
Verify you're not hitting any Hetzner account limits:

Potential Workarounds

1. Sequential Creation
Try to stagger creation of the agent nodes by splitting them into separate single-node nodepools, each with its own placement group (nodepool names must be unique):

    agent_nodepools = [
      {
        name            = "agent-1"
        server_type     = "cax11"
        location        = "fsn1"
        labels          = []
        taints          = []
        count           = 1
        placement_group = "agent-1"
      },
      {
        name            = "agent-2"
        server_type     = "cax11"
        location        = "fsn1"
        labels          = []
        taints          = []
        count           = 1
        placement_group = "agent-2"
      },
      {
        name            = "agent-3"
        server_type     = "cax11"
        location        = "fsn1"
        labels          = []
        taints          = []
        count           = 1
        placement_group = "agent-3"
      }
    ]

2. Remove Placement Groups
Try deploying without placement groups first:

    # Comment out or remove placement_group = "default"

3. Different Network Configuration
Try using a different network CIDR or region to rule out network-specific issues.

Information Needed

Could you please provide:
This information will help identify whether it's a module issue or an infrastructure/API problem.
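
As a rough follow-up to debugging steps 1 and 2, here is a minimal sketch for scanning the captured debug log for signs of API rate limiting. It assumes the log was saved as terraform-debug.log (as in step 2) and that rate-limited calls show up in it either as the Hetzner API error code rate_limit_exceeded or as an HTTP "429 Too Many Requests" status line. The exact wording in your log may differ, and the function name count_rate_limit_hits is made up for illustration.

```shell
#!/bin/sh
# Count lines in a Terraform debug log that look like API rate limiting.
# Assumption: such lines contain "rate_limit_exceeded" or
# "429 Too Many Requests"; adjust the pattern to match your actual log.
count_rate_limit_hits() {
    log_file="$1"
    if [ ! -r "$log_file" ]; then
        echo "cannot read log file: $log_file" >&2
        return 2
    fi
    # grep -c prints the number of matching lines. It exits non-zero when
    # that number is 0, so the exit status is masked with "|| true".
    grep -c -E 'rate_limit_exceeded|429 Too Many Requests' "$log_file" || true
}
```

After a failed apply, running count_rate_limit_hits terraform-debug.log prints the number of suspicious lines. A count above zero would point toward the concurrency theory, so -parallelism=1 is worth retrying; a count of zero does not rule out other API-side problems.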
Description
We are currently trying to deploy a simple 4-node cluster. Since July 30th, however, one of the nodes consistently fails to receive a private IP address.
We deploy the cluster using the GitLab OpenTofu component through a GitLab CI/CD pipeline. We have also tested the deployment locally on three different devices. On two devices (WSL, Mac) the cluster deploys successfully without issues, but on one device (Mac) we observe the same problem as in the pipelines.
We have already verified that the issue is not related to the latest version update. Additionally, we have deployed four regular nodes attached to a private network, and this works reliably on all devices and the pipeline.
There is no error message; the deployment simply times out after a while.
We have also tried the deployment without the Cilium config.
Kube.tf file
Screenshots
No response
Platform
Linux, Mac