gitlab-multi-runner config Sidekick "couldn't execute POST" connection refused #810
Description
Hi all,
First of all, many thanks for the awesome work with Rancher and the catalog items!
I am experiencing the following problem with the gitlab-multi-runner community catalog item:
When a new instance is started (in my case AWS spot instances), sometimes the "config" sidekick that registers the new runner fails with the following error message:
9.7.2018 17:11:34 Running in system-mode.
9.7.2018 17:11:34
9.7.2018 17:11:39 ERROR: Registering runner... failed runner=FKPDjL73 status=couldn't execute POST against [URL]: dial tcp: lookup gitlab.ambient-innovation.com on 169.254.169.250:53: read udp 10.42.109.212:55972->169.254.169.250:53: read: connection refused
9.7.2018 17:11:39 PANIC: Failed to register this runner. Perhaps you are having network problems
This happens only on some of the newly started instances, not on all of them. When I start the config sidekick again, everything works fine.
My assumption is that the sidekick executes the POST request a little too early, before Rancher has fully built the network for the new instance. In other words, the scheduler starts the new container (and its sidekick) before the network is ready; the log shows the DNS lookup against Rancher's internal resolver at 169.254.169.250:53 being refused. This might be related to rancher/rancher#2621.
Does anyone have an idea how to fix this or work around it? We shut down spot instances and start new ones very frequently, so this is a real problem, and a manual solution (restarting the failed sidekicks by hand) is not an option for me. Any help is greatly appreciated.
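If the cause really is that DNS is not ready yet, one conceivable workaround is to have the sidekick wait until the GitLab hostname resolves before it attempts registration. The following is only a minimal sketch of that idea, not something taken from the catalog item: the helper name `wait_for_dns`, the retry count, the delay, and the assumption that Python is available in the sidekick image are all hypothetical. The registration command itself is passed in on the command line rather than hard-coded, since the catalog item's actual entrypoint is not shown here.

```python
import socket
import subprocess
import sys
import time


def wait_for_dns(hostname, attempts=30, delay=2.0,
                 resolve=socket.getaddrinfo, sleep=time.sleep):
    """Return True once `hostname` resolves, False if every attempt fails.

    `resolve` and `sleep` are injectable so the loop can be tested
    without touching the real network (hypothetical helper).
    """
    for attempt in range(1, attempts + 1):
        try:
            resolve(hostname, 443)
            return True
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            print(f"attempt {attempt}/{attempts}: DNS not ready ({exc})")
            if attempt < attempts:
                sleep(delay)
    return False


# Usage (assumed): python wait_for_dns.py <hostname> <command> [args...]
if __name__ == "__main__" and len(sys.argv) > 2:
    if not wait_for_dns(sys.argv[1]):
        sys.exit(1)
    # Hand over to whatever registration command the sidekick normally runs.
    sys.exit(subprocess.call(sys.argv[2:]))
```

A wrapper like this only hides the timing problem; if the network is not ready for other reasons, the registration could still fail after the hostname resolves.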
Further information:
Rancher Server v1.6.18
GitLab multi-runner v10.4.0
The servers are AWS t2.medium instances running RancherOS v1.2.0