Running Tinkerbell control plane in the cloud #344
-
Hey @ThatsMrTalbo. Thanks for posting about this. This is a cool use case! There are definitely some places for Tinkerbell to improve here. Let me address these 2 issues.
Problem 1
The Tink Agent does support using a DNS name to connect to the Tink Server. This would need to be configured in the ISO's extra kernel args with:
TINKERBELL_IPXE_HTTP_SCRIPT_EXTRA_KERNEL_ARGS="tink_worker_image=ghcr.io/tinkerbell/tink-agent:latest grpc_authority=<DNS NAME HERE>:42113"
A DNS record needs to be set up in your environment to point to the Tinkerbell deployment load balancer. No configuration is needed on the Tink Server side. The code you posted with …
With this said, there is a bug in the ISO serving code that prevents this extra kernel arg from being set properly. I will open a PR to fix that and post back here.
Problem 2
You are correct that we don't have any straightforward mechanism to update the ca-certificates that the Tink Agent uses. What can be done is to add further extra kernel args that tell the Tink Agent not to verify the Tink Server's certificate chain and host name:
TINKERBELL_IPXE_HTTP_SCRIPT_EXTRA_KERNEL_ARGS=tink_worker_image=ghcr.io/tinkerbell/tink-agent:latest grpc_authority=tinkerbell.192-168-2-50.nip.io:42113 tinkerbell_tls=true tinkerbell_insecure_tls=true
Let me know if I didn't understand anything or if I can clarify any points. Thanks again for trying out Tinkerbell and bringing this up!
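For anyone templating this value, here is a minimal Go sketch that assembles the same space-separated kernel args string. The helper name extraKernelArgs is hypothetical and not part of Tinkerbell; the kernel arg names and example values are the ones quoted in this reply.

```go
package main

import (
	"fmt"
	"strings"
)

// extraKernelArgs is a hypothetical helper (not a Tinkerbell API). It builds
// the value for TINKERBELL_IPXE_HTTP_SCRIPT_EXTRA_KERNEL_ARGS from the kernel
// args mentioned in this thread.
func extraKernelArgs(image, authority string, insecureTLS bool) string {
	args := []string{
		"tink_worker_image=" + image,
		"grpc_authority=" + authority,
	}
	if insecureTLS {
		// Use TLS, but skip verification of the certificate chain and host name.
		args = append(args, "tinkerbell_tls=true", "tinkerbell_insecure_tls=true")
	}
	return strings.Join(args, " ")
}

func main() {
	fmt.Println(extraKernelArgs(
		"ghcr.io/tinkerbell/tink-agent:latest",
		"tinkerbell.192-168-2-50.nip.io:42113",
		true,
	))
}
```

Skipping verification removes protection against a spoofed server, so it is probably best limited to trusted links such as the VPN described in the original post.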
-
Here's the PR to fix the extra kernel parameters ordering issue. #357
-
Thanks for the response! I'll build/deploy the main branch today and run some experiments 🙂
-
👋 Hiya!
I have been experimenting with running a Tinkerbell instance in the cloud in order to boot on-prem nodes using the Hook ISO, without the need to pre-provision an on-prem Tinkerbell instance.
In this setup there is a VPN between on-prem and the cloud so that Tinkerbell can perform BMC operations securely and the machines can talk back to the Tinkerbell ports.
The general idea is to use the Smee ISO serving capabilities to provide the Hook image that the node will boot, instead of netbooting.
I have hit a few issues and wondered if this is a road anyone else has gone down. Is this a model Tinkerbell even wants to support in the future?
Problem 1 - Smee does not support using DNS for the gRPC API
When Smee generates the Hook ISO it always uses an IP address for the Tink server, even if a DNS entry is provided via TINKERBELL_IPXE_SCRIPT_TINK_SERVER_ADDR_PORT. Since load balancers in cloud environments can have multiple IPs for multi-zonal resiliency, being able to use a DNS entry for the Hook -> Tink communication would be ideal.
Looking at the code, it behaves this way because it uses "splitHostPort" to split the Tink server's host and port, then discards the host portion.
If it were to default to the IP only when the host portion is unset (or have an option to do so), the DNS name would be correctly embedded in the ISO, something like:
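A minimal Go sketch of that fallback, assuming a hypothetical helper named resolveTinkAuthority and a made-up default IP. This is an illustration of the idea only, not Smee's actual code:

```go
package main

import (
	"fmt"
	"net"
)

// resolveTinkAuthority is a hypothetical helper, not Smee's real function.
// It keeps the host from addr when one is given and only falls back to
// defaultIP when the host portion is empty (for example ":42113").
func resolveTinkAuthority(addr, defaultIP string) (string, error) {
	host, port, err := net.SplitHostPort(addr)
	if err != nil {
		return "", fmt.Errorf("parsing %q: %w", addr, err)
	}
	if host == "" {
		host = defaultIP
	}
	return net.JoinHostPort(host, port), nil
}

func main() {
	// Both the DNS name and the default IP below are example values.
	for _, addr := range []string{"tinkerbell.example.com:42113", ":42113"} {
		authority, err := resolveTinkAuthority(addr, "192.168.2.50")
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(authority)
	}
}
```

With this shape, an address that carries a DNS name is passed through untouched, while a port-only address still gets the detected IP.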
Problem 2 - CA Certificates not in the tink-agent docker image
Running in the cloud allows us to handle TLS termination at the load balancer level. The official Tinkerbell chart even exposes a flag telling tink-agent to use TLS. However, the tink-agent image does not have ca-certificates inside it, so it can't talk to the gRPC API if it is behind TLS in any case. It looks like all the actions include ca-certificates, so maybe I'm missing something here.
There is another corner case here: gRPC requires HTTP/2, so if you force this on your layer 4 load balancer you also need to enable UnencryptedHTTP2 in the various HTTP servers Tinkerbell exposes.