Replies: 1 comment
Using Ansible for "Day 2" activities is indeed appealing and should be considered.
Motivation
While preparing PR #778, I ran into a lot of SSH timeouts and dropped connections from Hetzner, which caused the Terraform provisioners to fail. As a result, the whole execution always needed a restart and never finished with "one click" in my case.
This is cumbersome and time-consuming for end users, as they have to dig deep to investigate the potential root cause of the installation failures.
I should also mention that most of the failures occurred in the final steps, at the "null resources kustomization" in init.tf, which deploys the Kubernetes manifests for the Day 1 and some of the Day 2 resources.
Proposals
1. Use a more mature tool (like Ansible) for the configuration and let Terraform handle only infrastructure creation and management
The Terraform docs themselves advise against relying heavily on provisioners and describe them only as a last resort (https://developer.hashicorp.com/terraform/language/resources/provisioners/file).
I would propose using Ansible.
It is much more mature when it comes to configuration tasks, remote node management, and SSH connectivity, and it even supports retries :).
Yes, it would be another tool and dependency. But we could put Ansible and Terraform together in a nice container image and let the user just run the container locally to do the installation. This would also improve the stability of the installation, since we could make sure that a specific combination of Terraform and Ansible versions works and was tested for a given release.
2. Create an "install.sh" script
Currently, users need to execute multiple chained steps in order to create a cluster.
With Proposal 1, most of these steps could be covered by Ansible, but it would still be nice to have a single "install.sh" that starts the container image and runs the whole installation with one click.
Without Proposal 1, I would at least bundle the snapshot creation and the Terraform execution in a script that greps Terraform's output messages and, where needed, restarts the installation automatically.
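To make the "grep the output and restart" idea more concrete, here is a minimal sketch of what such a wrapper could look like. Everything in it is an assumption for illustration: the function name run_with_retry, the variables MAX_ATTEMPTS, RETRY_DELAY and RETRY_PATTERN, and especially the error patterns, which would have to be replaced with the messages actually seen in failed runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper for an install.sh wrapper (not part of the project).
# Re-runs a command when it fails and its output matches a pattern that
# looks like a transient connection problem.

MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"   # assumed default
RETRY_DELAY="${RETRY_DELAY:-10}"    # seconds between attempts, assumed default
# Assumed patterns; replace with the messages seen in real failed runs.
RETRY_PATTERN="${RETRY_PATTERN:-timeout|timed out|connection reset|connection refused}"

run_with_retry() {
  local attempt=1
  local status=0
  local log
  log="$(mktemp)"

  while :; do
    echo "attempt ${attempt}/${MAX_ATTEMPTS}: $*" >&2
    status=0
    # Capture stdout and stderr so the output can be inspected afterwards.
    # Note: output is printed once the command ends, not streamed live.
    "$@" >"$log" 2>&1 || status=$?
    cat "$log"

    if [ "$status" -eq 0 ]; then
      break
    fi
    # Give up on errors that do not look transient, or when out of attempts.
    if ! grep -Eiq "$RETRY_PATTERN" "$log" || [ "$attempt" -ge "$MAX_ATTEMPTS" ]; then
      break
    fi
    attempt=$((attempt + 1))
    sleep "$RETRY_DELAY"
  done

  rm -f "$log"
  return "$status"
}
```

An install.sh could then call, for example, `run_with_retry terraform apply -auto-approve` after the snapshot creation step. Errors that do not match the pattern are returned immediately with the original exit code, so real configuration mistakes are not hidden behind retries.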
3. Split Day 1 and Day 2 activities
Right now, after the nodes are provisioned and the k3s cluster is set up, a bunch of kustomizations are executed as the final step.
We can basically group them in:
As I wrote at the beginning, this is where the installer fails most of the time because of connection issues (at least in my case). At a minimum, I would split the "null_resources" into separate ones based on category and adapt the triggers accordingly, to reduce the failure surface.
Furthermore, I would argue that things like Ingress configuration are already a Day 2 activity, which I personally always take care of via GitOps (using ArgoCD) anyway. In my opinion, the installation of a new cluster should cover only the Day 1 activities, that is, the bare minimum required to run the k3s cluster on that infrastructure.
So I would for sure agree to:
as Day1 activity
The system-upgrade-controller and kured are a bit of a gray area, but I would also see them as part of the base features of this project.
However, I would propose dropping the ingress, or at least making it configurable so that none is deployed at all, in which case the provisioners for it would not need to run either.
What do you think about these ideas? Let's brainstorm together. 👍