[KF-7803] Adding GitHub action for deploying on EKS #55
How can I test that the workflow works without merging? I thought of running it manually from the Actions tab, but for that it needs to be merged already.
Just add it to the checks on pull requests. Once you have confirmed that it passes, you can disable the workflow on pull_request before merging. I did this for the TIOBE/TICS integration, e.g. in mlflow-operator. If you look at the history, the last commit just removes the pull request trigger.
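A minimal sketch of that approach, assuming the workflow is otherwise meant to be started manually; the rest of the workflow is left out:

```yaml
on:
  # Intended long-term trigger: manual runs from the Actions tab.
  workflow_dispatch:
  # Temporary trigger so the workflow runs as a check on this PR.
  # Remove it in the last commit before merging.
  pull_request:
```

With the `pull_request` trigger in place, the workflow definition from the PR branch is the one that runs, so it can be tested before it exists on the default branch.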
deusebio
left a comment
Some initial feedback and guidance
I have added the credentials and re-triggered the CI. It has progressed past "Configure AWS credentials" and is currently running "Create EKS cluster" (fingers crossed). Please keep an eye on it and check whether the tests pass; otherwise, update the PR accordingly.
Overall, I believe the action is almost there. What is missing are the steps that clean up the EKS cluster, including the deletion of the volumes. I have also pointed out a couple of improvements (migrating from v2 to v4, and adding inputs to allow testing of multiple releases), which I believe we can address in a separate ticket to keep PRs well scoped and avoid scope creep.
That said, I feel these are important points that we could address straight away after this PR is merged, still within the scope of the current epic.
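A rough sketch of what those two improvements could look like. Everything named here is an assumption for illustration: that the v2 action in question is `aws-actions/configure-aws-credentials`, the input name `bundle_version`, the secret names, and the region.

```yaml
on:
  workflow_dispatch:
    inputs:
      bundle_version:  # hypothetical input name
        description: Release to deploy and test
        type: string
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Configure AWS credentials
        # Assumption: this is the action being moved from v2 to v4.
        uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}          # hypothetical secret name
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}  # hypothetical secret name
          aws-region: eu-central-1                                     # placeholder region

      - name: Show selected release
        env:
          # Passed through env rather than interpolated into the script.
          BUNDLE_VERSION: ${{ inputs.bundle_version }}
        run: echo "Testing release $BUNDLE_VERSION"
```

Passing the input through `env` keeps user-provided text out of the shell script itself, and the same workflow can then be dispatched once per release.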
deusebio
left a comment
Unfortunately, the action seems to have failed. I checked the logs and there appears to be a permission error (I have sometimes seen this because of the confinement of the snap, which then cannot find/see the aws executable). My suggestion would be to look at the other action here for inspiration.
More worryingly, the current CI seems to be missing the cleanup stages. Indeed, I can still see the dangling resources, which I am now clearing manually, but make sure that you cover these steps.
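A possible shape for the cleanup steps, as a sketch only. It assumes `eksctl` and the AWS CLI are already installed and authenticated by earlier steps, that `CLUSTER_NAME` and `AWS_REGION` are placeholder values, and that leftover volumes carry a `kubernetes.io/cluster/<name>` tag (this depends on how the storage driver is configured, so check the actual tags first).

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      CLUSTER_NAME: kubeflow-test  # placeholder
      AWS_REGION: eu-central-1     # placeholder
    steps:
      # ... credential setup, cluster creation, deploy and test steps ...

      - name: Delete EKS cluster
        # always() runs this step even if an earlier step failed.
        if: always()
        run: |
          eksctl delete cluster \
            --name "$CLUSTER_NAME" \
            --region "$AWS_REGION" \
            --wait

      - name: Delete leftover EBS volumes
        if: always()
        run: |
          # Assumption: dangling volumes are detached ("available") and
          # tagged with kubernetes.io/cluster/<cluster name>.
          volumes=$(aws ec2 describe-volumes \
            --region "$AWS_REGION" \
            --filters "Name=tag-key,Values=kubernetes.io/cluster/$CLUSTER_NAME" \
                      "Name=status,Values=available" \
            --query "Volumes[].VolumeId" \
            --output text)
          for volume in $volumes; do
            aws ec2 delete-volume --region "$AWS_REGION" --volume-id "$volume"
          done
```

The `if: always()` condition is the important part: without it, a failed test step skips the cleanup and leaves the resources dangling.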
I have been testing a couple of things, but the error remains. This seems to be the issue: Terraform is trying to deploy a Not sure how to solve this, at least not for a long-term solution. @NohaIhab @mvlassis
Co-authored-by: Manos Vlassis <57320708+mvlassis@users.noreply.github.com> Signed-off-by: Angel Fernandez <103958447+afgambin@users.noreply.github.com>
@deusebio this is finally passing all checks after Noha unblocked it by solving the issue with the mlflow-minio resource. Please have a look whenever you can for final approval, ty!
deusebio
left a comment
I feel there is something fuzzy/inconsistent with the K8s version. Overall, though, we should test on 1.32 by default, which is the LTS.
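One way to keep the version consistent is to define it in a single place and default it to 1.32. This is only a sketch: the input name `k8s_version`, the cluster name, and the region are placeholders, and it assumes `eksctl` is installed and AWS credentials are configured by earlier steps.

```yaml
on:
  workflow_dispatch:
    inputs:
      k8s_version:  # hypothetical input name
        description: Kubernetes version for the EKS cluster
        type: string
        default: "1.32"

jobs:
  deploy:
    runs-on: ubuntu-latest
    env:
      CLUSTER_NAME: kubeflow-test  # placeholder
      AWS_REGION: eu-central-1     # placeholder
      # Falls back to 1.32 when the run was not started manually,
      # since inputs are empty for other triggers.
      K8S_VERSION: ${{ inputs.k8s_version || '1.32' }}
    steps:
      - name: Create EKS cluster
        run: |
          eksctl create cluster \
            --name "$CLUSTER_NAME" \
            --region "$AWS_REGION" \
            --version "$K8S_VERSION"
```

Every later step that needs the version can then read `K8S_VERSION` instead of hardcoding its own value.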
This PR addresses this Jira ticket: https://warthogs.atlassian.net/browse/KF-7803
For review: I have followed the same structure we used for AKS, plus the install guide from the CKF docs. I am leaving some questions in the PR comments.