Skip to content

[KF-7803] Adding Github action for deploying on EKS#55

Merged
afgambin merged 45 commits intotrack/1.10from
kf-7803-gh-action-eks
Sep 9, 2025
Merged

[KF-7803] Adding Github action for deploying on EKS#55
afgambin merged 45 commits intotrack/1.10from
kf-7803-gh-action-eks

Conversation

@afgambin
Copy link
Contributor

@afgambin afgambin commented Jul 31, 2025

This PR addresses this Jira ticket: https://warthogs.atlassian.net/browse/KF-7803

For review: I have been following the same structure we have used for AKS + plus the Install guide from CKF docs. I am leaving some questions in the PR comments.

@afgambin afgambin changed the title Github action to deploy on EKS [KF-7803] Github action to deploy on EKS Jul 31, 2025
@afgambin
Copy link
Contributor Author

How can I test the workflow works without merging? I mean, I thought to run it manually under Actions tab, but for that, it needs to be already merged

@afgambin afgambin changed the title [KF-7803] Github action to deploy on EKS [KF-7803] Adding Github action for deploying on EKS Jul 31, 2025
@deusebio
Copy link
Contributor

How can I test the workflow works without merging? I mean, I thought to run it manually under Actions tab, but for that, it needs to be already merged

Just add it to the checks on pull request. Once you make sure it passes, you can disable the workflow on pull_request before merging.

I was doing this for TIOBE/TICS integration, e.g. mlflow-operator. If you see the history, in the last commit, I just remove the pull request trigger

Copy link
Contributor

@deusebio deusebio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some first feedbacks and guidance

Copy link
Contributor

@deusebio deusebio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the credentials and re-triggered the CI. It seems that it is progressing through the "Configure AWS credentials" and it is currently running the "Create EKS cluster" (finger crossed). Please keep an eye and check if the tests work fine, otherwise update the PR accordingly.

Overall, I believe the action here is almost good (we are missing the cleanup of the EKS cluster steps, here - also with the deletion of the volumes). However I have pointed a couple of improvements (v2 migration to v4, and adding inputs to allow testing of multiple releases), that I believe we can also address as separate ticket to keep well-perimetered PRs, and avoid scope creep.

However I feel this are important points that we could address straight away, after this PR is merged, still within the scope of the current epic.

@deusebio deusebio self-requested a review August 4, 2025 17:12
Copy link
Contributor

@deusebio deusebio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately, the action seemed to failed. I checked the logs and there seems to possibly be an permission error (sometimes I saw this because of the confinemtn of the snap, which therefore does not find/see the aws executable). My suggesiton would be to also use the other action here to take inspiration.

More worringly, the current CI seems to miss the cleanup stanges here. Indeed, I can still see the dangling resources attached, which I'm now clearing manually, but make sure that you cover these steps

Copy link
Contributor

@NohaIhab NohaIhab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @afgambin, some comments from my side

@afgambin
Copy link
Contributor Author

afgambin commented Aug 20, 2025

I have been testing a couple of things but the error still remains. This seems to be the issue:

Terraform is trying to deploy a minio application inside the mlflow module. The minio charm referenced in that module doesn’t point to a valid OCI image.That definition comes from upstream, so we don’t control its OCI image.
As a result, Juju keeps the application in “creating…” state indefinitely.

Not sure how to solve this, at least not for a long-term solution. @NohaIhab @mvlassis

Co-authored-by: Manos Vlassis <57320708+mvlassis@users.noreply.github.com>
Signed-off-by: Angel Fernandez <103958447+afgambin@users.noreply.github.com>
Copy link
Contributor

@NohaIhab NohaIhab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @afgambin, great job! just a small comment on the cluster labels, probably left-over from previous workflow

@afgambin afgambin requested a review from NohaIhab September 2, 2025 08:50
Copy link
Contributor

@NohaIhab NohaIhab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @afgambin, truly valuable contribution!

@afgambin afgambin requested a review from deusebio September 2, 2025 09:43
@afgambin
Copy link
Contributor Author

afgambin commented Sep 2, 2025

@deusebio this is eventually passing all checks after Noha unblocked it, solving the issue with the mlflow-minio resource. Please have a look whenever you can for final approval, ty!

Copy link
Contributor

@deusebio deusebio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is something fuzzy/inconsistent with the K8s version I feel. But overall, by default, we should test on 1.32, which is the LTS

@afgambin afgambin requested a review from deusebio September 5, 2025 11:33
Copy link
Contributor

@deusebio deusebio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Thanks @afgambin !!

@afgambin afgambin merged commit 9ae42cd into track/1.10 Sep 9, 2025
6 checks passed
@afgambin afgambin deleted the kf-7803-gh-action-eks branch September 9, 2025 08:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants