feat: GKE & hardened environment/hardened support #1337
jpaodev wants to merge 9 commits into mondoohq:main
… on GKE autopilot
|
All contributors have signed the CLA ✍️ ✅ |
|
Thank you @jpaodev. We are making a few changes right now: we are refactoring the operator and also adding scanning support for external clusters, so that you can use WIF and SPIFFE to scan external clusters. Once that is in (hopefully by the end of this week), we will take another look at your changes to see how we can include them. |
|
I have read the Mondoo CLA Document and I hereby sign the CLA |
Thanks for the answer! :) - Sounds good! If there's anything that I can do / test let me know!:) I just went through the diff again and the Mondoo Audit Config CRD changes would e.g. Maybe useful information: Also really looking forward to the |
|
I've been running this without any issues for the last couple of days, but no scans kept coming in, unless I scaled down the |
|
@jpaodev We've done a lot of refactoring on main as we prepare the next major release. Would you like to rebase this PR on main? It is probably easier to port the changes over than to do a pure rebase. If you have no time, we will do that ourselves to get your improvements in. |
Hi there, I guess easiest way would be to sync my |
|
@chris-rock looks good to me now. Took me a second to understand the 2 deployments (scanapi and the webhook one are now completely gone) Notes:
Otherwise I think everything looks pretty okay. Note: a double-check is appreciated, as the merge was a tad harder for my brain than expected 😆 Thanks a lot for looking into this! :) |
|
@chris-rock were you also able to take a quick look at this article: Official Google Docs Link: Run privileged workloads from GKE Autopilot partners ? That would definitely be a gamechanger for GKE Autopilot users, if they could run mondoo node scanning with this |
|
I started to extract some PRs by cherry-picking commits. This makes the review and the scope of the changes clearer. |
Hey there, that's amazing! Big thanks! Also the Example configuration: |
|
Agreed, the no proxy config is super interesting and I like the mirror setup. I need to see if we actually want to deprecate the image config, since those would be two different things. Let me play with the proxy setup a bit. Overall I find this a super cool feature @jpaodev. Big thank you for bringing in this great contribution. |
|
I am going to close this PR as all parts have been merged separately. Thank you @jpaodev |
Hello folks at Mondoo!
First of all, big thanks for your efforts in the Cybersecurity space and making the world safer! 😊
I can very happily announce that I have successfully gotten the mondoo operator to work on my GKE Autopilot cluster, which is a huge thing for me!
What does this pull request introduce
This PR introduces GKE Autopilot compatibility and several features that are important for use in hardened environments (I wanted to say air-gapped, but these are not really 100% air-gapped; rather, they have limited network connectivity).
Overall, this PR makes the mondoo operator more flexible and better suited to hardened environments, which will hopefully be attractive not only for GKE but also for other Kubernetes environments.
It introduces the MondooOperatorConfig CRD.
What should this Pull Request fix?
High Priority GKE Warning (Reliability Issue):
mondoo-operator-mondoo-client-mondoo. kube-system namespace," which GKE warns can impact the availability of the GKE Control Plane. kube-system being explicitly listed in the namespacedenylist parameter during the Mondoo operator installation command (namespacedenylist=kube-system,...). This suggests the denylist is not being honored or is misconfigured within the Mondoo operator's webhook configuration.
Mondoo Operator Configuration Limitations:
MondooOperatorConfig CRD.
Private Registry Mirror Configuration for Mondoo Containers not possible:
Mondoo Operator Controller Manager Pod Unhealthiness (despite patching):
mondoo-operator-controller-manager pod (e.g., mondoo-operator-controller-manager-678d84c58b-2kpfx) is in a critical unhealthy state. Reason: Error and Exit Code: 2.
Ineffectiveness of Custom Patching Script(s) (Partial Success):
imagePullSecret, rewrites the container image to the private image registry, and injects the private network proxy (e.g. corporate proxy) environment variables (http_proxy, https_proxy, no_proxy), it does not resolve the underlying issue causing the mondoo-operator-controller-manager pod to continuously crash and fail its health probes. The user notes, "sadly the replica still seems to mess things up for some reason."
Ephemeral storage requests / limits not set causing container scans to fail:
mondoo-client-containers-scan-test pods were failing on GKE Autopilot because a limit of max 1GiB ephemeral storage is imposed on pods that do not have explicit ephemeral storage limits defined (Reference). To mitigate this issue, requests for ephemeral storage of 2Gi and limits of 5Gi have been set in pkg/utils/k8s/resources_requirements.go.
Important Information
This PR sadly does not cover anything related to Node Scanning, because GKE Autopilot heavily restricts what is able to run on the cluster. Don't fret - this can be fixed: "Google Cloud’s Autopilot partner workload support, this has changed. Users can now explicitly allow trusted workloads through a declarative AllowlistSynchronizer configuration. Kubescape is among the first open-source tools to make use of this capability." Source / Interesting Read.
Note: If there is anything missing in the PR or if you have any suggestions, please let me know!
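For reviewers who want a quick picture of the ephemeral storage fix described above, here is a minimal sketch of the idea. It is not the actual code in pkg/utils/k8s/resources_requirements.go: the types ResourceList and ResourceRequirements are simplified local stand-ins (the operator would presumably use the Kubernetes API types instead), and the helper name WithEphemeralStorageDefaults is hypothetical. Only the 2Gi request and 5Gi limit come from this PR.

```go
package main

import "fmt"

// ResourceList and ResourceRequirements are simplified local stand-ins for the
// Kubernetes types of the same name. They are assumptions made for this sketch.
type ResourceList map[string]string

type ResourceRequirements struct {
	Requests ResourceList
	Limits   ResourceList
}

// ephemeralStorage is the Kubernetes resource name for pod-local scratch space.
const ephemeralStorage = "ephemeral-storage"

// WithEphemeralStorageDefaults (hypothetical helper) returns a copy of r that
// always carries an explicit ephemeral storage request (2Gi) and limit (5Gi).
// Values already present in r take precedence over these defaults.
func WithEphemeralStorageDefaults(r ResourceRequirements) ResourceRequirements {
	out := ResourceRequirements{
		Requests: ResourceList{ephemeralStorage: "2Gi"},
		Limits:   ResourceList{ephemeralStorage: "5Gi"},
	}
	// Ranging over a nil map is a no-op in Go, so empty inputs are safe.
	for name, qty := range r.Requests {
		out.Requests[name] = qty
	}
	for name, qty := range r.Limits {
		out.Limits[name] = qty
	}
	return out
}

func main() {
	// A scan container that only sets CPU and memory, as before this PR.
	scan := ResourceRequirements{
		Requests: ResourceList{"cpu": "100m"},
		Limits:   ResourceList{"memory": "1Gi"},
	}
	patched := WithEphemeralStorageDefaults(scan)
	fmt.Println(patched.Requests[ephemeralStorage], patched.Limits[ephemeralStorage])
}
```

With explicit values in place, the scan pods no longer fall under the 1GiB limit that GKE Autopilot imposes on pods without explicit ephemeral storage limits. Letting existing values win is a design choice of this sketch, so that a caller who already sizes ephemeral storage is not overridden.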
Tested? Yes/No?
Everything has been tested on a GKE Autopilot cluster running K8S version v1.33.5-gke.2100000.
It is currently actively being used on a GKE Autopilot cluster.
Finishing
If any further information is required or changes are required, happy to hear from you!
Thanks a lot in advance for review / integration.