
feat: GKE & hardened environment/hardened support #1337

Closed
jpaodev wants to merge 9 commits into mondoohq:main from jpaodev:feat-operator-config

@jpaodev (Contributor) commented Jan 25, 2026

Hello folks at Mondoo!

First of all, big thanks for your efforts in the cybersecurity space and for making the world safer! 😊

I'm very happy to announce that I have successfully gotten the Mondoo operator to work on my GKE Autopilot cluster, which is a huge thing for me!

What does this pull request introduce?

This PR introduces GKE Autopilot compatibility and several features that are important for hardened environments (not fully air-gapped, but with limited network connectivity).

Overall, this PR makes the Mondoo operator more flexible and better suited to hardened environments, which will hopefully make it attractive not only for GKE but for other Kubernetes environments as well.

It introduces the MondooOperatorConfig CRD.

# Mondoo Operator Configuration
# These are applied to the MondooOperatorConfig custom resource
operator:
  # Create MondooOperatorConfig resource
  # Set to false on first install, then true on upgrade after CRDs are installed
  createConfig: true
  
  # HTTP proxy for outbound connections to Mondoo Platform
  # Example: "http://proxy.example.com:8080"
  httpProxy: "http://secure-egress-proxy.example.com:3128"
  
  # HTTPS proxy for outbound connections to non-internal sites
  httpsProxy: "http://secure-egress-proxy.example.com:3128"
  
  # Comma-separated list of hosts that should bypass the proxy
  # Example: "localhost,127.0.0.1,.svc,.cluster.local,.internal"
  # Here example.com stands for the internal network domain; it is listed in
  # noProxy because connections within the internal network don't need the proxy
  noProxy: "localhost,172.16.0.0/12,10.0.0.0/8,100.64.0.0/10,192.168.0.0/16,metadata.google.internal,.example.com,api.mondoo.example.com,mondoo.example.com,.mondoo.example.com,.svc,.svc.cluster.local,.cluster.local"
  
  # Container proxy for pulling container images (used in container image scanning)
  # Example: "http://proxy.example.com:8080"
  containerProxy: ""
  
  # Image pull secrets for pulling Mondoo images from private registries
  # Uncomment and configure if using a private registry
  imagePullSecrets:
    - name: registry-secret
  #   - name: ghcr-secret
  
  # Custom image registry prefix for corporate registries (deprecated, use registryMirrors)
  # This rewrites all image references to use your corporate registry
  # Example: "registry.example.com/ghcr.io.docker"
  imageRegistry: ""
  
  # Registry mirrors for mapping public registries to private mirrors
  # Uncomment and configure for your corporate registry
  registryMirrors:
    ghcr.io: registry.example.com/ghcr.io.docker
    hub.docker.com: registry.example.com/hub.docker.com
    quay.io: registry.example.com/quay.io
  
  # Skip proxy settings for cnspec-based components (scan-api, container scanning)
  # Set to true when using an internal Mondoo API mirror that doesn't require proxy
  # This is important because cnspec does not appear to respect the no_proxy
  # environment variable, so connections to the internal Mondoo endpoint would fail otherwise
  skipProxyForCnspec: true
  
  # Skip container image resolution (useful for air-gapped environments or faster testing)
  skipContainerResolution: false
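The description does not show how these values reach the pods. As a minimal sketch, assuming the operator injects the proxy fields as container environment variables and leaves them out entirely for cnspec-based components when skipProxyForCnspec is true, the logic could look like this in Go. ProxyConfig, EnvVar and proxyEnv are hypothetical names, not the operator's actual types.

```go
package main

import (
	"fmt"
	"strings"
)

// ProxyConfig holds a subset of the Helm values above.
// Hypothetical type for illustration, not the operator's actual API.
type ProxyConfig struct {
	HTTPProxy          string
	HTTPSProxy         string
	NoProxy            string
	SkipProxyForCnspec bool
}

// EnvVar is a simplified stand-in for a Kubernetes container env var.
type EnvVar struct {
	Name  string
	Value string
}

// proxyEnv returns the proxy variables injected into a component's container.
// cnspec-based components get none when SkipProxyForCnspec is set, so they
// connect directly to an internal Mondoo endpoint.
func proxyEnv(cfg ProxyConfig, isCnspec bool) []EnvVar {
	if isCnspec && cfg.SkipProxyForCnspec {
		return nil
	}
	var env []EnvVar
	add := func(name, value string) {
		if value == "" {
			return // unset values are omitted rather than set to ""
		}
		// Set both spellings, since tools differ in which one they read.
		env = append(env,
			EnvVar{Name: strings.ToUpper(name), Value: value},
			EnvVar{Name: strings.ToLower(name), Value: value})
	}
	add("HTTP_PROXY", cfg.HTTPProxy)
	add("HTTPS_PROXY", cfg.HTTPSProxy)
	add("NO_PROXY", cfg.NoProxy)
	return env
}

func main() {
	cfg := ProxyConfig{
		HTTPProxy:          "http://secure-egress-proxy.example.com:3128",
		HTTPSProxy:         "http://secure-egress-proxy.example.com:3128",
		NoProxy:            "localhost,.svc,.cluster.local,.example.com",
		SkipProxyForCnspec: true,
	}
	for _, e := range proxyEnv(cfg, false) {
		fmt.Printf("operator: %s=%s\n", e.Name, e.Value)
	}
	fmt.Println("cnspec env vars:", len(proxyEnv(cfg, true)))
}
```

With these values the operator side receives six variables (three settings in two spellings) and the cnspec side receives none.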

What should this Pull Request fix?

  • High Priority GKE Warning (Reliability Issue):

    • A GKE "High Priority" warning titled "Update webhook intercepting system requests" has been triggered.
    • The specific webhook causing this is mondoo-operator-mondoo-client-mondoo.
    • Problem: This webhook is "Intercepting resources in the kube-system namespace," which GKE warns can impact the availability of the GKE Control Plane.
    • Contradiction/Root Cause Hint: This occurs despite kube-system being explicitly listed in the namespacedenylist parameter during the Mondoo operator installation command (namespacedenylist=kube-system,...). This suggests the denial list is not being honored or is misconfigured within the Mondoo operator's webhook configuration.
    • Disclaimer: Fix not (yet) confirmed!
  • Mondoo Operator Configuration Limitations:

    • Pull secrets, proxied images, and proxy settings (e.g. a corporate proxy) were not configurable for the Mondoo Kubernetes setup. This limitation made a custom patching script necessary. It is now addressed by the MondooOperatorConfig CRD.
  • Private registry mirrors for Mondoo containers not configurable:

    • Private registry mirrors e.g. for ghcr.io are not configurable for the operator.
  • Mondoo Operator Controller Manager Pod Unhealthiness (despite patching):

    • The mondoo-operator-controller-manager pod (e.g., mondoo-operator-controller-manager-678d84c58b-2kpfx) is in a critical unhealthy state.
    • Termination Reason: The container repeatedly terminates with Reason: Error and Exit Code: 2.
    • Back-off Loop: Kubernetes is continuously attempting to restart the failed container, leading to a "Back-off restarting failed container manager" state.
  • Ineffectiveness of Custom Patching Script(s) (Partial Success):

    • Bash scripts could be used to apply the necessary imagePullSecret, rewrite the container images to the private image registry, and inject the private network's proxy environment variables (http_proxy, https_proxy, no_proxy) for e.g. a corporate proxy. However, this does not resolve the underlying issue that causes the mondoo-operator-controller-manager pod to crash continuously and fail its health probes: "sadly the replica still seems to mess things up for some reason."
  • Ephemeral storage requests / limits not set causing container scans to fail:

    • The mondoo-client-containers-scan-test pods were failing on GKE Autopilot, because a maximum of 1 GiB of ephemeral storage is imposed on pods that do not explicitly define ephemeral storage limits (Reference). To mitigate this, ephemeral storage requests of 2Gi and limits of 5Gi have been set in pkg/utils/k8s/resources_requirements.go.
    • Disclaimer: Fix not (yet) confirmed!

Important Information

  • This PR unfortunately does not cover node scanning, because GKE Autopilot heavily restricts what can run on the cluster. Don't fret, this can be fixed: "Google Cloud’s Autopilot partner workload support, this has changed. Users can now explicitly allow trusted workloads through a declarative AllowlistSynchronizer configuration. Kubescape is among the first open-source tools to make use of this capability." Source / Interesting Read.

    • Note: It looks like Mondoo would need to take action here, probably by approaching Google about this topic. "GKE allows a subset of approved partners to run privileged workloads in Autopilot clusters." Official Google Docs link: Run privileged workloads from GKE Autopilot partners. That would be really cool!
  • Note: If there is anything missing in the PR or if you have any suggestions, please let me know!

Tested? Yes/No?

Everything has been tested on a GKE Autopilot cluster running Kubernetes v1.33.5-gke.2100000.

It is currently in active use on a GKE Autopilot cluster.

Finishing

If any further information or changes are needed, I'm happy to hear from you!

Thanks a lot in advance for the review and integration.

@github-actions bot commented Jan 25, 2026

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@chris-rock (Member)

Thank you @jpaodev! We are making a few changes right now: we are refactoring the operator and adding scanning support for external clusters, so that you can use WIF and SPIFFE to scan them. Once that is in (hopefully by the end of this week), we'll have another look at your changes to see how we can include them.

@jpaodev (Contributor, Author) commented Jan 26, 2026

I have read the Mondoo CLA Document and I hereby sign the CLA

@jpaodev (Contributor, Author) commented Jan 26, 2026

Thanks for the answer! Sounds good! If there's anything I can do or test, let me know! :)

I just went through the diff again: the MondooAuditConfig CRD changes (e.g. charts/mondoo-operator/crds/mondooauditconfig-crd.yaml) should be double-checked. I used make generate to generate the CRDs. The primary focus of the PR is the MondooOperatorConfig.

Maybe useful information:
When deleting the mondoo-operator namespace, the finalizers should be cleaned up as well so that the deletion completes quickly:

helm uninstall mondoo-operator -n mondoo-operator
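# --wait=false: with the operator uninstalled, nothing clears the finalizer, so waiting would block
kubectl delete mondooauditconfigs.k8s.mondoo.com mondoo-client -n mondoo-operator --wait=false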
#....  Delete CRDs of mondoo ....
# This line is important to get a quick deletion, otherwise the mondoo-operator namespace would not delete
kubectl patch mondooauditconfigs.k8s.mondoo.com mondoo-client -n mondoo-operator -p '{"metadata":{"finalizers":null}}' --type=merge

Also really looking forward to the AllowlistSynchronizer for GKE Autopilot! Hopefully everything works out there with Google! 🎉

@jpaodev (Contributor, Author) commented Jan 29, 2026

I've been running this for the last couple of days without other issues, but no scans came in unless I scaled down the mondoo-operator-controller-manager deployment and ran the CronJob while it was scaled down.
Whether the scan pod was spawned by me or by the CronJob schedule, it kept getting killed off. The latest commit seems to fix this! :-)

@chris-rock (Member)

@jpaodev We've done a lot of refactoring on main as we prepare the next major release. Would you like to rebase this PR on main? It's probably easier to port the changes over than to do a pure rebase. If you don't have time, we'll do that to get your improvements in.

@jpaodev (Contributor, Author) commented Feb 3, 2026

Hi there, I guess the easiest way would be to sync my main with your latest changes, branch off from it, merge my changes into that branch, and create a PR? That's how I would normally do it, and so far that has always gone fairly smoothly.

@jpaodev (Contributor, Author) commented Feb 3, 2026

@chris-rock, this looks good to me now. It took me a second to understand the two deployments (scan-api and the webhook one are now completely gone).

Notes:

Otherwise I think everything looks pretty okay, but a double-check is appreciated, as the merge was a tad harder for my brain than expected 😆

Thanks a lot for looking into this! :)

@jpaodev (Contributor, Author) commented Feb 3, 2026

@chris-rock, were you also able to take a quick look at this article: Official Google Docs link: Run privileged workloads from GKE Autopilot partners? Being able to run Mondoo node scanning this way would definitely be a game changer for GKE Autopilot users.

@github-actions bot commented Feb 5, 2026

Test Results

0 files (-5) · 0 suites (-41) · 0s ⏱️ (-36m 14s)
0 tests (-280): 0 ✅ (-280), 0 💤 (±0), 0 ❌ (±0)
0 runs (-295): 0 ✅ (-295), 0 💤 (±0), 0 ❌ (±0)

Results for commit b15d7d9. ± Comparison against base commit 3d2cea2.

@chris-rock (Member)

I started extracting some PRs by cherry-picking commits. This makes the review and the scope of the changes clearer.

@chris-rock (Member)

@jpaodev The first parts have been merged in #1375, #1376 and #1377. The last missing part is the operator-wide proxy config. I am currently rebasing the remaining changes and will address feedback on top of your commits.

@jpaodev (Contributor, Author) commented Feb 6, 2026

Hey there, that's amazing! Big thanks!
Note regarding the proxy config: It might be worth modifying cnspec, if desired, to also support https_proxy and no_proxy. This would be especially relevant for users who are forced to use an explicit forward proxy for Internet targets and don't have their own internal mirror of the Mondoo endpoints.
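For reference, Go's standard library already implements these semantics: net/http.ProxyFromEnvironment reads HTTP_PROXY, HTTPS_PROXY and NO_PROXY (or their lowercase versions) and accepts domain suffixes and CIDR ranges in NO_PROXY, so using it for cnspec's HTTP transport may be one route; whether cnspec's client does so today is not established in this thread. To make the matching rules concrete, here is a simplified, self-written matcher. bypassProxy is a hypothetical helper, ignores port suffixes, and is not the standard library's implementation.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// bypassProxy reports whether host should be reached directly, given a
// comma-separated NO_PROXY-style list. Supported entries: "*", exact hosts
// or IPs, ".suffix" (subdomains only), bare domains (the domain and its
// subdomains), and CIDR ranges. Port suffixes are not handled.
func bypassProxy(host, noProxy string) bool {
	host = strings.ToLower(strings.TrimSuffix(host, "."))
	addr, addrErr := netip.ParseAddr(host)
	for _, entry := range strings.Split(noProxy, ",") {
		entry = strings.ToLower(strings.TrimSpace(entry))
		switch {
		case entry == "":
			continue
		case entry == "*":
			return true
		case strings.Contains(entry, "/"):
			// CIDR entry: only matches when host is a literal IP address.
			if prefix, err := netip.ParsePrefix(entry); err == nil && addrErr == nil && prefix.Contains(addr) {
				return true
			}
		case strings.HasPrefix(entry, "."):
			if strings.HasSuffix(host, entry) {
				return true
			}
		default:
			if host == entry || strings.HasSuffix(host, "."+entry) {
				return true
			}
		}
	}
	return false
}

func main() {
	noProxy := "localhost,10.0.0.0/8,.svc,.cluster.local,metadata.google.internal,.example.com"
	for _, host := range []string{"api.mondoo.example.com", "10.20.30.40", "scan-api.svc", "api.example.net"} {
		fmt.Printf("%-24s bypass proxy: %v\n", host, bypassProxy(host, noProxy))
	}
}
```

Note the difference between ".example.com" (subdomains only) and "example.com" (the domain itself plus subdomains); the standard library draws the same distinction.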

The registryMirrors feature of the PR would also be important for companies with internal mirrors such as Artifactory; it is referenced in charts/mondoo-operator/templates/mondoooperatorconfig.yaml.

Example configuration:

operator:
  # Registry mirrors for mapping public registries to private mirrors
  # Uncomment and configure for your corporate registry
  registryMirrors:
    ghcr.io: artifactory.example.com/ghcr.io.docker
    hub.docker.com: artifactory.example.com/hub.docker.com
    quay.io: artifactory.example.com/quay.io
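The registryMirrors map keys public registry hosts to mirror prefixes. A minimal Go sketch of the rewrite this implies, assuming the mirror prefix simply replaces the registry host and the rest of the reference is kept (rewriteImage is a hypothetical helper, not the operator's code). One detail worth checking: Docker Hub image references use docker.io as their registry host (hub.docker.com is the web UI), so in this sketch a hub.docker.com key would never match, and unqualified references such as nginx:1.27 are left unchanged.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteImage (hypothetical) maps an image reference onto a private mirror
// when its registry host appears in mirrors; other references are returned
// unchanged.
func rewriteImage(image string, mirrors map[string]string) string {
	host, rest, found := strings.Cut(image, "/")
	// A first component without '.' or ':' (and not "localhost") is not a
	// registry host, e.g. "nginx:1.27" or "library/nginx" on Docker Hub.
	if !found || !(strings.ContainsAny(host, ".:") || host == "localhost") {
		return image
	}
	mirror, ok := mirrors[host]
	if !ok {
		return image
	}
	return strings.TrimSuffix(mirror, "/") + "/" + rest
}

func main() {
	mirrors := map[string]string{
		"ghcr.io": "artifactory.example.com/ghcr.io.docker",
		"quay.io": "artifactory.example.com/quay.io",
	}
	for _, img := range []string{
		"ghcr.io/mondoohq/mondoo-operator:latest", // example reference
		"quay.io/example/tool:1.0",
		"nginx:1.27",
	} {
		fmt.Println(img, "->", rewriteImage(img, mirrors))
	}
}
```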

@chris-rock (Member)

Agreed, the no-proxy config is super interesting, and I like the mirror setup. I need to see whether we actually want to deprecate the image config, since those are two different things. Let me play with the proxy setup a bit. Overall I find this a super cool feature @jpaodev. Big thank you for bringing in this great contribution.

@chris-rock
Copy link
Member

We just merged #1391 and we should have all parts of this PR integrated. Huge thank you @jpaodev. Next we are going to do a preview release for the new operator so that we can test the new features more easily in the wild.

@chris-rock (Member)

I am going to close this PR as all parts have been merged separately. Thank you @jpaodev

@chris-rock chris-rock closed this Feb 19, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Feb 19, 2026