
AKS: node bootstrapping problems with network isolation #162

@craddm

When the isolated cluster is fully locked down by NSGs, node pools eventually start to fail.

They cannot receive automatic updates, and they cannot bootstrap nodes when the pool needs to scale out or when nodes need to be restarted.

The node bootstrapping process cannot be redirected to Harbor: it is hardcoded to pull from a Microsoft source.

Unfortunately, that Microsoft source is not covered by any of the Network Security Group (NSG) service tags, so there is no reliable way to allow only that traffic out using NSGs. NSG rules can only target service tags or IP addresses, and the IP address behind the Microsoft endpoint (packages.aks.azure.com) is not guaranteed to be stable.
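
To illustrate why a hand-maintained IP rule is fragile, here is a minimal sketch that compares fresh lookup results against the prefixes already written into an NSG rule and reports any address the rule would block. The function names are hypothetical, and the example addresses come from RFC 5737 documentation ranges rather than real Microsoft IPs; `resolve` needs network access and is not called in the example.

```python
import ipaddress
import socket
from typing import Iterable


def resolve(host: str) -> list[str]:
    """Current A/AAAA records for `host` (needs network access; not called below)."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def uncovered_addresses(resolved: Iterable[str], allowed_prefixes: Iterable[str]) -> list[str]:
    """Return the resolved addresses that no allowed destination prefix contains."""
    networks = [ipaddress.ip_network(p, strict=False) for p in allowed_prefixes]
    missing = []
    for addr in resolved:
        ip = ipaddress.ip_address(addr)
        # Mixed IPv4/IPv6 membership tests return False, so those count as uncovered.
        if not any(ip in net for net in networks):
            missing.append(addr)
    return missing


# Hypothetical data: the NSG rule was written against an earlier lookup, and the
# endpoint has since started resolving to an address outside that prefix.
rule_prefixes = ["192.0.2.0/28"]
latest_lookup = ["192.0.2.5", "198.51.100.7"]
print(uncovered_addresses(latest_lookup, rule_prefixes))  # ['198.51.100.7']
```

Running a check like this on a schedule (e.g. `uncovered_addresses(resolve("packages.aks.azure.com"), rule_prefixes)`) would only detect drift after the fact; it cannot make the rule itself stable.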

On the current development branch, the only workaround is to give the nodes in the isolated cluster unrestricted outbound internet access on port 443. This is not ideal, but given all the other constraints on user actions, it would still be very difficult to exploit.
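
A rough way to see why this workaround is so broad: NSG rules are evaluated in priority order (lowest number first) and processing stops at the first match, and a rule can only key on a destination tag or prefix, not a hostname. The sketch below models that first-match evaluation with a simplified rule type; the rule names and priorities are hypothetical, and real NSG rules also match on protocol, source, and port ranges.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    name: str
    priority: int          # NSG semantics: lower number is evaluated first
    destination: str       # simplified: a service-tag label such as "Internet", or "*"
    port: Optional[int]    # None means any port
    allow: bool


def first_match(rules: list[Rule], destination: str, port: int) -> Optional[Rule]:
    """Return the first rule, in priority order, that matches the flow."""
    for rule in sorted(rules, key=lambda r: r.priority):
        dest_ok = rule.destination in ("*", destination)
        port_ok = rule.port is None or rule.port == port
        if dest_ok and port_ok:
            return rule  # processing stops at the first match
    return None


# Hypothetical rule set for the dev-branch workaround.
rules = [
    Rule("AllowHttpsToInternet", 100, "Internet", 443, True),
    Rule("DenyAllToInternet", 4000, "Internet", None, False),
]

# Both flows look like "Internet:443" to the NSG, whatever the hostname was.
for host in ("packages.aks.azure.com", "any-other-site.example"):
    verdict = first_match(rules, "Internet", 443)
    print(host, "->", verdict.name if verdict else "no match")
```

The hostname never reaches `first_match`, which is the point: once port 443 to `Internet` is allowed, every HTTPS destination is allowed.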

The alternative is to put a firewall in place and let it perform FQDN filtering, so that outgoing traffic is allowed only to the appropriate Microsoft endpoint. That requires quite a bit more configuration (user-defined routes, etc.), but should be workable.
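
The core of FQDN filtering is matching the requested hostname against an allowlist instead of an IP. The sketch below shows that matching logic under simple assumptions (exact entries, plus `*.suffix` wildcards that match one or more extra labels); it is an illustration of the idea, not a description of how Azure Firewall evaluates its application rules.

```python
def fqdn_allowed(host: str, allowlist: list[str]) -> bool:
    """Return True if `host` matches an allowlist entry.

    Exact entries must match the whole name. Entries of the form "*.suffix"
    match any name ending in ".suffix" but not the bare suffix itself.
    Matching is case-insensitive and ignores a trailing dot.
    """
    host = host.lower().rstrip(".")
    for entry in allowlist:
        entry = entry.lower().rstrip(".")
        if entry.startswith("*."):
            suffix = entry[1:]  # ".suffix", keeping the leading dot
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
        elif host == entry:
            return True
    return False


allowlist = ["packages.aks.azure.com"]
print(fqdn_allowed("packages.aks.azure.com", allowlist))                # True
print(fqdn_allowed("PACKAGES.aks.azure.com.", allowlist))               # True
print(fqdn_allowed("packages.aks.azure.com.attacker.test", allowlist))  # False
print(fqdn_allowed("example.com", allowlist))                           # False
```

Exact matching on the full name matters: a naive substring or prefix check would let the lookalike `packages.aks.azure.com.attacker.test` through.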

Another possibility is to use network isolated AKS clusters, which are isolated natively and have no outbound internet access. For these, you can configure an AKS-managed Azure Container Registry that the nodes can reach even without outbound internet access, which allows node bootstrapping to proceed.

That causes a different problem, though: there is no way to configure containerd to pull images from Harbor, and the public registries are all cut off. You can't even pull an image (like busybox) to let you change the containerd config.
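
Why an unqualified name like `busybox` cannot work here: container runtimes expand short names to Docker Hub. The sketch below is a simplified version of that normalization (the first path component counts as a registry host only if it contains `.` or `:` or is `localhost`; otherwise the registry is `docker.io` and single-component names gain `library/`). Digests are not handled, and the Harbor hostname and project name are hypothetical.

```python
def normalize_image_ref(ref: str) -> tuple[str, str, str]:
    """Split an image reference into (registry, repository, tag).

    Simplified: the first path component is a registry host only if it
    contains "." or ":" or is "localhost"; otherwise the registry defaults
    to docker.io, and single-component names gain the "library/" prefix.
    Digest references ("@sha256:...") are not handled.
    """
    name, tag = ref, "latest"
    colon = ref.rfind(":")
    if colon > ref.rfind("/"):  # a ":" after the last "/" introduces the tag
        name, tag = ref[:colon], ref[colon + 1:]
    first, slash, rest = name.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        return first, rest, tag
    repo = name if "/" in name else f"library/{name}"
    return "docker.io", repo, tag


print(normalize_image_ref("busybox"))
# ('docker.io', 'library/busybox', 'latest')  -> public registry, cut off
print(normalize_image_ref("harbor.example.com/dockerhub/library/busybox:1.36"))
# ('harbor.example.com', 'dockerhub/library/busybox', '1.36')  -> Harbor
```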

You could potentially use a custom Azure Container Registry instead (though at that point, do we really need Harbor?).

What I might be able to do is specify directly in the image name that the image used to modify containerd comes from Harbor. On dev this is still tricky, because Harbor is not using a trusted TLS certificate, so the node won't pull from it. We would need to either configure containerd to skip verification, which we can't do, or extract the certificate and add it to a trust bundle. This is probably fine with production certificates.
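
For reference, the containerd change such a pod would make is a registry host file at `/etc/containerd/certs.d/<registry>/hosts.toml`, which containerd reads when its CRI registry `config_path` points at `/etc/containerd/certs.d`. The sketch below renders that file with either a CA bundle (`ca`) or `skip_verify = true`; `server`, `capabilities`, `ca`, `skip_verify`, and `override_path` are documented containerd fields. Whether AKS node images set `config_path`, and the Harbor host, project, and CA path used here, are assumptions.

```python
from typing import Optional


def render_hosts_toml(
    upstream: str,
    mirror: str,
    ca_path: Optional[str] = None,
    override_path: bool = False,
) -> str:
    """Render a containerd hosts.toml that sends pulls for `upstream` to `mirror`.

    With `ca_path`, containerd verifies the mirror's TLS certificate against that
    CA file; without it, this sketch falls back to `skip_verify = true`, which
    disables certificate verification for the mirror (dev only).
    """
    lines = [
        f'server = "{upstream}"',
        "",
        f'[host."{mirror}"]',
        '  capabilities = ["pull", "resolve"]',
    ]
    if ca_path:
        lines.append(f'  ca = "{ca_path}"')
    else:
        lines.append("  skip_verify = true")
    if override_path:
        # The mirror URL already contains the API path (e.g. a Harbor proxy-cache project).
        lines.append("  override_path = true")
    return "\n".join(lines) + "\n"


# Hypothetical Harbor host and proxy-cache project name.
print(render_hosts_toml(
    "https://registry-1.docker.io",
    "https://harbor.example.com/v2/dockerhub",
    ca_path="/etc/containerd/certs.d/docker.io/harbor-ca.crt",
    override_path=True,
))
```

Preferring `ca` over `skip_verify` keeps verification on, which matches the trust-bundle route above; with a publicly trusted production certificate, neither field should be needed.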
