Incorrect Service Selector in Azure Infrastructure Provider Causes Endpoint Overlap in Cluster API Operator, when not using separate namespaces for providers #5952

@sivarama-p-raju

/kind bug

[Before submitting an issue, have you checked the Troubleshooting Guide self-managed & managed?]

Yes

What steps did you take and what happened:

I installed the Azure infrastructure provider as part of the cluster-api-operator Helm chart on my management cluster. During installation, the operator fetched manifests from cluster-api-provider-azure.

I am not using a separate namespace for each component; all providers and the cluster-api-operator share a single namespace.

The following two Services, created as part of the Azure Service Operator:

azureserviceoperator-controller-manager-metrics-service
azureserviceoperator-webhook-service

...use the following selector:

selector:
  control-plane: controller-manager

This label is also present on other pods in the namespace, such as:

capi-controller-manager
capi-ipam-in-cluster-controller-manager
capi-kubeadm-bootstrap-controller-manager
capi-kubeadm-control-plane-controller-manager
capx-controller-manager
cluster-api-operator

As a result, these unrelated pods are incorrectly added as endpoints to the Azure Service Operator services.

What did you expect to happen:

The services should use a more specific label selector to target only the Azure Service Operator pods.

For example:

selector:
  app.kubernetes.io/name: azure-service-operator

This label is already present on the relevant pods and would correctly scope the services to the intended endpoints.
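
The difference between the two selectors can be illustrated with a minimal sketch of equality-based selector matching: a Service selects every pod in its namespace whose labels contain all of the selector's key/value pairs. The label sets below are reduced to the two labels mentioned in this issue (real pods carry more), only a few of the listed workloads are included, and the name `azureserviceoperator-controller-manager` for the Azure Service Operator pod is an assumption made for illustration.

```python
def selects(selector: dict, labels: dict) -> bool:
    """Equality-based matching: every selector pair must be present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Labels reduced to the ones discussed in this issue; real pods carry more.
# The Azure Service Operator pod name is assumed for illustration.
pods = {
    "capi-controller-manager": {"control-plane": "controller-manager"},
    "capx-controller-manager": {"control-plane": "controller-manager"},
    "cluster-api-operator": {"control-plane": "controller-manager"},
    "azureserviceoperator-controller-manager": {
        "control-plane": "controller-manager",
        "app.kubernetes.io/name": "azure-service-operator",
    },
}


def endpoints_for(selector: dict) -> list:
    """Return the names of all pods in the shared namespace that the selector matches."""
    return sorted(name for name, labels in pods.items() if selects(selector, labels))


current = endpoints_for({"control-plane": "controller-manager"})
proposed = endpoints_for({"app.kubernetes.io/name": "azure-service-operator"})

print("current selector matches: ", current)
print("proposed selector matches:", proposed)
```

With the shared `control-plane` label, every pod in the sketch is selected; with the `app.kubernetes.io/name` label, only the Azure Service Operator pod is.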

Anything else you would like to add:

As a workaround, I am patching these Services with the following in my values file:

infrastructure:
  azure:
    manifestPatches:
      - |
        apiVersion: v1
        kind: Service
        metadata:
          name: azureserviceoperator-controller-manager-metrics-service
          namespace: capi-providers
        spec:
          selector:
            app.kubernetes.io/name: azure-service-operator
      - |
        apiVersion: v1
        kind: Service
        metadata:
          name: azureserviceoperator-webhook-service
          namespace: capi-providers
        spec:
          selector:
            app.kubernetes.io/name: azure-service-operator

This resolves the issue locally, but I believe the upstream manifests should be corrected to avoid misrouting and unintended service behavior.
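
One way to confirm that the patch took effect is to inspect the Endpoints object of each Service and look for target pods that do not belong to the Azure Service Operator. The sketch below is a rough check, not an official tool: it assumes the Endpoints JSON was exported beforehand (for example with `kubectl get endpoints azureserviceoperator-webhook-service -n capi-providers -o json`), and it assumes the Azure Service Operator pods share the name prefix `azureserviceoperator-controller-manager`. The sample data, including pod name suffixes and IP addresses, is made up for illustration.

```python
import json


def unexpected_targets(endpoints: dict, expected_prefix: str) -> list:
    """List pod names behind an Endpoints object that lack the expected name prefix."""
    names = []
    for subset in endpoints.get("subsets") or []:
        for address in subset.get("addresses") or []:
            target = address.get("targetRef") or {}
            name = target.get("name", "")
            if target.get("kind") == "Pod" and not name.startswith(expected_prefix):
                names.append(name)
    return sorted(names)


# Illustrative sample only: pod suffixes and IPs are invented, not taken from a cluster.
sample = json.loads("""
{
  "kind": "Endpoints",
  "metadata": {"name": "azureserviceoperator-webhook-service", "namespace": "capi-providers"},
  "subsets": [
    {
      "addresses": [
        {"ip": "10.0.0.11", "targetRef": {"kind": "Pod", "name": "azureserviceoperator-controller-manager-aaaaa"}},
        {"ip": "10.0.0.12", "targetRef": {"kind": "Pod", "name": "capi-controller-manager-bbbbb"}},
        {"ip": "10.0.0.13", "targetRef": {"kind": "Pod", "name": "cluster-api-operator-ccccc"}}
      ]
    }
  ]
}
""")

# Assumed prefix for Azure Service Operator pods.
stray = unexpected_targets(sample, "azureserviceoperator-controller-manager")
print(stray)
```

An empty list after applying the patch would indicate that only Azure Service Operator pods remain behind the Service. The sketch only looks at ready addresses; pods listed under `notReadyAddresses` would need the same check.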

Environment:

  • cluster-api-provider-azure version: v1.21.1 (Azure infrastructure provider)
  • cluster-api-operator version: 0.24.0
  • Kubernetes version: (use kubectl version): v1.32.9
  • OS (e.g. from /etc/os-release): Ubuntu 22.04
