
SecurityGroupReconciliationFailed when rules already exist on controlPlaneLoadBalancer sg #5438

@MadJlzz

Description

/kind bug

What steps did you take and what happened:

After deploying the new provider version, I am now seeing a SecurityGroupReconciliationFailed error on the Cluster object.

For reference, here's what the spec looks like (I've obfuscated IPs):

[...]
  controlPlaneLoadBalancer:
    crossZoneLoadBalancing: true
    ingressRules:
    - cidrBlocks:
      - x.y.z/32
      - a.b.c/28
      description: Allow Kubernetes API Server
      fromPort: 6443
      protocol: tcp
      toPort: 6443
    loadBalancerType: nlb
    scheme: internet-facing
[...]

From the release note:

If deploying clusters to an existing VPC (not managed by the AWS provider), the provider will no longer automatically create a security group rule allowing traffic from all addresses (0.0.0.0/0). You may need to update AWSCluster.spec.controlPlaneLoadBalancer.ingressRules with the source address of your Management Cluster. (#5198, @sl1pm4t)

I checked the security group, and the 0.0.0.0/0 rule that the controller had created has indeed disappeared. However, the CAPA controller logs still show some errors:

I0324 09:20:40.483534       1 awscluster_controller.go:315] "Reconciling AWSCluster" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="k004aws/k004aws" namespace="k004aws" name="k004aws" reconcileID="6b74eb7d-cc72-42e7-bec2-04629bc5b0c2" cluster="k004aws/k004aws"
I0324 09:20:40.761137       1 subnets.go:50] "Reconciling subnets" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="k004aws/k004aws" namespace="k004aws" name="k004aws" reconcileID="6b74eb7d-cc72-42e7-bec2-04629bc5b0c2" cluster="k004aws/k004aws"
E0324 09:20:42.326152       1 awscluster_controller.go:338] "failed to reconcile security groups" err=<
        failed to authorize security group "sg-010b8a82e641aacdf" ingress rules: [protocol=tcp/range=[6443-6443]/description=Kubernetes API protocol=tcp/range=[6443-6443]/description=Kubernetes API protocol=tcp/range=[6443-6443]/description=Kubernetes API protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server]: InvalidParameterValue: The same permission must not appear multiple times
                status code: 400, request id: 4afa5d93-ddce-4cb0-8a9b-01400fe9d5dd
 > controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="k012aws/k012aws" namespace="k012aws" name="k012aws" reconcileID="00812162-78a2-42f8-8782-3055db8835fa" cluster="k012aws/k012aws"
E0324 09:20:42.392246       1 controller.go:316] "Reconciler error" err=<
        failed to authorize security group "sg-010b8a82e641aacdf" ingress rules: [protocol=tcp/range=[6443-6443]/description=Kubernetes API protocol=tcp/range=[6443-6443]/description=Kubernetes API protocol=tcp/range=[6443-6443]/description=Kubernetes API protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server]: InvalidParameterValue: The same permission must not appear multiple times
                status code: 400, request id: 4afa5d93-ddce-4cb0-8a9b-01400fe9d5dd
 > controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="k012aws/k012aws" namespace="k012aws" name="k012aws" reconcileID="00812162-78a2-42f8-8782-3055db8835fa"
I0324 09:20:42.392859       1 awscluster_controller.go:315] "Reconciling AWSCluster" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="k012aws/k012aws" namespace="k012aws" name="k012aws" reconcileID="7127e58d-e9fe-42c2-8911-d742d9c1169f" cluster="k012aws/k012aws"
E0324 09:20:42.497349       1 awscluster_controller.go:338] "failed to reconcile security groups" err=<
        failed to authorize security group "sg-0727c05fce5ba0268" ingress rules: [protocol=tcp/range=[6443-6443]/description=Kubernetes API protocol=tcp/range=[6443-6443]/description=Kubernetes API protocol=tcp/range=[6443-6443]/description=Kubernetes API]: InvalidPermission.Duplicate: the specified rule "peer: x.y.x/32, TCP, from port: 6443, to port: 6443, ALLOW" already exists
                status code: 400, request id: 5832bbb4-3346-417d-badd-60422cbb2882
 > controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="k004aws/k004aws" namespace="k004aws" name="k004aws" reconcileID="6b74eb7d-cc72-42e7-bec2-04629bc5b0c2" cluster="k004aws/k004aws"
E0324 09:20:42.580191       1 controller.go:316] "Reconciler error" err=<
        failed to authorize security group "sg-0727c05fce5ba0268" ingress rules: [protocol=tcp/range=[6443-6443]/description=Kubernetes API protocol=tcp/range=[6443-6443]/description=Kubernetes API protocol=tcp/range=[6443-6443]/description=Kubernetes API]: InvalidPermission.Duplicate: the specified rule "peer: x.y.z/32, TCP, from port: 6443, to port: 6443, ALLOW" already exists
                status code: 400, request id: 5832bbb4-3346-417d-badd-60422cbb2882
 > controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="k004aws/k004aws" namespace="k004aws" name="k004aws" reconcileID="6b74eb7d-cc72-42e7-bec2-04629bc5b0c2"
I0324 09:20:42.581251       1 awscluster_controller.go:315] "Reconciling AWSCluster" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="k004aws/k004aws" namespace="k004aws" name="k004aws" reconcileID="58ed3047-4775-4d0e-ac82-fb00b915b1d6" cluster="k004aws/k004aws"
I0324 09:20:42.604500       1 subnets.go:50] "Reconciling subnets" controller="awscluster" controllerGroup="infrastructure.cluster.x-k8s.io" controllerKind="AWSCluster" AWSCluster="k012aws/k012aws" namespace="k012aws" name="k012aws" reconcileID="7127e58d-e9fe-42c2-8911-d742d9c1169f" cluster="k012aws/k012aws"
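The log contains two distinct AWS errors. `InvalidParameterValue: The same permission must not appear multiple times` appears to be returned when a single authorize request lists the same permission more than once. `InvalidPermission.Duplicate` is returned when the permission already exists on the group. The duplicate-rule message identifies a rule only by peer, protocol and port range. So, assuming the description is not part of a rule's identity, entries labelled "Kubernetes API" and "Allow Kubernetes API Server" can still collide. Below is a minimal Go sketch that collapses a request to unique permissions before sending it. `IngressRule`, `permissionKey` and `dedupeRules` are hypothetical names, not CAPA's actual types or functions, and each rule is assumed to carry a single CIDR.

```go
package main

import "fmt"

// IngressRule is a hypothetical, simplified ingress rule (not the CAPA API type).
// Each value describes exactly one CIDR source.
type IngressRule struct {
	Description string
	Protocol    string
	FromPort    int64
	ToPort      int64
	CIDR        string
}

// permissionKey identifies a rule the way the AWS duplicate error message does:
// protocol, port range and peer CIDR. The description is deliberately ignored.
type permissionKey struct {
	Protocol string
	FromPort int64
	ToPort   int64
	CIDR     string
}

func keyOf(r IngressRule) permissionKey {
	return permissionKey{r.Protocol, r.FromPort, r.ToPort, r.CIDR}
}

// dedupeRules keeps the first rule seen for each permission key, preserving order.
func dedupeRules(rules []IngressRule) []IngressRule {
	seen := make(map[permissionKey]bool, len(rules))
	out := make([]IngressRule, 0, len(rules))
	for _, r := range rules {
		k := keyOf(r)
		if seen[k] {
			continue // same permission already in this request
		}
		seen[k] = true
		out = append(out, r)
	}
	return out
}

func main() {
	rules := []IngressRule{
		{"Kubernetes API", "tcp", 6443, 6443, "x.y.z/32"},
		{"Allow Kubernetes API Server", "tcp", 6443, 6443, "x.y.z/32"},
		{"Allow Kubernetes API Server", "tcp", 6443, 6443, "a.b.c/28"},
	}
	for _, r := range dedupeRules(rules) {
		fmt.Printf("%s %d-%d %s (%s)\n", r.Protocol, r.FromPort, r.ToPort, r.CIDR, r.Description)
	}
}
```

With this example data, the sketch keeps the first `x.y.z/32` entry and the `a.b.c/28` entry. It drops the second `x.y.z/32` rule even though that rule's description differs.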

As for events:

  Normal   SuccessfulSetVPCAttributes                2m29s (x50 over 3m57s)  aws-controller  Set managed VPC attributes for "vpc-07f14af82e3655951"
  Warning  FailedAuthorizeSecurityGroupIngressRules  2m27s (x41 over 3m43s)  aws-controller  (combined from similar events): Failed to authorize security group ingress rules [protocol=tcp/range=[6443-6443]/description=Kubernetes API protocol=tcp/range=[6443-6443]/description=Kubernetes API protocol=tcp/range=[6443-6443]/description=Kubernetes API protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server protocol=tcp/range=[6443-6443]/description=Allow Kubernetes API Server] for SecurityGroup "sg-010b8a82e641aacdf": InvalidParameterValue: The same permission must not appear multiple times
           status code: 400, request id: 2a74e42d-9000-4cdb-b405-fff647b77321

Those rules were originally created by the controller itself, so I suspected it might not check whether they already exist. However, after I deleted them manually, I was surprised to find that they were not recreated at all. I wasn't able to capture any more logs for that behavior.

What did you expect to happen:

During security group reconciliation, the controller should check whether the rules already exist before trying to create them, so that reconciliation does not fail with SecurityGroupReconciliationFailed.
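Below is a minimal Go sketch of that check, written under simplifying assumptions. `Rule` and `rulesToAuthorize` are hypothetical names, not CAPA's real code. Each rule holds one CIDR, and a rule is identified by protocol, port range and CIDR, ignoring the description. The function skips desired rules that already exist on the security group and drops repeats within the desired list. Assuming the list of existing rules is current, the resulting request should therefore avoid both duplicate-related errors seen in the logs.

```go
package main

import "fmt"

// Rule is a hypothetical, simplified ingress rule with a single CIDR.
// It is not the CAPA API type.
type Rule struct {
	Description string
	Protocol    string
	FromPort    int64
	ToPort      int64
	CIDR        string
}

// key ignores Description, mirroring how the AWS duplicate error
// identifies a rule by peer, protocol and port range.
type key struct {
	Protocol         string
	FromPort, ToPort int64
	CIDR             string
}

func keyOf(r Rule) key { return key{r.Protocol, r.FromPort, r.ToPort, r.CIDR} }

// rulesToAuthorize returns the desired rules that are neither already present
// on the security group nor repeated earlier in the desired list.
func rulesToAuthorize(desired, existing []Rule) []Rule {
	present := make(map[key]bool, len(existing)+len(desired))
	for _, r := range existing {
		present[keyOf(r)] = true
	}
	var out []Rule
	for _, r := range desired {
		k := keyOf(r)
		if present[k] {
			continue // already on the group, or already queued in this request
		}
		present[k] = true
		out = append(out, r)
	}
	return out
}

func main() {
	existing := []Rule{{"Kubernetes API", "tcp", 6443, 6443, "x.y.z/32"}}
	desired := []Rule{
		{"Allow Kubernetes API Server", "tcp", 6443, 6443, "x.y.z/32"},
		{"Allow Kubernetes API Server", "tcp", 6443, 6443, "a.b.c/28"},
		{"Allow Kubernetes API Server", "tcp", 6443, 6443, "a.b.c/28"},
	}
	for _, r := range rulesToAuthorize(desired, existing) {
		fmt.Println(r.Protocol, r.FromPort, r.ToPort, r.CIDR)
	}
}
```

With this example data, only the `a.b.c/28` rule is left to authorize, because `x.y.z/32` already exists and the second `a.b.c/28` entry is a repeat. A rule that was deleted manually would no longer appear in `existing`, so this approach would recreate it.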

Environment:

  • Cluster-api-provider-aws version: v2.8.1
  • Kubernetes version: v1.31.6-gke.1020000
  • OS (e.g. from /etc/os-release):

Metadata

Assignees

No one assigned

    Labels

      • kind/bug: Categorizes issue or PR as related to a bug.
      • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
      • needs-priority
      • needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
