Make clusteradm accept idempotent #395

@nirs

Describe the bug

Running clusteradm accept multiple times should succeed if the cluster is already accepted, but
it fails in some cases.

To Reproduce

  1. Build an OCM hub and 2 managed clusters using minikube VMs
  2. When both managed clusters are connected, stop one minikube VM
  3. Start the minikube VM and run clusteradm accept ... again
  4. clusteradm accept fails with:
      drenv.commands.Error: Command failed:
         command: ('clusteradm', 'accept', '--clusters', 'dr2', '--wait', '--context', 'hub')
         exitcode: 1
         error:
            Error: context deadline exceeded

Running the command manually shows that clusteradm is stuck in an endless loop:

Joining cluster 'hub'
Please log onto the hub cluster and run the following command:

    clusteradm accept --clusters dr2

Accepting cluster
no CSR to approve for cluster dr2
hubAcceptsClient already set for managed cluster dr2

 Your managed cluster dr2 has joined the Hub successfully. Visit https://open-cluster-management.io/scenarios or https://github.com/open-cluster-management-io/OCM/tree/main/solutions for next steps.
no CSR to approve for cluster dr2
hubAcceptsClient already set for managed cluster dr2

 Your managed cluster dr2 has joined the Hub successfully. Visit https://open-cluster-management.io/scenarios or https://github.com/open-cluster-management-io/OCM/tree/main/solutions for next steps.
no CSR to approve for cluster dr2
hubAcceptsClient already set for managed cluster dr2

...

Why run clusteradm again? We have automation that builds the minikube clusters, connects them with clusteradm, and installs many other components. The entire automation is idempotent, so any failure can be fixed by running it again on the partly deployed clusters.

Expected behavior
If the managed cluster is already accepted, consider the operation successful.
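
To make the expected behavior concrete, here is a minimal Python sketch of a wait loop that treats "already accepted" as success. It is not clusteradm's actual code, and the real cause of the loop is not shown in this report. The function name and the two callbacks (`approve_pending_csr`, `hub_accepts_client`) are hypothetical stand-ins for the steps seen in the output above.

```python
import time


def wait_until_accepted(approve_pending_csr, hub_accepts_client,
                        timeout=60.0, interval=1.0,
                        clock=time.monotonic, sleep=time.sleep):
    """Hypothetical wait loop: return once the hub accepts the cluster.

    approve_pending_csr: callable that approves a pending CSR, if any.
        Finding no CSR is not an error (the cluster may already be joined).
    hub_accepts_client: callable returning True when hubAcceptsClient is
        set, whether it was set by this run or by an earlier one.
    Returns the number of attempts made.
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        approve_pending_csr()
        # Already accepted counts as success, so a repeated run ends here.
        if hub_accepts_client():
            return attempts
        if clock() >= deadline:
            raise TimeoutError("managed cluster was not accepted in time")
        sleep(interval)
```

With this shape, a second run against an accepted cluster returns after a single attempt instead of looping until the deadline.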

Environment ie: OCM version, clusteradm version, Kubernetes version and provider:

$ clusteradm version
client		version	:v0.7.1
server release	version	:v1.27.4
default bundle	version	:0.12.0

$ clusteradm get hub-info --context hub
Registration Operator:
  Controller:	(1/1) quay.io/open-cluster-management/registration-operator:v0.12.0
  CustomResourceDefinition:
    (installed) clustermanagers.operator.open-cluster-management.io [*v1]
Components:
  Registration:
    Controller:	(1/1) quay.io/open-cluster-management/registration:v0.12.0
    Webhook:	(1/1) quay.io/open-cluster-management/registration:v0.12.0
  Work:
    Webhook:	(1/1) quay.io/open-cluster-management/work:v0.12.0
  Placement:
    Controller:	(1/1) quay.io/open-cluster-management/placement:v0.12.0
  CustomResourceDefinition:
    (installed) managedclustersetbindings.cluster.open-cluster-management.io [*v1beta2]
    (installed) placements.cluster.open-cluster-management.io [*v1beta1]
    (installed) clustermanagementaddons.addon.open-cluster-management.io [*v1alpha1]
    (installed) managedclusteraddons.addon.open-cluster-management.io [*v1alpha1]
    (installed) managedclusters.cluster.open-cluster-management.io [*v1]
    (installed) managedclustersets.cluster.open-cluster-management.io [*v1beta2]
    (installed) manifestworkreplicasets.work.open-cluster-management.io [*v1alpha1]
    (installed) manifestworks.work.open-cluster-management.io [*v1]
    (installed) placementdecisions.cluster.open-cluster-management.io [*v1beta1]
    (installed) addondeploymentconfigs.addon.open-cluster-management.io [*v1alpha1]
    (installed) addonplacementscores.cluster.open-cluster-management.io [*v1alpha1]
    (installed) addontemplates.addon.open-cluster-management.io [*v1alpha1]

Additional context

We can work around this by skipping the accept call if the managed cluster is already accepted:
RamenDR/ramen#1106
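
A minimal sketch of this kind of workaround in Python, not the code from the linked change: it reads `spec.hubAcceptsClient` of the ManagedCluster on the hub with kubectl and calls `clusteradm accept` only when the value is not `true`. The function names and the injectable `run` parameter are assumptions made for illustration. The clusteradm command line is the one from the error above.

```python
import subprocess


def is_accepted(cluster, hub_context, run=subprocess.run):
    """Return True if the hub already accepts the managed cluster."""
    cmd = [
        "kubectl", "get", "managedcluster", cluster,
        "--context", hub_context,
        "--output", "jsonpath={.spec.hubAcceptsClient}",
    ]
    result = run(cmd, capture_output=True, text=True)
    # A missing ManagedCluster makes kubectl fail; treat it as not accepted.
    return result.returncode == 0 and result.stdout.strip() == "true"


def accept_if_needed(cluster, hub_context, run=subprocess.run):
    """Run clusteradm accept unless the cluster is already accepted.

    Returns True if clusteradm was invoked, False if it was skipped.
    """
    if is_accepted(cluster, hub_context, run=run):
        return False
    cmd = [
        "clusteradm", "accept",
        "--clusters", cluster,
        "--wait",
        "--context", hub_context,
    ]
    run(cmd, check=True)
    return True
```

Calling `accept_if_needed("dr2", "hub")` after restarting the minikube VM would skip the accept step when dr2 is already accepted, so the automation never enters the loop.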

Labels: bug