Description
Describe the bug
Running clusteradm accept multiple times should succeed if the cluster is already accepted, but
it fails in some cases.
To Reproduce
- Build an OCM hub and 2 managed clusters using minikube VMs
- When both managed clusters are connected, stop one minikube VM
- Start the minikube VM and run clusteradm accept again - clusteradm accept fails with
drenv.commands.Error: Command failed:
command: ('clusteradm', 'accept', '--clusters', 'dr2', '--wait', '--context', 'hub')
exitcode: 1
error:
Error: context deadline exceeded
Running the command manually, we see that clusteradm enters an endless loop:
Joining cluster 'hub'
Please log onto the hub cluster and run the following command:
clusteradm accept --clusters dr2
Accepting cluster
no CSR to approve for cluster dr2
hubAcceptsClient already set for managed cluster dr2
Your managed cluster dr2 has joined the Hub successfully. Visit https://open-cluster-management.io/scenarios or https://github.com/open-cluster-management-io/OCM/tree/main/solutions for next steps.
no CSR to approve for cluster dr2
hubAcceptsClient already set for managed cluster dr2
Your managed cluster dr2 has joined the Hub successfully. Visit https://open-cluster-management.io/scenarios or https://github.com/open-cluster-management-io/OCM/tree/main/solutions for next steps.
no CSR to approve for cluster dr2
hubAcceptsClient already set for managed cluster dr2
...
Why run clusteradm again? We have automation that builds the minikube clusters, connects them with clusteradm, and installs many other components. The entire automation is idempotent, so any failure can be fixed by running it again against partly deployed clusters.
Expected behavior
If the managed cluster is already accepted, consider the operation successful.
Environment (i.e. OCM version, clusteradm version, Kubernetes version and provider):
$ clusteradm version
client version :v0.7.1
server release version :v1.27.4
default bundle version :0.12.0
$ clusteradm get hub-info --context hub
Registration Operator:
Controller: (1/1) quay.io/open-cluster-management/registration-operator:v0.12.0
CustomResourceDefinition:
(installed) clustermanagers.operator.open-cluster-management.io [*v1]
Components:
Registration:
Controller: (1/1) quay.io/open-cluster-management/registration:v0.12.0
Webhook: (1/1) quay.io/open-cluster-management/registration:v0.12.0
Work:
Webhook: (1/1) quay.io/open-cluster-management/work:v0.12.0
Placement:
Controller: (1/1) quay.io/open-cluster-management/placement:v0.12.0
CustomResourceDefinition:
(installed) managedclustersetbindings.cluster.open-cluster-management.io [*v1beta2]
(installed) placements.cluster.open-cluster-management.io [*v1beta1]
(installed) clustermanagementaddons.addon.open-cluster-management.io [*v1alpha1]
(installed) managedclusteraddons.addon.open-cluster-management.io [*v1alpha1]
(installed) managedclusters.cluster.open-cluster-management.io [*v1]
(installed) managedclustersets.cluster.open-cluster-management.io [*v1beta2]
(installed) manifestworkreplicasets.work.open-cluster-management.io [*v1alpha1]
(installed) manifestworks.work.open-cluster-management.io [*v1]
(installed) placementdecisions.cluster.open-cluster-management.io [*v1beta1]
(installed) addondeploymentconfigs.addon.open-cluster-management.io [*v1alpha1]
(installed) addonplacementscores.cluster.open-cluster-management.io [*v1alpha1]
(installed) addontemplates.addon.open-cluster-management.io [*v1alpha1]
Additional context
We can work around this by skipping the accept call if the managed cluster is already accepted:
RamenDR/ramen#1106
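The workaround amounts to checking spec.hubAcceptsClient on the ManagedCluster resource before calling clusteradm accept. A minimal sketch of such a guard (function names and the kubectl-based lookup are hypothetical illustrations, not the actual ramen code):

```python
import json
import subprocess

def hub_accepts_client(managedcluster: dict) -> bool:
    # spec.hubAcceptsClient is the field `clusteradm accept` sets on the hub;
    # it is absent until the cluster has been accepted.
    return bool(managedcluster.get("spec", {}).get("hubAcceptsClient"))

def accept_if_needed(cluster: str, context: str) -> None:
    # Fetch the ManagedCluster from the hub and only run `clusteradm accept`
    # when the cluster is not already accepted, keeping the step idempotent.
    out = subprocess.check_output(
        ["kubectl", "get", "managedcluster", cluster,
         "--context", context, "-o", "json"])
    if not hub_accepts_client(json.loads(out)):
        subprocess.check_call(
            ["clusteradm", "accept", "--clusters", cluster,
             "--wait", "--context", context])
```

With a guard like this, rerunning the automation against a partly deployed cluster skips the accept step instead of hanging in the polling loop shown above.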