Skip to content

Conversation

camilamacedo86
Copy link
Contributor

@camilamacedo86 camilamacedo86 commented Nov 6, 2024

BY running make kind-deploy against a kind cluster or installing the released script from https://operator-framework.github.io/operator-controller/getting-started/olmv1_getting_started/ the following error is faced:

...
deployment.apps/cert-manager-webhook condition met
deployment.apps/cert-manager-cainjector condition met
deployment.apps/cert-manager condition met
error: jsonpath wait format must be --for=jsonpath='{.status.readyReplicas}'=3

This PR fixes an issue with kubectl wait when used to check the caBundle field in mutatingwebhookconfigurations and validatingwebhookconfigurations.

Solution

This PR introduces the kubectl_wait_for_query function, which replaces kubectl wait for this specific use case. The function repeatedly checks the caBundle field by using kubectl get in a loop, ensuring that the caBundle is populated without relying on status-based conditions. This approach provides a more flexible solution compatible with webhook configurations, bypassing the limitations of kubectl wait.

After the fix:

$ make kind-deploy
/Users/camiladeomacedo/go/bin/controller-gen-v0.16.1 rbac:roleName=manager-role crd webhook paths="./..." output:crd:artifacts:config=config/base/crd/bases output:rbac:artifacts:config=config/base/rbac
/Users/camiladeomacedo/go/bin/kustomize-v4.5.7 build config/overlays/cert-manager > operator-controller.yaml
envsubst '$CATALOGD_VERSION,$CERT_MGR_VERSION,$INSTALL_DEFAULT_CATALOGS,$MANIFEST' < scripts/install.tpl.sh | bash -s
namespace/cert-manager created
customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
serviceaccount/cert-manager-cainjector created
serviceaccount/cert-manager created
serviceaccount/cert-manager-webhook created
clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrole.rbac.authorization.k8s.io/cert-manager-cluster-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-view created
clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
clusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
role.rbac.authorization.k8s.io/cert-manager:leaderelection created
role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
service/cert-manager created
service/cert-manager-webhook created
deployment.apps/cert-manager-cainjector created
deployment.apps/cert-manager created
deployment.apps/cert-manager-webhook created
mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
deployment.apps/cert-manager-webhook condition met
deployment.apps/cert-manager-cainjector condition met
deployment.apps/cert-manager condition met
mutatingwebhookconfigurations/cert-manager-webhook has {.webhooks[0].clientConfig.caBundle}.
validatingwebhookconfigurations/cert-manager-webhook has {.webhooks[0].clientConfig.caBundle}.
namespace/olmv1-system created
customresourcedefinition.apiextensions.k8s.io/clustercatalogs.olm.operatorframework.io created
serviceaccount/catalogd-controller-manager created
role.rbac.authorization.k8s.io/catalogd-leader-election-role created
clusterrole.rbac.authorization.k8s.io/catalogd-manager-role created
clusterrole.rbac.authorization.k8s.io/catalogd-metrics-reader created
clusterrole.rbac.authorization.k8s.io/catalogd-proxy-role created
rolebinding.rbac.authorization.k8s.io/catalogd-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/catalogd-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/catalogd-proxy-rolebinding created
service/catalogd-service created
deployment.apps/catalogd-controller-manager created
certificate.cert-manager.io/olmv1-ca created
certificate.cert-manager.io/catalogd-service-cert created
clusterissuer.cert-manager.io/olmv1-ca created
issuer.cert-manager.io/self-sign-issuer created
mutatingwebhookconfiguration.admissionregistration.k8s.io/catalogd-mutating-webhook-configuration created
Waiting for deployment "catalogd-controller-manager" rollout to finish: 0 of 1 updated replicas are available...
Waiting for deployment "catalogd-controller-manager" rollout to finish: 0 of 1 updated replicas are available...
deployment "catalogd-controller-manager" successfully rolled out
deployment.apps/catalogd-controller-manager condition met
clustercatalog.olm.operatorframework.io/operatorhubio created
clustercatalog.olm.operatorframework.io/operatorhubio condition met
namespace/olmv1-system configured
customresourcedefinition.apiextensions.k8s.io/clusterextensions.olm.operatorframework.io created
serviceaccount/operator-controller-controller-manager created
role.rbac.authorization.k8s.io/operator-controller-leader-election-role created
role.rbac.authorization.k8s.io/operator-controller-manager-role created
clusterrole.rbac.authorization.k8s.io/operator-controller-clusterextension-editor-role created
clusterrole.rbac.authorization.k8s.io/operator-controller-clusterextension-viewer-role created
clusterrole.rbac.authorization.k8s.io/operator-controller-extension-editor-role created
clusterrole.rbac.authorization.k8s.io/operator-controller-extension-viewer-role created
clusterrole.rbac.authorization.k8s.io/operator-controller-manager-role created
clusterrole.rbac.authorization.k8s.io/operator-controller-metrics-reader created
clusterrole.rbac.authorization.k8s.io/operator-controller-proxy-role created
rolebinding.rbac.authorization.k8s.io/operator-controller-leader-election-rolebinding created
rolebinding.rbac.authorization.k8s.io/operator-controller-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/operator-controller-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/operator-controller-proxy-rolebinding created
service/operator-controller-controller-manager-metrics-service created
deployment.apps/operator-controller-controller-manager created
certificate.cert-manager.io/olmv1-ca configured
certificate.cert-manager.io/olmv1-cert created
clusterissuer.cert-manager.io/olmv1-ca unchanged
issuer.cert-manager.io/self-sign-issuer unchanged
deployment.apps/operator-controller-controller-manager condition met

@camilamacedo86 camilamacedo86 requested a review from a team as a code owner November 6, 2024 19:15
Copy link

netlify bot commented Nov 6, 2024

Deploy Preview for olmv1 ready!

Name Link
🔨 Latest commit 9522c30
🔍 Latest deploy log https://app.netlify.com/sites/olmv1/deploys/672cac3cf1637d0008f56ed2
😎 Deploy Preview https://deploy-preview-1429--olmv1.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

codecov bot commented Nov 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.22%. Comparing base (481bd5d) to head (9522c30).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1429      +/-   ##
==========================================
- Coverage   73.42%   73.22%   -0.20%     
==========================================
  Files          42       42              
  Lines        3063     3063              
==========================================
- Hits         2249     2243       -6     
- Misses        641      645       +4     
- Partials      173      175       +2     
Flag Coverage Δ
e2e 55.20% <ø> (-0.20%) ⬇️
unit 52.56% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@camilamacedo86 camilamacedo86 force-pushed the fix-install-script branch 2 times, most recently from efd97f8 to e7036fc Compare November 6, 2024 19:41
@camilamacedo86 camilamacedo86 changed the title 🐛 fix: add kubectl_wait_for_caBundle to handle caBundle readiness checks 🐛 fix: resolve JSONPath error in caBundle readiness check by adding kubectl_wait_for_caBundle function Nov 6, 2024
@camilamacedo86 camilamacedo86 changed the title 🐛 fix: resolve JSONPath error in caBundle readiness check by adding kubectl_wait_for_caBundle function 🐛 fix: resolve JSONPath error in caBundle readiness check by adding kubectl_wait_for_query function Nov 6, 2024
Comment on lines 44 to 70
function kubectl_wait_for_query() {
manifest=$1
query=$2
timeout=$3
start_time=$(date +%s)
while true; do
val=$(kubectl get "${manifest}" -o jsonpath="${query}" 2>/dev/null || echo "")
if [[ -n "${val}" ]]; then
echo "${manifest} has ${query}."
break
fi
if [[ $(( $(date +%s) - start_time )) -ge ${timeout} ]]; then
echo "Timed out waiting for ${manifest} to have ${query}."
exit 1
fi
sleep 5
done
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this thing be replaced with something along the lines of:

kubectl wait validatingwebhookconfigurations/cert-manager-webhook --for=jsonpath='{.webhooks[0].clientConfig.caBundle}' --timeout=60s

Copy link
Contributor Author

@camilamacedo86 camilamacedo86 Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your suggestion is the current approach, but unfortunately, it doesn’t work for our use case. ;-) The kubectl wait command doesn’t handle this situation well.

You can easily reproduce the issue by:

My Assumption here is: The issue seems to arise because kubectl wait expects JSONPath conditions related to .status fields (such as readyReplicas), which are applicable for deployments but not for webhook configurations. Since mutatingwebhookconfigurations and validatingwebhookconfigurations lack these .status fields, kubectl wait may fail when it tries to monitor non-existent replica-related conditions. Additionally, it’s possible this issue is linked to cert-manager’s recommendation to use 3 replicas in production environments since we see error: jsonpath wait format must be --for=jsonpath='{.status.readyReplicas}'=3, which may indirectly affect how readiness is handled.

echo "Timed out waiting for ${manifest} to have ${query}."
exit 1
fi
sleep 5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good change assuming it helps things be less flaky with timing in practice, but

what's the 5 magic number? I think 5s?

  1. I think this should be a variable with a good name: poll_interval_in_seconds?;
  2. But then you are delaying the polling for 5s? Could be a fair amount of extra wait for no benefit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied your suggestion and set the polling interval to 10 seconds. With a 60-second timeout, it wouldn’t make sense to use a longer interval.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would even bring back the default to 5. 10 feels like an eternity XD

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @perdasilva done !!!
It was changed for 5s.
+1 I agree

Copy link
Contributor

@perdasilva perdasilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks pretty good (and awesome that we can remove those false starts!). Just wondering if we don't want to reduce the poll interval to 3 or 5 seconds or so. Just not to wait needlessly. Though, I haven't tried this myself, and maybe it take > 10 seconds anyway...

@camilamacedo86
Copy link
Contributor Author

Hi @perdasilva

I think 5s make more sense.
Why should we delay so much when the timeout is 60s?

So, I changed it to 5s.

…ectl_wait_for_query function

By running `make kind-deploy` against a kind cluster  or installing the released script from https://operator-framework.github.io/operator-controller/getting-started/olmv1_getting_started/ the following error is faced:

```sh
...
deployment.apps/cert-manager-webhook condition met
deployment.apps/cert-manager-cainjector condition met
deployment.apps/cert-manager condition met
error: jsonpath wait format must be --for=jsonpath='{.status.readyReplicas}'=3
```

This PR fixes an issue with kubectl wait when used to check the caBundle field in `mutatingwebhookconfigurations` and `validatingwebhookconfigurations`.

This PR introduces the `kubectl_wait_for_query` function, which replaces `kubectl wait` for this specific use case. The function repeatedly checks the `caBundle` field by using `kubectl get` in a loop, ensuring that the `caBundle` is populated without relying on status-based conditions. This approach provides a more flexible solution compatible with webhook configurations, bypassing the limitations of `kubectl wait`.
Copy link
Contributor

@perdasilva perdasilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm - Thank you ^^ <333

@LalatenduMohanty LalatenduMohanty added this pull request to the merge queue Nov 7, 2024
Merged via the queue into operator-framework:main with commit 26904a2 Nov 7, 2024
16 of 17 checks passed
@camilamacedo86 camilamacedo86 deleted the fix-install-script branch November 7, 2024 17:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants