-
Notifications
You must be signed in to change notification settings - Fork 68
🐛 fix: resolve JSONPath error in caBundle readiness check by adding kubectl_wait_for_query function #1429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 fix: resolve JSONPath error in caBundle readiness check by adding kubectl_wait_for_query function #1429
Conversation
✅ Deploy Preview for olmv1 ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
514d318 to
d36d7d1
Compare
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1429 +/- ##
==========================================
- Coverage 73.42% 73.22% -0.20%
==========================================
Files 42 42
Lines 3063 3063
==========================================
- Hits 2249 2243 -6
- Misses 641 645 +4
- Partials 173 175 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
efd97f8 to
e7036fc
Compare
e7036fc to
b538210
Compare
b538210 to
ed2af33
Compare
| function kubectl_wait_for_query() { | ||
| manifest=$1 | ||
| query=$2 | ||
| timeout=$3 | ||
| start_time=$(date +%s) | ||
| while true; do | ||
| val=$(kubectl get "${manifest}" -o jsonpath="${query}" 2>/dev/null || echo "") | ||
| if [[ -n "${val}" ]]; then | ||
| echo "${manifest} has ${query}." | ||
| break | ||
| fi | ||
| if [[ $(( $(date +%s) - start_time )) -ge ${timeout} ]]; then | ||
| echo "Timed out waiting for ${manifest} to have ${query}." | ||
| exit 1 | ||
| fi | ||
| sleep 5 | ||
| done | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this thing be replaced with something along the lines of:
kubectl wait validatingwebhookconfigurations/cert-manager-webhook --for=jsonpath='{.webhooks[0].clientConfig.caBundle}' --timeout=60s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your suggestion is the current approach, but unfortunately, it doesn’t work for our use case. ;-) The kubectl wait command doesn’t handle this situation well.
You can easily reproduce the issue by:
- run
kind create cluster - Then, run the install script https://operator-framework.github.io/operator-controller/getting-started/olmv1_getting_started/ OR following the steps; 📖 docs: add local build and deploy instructions to CONTRIBUTING.md #1430. You will see the error described in the description.
My Assumption here is: The issue seems to arise because kubectl wait expects JSONPath conditions related to .status fields (such as readyReplicas), which are applicable for deployments but not for webhook configurations. Since mutatingwebhookconfigurations and validatingwebhookconfigurations lack these .status fields, kubectl wait may fail when it tries to monitor non-existent replica-related conditions. Additionally, it’s possible this issue is linked to cert-manager’s recommendation to use 3 replicas in production environments since we see error: jsonpath wait format must be --for=jsonpath='{.status.readyReplicas}'=3, which may indirectly affect how readiness is handled.
scripts/install.tpl.sh
Outdated
| echo "Timed out waiting for ${manifest} to have ${query}." | ||
| exit 1 | ||
| fi | ||
| sleep 5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good change assuming it helps things be less flaky with timing in practice, but
what's the 5 magic number? I think 5s?
- I think this should be a variable with a good name:
poll_interval_in_seconds?; - But then you are delaying the polling for 5s? Could be a fair amount of extra wait for no benefit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Applied your suggestion and set the polling interval to 10 seconds. With a 60-second timeout, it wouldn’t make sense to use a longer interval.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would even bring back the default to 5. 10 feels like an eternity XD
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @perdasilva done !!!
It was changed for 5s.
+1 I agree
ed2af33 to
bdbef17
Compare
bdbef17 to
b118c39
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks pretty good (and awesome that we can remove those false starts!). Just wondering if we don't want to reduce the poll interval to 3 or 5 seconds or so. Just not to wait needlessly. Though, I haven't tried this myself, and maybe it take > 10 seconds anyway...
b118c39 to
be0bf0f
Compare
|
Hi @perdasilva I think 5s make more sense. So, I changed it to 5s. |
…ectl_wait_for_query function By running `make kind-deploy` against a kind cluster or installing the released script from https://operator-framework.github.io/operator-controller/getting-started/olmv1_getting_started/ the following error is faced: ```sh ... deployment.apps/cert-manager-webhook condition met deployment.apps/cert-manager-cainjector condition met deployment.apps/cert-manager condition met error: jsonpath wait format must be --for=jsonpath='{.status.readyReplicas}'=3 ``` This PR fixes an issue with kubectl wait when used to check the caBundle field in `mutatingwebhookconfigurations` and `validatingwebhookconfigurations`. This PR introduces the `kubectl_wait_for_query` function, which replaces `kubectl wait` for this specific use case. The function repeatedly checks the `caBundle` field by using `kubectl get` in a loop, ensuring that the `caBundle` is populated without relying on status-based conditions. This approach provides a more flexible solution compatible with webhook configurations, bypassing the limitations of `kubectl wait`.
be0bf0f to
9522c30
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - Thank you ^^ <333
26904a2
BY running
make kind-deployagainst a kind cluster or installing the released script from https://operator-framework.github.io/operator-controller/getting-started/olmv1_getting_started/ the following error is faced:This PR fixes an issue with kubectl wait when used to check the caBundle field in
mutatingwebhookconfigurationsandvalidatingwebhookconfigurations.Solution
This PR introduces the
kubectl_wait_for_queryfunction, which replaceskubectl waitfor this specific use case. The function repeatedly checks thecaBundlefield by usingkubectl getin a loop, ensuring that thecaBundleis populated without relying on status-based conditions. This approach provides a more flexible solution compatible with webhook configurations, bypassing the limitations ofkubectl wait.After the fix: