Wait for deployments #2010

Merged
knative-prow[bot] merged 8 commits into knative:main from skonto:fix_wbh on Mar 13, 2025

Conversation

@skonto
Contributor

@skonto skonto commented Mar 6, 2025

Fixes #2009

Proposed Changes

  • Skip applying webhook resources until deployments are available. This makes sure resources can be reconciled properly and any resources depending on webhooks will be deployed after webhooks are up.

Release Note

Proper order is enforced now during manifest installation.

@knative-prow knative-prow bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Mar 6, 2025
@knative-prow knative-prow bot requested review from aliok and houshengbo March 6, 2025 12:14
@knative-prow knative-prow bot left a comment

@skonto: 4 warnings.

Details

In response to this:

Fixes #2009

Proposed Changes

  • Re-enqueue until deployments are available. This makes sure resources can be reconciled properly.

Release Note

Proper order is enforced now during manifest installation.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

return nil
}

func InstallWebhookConfigs(ctx context.Context, manifest *mf.Manifest, instance base.KComponent) error {

Golint comments: exported function InstallWebhookConfigs should have comment or be unexported. More info.

return nil
}

func InstallWebhookDepResources(ctx context.Context, manifest *mf.Manifest, instance base.KComponent) error {

Golint comments: exported function InstallWebhookDepResources should have comment or be unexported. More info.

return nil
}

func MarkStatusSuccess(ctx context.Context, manifest *mf.Manifest, instance base.KComponent) error {

Golint comments: exported function MarkStatusSuccess should have comment or be unexported. More info.

return nil
}

func SetManifestPaths(ctx context.Context, manifest *mf.Manifest, instance base.KComponent) error {

Golint comments: exported function SetManifestPaths should have comment or be unexported. More info.

@knative-prow knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 6, 2025
@codecov

codecov bot commented Mar 6, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 62.81%. Comparing base (e9077ba) to head (77f3f27).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2010      +/-   ##
==========================================
+ Coverage   62.76%   62.81%   +0.04%     
==========================================
  Files          49       49              
  Lines        2291     2294       +3     
==========================================
+ Hits         1438     1441       +3     
  Misses        761      761              
  Partials       92       92              

@skonto
Contributor Author

skonto commented Mar 6, 2025

Istio CNI is not coming up in the e2e tests:

✘ CNI encountered an error: failed to wait for resource: resources not ready after 5m0s: context deadline exceeded
  Error: failed to install manifests: failed to wait for resource: resources not ready after 5m0s: context deadline exceeded

@skonto
Contributor Author

skonto commented Mar 6, 2025

        waiting:
          message: back-off 2m40s restarting failed container=install-cni pod=istio-cni-node-wndwj_kube-system(692908a1-4a41-4cc7-b918-72f8a177f824)
          reason: CrashLoopBackOff

@skonto
Contributor Author

skonto commented Mar 6, 2025

[image not preserved]

@skonto
Contributor Author

skonto commented Mar 6, 2025

istio/istio#53849

@skonto
Contributor Author

skonto commented Mar 6, 2025

The verify check should be fixed by #2011.

@skonto
Contributor Author

skonto commented Mar 10, 2025

@houshengbo gentle ping.

 	if len(nonReadyDeployments) > 0 {
 		status.MarkDeploymentsNotReady(nonReadyDeployments)
-		return nil
+		return controller.NewRequeueAfter(1 * time.Second)

Member

@pierDipi pierDipi Mar 10, 2025

Do we need to requeue blindly? This could create a hard loop when there is no actual progress.

The controllers already set up informers for deployments, so the resource will get requeued anyway when they change (including when they become ready):

	deploymentInformer.Informer().AddEventHandler(cache.FilteringResourceEventHandler{
		FilterFunc: controller.FilterControllerGVK(v1beta1.SchemeGroupVersion.WithKind("KnativeEventing")),
		Handler:    controller.HandleAll(impl.EnqueueControllerOf),
	})

	deploymentInformer.Informer().AddEventHandler(cache.FilteringResourceEventHandler{
		FilterFunc: controller.FilterControllerGVK(v1beta1.SchemeGroupVersion.WithKind("KnativeServing")),
		Handler:    controller.HandleAll(impl.EnqueueControllerOf),
	})

Member

And if that's not happening, then the informer handling is what we need to tweak to fix the issue.
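
To make the trade-off in this thread concrete, here is a toy, deterministic model of the two approaches. It does not use knative.dev/pkg; `countReconciles`, its tick-based loop, and its parameters are hypothetical names introduced only for illustration. It counts how many reconcile passes run when reconciliation is driven purely by a deployment change event versus when a not-ready pass also re-enqueues itself.

```go
package main

import "fmt"

// countReconciles simulates a controller loop over a fixed number of ticks.
// readyAt is the tick at which the deployment becomes ready; in this model
// that is also the only tick at which a deployment change event fires.
// blindRequeue controls whether a reconcile that finds the deployment
// not ready schedules another reconcile for the next tick.
func countReconciles(ticks, readyAt int, blindRequeue bool) int {
	pending := true // the initial reconcile for the newly created resource
	reconciles := 0
	for tick := 0; tick < ticks; tick++ {
		ready := tick >= readyAt
		if tick == readyAt {
			pending = true // informer event: the deployment changed
		}
		if !pending {
			continue
		}
		pending = false
		reconciles++
		if !ready && blindRequeue {
			pending = true // timed requeue, regardless of progress
		}
	}
	return reconciles
}

func main() {
	// Deployment becomes ready at tick 5 of 10.
	fmt.Println("event-driven:", countReconciles(10, 5, false))
	fmt.Println("blind requeue:", countReconciles(10, 5, true))
	// Deployment never becomes ready within the window.
	fmt.Println("event-driven, stuck:", countReconciles(10, 100, false))
	fmt.Println("blind requeue, stuck:", countReconciles(10, 100, true))
}
```

In this model the blind requeue keeps reconciling on every tick while nothing changes, which is the loop the review comment is concerned about; the event-driven variant reconciles only when the deployment actually changes.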

Contributor Author

@skonto skonto Mar 10, 2025

I am aware of that; I wanted to make sure it happens faster in case something is slow.
The idea is that you cannot progress with an install if your deployments are not up, and I wanted to enforce that. So it is not exactly done blindly.

Contributor Author

@skonto skonto Mar 10, 2025

The other thing is that if I don't re-enqueue there and block the logic, we still face the problem (an error is printed), because the dependent resources, e.g. Certificate, will still be deployed. We need to either interrupt the deployment (by returning an error) or block until the deployments are up. I have tested both options; here I use the former.

Contributor Author

These are the stages:

	stages := common.Stages{
		common.AppendTarget,
		ingress.AppendTargetIngress,
		security.AppendTargetSecurity,
		common.AppendAdditionalManifests,
		r.appendExtensionManifests,
		r.transform,
		manifests.Install,
		common.CheckDeployments,  // <- Make sure we are up here, otherwise don't progress below
		common.InstallWebhookConfigs,
		common.InstallWebhookDependentResources,
		manifests.SetManifestPaths,
		common.MarkStatusSuccess,
		common.DeleteObsoleteResources(ctx, ks, r.installed),
	}
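
As a rough illustration of why returning an error from the deployment check keeps the later stages from running, here is a minimal, self-contained sketch of a stage pipeline. The types below (`Component`, `Stage`, `Stages`) are simplified stand-ins written for this example, not the operator's actual definitions, and the stage bodies only record that they ran.

```go
package main

import (
	"errors"
	"fmt"
)

// Component is a hypothetical stand-in for the component being reconciled.
type Component struct {
	DeploymentsReady bool
	Applied          []string // names of the stages that completed
}

// Stage is one step of the install pipeline.
type Stage func(c *Component) error

// Stages is an ordered list of steps.
type Stages []Stage

// Execute runs the stages in order and stops at the first error,
// so every stage after a failing one is skipped for this pass.
func (s Stages) Execute(c *Component) error {
	for _, stage := range s {
		if err := stage(c); err != nil {
			return err
		}
	}
	return nil
}

var errNotReady = errors.New("deployments not ready")

func install(c *Component) error {
	c.Applied = append(c.Applied, "install")
	return nil
}

// checkDeployments interrupts the pipeline while deployments are not up.
func checkDeployments(c *Component) error {
	if !c.DeploymentsReady {
		return errNotReady
	}
	c.Applied = append(c.Applied, "checkDeployments")
	return nil
}

func installWebhookConfigs(c *Component) error {
	c.Applied = append(c.Applied, "installWebhookConfigs")
	return nil
}

func main() {
	stages := Stages{install, checkDeployments, installWebhookConfigs}

	notReady := &Component{DeploymentsReady: false}
	fmt.Println(stages.Execute(notReady), notReady.Applied)

	ready := &Component{DeploymentsReady: true}
	fmt.Println(stages.Execute(ready), ready.Applied)
}
```

With deployments not ready, only the install stage is recorded and the webhook stage never runs; once they are ready, all three stages complete.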

@skonto
Contributor Author

skonto commented Mar 10, 2025

@pierDipi I removed the re-enqueue and now just skip the next stages until the deployments are up. If no deployment changes, no reconciliation happens.
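
The "skip instead of requeue" approach can be pictured with the following sketch. It is an assumption about the general shape, not the code merged in this PR: `State`, `skipUntilReady`, `record`, and `run` are hypothetical helpers, and the readiness flag stands in for whatever status the real stages consult.

```go
package main

import "fmt"

// State is a hypothetical stand-in for the reconciled component's status.
type State struct {
	DeploymentsReady bool
	Applied          []string
}

// Stage is one step of the install pipeline.
type Stage func(s *State) error

// skipUntilReady wraps a stage so that it is a silent no-op while
// deployments are not ready. No error is returned and nothing is
// requeued; a later deployment event is expected to trigger the
// next reconcile pass.
func skipUntilReady(stage Stage) Stage {
	return func(s *State) error {
		if !s.DeploymentsReady {
			return nil
		}
		return stage(s)
	}
}

// record returns a stage that only notes that it ran.
func record(name string) Stage {
	return func(s *State) error {
		s.Applied = append(s.Applied, name)
		return nil
	}
}

// run executes the stages in order, stopping at the first error.
func run(s *State, stages ...Stage) error {
	for _, stage := range stages {
		if err := stage(s); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	stages := []Stage{
		record("install"),
		skipUntilReady(record("webhookConfigs")),
		skipUntilReady(record("webhookDependentResources")),
	}

	s := &State{}
	_ = run(s, stages...) // first pass: deployments not ready yet
	fmt.Println(s.Applied)

	s.DeploymentsReady = true // a deployment event triggers another pass
	s.Applied = nil
	_ = run(s, stages...)
	fmt.Println(s.Applied)
}
```

The first pass applies only the base install; the webhook-related stages run on the pass that follows the deployments becoming ready.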

@skonto
Contributor Author

skonto commented Mar 11, 2025

@houshengbo gentle ping

@houshengbo

@skonto I just fixed the CI issue, could you rebase this PR?

@skonto
Contributor Author

skonto commented Mar 11, 2025

Sure.

role mf.Predicate = mf.Any(mf.ByKind("ClusterRole"), mf.ByKind("Role"))
rolebinding mf.Predicate = mf.Any(mf.ByKind("ClusterRoleBinding"), mf.ByKind("RoleBinding"))
webhook mf.Predicate = mf.Any(mf.ByKind("MutatingWebhookConfiguration"), mf.ByKind("ValidatingWebhookConfiguration"))
webhookDependentResources mf.Predicate = mf.ByGVK(schema.GroupVersionKind{Group: "networking.internal.knative.dev", Version: "v1alpha1", Kind: "Certificate"})

@houshengbo houshengbo Mar 12, 2025

Do we need to specify Version: "v1alpha1" here, as it may change in the future? If the version is empty, it will just return a GroupKind. It should be fine, right?

Contributor Author

@skonto skonto Mar 13, 2025

@houshengbo I added a new predicate, ByGK, because ByGVK

func ByGVK(gvk schema.GroupVersionKind) Predicate {
	return func(u *unstructured.Unstructured) bool {
		return u.GroupVersionKind() == gvk
	}
}

would not work: u will always have a version, so if I leave the version empty the comparison fails.
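
A version-agnostic predicate along these lines might look like the sketch below. The merged ByGK implementation is not shown in this conversation, so this is only an assumption about its shape; to stay self-contained it uses small local stand-ins (`GroupKind`, `GroupVersionKind`, `Object`, `Predicate`) rather than the real apimachinery and manifestival types.

```go
package main

import "fmt"

// GroupKind and GroupVersionKind are simplified stand-ins for the
// Kubernetes schema types of the same names.
type GroupKind struct{ Group, Kind string }

type GroupVersionKind struct{ Group, Version, Kind string }

// GroupKind drops the version.
func (gvk GroupVersionKind) GroupKind() GroupKind {
	return GroupKind{Group: gvk.Group, Kind: gvk.Kind}
}

// Object is a stand-in for an unstructured resource; it always carries a version.
type Object struct{ GVK GroupVersionKind }

// Predicate selects resources.
type Predicate func(o *Object) bool

// ByGVK matches only when group, version, and kind are all equal.
func ByGVK(gvk GroupVersionKind) Predicate {
	return func(o *Object) bool { return o.GVK == gvk }
}

// ByGK ignores the version and compares group and kind only.
func ByGK(gk GroupKind) Predicate {
	return func(o *Object) bool { return o.GVK.GroupKind() == gk }
}

func main() {
	cert := &Object{GVK: GroupVersionKind{
		Group:   "networking.internal.knative.dev",
		Version: "v1alpha1",
		Kind:    "Certificate",
	}}

	// Leaving Version empty in a GVK comparison never matches a real object.
	noVersion := GroupVersionKind{Group: "networking.internal.knative.dev", Kind: "Certificate"}
	fmt.Println("ByGVK with empty version:", ByGVK(noVersion)(cert))

	// Comparing by group and kind matches regardless of the object's version.
	gk := GroupKind{Group: "networking.internal.knative.dev", Kind: "Certificate"}
	fmt.Println("ByGK:", ByGK(gk)(cert))
}
```

Matching on group and kind keeps the predicate working if the Certificate API version changes later, which is the concern raised in the review.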

@knative-prow knative-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Mar 13, 2025
@houshengbo

/lgtm
/approve

@knative-prow knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label Mar 13, 2025
@knative-prow

knative-prow bot commented Mar 13, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: houshengbo, skonto

@knative-prow knative-prow bot merged commit be75448 into knative:main Mar 13, 2025
25 checks passed
@dsimansk
Contributor

/cherry-pick release-1.17
/cherry-pick release-1.16
/cherry-pick release-1.15

/cc @skonto @pierDipi

@knative-prow

knative-prow bot commented Mar 18, 2025

@dsimansk: GitHub didn't allow me to request PR reviews from the following users: skonto.

Note that only knative members and repo collaborators can review this PR, and authors cannot review their own PRs.

@knative-prow knative-prow bot requested a review from pierDipi March 18, 2025 12:00
@knative-prow-robot
Contributor

@dsimansk: #2010 failed to apply on top of branch "release-1.15":

Applying: Wait for deployments
Applying: enable telemetry to debug
Using index info to reconstruct a base tree...
M	test/e2e-common.sh
Falling back to patching base and 3-way merge...
Auto-merging test/e2e-common.sh
CONFLICT (content): Merge conflict in test/e2e-common.sh
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0002 enable telemetry to debug

@dsimansk
Contributor

/cherry-pick release-1.16

@dsimansk
Contributor

/cherry-pick release-1.17

@knative-prow-robot
Contributor

@dsimansk: new pull request created: #2034

@knative-prow-robot
Contributor

@dsimansk: new pull request created: #2035

dsimansk pushed a commit to dsimansk/operator that referenced this pull request Mar 18, 2025
* Wait for deployments

* enable telemetry to debug

* fix Istio cni

* typos etc

* lint

* skip net stages if deployments are not ready

* fix tests

* use gk only for Certificate
knative-prow bot pushed a commit that referenced this pull request Mar 19, 2025
* Wait for deployments

* enable telemetry to debug

* fix Istio cni

* typos etc

* lint

* skip net stages if deployments are not ready

* fix tests

* use gk only for Certificate

Co-authored-by: Stavros Kontopoulos <st.kontopoulos@gmail.com>
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Webhook dependant resources are deployed before the related deployments are available

5 participants