@nojnhuh
Contributor

@nojnhuh nojnhuh commented Jun 16, 2025

What type of PR is this?
/kind flake

What this PR does / why we need it:

This PR changes the tigera-operator Pod to tolerate only the NoSchedule taints that are set while a Node is being bootstrapped, instead of all NoSchedule taints (which include the node.kubernetes.io/unschedulable taint set when a Node is cordoned and drained).
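
For illustration only, here is a rough sketch of the difference between a blanket NoSchedule toleration and a narrowed list. The top-level `tolerations` key and the specific taint keys are assumptions (well-known taints that are commonly present while a Node bootstraps); they are not copied from this PR's diff, and tolerations for other effects are left out.

```yaml
# Hypothetical values fragment; the taint keys are assumptions, not this PR's exact list.

# Before (sketch): one entry with no key tolerates every NoSchedule taint,
# including node.kubernetes.io/unschedulable on a cordoned Node.
#
# tolerations:
#   - effect: NoSchedule
#     operator: Exists

# After (sketch): only NoSchedule taints expected during Node bootstrap are tolerated.
tolerations:
  - effect: NoSchedule
    key: node-role.kubernetes.io/control-plane
    operator: Exists
  - effect: NoSchedule
    key: node.kubernetes.io/not-ready
    operator: Exists
  # node.kubernetes.io/unschedulable is deliberately absent, so the Pod
  # should not be scheduled back onto a Node that is cordoned and draining.
```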

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #5703

Special notes for your reviewer:

TODOs:

  • squashed commits
  • includes documentation
  • adds unit tests
  • cherry-pick candidate

Release note:

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/flake Categorizes issue or PR as related to a flaky test. labels Jun 16, 2025
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 16, 2025
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Jun 16, 2025
@codecov

codecov bot commented Jun 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 52.83%. Comparing base (ba1619e) to head (39b34f2).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5704   +/-   ##
=======================================
  Coverage   52.83%   52.83%           
=======================================
  Files         278      278           
  Lines       29610    29610           
=======================================
  Hits        15645    15645           
  Misses      13148    13148           
  Partials      817      817           


@nojnhuh
Contributor Author

nojnhuh commented Jun 16, 2025

I just pushed a hack to run more tests at once here.

/retitle [WIP] Define NoSchedule tolerations for tigera-operator

@k8s-ci-robot k8s-ci-robot changed the title Define NoSchedule tolerations for tigera-operator [WIP] Define NoSchedule tolerations for tigera-operator Jun 16, 2025
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 16, 2025
@nojnhuh
Contributor Author

nojnhuh commented Jun 16, 2025

/test pull-cluster-api-provider-azure-e2e-workload-upgrade

@nojnhuh nojnhuh force-pushed the tigera-tolerations branch from 39b34f2 to 6fff937 Compare June 16, 2025 22:51
@nojnhuh
Contributor Author

nojnhuh commented Jun 16, 2025

> I just pushed a hack to run more tests at once here.
>
> /retitle [WIP] Define NoSchedule tolerations for tigera-operator

All 2x3 runs of each test got past the flake this PR is targeting, which gives me hope. The hack is removed and this is ready for review.

/retitle Define NoSchedule tolerations for tigera-operator

@nojnhuh nojnhuh changed the title [WIP] Define NoSchedule tolerations for tigera-operator Define NoSchedule tolerations for tigera-operator Jun 16, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 16, 2025
  releaseName: projectcalico
  repoURL: https://docs.tigera.io/calico/charts
- valuesTemplate: |-
+ valuesTemplate: |
Contributor

`|` keeps the trailing newline while `|-` strips it.
So in practice, if we know the section is going to end the YAML file, do we want it to use `|`?

Contributor Author

The newline that gets stripped is only the newline at the end of this string value for the key valuesTemplate, not the whole file. Some of the files I changed here didn't happen to have a newline at the end of the file whereas the new section that got tacked on by this PR does, which is why I think this changed. In general though, I don't think whether there's a newline or not at the end of a valuesTemplate matters.
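
As a small illustration of the point above (the key names `clipped` and `stripped` and the nested content are made up for this sketch), the chomping indicator only affects the end of the individual string value:

```yaml
# "|" (clip): the string value ends with exactly one newline.
clipped: |
  installation:
    cni:
      type: Calico
# "|-" (strip): the same string value, but with no trailing newline.
stripped: |-
  installation:
    cni:
      type: Calico
```

Both values parse to the same three lines of text; they differ only in whether the string ends with a newline character, regardless of how the file itself ends.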

@alimaazamat
Contributor

/lgtm
just had a personal question!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 16, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 522967d7ff6c0e8edda3d12d0cf5bb46404a7f3d

@nojnhuh nojnhuh mentioned this pull request Jun 17, 2025
4 tasks
Contributor

@willie-yao willie-yao left a comment


/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: willie-yao

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 17, 2025
@nojnhuh
Contributor Author

nojnhuh commented Jun 17, 2025

/retest-required

@nojnhuh
Contributor Author

nojnhuh commented Jun 17, 2025

/retest-required

@willie-yao
Contributor

/retest

@k8s-ci-robot
Contributor

k8s-ci-robot commented Jun 17, 2025

@nojnhuh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-cluster-api-provider-azure-e2e | 6fff937 | link | unknown | /test pull-cluster-api-provider-azure-e2e |

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@nojnhuh
Contributor Author

nojnhuh commented Jun 17, 2025

/retest

@k8s-ci-robot k8s-ci-robot merged commit 8317a15 into kubernetes-sigs:main Jun 18, 2025
36 of 37 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Jun 18, 2025
@github-project-automation github-project-automation bot moved this from Todo to Done in CAPZ Planning Jun 18, 2025
@nojnhuh nojnhuh deleted the tigera-tolerations branch June 18, 2025 00:12
@nojnhuh
Contributor Author

nojnhuh commented Oct 1, 2025

/cherry-pick release-1.20

#5889 is failing and this should fix it.

@k8s-infra-cherrypick-robot

@nojnhuh: #5704 failed to apply on top of branch "release-1.20":

Applying: Define NoSchedule tolerations for tigera-operator
Using index info to reconstruct a base tree...
M	templates/test/ci/cluster-template-prow-apiserver-ilb.yaml
M	templates/test/ci/cluster-template-prow-ci-version-dra.yaml
M	templates/test/ci/cluster-template-prow-ci-version-dual-stack.yaml
M	templates/test/ci/cluster-template-prow-ci-version-ipv6.yaml
M	templates/test/ci/cluster-template-prow-ci-version-md-and-mp.yaml
M	templates/test/ci/cluster-template-prow-ci-version.yaml
M	templates/test/ci/cluster-template-prow-dual-stack.yaml
M	templates/test/ci/cluster-template-prow-ipv6.yaml
M	templates/test/ci/cluster-template-prow-machine-pool-ci-version.yaml
M	templates/test/ci/cluster-template-prow-machine-pool-flex.yaml
M	templates/test/ci/cluster-template-prow-machine-pool.yaml
M	templates/test/ci/cluster-template-prow-nvidia-gpu.yaml
M	templates/test/ci/cluster-template-prow.yaml
M	templates/test/dev/cluster-template-custom-builds-dra.yaml
M	templates/test/dev/cluster-template-custom-builds-load-dra.yaml
M	templates/test/dev/cluster-template-custom-builds-load.yaml
M	templates/test/dev/cluster-template-custom-builds-machine-pool.yaml
M	templates/test/dev/cluster-template-custom-builds.yaml
M	test/e2e/data/infrastructure-azure/v1beta1/cluster-template-kcp-scale-in.yaml
M	test/e2e/data/infrastructure-azure/v1beta1/cluster-template-upgrades.yaml
Falling back to patching base and 3-way merge...
Auto-merging test/e2e/data/infrastructure-azure/v1beta1/cluster-template-upgrades.yaml
CONFLICT (content): Merge conflict in test/e2e/data/infrastructure-azure/v1beta1/cluster-template-upgrades.yaml
Auto-merging test/e2e/data/infrastructure-azure/v1beta1/cluster-template-kcp-scale-in.yaml
Auto-merging templates/test/dev/cluster-template-custom-builds.yaml
Auto-merging templates/test/dev/cluster-template-custom-builds-machine-pool.yaml
Auto-merging templates/test/dev/cluster-template-custom-builds-load.yaml
Auto-merging templates/test/dev/cluster-template-custom-builds-load-dra.yaml
Auto-merging templates/test/dev/cluster-template-custom-builds-dra.yaml
Auto-merging templates/test/ci/cluster-template-prow.yaml
Auto-merging templates/test/ci/cluster-template-prow-nvidia-gpu.yaml
Auto-merging templates/test/ci/cluster-template-prow-machine-pool.yaml
Auto-merging templates/test/ci/cluster-template-prow-machine-pool-flex.yaml
Auto-merging templates/test/ci/cluster-template-prow-machine-pool-ci-version.yaml
Auto-merging templates/test/ci/cluster-template-prow-ipv6.yaml
Auto-merging templates/test/ci/cluster-template-prow-dual-stack.yaml
Auto-merging templates/test/ci/cluster-template-prow-ci-version.yaml
Auto-merging templates/test/ci/cluster-template-prow-ci-version-md-and-mp.yaml
Auto-merging templates/test/ci/cluster-template-prow-ci-version-ipv6.yaml
Auto-merging templates/test/ci/cluster-template-prow-ci-version-dual-stack.yaml
Auto-merging templates/test/ci/cluster-template-prow-ci-version-dra.yaml
Auto-merging templates/test/ci/cluster-template-prow-apiserver-ilb.yaml
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Patch failed at 0001 Define NoSchedule tolerations for tigera-operator

In response to this:

> /cherry-pick release-1.20
>
> #5889 is failing and this should fix it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.



Development

Successfully merging this pull request may close these issues.

tigera-operator Pod sometimes blocks Node deletion
