DRA 1k node load testing #5788
Conversation
Skipping CI for Draft Pull Request.
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected; please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/test ?
@nojnhuh: The following commands are available to trigger required jobs:
The following commands are available to trigger optional jobs:
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main    #5788   +/-   ##
=======================================
  Coverage   52.94%   52.94%
=======================================
  Files         279      279
  Lines       29666    29666
=======================================
  Hits        15706    15706
  Misses      13147    13147
  Partials      813      813

☔ View full report in Codecov by Sentry.
/test pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds

/test pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds

/test pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds
Interesting parts of the boot logs for the nodes that failed to bootstrap in https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/5788/pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds/1949711465279655936:

Instance 256
Instance 353

Both hit "Connection reset by peer in connection to k8sprowstoragecomm.blob.core.windows.net:443" when downloading binaries. Maybe we're hitting the storage account too hard?
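If the storage account really is being overwhelmed by ~1000 nodes pulling binaries at once, one workaround would be to retry the download with exponential backoff in the bootstrap path. A rough Go sketch of the idea; the function names, attempt count, and delays here are illustrative, not the actual bootstrap code:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// downloadWithBackoff fetches url with retries and exponential backoff, as a
// possible workaround for transient "connection reset by peer" errors from
// the blob storage account. Attempt counts and delays are guesses, not tuned.
func downloadWithBackoff(url, dest string, attempts int) error {
	delay := 2 * time.Second
	var lastErr error
	for i := 1; i <= attempts; i++ {
		if lastErr = download(url, dest); lastErr == nil {
			return nil
		}
		fmt.Fprintf(os.Stderr, "attempt %d/%d failed: %v\n", i, attempts, lastErr)
		if i < attempts {
			time.Sleep(delay)
			delay *= 2 // exponential backoff between attempts
		}
	}
	return fmt.Errorf("giving up on %s after %d attempts: %w", url, attempts, lastErr)
}

func download(url, dest string) error {
	resp, err := http.Get(url)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	out, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, resp.Body)
	return err
}
```

Adding jitter to the delay would also spread the retries out so a thousand instances don't hammer the storage account again in lockstep.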
/test pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds

/test pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds

/test pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds

/test pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds
/test pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds
This looks like the next bootstrap failure to track down/work around:

Then there was one VMSS instance that got stuck in "Creating" and never seemed to even enter cloud-init? But it looks like VMSS recycled that one.
/test pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds
I noticed the CAPZ VM extension on one of the VMSS instances failed and that caused the VMSS to enter a Failed provisioning state, but then that instance still produced a Ready node?

/test pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds
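One way to confirm whether the Failed provisioning state is cosmetic would be to cross-check each node's Ready condition against the VMSS instance states. A minimal client-go sketch, assuming a reachable kubeconfig for the workload cluster (this is not CAPZ's actual reconciliation logic):

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(config)

	// List all nodes and report their Ready condition, so instances whose
	// VM extension failed can be compared against actual kubelet health.
	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, node := range nodes.Items {
		for _, cond := range node.Status.Conditions {
			if cond.Type == corev1.NodeReady {
				fmt.Printf("%s: Ready=%s (%s)\n", node.Name, cond.Status, cond.Reason)
			}
		}
	}
}
```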
Another flake related to kubernetes-sigs/cluster-api-addon-provider-helm#416 in https://prow.k8s.io/view/gs/kubernetes-ci-logs/pr-logs/pull/kubernetes-sigs_cluster-api-provider-azure/5788/pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds/1950346650828410880. FYI @Jont828

/test pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds
The furthest I've gotten into why CAAPH sometimes can't reach the API server to install azuredisk is that failing runs seem to correlate with the KubeadmControlPlane becoming initialized before the CAPZ VM extension succeeds on the first control plane machine. I'm not sure how that's related to connectivity issues yet. Overall though, it seems like there's some turbulence right at the moment CAPI is first able to contact the control plane, while other clients might still have trouble for up to around 30s. I also saw intermittent connection issues in the kubelet logs in that same time frame when it's trying to contact the node's internal IP, so that suggests to me the problem isn't the public load balancer.
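If the problem really is ~30s of turbulence right after the control plane first becomes reachable, one mitigation would be to require several consecutive successful probes before treating the API server as usable. A hedged sketch; the endpoint, thresholds, and TLS handling are assumptions for illustration, not how CAAPH actually connects:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// waitForStableAPIServer polls /healthz until it sees `needed` consecutive
// successes, to avoid acting during the window where the API server is only
// intermittently reachable.
func waitForStableAPIServer(endpoint string, needed int, timeout time.Duration) error {
	client := &http.Client{
		Timeout: 5 * time.Second,
		// This probe-only sketch skips cert verification rather than loading
		// the workload cluster's CA bundle.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	deadline := time.Now().Add(timeout)
	successes := 0
	for time.Now().Before(deadline) {
		resp, err := client.Get(endpoint + "/healthz")
		if err == nil && resp.StatusCode == http.StatusOK {
			resp.Body.Close()
			successes++
			if successes >= needed {
				return nil
			}
		} else {
			if err == nil {
				resp.Body.Close()
			}
			successes = 0 // a single failure resets the streak
		}
		time.Sleep(2 * time.Second)
	}
	return fmt.Errorf("API server did not stabilize within %s", timeout)
}
```

Requiring a streak of successes rather than a single one is the point here: a lone 200 during the turbulent window would otherwise let clients proceed and then fail mid-install.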
@nojnhuh add
Trying that out.

/test pull-cluster-api-provider-azure-load-test-1k-dra-with-workload-custom-builds
@nojnhuh: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Testing kubernetes/test-infra#35109:

/test pull-cluster-api-provider-azure-load-test-custom-builds
What type of PR is this?

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

TODOs:

Release note: