Trident is choking on client rate limiter and context cancel errors #1098

@sergeykuperman

Describe the bug
Our trial cluster is experiencing a wave of "context canceled" and "client rate limiter Wait returned an error: context canceled" errors. Volume provisioning and attachment are very slow and need many retries. Please advise on how to troubleshoot this issue.

Example logs:

controller.log

I tried increasing the k8sAPIQPS parameter to 300 in the tridentorchestrator resource (we install Trident with the operator), but this does not seem to help.
The cluster currently holds 2600 tridentvolumes.

Environment
We are using Trident 25.06 with AWS FSx for ONTAP and the "ontap-san-economy" driver.
The Trident controller runs in a Gardener cluster.

  • Trident version: 25.06
  • Trident installation flags used: -n trident
  • Container runtime: containerd 2.0
  • Kubernetes version: v1.31.13
  • Kubernetes orchestrator: Gardener
  • Kubernetes enabled feature gates:
  • OS: Garden Linux 1877.6
  • NetApp backend types: AWS Ontap (linux)
  • Other:

To Reproduce
This happens sporadically. I do not know the root cause, but it renders the system almost unusable.

Expected behavior
Provisioning, attaching, and managing volumes should happen relatively quickly.

Additional context
This is happening on one of our production clusters (SAP).
We would be grateful for any tips or support on this case.
I tried to create an account at
https://mysupport.netapp.com/site/user/registration
in order to open a support ticket,
but the site seems to be broken as well: it does not let me past the email verification step. It keeps asking for more codes and sending them by email, but entering the codes does not advance the registration process.
