
Flow control throttling kangal and sysdig? #333

@flah00

Description


I recently installed sysdig on a test cluster. As it happens, it's the same cluster I run load tests on. While sysdig was running I started a load test. Initially the kangal controller timed out creating Kubernetes resources, so I increased the Kubernetes client timeout.

Even then, the kangal controller was unable to create all of the Kubernetes resources on its first pass, though it succeeded on the second attempt. The error and stack trace are included below.

Feb 12 09:30:50.961 kangal-controller E0212 14:30:50.108353 1 loadtest.go:472] there is a conflict with loadtest 'loadtest-coiling-lightningbug' between datastore and cache. it might be because object has been removed or modified in the datastore
Feb 12 09:30:50.961 kangal-controller Created JMeter resources
Feb 12 09:30:40.866 kangal-controller Created pods with test data
Feb 12 09:30:10.769 kangal-controller Remote custom data enabled, creating PVC
Feb 12 09:29:55.762 kangal-controller E0212 14:29:54.895207 1 loadtest.go:309] error syncing 'loadtest-coiling-lightningbug': client rate limiter Wait returned an error: context deadline exceeded, requeuing
Feb 12 09:29:55.762 kangal-controller error syncing loadtest, re-queuing
Feb 12 09:29:55.762 kangal-controller Error on creating new JMeter service
Feb 12 09:29:55.762 kangal-controller Created pods with test data
Feb 12 09:29:15.659 kangal-controller Remote custom data enabled, creating PVC
Feb 12 09:29:00.590 kangal-controller Created new namespace

Stack trace

github.com/hellofresh/kangal/pkg/controller.(*Controller).processNextWorkItem.func1
	/home/runner/work/kangal/kangal/pkg/controller/loadtest.go:299
github.com/hellofresh/kangal/pkg/controller.(*Controller).processNextWorkItem
	/home/runner/work/kangal/kangal/pkg/controller/loadtest.go:307
github.com/hellofresh/kangal/pkg/controller.(*Controller).runWorker
	/home/runner/work/kangal/kangal/pkg/controller/loadtest.go:240
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:157
k8s.io/apimachinery/pkg/util/wait.BackoffUntil
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:158
k8s.io/apimachinery/pkg/util/wait.JitterUntil
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:135
k8s.io/apimachinery/pkg/util/wait.Until
	/home/runner/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:92

Workaround

I uninstalled sysdig and Kubernetes API response times became much snappier; the kangal controller now also succeeds on its first pass. I'm already in touch with sysdig support about the problem, and clearly they have some work to do. But maybe kangal does as well?

Solution?

I'm not really sure what the expectation around flow control is. Should it be the exclusive province of cluster admins? Should Helm charts offer some guidance for their apps? Should kangal include a PriorityLevelConfiguration and FlowSchema for its service account?
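To make that last question concrete, here is a rough sketch of the kind of FlowSchema kangal (or its Helm chart) could ship, written against the k8s.io/api/flowcontrol Go types. The `workload-high` priority level, the `kangal-controller` service account name, and the `kangal` namespace are assumptions for illustration, and the API group version you can use depends on the cluster version.

```go
package kangalapf

import (
	flowcontrolv1beta1 "k8s.io/api/flowcontrol/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// kangalFlowSchema sketches a FlowSchema that routes API requests made by the
// kangal controller's service account into the built-in "workload-high"
// priority level. Names and namespace are placeholders, not values kangal
// actually ships.
func kangalFlowSchema() *flowcontrolv1beta1.FlowSchema {
	return &flowcontrolv1beta1.FlowSchema{
		ObjectMeta: metav1.ObjectMeta{Name: "kangal-controller"},
		Spec: flowcontrolv1beta1.FlowSchemaSpec{
			// Reference an existing PriorityLevelConfiguration; a chart could
			// just as well ship its own instead of reusing a built-in one.
			PriorityLevelConfiguration: flowcontrolv1beta1.PriorityLevelConfigurationReference{
				Name: "workload-high",
			},
			MatchingPrecedence: 1000,
			Rules: []flowcontrolv1beta1.PolicyRulesWithSubjects{{
				Subjects: []flowcontrolv1beta1.Subject{{
					Kind: flowcontrolv1beta1.SubjectKindServiceAccount,
					ServiceAccount: &flowcontrolv1beta1.ServiceAccountSubject{
						Namespace: "kangal",
						Name:      "kangal-controller",
					},
				}},
				// Match every resource request from that service account.
				ResourceRules: []flowcontrolv1beta1.ResourcePolicyRule{{
					Verbs:      []string{"*"},
					APIGroups:  []string{"*"},
					Resources:  []string{"*"},
					Namespaces: []string{"*"},
				}},
			}},
		},
	}
}
```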

What do folks think?
