
Commit d35b094

Merge pull request #7741 from MikeSpreitzer/update-controllers
Modernize controllers.md
2 parents 25fe55c + 2f737de


contributors/devel/sig-api-machinery/controllers.md

Lines changed: 39 additions & 33 deletions
@@ -18,9 +18,9 @@ Watches, etc, are all merely optimizations of this logic.
 
 When you're writing controllers, there are few guidelines that will help make sure you get the results and performance you're looking for.
 
-1. Operate on one item at a time. If you use a `workqueue.Interface`, you'll be able to queue changes for a particular resource and later pop them in multiple “worker” gofuncs with a guarantee that no two gofuncs will work on the same item at the same time.
+1. Operate on one item at a time. If you use a `workqueue.Interface`, you'll be able to queue references to particular objects and later pop them in multiple “worker” goroutines with a guarantee that no two goroutines will work on the same item at the same time.
 
-   Many controllers must trigger off multiple resources (I need to "check X if Y changes"), but nearly all controllers can collapse those into a queue of “check this X” based on relationships. For instance, a ReplicaSet controller needs to react to a pod being deleted, but it does that by finding the related ReplicaSets and queuing those.
+   Many controllers must trigger off multiple resources (I need to "check X if Y changes"), but nearly all controllers can collapse those into a queue of “check this X” based on relationships. For instance, a ReplicaSet controller needs to react to a pod being deleted, but it does that by finding the related ReplicaSets and queuing references to those.
 
 1. Random ordering between resources. When controllers queue off multiple types of resources, there is no guarantee of ordering amongst those resources.
 
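As a rough illustration of the “queue references based on relationships” guideline in this hunk (an editorial sketch, not part of the commit), a Pod event handler can resolve the owning ReplicaSet and enqueue a reference to it. The helper name, and the assumption that the queue holds `cache.ObjectName` values for ReplicaSets, are ours.

```go
package main

import (
    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/util/workqueue"
)

// enqueueOwningReplicaSet is a hypothetical helper: given a Pod event, it
// queues a reference to the owning ReplicaSet rather than the Pod itself,
// collapsing "check X if Y changes" into "check this X".
func enqueueOwningReplicaSet(queue workqueue.TypedRateLimitingInterface[cache.ObjectName], obj interface{}) {
    pod, ok := obj.(*v1.Pod)
    if !ok {
        // e.g. a cache.DeletedFinalStateUnknown tombstone; unwrap it in real code
        return
    }
    if owner := metav1.GetControllerOf(pod); owner != nil && owner.Kind == "ReplicaSet" {
        // A ReplicaSet always lives in the same namespace as the Pods it owns.
        queue.Add(cache.ObjectName{Namespace: pod.Namespace, Name: owner.Name})
    }
}
```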

@@ -44,17 +44,19 @@ When you're writing controllers, there are few guidelines that will help make su
 
 1. Wait for your secondary caches. Many controllers have primary and secondary resources. Primary resources are the resources that you'll be updating `Status` for. Secondary resources are resources that you'll be managing (creating/deleting) or using for lookups.
 
-   Use the `framework.WaitForCacheSync` function to wait for your secondary caches before starting your primary sync functions. This will make sure that things like a Pod count for a ReplicaSet isn't working off of known out of date information that results in thrashing.
+   Use the `cache.WaitForCacheSync` function to wait for your secondary caches before starting your primary sync functions. This will make sure that things like a Pod count for a ReplicaSet isn't working off of known out of date information that results in thrashing.
 
 1. There are other actors in the system. Just because you haven't changed an object doesn't mean that somebody else hasn't.
 
    Don't forget that the current state may change at any moment--it's not sufficient to just watch the desired state. If you use the absence of objects in the desired state to indicate that things in the current state should be deleted, make sure you don't have a bug in your observation code (e.g., act before your cache has filled).
 
+1. Failures happen, and their detection in Kubernetes is imperfect. Run multiple copies of your controller pod (e.g., via a `Deployment`), with one copy active and the other(s) ready to take over at a moment's notice. Use `k8s.io/client-go/tools/leader-election` for that. Even leader election is imperfect; even when using leader election it is still possible --- although very unlikely --- that multiple copies of your controller may be active.
+
 1. Percolate errors to the top level for consistent re-queuing. We have a `workqueue.RateLimitingInterface` to allow simple requeuing with reasonable backoffs.
 
-   Your main controller func should return an error when requeuing is necessary. When it isn't, it should use `utilruntime.HandleError` and return nil instead. This makes it very easy for reviewers to inspect error handling cases and to be confident that your controller doesn't accidentally lose things it should retry for.
+   Your main controller func should return an error when requeuing is necessary. When it isn't, it should use `utilruntime.HandleErrorWithContext` and return nil instead. This makes it very easy for reviewers to inspect error handling cases and to be confident that your controller doesn't accidentally lose things it should retry for.
 
-1. Watches and Informers will “sync”. Periodically, they will deliver every matching object in the cluster to your `Update` method. This is good for cases where you may need to take additional action on the object, but sometimes you know there won't be more work to do.
+1. Informers can periodically “resync”. This means to call the Update event handler on every object in the informer's local cache. This is good for cases where you may need to take additional action on the object, but sometimes you know there won't be more work to do.
 
    In cases where you are *certain* that you don't need to requeue items when there are no new changes, you can compare the resource version of the old and new objects. If they are the same, you skip requeuing the work. Be careful when you do this. If you ever skip requeuing your item on failures, you could fail, not requeue, and then never retry that item again.
 
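The leader-election guideline added in this hunk is not exercised by the sample controller below, so here is a hedged editorial sketch of the usual pattern with client-go (the package lives at `k8s.io/client-go/tools/leaderelection`; the lease name, namespace, timings, and the `runController` callback are placeholder assumptions, not part of the commit).

```go
package main

import (
    "context"
    "os"
    "time"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/leaderelection"
    "k8s.io/client-go/tools/leaderelection/resourcelock"
    "k8s.io/klog/v2"
)

// runWithLeaderElection blocks; only the currently elected replica runs
// runController. All other replicas sit ready to take over the Lease.
func runWithLeaderElection(ctx context.Context, client kubernetes.Interface, runController func(ctx context.Context)) {
    id, _ := os.Hostname() // identity of this replica; ignore the error for the sketch
    lock := &resourcelock.LeaseLock{
        LeaseMeta:  metav1.ObjectMeta{Name: "my-controller", Namespace: "kube-system"},
        Client:     client.CoordinationV1(),
        LockConfig: resourcelock.ResourceLockConfig{Identity: id},
    }
    leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
        Lock:            lock,
        ReleaseOnCancel: true, // hand the lease over promptly on clean shutdown
        LeaseDuration:   15 * time.Second,
        RenewDeadline:   10 * time.Second,
        RetryPeriod:     2 * time.Second,
        Callbacks: leaderelection.LeaderCallbacks{
            OnStartedLeading: runController, // start workers only once elected
            OnStoppedLeading: func() { klog.Info("lost leadership; exiting") },
        },
    })
}
```

As the guideline notes, this narrows but does not eliminate the window in which two replicas may believe they are active, so reconciliation logic still has to tolerate concurrent actors.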

@@ -80,7 +82,7 @@ type Controller struct {
 
     // queue is where incoming work is placed to de-dup and to allow "easy"
     // rate limited requeues on errors
-    queue workqueue.RateLimitingInterface
+    queue workqueue.TypedRateLimitingInterface[cache.ObjectName]
 }
 
 func NewController(pods informers.PodInformer) *Controller {
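For readers wondering how the new typed field is initialized: `NewController` would build it roughly as below. This is an editorial sketch assuming a client-go release that ships the typed workqueue constructors; the wrapper function is ours, not part of the diff.

```go
package main

import (
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/util/workqueue"
)

// newWorkQueue shows one plausible way to construct the typed queue used above.
func newWorkQueue() workqueue.TypedRateLimitingInterface[cache.ObjectName] {
    // Per-item exponential backoff plus an overall rate limit, keyed by
    // object reference (namespace/name).
    return workqueue.NewTypedRateLimitingQueue(
        workqueue.DefaultTypedControllerRateLimiter[cache.ObjectName](),
    )
}
```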
@@ -93,98 +95,102 @@ func NewController(pods informers.PodInformer) *Controller {
     // register event handlers to fill the queue with pod creations, updates and deletions
     pods.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
         AddFunc: func(obj interface{}) {
-            key, err := cache.MetaNamespaceKeyFunc(obj)
+            ref, err := cache.ObjectToName(obj)
             if err == nil {
-                c.queue.Add(key)
+                c.queue.Add(ref)
             }
         },
         UpdateFunc: func(old interface{}, new interface{}) {
-            key, err := cache.MetaNamespaceKeyFunc(new)
+            ref, err := cache.ObjectToName(new)
             if err == nil {
-                c.queue.Add(key)
+                c.queue.Add(ref)
             }
         },
         DeleteFunc: func(obj interface{}) {
             // IndexerInformer uses a delta nodeQueue, therefore for deletes we have to use this
             // key function.
-            key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj)
+            ref, err := cache.DeletionHandlingObjectToName(obj)
             if err == nil {
-                c.queue.Add(key)
+                c.queue.Add(ref)
             }
         },
     },)
 
     return c
 }
 
-func (c *Controller) Run(threadiness int, stopCh chan struct{}) {
+func (c *Controller) Run(ctx context.Context, threadiness int) error {
     // don't let panics crash the process
     defer utilruntime.HandleCrash()
     // make sure the work queue is shutdown which will trigger workers to end
     defer c.queue.ShutDown()
+    logger := klog.FromContext(ctx)
 
-    klog.Infof("Starting <NAME> controller")
+    logger.Info("Starting <NAME> controller")
 
     // wait for your secondary caches to fill before starting your work
-    if !cache.WaitForCacheSync(stopCh, c.podsSynced) {
-        return
+    if !cache.WaitForCacheSync(ctx.Done(), c.podsSynced) {
+        return fmt.Errorf("failed to wait for caches to sync")
     }
 
     // start up your worker threads based on threadiness. Some controllers
     // have multiple kinds of workers
     for i := 0; i < threadiness; i++ {
         // runWorker will loop until "something bad" happens. The .Until will
         // then rekick the worker after one second
-        go wait.Until(c.runWorker, time.Second, stopCh)
+        go wait.UntilWithContext(ctx, c.runWorker, time.Second)
     }
+    logger.Info("Started workers")
 
     // wait until we're told to stop
-    <-stopCh
-    klog.Infof("Shutting down <NAME> controller")
+    <-ctx.Done()
+    logger.Info("Shutting down <NAME> controller")
+
+    return nil
 }
 
-func (c *Controller) runWorker() {
+func (c *Controller) runWorker(ctx context.Context) {
     // hot loop until we're told to stop. processNextWorkItem will
     // automatically wait until there's work available, so we don't worry
     // about secondary waits
-    for c.processNextWorkItem() {
+    for c.processNextWorkItem(ctx) {
     }
 }
 
-// processNextWorkItem deals with one key off the queue. It returns false
+// processNextWorkItem deals with one item off the queue. It returns false
 // when it's time to quit.
-func (c *Controller) processNextWorkItem() bool {
-    // pull the next work item from queue. It should be a key we use to lookup
+func (c *Controller) processNextWorkItem(ctx context.Context) bool {
+    // Pull the next work item from queue. It will be an object reference that we use to lookup
     // something in a cache
-    key, quit := c.queue.Get()
-    if quit {
+    ref, shutdown := c.queue.Get()
+    if shutdown {
         return false
     }
     // you always have to indicate to the queue that you've completed a piece of
     // work
-    defer c.queue.Done(key)
+    defer c.queue.Done(ref)
 
-    // do your work on the key. This method will contains your "do stuff" logic
-    err := c.syncHandler(key.(string))
+    // Process the object reference. This method will contains your "do stuff" logic
+    err := c.syncHandler(ref)
     if err == nil {
         // if you had no error, tell the queue to stop tracking history for your
-        // key. This will reset things like failure counts for per-item rate
+        // item. This will reset things like failure counts for per-item rate
         // limiting
-        c.queue.Forget(key)
+        c.queue.Forget(ref)
         return true
     }
 
     // there was a failure so be sure to report it. This method allows for
     // pluggable error handling which can be used for things like
     // cluster-monitoring
-    utilruntime.HandleError(fmt.Errorf("%v failed with : %v", key, err))
+    utilruntime.HandleErrorWithContext(ctx, err, "Error syncing; requeuing for later retry", "objectReference", ref)
 
     // since we failed, we should requeue the item to work on later. This
     // method will add a backoff to avoid hotlooping on particular items
     // (they're probably still not going to work right away) and overall
     // controller protection (everything I've done is broken, this controller
     // needs to calm down or it can starve other useful work) cases.
-    c.queue.AddRateLimited(key)
+    c.queue.AddRateLimited(ref)
 
     return true
 }
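To see how the new context-based `Run(ctx, threadiness)` signature is typically driven, here is a hedged editorial sketch of the wiring. The in-cluster config, informer-factory setup, resync period, and worker count are assumptions for illustration, not part of the commit.

```go
package main

import (
    "context"
    "time"

    "k8s.io/client-go/informers"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/rest"
    "k8s.io/klog/v2"
)

func main() {
    // In-cluster config is just one option; an out-of-cluster kubeconfig works too.
    cfg, err := rest.InClusterConfig()
    if err != nil {
        klog.Fatal(err)
    }
    client := kubernetes.NewForConfigOrDie(cfg)

    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    factory := informers.NewSharedInformerFactory(client, 10*time.Minute)
    controller := NewController(factory.Core().V1().Pods())

    // Start the informers, then let Run wait for cache sync and spawn workers.
    factory.Start(ctx.Done())
    if err := controller.Run(ctx, 2); err != nil {
        klog.Fatal(err)
    }
}
```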
