[BUG] hub-agent making noop /status updates on every reconciliation

### **Describe the bug**

I'm observing that every time hub-agent restarts, it makes a large number of no-op updates to `clusterresourceplacement/status`. Because the default worker count is `1`, on large clusters with a lot of queued work this starves new items on the work queue for 10-20 minutes.

Similarly, the resync period on the hub-agent defaults to 5 minutes (way too aggressive IMO), which makes the problem recur more often:

https://github.com/Azure/fleet/blob/6b81bdbbd9cc4187ba5c43cfbef0101fac22c655/cmd/hubagent/options/options.go#L66-L67

### **Environment**
Please provide the following: 
- Hub cluster details: hub-agent v0.8.5
- Member cluster details:  member-agent v0.8.5

### **To Reproduce**
Steps to reproduce the behavior:

- Restart controller (trigger a rolling update)
- Tail logs from the leader
- Observe it prints log statements to re-reconcile everything

      1030 17:55:03.104691       1 framework/framework.go:1310] "No change in scheduling decisions and condition, and the observed CRP generation remains the same" clusterSchedulingPolicySnapshot="lep-...
      1030 17:55:03.522789       1 framework/framework.go:1310] "No change in scheduling decisions and condition, and the observed CRP generation remains the same" clusterSchedulingPolicySnapshot="depo...
      1030 17:55:03.930300       1 framework/framework.go:1310] "No change in scheduling decisions and condition, and the observed CRP generation remains the same" clusterSchedulingPolicySnapshot="tran...
      1030 17:55:04.400769       1 framework/framework.go:1310] "No change in scheduling decisions and condition, and the observed CRP generation remains the same" clusterSchedulingPolicySnapshot="job-...
      1030 17:55:04.812419       1 framework/framework.go:1310] "No change in scheduling decisions and condition, and the observed CRP generation remains the same" clusterSchedulingPolicySnapshot="deci...
      1030 17:55:05.282259       1 framework/framework.go:1310] "No change in scheduling decisions and condition, and the observed CRP generation remains the same" clusterSchedulingPolicySnapshot="auto...
      1030 17:55:05.690881       1 framework/framework.go:1310] "No change in scheduling decisions and condition, and the observed CRP generation remains the same" clusterSchedulingPolicySnapshot="samp...
      1030 17:55:06.145613       1 framework/framework.go:1310] "No change in scheduling decisions and condition, and the observed CRP generation remains the same" clusterSchedulingPolicySnapshot="anti...

- Look at the API Server audit logs and observe that it's making updates to `clusterresourceplacement/status` for every `clusterSchedulingPolicySnapshot` object even though there should be no changes to the object (timestamps are matching):

    ![Image](https://github.com/user-attachments/assets/ea3da68d-7c5f-413d-8800-5ab2dbd17dc5)

- I'm not seeing any updated timestamps etc. in `clusterresourceplacement/status` that would warrant this `/status` update:

      status:
        conditions:
        - lastTransitionTime: "2024-09-19T00:06:37Z" # this is not today
          message: found all the clusters needed as specified by the scheduling policy
          observedGeneration: 1
          reason: SchedulingPolicyFulfilled
          status: "True"
          type: Scheduled
        observedCRPGeneration: 1
        targetClusters:
        - clusterName: redacted
          reason: picked by scheduling policy
          selected: true


In this part of the code we can clearly see the status update call is made unconditionally:
https://github.com/Azure/fleet/blob/cb9a7a0b305ae8a2875fa9e3ea9fb9f8d134b27b/pkg/controllers/clusterresourceplacement/controller.go#L193-L201

### **Expected behavior**

The controller should do `apiequality.Semantic.DeepEqual(old.status, new.status)` and skip the status update when nothing has changed, since there's no reason to make this API call.

This would let full resyncs and controller startup complete quickly, and reduce the load on the API Server.
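A minimal sketch of the compare-before-write idea, not the actual fleet code. To keep it runnable with only the standard library, it uses `reflect.DeepEqual` as a stand-in for `apiequality.Semantic.DeepEqual` (the two do not behave identically in every case), and `PlacementStatus`, `Condition`, `StatusWriter`, `countingWriter`, and `updateStatusIfChanged` are all hypothetical names made up for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// Condition and PlacementStatus are hypothetical, trimmed-down stand-ins for
// the real status types; they exist only to keep the sketch self-contained.
type Condition struct {
	Type, Status, Reason string
	ObservedGeneration   int64
	LastTransitionTime   string
}

type PlacementStatus struct {
	Conditions []Condition
}

// StatusWriter is a hypothetical stand-in for the client's status writer.
type StatusWriter interface {
	UpdateStatus(name string, status PlacementStatus) error
}

// countingWriter records how many writes would have reached the API server.
type countingWriter struct{ calls int }

func (w *countingWriter) UpdateStatus(string, PlacementStatus) error {
	w.calls++
	return nil
}

// updateStatusIfChanged skips the write when the desired status equals the
// status already stored. reflect.DeepEqual stands in for
// apiequality.Semantic.DeepEqual here (assumption, see lead-in).
func updateStatusIfChanged(w StatusWriter, name string, current, desired PlacementStatus) (bool, error) {
	if reflect.DeepEqual(current, desired) {
		return false, nil // no-op: nothing to send
	}
	if err := w.UpdateStatus(name, desired); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	stored := PlacementStatus{Conditions: []Condition{{
		Type: "Scheduled", Status: "True", Reason: "SchedulingPolicyFulfilled",
		ObservedGeneration: 1, LastTransitionTime: "2024-09-19T00:06:37Z",
	}}}
	w := &countingWriter{}

	// The recomputed status is identical, so no write is issued.
	wrote, _ := updateStatusIfChanged(w, "example-crp", stored, stored)
	fmt.Println("wrote:", wrote, "calls:", w.calls)

	// A real change still goes through.
	changed := PlacementStatus{Conditions: []Condition{{Type: "Scheduled", Status: "False"}}}
	wrote, _ = updateStatusIfChanged(w, "example-crp", stored, changed)
	fmt.Println("wrote:", wrote, "calls:", w.calls)
}
```

In the real controller this would presumably mean keeping a copy of the status from before `setPlacementStatus` runs and comparing against it before calling `Status().Update`.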

### **Screenshots**

Attached above

### **Additional context**

N/A


The unconditional status update referenced above (`controller.go#L193-L201`):

	isClusterScheduled, err := r.setPlacementStatus(ctx, crp, selectedResourceIDs, latestSchedulingPolicySnapshot, latestResourceSnapshot)
	if err != nil {
		return ctrl.Result{}, err
	}

	if err := r.Client.Status().Update(ctx, crp); err != nil {
		klog.ErrorS(err, "Failed to update the status", "clusterResourcePlacement", crpKObj)
		return ctrl.Result{}, err
	}


The `ResyncPeriod` option referenced above (`options.go#L66-L67`):

	// ResyncPeriod is the base frequency the informers are resynced. Defaults is 5 minutes.
	ResyncPeriod metav1.Duration
