Fix API mismatch for resource-handler#87

Merged
rytswd merged 20 commits into main from fix-resource-handler-to-match-with-api-change
Jan 1, 2026

rytswd (Member) commented Dec 24, 2025

This ensures all the API changes are incorporated into resource-handler. Some command-line arguments have also been corrected.

@rytswd rytswd marked this pull request as ready for review December 29, 2025 11:09

rytswd (Member, Author) commented Dec 29, 2025

I haven't done go.work + main.go testing in my local kind. I'll want to do that before merging this -- will work on that a bit later.

cells := shard.Spec.MultiOrch.Cells
if len(cells) == 0 {
return ctrl.Result{}, fmt.Errorf(
"MultiOrch has no cells specified - cannot deploy without cell information",

Collaborator (review comment on the lines above):

This is a deviation from the design document:

"The Operator will deploy one instance of this Deployment into EVERY Cell listed in 'cells'. If 'cells' is empty, it defaults to all cells where pools are defined."

Member, Author (reply):

Hmm, I see. Can we consider explicitly assigning the list of cells from the top-level resource (MultigresCluster) to the Shard resources instead? Because template resolution results in a very explicit spec for child resources, this "defaulting" logic for child resources feels a bit mismatched.

For now, I changed the logic to follow the documented spec with 105fac6.
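
The fallback quoted from the design document can be sketched roughly as follows. This is a minimal illustration, not the code from 105fac6: the `ShardSpec` and `PoolSpec` types, the `MultiOrchCells` field, and the `resolveMultiOrchCells` helper are simplified, hypothetical stand-ins for the real API types.

```go
package main

import (
	"fmt"
	"sort"
)

// PoolSpec and ShardSpec are hypothetical, trimmed-down stand-ins
// for the operator's API types.
type PoolSpec struct {
	Cells []string
}

type ShardSpec struct {
	MultiOrchCells []string
	Pools          map[string]PoolSpec
}

// resolveMultiOrchCells returns the explicit cell list when one is set.
// Otherwise it falls back to the union of all cells referenced by pools,
// and returns an error only when that union is empty as well.
func resolveMultiOrchCells(spec ShardSpec) ([]string, error) {
	if len(spec.MultiOrchCells) > 0 {
		return spec.MultiOrchCells, nil
	}
	seen := map[string]struct{}{}
	for _, pool := range spec.Pools {
		for _, cell := range pool.Cells {
			seen[cell] = struct{}{}
		}
	}
	if len(seen) == 0 {
		return nil, fmt.Errorf("no cells set on MultiOrch or on any pool")
	}
	cells := make([]string, 0, len(seen))
	for cell := range seen {
		cells = append(cells, cell)
	}
	sort.Strings(cells) // map iteration order is random; sort for stable results
	return cells, nil
}

func main() {
	spec := ShardSpec{
		Pools: map[string]PoolSpec{
			"primary": {Cells: []string{"zone1", "zone2"}},
			"replica": {Cells: []string{"zone2"}},
		},
	}
	cells, err := resolveMultiOrchCells(spec)
	fmt.Println(cells, err)
}
```

Sorting the result keeps generated resource names and status output deterministic across reconciles, which matters because Go randomizes map iteration order.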

fernando-villalba (Collaborator) commented:

In pkg/resource-handler/controller/shard/shard_controller.go, the updateStatus function was not updated to handle the new multi-cell topology. It still attempts to find a single StatefulSet per pool using the old naming convention.

The Problem:
The code iterates over pools and constructs the name using buildPoolName(shard.Name, poolName):

// pkg/resource-handler/controller/shard/shard_controller.go on MAIN branch

// ...
for poolName := range shard.Spec.Pools {
    stsName := buildPoolName(shard.Name, poolName) // <--- Returns "shard-pool-primary"
    sts := &appsv1.StatefulSet{}
    err := r.Get(ctx, ..., Name: stsName, sts)
    // ...
}

However, your reconciliation logic (reconcilePool) now creates one StatefulSet per cell using buildPoolNameWithCell.

  • Created Resource: my-shard-pool-primary-zone1
  • Status Check Lookups: my-shard-pool-primary

Why Tests Passed:
The Get returns NotFound, and your code catches that error and continues. This means the function completes successfully (with 0 pods found), satisfying test coverage but resulting in broken status reporting (PoolsReady will never be true).

Fix:
You need to nest a loop over the cells within the pool loop to aggregate status from all cell-specific StatefulSets for that pool.

for poolName, pool := range shard.Spec.Pools {
    for _, cell := range pool.Cells {
        stsName := buildPoolNameWithCell(shard.Name, poolName, string(cell))
        // ... Get and aggregate ...
    }
}
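
To make the mismatch and the proposed aggregation concrete, here is a self-contained sketch that replaces the Kubernetes client with an in-memory map. Everything in it is an assumption for illustration: `fakeCluster`, `stsStatus`, `poolNameWithCell`, and `aggregatePool` are hypothetical names, and the naming scheme is inferred only from the two resource names quoted above.

```go
package main

import "fmt"

// stsStatus is a hypothetical stand-in for the StatefulSet status fields
// that matter for readiness.
type stsStatus struct {
	Replicas      int32
	ReadyReplicas int32
}

// fakeCluster maps StatefulSet names to their status.
// A missing key models a NotFound response from the API server.
type fakeCluster map[string]stsStatus

// poolNameWithCell uses a naming scheme assumed from the names quoted
// in the review, e.g. "my-shard-pool-primary-zone1".
func poolNameWithCell(shard, pool, cell string) string {
	return fmt.Sprintf("%s-pool-%s-%s", shard, pool, cell)
}

// aggregatePool sums replica counts across every cell-specific StatefulSet
// of one pool. Missing StatefulSets are skipped, mirroring the NotFound
// handling described in the review.
func aggregatePool(c fakeCluster, shard, pool string, cells []string) (total, ready int32) {
	for _, cell := range cells {
		st, ok := c[poolNameWithCell(shard, pool, cell)]
		if !ok {
			continue
		}
		total += st.Replicas
		ready += st.ReadyReplicas
	}
	return total, ready
}

func main() {
	cluster := fakeCluster{
		"my-shard-pool-primary-zone1": {Replicas: 3, ReadyReplicas: 3},
		"my-shard-pool-primary-zone2": {Replicas: 3, ReadyReplicas: 2},
	}

	// Old lookup: the per-pool name no longer exists, so nothing is counted.
	_, found := cluster["my-shard-pool-primary"]
	fmt.Println("old lookup found:", found)

	// New lookup: one name per cell, aggregated into a pool-level result.
	total, ready := aggregatePool(cluster, "my-shard", "primary", []string{"zone1", "zone2"})
	fmt.Println("total:", total, "ready:", ready, "pool ready:", total > 0 && ready == total)
}
```

Requiring `total > 0` before reporting a pool as ready guards against the exact failure mode described above, where zero StatefulSets found would otherwise look like "all replicas ready".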

fernando-villalba (Collaborator) commented:

You don't need to address this comment now, but I thought I would mention it so we can be prepared soon.

This is a follow-up to the resolver module I created in preparation for the webhook (to avoid cases where kubectl get doesn't show the actual values being used, among other things). The pkg/resolver package should now serve as a Single Source of Truth for defaults, usable by both the Mutating Webhook (when enabled) and the Reconciler (when disabled).

Here are the specific areas we will need to refactor to align with that design:

1. Centralizing defaults (Single Source of Truth)
Currently, this PR defines local constants like DefaultMultiGatewayImage in multigateway.go. My branch also defines DefaultMultiGatewayImage in pkg/resolver/defaults.go.

  • The Risk: If these drift, the Webhook might default the Parent CR to version v1.0, but if a user omits a field on the Child CR, this controller might default it to v1.1.
  • The Fix: Once merged, we should delete the constants in resource-handler and import them directly from pkg/resolver to guarantee consistency.

2. The "Read-Only Child" & Explicit Specs
The design states that Child CRs (Cell, Shard) are "read-only" and strictly owned by the MultigresCluster.

  • Current Logic: Your code creates defensive defaults (e.g., if spec.Image == "" { image = Default }).
  • Target Logic: The MultigresCluster controller will use resolver to calculate all these values before creating the Child CR. Therefore, the Child CR's .Spec should arrive fully populated.
  • Future Change: We should move away from "Just-in-Time" defaulting inside these child controllers. Instead, they should ideally trust that the Spec is populated (or error out if it's missing, alerting us that the Parent Controller failed to resolve defaults). This ensures that kubectl get cell always matches exactly what is running.
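
A child controller that trusts its Spec and fails loudly could look roughly like the sketch below. It is an illustration under assumptions, not the operator's code: `CellSpec`, its `MultiGatewayImage` field, `errUnresolvedSpec`, and `gatewayImage` are hypothetical names, and the image reference is a placeholder.

```go
package main

import (
	"errors"
	"fmt"
)

// CellSpec is a hypothetical, trimmed-down child spec.
type CellSpec struct {
	MultiGatewayImage string
}

// errUnresolvedSpec signals that the parent controller did not populate
// a field the child controller depends on.
var errUnresolvedSpec = errors.New("spec not fully resolved by parent controller")

// gatewayImage returns the image from the spec and refuses to default it.
// An empty value is treated as a bug in the parent's resolution step.
func gatewayImage(spec CellSpec) (string, error) {
	if spec.MultiGatewayImage == "" {
		return "", fmt.Errorf("multigateway image is empty: %w", errUnresolvedSpec)
	}
	return spec.MultiGatewayImage, nil
}

func main() {
	// Unresolved spec: the child reports an error instead of guessing.
	_, err := gatewayImage(CellSpec{})
	fmt.Println("error:", err)
	fmt.Println("is unresolved:", errors.Is(err, errUnresolvedSpec))

	// Resolved spec: the value is used exactly as written.
	img, _ := gatewayImage(CellSpec{MultiGatewayImage: "example.invalid/multigateway:v1.0"})
	fmt.Println("image:", img)
}
```

Wrapping a sentinel error lets the reconciler distinguish "parent has not resolved the spec yet" from transient API failures, for example to surface it as a status condition rather than retrying blindly.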

3. Webhook Enablement/Disablement
The resolver package is designed to be called explicitly by the Parent Controller if the Webhook is disabled.

// Inside MultigresCluster Reconcile loop
resolvedCluster := resolver.Resolve(cluster)
// Now create Child CRs using resolvedCluster.Spec...

This works regardless of whether a webhook ran previously. If the webhook ran, cluster comes in fully populated. If not, Resolve populates it in memory.

This means the Cell CR will always receive explicit values from the Parent, regardless of whether a webhook ran. This makes the defensive defaulting in this PR redundant in the long term.
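
The property that makes this safe is that resolution is idempotent: running it on an already-populated object changes nothing. The sketch below illustrates that property only. The `Cluster` and `ClusterSpec` types, the `resolve` function, and the default image value are hypothetical, and the real `resolver.Resolve` signature may differ.

```go
package main

import "fmt"

// defaultMultiGatewayImage is a placeholder; in the design discussed above
// the real value would live in pkg/resolver/defaults.go.
const defaultMultiGatewayImage = "example.invalid/multigateway:default"

// ClusterSpec and Cluster are hypothetical, trimmed-down parent types.
type ClusterSpec struct {
	MultiGatewayImage string
}

type Cluster struct {
	Name string
	Spec ClusterSpec
}

// resolve returns a copy of the cluster with empty fields filled in.
// It never mutates its input, and applying it twice gives the same
// result as applying it once.
func resolve(in Cluster) Cluster {
	out := in // plain struct copy; all fields here are value types
	if out.Spec.MultiGatewayImage == "" {
		out.Spec.MultiGatewayImage = defaultMultiGatewayImage
	}
	return out
}

func main() {
	// Webhook disabled: the object arrives with gaps and is filled in memory.
	bare := Cluster{Name: "demo"}
	fmt.Println(resolve(bare).Spec.MultiGatewayImage)

	// Webhook enabled: the object arrives populated and passes through unchanged.
	populated := Cluster{Name: "demo", Spec: ClusterSpec{MultiGatewayImage: "example.invalid/multigateway:v1.0"}}
	fmt.Println(resolve(populated) == populated)
}
```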

@github-actions

🔬 Go Test Coverage Report

Summary

Coverage Type            Result
Threshold                0%
Previous Test Coverage   Unknown%
New Test Coverage        100.0%

Status

✅ PASS

Detail

github.com/numtide/multigres-operator/pkg/resource-handler/controller/cell/cell_controller.go:32:		Reconcile			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/cell/cell_controller.go:82:		handleDeletion			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/cell/cell_controller.go:106:		reconcileMultiGatewayDeployment	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/cell/cell_controller.go:140:		reconcileMultiGatewayService	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/cell/cell_controller.go:175:		updateStatus			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/cell/cell_controller.go:202:		buildConditions			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/cell/cell_controller.go:237:		SetupWithManager		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/cell/multigateway.go:38:			BuildMultiGatewayDeployment	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/cell/multigateway.go:119:			BuildMultiGatewayService	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/metadata/labels.go:95:			BuildStandardLabels		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/metadata/labels.go:108:			AddCellLabel			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/metadata/labels.go:114:			AddClusterLabel			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/metadata/labels.go:120:			AddShardLabel			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/metadata/labels.go:126:			AddDatabaseLabel		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/metadata/labels.go:132:			AddTableGroupLabel		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/metadata/labels.go:141:			MergeLabels			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/containers.go:37:			buildPostgresContainer		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/containers.go:68:			buildMultiPoolerSidecar		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/containers.go:111:			buildPgctldInitContainer	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/containers.go:133:			buildMultiOrchContainer		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/containers.go:163:			buildPgctldVolume		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/containers.go:174:			getPoolServiceID		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/labels.go:11:			buildPoolLabelsWithCell		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/multiorch.go:24:			BuildMultiOrchDeployment	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/multiorch.go:72:			BuildMultiOrchService		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/multiorch.go:102:			buildMultiOrchNameWithCell	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/multiorch.go:107:			buildMultiOrchLabelsWithCell	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/pool_service.go:21:			BuildPoolHeadlessService	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/pool_service.go:56:			buildPoolNameWithCell		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/pool_statefulset.go:32:		BuildPoolStatefulSet		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/pool_statefulset.go:99:		buildPoolVolumeClaimTemplates	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/ports.go:27:			buildMultiPoolerContainerPorts	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/ports.go:49:			buildPoolHeadlessServicePorts	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/ports.go:74:			buildMultiOrchContainerPorts	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/ports.go:91:			buildMultiOrchServicePorts	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:32:		Reconcile			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:103:		handleDeletion			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:127:		reconcileMultiOrchDeployment	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:165:		reconcileMultiOrchService	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:205:		reconcilePool			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:243:		reconcilePoolStatefulSet	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:283:		reconcilePoolHeadlessService	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:324:		updateStatus			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:359:		updatePoolsStatus		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:396:		updateMultiOrchStatus		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:439:		cellSetToSlice			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:448:		buildConditions			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:478:		getMultiOrchCells		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/shard/shard_controller.go:512:		SetupWithManager		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/storage/pvc.go:18:			BuildPVCTemplate		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/container_env.go:13:		buildContainerEnv		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/container_env.go:41:		buildPodIdentityEnv		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/container_env.go:66:		buildEtcdConfigEnv		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/container_env.go:116:		buildEtcdClusterPeerList	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/ports.go:20:			buildContainerPorts		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/ports.go:48:			buildHeadlessServicePorts	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/ports.go:78:			buildClientServicePorts		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/service.go:17:			BuildHeadlessService		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/service.go:48:			BuildClientService		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/statefulset.go:39:		BuildStatefulSet		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/statefulset.go:118:		buildVolumeClaimTemplates	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/toposerver_controller.go:32:	Reconcile			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/toposerver_controller.go:91:	reconcileStatefulSet		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/toposerver_controller.go:128:	reconcileHeadlessService	100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/toposerver_controller.go:166:	reconcileClientService		100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/toposerver_controller.go:204:	updateStatus			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/toposerver_controller.go:234:	buildConditions			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/toposerver_controller.go:262:	handleDeletion			100.0%
github.com/numtide/multigres-operator/pkg/resource-handler/controller/toposerver/toposerver_controller.go:286:	SetupWithManager		100.0%
total:														(statements)			100.0%

rytswd (Member, Author) commented Dec 31, 2025

Re:

  1. Centralizing defaults (Single Source of Truth)

I understand where you are coming from, but with our current multi-Go-module setup, one controller could be using one version while other controllers use another. This was a deliberate design: it gives full control over how the operator handles such cases, including multi-version support (e.g. two minor releases supported at all times, where a patch needs to adjust only part of the logic). I do think such cases would be rather rare, but if we want everything controlled in a single place, we must merge the modules into one (namely resolver, resource-handler, and data-handler). Please note that this would bring the upstream Multigres dependency into the entire codebase.

  2. The "Read-Only Child" & Explicit Specs & 3. Webhook Enablement/Disablement

I like this approach more: we simply would not use any explicit version information in the resource-handler code, and we should return an error if the image version is not found. Should we merge this with the current logic and create an issue to fix it up later? Or do you want me to update the logic in this PR already?

fernando-villalba (Collaborator) commented:

> Re:
>
>   1. Centralizing defaults (Single Source of Truth)
>
> I understand where you are coming from, but with our current multi-Go-module setup, one controller could be using one version while other controllers use another. This was a deliberate design: it gives full control over how the operator handles such cases, including multi-version support (e.g. two minor releases supported at all times, where a patch needs to adjust only part of the logic). I do think such cases would be rather rare, but if we want everything controlled in a single place, we must merge the modules into one (namely resolver, resource-handler, and data-handler). Please note that this would bring the upstream Multigres dependency into the entire codebase.
>
>   2. The "Read-Only Child" & Explicit Specs & 3. Webhook Enablement/Disablement
>
> I like this approach more: we simply would not use any explicit version information in the resource-handler code, and we should return an error if the image version is not found. Should we merge this with the current logic and create an issue to fix it up later? Or do you want me to update the logic in this PR already?

Yeah, let's proceed with the PR as it is and correct it later. At this point we should aim for functionality for next week, not perfection; we can circle back afterwards.

@rytswd rytswd merged commit 94576a4 into main Jan 1, 2026
3 checks passed
@rytswd rytswd deleted the fix-resource-handler-to-match-with-api-change branch January 1, 2026 06:34