Skip to content

Conversation

@fernando-villalba
Copy link
Collaborator

I will be closing PR 78 in favour of this one as it is using all the latest version of the API and code in main, making for a cleaner merge.

Previously, the controller logic in `reconcileGlobalComponents` enforced a single `CoreTemplate` for the entire control plane. If a `GlobalTopoServer` template was defined, it would shadow and ignore the `MultiAdmin` template reference, forcing both components to share the same configuration source.

This commit refactors the reconciliation logic to resolve the `CoreTemplate` independently for both components, allowing them to reference different templates as implied by the API design.

Changes:
- Refactor `reconcileGlobalComponents` to resolve `topoTpl` and `adminTpl` separately.
- Add test case `Create: Independent Templates (Topo vs Admin)` to verify disjoint template usage.
- Add test case `Error: Resolve Admin Template Failed` to ensure full coverage of the new error path.
Adds documentation comments to exported types and methods in the controller
packages to comply with Go style guidelines requiring all exported identifiers
to be documented.

Updates:
- MultigresClusterReconciler: Added docs for struct and methods.
- TableGroupReconciler: Added docs for struct and methods.
- TemplateResolver: Added docs for struct, resolve methods, and merge logic helpers.
- Reordered Kubebuilder markers to appear after the documentation comments, ensuring correct summary generation in godoc.
Update `getGlobalTopoRef` to use the TemplateResolver, ensuring child resources receive the correct topology address when a `CoreTemplate` is used. Previously, this only supported inline specs.

Note: This commit includes updated test cases that cover both this fix and a separate name-length validation bug. The tests are expected to fail until the name-length validation logic is implemented in the next commit.
Enforce a strict 50-character limit on the combined `Cluster-Database-TableGroup` name within `reconcileDatabases`.

Previously, the controller only logged an informational message if the name exceeded 63 characters, allowing the creation of `TableGroup` resources that would inevitably cause downstream failures. The `TableGroup` controller appends suffixes (e.g., `-shard-0`) to create child resources; if the base name is too long, these child names exceed the Kubernetes 63-character limit, resulting in a crash loop.

This commit:
- Returns a hard error if the generated name exceeds 50 characters, blocking the invalid configuration immediately.
- Updates `multigrescluster_controller_test.go` to assert that an error is returned for long names, resolving the previously failing test case.
…tion specs

- Add targeted test cases to cover error propagation paths from `getGlobalTopoRef` during Cell and Database reconciliation, ensuring 100% coverage for topology resolution logic.
- Update the "Create: Long Names" test expectation to `expectError: true` to align with the newly enforced 50-character limit on TableGroup names.
Updates the `Create: Full Cluster Creation with Templates` and `Create: Independent Templates (Topo vs Admin)` test cases in `multigrescluster_controller_test.go` to assert that child `Cell` resources are created with the correct `GlobalTopoServer.Address`.

This regression test ensures that the `getGlobalTopoRef` logic correctly resolves the Global TopoServer address from templates, preventing a previous issue where it would default to an empty string if no inline spec was provided.
Implements logic in `reconcileDatabases` to automatically populate
`MultiOrch.Cells` if it is empty. It defaults to the set of all unique
cells defined in the Shard's pools, ensuring the orchestration layer
is deployed alongside data nodes as per the design specification.

Adds regression test "Create: MultiOrch Placement Defaulting".
Updates `ResolveCoreTemplate`, `ResolveCellTemplate`, and `ResolveShardTemplate`
to return an error if a specific template name (explicitly requested or
cluster default) is not found.

Maintains the "safe fallback" behavior only for the implicit "default"
template lookup, ensuring that user typos or missing critical config
fail-fast while allowing Level 4 defaults to apply for standard setups.
…k preparation

This change extracts hardcoded default values (replica counts, template names) from the MultigresCluster controller logic into a dedicated `constants.go` file.

Key changes:
- Created `constants.go` to hold Level 4 defaults (e.g., DefaultEtcdReplicas=3).
- Updated `template_logic.go` and `multigrescluster_controller.go` to use these constants.
- Updated controller tests to assert against the constant rather than a magic number, ensuring tests remain robust if defaults change.
- Added code comments explicitly noting that these files should eventually move to a shared `pkg/multigres/defaults` package to support the future Mutating Webhook without circular dependencies.

This ensures that both the reconciliation loop and the future admission webhook will rely on a Single Source of Truth for defaulting logic.
Copy link
Member

@rytswd rytswd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked the MC spec for now -- mostly looking good for the main code logic!
The test cases can have some clean-up to make it easier to read, and make sure test helper functions do not pollute the stack trace.

As long as it's got 100% test coverage, I want to see test cases one by one to see we have edge "business" case testing in place. Once that's confirmed, I'm not going to bother too much about the coding style. But at the moment, it's difficult to look at the list of test cases (for various reasons, such as using slice instead of map, single test covers both success and fail, unclear setup steps, etc.)

Comment on lines +36 to +41
// +kubebuilder:rbac:groups=multigres.com,resources=multigresclusters,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=multigres.com,resources=multigresclusters/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=multigres.com,resources=multigresclusters/finalizers,verbs=update
// +kubebuilder:rbac:groups=multigres.com,resources=coretemplates;celltemplates;shardtemplates,verbs=get;list;watch
// +kubebuilder:rbac:groups=multigres.com,resources=cells;tablegroups;toposervers,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, these markers won't be read by default. This should be corrected in Makefile. (For Resource Handler use cases, I moved the RBAC spec out from the controller temporarily.)

Refactors `reconcileGlobalComponents` in the MultigresCluster controller by splitting it into two distinct methods: `reconcileGlobalTopoServer` and `reconcileMultiAdmin`.

This improves code clarity and readability by isolating the reconciliation logic for each global component, addressing feedback to avoid context switching within a single function.
Moves the deletion lifecycle logic (finalizer checking, child resource cleanup validation, and finalizer removal) out of the main `Reconcile` loop into a dedicated `handleDelete` method.

This improves the readability of the main reconciliation flow by clearly separating the deletion path from the creation/update path, reducing cognitive load in the primary `Reconcile` function.
Refactors the table-driven tests in `TestMultigresClusterReconciler_Reconcile` to use a map keyed by the test name instead of a slice of structs.

This enforces unique test names and improves the clarity and maintainability of the test suite by explicitly associating the test logic with its description.
Adds `t.Parallel()` to `TestMultigresClusterReconciler_Reconcile` and its subtests, as well as other unit tests in the MultigresCluster controller test suite.

This allows independent test cases to run concurrently, reducing the overall execution time of the test suite.
Replaces usage of `context.Background()` with `t.Context()` in the `MultigresCluster` controller tests.

This change improves test hygiene by ensuring that the context passed to the reconciler and client calls is automatically canceled when the test finishes. unlike `context.Background()`, which is never canceled, `t.Context()` ensures that any goroutines or operations waiting on the context are terminated immediately upon test completion, preventing resource leaks and ensuring better isolation between parallel tests.
Updates `TestMultigresClusterReconciler_Reconcile` to check all errors returned during test setup and validation phases.

Previously, errors from client operations (e.g., `c.Create`, `c.Get`) or setup functions were sometimes ignored. This change ensures that any failure in test prerequisites or validation immediately fails the test using `t.Fatal` or `t.Fatalf`. This prevents silent failures where a test might technically "pass" despite the setup failing, and provides immediate, clear feedback when the test environment is not in the expected state.
Updates the `setupFunc` and `validate` function signatures in the `MultigresCluster` controller tests to accept `testing.TB` instead of `*testing.T`.

This change decouples the test helper logic from the specific test runner implementation. `testing.TB` is an interface satisfied by both `*testing.T` (unit tests) and `*testing.B` (benchmarks). By using this interface, these setup and validation helpers can now be reused seamlessly in benchmark functions (e.g., measuring the performance of the reconciliation loop) without requiring code duplication or modification.
… suites

Refactors the monolithic `TestMultigresClusterReconciler_Reconcile` function into two separate test functions: `TestMultigresClusterReconciler_Reconcile_Success` and `TestMultigresClusterReconciler_Reconcile_Failure`.

This separation clarifies the intent of each test case and simplifies the test table structure by removing the need for an `expectError` field. It also introduces a `setupFixtures` helper to share common initialization logic while ensuring test isolation.
Renames the `cluster` field in the `MultigresCluster` controller test cases to `multigrescluster`.

The variable name `cluster` is ambiguous in this context, as it could refer to the underlying Kubernetes cluster or the `MultigresCluster` custom resource. Using the explicit name `multigrescluster` improves clarity and avoids potential confusion regarding which entity is being referenced in the test fixturs.
Refactors the `MultigresCluster` controller tests to remove anonymous immediate functions (IIFEs) used for initializing struct fields.

Instead of defining complex logic inline within the `multigrescluster` field, this change introduces a `preReconcileUpdate` function hook in the test table. This allows the test cases to define a base object cleanly and apply specific mutations (like clearing finalizers or modifying specs) in a dedicated, readable function step before the reconciliation loop runs.
Removes the `setupFunc` field from the `MultigresCluster` controller test cases.

The `setupFunc` was previously used to imperatively set up object states (like Status) before the reconciliation loop. This complexity is unnecessary because the controller-runtime `fake` client allows initializing objects with populated Status fields directly in the `existingObjects` slice.

Additionally, this commit cleans up the `_Success` test suite struct by removing the unused `failureConfig` and `expectError` fields, which were leftovers from the previous monolithic test design.
Simplifies the `finalClient` initialization logic in `TestMultigresClusterReconciler_Reconcile_Failure` by removing an unnecessary `else` block.

The client is now initialized with the default `baseClient` upfront, and conditionally wrapped with `NewFakeClientWithFailures` only if a failure configuration is present. This reduces indentation and makes the default behavior more explicit.
Refactors the `TableGroup` controller tests to align with the patterns established in the `MultigresCluster` controller test suite.

Changes include:
- Splitting `TestTableGroupReconciler_Reconcile` into `_Success` and `_Failure` suites for better organization.
- Converting test cases to a map-based structure to enforce unique names and order independence.
- Adding `t.Parallel()` to all tests to enable concurrent execution.
- Replacing `context.Background()` with `t.Context()` for proper test lifecycle management.
- Updating helpers to use the `testing.TB` interface for broader reusability.
- Introducing a `preReconcileUpdate` hook and `skipCreate` flag to simplify test fixtures and remove imperative logic from the test table.
- Enforcing strict error handling in all setup and validation steps to prevent silent failures.
Copy link
Member

@rytswd rytswd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just leaving the comments for success cases first

…ets. Currently the budgets may be on the conservative side. We may need to adjust accordingly as we go.
- api/v1alpha1/common_types.go: Re-enable `XValidation` for `PodAnnotations` and `PodLabels` but optimize the rule cost by reducing `MaxProperties` to 20. Removed invalid `MaxLength` markers on map fields to fix `controller-gen` build errors. This ensures validation remains active while staying within the Kubernetes API server's estimated cost budget.
- pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller_test.go: Refactor the test runner to default `tc.multigrescluster` to `baseCluster.DeepCopy()`. This removes repetitive boilerplate from individual test cases. Moved inline cluster modifications to `preReconcileUpdate` for better consistency across tests.
- pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller_test.go: Removed the `setupFunc` hook from the test table. The `fake` client allows defining objects with populated `Status` fields directly in `existingObjects`, making the manual creation and status update steps in "Status: Aggregation Logic" redundant. This simplifies the test runner logic and declarative test definitions.
…n cluster tests

- pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller_test.go:
    - Remove `setupFunc` from the test table structure as it was unnecessary with the `fake` client.
    - Remove redundant IIFE patterns from `existingObjects` in failure tests, relying on the test runner to create the initial cluster instead.
    - Update the test runner to intelligently handle `DeletionTimestamp`. Since `client.Create` strips metadata like timestamps, the runner now checks for a deletion timestamp on the input object and conditionally triggers a `client.Delete` to correctly simulate resources in a terminating state.
    - Ensure `baseCluster` defaults are applied consistently when `tc.multigrescluster` is nil.
…objects

- pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller_test.go:
    - Update test runners to default `existingObjects` to the standard set of templates (`coreTpl`, `cellTpl`, `shardTpl`) if `nil`.
    - Remove explicit `existingObjects` definitions from test cases where the default set is sufficient (e.g., `Create: MultiOrch Placement Defaulting`, `Create: MultiAdmin TemplateRef Only`).
    - This declutters the test table and focuses each test case on the specific overrides or logic being tested.
- pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller_test.go: Replace `cmp.Diff` with standard `!=` operators for verifying scalar fields (strings, integers). This aligns with Go style guidelines to prefer simple comparison operators for simple values, reserving `cmp.Diff` for complex structures where a unified diff is necessary for debugging.
…uce boilerplate

- pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller_test.go:
    - Update test runners to default `existingObjects` to the standard set of templates (`coreTpl`, `cellTpl`, `shardTpl`) if the field is `nil`.
    - Remove explicit `existingObjects` definitions from test cases where the default set is sufficient (e.g., `Create: MultiOrch Placement Defaulting`), significantly reducing test table clutter.
- pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller_test.go: Removed the `Create: No Global Topo Config` test case. The logic for handling empty configurations and default wiring is already covered by unit tests in `template_logic.go` and other existing controller tests (e.g., `Create: Defaults and Optional Components`), reducing test suite noise.
- pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller_test.go: Removed the `Create: No Global Topo Config` test case. The logic for handling empty configurations and default wiring is already covered by unit tests in `template_logic.go` and other existing controller tests (e.g., `Create: Defaults and Optional Components`), reducing test suite noise.
… in tests

- pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller_test.go:
    - Introduce `skipClusterCreation` boolean field to the test table struct to explicitly control whether the initial `MultigresCluster` object should be created.
    - Update the test runner to inject the cluster object into `existingObjects` before building the fake client, rather than manually calling `client.Create` inside the loop. This simplifies the runner and ensures correct state initialization (including `DeletionTimestamp` handling) via the client builder.
    - Remove brittle logic that relied on matching the test case name string ("Object Not Found") to skip creation.
    - Update "Object Not Found (Clean Exit)" test case to use the new flag.
- pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller_test.go: Replace `var finalClient client.Client; finalClient = baseClient` with `finalClient := baseClient` in the success test runner. This adheres to Go style guidelines by avoiding unnecessary variable declarations when a short assignment is possible.
- pkg/cluster-handler/controller/tablegroup/tablegroup_controller_test.go:
    - Update `setupFixtures` to accept `testing.TB` and mark it as a test helper for better failure reporting.
    - Switch to injecting `TableGroup` objects via the fake client builder's `existingObjects` rather than manually creating them inside the test loop. This simplifies the test setup and ensures consistent client state initialization.
    - Replace `cmp.Diff` with standard equality checks (`!=`) for simple scalar field assertions (strings, ints, bools) to align with Go idioms.
    - Adopt short variable declarations (`finalClient :=`) and remove unnecessary blank lines before error checks to adhere to codebase style guidelines.
…roller setup

Refactors the integration tests for `multigrescluster` and adds new tests for `tablegroup` to match the style and patterns used in the `resource-handler`.

Changes:
- **Controller Setup:** Updated `MultigresClusterReconciler` and `TableGroupReconciler` to accept variadic `controller.Options` in `SetupWithManager`. This allows injecting `SkipNameValidation: true` for parallel testing, consistent with the Shard controller.
- **MultigresCluster Tests:** Rewrote `integration_test.go` to remove `gomega`, leverage the internal `testutil` package, and use table-driven tests. Fixed test fixtures to include required `RootPath` and explicit cell assignments.
- **TableGroup Tests:** Added a new `integration_test.go` for TableGroups using the `testutil` pattern.
- **Unit Tests:** Updated `_controller_test.go` files to maintain 100% coverage by exercising the new variadic options logic.
Corrects syntax errors in `.github/workflows/_reusable-test-coverage.yaml` that were causing "Invalid workflow file" errors.

Changes:
- Fixed misalignment of the `run` block in the "Fetch base branch" step.
- Corrected the indentation of the "Check coverage status" step to align with the rest of the job steps.
Fixes CI failures caused by out-of-sync modules and syntax errors in the reusable coverage workflow.

Changes:
- **Build:** Ran `make modules-tidy` to update `go.sum` with missing transitive dependencies (protobuf, prometheus, etc.) that were causing build failures.
- **Workflow Syntax:** Fixed YAML indentation in `.github/workflows/_reusable-test-coverage.yaml` for the "Fetch base branch" and "Check coverage status" steps.
- **Coverage Script:** Updated the JavaScript reporting logic to robustly handle empty or undefined coverage environment variables, preventing `SyntaxError` crashes when the build fails.
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions
Copy link

🔬 Go Test Coverage Report

Summary

Coverage Type Result
Threshold 0%
Previous Test Coverage Unknown%
New Test Coverage 100.0%

Status

✅ PASS

Detail

Show New Coverage
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller.go:43:	Reconcile			100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller.go:99:	handleDelete			100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller.go:115:	checkChildrenDeleted		100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller.go:146:	reconcileGlobalComponents	100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller.go:160:	reconcileGlobalTopoServer	100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller.go:204:	reconcileMultiAdmin		100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller.go:264:	reconcileCells			100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller.go:341:	reconcileDatabases		100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller.go:451:	getGlobalTopoRef		100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller.go:485:	updateStatus			100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/multigrescluster_controller.go:565:	SetupWithManager		100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/template_logic.go:39:			ResolveCoreTemplate		100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/template_logic.go:69:			ResolveCellTemplate		100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/template_logic.go:99:			ResolveShardTemplate		100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/template_logic.go:129:			MergeCellConfig			100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/template_logic.go:160:			MergeShardConfig		100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/template_logic.go:199:			mergeStatelessSpec		100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/template_logic.go:227:			mergeMultiOrchSpec		100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/template_logic.go:237:			mergePoolSpec			100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/template_logic.go:267:			ResolveGlobalTopo		100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster/template_logic.go:287:			ResolveMultiAdmin		100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/tablegroup/tablegroup_controller.go:32:			Reconcile			100.0%
github.com/numtide/multigres-operator/pkg/cluster-handler/controller/tablegroup/tablegroup_controller.go:152:			SetupWithManager		100.0%
total:																(statements)			100.0%

_ = corev1.AddToScheme(scheme)

coreTpl, cellTpl, shardTpl, baseCluster, clusterName, namespace, finalizerName := setupFixtures()
errBoom := errors.New("boom")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this? If we are injecting some special error for testing, we should describe it more clearly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a sentinel error used to inject failures into the mock client and verify the controller handles them correctly. I used 'boom' as a placeholder, but I agree it's unclear. Since we have made loads of changes to this PR already and we need to make progress I won't be changing this now. But I'll make a note for next iteration.

Copy link
Member

@rytswd rytswd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't gone through all the code, but given that the tests cases cover various scenarios and with 100% test coverage, this is good to go. LGTM, thanks for the great work! 🥰

@fernando-villalba fernando-villalba merged commit 6e84bab into main Dec 23, 2025
3 checks passed
@fernando-villalba fernando-villalba deleted the multigrescluster-controller branch December 23, 2025 19:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants