Skip to content

Conversation

@fernando-villalba
Copy link
Collaborator

No description provided.

Completely redesigns the API based on the latest design doc, moving from
individual component CRDs to a single "MultigresCluster" source of truth
supported by reusable templates and read-only child resources.

Key Changes:
- **MultigresCluster**: Re-implemented as the Root CR. strictly ordered structs
  and added "templateDefaults" and "overrides" logic.
- **Templates**: Added `CoreTemplate`, `CellTemplate`, and `ShardTemplate`
  types to support reusable configuration.
- **Shared Types**: Created `shared_types.go` to centralize common structs
  (`StatelessSpec`, `ComponentConfig`, `StorageSpec`) and prevent circular deps.
- **TopoServer**: Refactored `toposerver_types.go` to include `EtcdSpec` and
  `GlobalTopoServerRef`, acting as the authoritative source for topology config.
- **Shard**: Moved `MultiOrchSpec`, `PoolSpec`, and `ShardImages` into
  `shard_types.go` to localize domain-specific logic.
- **Children**: Updated `Cell`, `TableGroup`, and `Shard` to reflect their
  status as read-only, fully resolved child resources.

Refactoring & Cleanup:
- Deleted obsolete component types: `deploymenttemplate`, `etcd`, `multigateway`,
  and `multiorch`.
- Applied strict "definition-before-use" ordering and added ASCII headers
  to all API files for better readability.
- Standardized naming conventions (e.g., using `Spec` over `Config` for
  managed resources).
This commit hardens the v1alpha1 API by adding extensive CEL validations
to ensure data integrity and compliance with Kubernetes validation cost budgets.

Changes:
- **Budget Safety**: Added `+kubebuilder:validation:MaxLength`, `MaxItems`, and
  `MaxProperties` to all string, slice, and map fields. This prevents
  "worst-case" cost assumption errors in the API server.
- **String Safety**: Added `MinLength=1` to critical identifiers and image fields
  to prevent silent failures with empty strings.
- **Logic Fixes**:
  - Replaced invalid regex pattern on `Endpoints` slice with a CEL rule
    (`self.all(x, x.matches(...))`) to correctly validate lists of URLs.
  - Added CEL validation to `Databases` and `TableGroups` lists to ensure
    only one entry can be marked as `default: true`.
- **Constraint Enforcement**:
  - Added `Minimum` validation to replica counts.
  - Enforced mutual exclusivity for `GlobalTopoServer` (etcd/external/template)
    and `CellConfig` (zone/region).
- **Cleanup**: Removed experimental immutability rules to maintain flexibility
  during early development.
This commit hardens the v1alpha1 API to comply with Kubernetes CEL validation
cost budgets and prevents "worst-case" assumption errors in the API server.

Key Changes:
- **Budget Safety:** Introduced `CellName` (63 chars) and `EndpointUrl` (2048 chars)
  types to bound the validation cost of `[]string` lists, which otherwise
  trigger expensive 3MB-per-item worst-case assumptions.
- **Limits:** Added `+kubebuilder:validation:MaxItems`, `MaxLength`, and
  `MaxProperties` to all lists, strings, and maps.
- **Validation Logic:**
  - Replaced regex on `Endpoints` with a CEL rule (`self.all(x, x.matches(...))`).
  - Added CEL rule to ensure only one Database/TableGroup can be `default: true`.
  - Added strict `Minimum` and `MinLength` checks to prevent silent empty failures.
- **Refactor:** Moved domain-specific types (`CellName`, `EndpointUrl`) to their
  respective files (`cell_types.go`, `toposerver_types.go`) while keeping them
  exported for shared use.
Comment on lines 50 to 57
if cluster.ObjectMeta.DeletionTimestamp.IsZero() {
if !controllerutil.ContainsFinalizer(cluster, finalizerName) {
controllerutil.AddFinalizer(cluster, finalizerName)
if err := r.Update(ctx, cluster); err != nil {
return ctrl.Result{}, err
}
}
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the deletion timestamp is there, we have to complete the reconciliation. Otherwise it would try to update the "deleting" resource. Once the resource becomes "deleting", you cannot "undelete", and thus this must be corrected.

Also, that means you will not need the else keyword, because once it hits the if statement, it MUST return, and there is no need to else. This goes with the idiomatic Go code of error path being indented, so that the main path stays non-indented.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed on 190410c

I apologize it is a big change, I had to also change tests and I was struggling to get it to 100%, so far I am on 99.3%:

ok github.com/numtide/multigres-operator/pkg/cluster-handler/controller/multigrescluster (cached) coverage: 99.3% of statements
ok github.com/numtide/multigres-operator/pkg/cluster-handler/controller/tablegroup (cached) coverage: 100.0% of statements

…tion

This commit refactors the Reconcile loop to handle resource deletion
at the very beginning of the function.

- Moves the deletion timestamp check to the top.
- Returns early if the object is being deleted to prevent race conditions
  and ensure no further reconciliation logic runs.
- Removes the `else` block and unindents the main reconciliation path
  for better readability and to follow idiomatic Go style.
- Changes have been made to the tests to improve coverage and adapt to new change
@fernando-villalba
Copy link
Collaborator Author

Closing this PR in favour of PR 80

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants