Commit 0981dc9

Merge pull request #8 from numtide/cell-crd-planning
Add planning doc for Cell CRD
2 parents 3fec929 + 2648080 commit 0981dc9

1 file changed (+228, -0 lines)

---
title: Create Cell management Go module with upstream Multigres dependency
state: draft
tags:
- toposerver
- crd
---

# Summary

Implement Cell management functionality through a `Cell` CRD and reconciler in the `pkg/data-handler` module. A Cell represents a logical grouping of Multigres components and must be registered in the Topo Server (etcd) for the cluster to function. The Cell reconciler automates Cell registration, updates, and cleanup by interacting with the Topo Server using Multigres APIs.

This allows users to manage Cell lifecycle declaratively through Kubernetes, rather than manually configuring entries in etcd.

# Motivation

Multigres clusters require Cell entries in the Topo Server to coordinate components. Without proper Cell registration:
- Components cannot discover each other
- Routing and coordination fail
- The cluster cannot function

Manual Cell management is error-prone and doesn't fit the Kubernetes operator pattern. By implementing a Cell CRD and reconciler, we enable:

1. **Declarative Cell Management**: Users define Cells as Kubernetes resources
2. **Automated Registration**: Operator handles Topo Server interactions
3. **Lifecycle Management**: Cell updates and cleanup happen automatically
4. **Kubernetes-Native**: Cell state visible through `kubectl`, consistent with other operator resources
5. **Integration**: MultigresCluster can create Cells automatically as part of cluster setup

## Goals
- Implement Cell CRD and reconciler for managing Cell entries in Topo Server
- Enable automated Cell registration when Multigres components are deployed
- Handle Cell lifecycle (creation, updates, deletion) through Kubernetes API
- Integrate Cell management into the operator's multi-module architecture

## Non-Goals
- Managing individual Multigres component resources (handled by resource-handler module)
- Orchestrating component startup order (components handle their own dependencies)
- Direct etcd manipulation outside of Cell definitions
- Implementing general-purpose Topo Server management

# Proposal

Implement Cell management in the `pkg/data-handler` module with a `Cell` CRD and reconciler that manages Cell entries in the Topo Server (etcd).

## Cell CRD Structure

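```go
// Package v1alpha1 is an illustrative name; the proposal does not fix the API package or version.
package v1alpha1

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// CellSpec defines the desired state of a Cell.
type CellSpec struct {
	// Name of the cell
	Name string `json:"name"`

	// Etcd endpoints for Topo Server
	EtcdEndpoints []string `json:"etcdEndpoints"`

	// Optional TLS configuration for etcd
	// +optional
	TLS *TLSConfig `json:"tls,omitempty"`

	// References to Multigres components in this cell
	Components CellComponentsSpec `json:"components"`
}

// TLSConfig is not specified by this proposal; the field below is a
// hypothetical placeholder so the snippet compiles.
type TLSConfig struct {
	// Name of a Secret holding the CA and client certificate/key (hypothetical)
	SecretName string `json:"secretName"`
}

// CellComponentsSpec lists the Multigres components that belong to a Cell.
type CellComponentsSpec struct {
	// MultiGateway references
	// +optional
	Gateways []corev1.ObjectReference `json:"gateways,omitempty"`

	// MultiPooler references
	// +optional
	Poolers []corev1.ObjectReference `json:"poolers,omitempty"`
}

// CellStatus defines the observed state of a Cell.
type CellStatus struct {
	// Registered indicates if Cell is registered in Topo Server
	Registered bool `json:"registered"`

	// TopoServerReachable indicates if etcd is accessible
	TopoServerReachable bool `json:"topoServerReachable"`

	// Conditions for Cell state
	// +optional
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```
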
## Reconciliation Logic

The Cell reconciler performs these steps:

1. **Validate etcd connectivity**: Ensure Topo Server endpoints are reachable
2. **Register Cell**: Create or update Cell entry in Topo Server using Multigres APIs
3. **Update component references**: Register component locations in Cell definition
4. **Handle finalizers**: Clean up Cell entry from Topo Server on deletion
5. **Update status**: Reflect registration state and any errors

## Integration with Other Modules

- **MultigresCluster** (in cluster-handler) can create Cell resources as part of cluster setup
- **Component CRDs** (in resource-handler) are referenced by Cell but managed independently
- The Cell reconciler uses Multigres libraries to interact with the Topo Server

# Design Details

## Module Location

The Cell reconciler lives in `pkg/data-handler/controller/cell/`. This module has its own `go.mod`, so it can depend on Multigres without adding that dependency to the operator's other modules.

## Reconciler Implementation

```go
package cell

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

type CellReconciler struct{ client.Client }

func (r *CellReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch Cell CR; ignore NotFound (the object was already deleted)
	// 2. Add finalizer if not present
	// 3. Handle deletion (remove from Topo Server, then drop the finalizer)
	// 4. Connect to etcd using spec.etcdEndpoints (and spec.tls, if set)
	// 5. Register/update Cell in Topo Server
	// 6. Validate component references exist
	// 7. Update Cell status
	// 8. Requeue if needed
	return ctrl.Result{}, nil
}
```

## Finalizer Handling

- Add finalizer `cell.multigres.com/finalizer` on Cell creation
- On deletion, remove the Cell entry from the Topo Server before removing the finalizer; if removal fails, keep the finalizer and requeue
- This ensures the Cell is deregistered from the Topo Server before the CR is deleted

## Error Handling

- **Etcd unreachable**: Set `status.topoServerReachable: false`, requeue with backoff
- **Invalid component references**: Log a warning and continue (components may not exist yet)
- **Registration failure**: Update a Condition with error details, requeue

## Test Plan

1. **Unit Tests**:
   - Test Cell spec validation
   - Test etcd client configuration
   - Mock Topo Server interactions

2. **Integration Tests**:
   - Use envtest, plus a real etcd instance acting as the Topo Server
   - Test Cell registration and updates
   - Test that finalizer cleanup removes the Cell from etcd
   - Test reconciliation with missing component references

3. **E2E Tests**:
   - Deploy a full Multigres cluster with a Cell
   - Verify the Cell appears in the Topo Server
   - Delete the Cell and verify cleanup

## Version Skew Strategy

**Cell CRD vs Multigres Version**:
- The Cell reconciler must be compatible with the Multigres Topo Server schema
- Pin the Multigres dependency to a compatible version in `pkg/data-handler/go.mod`
- Document the required Multigres version range in the CRD

**Topo Server Schema Changes**:
- If Multigres changes its Cell schema, update the reconciler and CRD together
- Use conversion webhooks if the Cell CRD API version changes

# Implementation History

- 2025-10-09: Initial draft created

# Drawbacks

**Multigres Dependency**: The data-handler module must depend on the Multigres codebase, creating a version coupling. If the Multigres Topo Server schema changes, the operator must be updated.

**Etcd Direct Access**: The Cell reconciler requires direct etcd access, which may have security implications. Cluster administrators must ensure appropriate network policies and etcd authentication are in place.

**Additional CRD Complexity**: Adds another CRD for users to understand, though Cell is a fundamental Multigres concept.

# Alternatives

## Alternative 1: Manual Cell Registration

Users manually create Cell entries in etcd using Multigres tools.

**Pros**:
- No operator code needed
- Users have complete control

**Cons**:
- Error-prone manual process
- No Kubernetes-native lifecycle management
- Doesn't integrate with MultigresCluster automation
- No cleanup on deletion

**Rejected**: Doesn't fit the operator pattern and increases operational burden.

## Alternative 2: Cell Registration in Component Controllers

Each component reconciler (MultiGateway, MultiPooler) registers itself in the Topo Server.

**Pros**:
- No separate Cell CRD needed
- Decentralized registration

**Cons**:
- Adds Multigres dependency to all component controllers
- No centralized Cell definition
- Coordination between components becomes complex
- Harder to manage Cell lifecycle

**Rejected**: Violates separation of concerns and spreads Multigres dependencies across all modules.

## Alternative 3: Init Job for Cell Registration

Use a Kubernetes Job to register the Cell on cluster creation.

**Pros**:
- Simple one-time registration
- No ongoing reconciliation needed

**Cons**:
- No automatic updates if the Cell definition changes
- No cleanup on deletion
- Can't handle Cell modifications
- Not declarative: Cell state is not represented in the Kubernetes API

**Rejected**: Doesn't provide proper lifecycle management.

# Infrastructure Needed

- **Multigres Codebase**: `pkg/data-handler` module depends on `github.com/multigres/multigres`
- **Etcd Access**: Cell reconciler needs network access to etcd (Topo Server)
- **Kubebuilder**: For generating Cell CRD manifests
- **Testing Etcd**: envtest setup must include etcd for integration tests
