Commit 04eca4b

Add planning doc for Cell CRD
1 parent 6f40394 commit 04eca4b

1 file changed, 226 additions and 0 deletions
@@ -0,0 +1,226 @@
---
title: Create Cell management Go module as Multigres aware operator internal
state: draft
tags: []
---

# Summary

Implement Cell management through a `Cell` CRD and reconciler in the `pkg/data-handler` module. A Cell represents a logical grouping of Multigres components and must be registered in the Topo Server (etcd) for the cluster to function. The Cell reconciler automates Cell registration, updates, and cleanup by interacting with the Topo Server through Multigres APIs.

This lets users manage the Cell lifecycle declaratively through Kubernetes, rather than by manually configuring entries in etcd.

# Motivation

Multigres clusters require Cell entries in the Topo Server to coordinate components. Without proper Cell registration:
- Components cannot discover each other
- Routing and coordination fail
- The cluster cannot function

Manual Cell management is error-prone and doesn't fit the Kubernetes operator pattern. By implementing a Cell CRD and reconciler, we enable:

1. **Declarative Cell Management**: Users define Cells as Kubernetes resources
2. **Automated Registration**: The operator handles Topo Server interactions
3. **Lifecycle Management**: Cell updates and cleanup happen automatically
4. **Kubernetes-Native**: Cell state is visible through `kubectl`, consistent with other operator resources
5. **Integration**: MultigresCluster can create Cells automatically as part of cluster setup

## Goals
- Implement Cell CRD and reconciler for managing Cell entries in Topo Server
- Enable automated Cell registration when Multigres components are deployed
- Handle Cell lifecycle (creation, updates, deletion) through Kubernetes API
- Integrate Cell management into the operator's multi-module architecture

## Non-Goals
- Managing individual Multigres component resources (handled by resource-handler module)
- Orchestrating component startup order (components handle their own dependencies)
- Direct etcd manipulation outside of Cell definitions
- Implementing general-purpose Topo Server management

# Proposal

Implement Cell management in the `pkg/data-handler` module with a `Cell` CRD and reconciler that manages Cell entries in the Topo Server (etcd).

## Cell CRD Structure

```go
import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type CellSpec struct {
	// Name of the cell
	Name string `json:"name"`

	// Etcd endpoints for Topo Server
	EtcdEndpoints []string `json:"etcdEndpoints"`

	// Optional TLS configuration for etcd
	TLS *TLSConfig `json:"tls,omitempty"`

	// References to Multigres components in this cell
	Components CellComponentsSpec `json:"components"`
}

type CellComponentsSpec struct {
	// MultiGateway references
	Gateways []corev1.ObjectReference `json:"gateways,omitempty"`

	// MultiPooler references
	Poolers []corev1.ObjectReference `json:"poolers,omitempty"`
}

type CellStatus struct {
	// Registered indicates if Cell is registered in Topo Server
	Registered bool `json:"registered"`

	// TopoServerReachable indicates if etcd is accessible
	TopoServerReachable bool `json:"topoServerReachable"`

	// Conditions for Cell state
	Conditions []metav1.Condition `json:"conditions,omitempty"`
}
```

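As an illustration of the kind of check the "Test Cell spec validation" item in the Test Plan could exercise, here is a minimal sketch of spec validation. The `validateCellSpec` helper and the trimmed-down `CellSpec` are assumptions made for this example (standard library only); the proposal itself does not define validation rules, and real validation might instead live in CRD schema markers or a webhook.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// CellSpec is a trimmed-down stand-in for the CRD type, keeping only the
// fields this sketch validates.
type CellSpec struct {
	Name          string
	EtcdEndpoints []string
}

// validateCellSpec is a hypothetical helper (not part of the proposal).
// It collects every problem it finds and reports them in a single error.
func validateCellSpec(spec CellSpec) error {
	var problems []string
	if strings.TrimSpace(spec.Name) == "" {
		problems = append(problems, "name must not be empty")
	}
	if len(spec.EtcdEndpoints) == 0 {
		problems = append(problems, "etcdEndpoints must contain at least one endpoint")
	}
	for i, ep := range spec.EtcdEndpoints {
		if strings.TrimSpace(ep) == "" {
			problems = append(problems, fmt.Sprintf("etcdEndpoints[%d] must not be empty", i))
		}
	}
	if len(problems) == 0 {
		return nil
	}
	return errors.New(strings.Join(problems, "; "))
}

func main() {
	valid := CellSpec{Name: "zone-a", EtcdEndpoints: []string{"etcd-0.etcd:2379"}}
	fmt.Println(validateCellSpec(valid)) // <nil>

	// An empty spec is rejected with both problems listed.
	fmt.Println(validateCellSpec(CellSpec{}))
}
```

Rejecting an invalid spec before any etcd call keeps user mistakes separate from Topo Server connectivity failures when they are reported in status.
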
## Reconciliation Logic

The Cell reconciler performs these steps:

1. **Validate etcd connectivity**: Ensure Topo Server endpoints are reachable
2. **Register Cell**: Create or update Cell entry in Topo Server using Multigres APIs
3. **Update component references**: Register component locations in Cell definition
4. **Handle finalizers**: Clean up Cell entry from Topo Server on deletion
5. **Update status**: Reflect registration state and any errors

## Integration with Other Modules

- **MultigresCluster** (in cluster-handler) can create Cell resources as part of cluster setup
- **Component CRDs** (in resource-handler) are referenced by Cell but managed independently
- The Cell reconciler uses Multigres libraries to interact with the Topo Server

# Design Details

## Module Location

The Cell reconciler lives in `pkg/data-handler/controller/cell/`. This module has its own `go.mod` and can include Multigres dependencies.

## Reconciler Implementation

```go
import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
)

func (r *CellReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch Cell CR
	// 2. Add finalizer if not present
	// 3. Handle deletion (remove from Topo Server)
	// 4. Connect to etcd using provided endpoints
	// 5. Register/update Cell in Topo Server
	// 6. Validate component references exist
	// 7. Update Cell status
	// 8. Requeue if needed
	return ctrl.Result{}, nil
}
```

## Finalizer Handling

- Add finalizer `cell.multigres.io/finalizer` on Cell creation
- On deletion, remove the Cell entry from the Topo Server before removing the finalizer
- This ensures the Cell is deregistered before the CR is deleted

## Error Handling

- **Etcd unreachable**: Set `TopoServerReachable: false`, requeue with backoff
- **Invalid component references**: Log warning, continue (components may not exist yet)
- **Registration failure**: Update Condition with error details, requeue

## Test Plan

1. **Unit Tests**:
   - Test Cell spec validation
   - Test etcd client configuration
   - Mock Topo Server interactions

2. **Integration Tests**:
   - Use envtest with a real etcd instance as the Topo Server
   - Test Cell registration and updates
   - Test that finalizer cleanup removes the Cell from etcd
   - Test reconciliation with missing component references

3. **E2E Tests**:
   - Deploy a full Multigres cluster with a Cell
   - Verify the Cell appears in the Topo Server
   - Delete the Cell and verify cleanup

## Version Skew Strategy

**Cell CRD vs Multigres Version**:
- The Cell reconciler must be compatible with the Multigres Topo Server schema
- Pin the Multigres dependency to a compatible version in `pkg/data-handler/go.mod`
- Document the required Multigres version range in the CRD

**Topo Server Schema Changes**:
- If Multigres changes the Cell schema, update the reconciler and CRD together
- Use conversion webhooks if the Cell CRD version changes

# Implementation History

- 2025-10-09: Initial draft created

# Drawbacks

**Multigres Dependency**: The data-handler module must depend on the Multigres codebase, which couples the operator to a Multigres version. If the Multigres Topo Server schema changes, the operator must be updated.

**Etcd Direct Access**: The Cell reconciler requires direct etcd access, which may have security implications. Cluster administrators must ensure proper network policies and authentication.

**Additional CRD Complexity**: Adds another CRD for users to understand, though Cell is a fundamental Multigres concept.

# Alternatives

## Alternative 1: Manual Cell Registration

Users manually create Cell entries in etcd using Multigres tools.

**Pros**:
- No operator code needed
- Users have complete control

**Cons**:
- Error-prone manual process
- No Kubernetes-native lifecycle management
- Doesn't integrate with MultigresCluster automation
- No cleanup on deletion

**Rejected**: Doesn't fit the operator pattern and increases operational burden.

## Alternative 2: Cell Registration in Component Controllers

Each component reconciler (MultiGateway, MultiPooler) registers itself in the Topo Server.

**Pros**:
- No separate Cell CRD needed
- Decentralized registration

**Cons**:
- Adds Multigres dependency to all component controllers
- No centralized Cell definition
- Coordination between components becomes complex
- Harder to manage Cell lifecycle

**Rejected**: Violates separation of concerns and spreads Multigres dependencies across all modules.

## Alternative 3: Init Job for Cell Registration

Use a Kubernetes Job to register the Cell on cluster creation.

**Pros**:
- Simple one-time registration
- No ongoing reconciliation needed

**Cons**:
- No automatic updates if the Cell definition changes
- No cleanup on deletion
- Can't handle Cell modifications
- Not declarative: Cell state is not in the Kubernetes API

**Rejected**: Doesn't provide proper lifecycle management.

# Infrastructure Needed

- **Multigres Codebase**: `pkg/data-handler` module depends on `github.com/multigres/multigres`
- **Etcd Access**: Cell reconciler needs network access to etcd (Topo Server)
- **Kubebuilder**: For generating Cell CRD manifests
- **Testing Etcd**: envtest setup must include etcd for integration tests