
Commit 6d78636

Merge pull request #1 from numtide/docs-for-context
Add initial project context
2 parents 2d7adf7 + 13f25f4 commit 6d78636


4 files changed (+594 -0 lines changed)


docs/README.md

Lines changed: 57 additions & 0 deletions
# Project Overview

## What is This?

Kubernetes operator for deploying and managing Multigres clusters.

## Core Principle

**Clean Separation**: The operator manages Kubernetes resources (Deployments, StatefulSets, Services) only. It has no knowledge of Multigres application internals. Multigres components handle their own startup dependencies and coordination.

## What Does It Deploy?

A Multigres cluster consists of:
- **etcd**: Distributed key-value store for cluster coordination
- **MultiGateway**: PostgreSQL protocol gateway
- **MultiOrch**: Orchestration service
- **MultiPooler**: Connection pooling with embedded PostgreSQL and pgctld

The operator creates and manages all necessary Kubernetes resources for these components.

## Technology

- **Language**: Go
- **Framework**: Kubebuilder
- **Observability**: OpenTelemetry (traces, metrics, logs)
- **Testing**: Go testing + envtest (100% coverage goal)

## Project Structure

### Code

```
multigres-operator/
├── api/v1alpha1/             # CRD definitions
├── cmd/multigres-operator/   # Main entry point
└── internal/                 # Controller and resource builders
```

### Documentation and Configuration

```
multigres-operator/
├── config/   # Kubernetes manifests
├── docs/     # Architecture and guides
└── plans/    # Planning documents
```

## Documentation

- **architecture.md**: System design, components, technology choices
- **implementation-guide.md**: Development workflow, testing, coding standards
- **interface-design.md**: CRD API design, status fields, kubectl output

### Multigres Documentation

- https://multigres.com/: Documentation for Multigres architecture and design details
- https://github.com/multigres/multigres: Main repository for the Multigres implementation

docs/architecture.md

Lines changed: 215 additions & 0 deletions
# System Architecture

## Core Philosophy

### Idiomatic Go
- **Simplicity and Clarity**: Straightforward code over clever abstractions
- **Explicit Dependencies**: Function signatures and struct fields make dependencies obvious
- **Error Handling**: Errors include context and are wrapped for debugging
- **Pure Resource Builders**: Resource functions are deterministic - same input produces same output
- **Structured Concurrency**: Use `sync.WaitGroup` to coordinate goroutines with a clear lifecycle

### Kubernetes Operator Safety
- **Finalizers for Lifecycle**: Prevent resource deletion until cleanup is complete
- **Owner References for Cleanup**: Automatic garbage collection of child resources when the parent is deleted
- **Idempotent Reconciliation**: Safe to run the reconcile loop multiple times; it converges to the desired state
- **Status Subresource**: Observed state lives in status, never in spec
- **Infrastructure-Only Concerns**: Operator manages compute resources, not application logic or startup dependencies

## Design Principles

### Clean Separation of Concerns
- **Operator**: Manages Kubernetes resources (Deployments, StatefulSets, Services, HPAs)
- **Multigres Components**: Handle their own application logic, readiness, and inter-service dependencies
- **No Application Knowledge**: Operator doesn't orchestrate Multigres component startup order
- **Eventually Consistent**: All resources are created concurrently; components become ready once their dependencies are available

### Layered Architecture
- **Controller Layer**: Orchestrates reconciliation, manages finalizers and status updates
- **Reconciler Layer**: Component-specific reconciliation (etcd, multigateway, multiorch, multipooler)
- **Resource Builder Layer**: Pure functions that construct Kubernetes manifests
- **Parallel Reconciliation**: All component reconcilers run concurrently via goroutines

### Testability
- **Pure Functions**: Resource builders are deterministic and table-test friendly
- **Test Helpers**: Mocks and test doubles enable testing without external dependencies
- **Integration Tests**: envtest provides a real Kubernetes API server for controller testing
- **Minimal Interfaces**: Each reconciler has a consistent, simple signature

## System Architecture

```
multigres-operator/
├── api/v1alpha1/               # CRD definitions and types
├── cmd/multigres-operator/     # Main entry point
├── internal/
│   ├── controller/             # Main reconciler and component reconcilers
│   │   ├── etcd/
│   │   ├── multigateway/
│   │   ├── multiorch/
│   │   └── multipooler/
│   ├── resources/              # Pure resource builder functions
│   ├── webhook/                # Admission webhooks (future consideration; not used initially)
│   └── testutil/               # Test helpers and utilities
├── config/                     # Kubernetes manifests for operator deployment
├── docs/                       # Architecture, conventions, and development guides
└── plans/                      # Planning documents
```

### API Layer (`api/v1alpha1`)
- **Purpose**: Defines the Multigres custom resource schema
- **Components**: MultigresSpec, MultigresStatus, and component specs (MultiGatewaySpec, MultiOrchSpec, MultiPoolerSpec, EtcdSpec)
- **Validation**: Kubebuilder markers for OpenAPI validation and defaults

### Controller Layer (`internal/controller`)
- **MultigresReconciler**: Main controller - manages lifecycle, finalizers, status aggregation
- **Component Reconcilers**: etcd, multigateway, multiorch, multipooler reconcilers run in parallel
- **Responsibilities**: Create/update resources, check component health, update status
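
Status aggregation could reduce the per-component results to a single `Ready` condition. This is a hypothetical, simplified shape; the real status would more likely use `metav1.Condition`, and reason strings such as `ComponentsNotReady` are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ComponentStatus is a hypothetical per-component health summary.
type ComponentStatus struct {
	Name  string
	Ready bool
}

// Condition is a simplified, hypothetical mirror of metav1.Condition.
type Condition struct {
	Type    string
	Status  string // "True" or "False"
	Reason  string
	Message string
}

// aggregateReady folds per-component results into one Ready condition.
func aggregateReady(components []ComponentStatus) Condition {
	var notReady []string
	for _, c := range components {
		if !c.Ready {
			notReady = append(notReady, c.Name)
		}
	}
	if len(notReady) == 0 {
		return Condition{Type: "Ready", Status: "True", Reason: "AllComponentsReady"}
	}
	// Sort so the message does not depend on goroutine completion order.
	sort.Strings(notReady)
	return Condition{
		Type:    "Ready",
		Status:  "False",
		Reason:  "ComponentsNotReady",
		Message: "not ready: " + strings.Join(notReady, ", "),
	}
}

func main() {
	cond := aggregateReady([]ComponentStatus{
		{Name: "multipooler", Ready: false},
		{Name: "etcd", Ready: false},
		{Name: "multigateway", Ready: true},
	})
	fmt.Printf("%s=%s (%s): %s\n", cond.Type, cond.Status, cond.Reason, cond.Message)
	// Ready=False (ComponentsNotReady): not ready: etcd, multipooler
}
```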

### Resource Layer (`internal/resources`)
- **Pure Functions**: Build Kubernetes manifests (Deployments, StatefulSets, Services, HPAs)
- **Label Management**: Consistent label generation for resource selection
- **No Side Effects**: Same input always produces same output
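
A sketch of consistent label generation, assuming the operator adopts the Kubernetes recommended `app.kubernetes.io/*` label keys. The values `multigres` and `multigres-operator` are placeholders. Selector labels are kept as a separate, smaller set because Deployment and StatefulSet selectors cannot be changed after creation.

```go
package main

import (
	"fmt"
	"sort"
)

// Keys from the Kubernetes recommended-labels convention.
const (
	labelName      = "app.kubernetes.io/name"
	labelInstance  = "app.kubernetes.io/instance"
	labelComponent = "app.kubernetes.io/component"
	labelManagedBy = "app.kubernetes.io/managed-by"
)

// selectorLabels is the small, stable set used in Service and workload
// selectors. Workload selectors are immutable, so these must never change
// over the life of a cluster.
func selectorLabels(cluster, component string) map[string]string {
	return map[string]string{
		labelName:      "multigres", // placeholder value
		labelInstance:  cluster,
		labelComponent: component,
	}
}

// standardLabels extends the selector set with informational labels.
func standardLabels(cluster, component string) map[string]string {
	labels := selectorLabels(cluster, component)  // fresh map on every call
	labels[labelManagedBy] = "multigres-operator" // placeholder value
	return labels
}

func main() {
	labels := standardLabels("demo", "multipooler")
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random; sort for stable output
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, labels[k])
	}
}
```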

## Core Components

### Reconciliation Flow

The operator follows a standard Kubernetes reconciliation pattern:

1. **Watch**: Monitor Multigres custom resources for changes
2. **Reconcile**: When a change is detected, run the reconciliation loop
3. **Converge**: Create/update Kubernetes resources to match the desired state
4. **Status Update**: Reflect the observed state in the Multigres status subresource
5. **Requeue**: Schedule the next reconciliation if needed

### Component Reconcilers

Each Multigres component has its own reconciler:

- **etcd Reconciler**: Manages the StatefulSet for the etcd cluster, plus headless and client Services
- **multigateway Reconciler**: Manages Deployment, Service, and optional HPA for MultiGateway
- **multiorch Reconciler**: Manages Deployment and optional HPA for MultiOrch
- **multipooler Reconciler**: Manages a StatefulSet with multi-container pods (pooler, pgctld, postgres) and an optional HPA

All component reconcilers run in parallel.

### Resource Builders

Pure functions that generate Kubernetes manifests:

- **Deterministic**: Same inputs always produce same outputs
- **No Side Effects**: Don't make API calls or modify global state
- **Testable**: Easily unit tested in isolation
- **Composable**: Small functions that build specific resource types

### Validation Strategy

Using **CRD validation markers only** for simplicity:

- OpenAPI v3 schema constraints (numeric ranges, enums, patterns, etc.)
- Default values specified in the CRD
- API server enforces validation without external calls
- No admission webhooks required

### Observability

**OpenTelemetry Integration**:
- **Traces**: Spans covering the reconciliation flow, API calls, and component creation
- **Metrics**: Reconciliation duration, error rates, component health, resource counts
- **Logs**: Structured logs with trace context correlation

**Health Checks**:
- **Liveness Probe**: HTTP endpoint that tells Kubernetes whether the operator needs a restart
- **Readiness Probe**: HTTP endpoint that indicates whether the operator can handle requests

**Kubernetes Events**:
- Emitted for significant state changes and errors
- Surfaced via `kubectl describe` for user visibility

### Admission Webhooks (Future Consideration)

**Current Decision**: Start without admission webhooks to keep installation simple and reduce moving parts during initial development.

**When to Add**: Consider webhooks when the operator needs:
- Validation beyond the OpenAPI v3 schema (cross-field validation, complex business rules)
- Dynamic defaults based on cluster state
- Mutation beyond simple defaults

**Certificate Management Options**:
- **Init container pattern** (preferred): Jobs generate certificates and an init container waits for them - no runtime dependencies, standard CNCF pattern
- **cert-manager**: Automatic certificate management - adds a runtime dependency but simplifies renewal
- **Manual certificates**: Full control - operational overhead for rotation

The current preference is the init container pattern (as used by Istio and NGINX Ingress), but the final decision will be made during implementation based on operational requirements.

## Technology Stack

### Language and Runtime
- **Language**: Go 1.24+
- **Key Features**: Concurrency, interfaces, strong typing

### Key Dependencies
- **Framework**: Kubebuilder v3 - scaffolding and patterns for Kubernetes operators
- **controller-runtime**: Core controller and client libraries
- **client-go**: Kubernetes API client
- **OpenTelemetry**: Traces, metrics, and logs
- **Testing**: Standard Go testing, envtest for integration tests

### Build Tools
- **Make**: Task orchestration (build, test, deploy)
- **Docker**: Container image building
- **kubectl/kustomize**: Kubernetes manifest management
- **GitHub Actions**: CI/CD pipeline for testing and releases
- **Optional**: Nix + direnv for a reproducible dev environment

## Performance Considerations

### Parallel Reconciliation
- Component reconcilers run concurrently
- Each component reconciler is independent and stateless
- All components complete before status aggregation

### Resource Efficiency
- Pure resource builders don't allocate unnecessary memory
- Status updates are batched: one update per reconciliation loop
- Requeue delays prevent tight loops when resources aren't ready

### Kubernetes API Calls
- Owner references enable automatic garbage collection (no manual cleanup)
- Event-driven watches avoid the unnecessary reconciliations that periodic polling would cause
- Client-side caching via controller-runtime reduces API server load

## Error Handling Strategy

### Error Types
- **Reconciliation Errors**: Failed to create/update Kubernetes resources
- **API Errors**: Kubernetes API server communication failures
- **Validation Errors**: Invalid spec values rejected by CRD validation before they reach the operator
- **Health Check Errors**: Component not ready yet (non-fatal, triggers requeue)

### Error Reporting
- Errors are wrapped with context for debugging
- Structured logging with key-value pairs
- Critical errors are returned to trigger a requeue
- Non-critical errors are logged but don't fail reconciliation

### Recovery Strategies
- **Automatic Requeue**: Failed reconciliations automatically retry with exponential backoff
- **Status Conditions**: Error details reflected in status conditions for debugging
- **Idempotent Operations**: Safe to retry - won't duplicate resources
- **Component Independence**: One component's failure doesn't block others

## Future Architecture Considerations

### High Availability
- Leader election (already supported via controller-runtime flag)
- Multiple operator replicas for redundancy
- Graceful shutdown handling for in-flight reconciliations
- Zero-downtime upgrades

### Advanced Features
- Custom resource pruning and cleanup policies
- Multi-cluster support for Multigres deployments
- Backup and restore integration
- Advanced scheduling and placement strategies
