Production-ready K3s cluster automation for Apple M3 Ultra Mac hardware.
- Automated Rollback & Recovery - Safe operations with automatic rollback on failure
- Built-in Observability - Prometheus, Grafana, and Loki pre-configured
- Security Hardening - macOS-specific security configurations
- GitOps Ready - Prepared for ArgoCD/Flux deployment
- macOS Optimized - Handles sleep/wake, mDNS, and APFS snapshots
- Comprehensive Logging - All operations logged with rotation
- macOS on Apple Silicon (M3 Ultra)
- Homebrew installed
- SSH key-based authentication configured
- At least 20GB free disk space per node
- Network connectivity between all nodes
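The prerequisites above can be checked by hand before running anything. The following is a minimal sketch, not the contents of 00-preflight.sh; the function names (have_cmd, free_gb, has_free_space, check_prereqs) are hypothetical, and it assumes the 20GB requirement applies to the root filesystem.

```shell
# Hypothetical helpers; names are illustrative and not taken from the repository.

# Succeeds when the named command is available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Prints the free space, in whole GB, of the filesystem holding the given path.
# `df -Pk` prints POSIX-format output in 1K blocks on both macOS and Linux.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Succeeds when the path has at least the required number of free GB.
has_free_space() {
  [ "$(free_gb "$1")" -ge "$2" ]
}

# Reports every failed prerequisite and returns non-zero if any check failed.
check_prereqs() {
  failed=0
  have_cmd brew || { echo "missing: Homebrew (brew)" >&2; failed=1; }
  have_cmd ssh || { echo "missing: ssh" >&2; failed=1; }
  has_free_space / 20 || { echo "less than 20GB free on /" >&2; failed=1; }
  return "$failed"
}
```

Calling check_prereqs on each node prints one line per missing prerequisite and returns non-zero if any check fails.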
# Clone the repository
git clone https://github.com/doogie-bigmack/m3_ultra_cluster.git
cd m3_ultra_cluster
# Configure your nodes
cp configs/base/config.env.example configs/base/config.env
# Edit config.env with your node IPs and usernames
# Run preflight checks
./scripts/bootstrap/00-preflight.sh
# Install dependencies
./scripts/bootstrap/01-install-deps.sh
# Initialize control plane
./scripts/bootstrap/02-init-control.sh
# Join worker nodes
./scripts/bootstrap/03-join-workers.sh
# Configure SSH keychain (optional but recommended)
./scripts/operations/setup-ssh-keychain.sh
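Because each bootstrap step depends on the one before it, it may help to stop at the first failure instead of running the remaining scripts. A minimal sketch of such a wrapper follows; run_steps is a hypothetical helper, not part of the repository.

```shell
# Hypothetical wrapper: runs each given command in order and stops at the
# first one that exits non-zero.
run_steps() {
  for step in "$@"; do
    echo "==> $step"
    if ! "$step"; then
      echo "step failed: $step" >&2
      return 1
    fi
  done
}

# Usage from the repository root (left commented out so nothing runs on load):
#   run_steps \
#     ./scripts/bootstrap/00-preflight.sh \
#     ./scripts/bootstrap/01-install-deps.sh \
#     ./scripts/bootstrap/02-init-control.sh \
#     ./scripts/bootstrap/03-join-workers.sh
```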
├── scripts/
│   ├── bootstrap/      # Initial cluster setup
│   ├── rollback/       # Safety and recovery tools
│   ├── security/       # Security hardening scripts
│   ├── observability/  # Monitoring stack deployment
│   ├── operations/     # Day-2 operations
│   └── lib/            # Shared functions
├── configs/            # Configuration files
├── manifests/          # Kubernetes YAML files
├── docs/               # Documentation
└── tests/              # Test suites
flowchart LR
subgraph Developer
Dev[Developer Mac]
end
subgraph Bootstrap
BS[Bootstrap Scripts]
end
subgraph Operations
OPS[Operation Scripts\nsetup-ssh-keychain.sh]
end
Dev --> BS
Dev --> OPS
BS --> CP[Control-Plane Node]
BS --> WN[Worker Nodes]
OPS --> CP
OPS --> WN
CP -->|k3s agent join| WN
subgraph "Core Services"
Core[etcd + API-server + Scheduler]
end
CP --> Core
subgraph Addons
MetalLB[MetalLB]
Ingress[Ingress-NGINX]
CertManager[Cert-Manager]
ArgoCD[ArgoCD]
PromOp[Prometheus Operator]
end
MetalLB --> CP
Ingress --> CP
CertManager --> CP
ArgoCD --> CP
PromOp --> CP
subgraph Observability
Prometheus --> PromOp
Grafana --> PromOp
Loki --> PromOp
Alertmanager --> PromOp
end
ArgoCD --> Apps[GitOps Apps]
subgraph Developer
Dev[Developer Mac]
end
subgraph Bootstrap
Script[Bootstrap Scripts]
end
Dev --> Script
Script --> CP[Control-Plane Node]
Script --> WN[Worker Nodes]
CP -->|k3s agent join| WN
subgraph "Core Services"
Core[etcd + API-server + Scheduler]
end
CP --> Core
subgraph Addons
MetalLB[MetalLB]
Ingress[Ingress-NGINX]
CertManager[Cert-Manager]
ArgoCD[ArgoCD]
PromOp[Prometheus Operator]
end
MetalLB --> CP
Ingress --> CP
CertManager --> CP
ArgoCD --> CP
PromOp --> CP
subgraph Observability
Prometheus --> PromOp
Grafana --> PromOp
Loki --> PromOp
Alertmanager --> PromOp
end
ArgoCD --> Apps[GitOps Apps]
sequenceDiagram
participant Dev as Developer
participant BS as Bootstrap Scripts
participant OPS as Operation Scripts
participant CP as Control-Plane
participant W1 as Worker Node
Dev->>BS: 00-preflight.sh
BS->>Dev: Validate config
Dev->>BS: 01-install-deps.sh
BS->>CP: Install packages
BS->>W1: Install packages
Dev->>BS: 02-init-control.sh
BS->>CP: k3s server install
CP-->>BS: Return join-token
BS->>Dev: Save token
Dev->>BS: 03-join-workers.sh
BS->>W1: k3s agent install --token
W1-->>CP: Join cluster
BS->>CP: Deploy addons & observability
Dev->>OPS: setup-ssh-keychain.sh
OPS->>CP: Propagate SSH keychain
OPS->>W1: Propagate SSH keychain
sequenceDiagram
participant Dev as Developer
participant Script as Bootstrap Script
participant CP as Control-Plane
participant W1 as Worker Node
Dev->>Script: Run 00-preflight.sh
Script->>Dev: Validate tools / config
Dev->>Script: Run 01-install-deps.sh
Script->>CP: Install packages via SSH
Script->>W1: Install packages via SSH
Dev->>Script: Run 02-init-control.sh
Script->>CP: k3s server install
CP-->>Script: Return join-token
Script->>Dev: Write token to local file
Dev->>Script: Run 03-join-workers.sh
Script->>W1: k3s agent install --token
W1-->>CP: Join cluster
Script->>CP: Deploy addons (MetalLB, Ingress…)
Script->>CP: Deploy observability stack
The cluster supports per-node username configuration. Edit configs/base/nodes.conf:
# Format: IP_ADDRESS USERNAME ROLE NOTES
<CONTROL_PLANE_IP> <USERNAME> control-plane <NODE_NAME>
<WORKER_IP> <USERNAME> worker <NODE_NAME>
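To illustrate how a file in this format can be consumed, here is a minimal sketch of a reader that lists SSH targets by role. nodes_with_role is a hypothetical function, not one of the repository's shared functions, and it assumes whitespace-separated fields with `#` comment lines, as in the format above.

```shell
# Hypothetical helper: prints "user@ip" for every node with the given role.
# Blank lines and lines starting with '#' are skipped.
nodes_with_role() {
  conf_file="$1"
  wanted_role="$2"
  # The `|| [ -n "$ip" ]` clause keeps a final line that lacks a trailing newline.
  while read -r ip user node_role _notes || [ -n "$ip" ]; do
    case "$ip" in
      ''|'#'*) continue ;;
    esac
    if [ "$node_role" = "$wanted_role" ]; then
      echo "${user}@${ip}"
    fi
  done < "$conf_file"
  return 0
}

# Usage (commented out; requires a populated nodes.conf and SSH access):
#   for target in $(nodes_with_role configs/base/nodes.conf worker); do
#     ssh "$target" hostname
#   done
```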
Key configuration options in configs/base/config.env:
- CLUSTER_NAME - Your cluster identifier
- CONTROL_PLANE_IP - Control plane node IP
- WORKER_IPS - Array of worker node IPs
- K3S_VERSION - K3s version to install
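A quick way to catch an incomplete config.env is to check that the settings listed above are non-empty after loading the file. The sketch below assumes config.env is a shell file that can be sourced; require_vars is a hypothetical helper and not part of the repository.

```shell
# Hypothetical helper: reports every listed variable that is unset or empty
# and returns non-zero if at least one is missing.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup by name; for a bash array this reads its first element.
    # Only pass trusted, literal variable names, since this uses eval.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required setting: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (commented out; requires the config file to exist):
#   . configs/base/config.env
#   require_vars CLUSTER_NAME CONTROL_PLANE_IP WORKER_IPS K3S_VERSION
```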
This project follows security best practices:
- No secrets or credentials in repository
- All sensitive data in .env.local (git-ignored)
- Automated security scanning via GitHub Actions
- RBAC and network policies pre-configured
- TLS certificates auto-generated and rotated
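Before adding secrets to .env.local, it is worth confirming that the file really is ignored. The sketch below is a simplistic literal-line check against a .gitignore file; is_listed_in_gitignore is a hypothetical helper. Inside a real working tree, `git check-ignore -q .env.local` is the more reliable test because it also honors patterns and nested ignore files.

```shell
# Hypothetical helper: succeeds when the ignore file contains the exact entry
# on a line of its own. Literal match only; glob patterns are not evaluated.
is_listed_in_gitignore() {
  gitignore_file="$1"
  entry="$2"
  [ -f "$gitignore_file" ] && grep -qxF -- "$entry" "$gitignore_file"
}

# Usage (commented out; run from the repository root):
#   is_listed_in_gitignore .gitignore .env.local || echo ".env.local is not ignored" >&2
```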
Every operation includes rollback capability:
# Create system snapshot before changes
./scripts/rollback/snapshot-system.sh
# Uninstall K3s from a node
./scripts/rollback/uninstall-node.sh <node-ip>
# Restore system to snapshot
./scripts/rollback/restore-system.sh
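The snapshot and restore scripts can also be combined around any risky command. The following is a minimal sketch of that pattern, assuming the restore script takes no arguments as shown above; with_rollback is a hypothetical wrapper, and the repository's own scripts may already handle this internally.

```shell
# Hypothetical wrapper: runs a command and, if it fails, runs the rollback
# command, then returns the original command's exit status.
# Usage: with_rollback <rollback-command> <command> [args...]
with_rollback() {
  rollback_cmd="$1"
  shift
  status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed (exit $status); running rollback: $rollback_cmd" >&2
    "$rollback_cmd" || echo "rollback also failed" >&2
  fi
  return "$status"
}

# Usage (commented out; run from the repository root):
#   ./scripts/rollback/snapshot-system.sh
#   with_rollback ./scripts/rollback/restore-system.sh ./scripts/bootstrap/02-init-control.sh
```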
The cluster includes a full observability stack:
- Prometheus - Metrics collection
- Grafana - Visualization dashboards
- Loki - Log aggregation
- Alertmanager - Alert routing
Deploy with:
./scripts/observability/deploy-stack.sh
Run the test suite:
# Unit tests
./tools/test-unit.sh
# Integration tests (requires running cluster)
./tools/test-integration.sh
# Smoke tests
./tools/test-smoke.sh
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'feat: add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- K3s team for the lightweight Kubernetes distribution
- k3sup for simplified installation
- The macOS community for Apple Silicon insights
- This project is specifically designed for macOS on Apple Silicon
- Ensure all nodes have synchronized time (NTP)
- Disable sleep on cluster nodes for stability
- Regular backups are recommended
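Time synchronization can be spot-checked by comparing epoch timestamps between nodes. This is a rough sketch only: clock_skew_ok is a hypothetical helper, the 2-second threshold is an assumption, and SSH round-trip latency adds to the measured difference.

```shell
# Hypothetical helper: succeeds when two epoch timestamps (in seconds)
# differ by at most max_skew seconds.
# Usage: clock_skew_ok <epoch_a> <epoch_b> <max_skew>
clock_skew_ok() {
  skew=$(( $1 - $2 ))
  if [ "$skew" -lt 0 ]; then
    skew=$(( -skew ))
  fi
  [ "$skew" -le "$3" ]
}

# Usage (commented out; requires SSH access to the node):
#   remote_now=$(ssh "<USERNAME>@<WORKER_IP>" date +%s)
#   clock_skew_ok "$(date +%s)" "$remote_now" 2 || echo "clock skew too large" >&2
```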
- Open an issue for bugs or feature requests
- Check troubleshooting guide first
- Join our discussions for questions