Implement Hetzner Certificate Rotation Service
Overview
Implement a lightweight, long-running daemon that manages the certificate lifecycle on bare-metal machines in Hetzner data centers. The service handles automatic certificate renewal, secure storage, and service integration while maintaining connectivity to AWS-hosted services via Tailscale VPN.
Architecture
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Certificate   │    │   Hetzner Cert   │    │    Keycloak     │
│   API Service   │◄──►│ Rotation Service │◄──►│   (JWT Auth)    │
│      (AWS)      │    │    (Hetzner)     │    │      (AWS)      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
                                │
                                ▼
                       ┌──────────────────┐
                       │  Local Storage   │
                       │  (/etc/certs/)   │
                       └──────────────────┘
Connectivity via Tailscale VPN
Core Responsibilities
- Bootstrap with initial dual-purpose certificates for machine identity
- Authenticate with Keycloak using mTLS to exchange certificates for JWTs
- Automatically renew certificates before expiration using the Certificate API
- Manage local certificate storage and service reloading
- Provide graceful degradation during connectivity issues
- Maintain audit trails for certificate operations
Sub-Issues
- #1 Core Service Architecture & Configuration Management
- #2 Certificate Bootstrap & Storage Management
- #3 Keycloak Authentication & JWT Management
- #4 Certificate Renewal & Rotation Logic
- #5 Monitoring, Health Checks & Service Integration
Project Structure
services/certificates/refresher/
├── cmd/
│   └── rotator/
│       └── main.go  # Service entry point
├── internal/
│   ├── api/
│   │   ├── certificate.go  # Certificate API client
│   │   └── client.go  # HTTP client with retry logic
│   ├── auth/
│   │   ├── keycloak.go  # Keycloak mTLS auth
│   │   └── jwt.go  # JWT token management
│   ├── bootstrap/
│   │   └── bootstrap.go  # Initial certificate setup
│   ├── config/
│   │   └── config.go  # Configuration management
│   ├── rotation/
│   │   ├── scheduler.go  # Certificate rotation scheduler
│   │   ├── renewer.go  # Certificate renewal logic
│   │   └── validator.go  # Certificate validation
│   ├── storage/
│   │   ├── filesystem.go  # Local certificate storage
│   │   └── permissions.go  # File permission management
│   ├── monitoring/
│   │   ├── metrics.go  # Prometheus metrics
│   │   ├── health.go  # Health reporting
│   │   └── logging.go  # Structured logging
│   └── services/
│       ├── reload.go  # Service reload management
│       └── registry.go  # Service discovery
├── pkg/
│   ├── crypto/
│   │   └── utils.go  # Certificate utilities
│   ├── network/
│   │   └── connectivity.go  # Network health checking
│   └── errors/
│       └── errors.go  # Custom error types
├── deployments/
│   ├── systemd/
│   │   └── hetzner-cert-rotation.service
│   └── docker/  # For testing
│       └── Dockerfile
├── configs/
│   ├── config.yaml.example
│   └── bootstrap.yaml.example
├── scripts/
│   ├── install.sh  # Installation script
│   └── bootstrap.sh  # Bootstrap script
├── docs/
│   └── operations.md  # Operational procedures
├── go.mod
├── go.sum
└── README.md
Key Requirements
Certificate Management
- 7-day certificate validity period
- Renewal at 30% of lifetime remaining (2.1 days)
- Support for both bootstrap certificates and API-managed certificates
- Atomic certificate replacement with backup retention (keep 3 versions)
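The renewal threshold above works out to 7 days × 0.30 = 2.1 days (50.4 hours) before expiry. A minimal sketch of that arithmetic in Go follows; the function names `RenewalLead`, `RenewalTime`, and `ShouldRenew` are hypothetical and not part of any existing package in this repository.

```go
package main

import "time"

// RenewalLead returns how long before expiry renewal should begin, given the
// certificate lifetime and the fraction of that lifetime that should remain
// (0.30 for the requirement above). Hypothetical helper, for illustration.
func RenewalLead(lifetime time.Duration, remaining float64) time.Duration {
	return time.Duration(float64(lifetime) * remaining)
}

// RenewalTime returns the instant at which renewal becomes due.
func RenewalTime(notBefore, notAfter time.Time, remaining float64) time.Time {
	lifetime := notAfter.Sub(notBefore)
	return notAfter.Add(-RenewalLead(lifetime, remaining))
}

// ShouldRenew reports whether now is at or past the renewal time.
func ShouldRenew(now, notBefore, notAfter time.Time, remaining float64) bool {
	return !now.Before(RenewalTime(notBefore, notAfter, remaining))
}
```

In the scheduler, `notBefore` and `notAfter` would presumably come from the `NotBefore` and `NotAfter` fields of the parsed `x509.Certificate`, so the threshold follows the actual certificate rather than a hard-coded 7 days.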
Authentication
- mTLS authentication with Keycloak for JWT token exchange
- JWT tokens valid for 1 hour, refreshed 10 minutes before expiry
- Cached JWT tokens for resilience during network issues
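One way to read the refresh and caching requirements together: a cached token stays usable until it expires, and a refresh is attempted once 10 minutes or less remain. The sketch below assumes a simple in-memory cache; `TokenCache`, `Store`, and `Lookup` are hypothetical names, and the Keycloak exchange itself is left out.

```go
package main

import (
	"sync"
	"time"
)

// refreshLead mirrors the requirement: refresh 10 minutes before expiry.
const refreshLead = 10 * time.Minute

// TokenCache is a hypothetical in-memory holder for the current JWT.
type TokenCache struct {
	mu        sync.Mutex
	token     string
	expiresAt time.Time
}

// Store records a freshly issued token and its expiry time.
func (c *TokenCache) Store(token string, expiresAt time.Time) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.token, c.expiresAt = token, expiresAt
}

// Lookup returns the cached token, whether it is still unexpired (usable),
// and whether a refresh should be attempted now.
func (c *TokenCache) Lookup(now time.Time) (token string, usable, refresh bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.token == "" {
		return "", false, true // nothing cached yet
	}
	remaining := c.expiresAt.Sub(now)
	return c.token, remaining > 0, remaining <= refreshLead
}
```

Keeping `usable` separate from `refresh` lets a caller keep using the cached token when a refresh attempt fails during a network outage, as long as the token has not expired.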
Network & Resilience
- All AWS connectivity via Tailscale VPN
- Graceful degradation during network outages
- Exponential backoff for retries: 1min, 5min, 15min
- Alert after 3-5 consecutive renewal failures
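The retry schedule and alert threshold could be tracked with a small counter, sketched below. The delays are taken verbatim from the list above; holding at 15 minutes after the third failure and alerting at exactly 3 failures (the lower bound of "3-5") are assumptions, and `RetryState` is a hypothetical type.

```go
package main

import "time"

// retryDelays is the schedule listed in the requirements.
var retryDelays = []time.Duration{time.Minute, 5 * time.Minute, 15 * time.Minute}

// alertThreshold is an assumption: the lower bound of the "3-5" range.
const alertThreshold = 3

// RetryState counts consecutive renewal failures (hypothetical type).
type RetryState struct {
	failures int
}

// RecordFailure notes one more failure and returns the delay before the next
// attempt plus whether the alert threshold has been reached. Once the
// schedule is exhausted, the last delay is reused (an assumption).
func (s *RetryState) RecordFailure() (delay time.Duration, alert bool) {
	s.failures++
	idx := s.failures - 1
	if idx >= len(retryDelays) {
		idx = len(retryDelays) - 1
	}
	return retryDelays[idx], s.failures >= alertThreshold
}

// RecordSuccess resets the counter after a successful renewal.
func (s *RetryState) RecordSuccess() {
	s.failures = 0
}
```

In practice the threshold would likely be a configuration value rather than a constant, since the requirement gives a range.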
Security
- Private keys generated locally, never transmitted
- Restrictive file permissions (certs: 644, keys: 600)
- TLS 1.3 for all external communications
- Audit logging for all certificate operations
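A rough sketch of local key generation and permission-controlled writes, using only the Go standard library. The key type (ECDSA P-256, PKCS#8 PEM) is an assumption, since the issue does not specify one, and `GenerateKeyPEM`, `WriteAtomic`, and `VerifyMode` are hypothetical helper names.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"encoding/pem"
	"fmt"
	"os"
	"path/filepath"
)

// GenerateKeyPEM creates a private key on the local machine and returns it
// PEM-encoded. ECDSA P-256 is an assumption; the issue names no key type.
func GenerateKeyPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	der, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: der}), nil
}

// WriteAtomic writes data to a temporary file in the target directory, sets
// its mode, and renames it over path so readers never see a partial file.
func WriteAtomic(path string, data []byte, mode os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // fails harmlessly once the rename succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(mode); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// VerifyMode checks that path has exactly the expected permission bits.
func VerifyMode(path string, want os.FileMode) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	if got := info.Mode().Perm(); got != want {
		return fmt.Errorf("%s: mode %04o, want %04o", path, uint32(got), uint32(want))
	}
	return nil
}
```

Under these assumptions a key would be written with `WriteAtomic(keyPath, keyPEM, 0600)` and a certificate with `WriteAtomic(certPath, certPEM, 0644)`. Creating the temporary file in the same directory keeps the rename on one filesystem, which is what makes the replacement atomic on Linux.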
Performance
- Maximum 50MB resident memory usage
- <1% CPU utilization during normal operation
- Certificate renewal completed within 30 seconds
Acceptance Criteria
- Bootstrap Process
  - Service can start with manually provisioned bootstrap certificate
  - Successfully transitions from bootstrap cert to API-managed cert
  - Never attempts to renew bootstrap certificate
- Certificate Lifecycle
  - Automatically renews certificates at 30% lifetime remaining
  - Handles both initial issuance and renewal endpoints correctly
  - Maintains certificate backups and performs atomic replacements
- Authentication Flow
  - Successfully authenticates with Keycloak using mTLS
  - Exchanges certificates for JWT tokens
  - Caches and refreshes JWT tokens appropriately
- Service Integration
  - Reloads nginx, HAProxy, and other configured services after rotation
  - Provides health check endpoints for monitoring
  - Exports Prometheus metrics for observability
- Resilience
  - Operates in degraded mode during network outages
  - Retries failed operations with exponential backoff
  - Maintains service availability with existing certificates during failures
- Security & Compliance
  - Follows all specified permission requirements
  - Generates audit logs matching the specified schema
  - Never exposes private keys or sensitive data
Testing Requirements
- Unit tests for all core components
- Integration tests for end-to-end flows
- Network failure simulation tests
- Performance benchmarks to validate resource usage
- Security validation for file permissions and key management
Documentation Deliverables
- Operational procedures document
- Installation and bootstrap scripts
- Configuration examples
- Troubleshooting guide