Core Service Architecture & Configuration Management #205

@jmgilman

Overview

Set up the foundational architecture for the Hetzner Certificate Rotation Service, including the main service structure, configuration management system, and systemd integration. This issue establishes the core framework upon which all other components will be built.

Requirements

Project Structure

Create the base Go project structure:

services/certificates/refresher/
├── cmd/
│   └── rotator/
│       └── main.go                 # Service entry point
├── internal/
│   ├── config/
│   │   └── config.go             # Configuration management
│   └── monitoring/
│       └── logging.go            # Basic structured logging setup
├── pkg/
│   └── errors/
│       └── errors.go             # Custom error types
├── configs/
│   ├── config.yaml.example
│   └── bootstrap.yaml.example
├── deployments/
│   └── systemd/
│       └── hetzner-cert-rotation.service
├── go.mod
├── go.sum
└── README.md

Configuration Management

Implement configuration loading and validation for two configuration files:

Main Configuration (config.yaml)

service:
  name: "hetzner-cert-rotation"
  pid_file: "/var/run/hetzner-cert-rotation.pid"
  log_level: "info"
  log_format: "json"

machine:
  identity: "hetzner-build-01" # Unique machine identifier
  role: "build-server" # build-server, application-server, etc.
  environment: "prod" # dev, preprod, prod

certificates:
  storage_path: "/etc/certs"
  permissions:
    cert_file: 0644
    key_file: 0600
    directory: 0755
  backup_retention: 3 # Keep 3 previous certificate versions

rotation:
  check_interval: "1h" # How often to check certificate expiry
  renewal_threshold: 0.3 # Renew when 30% of the certificate lifetime remains
  renewal_window: "72h" # Start renewal attempts this far before expiry
  max_attempts: 5 # Maximum renewal attempts before alerting

certificate_api:
  base_url: "https://certificate-api.internal"
  endpoints:
    issue: "/api/v1/certificates/issue"
    renew: "/api/v1/certificates/renew"
    status: "/api/v1/certificates/{serial}"
    ca_chain: "/api/v1/certificates/ca"
  timeout: "30s"
  retry_attempts: 3
  retry_backoff_intervals: ["1m", "5m", "15m"]

keycloak:
  base_url: "https://keycloak.internal"
  realm: "main"
  client_id: "hetzner-machine"
  token_endpoint: "/realms/main/protocol/openid-connect/token"
  jwt_cache_duration: "50m" # Refresh the JWT 10 minutes before expiry, assuming a 60-minute token lifetime

network:
  health_check_interval: "30s"
  connectivity_timeout: "10s"
  degraded_mode_threshold: "5m" # Enter degraded mode after 5 minutes offline

monitoring:
  metrics_port: 9091
  health_port: 8081

services:
  reload_commands:
    nginx: "systemctl reload nginx"
    haproxy: "systemctl reload haproxy"
    docker: "systemctl restart docker"
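
The value-range validation asked for above might look like the following for the rotation section. This is a minimal sketch using only the standard library. The RotationConfig type, its field names, and the chosen bounds (threshold strictly between 0 and 1, positive durations, at least one attempt) are assumptions, not part of this issue. The issue suggests struct tags for validation; this sketch swaps in plain checks to stay dependency-free.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// RotationConfig mirrors the `rotation` section of config.yaml.
// The type and field names are hypothetical.
type RotationConfig struct {
	CheckInterval    string
	RenewalThreshold float64
	RenewalWindow    string
	MaxAttempts      int
}

// positiveDuration parses value and requires it to be greater than zero.
func positiveDuration(field, value string) error {
	d, err := time.ParseDuration(value)
	if err != nil {
		return fmt.Errorf("%s: %w", field, err)
	}
	if d <= 0 {
		return fmt.Errorf("%s: must be positive, got %s", field, d)
	}
	return nil
}

// Validate checks every field and returns all problems joined into one
// error, or nil when the section is valid.
func (r RotationConfig) Validate() error {
	var errs []error
	errs = append(errs, positiveDuration("rotation.check_interval", r.CheckInterval))
	errs = append(errs, positiveDuration("rotation.renewal_window", r.RenewalWindow))
	if r.RenewalThreshold <= 0 || r.RenewalThreshold >= 1 {
		errs = append(errs, fmt.Errorf("rotation.renewal_threshold: must be between 0 and 1 (exclusive), got %v", r.RenewalThreshold))
	}
	if r.MaxAttempts < 1 {
		errs = append(errs, fmt.Errorf("rotation.max_attempts: must be at least 1, got %d", r.MaxAttempts))
	}
	// errors.Join drops nil entries and returns nil if nothing is left.
	return errors.Join(errs...)
}

func main() {
	cfg := RotationConfig{CheckInterval: "1h", RenewalThreshold: 0.3, RenewalWindow: "72h", MaxAttempts: 5}
	if err := cfg.Validate(); err != nil {
		fmt.Println("invalid configuration:", err)
		return
	}
	fmt.Println("rotation settings are valid")
}
```

Collecting every problem before returning lets an operator fix a broken file in one pass instead of one error at a time.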

Bootstrap Configuration (bootstrap.yaml)

bootstrap:
  certificate_path: "/etc/certs/bootstrap/cert.pem"
  private_key_path: "/etc/certs/bootstrap/key.pem"
  ca_chain_path: "/etc/certs/bootstrap/ca-chain.pem"

initial_renewal:
  certificate_profile: "machine"
  common_name: "${MACHINE_IDENTITY}"
  san_entries:
    - "${MACHINE_IDENTITY}.internal"
    - "${MACHINE_IDENTITY}.hetzner"
  validity_days: 7
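
The ${MACHINE_IDENTITY} placeholders above could be resolved with the standard library's os.Expand. The sketch below is one possible approach, not a prescribed one. The expandStrict helper is a hypothetical name, and treating an unset variable as an error is an assumption about the desired behaviour.

```go
package main

import (
	"fmt"
	"os"
)

// expandStrict replaces ${VAR} and $VAR references in s using lookup and
// returns an error naming the first variable that lookup cannot resolve.
func expandStrict(s string, lookup func(string) (string, bool)) (string, error) {
	var missing string
	out := os.Expand(s, func(name string) string {
		v, ok := lookup(name)
		if !ok && missing == "" {
			missing = name
		}
		return v
	})
	if missing != "" {
		return "", fmt.Errorf("undefined variable %q in %q", missing, s)
	}
	return out, nil
}

func main() {
	// os.LookupEnv has the signature expandStrict expects.
	san, err := expandStrict("${MACHINE_IDENTITY}.internal", os.LookupEnv)
	if err != nil {
		fmt.Println("substitution failed:", err)
		return
	}
	fmt.Println(san)
}
```

Note that os.Expand also interprets bare $VAR references, so configuration values containing a literal dollar sign would need separate handling.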

Main Service Entry Point

Implement cmd/rotator/main.go with:

  • Command-line flag parsing
  • Configuration file loading with validation
  • Environment variable substitution (for ${MACHINE_IDENTITY})
  • Graceful shutdown handling (SIGTERM, SIGINT)
  • Signal handling for manual operations (SIGUSR1 for manual renewal trigger)
  • PID file management
  • Basic service lifecycle logging

Structured Logging

Set up structured logging with:

  • JSON format support
  • Log levels (debug, info, warn, error)
  • Contextual fields (machine identity, environment, etc.)
  • Log rotation integration with systemd/journald

Error Types

Define custom error types in pkg/errors/errors.go:

type ErrorCode string

const (
    ErrCodeConfigInvalid       ErrorCode = "CONFIG_INVALID"
    ErrCodeCertificateExpired  ErrorCode = "CERT_EXPIRED"
    ErrCodeNetworkUnavailable  ErrorCode = "NETWORK_UNAVAILABLE"
    ErrCodeAuthFailed          ErrorCode = "AUTH_FAILED"
    ErrCodeRenewalFailed       ErrorCode = "RENEWAL_FAILED"
    ErrCodeServiceReloadFailed ErrorCode = "SERVICE_RELOAD_FAILED"
)

type ServiceError struct {
    Code    ErrorCode
    Message string
    Details map[string]interface{}
    Cause   error
}

Systemd Service Unit

Create deployments/systemd/hetzner-cert-rotation.service:

[Unit]
Description=Hetzner Certificate Rotation Service
After=network-online.target tailscaled.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/local/bin/hetzner-cert-rotation -config /etc/hetzner-cert-rotation/config.yaml
ExecReload=/bin/kill -SIGUSR1 $MAINPID
Restart=always
RestartSec=30
User=cert-rotation
Group=ssl-cert
StandardOutput=journal
StandardError=journal
SyslogIdentifier=hetzner-cert-rotation

# Security hardening
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/etc/certs /var/run

[Install]
WantedBy=multi-user.target

Acceptance Criteria

  1. Configuration Loading

    • Successfully loads and validates both config.yaml and bootstrap.yaml
    • Supports environment variable substitution
    • Validates all required fields and value ranges
    • Provides clear error messages for configuration issues
  2. Service Lifecycle

    • Service starts and creates PID file
    • Responds to SIGTERM/SIGINT with graceful shutdown
    • Responds to SIGUSR1 for manual trigger (logs receipt for now)
    • Integrates with systemd notify protocol
  3. Logging

    • Outputs structured JSON logs at configured level
    • Includes contextual information (machine identity, etc.)
    • Properly integrates with systemd journal
  4. Error Handling

    • Uses custom error types consistently
    • Provides detailed error context for troubleshooting

Implementation Notes

  • Use viper or similar for configuration management
  • Use zerolog or zap for structured logging
  • Implement configuration validation using struct tags
  • Ensure all file paths are validated for existence/permissions where appropriate
  • Add configuration hot-reload capability if time permits (watch config file for changes)

Dependencies

  • Go 1.21 or later
  • No external service dependencies for this phase
  • Standard library plus approved third-party packages (viper, zerolog/zap)

Testing Requirements

  • Unit tests for configuration loading and validation
  • Unit tests for error type creation and formatting
  • Integration test for service startup and shutdown
  • Test environment variable substitution
  • Test signal handling (SIGTERM, SIGINT, SIGUSR1)
