diff --git a/custom-domain/dstack-ingress/.gitignore b/custom-domain/dstack-ingress/.gitignore new file mode 100644 index 0000000..402fa39 --- /dev/null +++ b/custom-domain/dstack-ingress/.gitignore @@ -0,0 +1,4 @@ +/.claude/ +/CLAUDE.md +/test/ +__pycache__ diff --git a/custom-domain/dstack-ingress/DNS_PROVIDERS.md b/custom-domain/dstack-ingress/DNS_PROVIDERS.md new file mode 100644 index 0000000..8a3f551 --- /dev/null +++ b/custom-domain/dstack-ingress/DNS_PROVIDERS.md @@ -0,0 +1,114 @@ +# DNS Provider Configuration Guide + +This guide explains how to configure dstack-ingress to work with different DNS providers for managing custom domains and SSL certificates. + +## Supported DNS Providers + +- **Cloudflare** - The original and default provider +- **Linode DNS** - For Linode-hosted domains + +## Environment Variables + +### Common Variables (Required for all providers) + +- `DOMAIN` - Your custom domain (e.g., `app.example.com`) +- `GATEWAY_DOMAIN` - dstack gateway domain (e.g., `_.dstack-prod5.phala.network`) +- `CERTBOT_EMAIL` - Email for Let's Encrypt registration +- `TARGET_ENDPOINT` - Backend application endpoint to proxy to +- `DNS_PROVIDER` - DNS provider to use (`cloudflare`, `linode`) + +### Optional Variables + +- `SET_CAA` - Enable CAA record setup (default: false) +- `PORT` - HTTPS port (default: 443) +- `TXT_PREFIX` - Prefix for TXT records (default: "_tapp-address") + +## Provider-Specific Configuration + +### Cloudflare + +```bash +DNS_PROVIDER=cloudflare +CLOUDFLARE_API_TOKEN=your-api-token +``` + +**Required Permissions:** +- Zone:Read +- DNS:Edit + +### Linode DNS + +```bash +DNS_PROVIDER=linode +LINODE_API_TOKEN=your-api-token +``` + +**Required Permissions:** +- Domains: Read/Write access + +**Important Note for Linode:** +- Linode has a limitation where CAA and CNAME records cannot coexist on the same subdomain +- To work around this, the system will attempt to use A records instead of CNAME records +- If the gateway domain can be resolved 
to an IP, an A record will be created +- If resolution fails, it falls back to CNAME (but CAA records won't work on that subdomain) +- This is a Linode-specific limitation not present in other providers + +## Docker Compose Example + +```yaml +version: '3.8' + +services: + ingress: + image: dstack-ingress:latest + ports: + - "443:443" + environment: + # Common configuration + - DNS_PROVIDER=linode + - DOMAIN=app.example.com + - GATEWAY_DOMAIN=_.dstack-prod5.phala.network + - CERTBOT_EMAIL=admin@example.com + - TARGET_ENDPOINT=http://backend:8080 + + # Linode specific + - LINODE_API_TOKEN=your-api-token + volumes: + - ./letsencrypt:/etc/letsencrypt + - ./evidences:/evidences +``` + +## Migration from Cloudflare-only Setup + +If you're currently using the Cloudflare-only version: + +1. **No changes needed for Cloudflare users** - The default behavior remains Cloudflare +2. **For other providers** - Add the `DNS_PROVIDER` environment variable and provider-specific credentials + +## Troubleshooting + +### DNS Provider Detection + +If you see "Could not detect DNS provider type", ensure you have either: +- Set `DNS_PROVIDER` environment variable explicitly, OR +- Set provider-specific credential environment variables (e.g., `CLOUDFLARE_API_TOKEN`) + +### Certificate Generation Issues + +Different providers may have different propagation times. The default is 120 seconds, but you may need to adjust based on your provider's behavior. + +### Permission Errors + +Ensure your API tokens/credentials have the necessary permissions listed above for your provider. + +## API Token Generation + +### Cloudflare +1. Go to https://dash.cloudflare.com/profile/api-tokens +2. Create token with Zone:Read and DNS:Edit permissions +3. Scope to specific zones if desired + +### Linode +1. Go to https://cloud.linode.com/profile/tokens +2. Create a Personal Access Token +3. 
Grant "Domains" Read/Write access \ No newline at end of file diff --git a/custom-domain/dstack-ingress/Dockerfile b/custom-domain/dstack-ingress/Dockerfile index 90cc6dc..4ea1b0d 100644 --- a/custom-domain/dstack-ingress/Dockerfile +++ b/custom-domain/dstack-ingress/Dockerfile @@ -32,9 +32,10 @@ RUN set -e; \ RUN mkdir -p /etc/letsencrypt /var/www/certbot /usr/share/nginx/html -COPY ./scripts/* /scripts/ -RUN chmod +x /scripts/* +COPY ./scripts /scripts/ +RUN chmod +x /scripts/*.sh /scripts/*.py ENV PATH="/scripts:$PATH" +ENV PYTHONPATH="/scripts" COPY .GIT_REV /etc/ ENTRYPOINT ["/scripts/entrypoint.sh"] diff --git a/custom-domain/dstack-ingress/README.md b/custom-domain/dstack-ingress/README.md index 594069a..cfd82df 100644 --- a/custom-domain/dstack-ingress/README.md +++ b/custom-domain/dstack-ingress/README.md @@ -1,13 +1,14 @@ # Custom Domain Setup for dstack Applications -This repository provides a solution for setting up custom domains with automatic SSL certificate management for dstack applications using Cloudflare DNS and Let's Encrypt. +This repository provides a solution for setting up custom domains with automatic SSL certificate management for dstack applications using various DNS providers and Let's Encrypt. 
## Overview This project enables you to run dstack applications with your own custom domain, complete with: - Automatic SSL certificate provisioning and renewal via Let's Encrypt -- Cloudflare DNS configuration for CNAME, TXT, and CAA records +- Multi-provider DNS support (Cloudflare, Linode DNS, more to come) +- Automatic DNS configuration for CNAME, TXT, and CAA records - Nginx reverse proxy to route traffic to your application - Certificate evidence generation for verification - Strong SSL/TLS configuration with modern cipher suites (AES-GCM and ChaCha20-Poly1305) @@ -17,16 +18,20 @@ This project enables you to run dstack applications with your own custom domain, The dstack-ingress system provides a seamless way to set up custom domains for dstack applications with automatic SSL certificate management. Here's how it works: 1. **Initial Setup**: + - When first deployed, the container automatically obtains SSL certificates from Let's Encrypt using DNS validation - - It configures Cloudflare DNS by creating necessary CNAME, TXT, and optional CAA records + - It configures your DNS provider by creating necessary CNAME, TXT, and optional CAA records - Nginx is configured to use the obtained certificates and proxy requests to your application 2. **DNS Configuration**: + - A CNAME record is created to point your custom domain to the dstack gateway domain - A TXT record is added with application identification information to help dstack-gateway to route traffic to your application - If enabled, CAA records are set to restrict which Certificate Authorities can issue certificates for your domain + - The system automatically detects your DNS provider based on environment variables 3. 
**Certificate Management**: + - SSL certificates are automatically obtained during initial setup - A scheduled task runs twice daily to check for certificate renewal - When certificates are renewed, Nginx is automatically reloaded to use the new certificates @@ -40,7 +45,8 @@ The dstack-ingress system provides a seamless way to set up custom domains for d ### Prerequisites -- Host your domain on Cloudflare and have access to the Cloudflare account with API token +- Host your domain on one of the supported DNS providers +- Have appropriate API credentials for your DNS provider (see [DNS Provider Configuration](DNS_PROVIDERS.md) for details) ### Deployment @@ -57,7 +63,13 @@ services: ports: - "443:443" environment: + # DNS Provider + - DNS_PROVIDER=cloudflare + + # Cloudflare example - CLOUDFLARE_API_TOKEN=${CLOUDFLARE_API_TOKEN} + + # Common configuration - DOMAIN=${DOMAIN} - GATEWAY_DOMAIN=${GATEWAY_DOMAIN} - CERTBOT_EMAIL=${CERTBOT_EMAIL} @@ -68,21 +80,23 @@ services: - cert-data:/etc/letsencrypt restart: unless-stopped app: - image: nginx # Replace with your application image + image: nginx # Replace with your application image restart: unless-stopped volumes: - cert-data: # Persistent volume for certificates + cert-data: # Persistent volume for certificates ``` -Explanation of environment variables: +**Core Environment Variables:** -- `CLOUDFLARE_API_TOKEN`: Your Cloudflare API token +- `DNS_PROVIDER`: DNS provider to use (cloudflare, linode) - `DOMAIN`: Your custom domain -- `GATEWAY_DOMAIN`: The dstack gateway domain. (e.g. `_.dstack-prod5.phala.network` for Phala Cloud) +- `GATEWAY_DOMAIN`: The dstack gateway domain (e.g. 
`_.dstack-prod5.phala.network` for Phala Cloud) - `CERTBOT_EMAIL`: Your email address used in Let's Encrypt certificate requests - `TARGET_ENDPOINT`: The plain HTTP endpoint of your dstack application - `SET_CAA`: Set to `true` to enable CAA record setup +For provider-specific configuration details, see [DNS Provider Configuration](DNS_PROVIDERS.md). + #### Option 2: Build Your Own Image If you prefer to build the image yourself: @@ -95,6 +109,7 @@ If you prefer to build the image yourself: ``` **Important**: You must use the `build-image.sh` script to build the image. This script ensures reproducible builds with: + - Specific buildkit version (v0.20.2) - Deterministic timestamps (`SOURCE_DATE_EPOCH=0`) - Package pinning for consistency @@ -150,10 +165,12 @@ The dstack-ingress system provides mechanisms to verify and attest that your cus When certificates are issued or renewed, the system automatically generates a set of cryptographically linked evidence files: 1. **Access Evidence Files**: + - Evidence files are accessible at `https://your-domain.com/evidences/` - Key files include `acme-account.json`, `cert.pem`, `sha256sum.txt`, and `quote.json` 2. 
**Verification Chain**: + - `quote.json` contains a TDX quote with the SHA-256 digest of `sha256sum.txt` embedded in the report_data field - `sha256sum.txt` contains cryptographic checksums of both `acme-account.json` and `cert.pem` - When the TDX quote is verified, it cryptographically proves the integrity of the entire evidence chain @@ -178,9 +195,10 @@ The output will display CAA records that restrict certificate issuance exclusive All Let's Encrypt certificates are logged in public Certificate Transparency (CT) logs, enabling independent verification: **CT Log Verification**: - - Visit [crt.sh](https://crt.sh/) and search for your domain - - Confirm that the certificates match those issued by the dstack-ingress system - - This public logging ensures that all certificates are visible and can be monitored for unauthorized issuance + +- Visit [crt.sh](https://crt.sh/) and search for your domain +- Confirm that the certificates match those issued by the dstack-ingress system +- This public logging ensures that all certificates are visible and can be monitored for unauthorized issuance ## License diff --git a/custom-domain/dstack-ingress/docker-compose.yaml b/custom-domain/dstack-ingress/docker-compose.yaml index acb26d7..344e1f9 100644 --- a/custom-domain/dstack-ingress/docker-compose.yaml +++ b/custom-domain/dstack-ingress/docker-compose.yaml @@ -21,4 +21,3 @@ services: volumes: cert-data: - diff --git a/custom-domain/dstack-ingress/scripts/certman.py b/custom-domain/dstack-ingress/scripts/certman.py new file mode 100644 index 0000000..d2eba1b --- /dev/null +++ b/custom-domain/dstack-ingress/scripts/certman.py @@ -0,0 +1,232 @@ +#!/usr/bin/env python3 + +import argparse +import os +import subprocess +import sys +from typing import List, Optional, Tuple + +# Add script directory to path before importing dns_providers +sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) +from dns_providers import DNSProviderFactory # noqa: E402 + + +class CertManager: + """Certificate 
management using DNS provider infrastructure.""" + + def __init__(self, provider_type: Optional[str] = None): + """Initialize cert manager with DNS provider.""" + # Use the same DNS provider factory + self.provider_type = provider_type or self._detect_provider_type() + self.provider = DNSProviderFactory.create_provider(self.provider_type) + + def _detect_provider_type(self) -> str: + """Detect provider type (reuse factory logic).""" + return DNSProviderFactory._detect_provider_type() + + def install_plugin(self) -> bool: + """Install certbot plugin for the current provider.""" + if not self.provider.CERTBOT_PACKAGE: + print(f"No certbot package defined for {self.provider_type}") + return False + + print(f"Installing certbot plugin: {self.provider.CERTBOT_PACKAGE}") + + # Use virtual environment pip if available + pip_cmd = ["pip", "install", self.provider.CERTBOT_PACKAGE] + if "VIRTUAL_ENV" in os.environ: + venv_pip = os.path.join(os.environ["VIRTUAL_ENV"], "bin", "pip") + if os.path.exists(venv_pip): + pip_cmd[0] = venv_pip + + try: + result = subprocess.run(pip_cmd, capture_output=True, text=True) + if result.returncode != 0: + print(f"Failed to install plugin: {result.stderr}", file=sys.stderr) + return False + print(f"Successfully installed {self.provider.CERTBOT_PACKAGE}") + return True + except Exception as e: + print(f"Error installing plugin: {e}", file=sys.stderr) + return False + + def setup_credentials(self) -> bool: + """Setup credentials file for certbot using provider implementation.""" + return self.provider.setup_certbot_credentials() + + def _build_certbot_command(self, action: str, domain: str, email: str) -> List[str]: + """Build certbot command using provider configuration.""" + plugin = self.provider.CERTBOT_PLUGIN + if not plugin: + raise ValueError(f"No certbot plugin configured for {self.provider_type}") + + propagation_seconds = self.provider.CERTBOT_PROPAGATION_SECONDS + + base_cmd = ["certbot", action] + + # Add DNS plugin configuration + 
base_cmd.extend( + [ + f"--{plugin}", + f"--{plugin}-propagation-seconds", + str(propagation_seconds), + "--non-interactive", + ] + ) + + # Add credentials file if provider has one configured + if self.provider.CERTBOT_CREDENTIALS_FILE: + credentials_file = os.path.expanduser( + self.provider.CERTBOT_CREDENTIALS_FILE + ) + if os.path.exists(credentials_file): + base_cmd.extend([f"--{plugin}-credentials", credentials_file]) + + if action == "certonly": + base_cmd.extend( + ["--email", email, "--agree-tos", "--no-eff-email", "-d", domain] + ) + + return base_cmd + + def obtain_certificate(self, domain: str, email: str) -> bool: + """Obtain a new certificate for the domain.""" + print(f"Obtaining new certificate for {domain} using {self.provider_type}") + + cmd = self._build_certbot_command("certonly", domain, email) + + try: + result = subprocess.run(cmd, capture_output=True, text=True) + if result.returncode != 0: + print(f"Certificate obtaining failed: {result.stderr}", file=sys.stderr) + return False + + print(f"Certificate obtained successfully for {domain}") + return True + + except Exception as e: + print(f"Error running certbot: {e}", file=sys.stderr) + return False + + def renew_certificate(self, domain: str) -> Tuple[bool, bool]: + """Renew certificates. 
+ + Returns: + (success, renewed): success status and whether renewal was actually performed + """ + print(f"Renewing certificate using {self.provider_type}") + + cmd = self._build_certbot_command("renew", domain, "") + + try: + result = subprocess.run(cmd, capture_output=True, text=True) + if result.returncode != 0: + print(f"Certificate renewal failed: {result.stderr}", file=sys.stderr) + return False, False + + # Check if no renewals were needed + if "No renewals were attempted" in result.stdout: + print("No certificates need renewal") + return True, False + + print("Certificate renewed successfully") + return True, True + + except Exception as e: + print(f"Error running certbot: {e}", file=sys.stderr) + return False, False + + def certificate_exists(self, domain: str) -> bool: + """Check if certificate already exists for domain.""" + cert_path = f"/etc/letsencrypt/live/{domain}/fullchain.pem" + return os.path.isfile(cert_path) + + def run_action( + self, domain: str, email: str, action: str = "auto" + ) -> Tuple[bool, bool]: + """High-level certificate management. 
+ + Returns: + (success, needs_evidence): success status and whether evidence should be generated + """ + if action == "auto": + if self.certificate_exists(domain): + success, renewed = self.renew_certificate(domain) + return success, renewed # Only generate evidence if actually renewed + else: + success = self.obtain_certificate(domain, email) + return success, success # Always generate evidence for new certificates + elif action == "obtain": + success = self.obtain_certificate(domain, email) + return success, success + elif action == "renew": + success, renewed = self.renew_certificate(domain) + return success, renewed + else: + raise ValueError(f"Invalid action: {action}") + + +def main(): + parser = argparse.ArgumentParser( + description="Manage SSL certificates with certbot using DNS providers" + ) + parser.add_argument( + "action", choices=["obtain", "renew", "auto", "setup"], help="Action to perform" + ) + parser.add_argument("--domain", help="Domain name") + parser.add_argument("--email", help="Email for Let's Encrypt registration") + parser.add_argument("--provider", help="DNS provider (cloudflare, linode, etc)") + + args = parser.parse_args() + + try: + manager = CertManager(args.provider) + + # Handle setup action + if args.action == "setup": + if not manager.install_plugin(): + sys.exit(1) + if not manager.setup_credentials(): + sys.exit(1) + print(f"Setup completed for {manager.provider_type} provider") + return + + # Domain is required for certificate operations + if not args.domain: + print( + "Error: --domain is required for certificate operations", + file=sys.stderr, + ) + sys.exit(1) + + # Email is required for obtain and auto actions + if args.action in ["obtain", "auto"] and not args.email: + if not os.environ.get("CERTBOT_EMAIL"): + print( + "Error: --email is required or set CERTBOT_EMAIL environment variable", + file=sys.stderr, + ) + sys.exit(1) + args.email = os.environ["CERTBOT_EMAIL"] + + success, needs_evidence = manager.run_action( + 
args.domain, args.email, args.action + ) + + if not success: + sys.exit(1) + + # Exit with code 2 if no evidence generation is needed (no renewal was performed) + if not needs_evidence: + sys.exit(2) + + except ValueError as e: + print(f"Error: {e}", file=sys.stderr) + sys.exit(1) + except Exception as e: + print(f"Unexpected error: {e}", file=sys.stderr) + sys.exit(1) + + +if __name__ == "__main__": + main() diff --git a/custom-domain/dstack-ingress/scripts/cloudflare_dns.py b/custom-domain/dstack-ingress/scripts/cloudflare_dns.py deleted file mode 100755 index 6e68336..0000000 --- a/custom-domain/dstack-ingress/scripts/cloudflare_dns.py +++ /dev/null @@ -1,307 +0,0 @@ -#!/usr/bin/env python3 - -import argparse -import json -import os -import sys -import requests -from typing import Dict, List, Optional - - -class CloudflareDNSClient: - """A client for managing DNS records in Cloudflare with better error handling.""" - - def __init__(self, api_token: str, zone_id: Optional[str] = None): - self.api_token = api_token - self.zone_id = zone_id - self.base_url = "https://api.cloudflare.com/client/v4" - self.headers = { - "Authorization": f"Bearer {api_token}", - "Content-Type": "application/json" - } - - def _make_request(self, method: str, endpoint: str, data: Optional[Dict] = None) -> Dict: - """Make a request to the Cloudflare API with error handling.""" - url = f"{self.base_url}/{endpoint}" - try: - if method.upper() == "GET": - response = requests.get(url, headers=self.headers) - elif method.upper() == "POST": - response = requests.post(url, headers=self.headers, json=data) - elif method.upper() == "DELETE": - response = requests.delete(url, headers=self.headers) - else: - raise ValueError(f"Unsupported HTTP method: {method}") - - response.raise_for_status() - result = response.json() - - if not result.get("success", False): - errors = result.get("errors", []) - error_msg = "\n".join([f"Code: {e.get('code')}, Message: {e.get('message')}" for e in errors]) - 
print(f"API Error: {error_msg}", file=sys.stderr) - # Print the request data for debugging - if data: - print(f"Request data: {json.dumps(data)}", file=sys.stderr) - return {"success": False, "errors": errors} - - return result - except requests.exceptions.RequestException as e: - print(f"Request Error: {str(e)}", file=sys.stderr) - # Print the request data for debugging - if data: - print(f"Request data: {json.dumps(data)}", file=sys.stderr) - return {"success": False, "errors": [{"message": str(e)}]} - except json.JSONDecodeError: - print(f"JSON Decode Error: Could not parse response", file=sys.stderr) - return {"success": False, "errors": [{"message": "Could not parse response"}]} - except Exception as e: - print(f"Unexpected Error: {str(e)}", file=sys.stderr) - return {"success": False, "errors": [{"message": str(e)}]} - - def get_zone_id(self, domain: str) -> Optional[str]: - """Get the zone ID for a domain.""" - # Find the zone with the longest matching suffix for the domain - zone_name_len = 0 - zone_id = None - - page = 1 - total_pages = 1 - - while page <= total_pages: - result = self._make_request("GET", f"zones?page={page}") - - if not result.get("success", False): - return None - - zones = result.get("result", []) - if not zones and page == 1: - print(f"No zones found for any domain", file=sys.stderr) - return None - - result_info = result.get("result_info", {}) - if result_info: - total_pages = result_info.get("total_pages", total_pages) - - for zone in zones: - zone_name = zone.get("name", "") - if domain == zone_name: - return zone.get("id") - if domain.endswith(f".{zone_name}") and len(zone_name) > zone_name_len: - zone_name_len = len(zone_name) - zone_id = zone.get("id") - - page += 1 - - if zone_id: - self.zone_id = zone_id - return zone_id - else: - print(f"Zone ID not found in response for domain: {domain}", file=sys.stderr) - return None - - def get_dns_records(self, name: str, record_type: Optional[str] = None) -> List[Dict]: - """Get DNS 
records for a domain.""" - if not self.zone_id: - print("Zone ID is required", file=sys.stderr) - return [] - - params = f"zones/{self.zone_id}/dns_records?name={name}" - if record_type: - params += f"&type={record_type}" - - print(f"Checking for existing DNS records for {name}") - result = self._make_request("GET", params) - - if not result.get("success", False): - return [] - - records = result.get("result", []) - return records - - def delete_dns_record(self, record_id: str) -> bool: - """Delete a DNS record.""" - if not self.zone_id: - print("Zone ID is required", file=sys.stderr) - return False - - print(f"Deleting record ID: {record_id}") - result = self._make_request("DELETE", f"zones/{self.zone_id}/dns_records/{record_id}") - - return result.get("success", False) - - def create_cname_record(self, name: str, content: str, ttl: int = 60, proxied: bool = False) -> bool: - """Create a CNAME record.""" - if not self.zone_id: - print("Zone ID is required", file=sys.stderr) - return False - - data = { - "type": "CNAME", - "name": name, - "content": content, - "ttl": ttl, - "proxied": proxied - } - - print(f"Adding CNAME record for {name} pointing to {content}") - result = self._make_request("POST", f"zones/{self.zone_id}/dns_records", data) - - return result.get("success", False) - - def create_txt_record(self, name: str, content: str, ttl: int = 60) -> bool: - """Create a TXT record.""" - if not self.zone_id: - print("Zone ID is required", file=sys.stderr) - return False - - data = { - "type": "TXT", - "name": name, - "content": f'"{content}"', - "ttl": ttl - } - - print(f"Adding TXT record for {name} with content {content}") - result = self._make_request("POST", f"zones/{self.zone_id}/dns_records", data) - - return result.get("success", False) - - def create_caa_record(self, name: str, tag: str, value: str, flags: int = 0, ttl: int = 60) -> bool: - """Create a CAA record.""" - if not self.zone_id: - print("Zone ID is required", file=sys.stderr) - return False - 
- # Clean up the value - remove any existing quotes that might cause issues - clean_value = value.strip('"') - - # Cloudflare API expects a different structure for CAA records - # The data field should contain flags, tag, and value separately - data = { - "type": "CAA", - "name": name, - "ttl": ttl, - "data": { - "flags": flags, - "tag": tag, - "value": clean_value - } - } - - print(f"Adding CAA record for {name} with tag {tag} and value {clean_value}") - result = self._make_request("POST", f"zones/{self.zone_id}/dns_records", data) - - return result.get("success", False) - - -def main(): - parser = argparse.ArgumentParser(description="Manage Cloudflare DNS records") - parser.add_argument("action", choices=["get_zone_id", "set_cname", "set_txt", "set_caa"], - help="Action to perform") - parser.add_argument("--domain", required=True, help="Domain name") - parser.add_argument("--api-token", help="Cloudflare API token") - parser.add_argument("--zone-id", help="Cloudflare Zone ID") - parser.add_argument("--content", help="Record content (target for CNAME, value for TXT/CAA)") - parser.add_argument("--caa-tag", choices=["issue", "issuewild", "iodef"], - help="CAA record tag") - parser.add_argument("--caa-value", help="CAA record value") - - args = parser.parse_args() - - # Get API token from environment if not provided - api_token = args.api_token or os.environ.get("CLOUDFLARE_API_TOKEN") - if not api_token: - print("Error: Cloudflare API token is required", file=sys.stderr) - sys.exit(1) - - # Create DNS client - client = CloudflareDNSClient(api_token, args.zone_id) - - if args.action == "get_zone_id": - zone_id = client.get_zone_id(args.domain) - if not zone_id: - sys.exit(1) - print(zone_id) # Output zone ID for shell script to capture - - elif args.action == "set_cname": - if not args.content: - print("Error: --content is required for CNAME records", file=sys.stderr) - sys.exit(1) - - # Get zone ID if not provided - if not client.zone_id: - zone_id = 
client.get_zone_id(args.domain) - if not zone_id: - sys.exit(1) - # Make sure to use the zone_id from the client object, not the printed output - client.zone_id = zone_id - - # Check for existing records and delete them - existing_records = client.get_dns_records(args.domain, "CNAME") - for record in existing_records: - client.delete_dns_record(record["id"]) - - # Create new CNAME record - success = client.create_cname_record(args.domain, args.content) - if not success: - sys.exit(1) - - elif args.action == "set_txt": - # Get zone ID if not provided - if not client.zone_id: - zone_id = client.get_zone_id(args.domain) - if not zone_id: - sys.exit(1) - # Make sure to use the zone_id from the client object, not the printed output - client.zone_id = zone_id - - # Check for existing records and delete them - existing_records = client.get_dns_records(args.domain, "TXT") - for record in existing_records: - client.delete_dns_record(record["id"]) - - # Create new TXT record - success = client.create_txt_record(args.domain, args.content) - if not success: - sys.exit(1) - - elif args.action == "set_caa": - if not args.caa_tag or not args.caa_value: - print("Error: --caa-tag and --caa-value are required for CAA records", file=sys.stderr) - sys.exit(1) - - # Get zone ID if not provided - if not client.zone_id: - zone_id = client.get_zone_id(args.domain) - if not zone_id: - sys.exit(1) - # Make sure to use the zone_id from the client object, not the printed output - client.zone_id = zone_id - - # Check for existing records - existing_records = client.get_dns_records(args.domain, "CAA") - for record in existing_records: - # With the new API format, we need to check the data structure - record_data = record.get("data", {}) - record_tag = record_data.get("tag", "") - record_value = record_data.get("value", "") - - # If we find a record with the same tag and value, no need to update - if record_tag == args.caa_tag and record_value == args.caa_value: - print(f"CAA record with the 
same content already exists") - return - - # If it's the same tag but different value, delete it - if record_tag == args.caa_tag: - client.delete_dns_record(record["id"]) - - # Create new CAA record - success = client.create_caa_record(args.domain, args.caa_tag, args.caa_value) - if not success: - print(f"Failed to create CAA record for {args.domain}") - sys.exit(1) - - -if __name__ == "__main__": - main() diff --git a/custom-domain/dstack-ingress/scripts/dns_manager.py b/custom-domain/dstack-ingress/scripts/dns_manager.py new file mode 100755 index 0000000..98d8392 --- /dev/null +++ b/custom-domain/dstack-ingress/scripts/dns_manager.py @@ -0,0 +1,93 @@ +#!/usr/bin/env python3 + +import argparse +import os +import sys + +sys.path.append(os.path.dirname(os.path.abspath(__file__))) +from dns_providers import DNSProviderFactory # noqa: E402 + + +def main(): + parser = argparse.ArgumentParser( + description="Manage DNS records across multiple providers" + ) + parser.add_argument( + "action", + choices=["set_cname", "set_alias", "set_txt", "set_caa"], + help="Action to perform", + ) + parser.add_argument("--domain", required=True, help="Domain name") + parser.add_argument("--provider", help="DNS provider (cloudflare, linode)") + # Zone ID is now handled internally by each provider + parser.add_argument( + "--content", help="Record content (target for alias/CNAME, value for TXT/CAA)" + ) + parser.add_argument( + "--caa-tag", choices=["issue", "issuewild", "iodef"], help="CAA record tag" + ) + parser.add_argument("--caa-value", help="CAA record value") + + args = parser.parse_args() + + try: + # Create DNS provider instance + provider = DNSProviderFactory.create_provider(args.provider) + + if args.action == "set_cname": + if not args.content: + print("Error: --content is required for CNAME records", file=sys.stderr) + sys.exit(1) + + success = provider.set_alias_record(args.domain, args.content) + if not success: + print(f"Failed to set alias record for {args.domain}", 
file=sys.stderr) + sys.exit(1) + print(f"Successfully set alias record for {args.domain}") + + elif args.action == "set_alias": + if not args.content: + print("Error: --content is required for alias records", file=sys.stderr) + sys.exit(1) + + success = provider.set_alias_record(args.domain, args.content) + if not success: + print(f"Failed to set alias record for {args.domain}", file=sys.stderr) + sys.exit(1) + print(f"Successfully set alias record for {args.domain}") + + elif args.action == "set_txt": + if not args.content: + print("Error: --content is required for TXT records", file=sys.stderr) + sys.exit(1) + + success = provider.set_txt_record(args.domain, args.content) + if not success: + print(f"Failed to set TXT record for {args.domain}", file=sys.stderr) + sys.exit(1) + print(f"Successfully set TXT record for {args.domain}") + + elif args.action == "set_caa": + if not args.caa_tag or not args.caa_value: + print( + "Error: --caa-tag and --caa-value are required for CAA records", + file=sys.stderr, + ) + sys.exit(1) + + success = provider.set_caa_record(args.domain, args.caa_tag, args.caa_value) + if not success: + print(f"Failed to set CAA record for {args.domain}", file=sys.stderr) + sys.exit(1) + print(f"Successfully set CAA record for {args.domain}") + + except ValueError as e: + print(f"Error: {str(e)}", file=sys.stderr) + sys.exit(1) + except Exception as e: + print(f"Unexpected error: {str(e)}", file=sys.stderr) + sys.exit(1) + + +if __name__ == "__main__": + main() diff --git a/custom-domain/dstack-ingress/scripts/dns_providers/__init__.py b/custom-domain/dstack-ingress/scripts/dns_providers/__init__.py new file mode 100644 index 0000000..4302232 --- /dev/null +++ b/custom-domain/dstack-ingress/scripts/dns_providers/__init__.py @@ -0,0 +1,4 @@ +from .base import DNSProvider, DNSRecord, RecordType, CAARecord +from .factory import DNSProviderFactory + +__all__ = ["DNSProvider", "DNSRecord", "RecordType", "CAARecord", "DNSProviderFactory"] diff --git 
a/custom-domain/dstack-ingress/scripts/dns_providers/base.py b/custom-domain/dstack-ingress/scripts/dns_providers/base.py new file mode 100644 index 0000000..c9e5c24 --- /dev/null +++ b/custom-domain/dstack-ingress/scripts/dns_providers/base.py @@ -0,0 +1,275 @@ +#!/usr/bin/env python3 + +import os + +from abc import ABC, abstractmethod +from typing import Dict, List, Optional, Any +from dataclasses import dataclass +from enum import Enum + + +class RecordType(Enum): + A = "A" + AAAA = "AAAA" + CNAME = "CNAME" + TXT = "TXT" + MX = "MX" + NS = "NS" + CAA = "CAA" + SRV = "SRV" + PTR = "PTR" + + +@dataclass +class DNSRecord: + """Represents a DNS record.""" + + id: Optional[str] + name: str + type: RecordType + content: str + ttl: int = 60 + proxied: bool = False + priority: Optional[int] = None + data: Optional[Dict[str, Any]] = None + + +@dataclass +class CAARecord: + """Represents a CAA record with specific fields.""" + + name: str + flags: int + tag: str + value: str + ttl: int = 60 + + +class DNSProvider(ABC): + """Abstract base class for DNS providers.""" + + DETECT_ENV = "" + + # Certbot configuration - override in subclasses + CERTBOT_PLUGIN = "" + CERTBOT_PACKAGE = "" + CERTBOT_PROPAGATION_SECONDS = 120 + CERTBOT_CREDENTIALS_FILE = "" # Path to credentials file + + def __init__(self): + """Initialize the DNS provider.""" + pass + + def setup_certbot_credentials(self) -> bool: + """Setup credentials file for certbot. Override in subclasses if needed.""" + return True # Default: no setup needed + + @classmethod + def suitable(cls) -> bool: + """Check if the current environment is suitable for this DNS provider.""" + return os.environ.get(cls.DETECT_ENV) is not None + + @abstractmethod + def get_dns_records( + self, name: str, record_type: Optional[RecordType] = None + ) -> List[DNSRecord]: + """Get DNS records for a domain. 
+ + Args: + name: The record name + record_type: Optional record type filter + + Returns: + List of DNS records + """ + pass + + @abstractmethod + def create_dns_record(self, record: DNSRecord) -> bool: + """Create a DNS record. + + Args: + record: The DNS record to create + + Returns: + True if successful, False otherwise + """ + pass + + @abstractmethod + def delete_dns_record(self, record_id: str, domain: str) -> bool: + """Delete a DNS record. + + Args: + record_id: The record ID to delete + domain: The domain name (for zone lookup) + + Returns: + True if successful, False otherwise + """ + pass + + @abstractmethod + def create_caa_record(self, caa_record: CAARecord) -> bool: + """Create a CAA record. + + Args: + caa_record: The CAA record to create + + Returns: + True if successful, False otherwise + """ + pass + + def set_a_record( + self, name: str, ip_address: str, ttl: int = 60, proxied: bool = False + ) -> bool: + """Set an A record (delete existing and create new). + + Args: + name: The record name + ip_address: The IP address + ttl: Time to live + proxied: Whether to proxy through provider (if supported) + + Returns: + True if successful, False otherwise + """ + existing_records = self.get_dns_records(name, RecordType.A) + for record in existing_records: + # Check if record already exists with same IP + if record.content == ip_address: + print("A record with the same IP already exists") + return True + if record.id: + self.delete_dns_record(record.id, name) + + new_record = DNSRecord( + id=None, + name=name, + type=RecordType.A, + content=ip_address, + ttl=ttl, + proxied=proxied, + ) + return self.create_dns_record(new_record) + + def set_alias_record( + self, + name: str, + content: str, + ttl: int = 60, + proxied: bool = False, + ) -> bool: + """Set an alias record (delete existing and create new). + + Creates a CNAME record by default. Some providers may override this + to use A records instead (e.g., Linode to avoid CAA conflicts). 
+ + Args: + name: The record name + content: The alias target (domain name) + ttl: Time to live + proxied: Whether to proxy through provider (if supported) + + Returns: + True if successful, False otherwise + """ + return self.set_cname_record(name, content, ttl, proxied) + + def set_cname_record( + self, + name: str, + content: str, + ttl: int = 60, + proxied: bool = False, + ) -> bool: + """Set a CNAME record (delete existing and create new). + + Returns early if a CNAME with the same target already exists; other + existing CNAME records for the name are deleted before the new one is created. + + Args: + name: The record name + content: The CNAME target (domain name) + ttl: Time to live + proxied: Whether to proxy through provider (if supported) + + Returns: + True if successful, False otherwise + """ + existing_records = self.get_dns_records(name, RecordType.CNAME) + for record in existing_records: + # Check if record already exists with same content + if record.content == content: + print("CNAME record with the same content already exists") + return True + if record.id: + self.delete_dns_record(record.id, name) + + new_record = DNSRecord( + id=None, + name=name, + type=RecordType.CNAME, + content=content, + ttl=ttl, + proxied=proxied, + ) + return self.create_dns_record(new_record) + + def set_txt_record(self, name: str, content: str, ttl: int = 60) -> bool: + """Set a TXT record (delete existing and create new). 
+ + Args: + name: The record name + content: The TXT content + ttl: Time to live + + Returns: + True if successful, False otherwise + """ + existing_records = self.get_dns_records(name, RecordType.TXT) + for record in existing_records: + # Check if record already exists with same content + if record.content == content or record.content == f'"{content}"': + print("TXT record with the same content already exists") + return True + if record.id: + self.delete_dns_record(record.id, name) + + new_record = DNSRecord( + id=None, name=name, type=RecordType.TXT, content=content, ttl=ttl + ) + return self.create_dns_record(new_record) + + def set_caa_record( + self, + name: str, + tag: str, + value: str, + flags: int = 0, + ttl: int = 60, + ) -> bool: + """Set a CAA record (delete existing with same tag and create new). + + Args: + name: The record name + tag: The CAA tag (issue, issuewild, iodef) + value: The CAA value + flags: The CAA flags + ttl: Time to live + + Returns: + True if successful, False otherwise + """ + existing_records = self.get_dns_records(name, RecordType.CAA) + for record in existing_records: + if record.data and record.data.get("tag") == tag: + if record.data.get("value") == value: + print("CAA record with the same content already exists") + return True + if record.id: + self.delete_dns_record(record.id, name) + + caa_record = CAARecord(name=name, flags=flags, tag=tag, value=value, ttl=ttl) + return self.create_caa_record(caa_record) diff --git a/custom-domain/dstack-ingress/scripts/dns_providers/cloudflare.py b/custom-domain/dstack-ingress/scripts/dns_providers/cloudflare.py new file mode 100644 index 0000000..d3ed099 --- /dev/null +++ b/custom-domain/dstack-ingress/scripts/dns_providers/cloudflare.py @@ -0,0 +1,271 @@ +#!/usr/bin/env python3 + +import os +import sys +import json +import requests +from typing import Dict, List, Optional +from .base import DNSProvider, DNSRecord, CAARecord, RecordType + + +class CloudflareDNSProvider(DNSProvider): + 
"""DNS provider implementation for Cloudflare.""" + + DETECT_ENV = "CLOUDFLARE_API_TOKEN" + + # Certbot configuration + CERTBOT_PLUGIN = "dns-cloudflare" + CERTBOT_PACKAGE = "certbot-dns-cloudflare==4.0.0" + CERTBOT_PROPAGATION_SECONDS = 120 + CERTBOT_CREDENTIALS_FILE = "~/.cloudflare/cloudflare.ini" + + def __init__(self): + super().__init__() + self.api_token = os.getenv("CLOUDFLARE_API_TOKEN") + if not self.api_token: + raise ValueError("CLOUDFLARE_API_TOKEN environment variable is required") + self.base_url = "https://api.cloudflare.com/client/v4" + self.headers = { + "Authorization": f"Bearer {self.api_token}", + "Content-Type": "application/json", + } + self.zone_id: Optional[str] = None # Will be set when needed + self.zone_domain: Optional[str] = None # Cache the domain for the zone + + def setup_certbot_credentials(self) -> bool: + """Setup Cloudflare credentials file for certbot.""" + credentials_file = os.path.expanduser(self.CERTBOT_CREDENTIALS_FILE) + credentials_dir = os.path.dirname(credentials_file) + + try: + # Create credentials directory + os.makedirs(credentials_dir, exist_ok=True) + + # Write credentials file + with open(credentials_file, "w") as f: + f.write(f"dns_cloudflare_api_token = {self.api_token}\n") + + # Set secure permissions + os.chmod(credentials_file, 0o600) + print(f"Cloudflare credentials file created: {credentials_file}") + + # Pre-fetch zone ID if we have a domain + domain = os.getenv("DOMAIN") + if domain: + self._ensure_zone_id(domain) + + return True + + except Exception as e: + print(f"Error setting up Cloudflare credentials: {e}", file=sys.stderr) + return False + + def _make_request( + self, method: str, endpoint: str, data: Optional[Dict] = None + ) -> Dict: + """Make a request to the Cloudflare API with error handling.""" + url = f"{self.base_url}/{endpoint}" + try: + if method.upper() == "GET": + response = requests.get(url, headers=self.headers) + elif method.upper() == "POST": + response = requests.post(url, 
headers=self.headers, json=data) + elif method.upper() == "DELETE": + response = requests.delete(url, headers=self.headers) + else: + raise ValueError(f"Unsupported HTTP method: {method}") + + response.raise_for_status() + result = response.json() + + if not result.get("success", False): + errors = result.get("errors", []) + error_msg = "\n".join( + [ + f"Code: {e.get('code')}, Message: {e.get('message')}" + for e in errors + ] + ) + print(f"API Error: {error_msg}", file=sys.stderr) + if data: + print(f"Request data: {json.dumps(data)}", file=sys.stderr) + return {"success": False, "errors": errors} + + return result + except requests.exceptions.RequestException as e: + print(f"Request Error: {str(e)}", file=sys.stderr) + if data: + print(f"Request data: {json.dumps(data)}", file=sys.stderr) + return {"success": False, "errors": [{"message": str(e)}]} + except json.JSONDecodeError: + print("JSON Decode Error: Could not parse response", file=sys.stderr) + return { + "success": False, + "errors": [{"message": "Could not parse response"}], + } + except Exception as e: + print(f"Unexpected Error: {str(e)}", file=sys.stderr) + return {"success": False, "errors": [{"message": str(e)}]} + + def _get_zone_info(self, domain: str) -> Optional[tuple[str, str]]: + """Get the zone ID and zone name for a domain.""" + zone_name_len = 0 + zone_id = None + zone_name_found = None + + page = 1 + total_pages = 1 + + while page <= total_pages: + result = self._make_request("GET", f"zones?page={page}") + + if not result.get("success", False): + return None + + zones = result.get("result", []) + if not zones and page == 1: + print("No zones found for any domain", file=sys.stderr) + return None + + result_info = result.get("result_info", {}) + if result_info: + total_pages = result_info.get("total_pages", total_pages) + + for zone in zones: + zone_name = zone.get("name", "") + if domain == zone_name: + return (zone.get("id"), zone_name) + if domain.endswith(f".{zone_name}") and 
len(zone_name) > zone_name_len: + zone_name_len = len(zone_name) + zone_id = zone.get("id") + zone_name_found = zone_name + + page += 1 + + if zone_id and zone_name_found: + return (zone_id, zone_name_found) + else: + print( + f"Zone ID not found in response for domain: {domain}", file=sys.stderr + ) + return None + + def _ensure_zone_id(self, domain: str) -> Optional[str]: + """Ensure we have a zone ID for the domain, fetching if necessary.""" + if self.zone_id and self.zone_domain: + if domain == self.zone_domain or domain.endswith(f".{self.zone_domain}"): + return self.zone_id + + zone_info = self._get_zone_info(domain) + if zone_info: + self.zone_id, self.zone_domain = zone_info + return self.zone_id + + def get_dns_records( + self, name: str, record_type: Optional[RecordType] = None + ) -> List[DNSRecord]: + """Get DNS records for a domain.""" + zone_id = self._ensure_zone_id(name) + if not zone_id: + print(f"Error: Could not find zone for domain {name}", file=sys.stderr) + return [] + + params = f"zones/{zone_id}/dns_records?name={name}" + if record_type: + params += f"&type={record_type.value}" + + print(f"Checking for existing DNS records for {name}") + result = self._make_request("GET", params) + + if not result.get("success", False): + return [] + + records = [] + for record_data in result.get("result", []): + record = DNSRecord( + id=record_data.get("id"), + name=record_data.get("name"), + type=RecordType(record_data.get("type")), + content=record_data.get("content"), + ttl=record_data.get("ttl", 60), + proxied=record_data.get("proxied", False), + priority=record_data.get("priority"), + data=record_data.get("data"), + ) + records.append(record) + + return records + + def create_dns_record(self, record: DNSRecord) -> bool: + """Create a DNS record.""" + zone_id = self._ensure_zone_id(record.name) + if not zone_id: + print( + f"Error: Could not find zone for domain {record.name}", file=sys.stderr + ) + return False + + data = { + "type": record.type.value, 
+ "name": record.name, + "content": record.content, + "ttl": record.ttl, + } + + if record.type == RecordType.CNAME and hasattr(record, "proxied"): + data["proxied"] = record.proxied + + if record.type == RecordType.TXT: + data["content"] = f'"{record.content}"' + + if record.priority is not None: + data["priority"] = record.priority + + print(f"Adding {record.type.value} record for {record.name}") + result = self._make_request("POST", f"zones/{zone_id}/dns_records", data) + + return result.get("success", False) + + def delete_dns_record(self, record_id: str, domain: str) -> bool: + """Delete a DNS record.""" + zone_id = self._ensure_zone_id(domain) + if not zone_id: + print(f"Error: Could not find zone for domain {domain}", file=sys.stderr) + return False + + print(f"Deleting record ID: {record_id}") + result = self._make_request( + "DELETE", f"zones/{zone_id}/dns_records/{record_id}" + ) + + return result.get("success", False) + + def create_caa_record(self, caa_record: CAARecord) -> bool: + """Create a CAA record.""" + zone_id = self._ensure_zone_id(caa_record.name) + if not zone_id: + print( + f"Error: Could not find zone for domain {caa_record.name}", + file=sys.stderr, + ) + return False + + clean_value = caa_record.value.strip('"') + + data = { + "type": "CAA", + "name": caa_record.name, + "ttl": caa_record.ttl, + "data": { + "flags": caa_record.flags, + "tag": caa_record.tag, + "value": clean_value, + }, + } + + print( + f"Adding CAA record for {caa_record.name} with tag {caa_record.tag} and value {clean_value}" + ) + result = self._make_request("POST", f"zones/{zone_id}/dns_records", data) + + return result.get("success", False) diff --git a/custom-domain/dstack-ingress/scripts/dns_providers/factory.py b/custom-domain/dstack-ingress/scripts/dns_providers/factory.py new file mode 100644 index 0000000..84743d7 --- /dev/null +++ b/custom-domain/dstack-ingress/scripts/dns_providers/factory.py @@ -0,0 +1,68 @@ +#!/usr/bin/env python3 + +import os +from typing 
import Optional +from .base import DNSProvider +from .cloudflare import CloudflareDNSProvider +from .linode import LinodeDNSProvider + + +class DNSProviderFactory: + """Factory class for creating DNS provider instances.""" + + PROVIDERS = { + "cloudflare": CloudflareDNSProvider, + "linode": LinodeDNSProvider, + } + + @classmethod + def create_provider( + cls, + provider_type: Optional[str] = None, + ) -> DNSProvider: + """Create a DNS provider instance. + + Args: + provider_type: Name of the DNS provider (case-insensitive). + If omitted, it is detected from environment variables. + + Returns: + DNSProvider instance + + Raises: + ValueError: If the provider type is unsupported or cannot be detected + """ + # Auto-detect provider type from environment if not specified + if not provider_type: + provider_type = cls._detect_provider_type() + + provider_type = provider_type.lower() + + if provider_type not in cls.PROVIDERS: + raise ValueError( + f"Unsupported DNS provider: {provider_type}. Supported providers: {', '.join(cls.PROVIDERS.keys())}" + ) + + # Look up the provider class and instantiate it + provider_class = cls.PROVIDERS[provider_type] + return provider_class() + + @classmethod + def _detect_provider_type(cls) -> str: + """Detect DNS provider type from environment variables.""" + if os.environ.get("DNS_PROVIDER"): + return os.environ["DNS_PROVIDER"] + + for name, provider in cls.PROVIDERS.items(): + if provider.suitable(): + return name + + raise ValueError( + "Could not detect DNS provider type from environment variables. " + "Please set DNS_PROVIDER environment variable." 
+ ) + + @classmethod + def get_supported_providers(cls) -> list: + """Get list of supported DNS providers.""" + return list(cls.PROVIDERS.keys()) diff --git a/custom-domain/dstack-ingress/scripts/dns_providers/linode.py b/custom-domain/dstack-ingress/scripts/dns_providers/linode.py new file mode 100644 index 0000000..de264ad --- /dev/null +++ b/custom-domain/dstack-ingress/scripts/dns_providers/linode.py @@ -0,0 +1,343 @@ +#!/usr/bin/env python3 + +import os +import sys +import json +import socket +import requests +from typing import Dict, List, Optional +from .base import DNSProvider, DNSRecord, CAARecord, RecordType + + +class LinodeDNSProvider(DNSProvider): + """DNS provider implementation for Linode DNS.""" + + DETECT_ENV = "LINODE_API_TOKEN" + + # Certbot configuration + CERTBOT_PLUGIN = "dns-linode" + CERTBOT_PACKAGE = "certbot-dns-linode" + CERTBOT_PROPAGATION_SECONDS = 300 + CERTBOT_CREDENTIALS_FILE = "~/.linode/credentials.ini" + + def __init__(self): + super().__init__() + self.api_token = os.getenv("LINODE_API_TOKEN") + if not self.api_token: + raise ValueError("LINODE_API_TOKEN environment variable is required") + self.base_url = "https://api.linode.com/v4" + self.headers = { + "Authorization": f"Bearer {self.api_token}", + "Content-Type": "application/json", + } + self.zone_id: Optional[str] = None # Will be set when needed + self.zone_domain: Optional[str] = None # Cache the domain for the zone + + def setup_certbot_credentials(self) -> bool: + """Setup Linode credentials file for certbot.""" + credentials_file = os.path.expanduser(self.CERTBOT_CREDENTIALS_FILE) + credentials_dir = os.path.dirname(credentials_file) + + try: + # Create credentials directory + os.makedirs(credentials_dir, exist_ok=True) + + # Write credentials file + with open(credentials_file, "w") as f: + f.write("# WARNING: This file contains sensitive credentials for Linode DNS API.\n") + f.write("# Ensure this file is kept secure and not shared.\n") + f.write(f"dns_linode_key = 
{self.api_token}\n") + + # Set secure permissions + os.chmod(credentials_file, 0o600) + print(f"Linode credentials file created: {credentials_file}") + + # Pre-fetch zone ID if we have a domain + domain = os.getenv("DOMAIN") + if domain: + self._ensure_zone_id(domain) + + return True + + except Exception as e: + print(f"Error setting up Linode credentials: {e}", file=sys.stderr) + return False + + def _make_request( + self, method: str, endpoint: str, data: Optional[Dict] = None + ) -> Dict: + """Make a request to the Linode API with error handling.""" + url = f"{self.base_url}/{endpoint}" + try: + if method.upper() == "GET": + response = requests.get(url, headers=self.headers) + elif method.upper() == "POST": + response = requests.post(url, headers=self.headers, json=data) + elif method.upper() == "PUT": + response = requests.put(url, headers=self.headers, json=data) + elif method.upper() == "DELETE": + response = requests.delete(url, headers=self.headers) + else: + raise ValueError(f"Unsupported HTTP method: {method}") + + if response.status_code == 404: + return { + "success": False, + "errors": [{"field": "not_found", "reason": "Resource not found"}], + } + + response.raise_for_status() + + # For DELETE requests, Linode returns empty response + if method.upper() == "DELETE" and response.status_code == 200: + return {"success": True} + + # For successful GET/POST/PUT, parse JSON + if response.content: + result = response.json() + return {"success": True, "data": result} + else: + return {"success": True} + + except requests.exceptions.RequestException as e: + print(f"Request Error: {str(e)}", file=sys.stderr) + if data: + print(f"Request data: {json.dumps(data)}", file=sys.stderr) + return {"success": False, "errors": [{"reason": str(e)}]} + except json.JSONDecodeError: + print("JSON Decode Error: Could not parse response", file=sys.stderr) + return { + "success": False, + "errors": [{"reason": "Could not parse response"}], + } + except Exception as e: + 
print(f"Unexpected Error: {str(e)}", file=sys.stderr) + return {"success": False, "errors": [{"reason": str(e)}]} + + def _get_zone_id(self, domain: str) -> Optional[str]: + """Get the domain ID for a domain in Linode.""" + result = self._make_request("GET", "domains") + + if not result.get("success", False): + return None + + domains = result.get("data", {}).get("data", []) + + best_match_domain = None + best_match_length = 0 + + for domain_obj in domains: + domain_name = domain_obj.get("domain", "") + if domain == domain_name: + return str(domain_obj.get("id")) + if ( + domain.endswith(f".{domain_name}") + and len(domain_name) > best_match_length + ): + best_match_length = len(domain_name) + best_match_domain = domain_obj.get("id") + + if best_match_domain: + return str(best_match_domain) + else: + print(f"Domain not found: {domain}", file=sys.stderr) + return None + + def _get_subdomain(self, fqdn: str, domain_id: str) -> str: + """Get the subdomain part for a record.""" + # First, get the domain name + result = self._make_request("GET", f"domains/{domain_id}") + if not result.get("success", False): + return fqdn + + domain_name = result.get("data", {}).get("domain", "") + + if fqdn == domain_name: + return "" # Root domain + elif fqdn.endswith(f".{domain_name}"): + return fqdn[: -len(domain_name) - 1] + else: + return fqdn + + def _ensure_zone_id(self, domain: str) -> Optional[str]: + """Ensure we have a zone ID for the domain, fetching if necessary.""" + # If we already have a zone_id and it's for a parent domain, reuse it + if self.zone_id and self.zone_domain: + if domain == self.zone_domain or domain.endswith(f".{self.zone_domain}"): + return self.zone_id + + # Otherwise fetch the zone ID + self.zone_id = self._get_zone_id(domain) + if self.zone_id: + # Store the base domain for this zone + # For Linode, we need to get the actual domain from the API + result = self._make_request("GET", f"domains/{self.zone_id}") + if result.get("success", False): + 
self.zone_domain = result.get("data", {}).get("domain", "") + return self.zone_id + + def get_dns_records( + self, name: str, record_type: Optional[RecordType] = None + ) -> List[DNSRecord]: + """Get DNS records for a domain.""" + zone_id = self._ensure_zone_id(name) + if not zone_id: + print(f"Error: Could not find zone for domain {name}", file=sys.stderr) + return [] + + result = self._make_request("GET", f"domains/{zone_id}/records") + + if not result.get("success", False): + return [] + + print(f"Checking for existing DNS records for {name}") + + records = [] + subdomain = self._get_subdomain(name, zone_id) + + for record_data in result.get("data", {}).get("data", []): + record_name = record_data.get("name", "") + + # Match records by subdomain + if record_name == subdomain: + record_type_str = record_data.get("type", "") + + # Filter by record type if specified + if record_type and record_type.value != record_type_str: + continue + + # Parse CAA record data if applicable + data = None + if record_type_str == "CAA": + # Linode stores CAA with separate tag and target fields + target = record_data.get("target", "") + tag = record_data.get("tag", "issue") + + data = { + "flags": 0, # Linode doesn't support flags (always 0) + "tag": tag, + "value": target.strip('"'), + } + + records.append( + DNSRecord( + id=str(record_data.get("id")), + name=name, + type=RecordType(record_type_str), + content=record_data.get("target", ""), + ttl=record_data.get("ttl_sec", 60), + priority=record_data.get("priority"), + data=data, + ) + ) + + return records + + def create_dns_record(self, record: DNSRecord) -> bool: + """Create a DNS record.""" + zone_id = self._ensure_zone_id(record.name) + if not zone_id: + print( + f"Error: Could not find zone for domain {record.name}", file=sys.stderr + ) + return False + + subdomain = self._get_subdomain(record.name, zone_id) + + data = { + "type": record.type.value, + "name": subdomain, + "target": record.content, + "ttl_sec": record.ttl, + } 
+ + # Handle specific record types + if record.type == RecordType.TXT: + # Ensure TXT records have quotes + if not record.content.startswith('"'): + data["target"] = f'"{record.content}"' + + if record.priority is not None: + data["priority"] = record.priority + + print(f"Adding {record.type.value} record for {record.name}") + result = self._make_request("POST", f"domains/{zone_id}/records", data) + + return result.get("success", False) + + def delete_dns_record(self, record_id: str, domain: str) -> bool: + """Delete a DNS record.""" + zone_id = self._ensure_zone_id(domain) + if not zone_id: + print(f"Error: Could not find zone for domain {domain}", file=sys.stderr) + return False + + print(f"Deleting record ID: {record_id}") + result = self._make_request("DELETE", f"domains/{zone_id}/records/{record_id}") + + return result.get("success", False) + + def set_alias_record( + self, + name: str, + content: str, + ttl: int = 60, + proxied: bool = False, + ) -> bool: + """Override to use A record instead of CNAME for Linode to avoid CAA conflicts. + + Linode doesn't allow CAA and CNAME records on the same subdomain. + Using A records solves this limitation. 
""" + # Resolve domain to IP; fall back to a CNAME if resolution fails + domain = content + print(f"Trying to resolve: {domain}") + try: + ip_address = socket.gethostbyname(domain) + except socket.gaierror: + return self.set_cname_record(name, content, ttl, proxied) + print(f"✅ Resolved {domain} to IP: {ip_address}") + + # Delete any existing CNAME records for this name (clean transition) + existing_cname_records = self.get_dns_records(name, RecordType.CNAME) + for record in existing_cname_records: + if record.id: + self.delete_dns_record(record.id, name) + + print( + f"Creating A record for {name} pointing to {ip_address} (instead of CNAME to {content})" + ) + # Use the base class's set_a_record method with idempotency + return self.set_a_record(name, ip_address, ttl, proxied=False) + + def create_caa_record(self, caa_record: CAARecord) -> bool: + """Create a CAA record.""" + zone_id = self._ensure_zone_id(caa_record.name) + if not zone_id: + print( + f"Error: Could not find zone for domain {caa_record.name}", + file=sys.stderr, + ) + return False + + subdomain = self._get_subdomain(caa_record.name, zone_id) + + # Clean up the value + clean_value = caa_record.value.strip('"') + + # Linode CAA format uses separate tag and target fields + # The flags are not supported in Linode API (always 0) + data = { + "type": "CAA", + "name": subdomain, + "tag": caa_record.tag, + "target": clean_value, + "ttl_sec": caa_record.ttl, + } + + print( + f"Adding CAA record for {caa_record.name} with tag {caa_record.tag} and value {clean_value}" + ) + result = self._make_request("POST", f"domains/{zone_id}/records", data) + + return result.get("success", False) diff --git a/custom-domain/dstack-ingress/scripts/entrypoint.sh b/custom-domain/dstack-ingress/scripts/entrypoint.sh index 972453b..17c671d 100644 --- a/custom-domain/dstack-ingress/scripts/entrypoint.sh +++ b/custom-domain/dstack-ingress/scripts/entrypoint.sh @@ -1,15 +1,27 @@ #!/bin/bash + set -e PORT=${PORT:-443} 
TXT_PREFIX=${TXT_PREFIX:-"_tapp-address"} +echo 
"Setting up Python environment" + setup_py_env() { if [ ! -d "/opt/app-venv" ]; then python3 -m venv --system-site-packages /opt/app-venv fi source /opt/app-venv/bin/activate - pip install certbot-dns-cloudflare==4.0.0 + + pip install requests + + # Use the unified certbot manager to install plugins and setup credentials + echo "Setting up certbot environment" + certman.py setup + if [ $? -ne 0 ]; then + echo "Error: Failed to setup certbot environment" + exit 1 + fi } PROXY_CMD="proxy" @@ -18,46 +30,46 @@ if [[ "${TARGET_ENDPOINT}" == grpc://* ]]; then fi setup_nginx_conf() { - cat < /etc/nginx/conf.d/default.conf + cat </etc/nginx/conf.d/default.conf server { listen ${PORT} ssl; http2 on; server_name ${DOMAIN}; - + # SSL certificate configuration ssl_certificate /etc/letsencrypt/live/${DOMAIN}/fullchain.pem; ssl_certificate_key /etc/letsencrypt/live/${DOMAIN}/privkey.pem; - + # Modern SSL configuration - TLS 1.2 and 1.3 only ssl_protocols TLSv1.2 TLSv1.3; - + # Strong cipher suites - Only AES-GCM and ChaCha20-Poly1305 ssl_ciphers 'TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305'; - + # Prefer server cipher suites ssl_prefer_server_ciphers on; - + # ECDH curve for ECDHE ciphers ssl_ecdh_curve secp384r1; - + # Enable OCSP stapling ssl_stapling on; ssl_stapling_verify on; ssl_trusted_certificate /etc/letsencrypt/live/${DOMAIN}/fullchain.pem; resolver 8.8.8.8 8.8.4.4 valid=300s; resolver_timeout 5s; - + # SSL session configuration ssl_session_timeout 1d; ssl_session_cache shared:SSL:50m; ssl_session_tickets off; - + # SSL buffer size (optimized for TLS 1.3) ssl_buffer_size 4k; - + # Disable SSL renegotiation ssl_early_data off; - + location / { ${PROXY_CMD}_pass ${TARGET_ENDPOINT}; ${PROXY_CMD}_set_header Host \$host; @@ -75,28 +87,20 @@ EOF mkdir -p /var/log/nginx } 
-obtain_certificate() { - # Request certificate using the virtual environment - certbot certonly --dns-cloudflare \ - --dns-cloudflare-credentials ~/.cloudflare/cloudflare.ini \ - --dns-cloudflare-propagation-seconds 120 \ - --email "$CERTBOT_EMAIL" \ - --agree-tos --no-eff-email --non-interactive \ - -d "$DOMAIN" -} -set_cname_record() { - # Use the Python client to set the CNAME record - # This will automatically check for and delete existing records - cloudflare_dns.py set_cname \ - --zone-id "$CLOUDFLARE_ZONE_ID" \ +set_alias_record() { + # Use the unified DNS manager to set the alias record + source /opt/app-venv/bin/activate + echo "Setting alias record for $DOMAIN" + dns_manager.py set_alias \ --domain "$DOMAIN" \ --content "$GATEWAY_DOMAIN" - + if [ $? -ne 0 ]; then - echo "Error: Failed to set CNAME record for $DOMAIN" + echo "Error: Failed to set alias record for $DOMAIN" exit 1 fi + echo "Alias record set for $DOMAIN" } set_txt_record() { @@ -105,12 +109,12 @@ set_txt_record() { # Generate a unique app ID if not provided APP_ID=${APP_ID:-$(curl -s --unix-socket /var/run/tappd.sock http://localhost/prpc/Tappd.Info | jq -j '.app_id')} - # Use the Python client to set the TXT record - cloudflare_dns.py set_txt \ - --zone-id "$CLOUDFLARE_ZONE_ID" \ + # Use the unified DNS manager to set the TXT record + source /opt/app-venv/bin/activate + dns_manager.py set_txt \ --domain "${TXT_PREFIX}.${DOMAIN}" \ --content "$APP_ID:$PORT" - + if [ $? -ne 0 ]; then echo "Error: Failed to set TXT record for $DOMAIN" exit 1 @@ -126,39 +130,39 @@ set_caa_record() { local ACCOUNT_URI ACCOUNT_URI=$(jq -j '.uri' /evidences/acme-account.json) echo "Adding CAA record for $DOMAIN, accounturi=$ACCOUNT_URI" - cloudflare_dns.py set_caa \ - --zone-id "$CLOUDFLARE_ZONE_ID" \ + source /opt/app-venv/bin/activate + dns_manager.py set_caa \ --domain "$DOMAIN" \ --caa-tag "issue" \ --caa-value "letsencrypt.org;validationmethods=dns-01;accounturi=$ACCOUNT_URI" - + if [ $? 
-ne 0 ]; then
-        echo "Error: Failed to set CAA record for $DOMAIN"
-        exit 1
+        echo "Warning: Failed to set CAA record for $DOMAIN"
+        echo "This is not critical - certificates can still be issued without CAA records"
+        echo "Consider disabling CAA records by setting SET_CAA=false if this continues to fail"
+        # Don't exit - CAA records are optional for certificate generation
     fi
 }

 bootstrap() {
-    echo "Obtaining new certificate for $DOMAIN"
-    setup_py_env
-    obtain_certificate
-    generate-evidences.sh
-    set_cname_record
+    echo "Bootstrap: Setting up $DOMAIN"
+    source /opt/app-venv/bin/activate
+    renew-certificate.sh -n
+    set_alias_record
     set_txt_record
     set_caa_record
     touch /.bootstrapped
 }

-# Create Cloudflare credentials file
-mkdir -p ~/.cloudflare
-echo "dns_cloudflare_api_token = $CLOUDFLARE_API_TOKEN" > ~/.cloudflare/cloudflare.ini
-chmod 600 ~/.cloudflare/cloudflare.ini
+# Credentials are now handled by certman.py setup
+
+# Setup Python environment and install dependencies first
+setup_py_env

 # Check if it's the first time the container is started
 if [ ! -f "/.bootstrapped" ]; then
     bootstrap
 else
-    source /opt/app-venv/bin/activate
     echo "Certificate for $DOMAIN already exists"
 fi
diff --git a/custom-domain/dstack-ingress/scripts/generate-evidences.sh b/custom-domain/dstack-ingress/scripts/generate-evidences.sh
index 6db82ea..33096ed 100644
--- a/custom-domain/dstack-ingress/scripts/generate-evidences.sh
+++ b/custom-domain/dstack-ingress/scripts/generate-evidences.sh
@@ -1,11 +1,10 @@
 #!/bin/bash
-set -e

 ACME_ACCOUNT_FILE=$(ls /etc/letsencrypt/accounts/acme-v02.api.letsencrypt.org/directory/*/regr.json)
 CERT_FILE=/etc/letsencrypt/live/${DOMAIN}/fullchain.pem

 mkdir -p /evidences
-cd /evidences
+cd /evidences || exit
 cp "${ACME_ACCOUNT_FILE}" acme-account.json
 cp "${CERT_FILE}" cert.pem

@@ -21,4 +20,8 @@ done
 QUOTED_HASH="${PADDED_HASH}"

 curl -s --unix-socket /var/run/tappd.sock "http://localhost/prpc/Tappd.RawQuote?report_data=${QUOTED_HASH}" > quote.json
+if [ $? -ne 0 ]; then
+    echo "Error: Failed to generate evidences"
+    exit 1
+fi
 echo "Generated evidences successfully"
diff --git a/custom-domain/dstack-ingress/scripts/renew-certificate.sh b/custom-domain/dstack-ingress/scripts/renew-certificate.sh
old mode 100644
new mode 100755
index de4105a..b8e412c
--- a/custom-domain/dstack-ingress/scripts/renew-certificate.sh
+++ b/custom-domain/dstack-ingress/scripts/renew-certificate.sh
@@ -1,32 +1,51 @@
 #!/bin/bash
 source /opt/app-venv/bin/activate

-echo "Renewing certificate for $DOMAIN"
-
-# Perform the actual renewal with explicit credentials and capture the output
-RENEW_OUTPUT=$(certbot renew --dns-cloudflare --dns-cloudflare-credentials ~/.cloudflare/cloudflare.ini --dns-cloudflare-propagation-seconds 120 --non-interactive 2>&1)
-RENEW_STATUS=$?
-
-# Check if renewal failed
-if [ $RENEW_STATUS -ne 0 ]; then
-    echo "Certificate renewal failed" >&2
+INITIAL=false
+while getopts "n" opt; do
+    case $opt in
+        n)
+            INITIAL=true
+            ;;
+        \?)
+            echo "Invalid option: -$OPTARG" >&2
+            exit 1
+            ;;
+    esac
+done
+
+# Use the unified certbot manager
+SCRIPT_DIR="$(dirname "$(readlink -f "$0")")"
+python3 "$SCRIPT_DIR/certman.py" auto --domain "$DOMAIN" --email "$CERTBOT_EMAIL"
+CERT_STATUS=$?
+
+if [ $CERT_STATUS -eq 1 ]; then
+    echo "Certificate management failed" >&2
     exit 1
-fi
-
-# Check if no renewals were attempted
-if echo "$RENEW_OUTPUT" | grep -q "No renewals were attempted"; then
+elif [ $CERT_STATUS -eq 2 ]; then
     echo "No certificates need renewal, skipping evidence generation"
-    exit 0
+    if [ "$INITIAL" = false ]; then
+        exit 0
+    fi
 fi

-# Only generate evidences if certificates were actually renewed
+# Generate evidences (for both obtain and renew)
+echo "Generating evidence files..."
 generate-evidences.sh

-# Only reload Nginx if we got here (meaning certificates were renewed)
-if ! nginx -s reload; then
-    echo "Nginx reload failed" >&2
-    exit 2
+# Reload Nginx for certificate updates
+# Check if certificate exists to determine if this was obtain or renew
+if [ -f "/etc/letsencrypt/live/$DOMAIN/fullchain.pem" ]; then
+    if [ "$INITIAL" = true ]; then
+        echo "Certificate obtained successfully for $DOMAIN"
+    else
+        if ! nginx -s reload; then
+            echo "Nginx reload failed" >&2
+            exit 2
+        else
+            echo "Certificate renewed and Nginx reloaded successfully for $DOMAIN"
+        fi
+    fi
 fi

 exit 0
-
diff --git a/custom-domain/dstack-ingress/scripts/renewal-daemon.sh b/custom-domain/dstack-ingress/scripts/renewal-daemon.sh
old mode 100644
new mode 100755