| 1 | +# Document Hetzner SSH Key Dual Injection and Root Access Security |
| 2 | + |
| 3 | +**Issue**: [#266](https://github.com/torrust/torrust-tracker-deployer/issues/266) |
| 4 | +**Parent Epic**: None (standalone documentation task) |
| 5 | +**Related**: |
| 6 | + |
| 7 | +- [Hetzner Provider Documentation](../user-guide/providers/hetzner.md) |
| 8 | +- [LXD Provider Documentation](../user-guide/providers/lxd.md) |
| 9 | +- [SSH Keys Guide](../tech-stack/ssh-keys.md) |
| 10 | + |
| 11 | +## Overview |
| 12 | + |
| 13 | +When deploying to Hetzner Cloud, the deployer configures SSH key access through two independent mechanisms: |
| 14 | + |
| 15 | +1. **OpenTofu `hcloud_ssh_key` resource**: Registers the SSH public key in the Hetzner Cloud project's SSH key registry and attaches it to the server, enabling root SSH access |
| 16 | +2. **cloud-init `ssh_authorized_keys`**: Injects the same SSH public key into the application user's (`torrust`) authorized_keys file |
| 17 | + |
| 18 | +This results in **both root and the application user having SSH access** after deployment. This behavior is intentional: root access provides a recovery path for debugging cloud-init failures. Users should nevertheless be aware of the security implication and can optionally disable root SSH access after a successful deployment. |
| 19 | + |
| 20 | +This issue tracks documenting this behavior across the codebase and providing guidance for users who want stricter security. |
| 21 | + |
| 22 | +## Goals |
| 23 | + |
| 24 | +- [ ] Document the SSH key dual injection behavior |
| 25 | +- [ ] Explain the security implications |
| 26 | +- [ ] Provide instructions for disabling root SSH access post-deployment |
| 27 | +- [ ] Create a decision record explaining the architectural rationale |
| 28 | +- [ ] Document why LXD provider doesn't have this behavior |
| 29 | + |
| 30 | +## 🏗️ Architecture Requirements |
| 31 | + |
| 32 | +**DDD Layer**: N/A (documentation only) |
| 33 | +**Module Path**: N/A |
| 34 | +**Pattern**: Documentation and Decision Record |
| 35 | + |
| 36 | +### Architectural Context |
| 37 | + |
| 38 | +The Hetzner provider uses a dual SSH key injection pattern: |
| 39 | + |
| 40 | +```text |
| 41 | +┌─────────────────────────────────────────────────────────────────┐ |
| 42 | +│ Hetzner Cloud │ |
| 43 | +├─────────────────────────────────────────────────────────────────┤ |
| 44 | +│ 1. OpenTofu creates hcloud_ssh_key resource │ |
| 45 | +│ └─ Key appears in Hetzner Console → Security → SSH Keys │ |
| 46 | +│ │ |
| 47 | +│ 2. OpenTofu creates server with ssh_keys = [hcloud_ssh_key.id] │ |
| 48 | +│ └─ Hetzner injects key into root's ~/.ssh/authorized_keys │ |
| 49 | +│ │ |
| 50 | +│ 3. cloud-init runs user-data script │ |
| 51 | +│ └─ Creates 'torrust' user with same key in authorized_keys │ |
| 52 | +└─────────────────────────────────────────────────────────────────┘ |
| 53 | +``` |
| 54 | + |
| 55 | +The LXD provider only uses cloud-init (step 3 in the diagram above), so the dual injection is specific to the Hetzner provider. |
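
One way to confirm the dual injection on a running server is to check whether the deployment key appears in each user's `authorized_keys` file. The following is a minimal sketch, not part of the deployer: the helper name `key_is_authorized` and the file paths mentioned in the usage note are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the deployer): succeeds when the key in
# PUBLIC_KEY_FILE is listed in AUTHORIZED_KEYS_FILE.
key_is_authorized() {
  local public_key_file="$1"
  local authorized_keys_file="$2"
  local key_body
  # The second field of a public key line is the base64-encoded key body.
  key_body="$(awk '{ print $2; exit }' "$public_key_file")"
  [ -n "$key_body" ] || return 2
  # Fixed-string match, so characters such as "+" and "/" are taken literally.
  grep -qF -- "$key_body" "$authorized_keys_file"
}
```

Run as root on the server against `/root/.ssh/authorized_keys` and `/home/torrust/.ssh/authorized_keys` (the second path assumes the default home directory), both checks would be expected to succeed on Hetzner, while on LXD only the second would.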
| 56 | + |
| 57 | +## Specifications |
| 58 | + |
| 59 | +### Why Both Mechanisms Exist |
| 60 | + |
| 61 | +| Mechanism | Purpose | When It Runs | |
| 62 | +| -------------------------------- | ------------------------------ | ------------------------------------ | |
| 63 | +| OpenTofu `hcloud_ssh_key` | Emergency/debug access as root | During server creation (before boot) | |
| 64 | +| cloud-init `ssh_authorized_keys` | Application user access | After first boot | |
| 65 | + |
| 66 | +**Primary reason for keeping the OpenTofu SSH key**: If cloud-init fails (syntax error, network issue, script error), the `torrust` user may never be set up, and the server would be unreachable over SSH without root access. The OpenTofu SSH key provides a recovery path. |
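
As a rough illustration of that recovery path, root SSH access lets an operator ask cloud-init what happened. The sketch below relies on the `status: <value>` line printed by `cloud-init status`; the helper name `parse_cloud_init_status` is hypothetical and not part of the deployer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `cloud-init status` output on stdin and prints
# the value of its "status:" line (for example "done", "running" or "error").
parse_cloud_init_status() {
  awk -F': *' '$1 == "status" { print $2; exit }'
}
```

A possible use is `ssh root@<server-ip> "cloud-init status --long" | parse_cloud_init_status`. If the result is `error`, `/var/log/cloud-init-output.log` on the server is a reasonable place to look next.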
| 67 | + |
| 68 | +### Security Implications |
| 69 | + |
| 70 | +**Risks of root SSH access**: |
| 71 | + |
| 72 | +- Root has unrestricted system access |
| 73 | +- Compromised SSH key grants full system control |
| 74 | +- Violates principle of least privilege |
| 75 | + |
| 76 | +**Mitigations in place**: |
| 77 | + |
| 78 | +- Application runs as non-root user (`torrust`) |
| 79 | +- The `torrust` user has passwordless sudo, so administrative tasks do not require logging in as root |
| 80 | +- The same SSH key is used for both users, so no additional key material is exposed |
| 81 | + |
| 82 | +### How to Disable Root SSH Access |
| 83 | + |
| 84 | +Users who want stricter security can disable root SSH access after verifying that the deployment succeeded and that they can log in as `torrust`: |
| 85 | + |
| 86 | +#### Option 1: Remove root's authorized_keys |
| 87 | + |
| 88 | +```bash |
| 89 | +ssh torrust@<server-ip> "sudo rm /root/.ssh/authorized_keys" |
| 90 | +``` |
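
Deleting the whole file also removes any other keys an operator may have added for root. A more selective variant removes only the deployment key. This is a sketch under assumptions: `remove_authorized_key` is a hypothetical helper, and it takes file paths as arguments so it can be tried on copies first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove the key found in PUBLIC_KEY_FILE from
# AUTHORIZED_KEYS_FILE, leaving every other entry untouched.
remove_authorized_key() {
  local public_key_file="$1"
  local authorized_keys_file="$2"
  local key_body tmp_file
  # The second field of a public key line is the base64-encoded key body.
  key_body="$(awk '{ print $2; exit }' "$public_key_file")"
  [ -n "$key_body" ] || return 1
  tmp_file="$(mktemp)"
  # grep exits 1 when no lines remain; that is not an error here.
  grep -vF -- "$key_body" "$authorized_keys_file" > "$tmp_file" || true
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp_file" > "$authorized_keys_file"
  rm -f "$tmp_file"
}
```

On the server this would need to run as root against `/root/.ssh/authorized_keys`, with a copy of the deployment public key available there.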
| 91 | + |
| 92 | +#### Option 2: Disable root login via SSH config |
| 93 | + |
| 94 | +```bash |
| 95 | +ssh torrust@<server-ip> "sudo sed -i -E 's/^#?PermitRootLogin[[:space:]].*/PermitRootLogin no/' /etc/ssh/sshd_config && sudo systemctl restart sshd" |
| 96 | +``` |
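
The substitution above only takes effect when a matching line is already present in the file. A more defensive variant rewrites any existing `PermitRootLogin` directive and appends one when none is found. This is a sketch, not the deployer's behavior: `disable_root_login` is a hypothetical helper, and the config path is passed as an argument so the function can be tried on a copy first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: force "PermitRootLogin no" in the given sshd config file.
disable_root_login() {
  local config_file="$1"
  # Matches an active or commented-out directive, with or without a value.
  local pattern='^[[:space:]]*#?PermitRootLogin([[:space:]].*)?$'
  local tmp_file
  tmp_file="$(mktemp)"
  if grep -Eq "$pattern" "$config_file"; then
    # Rewrite every existing directive, whether commented out or active.
    sed -E "s/${pattern}/PermitRootLogin no/" "$config_file" > "$tmp_file"
  else
    # No directive present: append one.
    cat "$config_file" > "$tmp_file"
    printf 'PermitRootLogin no\n' >> "$tmp_file"
  fi
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp_file" > "$config_file"
  rm -f "$tmp_file"
}
```

After running it as root on `/etc/ssh/sshd_config`, `sudo sshd -t` can validate the file before the service is restarted. If the main config includes drop-in files from `/etc/ssh/sshd_config.d/`, a directive there may take precedence, so it is worth checking the effective value with `sudo sshd -T | grep -i permitrootlogin`.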
| 97 | + |
| 98 | +#### Option 3: Delete SSH key from Hetzner Console |
| 99 | + |
| 100 | +1. Go to Hetzner Cloud Console → Security → SSH Keys |
| 101 | +2. Find the key named `torrust-tracker-vm-<environment>-ssh-key` |
| 102 | +3. Delete it (note: this only affects future servers, not the current one) |
| 103 | + |
| 104 | +### Why LXD Provider Is Different |
| 105 | + |
| 106 | +The LXD provider doesn't create a provider-level SSH key resource because: |
| 107 | + |
| 108 | +1. **Local access**: LXD runs locally, so `lxc exec` provides direct console access without SSH |
| 109 | +2. **No provider-side key registry**: LXD has no equivalent of Hetzner's SSH key registry, so there is no provider-level key to attach to an instance |
| 110 | +3. **Simpler model**: cloud-init is sufficient for all access needs |
| 111 | + |
| 112 | +## Implementation Plan |
| 113 | + |
| 114 | +### Phase 1: Decision Record (30 min) |
| 115 | + |
| 116 | +- [ ] Task 1.1: Create ADR `docs/decisions/hetzner-ssh-key-dual-injection.md` documenting the architectural decision |
| 117 | +- [ ] Task 1.2: Update ADR index in `docs/decisions/README.md` |
| 118 | + |
| 119 | +### Phase 2: Security Documentation (30 min) |
| 120 | + |
| 121 | +- [ ] Task 2.1: Create `docs/security/ssh-root-access-hetzner.md` explaining the security implications |
| 122 | +- [ ] Task 2.2: Include step-by-step instructions for disabling root access |
| 123 | +- [ ] Task 2.3: Document the debugging use case that justifies keeping it |
| 124 | + |
| 125 | +### Phase 3: Provider Documentation Updates (30 min) |
| 126 | + |
| 127 | +- [ ] Task 3.1: Update `docs/user-guide/providers/hetzner.md` with a new "SSH Key Behavior" section |
| 128 | +- [ ] Task 3.2: Update `docs/user-guide/providers/lxd.md` explaining why it doesn't have this behavior |
| 129 | +- [ ] Task 3.3: Add cross-references between provider docs and security doc |
| 130 | + |
| 131 | +### Phase 4: Template Comments (15 min) |
| 132 | + |
| 133 | +- [ ] Task 4.1: Add explanatory comment to `templates/tofu/hetzner/main.tf` at the `hcloud_ssh_key` resource |
| 134 | + |
| 135 | +## Acceptance Criteria |
| 136 | + |
| 137 | +> **Note for Contributors**: These criteria define what the PR reviewer will check. Use this as your pre-review checklist before submitting the PR to minimize back-and-forth iterations. |
| 138 | + |
| 139 | +**Quality Checks**: |
| 140 | + |
| 141 | +- [ ] Pre-commit checks pass: `./scripts/pre-commit.sh` |
| 142 | + |
| 143 | +**Task-Specific Criteria**: |
| 144 | + |
| 145 | +- [ ] ADR created with proper status, date, and all sections filled |
| 146 | +- [ ] ADR index updated in README.md |
| 147 | +- [ ] Security document explains both the risk and the mitigation |
| 148 | +- [ ] Security document includes working commands to disable root access |
| 149 | +- [ ] Hetzner provider docs reference the security document |
| 150 | +- [ ] LXD provider docs explain why it's different |
| 151 | +- [ ] Template has clear comment explaining the SSH key resource purpose |
| 152 | +- [ ] All links between documents are valid |
| 153 | +- [ ] No spelling errors (cspell passes) |
| 154 | + |
| 155 | +## Related Documentation |
| 156 | + |
| 157 | +- [Hetzner Cloud SSH Keys Documentation](https://docs.hetzner.com/cloud/servers/getting-started/connecting-to-a-server/) |
| 158 | +- [cloud-init User Data Documentation](https://cloudinit.readthedocs.io/en/latest/reference/examples.html#including-users-and-groups) |
| 159 | +- [OpenSSH Security Best Practices](https://www.openssh.com/security.html) |
| 160 | + |
| 161 | +## Notes |
| 162 | + |
| 163 | +- This is a documentation-only change; no code modifications required |
| 164 | +- Future enhancement: Consider making root SSH access configurable via environment configuration |
| 165 | +- The same SSH key is used for both mechanisms, so there's no additional key exposure risk |
| 166 | +- This pattern is common in cloud deployments where cloud-init reliability cannot be guaranteed |