Commit fd63be3

Merge #223: fix: [#222] Configure SSH port via cloud-init with reboot pattern
41cda98 fix: [#222] Make SSH port configuration conditional on non-default port (Jose Celano)
06bb95f fix: [#222] configure SSH port via cloud-init with reboot pattern (Jose Celano)
d5d00ba feat: [#222] configure SSH port via cloud-init during VM provisioning (Jose Celano)

Pull request description:

## Overview

Implements custom SSH port configuration during VM provisioning using cloud-init's reboot pattern following Hetzner best practices.

Fixes #222

## Solution Overview

- Cloud-init writes SSH config file and triggers system reboot
- Reboot ensures clean SSH restart with new port configuration
- Provision handler waits for configured port (not default port 22)
- Increased timeout from 60s to 120s for cloud-init + reboot time

## Key Changes

1. **Cloud-init template**: Added `write_files` + `runcmd` with reboot
2. **Provision handler**: Use configured SSH port in `wait_for_readiness()`
3. **SSH adapter**: Increased `DEFAULT_MAX_RETRY_ATTEMPTS` from 30 to 60
4. **Documentation**: Created ADR and updated issue spec

## Technical Details

- Cloud-init creates `/etc/ssh/sshd_config.d/99-custom-port.conf`
- System reboot guarantees SSH only on custom port (no port 22)
- Total timeout: 120 seconds (60 attempts × 2 second interval)
- Testing confirmed: SSH listens only on configured port after reboot

## Why Reboot Approach?

- `systemctl restart` doesn't kill old SSH process when port changes
- `bootcmd` ineffective - systemd auto-restarts SSH after bootcmd
- Reboot is cleaner and follows Hetzner cloud-config tutorial

## Files Modified

- `templates/tofu/common/cloud-init.yml.tera`
- `src/application/command_handlers/provision/handler.rs`
- `src/adapters/ssh/config.rs`
- `docs/decisions/cloud-init-ssh-port-reboot.md` (new)
- `docs/decisions/README.md`
- `docs/issues/222-configure-ssh-service-port.md`
- `project-words.txt`

## References

- [Hetzner cloud-config tutorial section 5.3](https://community.hetzner.com/tutorials/basic-cloud-config)
- Issue #222
- [ADR: Cloud-Init SSH Port Configuration with Reboot](docs/decisions/cloud-init-ssh-port-reboot.md)

ACKs for top commit:

  josecelano:
    ACK 41cda98

Tree-SHA512: 6701124fcfb11cb94bbe3c61c5d3d0e3e6184ed8c9e857dd5be3226ac12eff139609eadb11855f21c1450a946ec9bc48ea2c85df0eb6d0ecc171f03ab9314d10
2 parents 572094c + 41cda98 commit fd63be3

12 files changed: +442 -652 lines changed

docs/decisions/README.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ This directory contains architectural decision records for the Torrust Tracker D

 | Status | Date | Decision | Summary |
 | ------------- | ---------- | ----------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------- |
+| ✅ Accepted | 2025-12-11 | [Cloud-Init SSH Port Configuration with Reboot](./cloud-init-ssh-port-reboot.md) | Use cloud-init with reboot pattern to configure custom SSH ports during VM provisioning |
 | ✅ Accepted | 2025-12-10 | [Single Docker Image for Sequential E2E Command Testing](./single-docker-image-sequential-testing.md) | Use single Docker image with sequential command execution instead of multi-image phases |
 | ✅ Accepted | 2025-12-09 | [Register Command SSH Port Override](./register-ssh-port-override.md) | Add optional --ssh-port argument to register command for non-standard SSH ports |
 | ✅ Accepted | 2025-11-19 | [Disable MD060 Table Formatting Rule](./md060-table-formatting-disabled.md) | Disable MD060 to allow flexible table formatting and emoji usage |

docs/decisions/cloud-init-ssh-port-reboot.md

Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
# Decision: Cloud-Init SSH Port Configuration with Reboot

## Status

Accepted

## Date

2025-12-11

## Context

The deployer needs to support custom SSH ports for security and flexibility. The SSH port configuration must be applied **during VM provisioning** (not later in the configure phase) because:

1. **Provision phase dependencies**: The `WaitForCloudInitStep` runs during provision and uses Ansible to wait for cloud-init completion. Ansible connects using the custom port from the inventory configuration.

2. **Timing requirement**: If SSH is not already listening on the custom port when `WaitForCloudInitStep` executes, the provision command fails with connection errors.

3. **Architectural correctness**: SSH port is infrastructure configuration, not application configuration. It should be set during infrastructure provisioning, not as a post-provisioning step.

The challenge was ensuring that the SSH service reliably restarts with the new port configuration during cloud-init execution. Multiple approaches were tested:

- **systemctl restart**: Does not kill the old SSH process when the port changes, resulting in SSH listening on both port 22 and the custom port
- **pkill + systemctl start**: Works but is brittle and non-standard
- **bootcmd disable + runcmd restart**: Ineffective because systemd automatically re-enables and starts SSH after bootcmd completes

## Decision

We configure the custom SSH port via cloud-init using the **`write_files` + `reboot` pattern**, following Hetzner's cloud-config best practices:

1. **Write SSH configuration file** using cloud-init's `write_files` directive:

   ```yaml
   {% if ssh_port != 22 %}
   write_files:
     - path: /etc/ssh/sshd_config.d/99-custom-port.conf
       content: |
         # Custom SSH port configuration
         Port {{ ssh_port }}
       permissions: "0644"
       owner: root:root
   ```

2. **Trigger system reboot** in cloud-init's `runcmd` phase:

   ```yaml
   runcmd:
     - reboot
   {% endif %}
   ```

**Conditional Configuration**: The SSH port configuration and reboot only execute when `ssh_port != 22`, avoiding unnecessary reboots for environments using the default SSH port.
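
The conditional can be illustrated outside the template engine. The following is a minimal Rust sketch, not the project's Tera template: `ssh_port_fragment` and `DEFAULT_SSH_PORT` are hypothetical names introduced here, and the function simply mirrors the `{% if ssh_port != 22 %}` guard by emitting the `write_files` and `runcmd` sections only for a non-default port.

```rust
/// Default SSH port; no extra cloud-init configuration is emitted for it.
/// (Hypothetical constant, introduced for this illustration only.)
const DEFAULT_SSH_PORT: u16 = 22;

/// Returns the cloud-init fragment for a custom SSH port, or `None` when the
/// default port is used. This mirrors the `{% if ssh_port != 22 %}` guard in
/// the template; it is a stand-in for template rendering, not the real code.
fn ssh_port_fragment(ssh_port: u16) -> Option<String> {
    if ssh_port == DEFAULT_SSH_PORT {
        // Default port: no drop-in file and, importantly, no reboot.
        return None;
    }
    let lines = [
        "write_files:".to_string(),
        "  - path: /etc/ssh/sshd_config.d/99-custom-port.conf".to_string(),
        "    content: |".to_string(),
        "      # Custom SSH port configuration".to_string(),
        format!("      Port {ssh_port}"),
        "    permissions: \"0644\"".to_string(),
        "    owner: root:root".to_string(),
        "runcmd:".to_string(),
        "  - reboot".to_string(),
    ];
    Some(lines.join("\n"))
}
```

Calling `ssh_port_fragment(22)` yields `None`, so a default-port environment gets neither the drop-in file nor the reboot, while any other port yields both sections together.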

The reboot ensures:

- SSH service cleanly restarts with the new configuration
- No old SSH processes remain on port 22
- All services start in a consistent state
- Package updates are applied (if cloud-init installed packages)

Additionally, we made two critical fixes to the provision handler:

1. **Use configured SSH port**: Changed `wait_for_readiness()` to use `SocketAddr::new(ip, ssh_port)` instead of `SshConfig::with_default_port()`, ensuring the provision handler waits for SSH on the correct custom port (not port 22).

2. **Increase SSH connectivity timeout**: Raised `DEFAULT_MAX_RETRY_ATTEMPTS` from 30 to 60 attempts (120 seconds total), accounting for the ~70-80 second cloud-init completion time plus reboot time.
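
The two fixes above can be sketched as a small polling loop. This is a simplified illustration, not the project's `wait_for_readiness()` implementation: `wait_for_port` and `total_budget` are hypothetical helpers, and a plain TCP connect stands in for the real SSH connectivity check.

```rust
use std::net::{SocketAddr, TcpStream};
use std::thread::sleep;
use std::time::Duration;

/// Polls `addr` until a TCP connection succeeds or `max_attempts` is exhausted.
/// Returns the 1-based attempt number that succeeded, or `None` on timeout.
/// A successful TCP connect only shows that something is listening on the
/// port; it does not prove that an SSH login would succeed.
fn wait_for_port(addr: SocketAddr, max_attempts: u32, interval: Duration) -> Option<u32> {
    for attempt in 1..=max_attempts {
        // Bound each attempt so an unreachable host cannot stall the loop.
        if TcpStream::connect_timeout(&addr, interval).is_ok() {
            return Some(attempt);
        }
        // No need to sleep after the final failed attempt.
        if attempt < max_attempts {
            sleep(interval);
        }
    }
    None
}

/// Nominal retry budget: attempts multiplied by the retry interval.
fn total_budget(max_attempts: u32, interval: Duration) -> Duration {
    interval * max_attempts
}
```

With the values from this decision, `total_budget(60, Duration::from_secs(2))` is 120 seconds, and the address passed to `wait_for_port` would be built as `SocketAddr::new(ip, ssh_port)` so the loop targets the configured port, not port 22.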

## Consequences

### Positive

- **Clean SSH restart**: Reboot guarantees SSH only listens on the custom port, no lingering processes on port 22
- **Industry best practice**: Follows Hetzner's documented cloud-config pattern for SSH port changes
- **Simple and reliable**: Single `reboot` command is simpler than managing service lifecycle manually
- **Correct architecture**: Infrastructure configuration happens during infrastructure provisioning
- **No special cases**: Ansible can connect normally using the configured port without overrides or workarounds
- **Correct readiness check**: Provision handler waits for the configured port, preventing connection failures
- **Conditional execution**: Only reboots when a custom port is needed (`ssh_port != 22`), avoiding unnecessary reboots for default configurations

### Negative

- **Slower provisioning**: Reboot adds ~10-20 seconds to VM initialization time
- **Additional wait time**: Provision handler must wait longer (120s instead of 60s) for cloud-init and reboot to complete
- **Complexity**: Three separate changes required (cloud-init template, provision handler port usage, timeout increase)

### Risks

- **Reboot timing**: If reboot takes longer than expected, the SSH connectivity check might time out (mitigated by the 120-second timeout)
- **Cloud-init failure**: If reboot fails or cloud-init has errors, the provision will fail (acceptable - we want to catch infrastructure issues early)
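
The timing risk can be made concrete with the estimates quoted in this document. The sketch below assumes the cloud-init time (~70-80 s) and the reboot time (~10-20 s) simply add up, which is an approximation; `headroom` is a hypothetical helper, not project code.

```rust
use std::time::Duration;

/// Time left in the readiness budget after cloud-init and the reboot finish.
/// Returns `None` when the estimated wait exceeds the budget.
/// Assumption: the two estimates are additive.
fn headroom(budget: Duration, cloud_init: Duration, reboot: Duration) -> Option<Duration> {
    budget.checked_sub(cloud_init + reboot)
}
```

Under that assumption, the worst case (80 s + 20 s) leaves 20 seconds of slack within the 120-second budget, whereas the previous 60-second budget is exceeded even in the best case (70 s + 10 s).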

## Alternatives Considered

### Alternative 1: Ansible Playbook in Configure Phase

**Approach**: Use an Ansible playbook during the `configure` phase to reconfigure SSH port after provisioning.

**Why Rejected**:

- **Timing problem**: `WaitForCloudInitStep` in provision already fails before reaching configure phase
- **Architectural mismatch**: SSH port is infrastructure config, should be set during VM initialization
- **Added complexity**: Requires special connection handling (connect on 22, reconfigure, reconnect on custom port)
- **More failure points**: Port transition adds potential for connection issues

### Alternative 2: systemctl restart Without Reboot

**Approach**: Use cloud-init `runcmd` to execute `systemctl restart ssh` without full system reboot.

**Why Rejected**:

- **Doesn't kill old process**: `systemctl restart` doesn't terminate the existing SSH daemon when port changes
- **Dual port listening**: Results in SSH listening on both port 22 (old) and custom port (new)
- **Testing showed failure**: Multiple test attempts confirmed SSH remained on port 22 after cloud-init "completion"
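
One way to detect the dual-port symptom described above is to probe both ports from outside the VM. The following is a hedged sketch of such a check, not the procedure actually used in testing: `open_ports` and `moved_cleanly` are hypothetical helpers, and a TCP connect is used as a proxy for "sshd is listening".

```rust
use std::net::{IpAddr, SocketAddr, TcpStream};
use std::time::Duration;

/// Returns the subset of `ports` on `ip` that accept a TCP connection,
/// in the same order as given.
fn open_ports(ip: IpAddr, ports: &[u16], timeout: Duration) -> Vec<u16> {
    ports
        .iter()
        .copied()
        .filter(|&port| TcpStream::connect_timeout(&SocketAddr::new(ip, port), timeout).is_ok())
        .collect()
}

/// True when `new_port` accepts connections and `old_port` no longer does.
/// Assumes `old_port` and `new_port` are different ports.
fn moved_cleanly(ip: IpAddr, old_port: u16, new_port: u16, timeout: Duration) -> bool {
    open_ports(ip, &[old_port, new_port], timeout) == [new_port]
}
```

For the scenario in this alternative, `moved_cleanly(ip, 22, ssh_port, timeout)` would return `false` while SSH still answers on port 22, which matches the failure observed with `systemctl restart`.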

### Alternative 3: pkill + systemctl start

**Approach**: Kill SSH processes with `pkill -9 sshd`, then start fresh with `systemctl start ssh`.

**Why Rejected**:

- **Non-standard**: Violates best practices for service management
- **Brittle**: Process killing is less reliable than clean reboot
- **Not industry pattern**: No documentation or precedent for this approach

### Alternative 4: Wait for Port 22, Then Handle Port Change

**Approach**: Keep provision handler waiting for port 22, handle port transition separately.

**Why Rejected**:

- **Wrong abstraction**: Provision handler should use the configured port, not hardcode defaults
- **Added complexity**: Would require special logic to detect port changes mid-provision
- **Race conditions**: SSH might move to custom port at unpredictable times during cloud-init

## Related Decisions

- [Register Command SSH Port Override](./register-ssh-port-override.md) - Relates to SSH port handling in different commands
- [Environment Variable Prefix](./environment-variable-prefix.md) - Relates to configuration management patterns

## References

- [Hetzner Cloud-Config Tutorial](https://community.hetzner.com/tutorials/basic-cloud-config) - Section 5.3 documents the reboot pattern for SSH configuration
- [Cloud-Init Documentation](https://cloudinit.readthedocs.io/en/latest/) - Official cloud-init reference
- [Issue #222: Configure SSH Service Port](../issues/222-configure-ssh-service-port.md) - Original issue specification
- [OpenSSH sshd_config.d](https://manpages.debian.org/bookworm/openssh-server/sshd_config.5.en.html#Include) - Debian man page for `sshd_config`; the `Include` directive underlies the `sshd_config.d` configuration directory pattern
