Commit a8b4109

Improve docs section 8
co-authored with chatgpt
1 parent 2c6a33b commit a8b4109

1 file changed

Lines changed: 43 additions & 30 deletions

docs/README.md

@@ -1387,46 +1387,59 @@ for more information.
 
 ## 8. Deployment
 
-To create the resources defined by your main, enter the following command
-```
+To create the resources defined in your Terraform configuration, run:
+
+```bash
 terraform apply
 ```
 
-The command will produce the same output as the `plan` command, but after
-the output it will ask for a confirmation to perform the proposed actions.
-Enter `yes`.
+This command will first display the execution plan (equivalent to `terraform plan`) and then prompt you to confirm the proposed actions. Type `yes` to proceed.
+
+Terraform will then create the infrastructure resources defined in the configuration. This step typically takes a few minutes. Once completed, Terraform will output:
+
+- Guest account usernames and passwords
+- The sudo-enabled username
+- The floating IP address of the login node
+
+### Important: Cluster Readiness
+
+Although Terraform reports completion once the connection information is displayed,
+**the cluster is not immediately ready for use**.
+
+Instance creation is only the first phase of the cluster build. A second, automated configuration phase follows, during which Magic Castle installs and configures core services such as:
+user accounts, FreeIPA, Slurm, JupyterHub, etc.
 
-Terraform will then proceed to create the resources defined by the
-configuration file. It should take a few minutes. Once the creation process
-is completed, Terraform will output the guest account usernames and password,
-the sudoer username and the floating ip of the login
-node.
+This configuration phase typically takes **approximately 15 minutes** after the instances are created.
 
-**Warning**: although the instance creation process is finished once Terraform
-outputs the connection information, you will not be able to
-connect and use the cluster immediately. The instance creation is only the
-first phase of the cluster-building process. The configuration: the
-creation of the user accounts, installation of FreeIPA, Slurm, configuration
-of JupyterHub, etc.; takes around 15 minutes after the instances are created.
+### Instance Configuration Process
 
-Once booted, instances follow a two stage configuration process:
+Each instance goes through a two-stage configuration process:
 
-1. Using cloud-init, upgrade operating system packages and install puppet.
-2. Using puppet, install and configure software specific to the instance roles as defined by tags (i.e.: `node`).
+1. **cloud-init**
+- Upgrades operating system packages
+- Installs Puppet
+2. **Puppet**
+- Installs and configures software based on the instance role, as defined by instance tags (e.g. `node`)
+
+#### Logs and Troubleshooting
+
+Logs for each stage are available at:
+
+1. **cloud-init**: `/var/log/cloud-init-output.log`
+2. **Puppet**: `journalctl -u puppet`
+
+If an error occurs during the first (cloud-init) stage, a warning is displayed in the instance
+message of the day (e.g.: `/etc/motd`). The failed commands are recorded in:
+
+```
+/run/cloud-init-failed
+```
 
-The log for each are available under :
+Because successful completion of the first stage is required for the second stage to proceed, the configuration process halts if cloud-init fails.
 
-1. cloud-init: `/var/log/cloud-init-output.log`
-2. puppet: `journalctl -u puppet`
+You may resume the configuration by manually re-running the failed commands listed in `/run/cloud-init-failed` once the underlying issue has been resolved.
 
-When an issue happen during an instance first stage, a warning is logged in its `/etc/motd`.
-The configuration commands that had issue are logged `/run/cloud-init-failed`. Because the
-first stage completion is essential to the second stage, the configuration process is halted
-when issues arise during the first stage. You may relaunch the configuration
-process by running manually the commands that have failed and that are listed in
-`/run/cloud-init-failed`. Issues during the first stage are rare events and most often the
-result of issue with external dependencies i.e.: github is unavailable,
-rpm repo is not responsive.
+Failures during the first stage are rare and are most often caused by external dependencies, such as temporary unavailability of GitHub or package repositories.
 
 ### 8.1 Deployment Customization
 
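The first-stage check described in this diff could be sketched as a small shell helper. This is only an illustration, not part of the commit or of Magic Castle: the function name `report_first_stage` is hypothetical, and it assumes that an absent or empty `/run/cloud-init-failed` means no first-stage command failed.

```bash
#!/usr/bin/env bash
# Sketch: report whether the first (cloud-init) configuration stage failed.
# report_first_stage is a hypothetical helper name, not a Magic Castle command.
# Assumption: a missing or empty marker file means no command failed.
# The optional first argument overrides the marker path (useful for testing).
report_first_stage() {
    local failed_file="${1:-/run/cloud-init-failed}"
    if [ -s "$failed_file" ]; then
        echo "cloud-init stage failed; commands to re-run:"
        cat "$failed_file"
        return 1
    fi
    echo "no failed cloud-init commands recorded"
}
```

On an instance, one might run `report_first_stage || tail -n 20 /var/log/cloud-init-output.log` to see the end of the first-stage log when a failure is reported, and then `journalctl -u puppet` to follow the second stage.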