Commit 16dc026

Merge pull request #232668 from divargas-msft/patch-7
[Doc-a-thon] Updating cloud-init-troubleshooting.md
2 parents 1305874 + db7d0d6 commit 16dc026

1 file changed, 35 additions, 31 deletions

articles/virtual-machines/linux/cloud-init-troubleshooting.md

@@ -4,36 +4,36 @@ description: Troubleshoot provisioning an Azure VM using cloud-init.
author: mattmcinnes
ms.service: virtual-machines
ms.topic: troubleshooting
ms.date: 03/29/2023
ms.author: mattmcinnes
ms.reviewer: cynthn
ms.subservice: cloud-init
---

# Troubleshooting VM provisioning with cloud-init

**Applies to:** :heavy_check_mark: Linux VMs :heavy_check_mark: Flexible scale sets

If you create generalized custom images that use cloud-init for provisioning, and VMs created from those images don't provision correctly, you need to troubleshoot the custom images.

Examples of provisioning issues include:

- The VM is stuck in the 'creating' state for 40 minutes, and then the VM creation is marked as failed.
- `CustomData` isn't processed.
- The ephemeral disk fails to mount.
- Users aren't created, or there are user access issues.
- Networking isn't set up correctly.
- The swap file or partition fails to be set up.

This article walks you through troubleshooting cloud-init. For more in-depth details, see [cloud-init deep dive](./cloud-init-deep-dive.md).

## <a id="step1"></a> Step 1: Test the deployment without `customData`

Cloud-init can accept `customData` that's passed to it when the VM is created. First, make sure that this data isn't causing deployment issues: try provisioning the VM without passing in any configuration. If the VM still fails to provision, continue with the following steps. If the VM provisions but the configuration you pass isn't applied, go to [step 4](#step4).
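
A minimal sketch of this test with the Azure CLI, assuming the CLI is installed and signed in. The resource group, VM name, image name, and admin user name are hypothetical placeholders. Leaving out `--custom-data` means that no `customData` is passed to cloud-init.

```bash
# Hypothetical placeholders: replace them with your own values.
RESOURCE_GROUP="myResourceGroup"
VM_NAME="myTestVM"
IMAGE="myCustomImage"   # name or resource ID of the custom image under test

# Create the VM without --custom-data, so no customData is passed to cloud-init.
create_vm_without_custom_data() {
  az vm create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VM_NAME" \
    --image "$IMAGE" \
    --admin-username azureuser \
    --generate-ssh-keys
}
```

Call `create_vm_without_custom_data` and watch whether the deployment succeeds. If it does, repeat the deployment with `--custom-data <your-file>` added to confirm whether the configuration is what breaks provisioning.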

## <a id="step2"></a> Step 2: Review image requirements

The primary cause of VM provisioning failure is an OS image that doesn't satisfy the prerequisites for running on Azure. Make sure that your images are properly prepared before you try to provision them in Azure.
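
One quick check you can run on a prepared image, or against a mounted OS disk, is whether cloud-init is configured to use the Azure datasource. This is a minimal sketch, not an official validation tool: it assumes that the image lists its datasources explicitly with a `datasource_list` setting in `/etc/cloud/cloud.cfg` or in a drop-in file under `/etc/cloud/cloud.cfg.d/`, as several Azure image-preparation guides do. The function name is hypothetical.

```bash
# Sketch: report whether a cloud-init configuration directory lists the Azure datasource.
# The directory defaults to /etc/cloud; pass /rescue/etc/cloud to inspect a mounted OS disk instead.
check_azure_datasource() {
  cfg_dir="${1:-/etc/cloud}"
  # Match uncommented datasource_list lines in cloud.cfg and in any *.cfg drop-in file.
  # -h omits file names and -s hides errors for files that don't exist.
  if grep -hs '^[[:space:]]*datasource_list' "$cfg_dir/cloud.cfg" "$cfg_dir"/cloud.cfg.d/*.cfg |
      grep -q 'Azure'; then
    echo "Azure datasource is listed in $cfg_dir"
    return 0
  fi
  echo "No datasource_list entry in $cfg_dir names Azure; review the image's cloud-init settings" >&2
  return 1
}
```

For example, run `check_azure_datasource /rescue/etc/cloud` on a repair VM to inspect the failed VM's disk.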

The following articles illustrate the steps to prepare the various Linux distributions that are supported in Azure:

@@ -48,33 +48,39 @@ The following articles illustrate the steps to prepare various linux distributio

The [supported Azure cloud-init images](./using-cloud-init.md) already have all the packages and configuration required to provision correctly in Azure. If your VM fails to create from your own curated image, try a supported Azure Marketplace image that's already configured for cloud-init, together with your optional `customData`. If the `customData` works correctly with the Marketplace image, the problem is probably in your curated image.
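
A minimal sketch of this comparison with the Azure CLI, assuming the CLI is signed in and your cloud-init configuration is saved in a local file named `cloud-init.txt`. The resource names are hypothetical placeholders, and the `Ubuntu2204` image alias is an assumption about your CLI version; use a full image URN from the linked list if the alias isn't available.

```bash
# Hypothetical placeholders: replace them with your own values.
RESOURCE_GROUP="myResourceGroup"
VM_NAME="myMarketplaceTestVM"
MARKETPLACE_IMAGE="Ubuntu2204"       # image alias; use a full URN if your CLI version lacks it
CUSTOM_DATA_FILE="cloud-init.txt"    # the same customData you pass to your curated image

# Deploy a Marketplace image that already supports cloud-init, passing the same customData.
create_marketplace_vm_with_custom_data() {
  az vm create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VM_NAME" \
    --image "$MARKETPLACE_IMAGE" \
    --custom-data "$CUSTOM_DATA_FILE" \
    --admin-username azureuser \
    --generate-ssh-keys
}
```

If this deployment applies the configuration but the same file fails on your curated image, focus on the image rather than the `customData`.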

## <a id="step3"></a> Step 3: Collect & review VM logs

When a VM fails to provision, Azure shows the 'creating' status for 20 minutes, reboots the VM, and waits another 20 minutes before it marks the deployment as failed with an `OSProvisioningTimedOut` error.

To understand why provisioning failed, you need the logs from the VM. Don't stop the failed VM; keep it running so that you can collect the logs. To collect the logs, use one of the following methods:

- [Enable Boot Diagnostics](/previous-versions/azure/virtual-machines/linux/tutorial-monitor#enable-boot-diagnostics) before you create the VM, and then [view](/previous-versions/azure/virtual-machines/linux/tutorial-monitor#view-boot-diagnostics) the boot diagnostics during boot.

- [Serial Console](/troubleshoot/azure/virtual-machines/serial-console-grub-single-user-mode)

- [Run az vm repair](/troubleshoot/azure/virtual-machines/repair-linux-vm-using-azure-virtual-machine-repair-commands) to attach and mount the OS disk by using [chroot](/troubleshoot/azure/virtual-machines/chroot-environment-linux), which lets you collect these logs:

```bash
# Run on the repair VM. These paths assume the failed VM's OS disk is mounted at /rescue.
sudo cat /rescue/var/log/cloud-init*
sudo cat /rescue/var/log/waagent*
sudo cat /rescue/var/log/syslog*
sudo cat /rescue/var/log/rsyslog*
sudo cat /rescue/var/log/messages*
sudo cat /rescue/var/log/kern*
sudo cat /rescue/var/log/dmesg*
sudo cat /rescue/var/log/boot*
```
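
If you prefer the Azure CLI, the following sketch reads the serial log through boot diagnostics and creates a repair VM. It assumes that boot diagnostics were enabled when the VM was created, that the `vm-repair` CLI extension is installed (`az extension add --name vm-repair`), and that `REPAIR_PASSWORD` holds a password for the repair VM's admin account. The resource names and the `repairadmin` user name are hypothetical placeholders.

```bash
# Hypothetical placeholders: replace them with your own values.
RESOURCE_GROUP="myResourceGroup"
VM_NAME="myFailedVM"

# Print the serial log that boot diagnostics captured (requires boot diagnostics to be enabled).
show_serial_log() {
  az vm boot-diagnostics get-boot-log \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VM_NAME"
}

# Create a repair VM with a copy of the failed VM's OS disk attached, so that you can read its logs.
create_repair_vm() {
  az vm repair create \
    --resource-group "$RESOURCE_GROUP" \
    --name "$VM_NAME" \
    --repair-username repairadmin \
    --repair-password "$REPAIR_PASSWORD" \
    --verbose
}
```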

> [!NOTE]
> Alternatively, you can create a rescue VM manually by using the Azure portal. For more information, see [Troubleshoot a Linux VM by attaching the OS disk to a recovery VM using the Azure portal](/troubleshoot/azure/virtual-machines/troubleshoot-recovery-disks-portal-linux).
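
To keep a copy of the logs listed above for later analysis, you can bundle them on the repair VM into one archive. The following function is a hypothetical helper, not part of the repair tooling. It assumes that the failed VM's OS disk is mounted at `/rescue`, as in the chroot procedure, and that you run it as root, because several of these logs are readable only by privileged users.

```bash
# Hypothetical helper: bundle the provisioning-related logs from a mounted OS disk into one archive.
# Arguments: the disk's mount point (default /rescue) and the archive path (default ./provisioning-logs.tar.gz).
archive_provisioning_logs() {
  root="${1:-/rescue}"
  out="${2:-./provisioning-logs.tar.gz}"
  # Rebuild the positional parameters as the list of log files that actually exist,
  # stored as paths relative to the disk root.
  set --
  for pattern in cloud-init waagent syslog rsyslog messages kern dmesg boot; do
    for f in "$root"/var/log/"$pattern"*; do
      if [ -e "$f" ]; then
        set -- "$@" "var/log/${f##*/}"
      fi
    done
  done
  if [ "$#" -eq 0 ]; then
    echo "No matching logs found under $root/var/log" >&2
    return 1
  fi
  # -C makes tar read the files relative to the disk root, so the archive holds var/log/... paths.
  tar -czf "$out" -C "$root" "$@"
}
```

For example, `archive_provisioning_logs /rescue /tmp/provisioning-logs.tar.gz`.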

Begin troubleshooting with the cloud-init logs to understand where the failure occurred, and then use the other logs to dig deeper and gain additional insight:

* /var/log/cloud-init.log
* /var/log/cloud-init-output.log
* Serial/boot logs

In all logs, start by searching for "Failed", "WARNING", "WARN", "err", "error", and "ERROR". Use case-insensitive searches so that you don't miss matches with different capitalization.
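
For example, the following sketch runs that search with `grep`. The `-i` flag makes the match case-insensitive, so the patterns `warn` and `err` also catch `WARNING`, `error`, and `ERROR`, and `-B` prints the preceding lines so that you can read backwards from each hit. Expect some noise: `err` also matches unrelated words such as "interrupt". The function name and the `CONTEXT` variable are hypothetical.

```bash
# Sketch: case-insensitive search for common failure keywords, with leading context lines.
# CONTEXT (hypothetical, default 5) sets how many lines before each match are printed.
find_provisioning_errors() {
  context="${CONTEXT:-5}"
  # -n adds line numbers; matching lines use "N:" and context lines use "N-".
  grep -n -i -E -B "$context" 'failed|warn|err' "$@"
}
```

For example, `find_provisioning_errors /var/log/cloud-init.log /var/log/cloud-init-output.log`.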

> [!TIP]
> If you're troubleshooting a custom image, consider adding a user to the image when you build it. If provisioning fails to set up the admin user, you can still log in to the OS with that user.
@@ -85,7 +91,7 @@ Here are more details about what to look for in each cloud-init log.

### /var/log/cloud-init.log

By default, cloud-init writes all events with a priority of debug or higher to `/var/log/cloud-init.log`. This log provides verbose details of every event that occurred during cloud-init initialization.

For example:

@@ -99,7 +105,6 @@ Stderr: mount: unknown filesystem type 'udf'
2020-01-31 00:21:53,352 - DataSourceAzure.py[WARNING]: /dev/sr0 was not mountable
```

Once you find an error or warning, read backwards in the cloud-init log to understand what cloud-init was attempting before it hit the error or warning. In many cases, cloud-init runs OS commands or performs provisioning operations before the error, and those entries can explain why the errors appeared in the logs. The following example shows that cloud-init attempted to mount a device right before it hit an error.

```output
@@ -114,21 +119,20 @@ The logging for `/var/log/cloud-init.log` can also be reconfigured within /etc/c

You can get the `stdout` and `stderr` output from the [stages of cloud-init](cloud-init-deep-dive.md). This output typically includes routing table information, networking information, SSH host key verification information, the `stdout` and `stderr` of each cloud-init stage, and the timestamp for each stage. If needed, you can reconfigure `stderr` and `stdout` logging in `/etc/cloud/cloud.cfg.d/05_logging.cfg`.

### Serial/boot logs

Cloud-init has multiple dependencies, such as networking, storage, the ability to mount an ISO, and the ability to mount and format the temporary disk. These dependencies are documented in the required prerequisites for images on Azure. Any of them can throw errors and cause cloud-init to fail. For example, if the VM can't get a DHCP lease, cloud-init fails.
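
The sample log earlier in this article shows one of these dependencies failing: `mount: unknown filesystem type 'udf'` means that the kernel couldn't mount the provisioning media on `/dev/sr0`. On a running VM built from your image, you can check whether the kernel currently lists UDF. This is a minimal sketch with a hypothetical function name. A filesystem built as a kernel module can be missing from `/proc/filesystems` until the module is loaded, so a negative result is a prompt to investigate, not proof that the driver is absent.

```bash
# Sketch: check whether the running kernel lists UDF as a supported filesystem type.
# The list defaults to /proc/filesystems; pass another file to check a saved copy.
check_udf_support() {
  fs_list="${1:-/proc/filesystems}"
  # Each line is an optional "nodev" flag followed by a filesystem name; -w matches "udf" as a whole word.
  if grep -qw 'udf' "$fs_list"; then
    echo "udf is listed in $fs_list"
    return 0
  fi
  echo "udf isn't listed; if it's built as a module, try: sudo modprobe udf" >&2
  return 1
}
```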

If you still can't isolate why cloud-init failed to provision the VM, you need to understand the cloud-init stages and when modules run. For more details, see [Diving deeper into cloud-init](cloud-init-deep-dive.md).
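
cloud-init includes an `analyze` subcommand that summarizes, from a log file, when each boot stage and module ran. The following sketch assumes that a cloud-init version with `analyze` is installed on the machine where you run it; for example, on the repair VM you could point it at `/rescue/var/log/cloud-init.log`. The function name is hypothetical.

```bash
# Sketch: summarize cloud-init stage and module timings from a log file.
# The log defaults to /var/log/cloud-init.log; pass another path to analyze a copied or mounted log.
analyze_cloud_init_log() {
  log="${1:-/var/log/cloud-init.log}"
  # Time-ordered view of the events during boot, grouped by stage.
  cloud-init analyze show -i "$log"
  # Events sorted by the time they consumed, longest first.
  cloud-init analyze blame -i "$log"
}
```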

## <a id="step4"></a> Step 4: Investigate why the configuration isn't being applied

Not every failure in cloud-init results in a fatal provisioning failure. For example, if you use the `runcmd` module in a cloud-init config, a non-zero exit code from the command it runs doesn't cause the VM provisioning to fail. This is because `runcmd` runs after the core provisioning functionality, which happens in the first three stages of cloud-init. To troubleshoot why the configuration wasn't applied, review the logs from [Step 3](#step3), and check the cloud-init modules manually. For example:

- `runcmd`: Do the scripts run without errors? Run the configuration manually from the terminal to make sure that it runs as expected.
- Installing packages: Does the VM have access to the package repositories?
- Check the `customData` configuration that was provided to the VM. It's located in `/var/lib/cloud/instances/<unique-instance-identifier>/user-data.txt`.
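
On a VM that provisioned but didn't apply its configuration, the following sketch prints the user data that cloud-init received and reruns the shell script that cloud-init generated from the `runcmd` entries, with tracing turned on. It assumes the default cloud-init data directory, `/var/lib/cloud`, where `instance` is a link to the current `instances/<unique-instance-identifier>` directory, and that you run it as root, because these files are usually readable only by root. The function names are hypothetical. Rerun `runcmd` only if its commands are safe to run more than once.

```bash
# Sketch: inspect the customData and rerun the generated runcmd script for the current instance.
# The data directory defaults to /var/lib/cloud; pass another path to use a different root.

# Print the user data (customData) that cloud-init stored for the current instance.
show_user_data() {
  cloud_dir="${1:-/var/lib/cloud}"
  cat "$cloud_dir/instance/user-data.txt"
}

# Rerun the script built from the runcmd entries; -x traces each command to stderr.
rerun_runcmd() {
  cloud_dir="${1:-/var/lib/cloud}"
  sh -x "$cloud_dir/instance/scripts/runcmd"
}
```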

## Next steps

If you still can't isolate why cloud-init didn't apply the configuration, look more closely at what happens in each cloud-init stage and when modules run. For more information, see [Diving deeper into cloud-init configuration](./cloud-init-deep-dive.md).
