articles/virtual-machines/linux/cloud-init-troubleshooting.md
Lines changed: 35 additions & 31 deletions
@@ -4,36 +4,36 @@ description: Troubleshoot provisioning an Azure VM using cloud-init.
4
4
author: mattmcinnes
5
5
ms.service: virtual-machines
6
6
ms.topic: troubleshooting
7
-
ms.date: 07/06/2020
7
+
ms.date: 03/29/2023
8
8
ms.author: mattmcinnes
9
9
ms.reviewer: cynthn
10
10
ms.subservice: cloud-init
11
11
---
12
12
13
-
14
13
# Troubleshooting VM provisioning with cloud-init
15
14
16
-
**Applies to:**:heavy_check_mark: Linux VMs :heavy_check_mark: Flexible scale sets
15
+
**Applies to:** :heavy_check_mark: Linux VMs :heavy_check_mark: Flexible scale sets
17
16
18
17
If you create generalized custom images that use cloud-init for provisioning and find that a VM isn't created correctly, you need to troubleshoot your custom images.
19
18
20
19
Some examples of provisioning issues:
21
-
- VM gets stuck at 'creating' for 40 minutes, and the VM creation is marked as failed
22
-
-`CustomData` does not get processed
23
-
- The ephemeral disk fails to mount
24
-
- Users do not get created, or there are user access issues
25
-
- Networking is not set up correctly
26
-
- Swap file or partition failures
20
+
21
+
- VM gets stuck at 'creating' for 40 minutes, and the VM creation is marked as failed.
22
+
- `CustomData` does not get processed.
23
+
- The ephemeral disk fails to mount.
24
+
- Users do not get created, or there are user access issues.
25
+
- Networking is not set up correctly.
26
+
- Swap file or partition failures.
27
27
28
28
This article steps you through how to troubleshoot cloud-init. For more in-depth details, see [cloud-init deep dive](./cloud-init-deep-dive.md).
29
29
30
-
## Step 1: Test the deployment without `customData`
30
+
## <a id="step1"></a> Step 1: Test the deployment without `customData`
31
31
32
-
Cloud-init can accept `customData`, that is passed to it, when the VM is created. First you should ensure this is not causing any issues with deployments. Try to provisioning the VM without passing in any configuration. If you find the VM fails to provision, continue with the steps below, if you find the configuration you are passing is not being applied go [step 4]().
32
+
Cloud-init can accept `customData` that is passed to it when the VM is created. First, ensure that this data isn't causing any deployment issues: try provisioning the VM without passing in any configuration. If the VM fails to provision, continue with the steps below. If the VM provisions but the configuration you're passing isn't applied, go to [step 4](#step4).
33
33
34
-
## Step 2: Review image requirements
35
-
The primary cause of VM provisioning failure is the OS image doesn't satisfy the prerequisites for running on Azure. Make sure your images are properly prepared before attempting to provision them in Azure.
The primary cause of VM provisioning failure is the OS image doesn't satisfy the prerequisites for running on Azure. Make sure your images are properly prepared before attempting to provision them in Azure.
37
37
38
38
The following articles illustrate the steps to prepare various Linux distributions that are supported in Azure:
39
39
@@ -48,33 +48,39 @@ The following articles illustrate the steps to prepare various linux distributio
48
48
49
49
For the [supported Azure cloud-init images](./using-cloud-init.md), the Linux distributions already have all the required packages and configurations in place to correctly provision the image in Azure. If you find your VM is failing to create from your own curated image, try a supported Azure Marketplace image that already is configured for cloud-init, with your optional `customData`. If the `customData` works correctly with an Azure Marketplace image, then there is probably an issue with your curated image.
50
50
51
-
## Step 3: Collect & review VM logs
51
+
## <a id="step3"></a> Step 3: Collect & review VM logs
52
52
53
53
When the VM fails to provision, Azure shows the 'creating' status for 20 minutes, reboots the VM, and waits another 20 minutes before finally marking the VM deployment as failed with an `OSProvisioningTimedOut` error.
54
54
55
55
To understand why provisioning failed, you need the logs from the VM. Don't stop the failed VM; keep it running so that you can collect the logs. To collect the logs, use one of the following methods:
56
56
57
+
- [Enable Boot Diagnostics](/previous-versions/azure/virtual-machines/linux/tutorial-monitor#enable-boot-diagnostics) before creating the VM and then [View](/previous-versions/azure/virtual-machines/linux/tutorial-monitor#view-boot-diagnostics) them during the boot.
- [Enable Boot Diagnostics](/previous-versions/azure/virtual-machines/linux/tutorial-monitor#enable-boot-diagnostics) before creating the VM and then [View](/previous-versions/azure/virtual-machines/linux/tutorial-monitor#view-boot-diagnostics) them during the boot.
61
+
- [Run AZ VM Repair](/troubleshoot/azure/virtual-machines/repair-linux-vm-using-azure-virtual-machine-repair-commands) to attach and mount the OS disk using [chroot](/troubleshoot/azure/virtual-machines/chroot-environment-linux), which will allow you to collect these logs:
60
62
61
-
-[Run AZ VM Repair](/troubleshoot/azure/virtual-machines/repair-linux-vm-using-azure-virtual-machine-repair-commands) to attach and mount the OS disk, which will allow you to collect these logs:
62
63
```bash
63
-
/var/log/cloud-init*
64
-
/var/log/waagent*
65
-
/var/log/syslog*
66
-
/var/log/rsyslog*
67
-
/var/log/messages*
68
-
/var/log/kern*
69
-
/var/log/dmesg*
70
-
/var/log/boot*
64
+
sudo cat /rescue/var/log/cloud-init*
65
+
sudo cat /rescue/var/log/waagent*
66
+
sudo cat /rescue/var/log/syslog*
67
+
sudo cat /rescue/var/log/rsyslog*
68
+
sudo cat /rescue/var/log/messages*
69
+
sudo cat /rescue/var/log/kern*
70
+
sudo cat /rescue/var/log/dmesg*
71
+
sudo cat /rescue/var/log/boot*
71
72
```
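
The commands above print each log to the terminal. As a minimal sketch, the helper below copies the same log families into a working directory so they can be reviewed or shared later. The function name `collect_rescue_logs` and the default destination `./vm-logs` are hypothetical and chosen only for illustration. The default source assumes the OS disk is mounted at `/rescue`, as in the commands above. Reading the mounted disk may require a root shell.

```bash
#!/usr/bin/env bash

# collect_rescue_logs [SRC_DIR] [DEST_DIR]
# Copy the log families listed above from SRC_DIR into DEST_DIR.
# Both defaults are assumptions for illustration, not fixed Azure paths.
collect_rescue_logs() {
  local src="${1:-/rescue/var/log}"
  local dest="${2:-./vm-logs}"
  local prefix path

  mkdir -p -- "$dest" || return 1

  for prefix in cloud-init waagent syslog rsyslog messages kern dmesg boot; do
    for path in "$src"/"$prefix"*; do
      # An unmatched glob stays literal, so skip names that don't exist.
      [ -e "$path" ] || continue
      cp -R -- "$path" "$dest"/ || return 1
      printf 'copied %s\n' "$path"
    done
  done
}
```

After the function is defined in the current shell, a call such as `collect_rescue_logs /rescue/var/log ./vm-logs` copies whichever of the listed logs exist. Distributions differ in which of these files they write, so missing files are skipped rather than treated as errors.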
72
-
To start initial troubleshooting, start with the cloud-init logs, and understand where the failure occurred, then use the other logs to deep dive, and provide additional insights.
73
+
74
+
> [!NOTE]
75
+
> Alternatively, you can create a rescue VM manually by using the Azure portal. For more information, see [Troubleshoot a Linux VM by attaching the OS disk to a recovery VM using the Azure portal](/troubleshoot/azure/virtual-machines/troubleshoot-recovery-disks-portal-linux).
76
+
77
+
Start troubleshooting with the cloud-init logs to understand where the failure occurred, then use the other logs to dig deeper and gain additional insights.
78
+
73
79
* /var/log/cloud-init.log
74
80
* /var/log/cloud-init-output.log
75
81
* Serial/boot logs
76
82
77
-
In all logs, start searching for "Failed", "WARNING", "WARN", "err", "error", "ERROR". Setting configuration to ignore case-sensitive searches is recommended.
83
+
In all logs, start by searching for "Failed", "WARNING", "WARN", "err", "error", and "ERROR". Using case-insensitive searches is recommended.
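
As a minimal sketch of such a search, the function below runs one case-insensitive `grep` over a log directory. The name `scan_logs` and the default directory are hypothetical. On a rescue VM you would pass the mounted path instead, for example `/rescue/var/log`.

```bash
#!/usr/bin/env bash

# scan_logs [LOG_DIR]
# Recursively search LOG_DIR for failure keywords, ignoring case.
# Prints each match as file:line-number:text.
scan_logs() {
  local log_dir="${1:-/var/log}"
  # 'failed', 'warn', and 'err' matched case-insensitively cover
  # "Failed", "WARNING", "WARN", "err", "error", and "ERROR".
  grep -r -i -n -E 'failed|warn|err' -- "$log_dir"
}
```

The pattern is deliberately broad: `err` also matches unrelated words such as "interrupt", so expect some noise. Adding grep's `-B` option with a line count prints the lines preceding each match, which helps when you read backwards from an error.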
78
84
79
85
> [!TIP]
80
86
> If you are troubleshooting a custom image, consider adding a user while building the image. If provisioning fails to set the admin user, you can still log in to the OS.
@@ -85,7 +91,7 @@ Here are more details about what to look for in each cloud-init log.
85
91
86
92
### /var/log/cloud-init.log
87
93
88
-
By default, all cloud-init events with a priority of debug or higher, are written to `/var/log/cloud-init.log`. This provides verbose logs of every event that occurred during cloud-init initialization.
94
+
By default, all cloud-init events with a priority of debug or higher are written to `/var/log/cloud-init.log`. This provides verbose logs of every event that occurred during cloud-init initialization.
89
95
90
96
For example:
91
97
@@ -99,7 +105,6 @@ Stderr: mount: unknown filesystem type 'udf'
99
105
2020-01-31 00:21:53,352 - DataSourceAzure.py[WARNING]: /dev/sr0 was not mountable
100
106
```
101
107
102
-
103
108
Once you have found an error or warning, read backwards in the cloud-init log to understand what cloud-init was attempting before it hit the error or warning. In many cases cloud-init will have run OS commands or performed provisioning operations prior to the error, which can provide insights as to why errors appeared in the logs. The following example shows that cloud-init attempted to mount a device right before it hit an error.
104
109
105
110
```output
@@ -114,21 +119,20 @@ The logging for `/var/log/cloud-init.log` can also be reconfigured within /etc/c
114
119
115
120
You can get information from the `stdout` and `stderr` during the [stages of cloud-init](cloud-init-deep-dive.md). This normally involves routing table information, networking information, ssh host key verification information, `stdout` and `stderr` for each stage of cloud-init, along with the timestamp for each stage. If desired, `stderr` and `stdout` logging can be reconfigured from `/etc/cloud/cloud.cfg.d/05_logging.cfg`.
116
121
117
-
### Serial/boot logs
122
+
### Serial/boot logs
118
123
119
124
Cloud-init has multiple dependencies, which are documented in the required prerequisites for images on Azure: networking, storage, the ability to mount an ISO, and the ability to mount and format the temporary disk. Any of these can throw errors and cause cloud-init to fail. For example, if the VM can't get a DHCP lease, cloud-init fails.
120
125
121
126
If you still can't isolate why cloud-init failed to provision, you need to understand what happens in each cloud-init stage and when modules run. See [Diving deeper into cloud-init](cloud-init-deep-dive.md) for more details.
122
127
128
+
## <a id="step4"></a> Step 4: Investigate why the configuration isn't being applied
123
129
124
-
## Step 4: Investigate why the configuration isn't being applied
125
130
Not every failure in cloud-init results in a fatal provisioning failure. For example, if you are using the `runcmd` module in a cloud-init config, a non-zero exit code from the command it is running doesn't cause the VM provisioning to fail. This is because `runcmd` runs after the core provisioning functionality that happens in the first three stages of cloud-init. To troubleshoot why the configuration wasn't applied, review the logs in Step 3 and check the cloud-init modules manually. For example:
126
131
127
132
- `runcmd` - do the scripts run without errors? Run the commands manually from the terminal to ensure that they run as expected.
128
133
- Installing packages - does the VM have access to package repositories?
129
134
- You should also check the `customData` configuration that was provided to the VM. It's located in `/var/lib/cloud/instances/<unique-instance-identifier>/user-data.txt`.
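
To check the last item without looking up the instance identifier, a small helper can print every stored `user-data.txt`. This is a sketch only: the function name `show_user_data` and its optional root argument are hypothetical, and the path is the one given in the list above.

```bash
#!/usr/bin/env bash

# show_user_data [ROOT]
# Print every user-data.txt stored under ROOT/var/lib/cloud/instances/.
# ROOT defaults to the running system; pass a mount point to inspect
# an attached OS disk instead.
show_user_data() {
  local root="${1:-}"
  local file found=1

  for file in "$root"/var/lib/cloud/instances/*/user-data.txt; do
    # An unmatched glob stays literal, so skip names that don't exist.
    [ -e "$file" ] || continue
    printf '==> %s\n' "$file"
    cat -- "$file"
    found=0
  done

  # Return non-zero when no user-data file was found.
  return "$found"
}
```

Comparing the printed content with the `customData` you intended to pass shows whether the VM received the configuration at all, which separates delivery problems from module failures. Reading these files may require root privileges.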
130
135
131
-
132
136
## Next steps
133
137
134
138
If you still cannot isolate why cloud-init did not run the configuration, you need to look more closely at what happens in each cloud-init stage, and when modules run. See [Diving deeper into cloud-init configuration](./cloud-init-deep-dive.md) for more information.