Commit d70b9f7

Merge pull request #10075 from pagienge/patch-9
AB#8079: adding sudo and general doc updates
2 parents 5d514b0 + 01dd853 commit d70b9f7

1 file changed: 76 additions & 58 deletions

@@ -1,10 +1,9 @@
 ---
 title: Repair a Linux VM automatically with the help of ALAR
-description: This article describes how to automatically repair a non bootable VM with the Azure Linux Auto Repair (ALAR) scripts.
+description: This article describes how to automatically repair a non-bootable VM with the Azure Linux Auto Repair (ALAR) scripts.
 services: virtual-machines-linux
 documentationcenter: ''
-author: malachma
-manager: noambi
+author: pagienge
 editor: v-jsitser
 tags: virtual-machines
 ms.custom: sap:VM Admin - Linux (Guest OS), linux-related-content
@@ -13,8 +12,8 @@ ms.topic: troubleshooting
 ms.workload: infrastructure-services
 ms.tgt_pltfrm: vm-linux
 ms.devlang: azurecli
-ms.date: 09/24/2024
-ms.author: malachma
+ms.date: 10/31/2025
+ms.author: pagienge
 ---
 
 # Use Azure Linux Auto Repair (ALAR) to fix a Linux VM
@@ -27,13 +26,61 @@ ALAR utilizes the VM repair extension that's described in [Repair a Linux VM by
 
 ALAR covers the following repair scenarios:
 
-- Malformed /etc/fstab
-syntax error
-missing disk
-- Damaged initrd or missing initrd line in the /boot/grub/grub.cfg
-- Last installed kernel isn't bootable
-- Serial console and GRUB serial are incorrectly configured or are missing
-- GRUB/EFI installation or configuration damaged
+- No-boot scenarios
+  - Malformed */etc/fstab*
+    - Syntax error
+    - Missing disk
+  - Damaged initrd or missing initrd line in the */boot/grub/grub.cfg*
+  - Last installed kernel isn't bootable
+  - GRUB/EFI installation or configuration damaged
+  - Disk space/auditd forced shutdowns
+- Configuration issues
+  - Serial console and GRUB serial are incorrectly configured or are missing
+  - Sudo misconfiguration
+
+## How to use ALAR
+
+The ALAR scripts use the [az vm repair](/cli/azure/vm/repair) extension, its `run` command, and the `--run-id` option. The value of the `--run-id` option for automated recovery is `linux-alar2`. To fix a Linux VM by using an ALAR script, follow these steps:
+
+> [!NOTE]
+> The VM Contributor role doesn't provide enough permissions to run these scripted operations, because they require permissions to read, write, and delete resources in the resource group that includes the target VM. Therefore, a role such as Contributor or Owner at the resource group level is required.
+
+1. Create a rescue VM:
+
+   ```azurecli-interactive
+   az vm repair create --verbose --resource-group <RG-NAME> --name <VM-NAME>
+   ```
+
+   - Three parameters prompt for values if they aren't given on the command line. For non-interactive execution, add these parameters and their values to the command:
+     - `--repair-username <RESCUE-USERNAME>`
+     - `--repair-password <RESCUE-PASS>`
+     - `--associate-public-ip`
+   - See the [az vm repair](/cli/azure/vm/repair) documentation for more options that control the creation of the repair VM.
+
+2. Run the `linux-alar2` script on the rescue VM, and pass one or more of the ALAR actions as parameters:
+
+   ```azurecli-interactive
+   az vm repair run --verbose --resource-group <RG-NAME> --name <VM-NAME> --run-id linux-alar2 --parameters <action1,action2,...> --run-on-repair
+   ```
+
+   For valid action names, see [The ALAR actions](#the-alar-actions).
+
+3. Swap the copy of the OS disk back to the original VM and delete the temporary resources:
+
+   ```azurecli-interactive
+   az vm repair restore --verbose --resource-group <RG-NAME> --name <VM-NAME>
+   ```
+
+> [!NOTE]
+> The original and new disks aren't deleted during the `restore` phase.
+
+The example commands use the following placeholders:
+
+- `RG-NAME`: The name of the resource group that contains the broken VM.
+- `VM-NAME`: The name of the broken VM.
+- `RESCUE-USERNAME`: The user created on the repair VM for sign-in. It's the equivalent of the user created on a new VM in the Azure portal.
+- `RESCUE-PASS`: The password for `RESCUE-USERNAME`, enclosed in single quotes. For example: `'password!234'`.
+- `action1,action2,...`: One or more of the defined actions to apply to the broken VM. For the complete list, see the following section and the [ALAR GitHub ReadMe](https://github.com/Azure/ALAR). Multiple actions run consecutively. Separate them with commas and no spaces, for example `fstab,sudo`.
 
 ## The ALAR actions
 
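The three steps in the new "How to use ALAR" section can be wrapped in a small script. Pre-answering the three prompted parameters lets `az vm repair create` run without stopping for input. The following is a minimal POSIX shell sketch, not part of ALAR. The helper names `emit` and `alar_repair` and the `DRY_RUN` variable are hypothetical, and the script uses only the commands and flags shown in the diff above. By default it only prints the commands, and the printed lines include the password in clear text and aren't shell-quoted.

```bash
#!/bin/sh
# Sketch: print (or run) the three ALAR steps with the prompts pre-answered.
# Assumptions: emit, alar_repair, and DRY_RUN are hypothetical names; only
# flags shown in this article are used. Default is a dry run that prints the
# commands (password included); export DRY_RUN=0 to execute them instead.

emit() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: alar_repair RG-NAME VM-NAME RESCUE-USERNAME RESCUE-PASS ACTIONS
alar_repair() {
  if [ "$#" -ne 5 ]; then
    echo "usage: alar_repair RG-NAME VM-NAME RESCUE-USERNAME RESCUE-PASS ACTIONS" >&2
    return 2
  fi
  # Step 1: create the rescue VM without interactive prompts.
  emit az vm repair create --verbose --resource-group "$1" --name "$2" \
    --repair-username "$3" --repair-password "$4" --associate-public-ip || return
  # Step 2: run linux-alar2 with the requested actions on the rescue VM.
  emit az vm repair run --verbose --resource-group "$1" --name "$2" \
    --run-id linux-alar2 --parameters "$5" --run-on-repair || return
  # Step 3: swap the OS disk copy back and delete the temporary resources.
  emit az vm repair restore --verbose --resource-group "$1" --name "$2"
}
```

For example, `alar_repair myRG myVM rescueuser 'password!234' fstab,sudo` prints the create, run, and restore commands in order. After `export DRY_RUN=0`, the same call executes them and stops at the first step that fails.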
@@ -43,11 +90,13 @@ This action strips off any lines in the */etc/fstab* file that aren't needed to
 
 For more information about issues with a malformed */etc/fstab* file, see [Troubleshoot Linux VM starting issues because fstab errors](./linux-virtual-machine-cannot-start-fstab-errors.md).
 
-### kernel
+### efifix
 
-This action changes the default kernel. The script replaces the broken kernel with the previously installed version.
+This action can be used to reinstall the software required to boot a GEN2 VM. The *grub.cfg* file is also regenerated.
 
-For more information about messages that might be logged on the serial console for kernel-related startup events, see [How to recover an Azure Linux virtual machine from kernel-related boot issues](kernel-related-boot-issues.md).
+### grubfix
+
+This action can be used to reinstall GRUB and regenerate the *grub.cfg* file.
 
 ### initrd
 
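The `--parameters` value for `linux-alar2` must be a comma-separated list of action names with no spaces. You can check a list with a small POSIX shell function before passing it to `az vm repair run`; the following is a hedged sketch of one way to do that. `validate_actions` and `KNOWN_ACTIONS` are hypothetical names. The list holds only the actions that appear in this article (`fstab`, `efifix`, `grubfix`, `initrd`, `kernel`, `serialconsole`, `sudo`, `auditd`), and the ALAR GitHub ReadMe remains the authoritative list.

```bash
#!/bin/sh
# Sketch: check an ALAR action list before passing it to --parameters.
# Assumption: KNOWN_ACTIONS contains only the actions described in this
# article; validate_actions is a hypothetical helper name.
KNOWN_ACTIONS="fstab efifix grubfix initrd kernel serialconsole sudo auditd"

# Succeeds only for comma-separated known actions with no spaces (fstab,sudo).
validate_actions() {
  # Reject empty input, leading/trailing/doubled commas, and any character
  # other than lowercase letters and commas (this also rejects spaces).
  case $1 in
    '' | ,* | *, | *,,* | *[!a-z,]*) return 1 ;;
  esac
  va_old_ifs=$IFS
  IFS=,
  for va_action in $1; do
    case " $KNOWN_ACTIONS " in
      *" $va_action "*) ;;
      *) IFS=$va_old_ifs; return 1 ;;
    esac
  done
  IFS=$va_old_ifs
  return 0
}
```

For example, `validate_actions fstab,sudo` succeeds, while `validate_actions 'fstab, sudo'` fails because of the space after the comma.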
@@ -64,63 +113,30 @@ In both cases, the following information is logged before the error entries are
 
 ![Unpacking failed](media/repair-linux-vm-using-ALAR/unpacking-failed.png)
 
+### kernel
+
+This action changes the default kernel by replacing the current (broken) default kernel with a previously installed version.
+
+For more information about messages that might be logged on the serial console for kernel-related startup events, see [How to recover an Azure Linux virtual machine from kernel-related boot issues](kernel-related-boot-issues.md).
+
 ### serialconsole
 
 This action corrects an incorrect or malformed serial console configuration for the Linux kernel or GRUB. We recommend that you run this action in the following cases:
 
 - No GRUB menu is displayed at VM startup.
 - No operating system related information is written to the serial console.
 
-### grubfix
-
-This action can be used to reinstall GRUB and regenerate the *grub.cfg* file.
-
-### efifix
+### sudo
 
-This action can be used to reinstall the required software to boot from a GEN2 VM. The *grub.cfg* file is also regenerated.
+The `sudo` action resets the permissions on the */etc/sudoers* file and all files in */etc/sudoers.d* to the required 0440 mode, and checks other best practices. It also runs a basic check that detects and reports duplicate user entries, and it moves only the */etc/sudoers.d/waagent* file if that file conflicts with other files.
 
 ### auditd
 
-If your VM shuts down immediately upon startup due to the audit daemon configuration, use this action. This action modifies the audit daemon configuration (in the */etc/audit/auditd.conf* file) by changing the `HALT` value configured for any `action` parameters to `SYSLOG`, which doesn't force the system to shut down. In a Logical Volume Manager (LVM) environment, if the logical volume that contains the audit logs is full and there's available space in the volume group, the logical volume will also be extended by 10% of the current size. However, if you're not using an LVM environment or there's no available space, only the configuration file is altered.
+If your VM shuts down immediately upon startup due to the audit daemon configuration, use this action. This action modifies the audit daemon configuration (in the */etc/audit/auditd.conf* file) by changing the `HALT` value configured for any `action` parameters to `SYSLOG`, which doesn't force the system to shut down. In a Logical Volume Manager (LVM) environment, if the logical volume that contains the audit logs is full and there's available space in the volume group, the logical volume can be extended by 10% of the current size. However, if you're not using an LVM environment or there's no available space, only the `auditd` configuration file is altered.
 
 > [!IMPORTANT]
-> This action will change the VM's security posture by altering the audit daemon configuration so that the VM shutdown issue can be resolved. Once the VM is running and accessible, you need to revert the audit daemon configuration to the original state. For this purpose, a backup of the *auditd.conf* file is created in */etc/audit* by the ALAR action.
-
-## How to use ALAR
-
-The ALAR scripts use the repair extension `run` command and its `--run-id` option. The value of the `--run-id` option for the automated recovery is `linux-alar2`. To fix a Linux VM by using an ALAR script, follow these steps:
-
-> [!NOTE]
-> The VM Contributor role doesn't provide enough permissions to run the scripts, as they require permissions to read, write, and delete resources in the resource group that includes the target VM. Therefore roles such as Contributor or Owner at the resource group level is required.
-
-1. Create a rescue VM:
-
-```azurecli-interactive
-az vm repair create --verbose -g RG-NAME -n VM-NAME --repair-username RESCUE-UID --repair-password RESCUE-PASS --copy-disk-name DISK-COPY
-```
-2. Run a script with one of the ALAR actions on the rescue VM:
-
-```azurecli-interactive
-az vm repair run --verbose -g RG-NAME -n VM-NAME --run-id linux-alar2 --parameters ACTION --run-on-repair
-```
-3. Swap the OS disks and delete the temporary resources:
-
-```azurecli-interactive
-az vm repair restore --verbose -g RG-NAME -n VM-NAME
-```
-
-> [!NOTE]
-> The original and new disks won't be deleted.
+> This action changes the VM's security posture by altering the audit daemon configuration so that the VM shutdown issue can be resolved. Once the VM is running and accessible, you need to evaluate the configuration and potentially revert it to the original state. To help with this, the ALAR action creates a backup of the *auditd.conf* file in */etc/audit*.
 
-Here are explanations for the parameters in the commands above:
-
-- `RG-NAME`: The name of the resource group containing the broken VM.
-- `VM-NAME`: The name of the broken VM.
-- `RESCUE-UID`: The user created on the repair VM for login. It's the equivalent of the user created on a new VM in the Azure portal.
-- `RESCUE-PASS`: The password for `RESCUE-UID`, enclosed in single quotes. For example: `'password!234'`.
-- `DISK-COPY`: The name of the OS disk copy that will be created from the broken VM.
-- `ACTION`: A scripted task to run, such as `initrd` or `fstab`.
-You can pass over single or multiple recovery operations. For multiple operations, delineate them using commas without spaces, such as `fstab,initrd`.
 
 
 ## Limitation
 
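The new `sudo` section says that the action resets */etc/sudoers* and the files in */etc/sudoers.d* to mode 0440. The following read-only sketch illustrates only that permission check. It isn't ALAR's implementation and doesn't cover the duplicate-entry or *waagent* checks. `check_sudoers_modes` is a hypothetical name. The optional argument is the mount point of the OS disk copy, which you supply yourself. `stat -c '%a'` assumes GNU coreutils.

```bash
#!/bin/sh
# Read-only sketch: report sudoers files whose mode isn't 440 (octal 0440).
# Assumptions: the optional argument is the mount point of the OS disk copy
# (omit it for the local system); `stat -c '%a'` requires GNU coreutils.
# The function name check_sudoers_modes is hypothetical.

check_sudoers_modes() {
  csm_root=${1%/}   # "/" becomes "", "/mnt/x/" becomes "/mnt/x"
  for csm_file in "$csm_root/etc/sudoers" "$csm_root"/etc/sudoers.d/*; do
    # Skip the unexpanded glob when sudoers.d is empty or missing.
    [ -f "$csm_file" ] || continue
    csm_mode=$(stat -c '%a' "$csm_file") || return
    # Print "<mode> <path>" for every file without the expected mode.
    [ "$csm_mode" = 440 ] || printf '%s %s\n' "$csm_mode" "$csm_file"
  done
}
```

Each output line shows the current mode followed by the file path. `chmod 0440 FILE` sets the expected mode by hand. On a running VM, `visudo -c` also checks the sudoers syntax.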
@@ -133,3 +149,5 @@ If you experience a bug or want to request an enhancement to the ALAR tool, post
 You can also find the latest information about the ALAR tool on [GitHub](https://github.com/Azure/ALAR).
 
 [!INCLUDE [Azure Help Support](../../../includes/azure-help-support.md)]
+
+[!INCLUDE [Third-party contact disclaimer](~/includes/third-party-contact-disclaimer.md)]
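The updated `auditd` section describes two things: changing every `HALT` value of an `action` parameter in */etc/audit/auditd.conf* to `SYSLOG`, and keeping a backup. The following minimal POSIX shell sketch covers only that configuration change, for example to understand or reverse what the action did. It isn't ALAR's code. `relax_auditd_halt` and the `.bak` backup name are assumptions, and the LVM extension step isn't covered.

```bash
#!/bin/sh
# Sketch of the auditd change described above: every "<name>_action = HALT"
# entry becomes SYSLOG. Not ALAR's code; relax_auditd_halt and the ".bak"
# backup name are assumptions, and the LVM extension step isn't covered.

relax_auditd_halt() {
  rah_conf=${1:-/etc/audit/auditd.conf}
  rah_backup=$rah_conf.bak
  # Refuse to overwrite an earlier backup, which may hold the original values.
  if [ -e "$rah_backup" ]; then
    echo "backup already exists: $rah_backup" >&2
    return 1
  fi
  cp -p "$rah_conf" "$rah_backup" || return
  # Match keys ending in _action whose value is HALT (any case), keep the key
  # and spacing, and write through ">" so the file keeps its inode and mode.
  sed 's/^\([[:space:]]*[A-Za-z_]*_action[[:space:]]*=[[:space:]]*\)[Hh][Aa][Ll][Tt][[:space:]]*$/\1SYSLOG/' \
    "$rah_backup" > "$rah_conf"
}
```

Because the original file is kept, you can revert once the VM is accessible. Review the differences (for example, with `diff auditd.conf.bak auditd.conf`), and copy the backup back if you need to.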
