
Commit af93e00

Merge pull request #217622 from JnHs/jh-arcrb-tsnov
various troubleshooting updates
2 parents f2b2887 + feabd3e

File tree: 1 file changed, +45 −60 lines


articles/azure-arc/resource-bridge/troubleshoot-resource-bridge.md

Lines changed: 45 additions & 60 deletions
@@ -1,7 +1,7 @@
---
title: Troubleshoot Azure Arc resource bridge (preview) issues
description: This article tells how to troubleshoot and resolve issues with the Azure Arc resource bridge (preview) when trying to deploy or connect to the service.
- ms.date: 09/26/2022
+ ms.date: 11/09/2022
ms.topic: conceptual
---

@@ -13,33 +13,21 @@ This article provides information on troubleshooting and resolving issues that m

### Logs

- For any issues encountered with the Azure Arc resource bridge, you can collect logs for further investigation. To collect the logs, use the Azure CLI [`az arcappliance logs`](/cli/azure/arcappliance/logs) command. This command needs to be run from the client machine from which you've deployed the Azure Arc resource bridge.
+ For issues encountered with Arc resource bridge, collect logs for further investigation using the Azure CLI [`az arcappliance logs`](/cli/azure/arcappliance/logs) command. This command must be run from the same deployment machine that was used to deploy the Arc resource bridge. If there is a problem collecting logs, most likely the deployment machine is unable to reach the Appliance VM, and the network administrator needs to allow communication between the deployment machine and the Appliance VM.

- The `az arcappliance logs` command requires SSH to the Azure Arc resource bridge VM. The SSH key is saved to the client machine where the deployment of the appliance was performed from. To use a different client machine to run the Azure CLI command, you need to make sure the following files are copied to the new client machine:
+ The `az arcappliance logs` command requires SSH to the Azure Arc resource bridge VM. The SSH key is saved to the deployment machine. To use a different machine to run the logs command, make sure the following files are copied to the new machine in the same location:

```azurecli
$HOME\.KVA\.ssh\logkey.pub
$HOME\.KVA\.ssh\logkey
```

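The copy step above can be sketched for a Linux or macOS shell as follows (the staging directory `/tmp/kva-keys` and the placeholder `touch` are illustrative, not part of the product workflow):

```shell
# Sketch: put the SSH log key pair in the same relative location on the new machine.
# SRC is a hypothetical staging directory holding the files brought over from the
# original deployment machine; touch only creates placeholders so the sketch runs.
SRC="${SRC:-/tmp/kva-keys}"
mkdir -p "$SRC" "$HOME/.KVA/.ssh"
touch "$SRC/logkey" "$SRC/logkey.pub"
cp "$SRC/logkey" "$SRC/logkey.pub" "$HOME/.KVA/.ssh/"
```

On Windows, the equivalent is copying both files into `$HOME\.KVA\.ssh\` with File Explorer or `Copy-Item`.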
- To run the `az arcappliance logs` command, the path to the kubeconfig must be provided. The kubeconfig is generated after successful completion of the `az arcappliance deploy` command and is placed in the same directory as the CLI command in ./kubeconfig or as specified in `--outfile` (if the parameter was passed).
+ To run the `az arcappliance logs` command, the Appliance VM IP, Control Plane IP, or kubeconfig can be passed in the corresponding parameter. If `az arcappliance deploy` was not completed, the kubeconfig file may be empty, so it can't be used for log collection. In this case, the Appliance VM IP address can be used to collect logs.

- If `az arcappliance deploy` was not completed, then the kubeconfig file may exist but may be empty or missing data, so it can't be used for logs collection. In this case, the Appliance VM IP address can be used to collect logs instead. The Appliance VM IP is assigned when the `az arcappliance deploy` command is run, after Control Plane Endpoint reconciliation. For example, if the message displayed in the command window reads "Appliance IP is 10.97.176.27", the command to use for logs collection would be:
+ The Appliance VM IP is assigned when the `az arcappliance deploy` command is run, after Control Plane endpoint reconciliation. For example, if the message displayed in the command window reads "Appliance IP is 192.168.1.1", the command to use for log collection would be:

```azurecli
- az arcappliance logs hci --out-dir c:\logs --ip 10.97.176.27
- ```
-
- To view the logs, run the following command:
-
- ```azurecli
- az arcappliance logs <provider> --kubeconfig <path to kubeconfig>
- ```
-
- To save the logs to a destination folder, run the following command:
-
- ```azurecli
- az arcappliance logs <provider> --kubeconfig <path to kubeconfig> --out-dir <path to specified output directory>
+ az arcappliance logs hci --ip 192.168.1.1 --out-dir c:\logs
```

To specify the IP address of the Azure Arc resource bridge virtual machine, run the following command:
@@ -50,7 +38,7 @@ az arcappliance logs <provider> --out-dir <path to specified output directory> -

### Remote PowerShell is not supported

- If you run `az arcappliance` CLI commands for Arc Resource Bridge via remote PowerShell, you may experience various problems. For instance, you might see an [EOF error when using the `logs` command](#logs-command-fails-with-eof-error), or an [authentication handshake failure error when trying to install the resource bridge on an Azure Stack HCI cluster](#authentication-handshake-failure).
+ If you run `az arcappliance` CLI commands for Arc Resource Bridge via remote PowerShell, you may experience various problems. For instance, you might see an [authentication handshake failure error when trying to install the resource bridge on an Azure Stack HCI cluster](#authentication-handshake-failure) or another type of error.

Using `az arcappliance` commands from remote PowerShell is not currently supported. Instead, sign in to the node through Remote Desktop Protocol (RDP) or use a console session.

@@ -80,38 +68,6 @@ To resolve this error, the .wssd\python and .wssd\kva folders in the user profil

When you run the Azure CLI commands, the following error may be returned: *The refresh token has expired or is invalid due to sign-in frequency checks by conditional access.* The error occurs because when you sign in to Azure, the token has a maximum lifetime. When that lifetime is exceeded, you need to sign in to Azure again by using the `az login` command.

- ### `logs` command fails with EOF error
-
- When running the `az arcappliance logs` Azure CLI command, you may see an error: `Appliance logs command failed with error: EOF when reading a line.` This may occur in scenarios similar to the following:
-
- ```azurecli
- az arcappliance logs hci --kubeconfig .\kubeconfig --out-dir c:\temp --ip 192.168.200.127
- + CategoryInfo : NotSpecified: (WARNING: Comman...s/CLI_refstatus:String) [], RemoteException
- + FullyQualifiedErrorId : NativeCommandError
-
- Please enter cloudservice FQDN/IP: Appliance logs command failed with error: EOF when reading a line[v-Host1]: PS C:\Users\AzureStackAdminD\Documents> az arcappliance logs hci --kubeconfig .\kubeconfig --out-dir c:\temp --ip 192.168.200.127
- + CategoryInfo : NotSpecified: (WARNING: Comman...s/CLI_refstatus:String) [], RemoteException
- + FullyQualifiedErrorId : NativeCommandError
-
- Please enter cloudservice FQDN/IP: Appliance logs command failed with error: EOF when reading a line
- ```
-
- The `az arcappliance logs` CLI command runs in interactive mode, meaning that it prompts the user for parameters. If the command is run in a scenario where it can't prompt the user for parameters, this error will occur. This is especially common when trying to use remote PowerShell to run the command.
-
- To avoid this error, use Remote Desktop Protocol (RDP) or a console session to sign directly in to the node and locally run the `logs` command (or any `az arcappliance` command). Remote PowerShell is not currently supported by Azure Arc resource bridge.
-
- You can also avoid this error by pre-populating the values that the `logs` command prompts for, thus avoiding the prompt. The example below provides these values into a variable which is then passed to the `logs` command. Be sure to replace `$loginValues` with your cloudservice IP address and the full path to your token credentials.
-
- ```azurecli
- $loginValues="192.168.200.2
- C:\kvatoken.tok"
-
- $user_in = ""
- foreach ($val in $loginValues) { $user_in = $user_in + $val + "`n" }
-
- $user_in | az arcappliance logs hci --kubeconfig C:\Users\AzureStackAdminD\.kube\config
- ```
### Default host resource pools are unavailable for deployment

When using the `az arcappliance createConfig` or `az arcappliance run` command, there is an interactive experience that shows a list of the VMware entities where the user can select where to deploy the virtual appliance. This list shows all user-created resource pools along with default cluster resource pools, but the default host resource pools aren't listed.
@@ -122,7 +78,7 @@ When the appliance is deployed to a host resource pool, there is no high availab

### Restricted outbound connectivity

- Make sure the URLs listed below are added to your allowlist.
+ Below is the list of firewall and proxy URLs that must be allowlisted to enable communication from the deployment machine, Appliance VM, and Control Plane IP to the required Arc resource bridge URLs.

#### Proxy URLs used by appliance agents and services

@@ -132,11 +88,11 @@ Make sure the URLs listed below are added to your allowlist.
|Azure Arc Identity service | 443 | `https://*.his.arc.azure.com` | Appliance VM IP and Control Plane IP need outbound connection. | Manages identity and access control for Azure resources |
|Azure Arc configuration service | 443 | `https://*.dp.kubernetesconfiguration.azure.com`| Appliance VM IP and Control Plane IP need outbound connection. | Used for Kubernetes cluster configuration.|
|Cluster connect service | 443 | `https://*.servicebus.windows.net` | Appliance VM IP and Control Plane IP need outbound connection. | Provides cloud-enabled communication to connect on-premises resources with the cloud. |
- |Guest Notification service| 443 | `https://guestnotificationservice.azure.com`| Appliance VM IP and Control Plane IP need outbound connection. | Used to connect on-prem resources to Azure.|
- |SFS API endpoint | 443 | msk8s.api.cdp.microsoft.com | Host machine, Appliance VM IP and Control Plane IP need outbound connection. | Used when downloading product catalog, product bits, and OS images from SFS. |
+ |Guest Notification service| 443 | `https://guestnotificationservice.azure.com`| Appliance VM IP and Control Plane IP need outbound connection. | Used to connect on-premises resources to Azure.|
+ |SFS API endpoint | 443 | msk8s.api.cdp.microsoft.com | Deployment machine, Appliance VM IP and Control Plane IP need outbound connection. | Used when downloading product catalog, product bits, and OS images from SFS. |
|Resource bridge (appliance) Dataplane service| 443 | `https://*.dp.prod.appliances.azure.com`| Appliance VM IP and Control Plane IP need outbound connection. | Communicate with resource provider in Azure.|
|Resource bridge (appliance) container image download| 443 | `*.blob.core.windows.net, https://ecpacr.azurecr.io`| Appliance VM IP and Control Plane IP need outbound connection. | Required to pull container images. |
- |Resource bridge (appliance) image download| 80 | `*.dl.delivery.mp.microsoft.com`| Host machine, Appliance VM IP and Control Plane IP need outbound connection. | Download the Arc Resource Bridge OS images. |
+ |Resource bridge (appliance) image download| 80 | `*.dl.delivery.mp.microsoft.com`| Deployment machine, Appliance VM IP and Control Plane IP need outbound connection. | Download the Arc Resource Bridge OS images. |
|Azure Arc for Kubernetes container image download| 443 | `https://azurearcfork8sdev.azurecr.io`| Appliance VM IP and Control Plane IP need outbound connection. | Required to pull container images. |
|ADHS telemetry service | 443 | adhs.events.data.microsoft.com| Appliance VM IP and Control Plane IP need outbound connection. | Runs inside the appliance/mariner OS. Used periodically to send Microsoft required diagnostic data from control plane nodes. Used when telemetry is coming off Mariner, which would mean any Kubernetes control plane. |
|Microsoft events data service | 443 |v20.events.data.microsoft.com| Appliance VM IP and Control Plane IP need outbound connection. | Used periodically to send Microsoft required diagnostic data from the Azure Stack HCI or Windows Server host. Used when telemetry is coming off Windows like Windows Server or HCI. |
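Reachability of a couple of these endpoints can be spot-checked from the deployment machine with a quick loop (a sketch using curl; configure your proxy first if one is required):

```shell
# Sketch: spot-check outbound HTTPS reachability to two of the required endpoints.
# A "blocked or unreachable" result suggests a proxy or firewall rule needs review.
for host in msk8s.api.cdp.microsoft.com ecpacr.azurecr.io; do
  if curl -s --connect-timeout 10 -o /dev/null "https://$host"; then
    echo "$host reachable"
  else
    echo "$host blocked or unreachable"
  fi
done
```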
@@ -166,13 +122,42 @@ There are only two certificates that should be relevant when deploying the Arc r

### KVA timeout error

- Azure Arc resource bridge is a Kubernetes management cluster that is deployed in an appliance VM directly on the on-premises infrastructure. While trying to deploy Azure Arc resource bridge, a "KVA timeout error" may appear if there is a networking problem that doesn't allow communication of the Arc Resource Bridge appliance VM to the host, DNS, network or internet. This error is typically displayed for the following reasons:
+ While trying to deploy Arc resource bridge, a "KVA timeout error" may appear. The "KVA timeout error" is a generic error that can result from a variety of network misconfigurations in which the deployment machine, Appliance VM, or Control Plane IP can't communicate with each other, with the internet, or with required URLs. This communication failure is often due to issues with DNS resolution, proxy settings, network configuration, or internet access.
+
+ For clarity, "deployment machine" refers to the machine where deployment CLI commands are being run. "Appliance VM" is the VM that hosts Arc resource bridge. "Control Plane IP" is the IP of the control plane for the Kubernetes management cluster in the Appliance VM.
+
+ #### Top causes of the KVA timeout error
+
+ - Deployment machine is unable to communicate with the Control Plane IP and Appliance VM IP.
+ - Appliance VM is unable to communicate with the deployment machine, vCenter endpoint (for VMware), or MOC cloud agent endpoint (for Azure Stack HCI).
+ - Appliance VM does not have internet access.
+ - Appliance VM has internet access, but connectivity to one or more required URLs is being blocked, possibly due to a proxy or firewall.
+ - Appliance VM is unable to reach a DNS server that can resolve internal names, such as the vCenter endpoint for vSphere or the cloud agent endpoint for Azure Stack HCI. The DNS server must also be able to resolve external addresses, such as Azure service addresses and container registry names.
+ - Proxy server configuration on the deployment machine or in the Arc resource bridge configuration files is incorrect. This can impact both the deployment machine and the Appliance VM. When the `az arcappliance prepare` command is run, the deployment machine won't be able to connect and download OS images if the host proxy isn't correctly configured. Incorrect or missing proxy configuration can break internet access on the Appliance VM, which impacts the VM's ability to pull container images.
+
+ #### Troubleshoot KVA timeout error
+
+ To resolve the error, one or more network misconfigurations may need to be addressed. Follow the steps below to address the most common causes of this error.
+
+ 1. When there is a problem with deployment, the first step is to collect logs by Appliance VM IP (not by kubeconfig, as the kubeconfig may be empty if the deploy command did not complete). Problems collecting logs are most likely due to the deployment machine being unable to reach the Appliance VM.
+
+    Once logs are collected, extract the folder and open kva.log. Review kva.log for more information on the failure to help pinpoint the cause of the KVA timeout error.
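A quick way to surface failure clues from the collected kva.log is to scan it for error-looking lines (a sketch; the `LOG` path is illustrative):

```shell
# Sketch: show the most recent error-looking lines in a collected kva.log.
# LOG defaults to ./kva.log; point it at the file inside the extracted log folder.
LOG="${LOG:-./kva.log}"
if [ -f "$LOG" ]; then
  grep -iE "error|timeout|fail" "$LOG" | tail -n 20
else
  echo "log file $LOG not found"
fi
```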
+
+ 1. The deployment machine must be able to communicate with the Appliance VM IP and Control Plane IP. Ping the Control Plane IP and Appliance VM IP from the deployment machine and verify there is a response from both IPs.
+
+    If a request times out, the deployment machine can't communicate with the IP(s). This could be caused by a closed port, a network misconfiguration, or a firewall block. Work with your network administrator to allow communication between the deployment machine and the Control Plane IP and Appliance VM IP.
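The ping check above can be scripted as follows (a sketch; both addresses are hypothetical, and on Windows `ping -n 2` replaces `ping -c 2 -W 1`):

```shell
# Sketch: ping the Appliance VM IP and Control Plane IP from the deployment machine.
# 192.168.1.1 and 192.168.1.2 are stand-in addresses; substitute your own.
for ip in 192.168.1.1 192.168.1.2; do
  if ping -c 2 -W 1 "$ip" > /dev/null 2>&1; then
    echo "$ip responds"
  else
    echo "$ip request timed out"
  fi
done
```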
+
+ 1. The Appliance VM IP and Control Plane IP must be able to communicate with the deployment machine and the vCenter endpoint (for VMware) or MOC cloud agent endpoint (for Azure Stack HCI). Work with your network administrator to ensure the network is configured to permit this. This may require adding a firewall rule to open port 443 from the Appliance VM IP and Control Plane IP to vCenter, or ports 65000 and 55000 for the Azure Stack HCI MOC cloud agent. Review [network requirements for Azure Stack HCI](/azure-stack/hci/manage/azure-arc-vm-management-prerequisites#network-port-requirements) and [VMware](/azure/azure-arc/vmware-vsphere/quick-start-connect-vcenter-to-arc-using-script) for Arc resource bridge.
+
+ 1. The Appliance VM IP and Control Plane IP need internet access to [these required URLs](#restricted-outbound-connectivity). Azure Stack HCI requires [additional URLs](/azure-stack/hci/manage/azure-arc-vm-management-prerequisites). Work with your network administrator to ensure that the IPs can access the required URLs.
+
+ 1. In a non-proxy environment, the deployment machine must have external and internal DNS resolution. The deployment machine must be able to reach a DNS server that can resolve internal names, such as the vCenter endpoint for vSphere or the cloud agent endpoint for Azure Stack HCI. The DNS server also needs to be able to [resolve external addresses](#restricted-outbound-connectivity), such as Azure URLs and OS image download URLs. Work with your system administrator to ensure that the deployment machine has internal and external DNS resolution. In a proxy environment, the DNS resolution on the proxy server should resolve internal endpoints and [required external addresses](#restricted-outbound-connectivity).
+
+    To test DNS resolution to an internal address from the deployment machine in a non-proxy scenario, open a command prompt and run `nslookup <vCenter endpoint or HCI MOC cloud agent IP>`. You should receive an answer if the deployment machine has internal DNS resolution.
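For example, one internal and one external name can be checked in a single pass (a sketch; `vcenter.contoso.local` is a hypothetical internal endpoint name):

```shell
# Sketch: verify internal and external DNS resolution from the deployment machine.
# vcenter.contoso.local stands in for your vCenter or MOC cloud agent endpoint name.
for name in vcenter.contoso.local login.microsoftonline.com; do
  if nslookup "$name" > /dev/null 2>&1; then
    echo "$name resolves"
  else
    echo "$name did not resolve"
  fi
done
```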

- - The appliance VM IP address doesn't have DNS resolution.
- - The appliance VM IP address doesn't have internet access to download the required image.
- - The host doesn't have routability to the appliance VM IP address.
+ 1. The Appliance VM needs to be able to reach a DNS server that can resolve internal names, such as the vCenter endpoint for vSphere or the cloud agent endpoint for Azure Stack HCI. The DNS server also needs to be able to resolve external addresses, such as Azure service addresses and container registry names, for download of the Arc resource bridge container images from the cloud.

- To resolve this error, ensure that all IP addresses assigned to the Arc Resource Bridge appliance VM can be resolved by DNS and have access to the internet, and that the host can successfully route to the IP addresses.
+ Verify that the DNS server IP used to create the configuration files has internal and external address resolution. If it doesn't, [delete the appliance](/cli/azure/arcappliance/delete), recreate the Arc resource bridge configuration files with the correct DNS server settings, and then deploy Arc resource bridge using the new configuration files.

## Azure-Arc enabled VMs on Azure Stack HCI issues
