Commit 3e2c31d

Merge pull request #270118 from duongau/vpntroubleshooting
Azure VPN Troubleshooting - acrolinx edits
2 parents 78a3a2e + df182f3 commit 3e2c31d

1 file changed

+39
-31
lines changed

articles/vpn-gateway/troubleshoot-vpn-with-azure-diagnostics.md

Lines changed: 39 additions & 31 deletions
@@ -21,29 +21,29 @@ The following logs are available* in Azure:
2121
|--- | --- |
2222
|**GatewayDiagnosticLog** | Contains diagnostic logs for gateway configuration events, primary changes, and maintenance events. |
2323
|**TunnelDiagnosticLog** | Contains tunnel state change events. Tunnel connect/disconnect events have a summarized reason for the state change if applicable. |
24-
|**RouteDiagnosticLog** | Logs changes to static routes and BGP events that occur on the gateway. |
25-
|**IKEDiagnosticLog** | Logs IKE control messages and events on the gateway. |
24+
|**RouteDiagnosticLog** | Logs changes to static routes and BGP (Border Gateway Protocol) events that occur on the gateway. |
25+
|**IKEDiagnosticLog** | Logs IKE (Internet Key Exchange) control messages and events on the gateway. |
2626
|**P2SDiagnosticLog** | Logs point-to-site control messages and events on the gateway. |
2727

2828
*for Policy Based gateways, only GatewayDiagnosticLog and RouteDiagnosticLog are available.
2929

30-
Notice that there are several columns available in these tables. In this article, we are only presenting the most relevant ones for easier log consumption.
30+
Notice that there are several columns available in these tables. In this article, we're only presenting the most relevant ones for easier log consumption.
3131

3232
## <a name="setup"></a>Set up logging
3333

3434
Follow this procedure to learn how to set up diagnostic log events from Azure VPN Gateway using Azure Log Analytics:
3535

36-
1. Create a Log Analytics Workspace using [this article](../azure-monitor/logs/quick-create-workspace.md).
36+
1. Create a new Log Analytics Workspace using the steps found in [create a Log Analytics Workspace](../azure-monitor/logs/quick-create-workspace.md).
3737

38-
2. Find your VPN gateway on the Monitor > Diagnostics settings blade.
38+
2. Locate your VPN gateway on the **Monitor > Diagnostics** settings page.
3939

40-
:::image type="content" source="./media/troubleshoot-vpn-with-azure-diagnostics/setup_step2.png " alt-text="Screenshot of the Diagnostic settings blade." lightbox="./media/troubleshoot-vpn-with-azure-diagnostics/setup_step2.png":::
40+
:::image type="content" source="./media/troubleshoot-vpn-with-azure-diagnostics/setup_step2.png " alt-text="Screenshot of the Diagnostic settings page." lightbox="./media/troubleshoot-vpn-with-azure-diagnostics/setup_step2.png":::
4141

42-
3. Select the gateway and click on "Add Diagnostic Setting".
42+
3. Select the VPN gateway and then select **Add Diagnostic Setting**.
4343

4444
:::image type="content" source="./media/troubleshoot-vpn-with-azure-diagnostics/setup_step3.png " alt-text="Screenshot of the Add diagnostic setting interface." lightbox="./media/troubleshoot-vpn-with-azure-diagnostics/setup_step3.png":::
4545

46-
4. Fill in the diagnostic setting name, select all the log categories and choose the Log Analytics Workspace.
46+
4. Input the **Diagnostic setting name**, choose all the **Log categories** and select the appropriate **Log Analytics Workspace**.
4747

4848
:::image type="content" source="./media/troubleshoot-vpn-with-azure-diagnostics/setup_step4.png " alt-text="Detailed screenshot of the Add diagnostic setting properties." lightbox="./media/troubleshoot-vpn-with-azure-diagnostics/setup_step4.png":::
4949

@@ -63,25 +63,26 @@ AzureDiagnostics
6363
| sort by TimeGenerated asc
6464
```
6565

66-
This query on **GatewayDiagnosticLog** will show you multiple columns.
66+
This query on **GatewayDiagnosticLog** shows you multiple columns.
6767

6868
|***Name*** | ***Description*** |
6969
|--- | --- |
7070
|**TimeGenerated** | the timestamp of each event, in UTC timezone.|
7171
|**OperationName** |the event that happened. It can be either of *SetGatewayConfiguration, SetConnectionConfiguration, HostMaintenanceEvent, GatewayTenantPrimaryChanged, MigrateCustomerSubscription, GatewayResourceMove, ValidateGatewayConfiguration*.|
7272
|**Message** | the detail of what operation is happening, and lists successful/failure results.|
7373

74-
The example below shows the activity logged when a new configuration was applied:
74+
The following example shows the activity logged when a new configuration was applied:
7575

7676
:::image type="content" source="./media/troubleshoot-vpn-with-azure-diagnostics/image-26-set-gateway.png" alt-text="Example of a Set Gateway Operation seen in GatewayDiagnosticLog.":::
7777

7878

79-
Notice that a SetGatewayConfiguration will be logged every time some configuration is modified both on a VPN Gateway or a Local Network Gateway.
80-
Cross referencing the results from the **GatewayDiagnosticLog** table with those of the **TunnelDiagnosticLog** table can help us determine if a tunnel connectivity failure has started at the same time as a configuration was changed, or a maintenance took place. If so, we have a great pointer towards the possible root cause.
79+
Notice that a **SetGatewayConfiguration** event gets logged every time a configuration is modified on either a VPN Gateway or a Local Network Gateway.
80+
81+
Comparing the results from the **GatewayDiagnosticLog** table with the results of the **TunnelDiagnosticLog** table can help determine if a tunnel connectivity failure happened during a configuration change or maintenance activity. If so, it provides a significant indication towards the potential root cause.
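
The comparison described above can be sketched offline in Python, assuming the results of both queries have been exported and reduced to simple tuples. The function name, the tuple layout, and the five-minute window are illustrative assumptions, not part of the article or of any Azure API.

```python
from datetime import datetime, timedelta

def changes_near_disconnects(gateway_events, disconnect_times,
                             window=timedelta(minutes=5)):
    """Map each disconnection time to gateway operations logged close to it.

    gateway_events: iterable of (TimeGenerated, OperationName) tuples,
        a hypothetical reduction of GatewayDiagnosticLog rows.
    disconnect_times: iterable of TimeGenerated values taken from
        TunnelDisconnected rows of TunnelDiagnosticLog.
    window: assumed tolerance; tune it to your environment.
    """
    matches = {}
    for t in disconnect_times:
        # Keep operations logged within `window` before or after the disconnect.
        nearby = [op for ts, op in gateway_events if abs(ts - t) <= window]
        if nearby:
            matches[t] = nearby
    return matches
```

A disconnection that appears as a key in the returned mapping happened close to a configuration change or maintenance event, which makes that event a candidate root cause worth checking first.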
8182

8283
## <a name="TunnelDiagnosticLog"></a>TunnelDiagnosticLog
8384

84-
The **TunnelDiagnosticLog** table is very useful to inspect the historical connectivity statuses of the tunnel.
85+
The **TunnelDiagnosticLog** table is useful to inspect the historical connectivity statuses of the tunnel.
8586

8687
Here you have a sample query as reference.
8788

@@ -93,14 +94,14 @@ AzureDiagnostics
9394
| sort by TimeGenerated asc
9495
```
9596

96-
This query on **TunnelDiagnosticLog** will show you multiple columns.
97+
This query on **TunnelDiagnosticLog** shows you multiple columns.
9798

9899

99100
|***Name*** | ***Description*** |
100101
|--- | --- |
101102
|**TimeGenerated** | the timestamp of each event, in UTC timezone.|
102103
|**OperationName** | the event that happened. It can be either *TunnelConnected* or *TunnelDisconnected*.|
103-
| **remoteIP\_s** | the IP address of the on-premises VPN device. In real world scenarios, it is useful to filter by the IP address of the relevant on-premises device shall there be more than one.|
104+
| **remoteIP\_s** | the IP address of the on-premises VPN device. In real-world scenarios, it's useful to filter by the IP address of the relevant on-premises device if there's more than one.|
104105
| **Instance\_s** | the gateway role instance that triggered the event. It can be either GatewayTenantWorker\_IN\_0 or GatewayTenantWorker\_IN\_1, which are the names of the two instances of the gateway.|
105106
| **Resource** | indicates the name of the VPN gateway. |
106107
| **ResourceGroup** | indicates the resource group where the gateway is.|
@@ -111,14 +112,14 @@ Example output:
111112
:::image type="content" source="./media/troubleshoot-vpn-with-azure-diagnostics/image-16-tunnel-connected.png" alt-text="Example of a Tunnel Connected Event seen in TunnelDiagnosticLog.":::
112113

113114

114-
The **TunnelDiagnosticLog** is very useful to troubleshoot past events about unexpected VPN disconnections. Its lightweight nature offers the possibility to analyze large time ranges over several days with little effort.
115+
The **TunnelDiagnosticLog** is useful to troubleshoot past events about unexpected VPN disconnections. Its lightweight nature offers the possibility to analyze large time ranges over several days with little effort.
115116
After you identify the timestamp of a disconnection, you can switch to the more detailed analysis of the **IKEDiagnosticLog** table to dig deeper into the reason for the disconnection, if it's IPsec related.
116117

117118

118119
Some troubleshooting tips:
119-
- If you see a disconnection event on one gateway instance, followed by a connection event on the **different** gateway instance in a few seconds, you are looking at a gateway failover. This is usually an expected behavior due to maintenance on a gateway instance. To learn more about this behavior, see [About Azure VPN gateway redundancy](./vpn-gateway-highlyavailable.md#activestandby).
120-
- The same behavior will be observed if you intentionally run a Gateway Reset on the Azure side - which causes a reboot of the active gateway instance. To learn more about this behavior, see [Reset a VPN Gateway](./reset-gateway.md).
121-
- If you see a disconnection event on one gateway instance, followed by a connection event on the **same** gateway instance in a few seconds, you may be looking at a network glitch causing a DPD timeout, or a disconnection erroneously sent by the on-premises device.
120+
- If you observe a disconnection event on one gateway instance, followed by a connection event on a different gateway instance within a few seconds, it indicates a gateway failover. Such an event typically arises due to maintenance on a gateway instance. To learn more about this behavior, see [About Azure VPN gateway redundancy](./vpn-gateway-highlyavailable.md#activestandby).
121+
- The same behavior is observed if you intentionally run a **Gateway Reset** on the Azure side - which causes a reboot of the active gateway instance. To learn more about this behavior, see [Reset a VPN Gateway](./reset-gateway.md).
122+
- If you see a disconnection event on one gateway instance, followed by a connection event on the **same** gateway instance in a few seconds, you might be looking at a network glitch causing a DPD timeout, or a disconnection erroneously sent by the on-premises device.
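
The tips above amount to a simple classification rule, sketched here in Python under a few assumptions: the rows are exported from the **TunnelDiagnosticLog** query and reduced to `(TimeGenerated, OperationName, Instance_s)` tuples, and "a few seconds" is taken to mean 30 seconds. The function name and the window are hypothetical choices for illustration only.

```python
from datetime import datetime, timedelta

def classify_reconnects(events, window=timedelta(seconds=30)):
    """Pair each TunnelDisconnected with the next TunnelConnected inside `window`.

    events: iterable of (TimeGenerated, OperationName, Instance_s) tuples,
        a hypothetical reduction of exported TunnelDiagnosticLog rows.
    Returns a list of (disconnect_time, kind), where kind is "failover"
    (reconnected on a different instance) or "same-instance reconnect".
    """
    ordered = sorted(events, key=lambda e: e[0])
    findings = []
    for i, (ts, op, instance) in enumerate(ordered):
        if op != "TunnelDisconnected":
            continue
        for next_ts, next_op, next_instance in ordered[i + 1:]:
            if next_ts - ts > window:
                break  # no reconnection close enough to this disconnect
            if next_op == "TunnelConnected":
                same = next_instance == instance
                findings.append((ts, "same-instance reconnect" if same else "failover"))
                break
    return findings
```

A "failover" result points toward maintenance or a gateway reset, while a "same-instance reconnect" suggests looking at DPD timeouts or the on-premises device. Disconnections with no reconnection inside the window are left out and need separate investigation.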
122123

123124
## <a name="RouteDiagnosticLog"></a>RouteDiagnosticLog
124125

@@ -132,15 +133,15 @@ AzureDiagnostics
132133
| project TimeGenerated, OperationName, Message, Resource, ResourceGroup
133134
```
134135

135-
This query on **RouteDiagnosticLog** will show you multiple columns.
136+
This query on **RouteDiagnosticLog** shows you multiple columns.
136137

137138
|***Name*** | ***Description*** |
138139
|--- | --- |
139140
|**TimeGenerated** | the timestamp of each event, in UTC timezone.|
140141
|**OperationName** | the event that happened. Can be either of *StaticRouteUpdate, BgpRouteUpdate, BgpConnectedEvent, BgpDisconnectedEvent*.|
141142
| **Message** | the detail of what operation is happening.|
142143

143-
The output will show useful information about BGP peers connected/disconnected and routes exchanged.
144+
The output shows useful information about BGP peers connected/disconnected and routes exchanged.
144145

145146
Example:
146147

@@ -150,7 +151,7 @@ Example:
150151

151152
## <a name="IKEDiagnosticLog"></a>IKEDiagnosticLog
152153

153-
The **IKEDiagnosticLog** table offers verbose debug logging for IKE/IPsec. This is very useful to review when troubleshooting disconnections, or failure to connect VPN scenarios.
154+
The **IKEDiagnosticLog** table offers verbose debug logging for IKE/IPsec. This is useful to review when troubleshooting disconnections, or failure to connect VPN scenarios.
154155

155156
Here you have a sample query as reference.
156157

@@ -164,24 +165,24 @@ AzureDiagnostics
164165
| sort by TimeGenerated asc
165166
```
166167

167-
This query on **IKEDiagnosticLog** will show you multiple columns.
168+
This query on **IKEDiagnosticLog** shows you multiple columns.
168169

169170

170171
|***Name*** | ***Description*** |
171172
|--- | --- |
172173
|**TimeGenerated** | the timestamp of each event, in UTC timezone.|
173-
| **RemoteIP** | the IP address of the on-premises VPN device. In real world scenarios, it is useful to filter by the IP address of the relevant on-premises device shall there be more than one. |
174-
|**LocalIP** | the IP address of the VPN Gateway we are troubleshooting. In real world scenarios, it is useful to filter by the IP address of the relevant VPN gateway shall there be more than one in your subscription. |
175-
|**Event** | contains a diagnostic message useful for troubleshooting. They usually start with a keyword and refer to the actions performed by the Azure Gateway: **\[SEND\]** indicates an event caused by an IPSec packet sent by the Azure Gateway. **\[RECEIVED\]** indicates an event in consequence of a packet received from on-premises device. **\[LOCAL\]** indicates an action taken locally by the Azure Gateway. |
174+
| **RemoteIP** | the IP address of the on-premises VPN device. In real-world scenarios, it's useful to filter by the IP address of the relevant on-premises device if there's more than one. |
175+
|**LocalIP** | the IP address of the VPN Gateway we're troubleshooting. In real-world scenarios, it's useful to filter by the IP address of the relevant VPN gateway if there's more than one in your subscription. |
176+
|**Event** | contains a diagnostic message useful for troubleshooting. They usually start with a keyword and refer to the actions performed by the Azure Gateway: **\[SEND\]** indicates an event caused by an IPSec packet sent by the Azure Gateway. **\[RECEIVED\]** indicates an event in consequence of a packet received from on-premises device. **\[LOCAL\]** indicates an action taken locally by the Azure Gateway. |
176177

177178

178-
Notice how RemoteIP, LocalIP, and Event columns are not present in the original column list on AzureDiagnostics database, but are added to the query by parsing the output of the "Message" column to simplify its analysis.
179+
Notice how RemoteIP, LocalIP, and Event columns aren't present in the original column list on AzureDiagnostics database, but are added to the query by parsing the output of the "Message" column to simplify its analysis.
179180

180181
Troubleshooting tips:
181182

182183
- In order to identify the start of an IPSec negotiation, you need to find the initial SA\_INIT message. Such message could be sent by either side of the tunnel. Whoever sends the first packet is called "initiator" in IPsec terminology, while the other side becomes the "responder". The first SA\_INIT message is always the one where rCookie = 0.
183184

184-
- If the IPsec tunnel fails to establish, Azure will keep retrying every few seconds. For this reason, troubleshooting "VPN down" issues is very convenient on IKEdiagnosticLog because you do not have to wait for a specific time to reproduce the issue. Also, the failure will in theory always be the same every time we try so you could just zoom into one "sample" failing negotiation at any time.
185+
- If the IPsec tunnel fails to establish, Azure keeps retrying every few seconds. For this reason, troubleshooting "VPN down" issues is convenient on IKEdiagnosticLog because you don't have to wait for a specific time to reproduce the issue. Also, the failure will in theory always be the same every time we try so you could just zoom into one "sample" failing negotiation at any time.
185186

186187
- The SA\_INIT contains the IPSec parameters that the peer wants to use for this IPsec negotiation.
187188
The official document
@@ -208,13 +209,20 @@ This query on **P2SDiagnosticLog** will show you multiple columns.
208209
|**OperationName** | the event that happened. Will be *P2SLogEvent*.|
209210
| **Message** | the detail of what operation is happening.|
210211

211-
The output will show all of the Point to Site settings that the gateway has applied, as well as the IPsec policies in place.
212+
The output shows all of the Point to Site settings that the gateway has applied, and the IPsec policies in place.
212213

213214
:::image type="content" source="./media/troubleshoot-vpn-with-azure-diagnostics/image-28-p2s-log-event.png" alt-text="Example of Point to Site connection seen in P2SDiagnosticLog.":::
214215

215-
Also, whenever a client will connect via IKEv2 or OpenVPN Point to Site, the table will log packet activity, EAP/RADIUS conversations and successful/failure results by user.
216+
Additionally, when a client establishes a connection using OpenVPN and Microsoft Entra ID authentication for point-to-site, the table records packet activity as follows:
217+
218+
```
219+
[MSG] [default] [OVPN_XXXXXXXXXXXXXXXXXXXXXXXXXXX] Connect request received. IP=0.X.X.X:XXX
220+
[MSG] [default] [OVPN_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx] AAD authentication succeeded. Username=***[email protected]
221+
[MSG] [default] [OVPN_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx] Connection successful. Username=***[email protected] IP=10.0.0.1
222+
```
216223

217-
:::image type="content" source="./media/troubleshoot-vpn-with-azure-diagnostics/image-29-eap.png" alt-text="Example of EAP authentication seen in P2SDiagnosticLog.":::
224+
> [!NOTE]
225+
> In the point-to-site log, the username is partially obscured. The first octet of the client user IP is substituted with a `0`.
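
If you export these messages for offline analysis, a small parser can split each line into its bracketed prefixes and `key=value` pairs. The following Python sketch is based only on the three sample lines shown above. The group names `level`, `profile`, and `session` are labels chosen for illustration, not documented field names, and the real log format might contain variations this pattern doesn't cover.

```python
import re

# Three bracketed prefixes followed by free text, as in the sample lines.
LINE_RE = re.compile(
    r"^\[(?P<level>\w+)\] \[(?P<profile>[^\]]+)\] \[(?P<session>[^\]]+)\] (?P<text>.*)$"
)
# key=value pairs such as Username=... or IP=...
FIELD_RE = re.compile(r"(\w+)=(\S+)")

def parse_p2s_line(line):
    """Return a dict describing one P2S message, or None if it doesn't match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    record = m.groupdict()
    record["fields"] = dict(FIELD_RE.findall(record["text"]))
    return record
```

Because the username is partially obscured and the first octet of the client IP is replaced, the parsed values are suitable for grouping events by session, not for identifying an exact user or address.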
218226
219227
## Next Steps
220228
