diff --git a/src/assets/images/magic-transit/tunnel-health-check-packets.png b/src/assets/images/magic-transit/tunnel-health-check-packets.png deleted file mode 100644 index 060168d881b01e..00000000000000 Binary files a/src/assets/images/magic-transit/tunnel-health-check-packets.png and /dev/null differ diff --git a/src/content/docs/magic-transit/reference/tunnel-health-checks.mdx b/src/content/docs/magic-transit/reference/tunnel-health-checks.mdx index f54c61af202e8b..4ac972925b9cb0 100644 --- a/src/content/docs/magic-transit/reference/tunnel-health-checks.mdx +++ b/src/content/docs/magic-transit/reference/tunnel-health-checks.mdx @@ -11,11 +11,11 @@ import { Render } from "~/components"; diff --git a/src/content/docs/magic-wan/reference/tunnel-health-checks.mdx b/src/content/docs/magic-wan/reference/tunnel-health-checks.mdx index 9166ee5811bd5a..ab59f0fba6890d 100644 --- a/src/content/docs/magic-wan/reference/tunnel-health-checks.mdx +++ b/src/content/docs/magic-wan/reference/tunnel-health-checks.mdx @@ -12,14 +12,11 @@ import { Render } from "~/components"; file="tunnel-health/tunnel-health-checks" product="magic-transit" params={{ - healthCheckFrequencyURL: - "/magic-wan/configuration/common-settings/tunnel-health-checks/", + addTunnels: "/magic-wan/configuration/manually/how-to/configure-tunnels/#add-tunnels", + changeHealthCheckRate: "/magic-wan/configuration/common-settings/tunnel-health-checks/", + probeHealth: "#health-state-and-prioritization", productName: "Magic WAN", - onboardingURL: - "/magic-wan/configuration/manually/how-to/configure-static-routes/", - configureTunnelEndpointsURL: - "/magic-wan/configuration/manually/how-to/configure-tunnels/", - urlChangeHealthCheckType: - "/magic-wan/configuration/manually/how-to/configure-tunnels/#add-tunnels", + staticRoutes: "/magic-wan/configuration/manually/how-to/configure-static-routes/", + tunnelEndpoints: "/magic-wan/configuration/manually/how-to/configure-tunnels/" }} /> diff --git 
a/src/content/partials/magic-transit/tunnel-health/tunnel-health-checks.mdx b/src/content/partials/magic-transit/tunnel-health/tunnel-health-checks.mdx index e6db5d0f82c59e..2739b4cd4c002c 100644 --- a/src/content/partials/magic-transit/tunnel-health/tunnel-health-checks.mdx +++ b/src/content/partials/magic-transit/tunnel-health/tunnel-health-checks.mdx @@ -1,33 +1,187 @@ --- params: - - healthCheckFrequencyURL + - addTunnels + - changeHealthCheckRate + - probeHealth - productName - - onboardingURL - - configureTunnelEndpointsURL - - urlChangeHealthCheckType + - staticRoutes + - tunnelEndpoints --- -import { Details, GlossaryTooltip, Markdown, Render } from "~/components"; +import { Render } from "~/components"; -A tunnel health check probe contains an [ICMP (Internet Control Message Protocol)](https://www.cloudflare.com/learning/ddos/glossary/internet-control-message-protocol-icmp/) reply packet that originates from an IP address on the origin side of the tunnel and whose destination address is a public Cloudflare IP. +A tunnel health check probe consists of an [ICMP (Internet Control Message Protocol)](https://www.cloudflare.com/learning/ddos/glossary/internet-control-message-protocol-icmp/) payload encapsulated in the protocol of the tunnel the probe is being conducted for. For example, if the tunnel is an IPsec tunnel, the ICMP packet is encrypted within the Encapsulating Security Payload (ESP) packet of the tunnel. -Cloudflare encapsulates the ICMP reply packet and sends the probe across the tunnel to the origin. When the probe reaches the origin router, the router decapsulates the ICMP reply and forwards it to the specified destination IP. The probe is successful when Cloudflare receives the reply. +A tunnel health check probe comes from Cloudflare to the tunnel origin, then returns a response to Cloudflare. 
This response is used to determine the outcome of the probe, which is used to calculate the state of the tunnel (this is explained in greater detail below). - +## Tunnel health check attributes + +A tunnel health check probe has important attributes described below. + +### Target + +A tunnel health check probe tests whether Cloudflare can successfully connect to a specific address or endpoint via the tunnel. The target is the address you want to ensure is reachable through the tunnel. This helps verify that the tunnel is functional and traffic can flow properly to the intended destination. The target is optional, and there are certain defaults depending on the direction of the health check (refer to [Direction](#direction) for more information). + +### Direction + +A tunnel health check probe can have two possible directions: unidirectional and bidirectional. + +#### Unidirectional + +A unidirectional health check probe stays encapsulated in one direction and comes into the origin via the tunnel (from Cloudflare to the origin). The response comes back to Cloudflare unencapsulated and is routed outside of the tunnel following standard Internet routing. + +The target defaults to the publicly routable origin specified as the `customer_endpoint` on the tunnel, if present. Otherwise, you can use a custom target. + +#### Bidirectional + +A bidirectional probe stays encapsulated in both directions, that is, the probe comes in via the tunnel and the response also leaves encapsulated via the tunnel. + +By default, these packets are destined for the Cloudflare side of the interface address field set on the tunnel, and are sourced from the client side of the tunnel. For example, if the interface address is `10.100.0.8/31`, then the packet will be destined for `10.100.0.8` and sourced from `10.100.0.9`. + +Note that the interface address field is always a `/30` or `/31` CIDR range. 
In the case of a `/31` range, the IP provided will be the Cloudflare side, whereas the other will be the client side. For example, if the interface address is `10.100.0.8/31`, `10.100.0.8` is the Cloudflare side, and `10.100.0.9` is the client side. In the case of a `/30` range, the IP provided will be the Cloudflare side whereas the other IP (excluding the broadcast and network identifier) will be the client side. For example, if the interface address is `10.100.0.9/30`, `10.100.0.9` will be the Cloudflare side and `10.100.0.10` will be the client side. + +A bidirectional health check can also be configured with a custom public target and is the recommended approach for an Azure Active Standby tunnel setup. + +These packets will flow to and from Cloudflare over the tunnels you have configured to provide full visibility into the traffic path between our network and your sites. You will need to configure traffic selectors to accept the health check packets in the case of IPsec tunnels. + +Refer to Add tunnels to learn how to configure bidirectional or unidirectional health checks. + +#### Legacy bidirectional health checks + +For customers using the legacy health check system with a public IP range, Cloudflare recommends: + +- Configuring the tunnel health check target IP address to one within the `172.64.240.252/30` prefix range. +- Applying a policy-based route that matches packets with a source IP address equal to the configured tunnel health check target (for example `172.64.240.253/32`), and route them over the tunnel back to Cloudflare. -Every Cloudflare data center configured to process your traffic sends tunnel health check probes. The rate at which these health check probes are sent varies based on tunnel and location. This rate can also be tuned up or down on a per tunnel basis by modifying the `health_check` rate of a tunnel with the API. 
+### Type -When a probe attempt fails for a [healthy](#health-state-and-prioritization) tunnel, each server detecting the failure quickly probes up to two more times to obtain an accurate result. We also do the same if a tunnel has been down and probes start returning success. Because Cloudflare global network servers send probes up to every second, you can expect your network to receive several hundred health check packets per second - each Cloudflare data center will only send one health check packet as part of a probe. This represents a relatively trivial amount of traffic. +A tunnel health check probe can have two possible types: request and reply. For each type, the source and destination addresses depend on the direction. Refer to Add tunnels to learn how to change this setting. -:::note[Note] -To avoid control plane policies enforced by the origin network, tunnel health checks use an encapsulated ICMP reply instead of an ICMP echo request. To use echo request packets, change your health check type to **Request** in your tunnels. Refer to Configure tunnel endpoints to learn how to change this setting. +#### Request style + +In a request style health check, the payload probe is an ICMP echo request. + +For a unidirectional probe, the source address is the Cloudflare side of the tunnel (a publicly routable address) and the destination is the origin router (also publicly routable). The origin router receives the probe and produces an ICMP response with the opposite source and destination, and sends it outside of the tunnel. + +For a bidirectional probe, the source address is the interface address of the Cloudflare side of the tunnel (a privately routable address) and the destination is the interface address of the origin side of the tunnel (also privately routable). The origin router receives the probe and produces an ICMP response with the opposite source and destination, and sends it into the tunnel. 
+ +#### Reply style + +In a reply style health check, the payload probe is an ICMP echo reply. + +For a unidirectional probe, the destination address is the Cloudflare side of the tunnel (a publicly routable address) and the source is the origin router (also publicly routable). The origin router receives the probe and sends it back as the response, unchanged, outside of the tunnel. + +For a bidirectional probe, the destination address is the interface address of the Cloudflare side of the tunnel (a privately routable address) and the source is the interface address of the origin side of the tunnel (also privately routable). The origin router receives the probe packet and sends it back unchanged as the response into the tunnel, because the destination is routed via the tunnel. + +:::note +To avoid control plane policies enforced by the origin network, tunnel health checks can be set to use a request style health check if reply style health checks are being dropped. ::: -
+### Summary table with tunnel health check probe types + +| Attribute | Type | Unidirectional health checks | Bidirectional health checks | +| :---: | :---: | :---: | :---: | +| Source Address | Request Style | Cloudflare Address (Publicly Routable) | Cloudflare Interface Address (Privately Routable) | +| Destination Address | Request Style | Origin Tunnel Endpoint (Publicly Routable) | Origin Interface Address (Privately Routable) / Custom Target | +| Source Address | Reply Style | Origin Tunnel Endpoint (Publicly Routable) | Origin Interface Address (Privately Routable) / Custom Target | +| Destination Address | Reply Style | Cloudflare Address (Publicly Routable) | Cloudflare Interface Address (Privately Routable) | + +### Graphics summarizing health check types + +#### Bidirectional request style + +```mermaid +flowchart TB + subgraph Tunnel Healthcheck Probe + cloudflare(Cloudflare) --- bare_echo_request([ICMP Echo Request]) + bare_echo_request --> tunnel[Tunnel] + tunnel --- encapsulated_echo_request([Tunnel Protocol < ICMP Echo Request >]) + encapsulated_echo_request --> internet([Internet]) + internet --- encapsulated_echo_request_2([Tunnel Protocol < ICMP Echo Request >]) + encapsulated_echo_request_2 --> origin_tunnel(Tunnel) + origin_tunnel --- received_bare_echo_request([ICMP Echo Request]) + received_bare_echo_request --> origin(Origin) + end + subgraph Tunnel Healthcheck Response + origin --> bare_echo_reply([ICMP Echo Reply]) + bare_echo_reply --- origin_tunnel_2(Tunnel) + origin_tunnel_2 --- encapsulated_echo_reply([Tunnel Protocol < ICMP Echo Reply >]) + encapsulated_echo_reply --- internet_2([Internet]) + internet_2 --> encapsulated_echo_reply_2([Tunnel Protocol < ICMP Echo Reply >]) + encapsulated_echo_reply_2 --> tunnel_2[Tunnel] + tunnel_2 --> bare_echo_reply_2([ICMP Echo Reply]) + bare_echo_reply_2 --> cloudflare + end +``` + +#### Bidirectional reply style + +```mermaid +flowchart TB + subgraph Tunnel Healthcheck Probe + cloudflare(Cloudflare) 
--- bare_echo_probe([ICMP Echo Reply]) + bare_echo_probe --> tunnel[Tunnel] + tunnel --- encapsulated_echo_probe([Tunnel Protocol < ICMP Echo Reply >]) + encapsulated_echo_probe --> internet([Internet]) + internet --- encapsulated_echo_probe_2([Tunnel Protocol < ICMP Echo Reply >]) + encapsulated_echo_probe_2 --> origin_tunnel(Tunnel) + origin_tunnel --- received_bare_echo_reply([ICMP Echo Reply]) + received_bare_echo_reply --> origin(Origin) + end + subgraph Tunnel Healthcheck Response + origin --> bare_echo_reply([ICMP Echo Reply]) + bare_echo_reply --- origin_tunnel_2(Tunnel) + origin_tunnel_2 --- encapsulated_echo_reply([Tunnel Protocol < ICMP Echo Reply >]) + encapsulated_echo_reply --- internet_2([Internet]) + internet_2 --> encapsulated_echo_reply_2([Tunnel Protocol < ICMP Echo Reply >]) + encapsulated_echo_reply_2 --> tunnel_2[Tunnel] + tunnel_2 --> bare_echo_reply_2([ICMP Echo Reply]) + bare_echo_reply_2 --> cloudflare + end +``` + +#### Unidirectional echo request + +```mermaid +flowchart TB + cloudflare(Cloudflare) --- bare_echo_probe([ICMP Echo Request]) + bare_echo_probe --> tunnel[Tunnel] + tunnel --- encapsulated_echo_probe([Tunnel Protocol < ICMP Echo Request >]) + encapsulated_echo_probe --> internet([Internet]) + internet --- encapsulated_echo_probe_2([Tunnel Protocol < ICMP Echo Request >]) + encapsulated_echo_probe_2 --> origin_tunnel(Tunnel) + origin_tunnel --- received_bare_echo_reply([ICMP Echo Request]) + received_bare_echo_reply --> origin(Origin) + origin --- received_bare_echo_reply_2([ICMP Echo Reply]) + received_bare_echo_reply_2 --> internet_2([Internet]) + internet_2 --> cloudflare +``` + +#### Unidirectional echo reply + +```mermaid +flowchart TB + cloudflare(Cloudflare) --- bare_echo_probe([ICMP Echo Reply]) + bare_echo_probe --> tunnel[Tunnel] + tunnel --- encapsulated_echo_probe([Tunnel Protocol < ICMP Echo Reply >]) + encapsulated_echo_probe --> internet([Internet]) + internet --- encapsulated_echo_probe_2([Tunnel Protocol < ICMP 
Echo Reply >]) + encapsulated_echo_probe_2 --> origin_tunnel(Tunnel) + origin_tunnel --- received_bare_echo_reply([ICMP Echo Reply]) + received_bare_echo_reply --> origin(Origin) + origin --- received_bare_echo_reply_2([ICMP Echo Reply]) + received_bare_echo_reply_2 --> internet_2([Internet]) + internet_2 --> cloudflare +``` + +### Rate + + + +Every Cloudflare data center configured to process your traffic sends tunnel health check probes. The rate at which these health check probes are sent varies based on tunnel and location. This rate can also be tuned up or down on a per tunnel basis by modifying the `health_check` rate of a tunnel with the API or the dashboard. You can set the rate value to _low_, _mid_, or _high_, with _mid_ being the default option. The actual rate formula considers the number of servers in a Cloudflare data center or, for dynamically provisioned namespaces, the number of servers with the customer namespace provisioned on them. Thus, the rate is not a specific number; it is dynamic and depends on the size of our network. -![Wireshark example for tunnel health checks with ICMP reply packet](~/assets/images/magic-transit/tunnel-health-check-packets.png) +When a probe attempt fails for a healthy tunnel, each server detecting the failure quickly probes up to two more times to obtain an accurate result. We also do the same if a tunnel has been down and probes start returning success. Because Cloudflare global network servers send probes up to every second, you can expect your network to receive several hundred health check packets per second; each Cloudflare data center will only send one health check packet as part of a probe. This represents a relatively trivial amount of traffic. -
## Health state and prioritization @@ -35,12 +189,12 @@ There are three tunnel health states: healthy, degraded, and down. Healthy tunnels are preferred to degraded tunnels, and degraded tunnels are preferred to those that are down. -{props.productName} steers traffic to tunnels based on priorities you set when you assign tunnel route priorities during onboarding. Tunnel routes with lower values have priority over those with higher values. +{props.productName} steers traffic to tunnels based on priorities you set when you assign tunnel route priorities during onboarding. Tunnel routes with lower values have priority over those with higher values. -:::note[Note] +:::note Cloudflare global network servers may be able to reach the origin infrastructure from some locations at a given time but not others. This occurs because Cloudflare does not synchronize health checks among global network servers and because the Internet is not homogeneous. -As a result, tunnel health may be in different states in different parts of the world at the same time. In the example from the previous paragraph, both tunnels could receive traffic simultaneously, even though Tunnel 1 has priority over Tunnel 2. +As a result, tunnel health may be in different states in different parts of the world at the same time. ::: ## Tunnel state determination @@ -61,11 +215,11 @@ When {props.productName} identifies a route that is not healthy, it applies thes - **Degraded**: Add `500,000` to priority. - **Down**: Add `1,000,000` to priority. -The values for failure penalties are intentionally extreme so that they always exceed the priority values assigned during routing configuration. +The values for failure penalties are intentionally extreme so that they always exceed the priority values assigned during routing configuration. Applying a penalty instead of removing the route altogether preserves redundancy and maintains options for customers with only one tunnel. 
Penalties also support the case when multiple tunnels are unhealthy. -### Cloudflare data centers and tunnels +## Cloudflare data centers and tunnels In the event a Cloudflare data center is down, Cloudflare's global network does not advertise your prefixes, and your packets are routed to the next closest data center. To check the system status for Cloudflare's global network and dashboard, refer to [Cloudflare System Status](https://www.cloudflarestatus.com/). @@ -75,9 +229,9 @@ Once a tunnel is in the down state, global network servers continue to emit prob Tunnels in a degraded state transition to healthy when the failure rate for the previous 30 probes is less than 0.1%. This transition may take up to 30 minutes. -{props.productName}'s tunnel health check system allows a tunnel to quickly transition from healthy to degraded or down, but tunnel transition occurs slowly from degraded or down to healthy. This scenario is referred to as hysteresis - which is when a system's output depends on its history of past inputs - and dampens changes to tunnel routing caused by flapping and other intermittent network failures. +{props.productName}'s tunnel health check system allows a tunnel to quickly transition from healthy to degraded or down, but tunnel transition occurs slowly from degraded or down to healthy. This scenario is referred to as hysteresis — which is when a system's output depends on its history of past inputs — and dampens changes to tunnel routing caused by flapping and other intermittent network failures. -:::note[Note] +:::note Cloudflare always attempts to send traffic over available tunnel routes with the highest priority (lowest route value), even when all configured tunnels are in an unhealthy state. ::: @@ -88,7 +242,7 @@ Consider two tunnels and their associated routing priorities. 
Remember that lowe - Tunnel 1, route priority `100` - Tunnel 2, route priority `200` -When both tunnels are in a healthy state, routing priority directs traffic exclusively to Tunnel 1 because its route priority of 100 beats that of Tunnel 2. Tunnel 2 does not receive any traffic, except for tunnel health check probes. Endpoint health checks only flow over Tunnel 1 to their destination inside the origin network. +When both tunnels are in a healthy state, routing priority directs traffic exclusively to Tunnel 1 because its route priority of `100` beats that of Tunnel 2. Tunnel 2 does not receive any traffic, except for tunnel health check probes. Endpoint health checks only flow over Tunnel 1 to their destination inside the origin network. ### Failure response @@ -116,22 +270,8 @@ During onboarding, you specify IP addresses to configure endpoint health checks. ### Tunnel health checks -Tunnel health checks monitor the status of the Generic Routing Encapsulation (GRE) and IPsec tunnels that route traffic from Cloudflare to your origin network. {props.productName} relies on health checks to steer traffic to the best available routes. - -During onboarding, you specify the tunnel endpoints the tunnel probes originating from Cloudflare's global network will target. - -Tunnel health check results are exposed [via API](/analytics/graphql-api/tutorials/querying-magic-transit-tunnel-healthcheck-results/). These results are aggregated from individual health check results done on Cloudflare servers. - -#### Bidirectional health checks - -To check for tunnel health, Cloudflare sends packets in the form of ICMP echo replies. These packets are destined for the Cloudflare side of the interface address field set on the IPsec tunnel, and are sourced from the client of the tunnel. For example, if the interface address is `10.100.0.8/31`, then the packet will be destined for `10.100.0.9` and sourced from `10.100.0.8`. 
- -Note that the interface address field is always a `/30` or `/31` CIDR range. In the case of a `/31` range, the IP provided will be the Cloudflare side, whereas the other will be the client side. For example, if the interface address is `10.100.0.8/31`, `10.100.0.8` is the Cloudflare side, and `10.100.0.9` is the client side. In case of a `/30` range, the IP provided will be the Cloudflare side whereas the other IP (excluding the broadcast and network identifier) will be the client side. For example, if the interface address is `10.100.0.9/30`, `10.100.0.9` will be the Cloudflare side and `10.100.0.10` will be the client side. - -These packets will flow to and from Cloudflare over the IPsec tunnels you have configured to provide full visibility into the traffic path between our network and your sites. You will need to configure traffic selectors to accept the health check packets. - -Refer to Add tunnels to learn how to configure bidirectional or unidirectional health checks. +Tunnel health checks monitor the status of the tunnels that route traffic from Cloudflare to your origin network. {props.productName} relies on health checks to steer traffic to the best available routes. -#### Legacy health checks system +During onboarding, you specify the tunnel endpoints or tunnel health check targets the tunnel probes originating from Cloudflare's global network will target. - +Tunnel health check results are exposed [via API](/analytics/graphql-api/tutorials/querying-magic-transit-tunnel-healthcheck-results/). These results are aggregated from individual health check results done on Cloudflare servers. \ No newline at end of file
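
Illustrative note on the failure penalties described in the partial (degraded adds `500,000`, down adds `1,000,000`): the minimal Python sketch below shows how those penalties would reorder two tunnels with route priorities `100` and `200`. The function and variable names are hypothetical and are not part of any Cloudflare API; the sketch only reproduces the arithmetic stated in the text and assumes the route with the lowest resulting value is preferred.

```python
# Hypothetical sketch of the penalty arithmetic described in the docs.
# None of these names come from a Cloudflare API.
PENALTIES = {"healthy": 0, "degraded": 500_000, "down": 1_000_000}


def effective_priority(route_priority: int, state: str) -> int:
    """Return the route priority after the health penalty is added."""
    return route_priority + PENALTIES[state]


def preferred_tunnel(tunnels: dict) -> str:
    """Pick the tunnel with the lowest effective priority (lower wins)."""
    return min(tunnels, key=lambda name: effective_priority(*tunnels[name]))


# Tunnel 1 has the better configured priority but is degraded.
tunnels = {"Tunnel 1": (100, "degraded"), "Tunnel 2": (200, "healthy")}
print(effective_priority(*tunnels["Tunnel 1"]))  # 500100
print(preferred_tunnel(tunnels))                 # Tunnel 2
```

Because the penalty is added rather than the route being removed, a tunnel that is down still has a finite value: if both tunnels were down, Tunnel 1 (`1,000,100`) would still be preferred over Tunnel 2 (`1,000,200`).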