[MT] Endpoint health checks (cloudflare#23911)

marciocloudflare · Oxyjun · sdnts · commit 977084eff45a · 2025-07-24T16:06:15.000-04:00
* added beta badge

* added content

* added curl component

* added api partial info

* updated url

* refined text

* Apply suggestions from code review

Co-authored-by: Jun Lee <junlee@cloudflare.com>

---------

Co-authored-by: Jun Lee <junlee@cloudflare.com>
diff --git a/src/content/docs/magic-transit/network-health/run-endpoint-health-checks.mdx b/src/content/docs/magic-transit/network-health/run-endpoint-health-checks.mdx
@@ -1,11 +1,16 @@
 ---
 pcx_content_type: how-to
-title: Run endpoint health checks
+title: Run endpoint health checks (beta)
 sidebar:
+  badge:
+    text: Beta
   order: 1
+  label: Run endpoint health checks
 
 ---
 
+import { CURL, Render } from '~/components';
+
 Magic Transit uses endpoint health checks to determine the overall health of your [inter-network connections](/magic-transit/reference/gre-ipsec-tunnels/). Probes originate from Cloudflare infrastructure, outside customer network namespaces, and target IP addresses deep within your network, beyond the tunnel-terminating border router. These "long distance" probes are purely diagnostic.
 
 When choosing which endpoint IP addresses to monitor with health checks, use these guidelines:
@@ -15,11 +20,133 @@ When choosing which endpoint IP addresses to monitor with health checks, use the
 
 Cloudflare pings health check IPs from within the [published Cloudflare IP range](https://www.cloudflare.com/ips/), which is also available via the [Cloudflare API](/api/resources/ips/methods/list/).
 
-Refer to the table below for an example of an endpoint health check configuration.
+When configuring an endpoint health check for an IP prefix, you need to select an IP address that is within the range of that IP prefix. Refer to the table below for an example of an endpoint health check configuration.
 
 | Prefix            | Endpoint IP address |
 | ----------------- | ------------------- |
 | `103.21.244.0/24` | `103.21.244.100`    |
 | `103.21.245.0/24` | `103.21.245.100`    |
 
 Refer to [Tunnel health checks](/magic-transit/reference/tunnel-health-checks/) for more information on this topic.
+
+## Configure endpoint health checks (beta)
+
+Endpoint health checks can only be configured through the Cloudflare API. You cannot configure or view them in the dashboard. Configuring endpoint health checks is currently a beta feature.
+
+Refer to the [API documentation](/api/resources/diagnostics/subresources/endpoint-healthchecks/) to learn how to create, list, and delete endpoint health checks. The following example request creates a new endpoint health check.
+
+<Render file="account-id-api-key" product="networking-services" />
+
+<CURL
+  url="https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/diagnostics/endpoint-healthcheck"
+  method="POST"
+  json={{
+    "check_type": "icmp",
+    "endpoint": "8.31.160.1"
+  }}
+/>
+
+```json output
+{
+    "result": {
+        "id": "<HEALTH_CHECK_ID>",
+        "check_type": "icmp",
+        "endpoint": "8.31.160.1"
+    },
+    "success": true,
+    "errors": [],
+    "messages": []
+}
+```
+
+## Query endpoint health checks with GraphQL
+
+You can also use GraphQL to query the results of your endpoint health checks. This lets you check whether an IP address within your network is reachable. You can also set up alerts to be notified when there is a problem.
+
+You can query the following dimensions:
+
+- `checkId`: The ID of the check associated with the health check.
+- `checkType`: The type of check associated with the health check.
+- `date`: The health check event timestamp truncated to the day.
+- `datetime`: The health check event timestamp.
+- `datetimeFifteenMinutes`: The health check event timestamp truncated to multiples of 15 minutes.
+- `datetimeFiveMinutes`: The health check event timestamp truncated to multiples of five minutes.
+- `datetimeHalfOfHour`: The health check event timestamp truncated to multiples of 30 minutes.
+- `datetimeHour`: The health check event timestamp truncated to the hour.
+- `datetimeMinute`: The health check event timestamp truncated to the minute.
+- `endpoint`: The endpoint of the check associated with the health check.
+
+The following example query retrieves health check data for your endpoints:
+
+```graphql
+query Viewer {
+    viewer {
+        accounts(filter: { accountTag: "YOUR_ACCOUNT_TAG" }) {
+            magicEndpointHealthCheckAdaptiveGroups(
+                filter: { date_geq: "2025-05-10" }
+                limit: 10
+            ) {
+                count
+                dimensions {
+                    checkId
+                    checkType
+                    date
+                    datetime
+                    datetimeFifteenMinutes
+                    datetimeFiveMinutes
+                    datetimeHalfOfHour
+                    datetimeHour
+                    datetimeMinute
+                    endpoint
+                }
+                sum {
+                    failures
+                    total
+                }
+            }
+        }
+    }
+}
+```
+
+## Configure alerts for endpoint health checks
+
+You can set up alerts to be notified when the percentage of passing endpoint health checks drops below a threshold that you choose.
+
+1. Make a `GET` request to get a list of IDs for all of the endpoint health checks configured:
+
+<CURL
+  url="https://api.cloudflare.com/client/v4/accounts/<ACCOUNT_ID>/diagnostics/endpoint-healthcheck"
+  method="GET"
+/>
+
+```json output
+{
+    "result": [
+        {
+            "id": "<HEALTH_CHECK_ID>",
+            "check_type": "icmp",
+            "endpoint": "8.31.160.1"
+        }
+    ],
+    "success": true,
+    "errors": [],
+    "messages": []
+}
+```
+
+2. Take note of the `id` value for the endpoint you want to get alerts for.
+3. Log in to the [Cloudflare dashboard](https://dash.cloudflare.com/), and select your account.
+4. Go to **Notifications** > **Add**.
+5. From the dropdown menu, select *Magic Transit*.
+6. Select **Magic Endpoint Health Check Alert**.
+7. Provide a name for your new notification and optionally provide a description.
+8. In the *Service Level Objective (SLO)* dropdown, select the SLO threshold for your notification. The SLO defines the percentage of endpoint health checks that should be passing. If the percentage of passing endpoint health checks is lower than the SLO, an alert is generated:
+	- **High** - 99% of endpoint health checks
+	- **Medium** - 98% of endpoint health checks
+	- **Low** - 97% of endpoint health checks
+9. In the dropdown menu below the SLOs, select the `id` value that you noted from the API response in step 2. This `id` identifies the endpoint health check you want to get notifications for.
+10. Select your preferred notification method (such as email or webhooks).
+11. Select **Save**.
+
+You will now receive notifications via your preferred method whenever the percentage of passing endpoint health checks falls below your chosen SLO.
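
The SLO comparison described in the alert steps can be approximated offline from the `sum { failures total }` fields returned by the GraphQL query. The following is a minimal sketch, not the logic Cloudflare runs. It assumes that the response mirrors the query shape (`data.viewer.accounts[].magicEndpointHealthCheckAdaptiveGroups[]`), that `total` counts all probes, and that `failures` counts the failed ones. The function names `pass_rate_by_check` and `below_slo` are hypothetical.

```python
from typing import Any

# SLO levels from the notification setup steps (percentage of checks passing).
SLO_THRESHOLDS = {"high": 99.0, "medium": 98.0, "low": 97.0}


def pass_rate_by_check(response: dict[str, Any]) -> dict[str, float]:
    """Return the percentage of passing probes for each checkId.

    Assumption: `response` is the parsed JSON body of the GraphQL query,
    shaped like the query itself.
    """
    counts: dict[str, list[int]] = {}  # checkId -> [failures, total]
    for account in response["data"]["viewer"]["accounts"]:
        for group in account["magicEndpointHealthCheckAdaptiveGroups"]:
            entry = counts.setdefault(group["dimensions"]["checkId"], [0, 0])
            entry[0] += group["sum"]["failures"]
            entry[1] += group["sum"]["total"]
    # Skip checks with no probes to avoid dividing by zero.
    return {
        check_id: 100.0 * (total - failures) / total
        for check_id, (failures, total) in counts.items()
        if total > 0
    }


def below_slo(rate: float, level: str) -> bool:
    """Return True when the pass rate is lower than the chosen SLO level."""
    return rate < SLO_THRESHOLDS[level]
```

For example, a check with 3 failures out of 100 probes has a 97% pass rate, which is below the Medium SLO (98%) but not below the Low SLO (97%).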