diff --git a/content/en/incident_response/on-call/_index.md b/content/en/incident_response/on-call/_index.md
index 9ce7d996740..ecdf503b83a 100644
--- a/content/en/incident_response/on-call/_index.md
+++ b/content/en/incident_response/on-call/_index.md
@@ -25,7 +25,7 @@ Datadog On-Call integrates monitoring, paging, and incident response into one pl
 
 - **Pages** represent something to get alerted for, such as a monitor, incident, or security signal. A Page can have a status of `Triggered`, `Acknowledged`, or `Resolved`.
 - **Teams** are groups configured within Datadog to handle specific types of Pages, based on expertise and operational roles.
-- **Routing rules** allow Teams to finely adjust their reactions to specific types of incoming events. These rules can set a Page's urgency level and route Pages to different escalation policies depending on the event's metadata.
+- **Routing rules** allow Teams to finely adjust their reactions to specific types of incoming events. These rules can set a Page's urgency level, route Pages to different escalation policies depending on the event's metadata, and configure [support hours][7] to delay escalation notifications to defined time windows.
 - **Escalation policies** determine how Pages are escalated within or across Teams.
 - **Schedules** set timetables for when specific Team members are on-call to respond to Pages.
 
@@ -106,3 +106,4 @@ On-Call is a seat-based SKU. To learn more about how On-Call is billed and how t
 [4]: /account_management/rbac/#role-based-access-control
 [5]: https://www.datadoghq.com/pricing/?product=incident-response#products
 [6]: /account_management/billing/incident_response/
+[7]: /incident_response/on-call/routing_rules#support-hours
diff --git a/content/en/incident_response/on-call/routing_rules.md b/content/en/incident_response/on-call/routing_rules.md
index eb9c2e79ee6..b9613213a0f 100644
--- a/content/en/incident_response/on-call/routing_rules.md
+++ b/content/en/incident_response/on-call/routing_rules.md
@@ -23,6 +23,10 @@ With routing rules, you can define granular logic to control how alerts reach yo
 - During business hours, route alerts to an escalation policy.
 - After hours, route critical alerts to paging, and non-critical alerts to chat.
 
+- Delay escalation outside of support hours:
+  - Define [support hours](#support-hours) on an escalation policy action to postpone notifications until the next active window.
+  - For example, a Page that arrives at 2:00 AM on Saturday creates a case immediately, but does not notify responders until 9:00 AM on Monday.
+
 - Use Dynamic Urgency to automatically detect urgency from the monitor alert:
   - `warn` status ➝ low urgency
   - `alert` status ➝ high urgency
@@ -55,11 +59,81 @@ Routing rules use [Datadog query syntax][3] and support multiple `if/else` condi
 | `priority` | Monitor priority (1–5) | `priority:(1 OR 2)` |
 | `alert_status` | Monitor status (`error`, `warn`, `success`) | `alert_status:(error OR warn)` |
 
+## Support hours
+
+Support hours let you define time windows during which an escalation policy actively notifies responders. When a Page arrives outside of support hours, Datadog creates the Page immediately but **postpones** the escalation policy until the next active support hours window. After the postponement period ends, the escalation policy begins executing normally.
+
+### How support hours work
+
+1. An alert triggers a Page to an On-Call team.
+1. Routing rules are evaluated from top to bottom to find a matching rule.
+1. The matching rule's escalation policy action checks the current time against the configured support hours:
+   - **Inside support hours**: The escalation policy executes immediately and responders are notified.
+   - **Outside support hours**: The Page is created and the escalation policy is postponed. Datadog records a timeline entry on the Page indicating the postponement. When support hours resume, the escalation policy begins executing.
+
+### Support hours compared to time restrictions
+
+Routing rules support two types of time-based controls. They serve different purposes:
+
+| Feature | What it controls | Behavior outside the time window |
+|---------|-----------------|----------------------------------|
+| **Time restrictions** | When the routing rule **evaluates** | The rule is skipped and the next rule is tried. No Page is created by this rule. |
+| **Support hours** | When the escalation policy **notifies responders** | The Page is created immediately, but notifications are postponed until the next active window. |
+
+For example, if your team handles priority 2 alerts and wants to track all alerts but only page responders during business hours, use **support hours**. If your team should not handle certain alerts at all outside of business hours (and another rule or team should handle them instead), use **time restrictions**.
+
+
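
As an illustration only, the postponement behavior described under "How support hours work" might be sketched in Python as follows. The names `SupportHours` and `escalation_start`, and the Monday-to-Friday 09:00 to 17:00 window, are assumptions made for this sketch and are not part of any Datadog API. The sketch also assumes naive datetimes that all share one timezone, and a window that does not cross midnight.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass(frozen=True)
class SupportHours:
    """Hypothetical weekly window: the same start/end time on each listed weekday."""
    weekdays: frozenset  # 0 = Monday ... 6 = Sunday
    start: time
    end: time

    def contains(self, moment: datetime) -> bool:
        # Inside support hours: an active weekday and start <= time < end.
        return (moment.weekday() in self.weekdays
                and self.start <= moment.time() < self.end)

    def next_start(self, moment: datetime) -> datetime:
        """Return the start of the first window that opens after `moment`."""
        for offset in range(8):  # today plus the following seven days
            day = moment.date() + timedelta(days=offset)
            candidate = datetime.combine(day, self.start)
            if day.weekday() in self.weekdays and candidate > moment:
                return candidate
        raise ValueError("support hours contain no weekdays")


def escalation_start(page_created: datetime, hours: SupportHours) -> datetime:
    """The Page exists from `page_created`; only the escalation is postponed."""
    if hours.contains(page_created):
        return page_created  # inside support hours: escalate immediately
    return hours.next_start(page_created)  # outside: wait for the next window


# Assumed example window: Monday to Friday, 09:00 to 17:00.
business_hours = SupportHours(frozenset(range(5)), time(9, 0), time(17, 0))
saturday_2am = datetime(2024, 1, 6, 2, 0)  # 6 January 2024 is a Saturday
print(escalation_start(saturday_2am, business_hours))  # 2024-01-08 09:00:00
```

With this assumed window, a Page created at 2:00 AM on Saturday starts escalating at 9:00 AM on the following Monday, which mirrors the example given for the new routing rule bullet. A time restriction would instead be modeled as a check that skips the rule entirely, so no escalation time would be computed for it.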