3 changes: 2 additions & 1 deletion content/en/incident_response/on-call/_index.md
@@ -25,7 +25,7 @@ Datadog On-Call integrates monitoring, paging, and incident response into one pl

- **Pages** represent something to get alerted for, such as a monitor, incident, or security signal. A Page can have a status of `Triggered`, `Acknowledged`, or `Resolved`.
- **Teams** are groups configured within Datadog to handle specific types of Pages, based on expertise and operational roles.
- **Routing rules** allow Teams to finely adjust their reactions to specific types of incoming events. These rules can set a Page's urgency level, route Pages to different escalation policies depending on the event's metadata, and configure [support hours][7] that hold escalation notifications until a defined time window.
- **Escalation policies** determine how Pages are escalated within or across Teams.
- **Schedules** set timetables for when specific Team members are on-call to respond to Pages.

@@ -106,3 +106,4 @@ On-Call is a seat-based SKU. To learn more about how On-Call is billed and how t
[4]: /account_management/rbac/#role-based-access-control
[5]: https://www.datadoghq.com/pricing/?product=incident-response#products
[6]: /account_management/billing/incident_response/
[7]: /incident_response/on-call/routing_rules/#support-hours
76 changes: 75 additions & 1 deletion content/en/incident_response/on-call/routing_rules.md
@@ -23,6 +23,10 @@ With routing rules, you can define granular logic to control how alerts reach yo
  - During business hours, route alerts to an escalation policy.
  - After hours, route critical alerts to paging, and non-critical alerts to chat.

- Delay escalation outside of support hours:
  - Define [support hours](#support-hours) on an escalation policy action to postpone notifications until the next active window.
  - For example, with support hours of Monday through Friday, 9:00 AM to 5:00 PM, a Page that arrives at 2:00 AM on Saturday is created immediately, but responders are not notified until 9:00 AM on Monday.

- Use Dynamic Urgency to automatically detect urgency from the monitor alert:
  - `warn` status ➝ low urgency
  - `alert` status ➝ high urgency
@@ -55,11 +59,81 @@ Routing rules use [Datadog query syntax][3] and support multiple `if/else` condi
| `priority` | Monitor priority (1–5) | `priority:(1 OR 2)` |
| `alert_status` | Monitor status (`error`, `warn`, `success`) | `alert_status:(error OR warn)` |

## Support hours

Support hours let you define time windows during which an escalation policy actively notifies responders. When a Page arrives outside of support hours, Datadog creates the Page immediately but **postpones** the escalation policy until the next active support hours window. After the postponement period ends, the escalation policy begins executing normally.

### How support hours work

1. An alert triggers a Page to an On-Call team.
1. Routing rules are evaluated from top to bottom to find a matching rule.
1. The matching rule's escalation policy action checks the current time against the configured support hours:
   - **Inside support hours**: The escalation policy executes immediately and responders are notified.
   - **Outside support hours**: The Page is created and the escalation policy is postponed. Datadog records a timeline entry on the Page indicating the postponement. When support hours resume, the escalation policy begins executing.
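The steps above can be sketched in Python as a rough model. This is not Datadog's implementation. `RoutingRule`, `Page`, `route`, and `weekday_business_hours` are hypothetical names: a plain predicate stands in for the rule's query, and a callable stands in for the support hours check. This section does not cover what happens when no rule matches, so the sketch simply returns `"unmatched"` in that case.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class RoutingRule:
    """Hypothetical stand-in for a routing rule; not a Datadog API object."""
    matches: Callable[[dict], bool]  # plays the role of the rule's query
    escalation_policy: str           # escalation policy the action runs
    # Returns True when `now` is inside support hours; None means "no support hours".
    support_hours_active: Optional[Callable[[datetime], bool]] = None


@dataclass
class Page:
    attributes: dict
    timeline: list = field(default_factory=list)


def route(page: Page, rules: list[RoutingRule], now: datetime) -> str:
    """Return 'notified', 'postponed', or 'unmatched' for a newly created Page."""
    # Step 2: evaluate routing rules from top to bottom; the first match wins.
    for rule in rules:
        if not rule.matches(page.attributes):
            continue
        # Step 3: compare the current time with the action's support hours.
        if rule.support_hours_active is None or rule.support_hours_active(now):
            return "notified"  # the escalation policy runs immediately
        # Outside support hours: the Page exists, but escalation waits.
        page.timeline.append(
            f"{now.isoformat()}: {rule.escalation_policy} postponed until support hours"
        )
        return "postponed"
    return "unmatched"  # hypothetical: fallback behavior is not covered here


def weekday_business_hours(now: datetime) -> bool:
    # Hypothetical check: Monday-Friday, 9:00 AM to 5:00 PM in the timestamp's zone.
    return now.weekday() < 5 and 9 <= now.hour < 17


rules = [
    RoutingRule(lambda attrs: attrs.get("priority") == 1, "critical-ep"),
    RoutingRule(lambda attrs: True, "default-ep", weekday_business_hours),
]
saturday_2am = datetime(2024, 6, 1, 2, 0, tzinfo=timezone.utc)

critical = Page({"priority": 1})
routine = Page({"priority": 3})
print(route(critical, rules, saturday_2am))  # notified
print(route(routine, rules, saturday_2am))   # postponed
print(routine.timeline)
```

The critical rule has no support hours, so it notifies at any time. The catch-all rule still creates a Page on Saturday and records why escalation is waiting.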

### Support hours compared to time restrictions

Routing rules support two types of time-based controls, which serve different purposes:

| Feature | What it controls | Behavior outside the time window |
|---------|-----------------|----------------------------------|
| **Time restrictions** | When the routing rule **evaluates** | The rule is skipped and the next rule is tried. No Page is created by this rule. |
| **Support hours** | When the escalation policy **notifies responders** | The Page is created immediately, but notifications are postponed until the next active window. |

For example, suppose your team owns priority 2 alerts and wants every one of them recorded as a Page, but wants responders paged only during business hours. In that case, use **support hours**. If your team should not handle certain alerts at all outside of business hours, because another rule or team handles them instead, use **time restrictions**.

<div class="alert alert-warning">You cannot configure both time restrictions and support hours on the same routing rule. Use one or the other.</div>
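The following Python sketch contrasts the two controls. `Rule`, `evaluate`, `always`, and the `business_hours` window are hypothetical. Each control is modeled as a callable that reports whether the current time falls inside its window. Constructing a rule with both controls raises an error, mirroring the warning above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional

Window = Callable[[datetime], bool]  # returns True when `now` is inside the window


@dataclass
class Rule:
    """Hypothetical routing rule carrying at most one time-based control."""
    name: str
    matches: Callable[[dict], bool]
    time_restriction: Optional[Window] = None
    support_hours: Optional[Window] = None

    def __post_init__(self) -> None:
        # A routing rule cannot use both controls at once.
        if self.time_restriction is not None and self.support_hours is not None:
            raise ValueError(f"{self.name}: use time restrictions or support hours, not both")


def evaluate(rules: list[Rule], attrs: dict, now: datetime) -> Optional[tuple[str, str]]:
    """Return (rule name, 'notify' or 'postpone') for the first rule that takes the alert."""
    for rule in rules:
        if not rule.matches(attrs):
            continue
        # Time restriction: outside the window the rule is skipped and the next rule is tried.
        if rule.time_restriction is not None and not rule.time_restriction(now):
            continue
        # Support hours: the rule still creates the Page; only notifications wait.
        if rule.support_hours is not None and not rule.support_hours(now):
            return rule.name, "postpone"
        return rule.name, "notify"
    return None  # no rule took the alert


def business_hours(now: datetime) -> bool:
    return now.weekday() < 5 and 9 <= now.hour < 17


def always(attrs: dict) -> bool:
    return True


saturday_2am = datetime(2024, 6, 1, 2, 0)

# Team A handles these alerts only during business hours; a fallback rule covers the rest.
restricted = [Rule("team-a", always, time_restriction=business_hours), Rule("fallback", always)]
# Team A owns these alerts at all times but pages responders only during business hours.
supported = [Rule("team-a", always, support_hours=business_hours)]

print(evaluate(restricted, {}, saturday_2am))  # ('fallback', 'notify')
print(evaluate(supported, {}, saturday_2am))   # ('team-a', 'postpone')
```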

### Configure support hours

To add support hours to a routing rule's escalation policy action, configure a time zone and one or more time windows (restrictions).

Each support hours configuration includes:
- **Time zone**: An IANA time zone (for example, `America/New_York`, `Europe/Paris`, or `Asia/Tokyo`).
- **Restrictions**: One or more time windows that define when the escalation policy is active. Each restriction specifies:
  - A **start day** and **start time**
  - An **end day** and **end time**

Times use the `HH:MM:SS` format (for example, `09:00:00` for 9:00 AM).

If multiple restriction windows are defined, the escalation policy is active if the current time matches **any** of the windows.
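A minimal Python sketch of this configuration model follows. The `Restriction` and `SupportHours` classes are hypothetical and not a Datadog API. They assume each restriction repeats daily from its start time to its end time on every day from the start day through the end day, which is how the business hours and split shift examples read. The end time is treated as exclusive, and overnight windows are not handled. A timestamp is converted to the configured IANA time zone before the windows are checked, and support hours count as active if any window matches.

```python
from dataclasses import dataclass
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


@dataclass(frozen=True)
class Restriction:
    """Hypothetical restriction window; field names follow the prose above."""
    start_day: str
    start_time: str  # "HH:MM:SS"
    end_day: str
    end_time: str    # "HH:MM:SS"

    def contains(self, local: datetime) -> bool:
        # Assumption: the window opens at start_time and closes at end_time (exclusive)
        # on every day from start_day through end_day; overnight windows are not handled.
        if not DAYS.index(self.start_day) <= local.weekday() <= DAYS.index(self.end_day):
            return False
        start = time.fromisoformat(self.start_time)
        end = time.fromisoformat(self.end_time)
        return start <= local.time() < end


@dataclass(frozen=True)
class SupportHours:
    time_zone: str  # IANA time zone name, for example "America/New_York"
    restrictions: tuple[Restriction, ...]

    def is_active(self, now: datetime) -> bool:
        # Convert an aware timestamp to the configured zone, then check whether
        # ANY restriction window contains it.
        local = now.astimezone(ZoneInfo(self.time_zone))
        return any(r.contains(local) for r in self.restrictions)


business = SupportHours(
    "America/New_York", (Restriction("monday", "09:00:00", "friday", "17:00:00"),)
)
split = SupportHours("America/New_York", (
    Restriction("monday", "09:00:00", "friday", "12:00:00"),
    Restriction("monday", "14:00:00", "friday", "18:00:00"),
))

# 14:00 UTC on Tuesday, June 4, 2024 is 10:00 AM in New York.
print(business.is_active(datetime(2024, 6, 4, 14, 0, tzinfo=timezone.utc)))  # True
# 16:30 UTC on Wednesday, June 5, 2024 is 12:30 PM in New York, between the two windows.
print(split.is_active(datetime(2024, 6, 5, 16, 30, tzinfo=timezone.utc)))    # False
```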

#### Example: Business hours only (Monday through Friday, 9 AM to 5 PM)

Set a single restriction window:
- **Start day**: Monday, **Start time**: 09:00:00
- **End day**: Friday, **End time**: 17:00:00
- **Time zone**: `America/New_York`

Pages that arrive outside this window (for example, at 2:00 AM on Saturday) are postponed until 9:00 AM on the following Monday.

#### Example: Split shift (mornings and afternoons)

Define two restriction windows to cover non-contiguous hours:

**Window 1:**
- **Start day**: Monday, **Start time**: 09:00:00
- **End day**: Friday, **End time**: 12:00:00

**Window 2:**
- **Start day**: Monday, **Start time**: 14:00:00
- **End day**: Friday, **End time**: 18:00:00

Pages that arrive between 12:00 PM and 2:00 PM are postponed until the afternoon window opens.
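Both examples come down to finding the moment the escalation policy would start. This Python sketch returns either the current time, if a window is open, or the next window opening. It makes the same assumptions about how a window repeats: daily between the start and end days, with the end time exclusive. The `notify_at` function and the tuple window format are hypothetical, and daylight saving time edge cases are not handled.

```python
from datetime import datetime, time, timedelta
from typing import Optional
from zoneinfo import ZoneInfo

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

# Hypothetical window format: (start_day, start_time, end_day, end_time), read as a
# daily window from start_time to end_time on each day from start_day through end_day.
Window = tuple[str, str, str, str]


def notify_at(now: datetime, windows: list[Window], tz: str) -> Optional[datetime]:
    """Return when escalation would begin: `now` if inside a window, else the next opening."""
    local = now.astimezone(ZoneInfo(tz))
    openings = []
    for offset in range(8):  # today plus the next 7 days covers any weekly window
        day = (local + timedelta(days=offset)).date()
        for start_day, start_s, end_day, end_s in windows:
            if not DAYS.index(start_day) <= day.weekday() <= DAYS.index(end_day):
                continue
            opens = datetime.combine(day, time.fromisoformat(start_s), local.tzinfo)
            closes = datetime.combine(day, time.fromisoformat(end_s), local.tzinfo)
            if opens <= local < closes:
                return now  # inside support hours: notify immediately
            if opens > local:
                openings.append(opens)
    return min(openings, default=None)  # None if no window is ever active


ny = ZoneInfo("America/New_York")
business = [("monday", "09:00:00", "friday", "17:00:00")]
split = [("monday", "09:00:00", "friday", "12:00:00"),
         ("monday", "14:00:00", "friday", "18:00:00")]

# 2:00 AM Saturday, June 1, 2024 is postponed to 9:00 AM Monday, June 3.
print(notify_at(datetime(2024, 6, 1, 2, 0, tzinfo=ny), business, "America/New_York"))
# 12:30 PM Wednesday, June 5, 2024 is postponed to 2:00 PM the same day.
print(notify_at(datetime(2024, 6, 5, 12, 30, tzinfo=ny), split, "America/New_York"))
```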

## Best practices

- Balance visibility with urgency:
  - Use paging and escalation policies for critical alerts that require immediate action.
  - Use Slack or Teams for lower-severity issues that need awareness but don't warrant an on-call response.

- Use support hours to protect responders from off-hours notifications:
  - For non-critical alerts, configure support hours to match your team's working hours. Pages are created immediately, but responders are notified only during active windows.
  - For critical alerts that require immediate attention regardless of time, do **not** set support hours on the escalation policy.

- Choose between time restrictions and support hours based on your routing needs:
  - Use **time restrictions** when a different routing rule or team should handle the alert outside of business hours.
  - Use **support hours** when your team should own the alert at all times but only page responders during defined hours.

## Further reading
