Closed
2 changes: 1 addition & 1 deletion docs/guides/migration-recipe.md
@@ -155,7 +155,7 @@ index 68387c9..7a8ace1 100644
sudo service nginx stop
```
```{note}
Don't forget to pause service checks for both the old and new hosts in things like Dead Man's Snitch, Pingdom, etc.
Don't forget to pause service checks for both the old and new hosts in things like Dead Man's Snitch, Datadog, etc.
```
4. Ensure that any additional volumes are mounted and in the correct location:
- Check what disks are currently mounted and where: `df`
7 changes: 2 additions & 5 deletions docs/overview.rst
@@ -57,11 +57,8 @@ Amazon Route 53
It is currently manually managed by Infrastructure Staff.

DataDog
`DataDog <https://www.datadoghq.com>`_ provides metrics, dashboards, and alerts.

Pingdom
`Pingdom <https://www.pingdom.com>`_ provides monitoring and complains to us
when services are down.
`DataDog <https://www.datadoghq.com>`_ provides metrics, dashboards, alerts,
and some monitoring, and complains to us when services are down.

PagerDuty
`PagerDuty <https://www.pagerduty.com>`_ is used for on-call rotation for PSF
2 changes: 2 additions & 0 deletions pillar/dev/secrets/datadog.sls
@@ -0,0 +1,2 @@
datadog_api_key: deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
datadog_app_key: deadbeefdeadbeefdeadbeefdeadbeefdeadbeef
76 changes: 76 additions & 0 deletions salt/datadog/synthetics.sls
@@ -0,0 +1,76 @@
{% set minion_id = grains['id'] %}
{% set fqdn = grains['fqdn'] %}
{% set api_key = pillar.get('datadog_api_key', '') %}
{% set app_key = pillar.get('datadog_app_key', '') %}
{# Only call the Datadog API when both keys are set; concatenating a missing (None) key into a header string would fail at render time. #}
{% if api_key and app_key %}
{% set datadog_locations = salt['http.query']('https://api.datadoghq.com/api/v1/synthetics/locations', header_list=['DD-API-KEY: ' + api_key, 'DD-APPLICATION-KEY: ' + app_key], decode=True) %}
{% set existing_monitors = salt['http.query']('https://api.datadoghq.com/api/v1/synthetics/tests', header_list=['DD-API-KEY: ' + api_key, 'DD-APPLICATION-KEY: ' + app_key], decode=True) %}
{% else %}
{% set datadog_locations = {} %}
{% set existing_monitors = {} %}
{% endif %}
{% set monitor_name = minion_id + ' HTTP Health' %}
{% set monitor_exists = existing_monitors.get('dict', {}).get('tests', []) | selectattr('name', 'equalto', monitor_name) | list | length > 0 %}

# Note: this fails to capture multi-host minions, because only the minion's own FQDN is checked
# (bugs has bugs.python, bugs.jython and bugs.roundup; codespeed has speed.python and speed.pypy).
{% set web_roles = ['loadbalancer', 'docs', 'downloads', 'hg', 'moin', 'planet', 'bugs', 'buildbot', 'codespeed', 'pythontest'] %}
{% set matched_roles = [] %}
{% for role in web_roles %}
{% if salt["match.compound"](pillar["roles"][role]["pattern"]) %}
{% set _ = matched_roles.append(role) %}
{% endif %}
{% endfor %}
{% set is_web_minion = matched_roles|length > 0 %}
{% set is_loadbalancer = 'loadbalancer' in matched_roles %}

# Hit the haproxy status page for load balancers, or the root URL for other web minions.
# Web minions also have a _haproxy_status endpoint, but I'm not sure whether that needs to be checked.
{% set health_url = "https://" + fqdn + ("/_haproxy_status" if is_loadbalancer else "/") %}

{% if is_web_minion and api_key and app_key and not monitor_exists %}
create-synthetics-monitor-{{ minion_id }}:
  http.query:
    - name: https://api.datadoghq.com/api/v1/synthetics/tests
    - method: POST
    - header_list:
      - "DD-API-KEY: {{ api_key }}"
      - "DD-APPLICATION-KEY: {{ app_key }}"
      - "Content-Type: application/json"
    - data: |
        {
          "name": "{{ minion_id }} HTTP Health",
          "type": "api",
          "subtype": "http",
          "config": {
            "request": {
              "url": "{{ health_url }}",
              "method": "GET",
              "timeout": 30
            },
            "assertions": [
              {
                "type": "statusCode",
                "operator": "is",
                "target": 200
              },
              {
                "type": "responseTime",
                "operator": "lessThan",
                "target": 2000
              }
            ]
          },
          "locations": {{ datadog_locations.get('dict', {}).get('locations', []) | map(attribute='id') | list | tojson }},
          "options": {
            "tick_every": 60,
            "min_failure_duration": 180,
            "min_location_failed": 5,
            "retry": {
              "count": 1,
              "interval": 300
            }
          },
          "message": "{{ minion_id }} is down in 5 or more locations! @pagerduty-Datadog",
          "tags": [
            "minion_id:{{ minion_id }}",
            "auto_created:salt.synthetics.sls"
          ]
        }
    - status: 200
{% endif %}
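To make the state's decision flow easier to review, here is a minimal Python sketch of the same logic: choose the health URL, render nothing when credentials are missing, the minion has no web role, or a test with the same name already exists, and otherwise build the JSON body that the state POSTs to `/api/v1/synthetics/tests`. It is an illustration only, assuming the decoded Datadog responses are already available as dicts shaped like `{"tests": [...]}` and `{"locations": [...]}`. It makes no HTTP calls, and the function names (`health_url`, `monitor_exists`, `build_synthetics_payload`, `plan_request`) are hypothetical, not part of Salt or the Datadog API.

```python
import json

# Hypothetical helpers that mirror the Jinja logic in salt/datadog/synthetics.sls.
# No network access: callers pass in already-decoded API responses.


def health_url(fqdn, matched_roles):
    """Load balancers are checked at /_haproxy_status, other web minions at /."""
    path = "/_haproxy_status" if "loadbalancer" in matched_roles else "/"
    return "https://" + fqdn + path


def monitor_exists(existing_tests, name):
    """Equivalent of the selectattr('name', 'equalto', name) filter in the state."""
    return any(test.get("name") == name for test in existing_tests.get("tests", []))


def build_synthetics_payload(minion_id, url, locations):
    """Build the same request body the state renders into its `data` block."""
    return {
        "name": f"{minion_id} HTTP Health",
        "type": "api",
        "subtype": "http",
        "config": {
            "request": {"url": url, "method": "GET", "timeout": 30},
            "assertions": [
                {"type": "statusCode", "operator": "is", "target": 200},
                {"type": "responseTime", "operator": "lessThan", "target": 2000},
            ],
        },
        # Every location id returned by the locations endpoint, like map(attribute='id').
        "locations": [loc["id"] for loc in locations.get("locations", [])],
        "options": {
            "tick_every": 60,
            "min_failure_duration": 180,
            "min_location_failed": 5,
            "retry": {"count": 1, "interval": 300},
        },
        "message": f"{minion_id} is down in 5 or more locations! @pagerduty-Datadog",
        "tags": [f"minion_id:{minion_id}", "auto_created:salt.synthetics.sls"],
    }


def plan_request(minion_id, fqdn, matched_roles, existing_tests, locations, api_key, app_key):
    """Return the JSON body to POST, or None when the state would render nothing."""
    name = f"{minion_id} HTTP Health"
    if not (matched_roles and api_key and app_key):
        return None
    if monitor_exists(existing_tests, name):
        return None
    url = health_url(fqdn, matched_roles)
    return json.dumps(build_synthetics_payload(minion_id, url, locations))
```

For example, `plan_request("web2", "web2.example.org", ["docs"], {}, {"locations": [{"id": "loc-a"}]}, "key", "app")` returns a JSON string whose `config.request.url` is `https://web2.example.org/`. Passing an `existing_tests` dict that already contains a test named `web2 HTTP Health` returns `None` instead. Because deduplication is keyed purely on the test name, renaming the test in the payload without updating the lookup would cause a new test to be created on every highstate.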
2 changes: 2 additions & 0 deletions salt/top.sls
@@ -17,6 +17,8 @@ base:
- tls
- rsyslog
- datadog
- datadog.synthetics
- secrets.datadog
Member Author

Is this the right place for this? It felt wrong, but it needed them to be available.

- base.motd
- base.swap
