Commit 387ac4c

Adding notes about watcher to the Distributed Team Architecture Guide (elastic#141685)
1 parent 1ff721e commit 387ac4c

1 file changed: +81 -0 lines changed

docs/internal/DistributedArchitectureGuide.md

Lines changed: 81 additions & 0 deletions
@@ -1321,3 +1321,84 @@ Relevant classes:
[StableMasterHealthIndicatorService]: https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/cluster/coordination/StableMasterHealthIndicatorService.java
[HealthMetadata]: https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/health/metadata/HealthMetadata.java
[HealthMetadataService]: https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/health/metadata/HealthMetadataService.java

# Watcher

[Watcher]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/Watcher.java
[WatcherLifeCycleService]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/WatcherLifeCycleService.java
[ExecutionService]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/execution/ExecutionService.java
[WatcherService]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/WatcherService.java
[TickerScheduleTriggerEngine]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/trigger/schedule/engine/TickerScheduleTriggerEngine.java
[EmailAction]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/actions/email/EmailAction.java
[EmailService]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/notification/email/EmailService.java
[WebhookAction]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/actions/webhook/WebhookAction.java
[WebhookService]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/notification/WebhookService.java
[Actions Package]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/actions
[ReportingAttachment]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/notification/email/attachment/ReportingAttachment.java

Watcher lets you run a query on a schedule and execute an action if a condition is met.
As an example, the following watch performs a search every 10 minutes. If the number of hits is greater than 0, it logs an error message.
```console
PUT _watcher/watch/log_error_watch
{
  "trigger" : { "schedule" : { "interval" : "10m" }},
  "input" : {
    "search" : {
      "request" : {
        "indices" : [ "logs" ],
        "body" : {
          "query" : {
            "match" : { "message": "error" }
          }
        }
      }
    }
  },
  "condition" : {
    "compare" : { "ctx.payload.hits.total" : { "gt" : 0 }}
  },
  "actions" : {
    "log_error" : {
      "logging" : {
        "text" : "Found {{ctx.payload.hits.total}} errors in the logs"
      }
    }
  }
}
```
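To make the input, condition, action flow of this watch concrete, here is a minimal Python sketch that evaluates the same `compare` condition and logging text against a hand-written payload. It is an illustration only: the function names (`resolve`, `run_watch`, and so on) are hypothetical, only the `gt` operator is handled, and the template rendering is a simple string substitution rather than a real Mustache engine.

```python
# Illustrative only: mimics the watch above on a hand-written payload.
# None of these functions exist in Watcher; all names are hypothetical.

def resolve(path, ctx):
    """Follow a dotted path such as 'ctx.payload.hits.total' through nested dicts."""
    value = {"ctx": ctx}
    for key in path.split("."):
        value = value[key]
    return value

def compare_condition_met(condition, ctx):
    """Evaluate a {'<path>': {'gt': n}} compare condition (only 'gt' is sketched)."""
    path, check = next(iter(condition.items()))
    return resolve(path, ctx) > check["gt"]

def render(template, ctx):
    """Stand-in for Mustache: substitutes the single placeholder used above."""
    placeholder = "{{ctx.payload.hits.total}}"
    return template.replace(placeholder, str(resolve("ctx.payload.hits.total", ctx)))

def run_watch(watch, payload):
    """Return the text that would be logged, or None if the condition is not met."""
    ctx = {"payload": payload}
    if not compare_condition_met(watch["condition"]["compare"], ctx):
        return None
    return render(watch["actions"]["log_error"]["logging"]["text"], ctx)

watch = {
    "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
    "actions": {
        "log_error": {
            "logging": {"text": "Found {{ctx.payload.hits.total}} errors in the logs"}
        }
    },
}
```

With a payload of `{"hits": {"total": 3}}` the sketch returns the rendered log text; with a total of 0 the condition is not met and no action runs.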

## How Watcher Works

- We have an API to define a “watch”, which includes the schedule, the query, the condition, and the action
- Watch definitions are kept in the `.watches` index
- Information about currently running watches is in the `.triggered_watches` index
- History is written to the `.watcher_history` index
- Watcher ([WatcherLifeCycleService]) runs on all nodes, but a watch is only executed on a node that has a copy of the `.watches` shard that the particular watch is in (see [WatcherService])
    - A hash is used to choose the node if there is more than one shard
- Example common use cases:
    - Periodically send data to a third-party system
    - Email users with alerts if certain conditions appear in log files
    - Periodically generate a report using Kibana, and email that report as an attachment. This is supported by adding a [ReportingAttachment] to the [EmailAction] in the watch definition.

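The hash-based choice of node can be pictured with the following Python sketch. It rests on assumptions that are not stated in this guide: that the hash is taken over the watch ID, and that the result is reduced modulo the number of candidate shard copies. `zlib.crc32` is used only because it is deterministic and in the standard library; it is not the hash function Watcher uses, and the function names are hypothetical.

```python
import zlib

def owning_copy(watch_id, copy_count):
    """Map a watch ID to one of `copy_count` candidates (assumed scheme: hash modulo count)."""
    return zlib.crc32(watch_id.encode("utf-8")) % copy_count

def should_run_here(watch_id, my_copy_index, copy_count):
    """A node runs the watch only if it holds the candidate the hash selects."""
    return owning_copy(watch_id, copy_count) == my_copy_index

# Every watch is claimed by exactly one of the three hypothetical candidates.
for watch_id in ["log_error_watch", "disk_usage_watch", "weekly_report_watch"]:
    claims = [should_run_here(watch_id, index, 3) for index in range(3)]
    assert claims.count(True) == 1
```

The property the sketch demonstrates is that every node can make the decision independently and still agree on a single owner per watch, without any coordination beyond knowing how many candidates exist.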
## Relevant classes:

- [Watcher] – the plugin class
- [WatcherLifeCycleService] – created by the Watcher plugin on each node
- [WatcherService] – decides which watches this node ought to run
- [ExecutionService] – executes watches
- [TickerScheduleTriggerEngine] – handles the periodic (non-cron) schedules that we see the most
- [EmailAction] / [EmailService] – sends emails through a third-party email server
- [WebhookAction] / [WebhookService] – sends requests to external endpoints
- [Various other actions][Actions Package] (for example, posting to Slack or Jira)

## Debugging

- The most useful debugging information is in the Elasticsearch logs and the `.watcher_history` index
- It is often useful to get the contents of the `.watches` index
- Frequent sources of problems:
    - There is no guarantee that an interval schedule watch will run at exactly the requested interval after the last run
        - In older versions (before 8.17), the counter for the interval schedule restarts if the shard moves. For example, if the interval is once every 12 hours and the shard moves 10 hours into that interval, the watch will not run for at least another 12 hours, so at least 22 hours pass between runs.
    - Calls to remote systems ([EmailAction] and [WebhookAction]) are a frequent source of failures. Watcher sends the request but doesn't know what happens after that. If `.watcher_history` shows that the call was successful, the best way to continue the investigation is in the logs of the remote system.
        - Even if Watcher fails during a call to a remote system, the cause is likely to be outside of Watcher (e.g. network problems). Check the error message in `.watcher_history`.
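The interval-restart example above can be checked with a few lines of arithmetic. The Python sketch below uses hours as plain numbers. The first function models the older behaviour as described in this section; the second is a hypothetical contrast that keeps the last run time across the move, and is not a statement about how any particular version is implemented.

```python
def next_run_counter_restarts(last_run_h, shard_move_h, interval_h):
    """Older behaviour described above: the interval counter restarts when the shard moves."""
    return shard_move_h + interval_h

def next_run_last_run_kept(last_run_h, shard_move_h, interval_h):
    """Hypothetical contrast: schedule from the last run, but never before the move."""
    return max(last_run_h + interval_h, shard_move_h)

# Last run at hour 0, shard moves at hour 10, interval of 12 hours.
restarted = next_run_counter_restarts(0, 10, 12)  # hour 22
kept = next_run_last_run_kept(0, 10, 12)          # hour 12
gap_added_by_restart = restarted - kept           # 10 extra hours
```

When a watch appears to have skipped a run, comparing the gap between consecutive executions with the time of the last shard relocation is a quick way to tell whether this effect is the cause.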
