"text" : "Found {{ctx.payload.hits.total}} errors in the logs"
}
}
}
}
```

## How Watcher Works

- We have an API to define a “watch”, which includes the schedule, the query, the condition, and the action
- Watch definitions are kept in the `.watches` index
- Information about currently running watches is in the `.triggered_watches` index
- History is written to the `.watcher_history` index
- Watcher ([WatcherLifeCycleService]) runs on all nodes, but only executes watches on a node that has a copy of the `.watches` shard that the particular watch is in (see [WatcherService])
  - If there is more than one shard, a hash is used to choose the node
- Example common use cases:
  - Periodically send data to a third-party system
  - Email users with alerts if certain conditions appear in log files
  - Periodically generate a report using Kibana, and email that report as an attachment. This is supported by adding a [ReportingAttachment][ReportingAttachment] to the [EmailAction][EmailAction] in the watch definition.
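
The hash-based node choice described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: it assumes that "more than one shard" refers to several nodes holding a copy of the same `.watches` shard, that each copy knows its own position among those copies, and it uses `String.hashCode()` as a stand-in for whatever hash function Watcher really uses. The class and method names are hypothetical.

```java
public class WatchDistributionSketch {

    /**
     * Decides whether the node holding shard copy number localCopyIndex
     * (0-based) should execute the given watch, when copyCount nodes hold a
     * copy of the shard that contains it.
     *
     * Assumption: String.hashCode() stands in for the real hash function.
     */
    static boolean shouldRunLocally(String watchId, int localCopyIndex, int copyCount) {
        if (copyCount <= 0 || localCopyIndex < 0 || localCopyIndex >= copyCount) {
            throw new IllegalArgumentException("invalid copy index or copy count");
        }
        // floorMod keeps the result in [0, copyCount) even for negative hashes.
        int owner = Math.floorMod(watchId.hashCode(), copyCount);
        return owner == localCopyIndex;
    }

    public static void main(String[] args) {
        String[] watchIds = {"log-error-watch", "nightly-report", "export-to-partner"};
        int copyCount = 3; // hypothetical: three nodes hold a copy of the shard
        for (String watchId : watchIds) {
            for (int copy = 0; copy < copyCount; copy++) {
                if (shouldRunLocally(watchId, copy, copyCount)) {
                    System.out.println(watchId + " runs on the node holding copy " + copy);
                }
            }
        }
    }
}
```

Because every node evaluates the same deterministic function, each watch is claimed by exactly one copy without the nodes having to coordinate with each other.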

## Relevant classes:

- [Watcher][Watcher] – the plugin class
- [WatcherLifeCycleService][WatcherLifeCycleService] – created by the Watcher plugin on each node
- [WatcherService][WatcherService] – decides which watches this node ought to run
- [TickerScheduleTriggerEngine][TickerScheduleTriggerEngine] – handles the periodic (non-cron) schedules, which are the ones we see most often
- [EmailAction][EmailAction] / [EmailService][EmailService] – sends emails through a third-party email server
- [WebhookAction][WebhookAction] / [WebhookService][WebhookService] – sends requests to external endpoints
- [Various other actions][Actions Package] (for example posting to Slack, Jira, etc.)

## Debugging

- The most useful debugging information is in the Elasticsearch logs and the `.watcher_history` index
- It is often useful to get the contents of the `.watches` index
- Frequent sources of problems:
  - There is no guarantee that an interval schedule watch will run at exactly the requested interval after the last run
  - In older versions (before 8.17), the counter for the interval schedule restarts if the shard moves. For example, if the interval is once every 12 hours and the shard moves 10 hours into that interval, the watch will not run for at least another 12 hours, that is, at least 22 hours after its previous run.
  - Calls to remote systems ([EmailAction] and [WebhookAction]) are a frequent source of failures. Watcher sends the request but doesn't know what happens after that. If `.watcher_history` shows that the call was successful, the best way to continue the investigation is in the logs of the remote system.
  - Even if Watcher fails during a call to a remote system, the cause is likely to be outside of Watcher (e.g. network problems). Check the error message in `.watcher_history`.
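
The arithmetic behind the pre-8.17 interval restart can be worked through with a small sketch. It only models the behaviour as described above (the counter starts over when the shard moves); the class and method names are hypothetical and nothing here calls Watcher itself.

```java
import java.time.Duration;

public class IntervalRestartSketch {

    /**
     * Earliest possible gap between two runs of an interval watch when the
     * interval counter restarts at the moment the shard moves.
     *
     * @param interval          the configured schedule interval
     * @param elapsedBeforeMove time since the last run when the shard moved
     */
    static Duration earliestGapAfterRestart(Duration interval, Duration elapsedBeforeMove) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        if (elapsedBeforeMove.isNegative() || elapsedBeforeMove.compareTo(interval) > 0) {
            throw new IllegalArgumentException("shard move must fall within the interval");
        }
        // The time already waited is lost, and a full interval starts over.
        return elapsedBeforeMove.plus(interval);
    }

    public static void main(String[] args) {
        Duration gap = earliestGapAfterRestart(Duration.ofHours(12), Duration.ofHours(10));
        System.out.println("Earliest gap between runs: " + gap.toHours() + " hours");
    }
}
```

With a 12-hour interval and a shard move 10 hours in, the gap is at least 22 hours, which is why a watch on an older cluster can appear to have skipped a run after shards were relocated.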