Watcher lets you set a schedule to run a query, and if a condition is met it executes an action.
As an example, the following performs a search every 10 minutes. If the number of hits found is greater than 0 then it logs an error message.
@@ -1369,14 +1371,15 @@ PUT _watcher/watch/log_error_watch
## How Watcher Works
- We have an API to define a “watch”, which includes the schedule, the query, the condition, and the action
-
- Watch definitions are kept in the .watches index
-
- Information about currently running watches is in the .triggered_watches index
-
- History is written to the .watcher_history index
-
- Watcher ([WatcherLifeCycleService]) runs on all nodes, but only executes watches on a node that has a copy of the shard that the particular watch is in (see [WatcherService])
+
- Watch definitions are kept in the `.watches` index
+
- Information about currently running watches is in the `.triggered_watches` index
+
- History is written to the `.watcher_history` index
+
- Watcher ([WatcherLifeCycleService]) runs on all nodes, but only executes watches on a node that has a copy of the `.watches` shard that the particular watch is in (see [WatcherService])
- Uses a hash to choose the node if there is more than one shard
- Example common use cases:
-
- Periodically send data to a 3rd party system to generate a report
+
- Periodically send data to a 3rd party system
- Email users with alerts if certain conditions appear in log files
+
- Periodically generate a report using Kibana, and email that report as an attachment. This is supported by declaring a [ReportingAttachment][ReportingAttachment] to the [EmailAction][EmailAction] in the watch definition.
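The four parts of a watch named above (schedule, query, condition, action) can be sketched as a request body. This is a minimal illustration rather than the guide's own example: the index name `logs`, the field `message`, and the action name `log_error` are hypothetical placeholders, and the body is only built and printed here. It would typically be sent with a request such as `PUT _watcher/watch/log_error_watch`.

```python
import json


def build_log_error_watch(interval="10m", index="logs", term="error"):
    """Build a watch body with a schedule, a query, a condition, and an action.

    The index, field, and action names are illustrative assumptions.
    """
    return {
        # Schedule: how often the watch is triggered.
        "trigger": {"schedule": {"interval": interval}},
        # Query: the search that runs on each trigger.
        "input": {
            "search": {
                "request": {
                    "indices": [index],
                    "body": {"query": {"match": {"message": term}}},
                }
            }
        },
        # Condition: only act when the search returned at least one hit.
        "condition": {"compare": {"ctx.payload.hits.total": {"gt": 0}}},
        # Action: write a message to the Elasticsearch log.
        "actions": {
            "log_error": {
                "logging": {
                    "text": "Found {{ctx.payload.hits.total}} errors in the logs"
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_log_error_watch(), indent=2))
```

Keeping the body as a plain dictionary makes it easy to inspect each of the four parts separately before sending it to a cluster.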
## Relevant classes:
@@ -1387,13 +1390,15 @@ PUT _watcher/watch/log_error_watch
-[TickerScheduleTriggerEngine][TickerScheduleTriggerEngine] – handles the periodic (non-cron) schedules that we see the most
-[EmailAction][EmailAction] / [EmailService][EmailService] – emails to third-party email server
-[WebhookAction][WebhookAction] / [WebhookService][WebhookService] – sends requests to external endpoints
+
-[Various other actions][Actions Package] (for example posting to Slack, Jira, etc.)
## Debugging
-
- The most useful debugging information is in the Elasticsearch logs and the .watcher_history index
-
- It is often useful to get the contents of the .watches index
+
- The most useful debugging information is in the Elasticsearch logs and the `.watcher_history` index
+
- It is often useful to get the contents of the `.watches` index
- Frequent sources of problems:
- There is no guarantee that an interval schedule watch will run at exactly the requested interval after the last run
-
- The counter for the interval schedule restarts if the shard moves. For example, if the interval is once every 12 hours, and the shard moves 10 hours into that interval, it will be at least 12 more hours until it runs.
-
- Calls to remote systems ([EmailAction] and [WebhookAction]) are a frequent source of failures. Watcher sends the request but doesn't know what happens after that. If you see that the call was succesful in .watcher_history, the best way to continue the investigation is in the logs of the remote system.
+
- In older versions (before 8.17), the counter for the interval schedule restarts if the shard moves. For example, if the interval is once every 12 hours, and the shard moves 10 hours into that interval, it will be at least 12 more hours until it runs.
+
- Calls to remote systems ([EmailAction] and [WebhookAction]) are a frequent source of failures. Watcher sends the request but doesn't know what happens after that. If you see that the call was successful in `.watcher_history`, the best way to continue the investigation is in the logs of the remote system.
+
- Even if Watcher fails during a call to a remote system, the error is likely to be outside of Watcher (e.g. network problems). Check the error message in `.watcher_history`.
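The 12-hour example in the list above can be worked through as a small calculation. The sketch below models only the pre-8.17 behavior as described there (the interval counter restarts when the shard moves). The function name and parameters are hypothetical and are not part of Watcher, and the result is the earliest possible next run, not a guaranteed time.

```python
from datetime import datetime, timedelta


def earliest_next_run(last_run, interval, shard_moved_at=None):
    """Earliest next run of an interval watch, assuming the pre-8.17 behavior.

    If the shard moves before the next scheduled run, the interval counter
    is assumed to restart from the time of the move.
    """
    scheduled = last_run + interval
    if shard_moved_at is not None and last_run <= shard_moved_at < scheduled:
        # Counter restarts: a full interval must elapse after the move.
        return shard_moved_at + interval
    return scheduled


if __name__ == "__main__":
    last_run = datetime(2024, 1, 1, 0, 0)
    interval = timedelta(hours=12)
    moved_at = last_run + timedelta(hours=10)

    next_run = earliest_next_run(last_run, interval, moved_at)
    # Gap between runs is 22 hours instead of the requested 12.
    print(next_run - last_run)  # prints "22:00:00"
```

When investigating a watch that ran "late", comparing the timestamps of consecutive runs against this kind of calculation can help tell a shard move apart from a genuine failure.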