Commit db2efd3

code review feedback
1 parent 6299dd2 commit db2efd3

1 file changed (+14, -9 lines)

docs/internal/DistributedArchitectureGuide.md

Lines changed: 14 additions & 9 deletions
```diff
@@ -1333,6 +1333,8 @@ Relevant classes:
 [EmailService]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/notification/email/EmailService.java
 [WebhookAction]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/actions/webhook/WebhookAction.java
 [WebhookService]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/notification/WebhookService.java
+[Actions Package]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/actions
+[ReportingAttachment]: https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/watcher/src/main/java/org/elasticsearch/xpack/watcher/notification/email/attachment/ReportingAttachment.java
 
 Watcher lets you set a schedule to run a query, and if a condition is met it executes an action.
 As an example, the following performs a search every 10 minutes. If the number of hits found is greater than 0 then it logs an error message.
```
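The context lines above describe the flow of a watch: a schedule triggers a query, and if a condition on the result holds, an action runs. The Java below is a minimal, hypothetical model of one scheduled execution of the "search every 10 minutes, log if hits > 0" example. `WatchSketch`, its fields and `executeOnce` are invented for illustration and are not Watcher's real classes. The query is stubbed as a function that returns a hit count.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;
import java.util.function.LongSupplier;

// Hypothetical model of one watch. These are NOT Watcher's real classes;
// they only mirror the schedule -> query -> condition -> action flow.
final class WatchSketch {
    final String id;
    final Duration interval;        // the schedule, e.g. every 10 minutes (not driven here)
    final LongSupplier query;       // stands in for the search; returns a hit count
    final LongPredicate condition;  // e.g. hits > 0
    final List<String> loggedErrors = new ArrayList<>(); // stands in for a logging action

    WatchSketch(String id, Duration interval, LongSupplier query, LongPredicate condition) {
        this.id = id;
        this.interval = interval;
        this.query = query;
        this.condition = condition;
    }

    /** One scheduled run: query, check the condition, act only if it holds. */
    boolean executeOnce() {
        long hits = query.getAsLong();
        if (!condition.test(hits)) {
            return false; // condition not met: no action runs
        }
        loggedErrors.add("watch [" + id + "] found " + hits + " matching documents");
        return true;
    }
}
```

For example, `new WatchSketch("log_error_watch", Duration.ofMinutes(10), () -> 3L, hits -> hits > 0).executeOnce()` runs the action, but a query returning `0L` does not. In Watcher itself the interval is driven by a trigger engine such as `TickerScheduleTriggerEngine`. This sketch leaves scheduling out.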
```diff
@@ -1369,14 +1371,15 @@ PUT _watcher/watch/log_error_watch
 ## How Watcher Works
 
 - We have an API to define a “watch”, which includes the schedule, the query, the condition, and the action
-- Watch definitions are kept in the .watches index
-- Information about currently running watches is in the .triggered_watches index
-- History is written to the .watcher_history index
-- Watcher ([WatcherLifeCycleService]) runs on all nodes, but only executes watches on a node that has a copy of the shard that the particular watch is in (see [WatcherService])
+- Watch definitions are kept in the `.watches` index
+- Information about currently running watches is in the `.triggered_watches` index
+- History is written to the `.watcher_history` index
+- Watcher ([WatcherLifeCycleService]) runs on all nodes, but only executes watches on a node that has a copy of the `.watches` shard that the particular watch is in (see [WatcherService])
 - Uses a hash to choose the node if there is more than one shard
 - Example common use cases:
-- Periodically send data to a 3rd party system to generate a report
+- Periodically send data to a 3rd party system
 - Email users with alerts if certain conditions appear in log files
+- Periodically generate a report using Kibana, and email that report as an attachment. This is supported by declaring a [ReportingAttachment] [ReportingAttachment] to the [EmailAction] [EmailAction] in the watch definition.
 
 ## Relevant classes:
 
```
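The hunk above says that Watcher only executes a watch on a node holding a copy of the `.watches` shard the watch lives in. It also says Watcher uses a hash to pick one node when there is "more than one shard", which we read here as several copies (primary and replicas) of that shard. The Java below is a minimal sketch of that idea under stated assumptions, not `WatcherService`'s actual code. Shard copies are identified by hypothetical string ids, every node sorts those ids the same way, and `String.hashCode()` stands in for whatever hash Watcher really uses. `WatchNodeSelector` and `shouldExecuteLocally` are invented names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of hash-based selection among shard copies.
// Not Watcher's implementation; the hash and the copy ids are assumptions.
final class WatchNodeSelector {
    private WatchNodeSelector() {}

    /**
     * Returns true if the shard copy identified by localCopyId should execute the watch.
     * shardCopyIds holds the ids of all copies of the .watches shard containing the watch.
     */
    static boolean shouldExecuteLocally(String watchId, List<String> shardCopyIds, String localCopyId) {
        if (!shardCopyIds.contains(localCopyId)) {
            return false; // this node holds no copy of the watch's shard
        }
        List<String> sorted = new ArrayList<>(shardCopyIds);
        Collections.sort(sorted); // every node must agree on the same order
        int index = Math.floorMod(watchId.hashCode(), sorted.size()); // floorMod avoids negative indexes
        return sorted.get(index).equals(localCopyId);
    }
}
```

Each node evaluates this rule on its own, without coordinating with the others. Sorting the copy ids first is what lets the nodes agree that exactly one of them should run the watch.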
```diff
@@ -1387,13 +1390,15 @@ PUT _watcher/watch/log_error_watch
 - [TickerScheduleTriggerEngine] [TickerScheduleTriggerEngine] – handles the periodic (non-cron) schedules that we see the most
 - [EmailAction] [EmailAction] / [EmailService] [EmailService] – emails to third-party email server
 - [WebhookAction] [WebhookAction] / [WebhookService] [WebhookService] – sends requests to external endpoints
+- [Various other actions] [Actions Package] (for example posting to Slack, Jira, etc.)
 
 ## Debugging
 
-- The most useful debugging information is in the Elasticsearch logs and the .watcher_history index
-- It is often useful to get the contents of the .watches index
+- The most useful debugging information is in the Elasticsearch logs and the `.watcher_history` index
+- It is often useful to get the contents of the `.watches` index
 - Frequent sources of problems:
 - There is no guarantee that an interval schedule watch will run at exactly the requested interval after the last run
-- The counter for the interval schedule restarts if the shard moves. For example, if the interval is once every 12 hours, and the shard moves 10 hours into that interval, it will be at least 12 more hours until it runs.
-- Calls to remote systems ([EmailAction] and [WebhookAction]) are a frequent source of failures. Watcher sends the request but doesn't know what happens after that. If you see that the call was succesful in .watcher_history, the best way to continue the investigation is in the logs of the remote system.
+- In older versions (before 8.17), the counter for the interval schedule restarts if the shard moves. For example, if the interval is once every 12 hours, and the shard moves 10 hours into that interval, it will be at least 12 more hours until it runs.
+- Calls to remote systems ([EmailAction] and [WebhookAction]) are a frequent source of failures. Watcher sends the request but doesn't know what happens after that. If you see that the call was successful in `.watcher_history`, the best way to continue the investigation is in the logs of the remote system.
+- Even if watcher fails during a call to a remote system, the error is likely to be outside of watcher (e.g. network problems). Check the error message in `.watcher_history`.
 
```
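Here is the pre-8.17 interval behavior from the hunk above, worked through. With a 12-hour interval and a shard move 10 hours in, the next run comes at least 10 + 12 = 22 hours after the previous run instead of 12. The hypothetical helper below only encodes that arithmetic. `IntervalRestart` and its methods are invented names, and the helper does not model Watcher's scheduler or any of its classes.

```java
import java.time.Duration;

// Hypothetical helper illustrating the pre-8.17 behavior described above:
// when the .watches shard moves, the interval counter starts again from zero.
final class IntervalRestart {
    private IntervalRestart() {}

    /**
     * Earliest time of the next run, measured from the previous run, when the
     * shard moves after elapsedAtMove and the interval counter restarts.
     */
    static Duration earliestNextRun(Duration interval, Duration elapsedAtMove) {
        if (elapsedAtMove.isNegative() || elapsedAtMove.compareTo(interval) >= 0) {
            throw new IllegalArgumentException("the move must happen within the current interval");
        }
        return elapsedAtMove.plus(interval);
    }

    /** Minimum delay beyond the nominal schedule caused by the restart. */
    static Duration minimumExtraDelay(Duration interval, Duration elapsedAtMove) {
        return earliestNextRun(interval, elapsedAtMove).minus(interval);
    }
}
```

`IntervalRestart.earliestNextRun(Duration.ofHours(12), Duration.ofHours(10))` yields 22 hours, so the watch runs at least 10 hours later than scheduled. The result is a lower bound because, as the hunk also notes, an interval schedule is not guaranteed to run at exactly the requested interval.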