diff --git a/deploy-manage/distributed-architecture/kibana-tasks-management.md b/deploy-manage/distributed-architecture/kibana-tasks-management.md
index 6626bca592..fa96ab546c 100644
--- a/deploy-manage/distributed-architecture/kibana-tasks-management.md
+++ b/deploy-manage/distributed-architecture/kibana-tasks-management.md
@@ -11,9 +11,9 @@ products:
 
 {{kib}} Task Manager is used by features such as Alerting, Actions, and Reporting to run mission critical work as persistent background tasks. These background tasks distribute work across multiple {{kib}} instances. This has three major benefits:
 
-* **Persistence**: All task state and scheduling is stored in {{es}}, so if you restart {{kib}}, tasks will pick up where they left off.
-* **Scaling**: Multiple {{kib}} instances can read from and update the same task queue in {{es}}, allowing the work load to be distributed across instances. If a {{kib}} instance no longer has capacity to run tasks, you can increase capacity by adding additional {{kib}} instances. For more information on scaling, see [{{kib}} task manager scaling considerations](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance).
-* **Load Balancing**: Task Manager is equipped with a reactive self-healing mechanism, which allows it to reduce the amount of work it executes in reaction to an increased load related error rate in {{es}}. Additionally, when Task Manager experiences an increase in recurring tasks, it attempts to space out the work to better balance the load.
+- **Persistence**: All task state and scheduling is stored in {{es}}, so if you restart {{kib}}, tasks will pick up where they left off.
+- **Scaling**: Multiple {{kib}} instances can read from and update the same task queue in {{es}}, allowing the workload to be distributed across instances. If a {{kib}} instance no longer has capacity to run tasks, you can increase capacity by adding additional {{kib}} instances. For more information on scaling, see [{{kib}} task manager scaling considerations](../../deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance).
+- **Load Balancing**: Task Manager is equipped with a reactive self-healing mechanism, which allows it to reduce the amount of work it executes in reaction to an increased load-related error rate in {{es}}. Additionally, when Task Manager experiences an increase in recurring tasks, it attempts to space out the work to better balance the load.
 
 ::::{important}
 Task definitions for alerts and actions are stored in the index called `.kibana_task_manager`.
@@ -28,16 +28,18 @@ If you lose this index, all scheduled alerts and actions are lost.
 
 {{kib}} background tasks are managed as follows:
 
-* An {{es}} task index is polled for overdue tasks at 3-second intervals. You can change this interval using the [`xpack.task_manager.poll_interval`](kibana://reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting.
-* Tasks are claimed by updating them in the {{es}} index, using optimistic concurrency control to prevent conflicts. Each {{kib}} instance can run a maximum of 10 concurrent tasks, so a maximum of 10 tasks are claimed each interval.
-* {{es}} and {{kib}} instances use the system clock to determine the current time. To ensure schedules are triggered when expected, synchronize the clocks of all nodes in the cluster using a time service such as [Network Time Protocol](http://www.ntp.org/).
-* Tasks are run on the {{kib}} server. -* Task Manager ensures that tasks: - * Are only executed once - * Are retried when they fail (if configured to do so) - * Are rescheduled to run again at a future point in time (if configured to do so) -::::{important} -It is possible for tasks to run late or at an inconsistent schedule. +- An {{es}} task index is polled for overdue tasks at 500-millisecond intervals. You can change this interval using the [`xpack.task_manager.poll_interval`](kibana://reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting. +- Tasks are claimed by updating them in the {{es}} index, using optimistic concurrency control to prevent conflicts. Each {{kib}} instance can run a maximum of 10 concurrent tasks, so a maximum of 10 tasks are claimed each interval. +- {{es}} and {{kib}} instances use the system clock to determine the current time. To ensure schedules are triggered when expected, synchronize the clocks of all nodes in the cluster using a time service such as [Network Time Protocol](http://www.ntp.org/). +- Tasks are run on the {{kib}} server.
+  For production deployments, it is recommended to run background tasks on an isolated {{kib}} node.
+  You can achieve this by setting `node.roles` to `background_tasks` for self-managed deployments, or by scaling {{kib}} to 8 GB of RAM or more on Elastic Cloud Hosted (ECH).
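+
+  For example, a minimal `kibana.yml` sketch for a dedicated background-tasks instance (assuming a self-managed deployment) might look like this:
+
+  ```yaml
+  # Run background tasks only on this Kibana instance.
+  node.roles: ["background_tasks"]
+  ```
+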
+- Task Manager ensures that tasks:
+  - Are only executed once
+  - Are retried when they fail (if configured to do so)
+  - Are rescheduled to run again at a future point in time (if configured to do so)
+
+::::{important}
+It is possible for tasks to run late or at an inconsistent schedule.
 This is usually a symptom of the specific usage or scaling strategy of the cluster in question.
 
@@ -48,6 +50,3 @@ For details on the settings that can influence the performance and throughput of
 
 For detailed troubleshooting guidance, see [Troubleshooting](../../troubleshoot/kibana/task-manager.md).
 
 ::::
-
-
-
diff --git a/deploy-manage/production-guidance/kibana-alerting-production-considerations.md b/deploy-manage/production-guidance/kibana-alerting-production-considerations.md
index dcb5c5b511..434c625887 100644
--- a/deploy-manage/production-guidance/kibana-alerting-production-considerations.md
+++ b/deploy-manage/production-guidance/kibana-alerting-production-considerations.md
@@ -12,21 +12,17 @@ products:
   - id: kibana
 ---
 
-
-
 # {{kib}} alerting: performance and scaling [alerting-production-considerations]
 
-
 Alerting runs both rule checks and actions as persistent background tasks managed by the Task Manager. When relying on rules and actions as mission critical services, make sure you follow the [production considerations](kibana-task-manager-scaling-considerations.md) for Task Manager.
 
-
 ## Running background rule checks and actions [alerting-background-tasks]
 
 {{kib}} uses background tasks to run rules and actions, distributed across all {{kib}} instances in the cluster.
 
-By default, each {{kib}} instance polls for work at three second intervals, and can run a maximum of ten concurrent tasks. These tasks are then run on the {{kib}} server.
+By default, each {{kib}} instance polls for work at 500-millisecond intervals, and can run a maximum of ten concurrent tasks. These tasks are then run on the {{kib}} server.
 
 Rules are recurring background tasks which are rescheduled according to the check interval on completion. Actions are non-recurring background tasks which are deleted on completion.
 
@@ -41,45 +37,38 @@ For detailed guidance, see [Alerting Troubleshooting](../../explore-analyze/aler
 
 ::::
 
-
-
 ## Scaling guidance [alerting-scaling-guidance]
 
 As rules and actions leverage background tasks to perform the majority of work, scaling Alerting is possible by following the [Task Manager Scaling Guidance](kibana-task-manager-scaling-considerations.md#task-manager-scaling-guidance).
 
 When estimating the required task throughput, keep the following in mind:
 
-* Each rule uses a single recurring task that is scheduled to run at the cadence defined by its check interval.
-* Each action uses a single task. However, because actions are taken per instance, alerts can generate a large number of non-recurring tasks.
+- Each rule uses a single recurring task that is scheduled to run at the cadence defined by its check interval.
+- Each action uses a single task. However, because actions are taken per instance, alerts can generate a large number of non-recurring tasks.
 
-It is difficult to predict how much throughput is needed to ensure all rules and actions are executed at consistent schedules. By counting rules as recurring tasks and actions as non-recurring tasks, a rough throughput [can be estimated](kibana-task-manager-scaling-considerations.md#task-manager-rough-throughput-estimation) as a *tasks per minute* measurement.
+It is difficult to predict how much throughput is needed to ensure all rules and actions are executed at consistent schedules. By counting rules as recurring tasks and actions as non-recurring tasks, a rough throughput [can be estimated](kibana-task-manager-scaling-considerations.md#task-manager-rough-throughput-estimation) as a _tasks per minute_ measurement.
 
 Predicting the buffer required to account for actions depends heavily on the rule types you use, the amount of alerts they might detect, and the number of actions you might choose to assign to action groups. With that in mind, regularly [monitor the health](../monitor/kibana-task-manager-health-monitoring.md) of your Task Manager instances.
 
-
 ## Event log index lifecycle management [event-log-ilm]
 
 ::::{warning}
 This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
 ::::
 
-
-Alerts and actions log activity in a set of "event log" data streams, one per {{kib}} version, named `.kibana-event-log-{{VERSION}}`. These data streams are configured with a lifecycle data retention of 90 days. This can be updated to other values via the standard data stream lifecycle APIs. Note that the event log data contains the data shown in the alerting pages in {{kib}}, so reducing the data retention period will result in less data being available to view.
+Alerts and actions log activity in a set of "event log" data streams, one per {{kib}} version, named `.kibana-event-log-{{VERSION}}`. These data streams are configured with a lifecycle data retention of 90 days. This can be updated to other values via the standard data stream lifecycle APIs. Note that the event log data contains the data shown in the alerting pages in {{kib}}, so reducing the data retention period will result in less data being available to view.
 
 For more information on data stream lifecycle management, see: [Data stream lifecycle](../../manage-data/lifecycle/data-stream.md).
 
-
 ## Circuit breakers [alerting-circuit-breakers]
 
 There are several scenarios where running alerting rules and actions can start to negatively impact the overall health of a {{kib}} instance either by clogging up Task Manager throughput or by consuming so much CPU/memory that other operations cannot complete in a reasonable amount of time. There are several [configurable](kibana://reference/configuration-reference/alerting-settings.md#alert-settings) circuit breakers to help minimize these effects.
 
-
 ### Rules with very short intervals [_rules_with_very_short_intervals]
 
 Running large numbers of rules at very short intervals can quickly clog up Task Manager throughput, leading to higher schedule drift. Use `xpack.alerting.rules.minimumScheduleInterval.value` to set a minimum schedule interval for rules. The default (and recommended) value for this configuration is `1m`. Use `xpack.alerting.rules.minimumScheduleInterval.enforce` to specify whether to strictly enforce this minimum. While the default value for this setting is `false` to maintain backwards compatibility with existing rules, set this to `true` to prevent new and updated rules from running at an interval below the minimum.
 
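+For example, a minimal `kibana.yml` sketch of these two settings (using the default `1m` value and opting in to strict enforcement) might look like this:
+
+```yaml
+# Reject new or updated rules whose check interval is shorter than 1m.
+xpack.alerting.rules.minimumScheduleInterval:
+  value: "1m"
+  enforce: true
+```
+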
-Another related setting is `xpack.alerting.rules.maxScheduledPerMinute`, which limits the number of rules that can run per minute. For example if it’s set to `400`, you can have 400 rules with one minute check intervals or 2,000 rules with 5 minute check intervals. You cannot create or edit a rule if its check interval would cause this setting to be exceeded. To stay within this limit, delete or disable some rules or update the check intervals so that your rules run less frequently.
-
+Another related setting is `xpack.alerting.rules.maxScheduledPerMinute`, which limits the number of rules that can run per minute. For example, if it’s set to `400`, you can have 400 rules with one-minute check intervals or 2,000 rules with 5-minute check intervals. You cannot create or edit a rule if its check interval would cause this setting to be exceeded. To stay within this limit, delete or disable some rules or update the check intervals so that your rules run less frequently. The default value for this setting is `32000`; increase it only if you need more than 32,000 rule runs per minute.
 
 ### Rules that run for a long time [_rules_that_run_for_a_long_time]
 
@@ -87,21 +76,20 @@ Rules that run for a long time typically do so because they are issuing resource
 
 ```yaml
 xpack.alerting.rules.run:
-  timeout: '1m'
+  timeout: "1m"
   ruleTypeOverrides:
-    - id: '.index-threshold'
-      timeout: '10m'
+    - id: ".index-threshold"
+      timeout: "10m"
 ```
 
 When a rule run is cancelled, any alerts and actions that were generated during the run are discarded. This behavior is controlled by the `xpack.alerting.cancelAlertsOnRuleTimeout` configuration, which defaults to `true`. Set this to `false` to receive alerts and actions after the timeout, although be aware that these may be incomplete and possibly inaccurate.
 
-
 ### Rules that spawn too many actions [_rules_that_spawn_too_many_actions]
 
 Rules that spawn too many actions can quickly clog up Task Manager throughput. This can occur if:
 
-* A rule configured with a single action generates many alerts. For example, if a rule configured to run a single email action generates 100,000 alerts, then 100,000 actions will be scheduled during a run.
-* A rule configured with multiple actions generates alerts. For example, if a rule configured to run an email action, a server log action and a webhook action generates 30,000 alerts, then 90,000 actions will be scheduled during a run.
+- A rule configured with a single action generates many alerts. For example, if a rule configured to run a single email action generates 100,000 alerts, then 100,000 actions will be scheduled during a run.
+- A rule configured with multiple actions generates alerts. For example, if a rule configured to run an email action, a server log action and a webhook action generates 30,000 alerts, then 90,000 actions will be scheduled during a run.
 
 Use `xpack.alerting.rules.run.actions.max` to limit the maximum number of actions a rule can generate per run. This value can also be configured by connector type using `xpack.alerting.rules.run.actions.connectorTypeOverrides`. For example, the following config sets the global maximum number of actions to 100 while allowing rules with **Email** actions to generate up to 200 actions.
@@ -110,7 +98,6 @@ xpack.alerting.rules.run:
   actions:
     max: 100
     connectorTypeOverrides:
-      - id: '.email'
+      - id: ".email"
        max: 200
 ```
-
diff --git a/deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md b/deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md
index 64f04f6403..e2ea9e5b4d 100644
--- a/deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md
+++ b/deploy-manage/production-guidance/kibana-task-manager-scaling-considerations.md
@@ -16,9 +16,9 @@ products:
 
 {{kib}} Task Manager is leveraged by features such as [alerting](/explore-analyze/alerts-cases/alerts.md), [actions](/explore-analyze/alerts-cases/alerts.md#rules-actions), and [reporting](/explore-analyze/report-and-share.md) to run mission critical work as persistent background tasks. These background tasks distribute work across multiple {{kib}} instances. This has three major benefits:
 
-* **Persistence**: All task state and scheduling is stored in {{es}}, so if you restart {{kib}}, tasks will pick up where they left off.
-* **Scaling**: Multiple {{kib}} instances can read from and update the same task queue in {{es}}, allowing the work load to be distributed across instances. If a {{kib}} instance no longer has capacity to run tasks, you can increase capacity by adding additional {{kib}} instances.
-* **Load Balancing**: Task Manager is equipped with a reactive self-healing mechanism, which allows it to reduce the amount of work it executes in reaction to an increased load related error rate in {{es}}. Additionally, when Task Manager experiences an increase in recurring tasks, it attempts to space out the work to better balance the load.
+- **Persistence**: All task state and scheduling is stored in {{es}}, so if you restart {{kib}}, tasks will pick up where they left off.
+- **Scaling**: Multiple {{kib}} instances can read from and update the same task queue in {{es}}, allowing the workload to be distributed across instances. If a {{kib}} instance no longer has capacity to run tasks, you can increase capacity by adding additional {{kib}} instances.
+- **Load Balancing**: Task Manager is equipped with a reactive self-healing mechanism, which allows it to reduce the amount of work it executes in reaction to an increased load-related error rate in {{es}}. Additionally, when Task Manager experiences an increase in recurring tasks, it attempts to space out the work to better balance the load.
 
 ::::{important}
 Task definitions for alerts and actions are stored in the index called `.kibana_task_manager`.
@@ -28,20 +28,18 @@ You must have at least one replica of this index for production deployments.
 
 If you lose this index, all scheduled alerts and actions are lost.
 
 ::::
 
-
 ## Running background tasks [task-manager-background-tasks]
 
 {{kib}} background tasks are managed as follows:
 
-* An {{es}} task index is polled for overdue tasks at 3-second intervals. You can change this interval using the [`xpack.task_manager.poll_interval`](kibana://reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting.
-* Tasks are claimed by updating them in the {{es}} index, using optimistic concurrency control to prevent conflicts. Each {{kib}} instance can run a maximum of 10 concurrent tasks, so a maximum of 10 tasks are claimed each interval.
-* Tasks are run on the {{kib}} server.
-* Task Manager ensures that tasks:
-
-  * Are only executed once
-  * Are retried when they fail (if configured to do so)
-  * Are rescheduled to run again at a future point in time (if configured to do so)
+- An {{es}} task index is polled for overdue tasks at 500-millisecond intervals. You can change this interval using the [`xpack.task_manager.poll_interval`](kibana://reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting.
+- Tasks are claimed by updating them in the {{es}} index, using optimistic concurrency control to prevent conflicts. Each {{kib}} instance can run a maximum of 10 concurrent tasks, so a maximum of 10 tasks are claimed each interval.
+- Tasks are run on the {{kib}} server.
+- Task Manager ensures that tasks:
+  - Are only executed once
+  - Are retried when they fail (if configured to do so)
+  - Are rescheduled to run again at a future point in time (if configured to do so)
 
 ::::{important}
 It is possible for tasks to run late or at an inconsistent schedule.
@@ -56,36 +54,34 @@ For detailed troubleshooting guidance, see [Troubleshooting](../../troubleshoot/
 ::::
 
-
-
 ## Deployment considerations [_deployment_considerations]
 
 {{es}} and {{kib}} instances use the system clock to determine the current time. To ensure schedules are triggered when expected, synchronize the clocks of all nodes in the cluster using a time service such as [Network Time Protocol](http://www.ntp.org/).
 
-
 ## Scaling guidance [task-manager-scaling-guidance]
 
-How you deploy {{kib}} largely depends on your use case. Predicting the throughput a deployment requires to support Task Management is difficult because features can schedule an unpredictable number of tasks at a variety of scheduled cadences. However, there is a relatively straight forward method you can follow to produce a rough estimate based on your expected usage.
+How you deploy {{kib}} largely depends on your use case. Predicting the throughput a deployment requires to support Task Management is difficult because features can schedule an unpredictable number of tasks at a variety of scheduled cadences. However, there is a relatively straightforward method you can follow to produce a rough estimate based on your expected usage.
 
-
 ### Default scale [task-manager-default-scaling]
 
-By default, {{kib}} polls for tasks at a rate of 10 tasks every 3 seconds. This means that you can expect a single {{kib}} instance to support up to 200 *tasks per minute* (`200/tpm`).
+By default, {{kib}} polls for tasks at a rate of 10 tasks every 500 milliseconds. This means that you can expect a single {{kib}} instance to support up to 1200 _tasks per minute_ (`1200/tpm`).
+
+- As of v8.15, a new task claim strategy was introduced: `mget`. As of v8.18, `mget` became the default strategy as part of a set of performance improvements, with a default polling interval of 500 milliseconds. Because these changes improve task execution performance, upgrading to v8.18 or later is highly recommended.
+
+- The maximum number of concurrent tasks can be changed with the `xpack.task_manager.capacity` setting, as shown in the sketch below. The default value is 10; the minimum and maximum values are 5 and 50, respectively.
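+
+For example, a hypothetical `kibana.yml` sketch combining the two settings described above (the values shown are illustrative only):
+
+```yaml
+# Allow up to 20 concurrent tasks per Kibana instance
+# (default 10; valid range 5 to 50).
+xpack.task_manager.capacity: 20
+# Poll for overdue tasks every 500 milliseconds (the v8.18+ default).
+xpack.task_manager.poll_interval: 500
+```
+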
-In practice, a {{kib}} instance will only achieve the upper bound of `200/tpm` if the duration of task execution is below the polling rate of 3 seconds. For the most part, the duration of tasks is below that threshold, but it can vary greatly as {{es}} and {{kib}} usage grow and task complexity increases (such as alerts executing heavy queries across large datasets).
+In practice, a {{kib}} instance will only achieve the upper bound of `1200/tpm` if the duration of task execution is below the polling rate of 500 milliseconds. For the most part, however, task durations exceed that threshold, and they can vary greatly as {{es}} and {{kib}} usage grow and task complexity increases (such as alerts executing heavy queries across large datasets). Therefore, determine the average execution time of your tasks to estimate how many {{kib}} instances you need.
 
 By [estimating a rough throughput requirement](#task-manager-rough-throughput-estimation), you can estimate the number of {{kib}} instances required to reliably execute tasks in a timely manner. An appropriate number of {{kib}} instances can be estimated to match the required scale.
 
 For details on monitoring the health of {{kib}} Task Manager, follow the guidance in [Health monitoring](../monitor/kibana-task-manager-health-monitoring.md).
 
-
 ### Scaling horizontally [task-manager-scaling-horizontally]
 
 At times, the sustainable approach might be to expand the throughput of your cluster by provisioning additional {{kib}} instances. By default, each additional {{kib}} instance will add an additional 10 tasks that your cluster can run concurrently, but you can also scale each {{kib}} instance vertically, if your diagnosis indicates that they can handle the additional workload.
 
-
 ### Scaling vertically [task-manager-scaling-vertically]
 
-Other times it, might be preferable to increase the throughput of individual {{kib}} instances.
+Other times, it might be preferable to increase the throughput of individual {{kib}} instances.
@@ -94,7 +90,6 @@ Tweak the capacity with the [`xpack.task_manager.capacity`](kibana://reference/c
 Tweak the poll interval with the [`xpack.task_manager.poll_interval`](kibana://reference/configuration-reference/task-manager-settings.md#task-manager-settings) setting, which enables each {{kib}} instance to pull scheduled tasks at a higher rate. This setting can impact the performance of the {{es}} cluster as the workload will be higher.
 
-
 ### Choosing a scaling strategy [task-manager-choosing-scaling-strategy]
 
 Each scaling strategy comes with its own considerations, and the appropriate strategy largely depends on your use case.
 
@@ -112,15 +107,13 @@ Task Manager, like the rest of the {{stack}}, is designed to scale horizontally.
 
 Scaling horizontally requires a higher degree of coordination between {{kib}} instances. One way Task Manager coordinates with other instances is by delaying its polling schedule to avoid conflicts with other instances. By using [health monitoring](../monitor/kibana-task-manager-health-monitoring.md) to evaluate the [date of the `last_polling_delay`](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime) across a deployment, you can estimate the frequency at which Task Manager resets its delay mechanism. A higher frequency suggests {{kib}} instances conflict at a high rate, which you can address by scaling vertically rather than horizontally, reducing the required coordination.
 
-
 ### Rough throughput estimation [task-manager-rough-throughput-estimation]
 
 Predicting the required throughput a deployment might need to support Task Management is difficult, as features can schedule an unpredictable number of tasks at a variety of scheduled cadences. However, a rough lower bound can be estimated, which is then used as a guide.
 
-Throughput is best thought of as a measurements in tasks per minute.
+Throughput is best thought of as a measurement in tasks per minute.
 
-A default {{kib}} instance can support up to `200/tpm`.
-
+A default {{kib}} instance can support up to `1200/tpm`.
 
 #### Automatic estimation [_automatic_estimation]
 
@@ -128,7 +121,6 @@
 This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.
 ::::
 
-
 As demonstrated in [Evaluate your capacity estimation](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-capacity-estimation), the Task Manager [health monitoring](../monitor/kibana-task-manager-health-monitoring.md) performs these estimations automatically.
 
 These estimates are based on historical data and should not be used as predictions, but can be used as a rough guide when scaling the system.
 
@@ -146,29 +138,24 @@ When evaluating the proposed {{kib}} instance number under `proposed.provisioned
 
 ::::
 
-
-
 #### Manual estimation [_manual_estimation]
 
-By [evaluating the workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload), you can make a rough estimate as to the required throughput as a *tasks per minute* measurement.
+By [evaluating the workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload), you can make a rough estimate as to the required throughput as a _tasks per minute_ measurement.
 
-For example, suppose your current workload reveals a required throughput of `440/tpm`. You can address this scale by provisioning 3 {{kib}} instances, with an upper throughput of `600/tpm`. This scale would provide approximately 25% additional capacity to handle ad-hoc non-recurring tasks and potential growth in recurring tasks.
+For example, suppose your current workload reveals a required throughput of `1920/tpm`. You can address this scale by provisioning 2 {{kib}} instances, with an upper throughput of `2400/tpm`. This scale would provide approximately 25% additional capacity to handle ad-hoc non-recurring tasks and potential growth in recurring tasks.
 
-Given a deployment of 100 recurring tasks, estimating the required throughput depends on the scheduled cadence. Suppose you expect to run 50 tasks at a cadence of `10s`, the other 50 tasks at `20m`. In addition, you expect a couple dozen non-recurring tasks every minute.
+Given a deployment of 600 recurring tasks, estimating the required throughput depends on the scheduled cadence. Suppose you expect to run 300 tasks at a cadence of `10s`, the other 300 tasks at `20m`. In addition, you expect a couple dozen non-recurring tasks every minute.
 
-A non-recurring task requires a single execution, which means that a single {{kib}} instance could execute all 100 tasks in less than a minute, using only half of its capacity. As these tasks are only executed once, the {{kib}} instance will sit idle once all tasks are executed. For that reason, don’t include non-recurring tasks in your *tasks per minute* calculation. Instead, include a buffer in the final *lower bound* to incur the cost of ad-hoc non-recurring tasks.
+A non-recurring task requires a single execution, which means that a single {{kib}} instance could execute all 600 tasks in less than a minute, using only half of its capacity. As these tasks are only executed once, the {{kib}} instance will sit idle once all tasks are executed. For that reason, don’t include non-recurring tasks in your _tasks per minute_ calculation. Instead, include a buffer in the final _lower bound_ to incur the cost of ad-hoc non-recurring tasks.
 
-A recurring task requires as many executions as its cadence can fit in a minute. A recurring task with a `10s` schedule will require `6/tpm`, as it will execute 6 times per minute. A recurring task with a `20m` schedule only executes 3 times per hour and only requires a throughput of `0.05/tpm`, a number so small it that is difficult to take it into account.
+A recurring task requires as many executions as its cadence can fit in a minute. A recurring task with a `10s` schedule will require `6/tpm`, as it will execute 6 times per minute. A recurring task with a `20m` schedule only executes 3 times per hour and only requires a throughput of `0.05/tpm`, a number so small that it is difficult to take into account.
 
-For this reason, we recommend grouping tasks by *tasks per minute* and *tasks per hour*, as demonstrated in [Evaluate your workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload), averaging the *per hour* measurement across all minutes.
+For this reason, we recommend grouping tasks by _tasks per minute_ and _tasks per hour_, as demonstrated in [Evaluate your workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload), averaging the _per hour_ measurement across all minutes.
 
-It is highly recommended that you maintain at least 20% additional capacity, beyond your expected workload, as spikes in ad-hoc tasks is possible at times of high activity (such as a spike in actions in response to an active alert).
+It is highly recommended that you maintain at least 20% additional capacity, beyond your expected workload, as spikes in ad-hoc tasks are possible at times of high activity (such as a spike in actions in response to an active alert).
 
-Given the predicted workload, you can estimate a lower bound throughput of `340/tpm` (`6/tpm` * 50 + `3/tph` * 50 + 20% buffer). As a default, a {{kib}} instance provides a throughput of `200/tpm`. A good starting point for your deployment is to provision 2 {{kib}} instances. You could then monitor their performance and reassess as the required throughput becomes clearer.
-
-Although this is a *rough* estimate, the *tasks per minute* provides the lower bound needed to execute tasks on time.
-
-Once you estimate *tasks per minute* , add a buffer for non-recurring tasks. How much of a buffer is required largely depends on your use case. Ensure enough of a buffer is provisioned by [evaluating your workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload) as it grows and tracking the ratio of recurring to non-recurring tasks by [evaluating your runtime](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime).
-
+Given the predicted workload, you can estimate a lower bound throughput of `2178/tpm` (`6/tpm` \* 300 + `0.05/tpm` \* 300 + 20% buffer; see the worked breakdown below). As a default, a {{kib}} instance provides a throughput of `1200/tpm`. A good starting point for your deployment is to provision 2 {{kib}} instances. You could then monitor their performance and reassess as the required throughput becomes clearer.
+
+Although this is a _rough_ estimate, the _tasks per minute_ provides the lower bound needed to execute tasks on time.
+
+Once you estimate _tasks per minute_, add a buffer for non-recurring tasks. How much of a buffer is required largely depends on your use case. Ensure enough of a buffer is provisioned by [evaluating your workload](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-workload) as it grows and tracking the ratio of recurring to non-recurring tasks by [evaluating your runtime](../../troubleshoot/kibana/task-manager.md#task-manager-health-evaluate-the-runtime).
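+
+As a worked breakdown of the arithmetic above (a sketch only; the per-instance `1200/tpm` figure assumes the default capacity and poll interval):
+
+```
+300 tasks at a 10s cadence: 300 × 6/tpm    = 1800/tpm
+300 tasks at a 20m cadence: 300 × 0.05/tpm =   15/tpm
+subtotal                                   = 1815/tpm
+with a 20% buffer: 1815 × 1.2              ≈ 2178/tpm
+capacity check: 2 instances × 1200/tpm     = 2400/tpm ≥ 2178/tpm
+```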